How to Create a Blog from an RSS Feed

Create a blog generator to convert RSS feeds into websites

This article describes how to create a serverless static blog generator to automatically convert RSS feeds into beautiful static websites. The system uses Python for content processing, Azure Functions for automation, and Azure Storage with CDN for hosting.

This solution is part of the NotesRSS service, which lets users create personal blogs directly from an Evernote notebook whose notes are tagged “Publish”.

System Architecture

Components

  1. RSS Feed Processor: Python script that fetches and processes RSS content
  2. Static Site Generator: Converts processed content into HTML using Jinja2 templates
  3. Azure Function: Triggers the generation process on schedule
  4. Azure Blob Storage: Hosts the static website files
  5. Azure CDN: Provides fast content delivery and HTTPS support

Flow

  1. RSS feed updates trigger Azure Function
  2. Function runs Python generator script
  3. Generated files are uploaded to Azure Storage
  4. CDN distributes content globally

Implementation Guide

1. Setting Up the Development Environment

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install required packages
pip install feedparser jinja2 azure-functions azure-storage-blob python-frontmatter markdown beautifulsoup4

2. RSS Processing and HTML Generation

The core of the system is a Python script that processes RSS feeds and generates static HTML. Here’s the main class:

import os
from datetime import datetime
from time import mktime

import feedparser
from bs4 import BeautifulSoup
from jinja2 import Environment, FileSystemLoader

class StaticBlogGenerator:
    def __init__(self, rss_url, output_dir, template_dir):
        self.rss_url = rss_url
        self.output_dir = output_dir
        self.env = Environment(loader=FileSystemLoader(template_dir))

    def generate(self):
        # Fetch and parse the RSS feed
        feed = feedparser.parse(self.rss_url)

        # Process entries
        posts = []
        for entry in feed.entries:
            post = self._process_entry(entry)
            posts.append(post)

        # Generate HTML files
        self._generate_posts(posts)
        self._generate_index(posts)
        self._generate_archives(posts)

    def _process_entry(self, entry):
        # Clean and process HTML content
        content = entry.get('content', [{'value': ''}])[0]['value']
        soup = BeautifulSoup(content, 'html.parser')

        return {
            'title': entry.title,
            'date': datetime.fromtimestamp(mktime(entry.published_parsed)),
            'content': str(soup),
            'author': entry.get('author', 'Anonymous'),
            'link': entry.link,
            'slug': self._create_slug(entry.title),
        }
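The helper methods referenced above (`_create_slug`, `_generate_posts`, and so on) are not shown in the original listing. As a rough sketch of what they might look like, standalone versions of two of them are given below; the `post.html` template name and the `env` argument (a Jinja2 `Environment`) are assumptions:

```python
import os
import re

def create_slug(title):
    """Lowercase the title and collapse runs of non-alphanumerics into hyphens."""
    slug = re.sub(r'[^a-z0-9]+', '-', title.lower())
    return slug.strip('-')

def generate_posts(posts, output_dir, env):
    """Render each post dict through a 'post.html' template into output_dir.

    `env` is expected to be a Jinja2 Environment; the template name is an
    assumption and should match whatever lives in the template directory.
    """
    template = env.get_template('post.html')
    os.makedirs(output_dir, exist_ok=True)
    for post in posts:
        path = os.path.join(output_dir, f"{post['slug']}.html")
        with open(path, 'w', encoding='utf-8') as f:
            f.write(template.render(post=post))
```

Inside the class these would become `self._create_slug(...)` and `self._generate_posts(...)`, with `self.env` and `self.output_dir` in place of the parameters.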

3. Azure Function Implementation

An Azure Function is essentially a piece of code that runs in the cloud without you needing to manage any infrastructure. Think of it as a small, self-contained function that can be triggered by various events, like an HTTP request, a message in a queue, or a timer. Here are some key features:

  • Serverless: You don’t have to worry about provisioning or managing servers. Azure handles all the infrastructure for you.
  • Scalable: Azure Functions automatically scale out to handle the load, so your code can run seamlessly regardless of the number of requests.
  • Cost-Effective: You pay only for the compute resources your functions consume, making it a cost-effective solution for many scenarios.

This makes Azure Functions ideal for scenarios like event-driven programming, microservices architectures, and background processing tasks. Azure Functions offer quite a bit of flexibility when it comes to triggering events. Here are some of the most common ways to trigger Azure Functions:

  • Timer Trigger: Executes at specified times or intervals, perfect for scheduling tasks.
  • HTTP Trigger: Responds to HTTP requests, making it suitable for creating APIs.
  • Queue Storage Trigger: Activates when a new message arrives in an Azure Storage queue.
  • Blob Storage Trigger: Initiates when a new or updated blob is detected in an Azure Storage container.
  • Service Bus Trigger: Responds to messages in Azure Service Bus queues or topics.
  • Event Hub Trigger: Fires in response to events in Azure Event Hubs, useful for telemetry and logging.
  • Cosmos DB Trigger: Triggers when there are changes in a specified Azure Cosmos DB.
  • Event Grid Trigger: Activates based on events in Azure Event Grid, often used for reactive programming.
  • SignalR Trigger: Works with Azure SignalR Service for real-time communication scenarios.

Each of these triggers suits different use cases, helping you build serverless solutions tailored to your specific needs.

Our first step is to create an Azure Function that triggers the static site generation:

import logging
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient

from .blog_generator import StaticBlogGenerator

def main(mytimer: func.TimerRequest) -> None:
    # Configuration
    connection_string = os.environ['AzureWebJobsStorage']
    container_name = 'static-blogs'

    # Initialize the blog generator
    generator = StaticBlogGenerator(
        rss_url="https://api.notesrss.com/feed/user123",
        output_dir="/tmp/blog",
        template_dir="templates",
    )

    # Generate static files
    generator.generate()

    # Upload to Azure Storage
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    container_client = blob_service_client.get_container_client(container_name)

    # Upload all generated files
    for root, dirs, files in os.walk("/tmp/blog"):
        for file in files:
            file_path = os.path.join(root, file)
            blob_name = os.path.relpath(file_path, "/tmp/blog")

            with open(file_path, "rb") as data:
                container_client.upload_blob(
                    name=blob_name,
                    data=data,
                    overwrite=True,
                )
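In the function's folder, a `function.json` file binds the `mytimer` parameter to a CRON-style schedule. A minimal sketch for an hourly rebuild (the schedule expression here is an assumption; adjust it to your update cadence):

```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "name": "mytimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 * * * *"
    }
  ]
}
```

The `name` value must match the parameter name in `main` (`mytimer` above), and the six-field NCRONTAB expression includes seconds as its first field.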

How the Azure Function is triggered:

Timer
  • Regular blog updates
  • Daily/weekly rebuilds
  • Sitemap generation
  • Cache refreshing

Webhook (HTTP)
  • Immediate updates when content changes
  • Manual regeneration requests
  • Integration with external systems
  • Testing and debugging

4. Azure Infrastructure Setup

Azure Blob Storage with Azure Content Delivery Network (CDN) is a powerful combination for delivering content globally with high performance and low latency. Here’s a breakdown of how it works:

  • Azure Blob Storage
    Azure Blob Storage is a scalable object storage solution for unstructured data, such as text or binary data. It’s ideal for storing large amounts of data like images, videos, and backups.
  • Azure Content Delivery Network (CDN)
    Azure CDN is a network of distributed servers that deliver content to users based on their geographic location. By caching content closer to users, it reduces latency and improves load times.

When you integrate Azure Blob Storage with Azure CDN, the CDN caches copies of your blobs (files) at edge locations around the world. This means that when a user requests a file, it is delivered from the nearest edge location rather than from the original storage location, significantly reducing the time it takes to access the content.
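For the CDN to cache effectively, each uploaded blob should carry a correct Content-Type (so browsers render HTML rather than download it) and a Cache-Control header the edge servers can honor. A small helper using the standard `mimetypes` module is sketched below; the TTL values are assumptions, not recommendations:

```python
import mimetypes

def content_settings_for(blob_name):
    """Guess a Content-Type and pick a Cache-Control policy for a blob.

    HTML changes on every rebuild, so it gets a short TTL; other assets
    (CSS, JS, images) are cached longer. The lifetimes are illustrative.
    """
    content_type, _ = mimetypes.guess_type(blob_name)
    content_type = content_type or 'application/octet-stream'
    if content_type == 'text/html':
        cache_control = 'public, max-age=300'    # 5 minutes
    else:
        cache_control = 'public, max-age=86400'  # 1 day
    return content_type, cache_control
```

With azure-storage-blob, the result maps onto a `ContentSettings(content_type=..., cache_control=...)` object passed to `upload_blob` via its `content_settings` parameter.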

Storage Account Configuration
  1. Create Storage Account:

az storage account create \
  --name notesrssstatic \
  --resource-group notesrss \
  --location eastus \
  --sku Standard_LRS \
  --kind StorageV2

  2. Enable Static Website:

az storage blob service-properties update \
  --account-name notesrssstatic \
  --static-website \
  --index-document index.html \
  --404-document 404.html

CDN Setup
  1. Create CDN Profile:

az cdn profile create \
  --name notesrss-cdn \
  --resource-group notesrss \
  --sku Standard_Microsoft

  2. Create CDN Endpoint:

az cdn endpoint create \
  --name user-blogs \
  --profile-name notesrss-cdn \
  --resource-group notesrss \
  --origin notesrssstatic.z13.web.core.windows.net \
  --origin-host-header notesrssstatic.z13.web.core.windows.net

5. Custom Domain Configuration

To enable https://user.notesrss.com/blog:

  1. Add DNS CNAME Record:

user.notesrss.com -> user-blogs.azureedge.net

  2. Enable HTTPS:

az cdn custom-domain enable-https \
  --endpoint-name user-blogs \
  --profile-name notesrss-cdn \
  --resource-group notesrss \
  --name user-notesrss

Maintenance and Monitoring

Monitoring Setup

  1. Enable Azure Monitor:

az monitor diagnostic-settings create \
  --name cdn-logs \
  --resource $(az cdn endpoint show -g notesrss -n user-blogs --profile-name notesrss-cdn --query id -o tsv) \
  --logs '[{"category": "CoreAnalytics", "enabled": true}]' \
  --workspace $(az monitor log-analytics workspace show -g notesrss -n notesrss-logs --query id -o tsv)

  2. Set up alerts for:
  • Function execution failures
  • High CDN latency
  • Storage capacity issues

Maintenance Tasks

  1. Regular cleanup:

from datetime import datetime, timedelta

def cleanup_old_posts():
    """Remove posts older than 180 days."""
    cutoff_date = datetime.now() - timedelta(days=180)
    # Implementation details…
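One way the omitted details could go: separate the pure "which posts have expired?" decision from the blob deletion, so the logic is easy to test. The post-dict shape below matches `_process_entry` from earlier; everything else is an assumption:

```python
from datetime import datetime, timedelta

def select_expired(posts, now, max_age_days=180):
    """Return the slugs of posts whose date is older than the cutoff.

    `posts` is a list of dicts with 'slug' and 'date' keys, as produced
    by the generator's _process_entry method.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [post['slug'] for post in posts if post['date'] < cutoff]
```

A `cleanup_old_posts` built on this would then call `container_client.delete_blob(f"{slug}.html")` for each returned slug and regenerate the index.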

  2. Performance optimization:
  • Enable CDN compression
  • Implement image optimization
  • Use efficient caching strategies

Best Practices

  1. Content Organization
    • Use clear URL structure
    • Implement proper metadata
    • Maintain consistent formatting
  2. Performance
    • Optimize images
    • Minimize CSS/JavaScript
    • Use efficient caching
  3. Security
    • Sanitize HTML content
    • Implement proper access controls
    • Regular security updates

Conclusion

This system provides a scalable, maintainable solution for converting NotesRSS feeds into blogs by combining Azure Functions for automation with Azure Storage and CDN for hosting.

Reference: Article on Optimizing Azure for NotesRSS

Learn more about notesRSS

Michael Stuart
Azure Solutions Architect