Written by Sujith Quintelier Aug 13, 2025

⚙️ Automated Updates Pipeline

Deep dive into a PowerShell + GitHub Actions pipeline that auto-generates weekly Azure, GitHub and Terraform update posts with AI summarisation.

Why this pipeline exists

Manually curating weekly platform updates (Azure service changes, GitHub changelog entries, Terraform provider releases) is repetitive, time‑sensitive, and error‑prone. This pipeline automates the whole path from data acquisition → AI summarisation → Hugo markdown generation → commit, delivering consistent, timestamped update posts under content/updates.

High‑level architecture

┌───────────────┐   cron 05:15 UTC   ┌───────────────────┐   grouped items  ┌──────────────────┐
│ GitHub Action │ ─────────────────> │ PowerShell Script │ ───────────────> │ AI Summarisation │
└─────┬─────────┘                    └──────────┬────────┘                  └────────┬─────────┘
      │  commit (if changed)                    │ filtered items                     │ summaries
      V                                         V                                    V
  Repo content <────────────── write markdown posts <──────────── assemble front matter

Scheduling & cadence

The workflow runs daily at 05:15 UTC, but per‑source publication cadence is controlled inside the script via a frequency map (Azure=weekly, GitHub=weekly, Terraform=weekly by default; a parsing sketch follows the list below). The script:

  • Computes a base window (week vs rolling period).
  • Derives per‑source windows (weekly / biweekly / monthly) without duplicating logic.
  • Caps items per source (MaxAzure, MaxGitHub, MaxTerraform) to keep posts readable.
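
As a minimal sketch, here is how such a 'Source=cadence' map can be parsed; the variable names ($Frequencies, $freqMap) are illustrative rather than the script's exact internals:

$Frequencies = 'Azure=weekly,GitHub=weekly,Terraform=weekly'
$freqMap = @{}
foreach ($pair in $Frequencies -split ',') {
    # Split only on the first '=' so values may themselves contain '='
    $name, $cadence = $pair -split '=', 2
    $freqMap[$name.Trim()] = $cadence.Trim().ToLowerInvariant()
}
# $freqMap now maps each source to 'weekly' | 'biweekly' | 'monthly'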

Window calculation

Compute-BaseWindow returns UTC and local (Europe/Brussels) boundaries. Compute-PerSourceWindows adjusts start dates biweekly (−7 days) or monthly (month start) per source.
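
A hedged sketch of that windowing logic follows; the real Compute-BaseWindow and Compute-PerSourceWindows live in the repository script, and the Monday-anchored week used here is an assumption:

function Get-BaseWindowSketch {
    param(
        [ValidateSet('week','rolling')][string]$WindowType = 'week',
        [int]$RollingDays = 7
    )
    $nowUtc = [DateTime]::UtcNow
    if ($WindowType -eq 'week') {
        # Assumed week anchor: Monday 00:00 UTC through Sunday 23:59:59
        $daysFromMonday = ([int]$nowUtc.DayOfWeek + 6) % 7
        $startUtc = $nowUtc.Date.AddDays(-$daysFromMonday)
        $endUtc   = $startUtc.AddDays(7).AddSeconds(-1)
    } else {
        $startUtc = $nowUtc.Date.AddDays(-$RollingDays)
        $endUtc   = $nowUtc
    }
    # Per-source adjustment then mirrors the article: biweekly shifts the
    # start back a further 7 days; monthly snaps it to the month start.
    [pscustomobject]@{ StartUtc = $startUtc; EndUtc = $endUtc }
}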

Data acquisition layer

Fetcher functions isolate network concerns:

  • Get-AzureUpdates parses an RSS feed with defensive XML handling.
  • Get-GitHubChangelog mirrors the pattern for GitHub’s changelog.
  • Get-TerraformReleases loops repositories (hashicorp/terraform, hashicorp/terraform-provider-azurerm) and filters releases in the window.

Each returns plain objects with a common shape (title, url, publishedAt, raw, source), simplifying later aggregation.
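
A sketch of that common shape; the property names come from the article, while the function name, the feed-URL parameter, and the pubDate parsing are assumptions:

function Get-FeedItemsSketch {
    param([string]$FeedUrl, [datetime]$StartUtc, [datetime]$EndUtc, [string]$Source)
    # Invoke-RestMethod deserialises RSS <item> elements into objects
    $items = Invoke-RestMethod -Uri $FeedUrl -ErrorAction Stop
    foreach ($item in $items) {
        $published = ([datetime]$item.pubDate).ToUniversalTime()
        if ($published -lt $StartUtc -or $published -gt $EndUtc) { continue }
        [pscustomobject]@{
            title       = [string]$item.title
            url         = [string]$item.link
            publishedAt = $published
            raw         = $item
            source      = $Source
        }
    }
}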

Summarisation engine

Summarize-Items:

  • Expands a model pool (defaults: openai/gpt-4.1, openai/gpt-4o, openai/gpt-5, openai/o1, openai/o3 plus -mini variants) or uses caller override.
  • Normalises + deduplicates model identifiers.
  • Iteratively attempts summarisation per item, promoting the first successful model to the front (adaptive ordering).
  • Implements exponential backoff with jitter (SummaryRetryBaseSeconds) and bounded attempts (MaxSummaryRetries).
  • Falls back to truncated raw title when all models fail.

Output objects include optional bullet points (bullets) and a condensed factual summary.
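
A hedged reconstruction of that retry/adaptive-ordering loop; $InvokeModel stands in for the actual GitHub Models call, which is elided here:

function Get-SummaryWithPoolSketch {
    param(
        [string[]]$Models,
        [scriptblock]$InvokeModel,   # placeholder: { param($model) ... }
        [int]$MaxSummaryRetries = 4,
        [int]$SummaryRetryBaseSeconds = 2
    )
    foreach ($model in $Models) {
        for ($attempt = 1; $attempt -le $MaxSummaryRetries; $attempt++) {
            try {
                $summary = & $InvokeModel $model
                # Adaptive ordering: promote the winning model to the front
                # so the next item tries it first.
                $script:OrderedModels = @($model) + @($Models | Where-Object { $_ -ne $model })
                return $summary
            } catch {
                # Exponential backoff with jitter: base * 2^(attempt-1) plus 0..1 s
                $delay = $SummaryRetryBaseSeconds * [math]::Pow(2, $attempt - 1) +
                         (Get-Random -Minimum 0.0 -Maximum 1.0)
                Start-Sleep -Milliseconds ([int]($delay * 1000))
            }
        }
    }
    return $null   # caller falls back to the truncated raw title
}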

Resilience patterns

  • Retry loop with model deactivation on repeated failure (429 / transient).
  • Cache placeholder (pattern enables future caching without refactor).
  • Graceful degradation when token absent (-DisableSummaries).

Markdown emission

Write-PerTypePost builds a clean front matter block every run, avoiding op_Addition issues seen when concatenating PSCustomObjects. It:

  • Creates a dated folder (content/updates/<slug>/index.md).
  • Generates deterministic title + description from window and type.
  • Uses lastmod refresh for diff hygiene.
  • Ensures UTF‑8 encoding.

Example (trimmed) output block:

+++
title = 'Azure Weekly – 2025 Week 33'
date = 2025-08-13T12:50:21Z
lastmod = 2025-08-13T12:50:21Z
draft = false
tags = ['updates', 'weekly', 'azure']
description = 'Highlights from Azure between 2025-08-11 and 2025-08-17.'
[params]
    author = 'sujith'
+++

Body lines are a bullet list with escaped markdown characters and optional detail indents per item.
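
A sketch of that emission step using a here-string, which sidesteps the PSCustomObject concatenation issue noted above; the slug value and paths are illustrative:

$stamp = [DateTime]::UtcNow.ToString("yyyy-MM-dd'T'HH:mm:ss'Z'")
$slug  = 'azure-weekly-2025-w33'   # assumed slug convention
$front = @"
+++
title = 'Azure Weekly – 2025 Week 33'
date = $stamp
lastmod = $stamp
draft = false
tags = ['updates', 'weekly', 'azure']
description = 'Highlights from Azure between 2025-08-11 and 2025-08-17.'
[params]
    author = 'sujith'
+++
"@
$dir = Join-Path 'content/updates' $slug
New-Item -ItemType Directory -Path $dir -Force | Out-Null
# PowerShell 7 writes UTF-8 without BOM by default
Set-Content -Path (Join-Path $dir 'index.md') -Value $front -Encoding utf8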

Workflow integration (update.yml)

Key steps:

  1. Checkout (full history for potential diff logic).
  2. Inject a fine‑grained Personal Access Token (PAT) with Models: read permission via secrets.WEEKLY and expose it as the environment variable GITHUB_TOKEN (the script expects that name). This PAT is needed because the default ephemeral Actions token does not always carry Models: read, depending on repository and org policy.
  3. Run script (PowerShell) which uses that token for release API calls and AI summarisation.
  4. Commit changes only when the working tree differs:
    • Prevents empty commits.
    • Enables predictable downstream indexing (e.g., search, sitemap).

Snippet (core commit logic):

- name: Commit & push (if changed)
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    git add -A
    if git diff --staged --quiet; then
      echo "No changes"
      exit 0
    fi
    git commit -m "chore(updates): daily refresh of per-type posts"
    git push

Full workflow file

Below is the complete workflow (.github/workflows/update.yml) for reference so you can copy/adapt without switching contexts:

name: Updates (Daily Refresh)

on:
  schedule:
    # Run daily at 05:15 UTC (~07:15 Europe/Brussels during DST)
    - cron: '15 5 * * *'
  workflow_dispatch:

permissions:
  contents: write
  models: read

jobs:
  build-weekly:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run tracker
        shell: pwsh
        env:
          GITHUB_TOKEN: ${{ secrets.WEEKLY }}
        run: |
          pwsh ./.github/scripts/generate-updates.ps1 -RepoRoot . -ContentDir content/updates -MaxAzure 20 -MaxGitHub 12 -MaxTerraform 8 -ShowApiUrls -Frequencies 'Azure=weekly,GitHub=weekly,Terraform=weekly' -ModelPool 'openai/gpt-5','openai/gpt-5-mini','openai/gpt-4.1','openai/gpt-4.1-mini','openai/gpt-4o','openai/gpt-4o-mini'

      - name: Commit & push (if changed)
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
          git add -A
          if git diff --staged --quiet; then
            echo "No changes"
            exit 0
          fi
          git commit -m "chore(updates): daily refresh of per-type posts"
          git push

Script source (generate-updates.ps1)

The full PowerShell script is lengthy (~800 lines) and modular; the canonical source lives at .github/scripts/generate-updates.ps1.

Abbreviated listing (top section):

#!/usr/bin/env pwsh
[CmdletBinding()]
param(
  [string]$RepoRoot = (Resolve-Path -LiteralPath .).Path,
  [string]$ContentDir = 'content/updates',
  [ValidateRange(0,200)][int]$MaxAzure = 20,
  [ValidateRange(0,200)][int]$MaxGitHub = 12,
  [ValidateRange(0,200)][int]$MaxTerraform = 8,
  [string]$Frequencies = 'Azure=weekly,GitHub=weekly,Terraform=weekly',
  [ValidateSet('week','rolling')][string]$WindowType = 'week',
  [int]$RollingDays = 7,
  [switch]$DisableSummaries,
  [int]$MaxSummaryRetries = 4,
  [int]$SummaryRetryBaseSeconds = 2,
  [switch]$ShowApiUrls,
  [switch]$DumpSummaries,
  [switch]$CleanOutput,
  [string[]]$ModelPool = @()
)
# (Truncated commentary for brevity in article)

[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$ErrorActionPreference = 'Stop'

function Log { param([string]$Message) Write-Host "[$(Get-Date -Format 'HH:mm:ss')] $Message" }

function Initialize-Environment {
  param([string]$Token, [switch]$AllowMissingToken)
  if (-not $Token) {
    if ($AllowMissingToken) {
      Write-Warning 'GITHUB_TOKEN missing'
      return @{ HeadersGitHub = @{} }
    }
    throw 'GITHUB_TOKEN is required (with models:read, contents:write).'
  }
  return @{
    HeadersGitHub = @{
      'Authorization'        = "Bearer $Token"
      'Accept'               = 'application/vnd.github+json'
      'X-GitHub-Api-Version' = '2022-11-28'
    }
  }
}

# ... (All helper + fetch + summarisation + write functions unchanged; see repository file for full context)

function Invoke-WeeklyUpdates { <# full body preserved in repo #> }

# Entry point invoking Invoke-WeeklyUpdates with normalisation logic.
# (Full body intentionally omitted here to keep article scannable.)

For the complete, unabridged script (including all helper functions such as Compute-BaseWindow, Summarize-Items, and Write-PerTypePost), view .github/scripts/generate-updates.ps1 directly in the repository; that copy benefits from future updates.

Security considerations

  • A fine‑grained PAT stored as secrets.WEEKLY (minimum scope: models: read; optionally contents: write if not already inherited) is mapped to GITHUB_TOKEN only for the summarisation step. This tightly scopes access and keeps broader org tokens out of the workflow.
  • The default ephemeral GITHUB_TOKEN usually works for releases + (in many cases) model inference, but: (a) rate limits for models can be lower, (b) org policy might restrict Models: read, and (c) token rotation / revocation granularity is coarser. Using a fine‑grained PAT improves reliability under higher summarisation volume and gives explicit auditability.
  • Principle of least privilege: exclude unneeded scopes (no admin, no workflow write, no repo deletion). Rotate on a regular cadence; label the PAT (e.g. “weekly-updates-pipeline”).
  • Throttling resilience: higher model rate limits reduce retry pressure and lower the chance of exhausting attempts during summarisation bursts.
  • No external secret leakage; feed and release endpoints are public; only summary requests hit the GitHub Models API.
  • Summarisation prompt sanitises whitespace, reducing injection surface (a minimal helper is sketched below). Consider future allow‑list validation for model names and a deny‑list for unexpected outbound URLs in model output.
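
A minimal illustrative helper for that whitespace sanitisation (not the script's exact implementation):

function Clear-PromptTextSketch {
    param([string]$Text)
    # Collapse whitespace runs (including newlines) to single spaces,
    # shrinking the prompt-injection surface of pasted feed content.
    ([regex]::Replace($Text, '\s+', ' ')).Trim()
}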

Performance & efficiency

Applied optimisation patterns:

  • Bounded item counts to cap summarisation cost.
  • Early exit when limits < 1.
  • Single‑pass aggregation into an array (avoids '+' concatenation, which copies arrays and can inadvertently coerce values to strings); see the sketch after this list.
  • Stopwatch instrumentation around summarisation and Terraform fetch for future telemetry.
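
The aggregation and instrumentation points combine into a pattern worth showing; a self-contained sketch with dummy fetchers standing in for the real ones:

# '+=' inside a loop copies the array on every iteration, and '+' can
# coerce operands unexpectedly; @( ... ) captures everything the enclosed
# statements emit in a single pass.
$sw = [System.Diagnostics.Stopwatch]::StartNew()
$allItems = @(
    1..3 | ForEach-Object { [pscustomobject]@{ source = 'azure';  n = $_ } }
    1..2 | ForEach-Object { [pscustomobject]@{ source = 'github'; n = $_ } }
)
$sw.Stop()
Write-Host "Aggregated $($allItems.Count) items in $($sw.ElapsedMilliseconds) ms"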

Extensibility roadmap

Enhancement                                           | Rationale
------------------------------------------------------|----------------------------------------------
Add caching layer (file or repo issues)               | Avoid re-summarising unchanged releases.
Structured JSON artifact output                       | Enables dashboards / trend analysis.
Model latency metrics + ranking                       | Optimise pool ordering adaptively.
Additional sources (Azure DevOps, .NET release notes) | Broader ecosystem coverage.
Semantic diff of prior period                         | Highlight new vs previously reported changes.
Optional Slack / Teams notification                   | Push updates to collaboration channels.

Local testing

# Run with summaries disabled (fast)
pwsh .\.github\scripts\generate-updates.ps1 -DisableSummaries -ShowApiUrls -ContentDir content/updates -MaxAzure 5 -MaxGitHub 5 -MaxTerraform 3

# Custom rolling 3‑day window
pwsh .\.github\scripts\generate-updates.ps1 -WindowType rolling -RollingDays 3 -DisableSummaries

# Specify custom model pool
pwsh .\.github\scripts\generate-updates.ps1 -ModelPool 'openai/gpt-4.1-mini','openai/gpt-4o-mini'

Validate output under content/updates then run a local Hugo server to inspect rendering.
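
Assuming Hugo is installed locally, the rendering check can be:

# Serve with drafts enabled; Hugo rebuilds and navigates on file changes
hugo server -D --navigateToChanged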

Troubleshooting

Symptom                   | Cause                        | Fix
--------------------------|------------------------------|-------------------------------------------------------
Empty post folder         | No items in time window      | Adjust RollingDays or verify feeds.
Summaries are just titles | All model attempts failed    | Check GITHUB_TOKEN scopes; reduce model list.
Terraform section missing | No releases in window        | Increase window or confirm repos.
RSS parse warning         | Transient or format change   | Re-run; add defensive logging if persistent.
Timezone mismatch         | TZ ID not resolved on runner | Fallback logs warn; acceptable; optionally pin to UTC.

Design decisions (WHY, not WHAT)

  • Function modularity: Facilitates unit testing and future module extraction.
  • Front matter rebuild: Avoids mutation complexity and ensures determinism.
  • Adaptive model ordering: Reduces average latency by reusing the last successful model first.
  • Explicit limits: Keeps markdown lightweight and scannable for weekly cadence.
  • Window abstraction: Allows switching between weekly calendar semantics and rolling ranges.

When to fork vs reuse

Reuse the script if you only need periodic feed aggregation + AI summaries + Hugo output. Fork (or parameterise) it if you require auth-protected APIs, structured JSON export, or per-item diffing. Either way, the underlying pattern generalises to:

  • Platform engineering content automation.
  • Continuous content delivery patterns.
  • AI-assisted summarisation pipelines.

Next steps

  1. Add model performance telemetry.
  2. Produce combined weekly digest (across sources) for newsletter distribution.
  3. Expose a JSON feed for downstream automation.
  4. Integrate Slack notification on new commits.

Conclusion

This pipeline is a compact, extensible pattern for continuous knowledge harvesting. By combining deterministic windowing, modular fetchers, adaptive summarisation, and idempotent markdown generation, it eliminates manual toil and delivers a reliable weekly knowledge artefact for the platform engineering estate.
