monorepo content-sync parallel-agents github-cli web-scraping lovable brand-config registry-sync

Parallel Content Sync from External Sources — Populating 12 Sites with Real Data


Problem

After bulk-scaffolding 12 new sites (see bulk-site-scaffolding-and-shared-components.md), all had identical placeholder content — generic taglines, placeholder service descriptions, default colors. Each site needed real business data pulled from its external sources (GitHub repos, Lovable apps, existing websites, social profiles).

Investigation & Approach

1. Source mapping

Each business has different external sources. The /content-sync skill maps these:

| Business | GitHub Repo | Lovable App | Website | Social |
|---|---|---|---|---|
| adventure-weddings | veganpolice/adventure-weddings | | | |
| adventure-oasis | veganpolice/adventure-oasis | | | |
| create-makerspace | veganpolice/create-makerspace-api | createmakerspacesociety.lovable.app | | |
| southside-lodge | | southsidelodge.lovable.app | | |
| healthcal | veganpolice/healthcal | | healthcal.app | |
| mountain-life | veganpolice/mountainlife.tech | | mountainlife.tech | |
| hot-tub-every-day | | | | @hottubeveryday (IG) |

2. Data extraction techniques

GitHub repos (most reliable):

gh repo view veganpolice/<repo> --json description
# Then read README, package.json, component text for services/features
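The output of that `gh` call can be combined with the repo's package.json to seed the content sync. A minimal sketch (the `buildSeed` helper and the sample payloads are illustrative, not from the actual repos):

```typescript
// Sketch: combine `gh repo view --json description` output with a repo's
// package.json to seed a business summary. Payloads below are hypothetical.
interface RepoSeed {
  description: string;
  name: string;
  keywords: string[];
}

function buildSeed(ghJson: string, packageJson: string): RepoSeed {
  const gh = JSON.parse(ghJson) as { description?: string };
  const pkg = JSON.parse(packageJson) as { name?: string; keywords?: string[] };
  return {
    description: gh.description ?? "",
    name: pkg.name ?? "unknown",
    keywords: pkg.keywords ?? [],
  };
}

// Example payloads (hypothetical, not real repo data):
const seed = buildSeed(
  '{"description":"Elopement planning for adventure couples"}',
  '{"name":"adventure-weddings","keywords":["weddings","elopement"]}'
);
```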

Lovable apps (WebFetch):

WebFetch → extract page structure, headings, copy, colors, features

Lovable apps render enough SSR content for WebFetch to extract business data, unlike pure SPAs.

Existing websites (WebFetch):

WebFetch → extract meta descriptions, about text, service listings, color schemes

Social profiles (limited): Instagram and LinkedIn are login-gated. Extract what's available from the public profile and supplement with user-provided context.
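The extraction step in the WebFetch techniques above can be sketched as plain regexes over fetched HTML. This is a stand-in, not WebFetch's actual implementation; a real run would fetch the Lovable app or website first, while here the HTML is inlined so the logic is self-contained:

```typescript
// Sketch of the extraction WebFetch performs: meta description, headings,
// and hex colors pulled from page HTML with regexes.
function extractBrandData(html: string) {
  // <meta name="description" content="...">
  const meta =
    html.match(/<meta\s+name="description"\s+content="([^"]*)"/i)?.[1] ?? "";
  // h1/h2 text becomes candidate hero copy and section titles.
  const headings = [...html.matchAll(/<h[12][^>]*>([^<]+)<\/h[12]>/gi)].map(
    (m) => m[1].trim()
  );
  // Unique hex colors from inline styles / CSS feed brand.css.
  const colors = [...new Set(html.match(/#[0-9a-fA-F]{6}\b/g) ?? [])];
  return { meta, headings, colors };
}

// Inlined sample page (hypothetical content):
const sample = `
  <meta name="description" content="Lakeside lodge getaways">
  <style>:root { --brand: #2f5d50; --accent: #e8a33d; }</style>
  <h1>Southside Lodge</h1>
  <h2>Book your stay</h2>`;
const data = extractBrandData(sample);
```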

3. Four files updated per site

For each business, real content goes into exactly 4 files:

  1. brand.config.ts — domain, tagline, voice description, audience, meta title/description
  2. CLAUDE.md — full business context, services, voice guide, content guidelines
  3. src/styles/brand.css — real brand colors extracted from existing sites/apps
  4. src/pages/index.astro — real hero text, service cards, about section, CTAs
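For reference, a brand.config.ts after sync might look like the following. The domain and tagline are the real values shown in the registry-sync section; the remaining field names (voice, audience, meta) are an assumed shape inferred from the list above, not the actual schema:

```typescript
// Hypothetical apps/adventure-weddings/brand.config.ts — field names beyond
// domain/tagline are inferred, not taken from the real schema.
export default {
  domain: 'adventureweddings.love',
  tagline: 'Unique and intimate elopement experiences for adventure lovers',
  voice: 'Warm, adventurous, personal',
  audience: 'Adventure-loving couples planning an elopement',
  meta: {
    title: 'Adventure Weddings',
    description:
      'Unique and intimate elopement experiences for adventure lovers.',
  },
};
```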

4. Parallel execution: 12 agents

Launched 12 parallel agents, each responsible for one site. Each agent:

  1. Received the scraped content for its business
  2. Updated all 4 files with real data
  3. Adapted content to match the brand voice from CLAUDE.md

All 12 completed in ~90 seconds. Total: 46 files changed, 1251 insertions.

5. Registry sync

After content updates, the central businesses.ts registry had stale data. Domains and taglines had changed:

// Before (placeholder)
{ domain: 'adventureweddings.ca', tagline: 'Say "I do" in the wild' }

// After (real, from brand.config.ts)
{ domain: 'adventureweddings.love', tagline: 'Unique and intimate elopement experiences for adventure lovers' }

Sync technique: Grep all brand.config.ts files for domain: and tagline:, then rewrite businesses.ts with corrected values.

# Extract real domains
grep 'domain:' apps/*/brand.config.ts
# Extract real taglines
grep 'tagline:' apps/*/brand.config.ts
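The rewrite half of the sync can be sketched as follows. In the real repo you would read `apps/*/brand.config.ts` from disk; here the config sources are inlined so the sketch is self-contained, and the `syncRegistry` helper is illustrative:

```typescript
// Sketch of the registry sync: pull domain/tagline out of each site's
// brand.config.ts source and regenerate the businesses.ts entries.
interface BusinessEntry {
  slug: string;
  domain: string;
  tagline: string;
}

function syncRegistry(configs: Record<string, string>): BusinessEntry[] {
  return Object.entries(configs).map(([slug, source]) => ({
    slug,
    // Same patterns the grep commands above target.
    domain: source.match(/domain:\s*'([^']+)'/)?.[1] ?? '',
    tagline: source.match(/tagline:\s*'([^']+)'/)?.[1] ?? '',
  }));
}

// One inlined config (real values from the before/after example above):
const entries = syncRegistry({
  'adventure-weddings': `export default {
    domain: 'adventureweddings.love',
    tagline: 'Unique and intimate elopement experiences for adventure lovers',
  };`,
});
```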

Key Design Decisions

1. Four-file content pattern

Why: Separating concerns across 4 files means agents can work on one file at a time without conflicts, and each file has a clear purpose: metadata (brand.config), AI context (CLAUDE.md), visuals (brand.css), content (index.astro).

2. Brand colors from real sources

Why: Extracting actual hex values from Lovable apps and existing websites ensures visual continuity. Don’t invent colors — use what already exists.

3. Registry as derived data

Why: businesses.ts is derived from individual brand.config.ts files. After bulk updates, always re-sync the registry. The per-site config is the source of truth; the registry is for cross-site display only.

4. Parallel agents over sequential

Why: 12 sequential content syncs would take 15+ minutes and block the conversation. Parallel agents complete in ~90 seconds with no quality loss, since each site is independent.

Prevention / Best Practices

  1. After bulk content updates: Always sync packages/ui/src/data/businesses.ts with the updated brand.config.ts domains and taglines.

  2. After any registry change: Also update the root CLAUDE.md businesses table — it’s the human-readable reference.

  3. Three places to update: Per-site config → shared registry → root CLAUDE.md. Miss one and data drifts.

  4. Build after content changes: Run pnpm turbo build --filter='./apps/*' to verify all sites compile. Content changes can introduce Astro template errors.

  5. Login-gated sources: Instagram and LinkedIn won’t yield much via scraping. Note what was tried, use user-provided context, and move on.

Files Modified

Per site (4 files × 12 sites = 48 files):

  • apps/<slug>/brand.config.ts — real metadata
  • apps/<slug>/CLAUDE.md — real business context
  • apps/<slug>/src/styles/brand.css — real brand colors
  • apps/<slug>/src/pages/index.astro — real page content

Shared:

  • packages/ui/src/data/businesses.ts — synced domains and taglines

Cross-References

  • See also: bulk-site-scaffolding-and-shared-components.md (Phase 1: scaffolding)
  • See also: spa-scraping-and-second-wave-scaffolding.md (Phase 2: SPA scraping + 10 more sites)
  • See also: .claude/skills/content-sync/SKILL.md (the skill that maps businesses to sources)