# Parallel Content Sync from External Sources — Populating 12 Sites with Real Data
## Problem

After bulk-scaffolding 12 new sites (see `bulk-site-scaffolding-and-shared-components.md`), all had identical placeholder content — generic taglines, placeholder service descriptions, default colors. Each site needed real business data pulled from its external sources (GitHub repos, Lovable apps, existing websites, social profiles).
## Investigation & Approach

### 1. Source mapping

Each business has different external sources. The `/content-sync` skill maps these:
| Business | GitHub Repo | Lovable App | Website | Social |
|---|---|---|---|---|
| adventure-weddings | veganpolice/adventure-weddings | — | — | — |
| adventure-oasis | veganpolice/adventure-oasis | — | — | — |
| create-makerspace | veganpolice/create-makerspace-api | createmakerspacesociety.lovable.app | — | — |
| southside-lodge | — | southsidelodge.lovable.app | — | — |
| healthcal | veganpolice/healthcal | — | healthcal.app | — |
| mountain-life | veganpolice/mountainlife.tech | — | mountainlife.tech | — |
| hot-tub-every-day | — | — | — | @hottubeveryday (IG) |
### 2. Data extraction techniques

**GitHub repos (most reliable):**

```sh
gh repo view veganpolice/<repo> --json description
# Then read README, package.json, and component text for services/features
```

**Lovable apps (WebFetch):** extract page structure, headings, copy, colors, and features. Unlike pure SPAs, Lovable apps render enough SSR content for WebFetch to extract business data.

**Existing websites (WebFetch):** extract meta descriptions, about text, service listings, and color schemes.

**Social profiles (limited):** Instagram and LinkedIn are login-gated. Extract what's available from the public profile and supplement with user-provided context.
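The WebFetch steps above reduce to pulling structured fields out of fetched HTML. A minimal sketch of the idea in TypeScript (the `extractMeta` helper and the sample HTML are invented for illustration, not part of the actual skill):

```typescript
// Hypothetical helper: pull a <meta name="description"> value and any
// 6-digit hex color codes out of raw HTML, mirroring what the WebFetch
// extraction step does conceptually.
function extractMeta(html: string): { description: string | null; colors: string[] } {
  const descMatch = html.match(
    /<meta\s+name=["']description["']\s+content=["']([^"']*)["']/i
  );
  // Deduplicate repeated colors while preserving first-seen order.
  const colors = [...new Set(html.match(/#[0-9a-fA-F]{6}\b/g) ?? [])];
  return { description: descMatch ? descMatch[1] : null, colors };
}

// Invented sample resembling a scraped app page.
const sample = `<head>
  <meta name="description" content="Community makerspace in Creston, BC">
  <style>:root { --brand: #2f6f4f; --accent: #e8b04b; }</style>
</head>`;

// Logs the description string and the two brand colors found in the CSS.
console.log(extractMeta(sample));
```

Real scraping needs more care (attribute order, OpenGraph fallbacks), but the shape is the same: fetch, match, collect.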
### 3. Four files updated per site

For each business, real content goes into exactly four files:

- `brand.config.ts` — domain, tagline, voice description, audience, meta title/description
- `CLAUDE.md` — full business context, services, voice guide, content guidelines
- `src/styles/brand.css` — real brand colors extracted from existing sites/apps
- `src/pages/index.astro` — real hero text, service cards, about section, CTAs
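As a rough sketch, a `brand.config.ts` along these lines would carry the metadata slice. Field names and the voice/audience/meta values here are illustrative; the real schema lives in the repo:

```typescript
// Illustrative shape only — the actual brand.config.ts schema is repo-defined.
interface BrandConfig {
  domain: string;
  tagline: string;
  voice: string;      // voice description consumed by CLAUDE.md-driven agents
  audience: string;
  meta: { title: string; description: string };
}

const brand: BrandConfig = {
  // Domain and tagline match the post-sync adventure-weddings values.
  domain: "adventureweddings.love",
  tagline: "Unique and intimate elopement experiences for adventure lovers",
  voice: "Warm, adventurous, personal",                  // invented example
  audience: "Couples planning outdoor elopements",       // invented example
  meta: {
    title: "Adventure Weddings",                         // invented example
    description: "Elopement planning for adventure lovers", // invented example
  },
};
```

Keeping this file small and declarative is what makes the later grep-based registry sync trivial.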
### 4. Parallel execution: 12 agents
Launched 12 parallel agents, each responsible for one site. Each agent:
- Received the scraped content for its business
- Updated all 4 files with real data
- Adapted content to match the brand voice from CLAUDE.md
All 12 completed in ~90 seconds. Total: 46 files changed, 1251 insertions.
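Conceptually, the fan-out is a set of independent async tasks joined with `Promise.all`. This is a sketch of the shape, not the actual agent API; `syncSite` is a hypothetical stand-in for one agent's work:

```typescript
// Stand-in for one agent's job: take the scraped content for a business
// and rewrite its four files. Here it only returns a summary string so
// the concurrency shape is visible.
async function syncSite(slug: string): Promise<string> {
  // ...update brand.config.ts, CLAUDE.md, brand.css, index.astro...
  return `${slug}: 4 files updated`;
}

// Seven of the twelve sites, taken from the source-mapping table above.
const slugs = [
  "adventure-weddings", "adventure-oasis", "create-makerspace",
  "southside-lodge", "healthcal", "mountain-life", "hot-tub-every-day",
];

// Every site is independent, so the syncs run concurrently, not one by one.
const allSynced = Promise.all(slugs.map((slug) => syncSite(slug)));
allSynced.then((results) => console.log(`${results.length} sites synced`));
```

Because no two agents touch the same files, there is no coordination cost; wall-clock time is roughly the slowest single site, not the sum.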
### 5. Registry sync

After the content updates, the central `businesses.ts` registry held stale data — domains and taglines had changed:

```ts
// Before (placeholder)
{ domain: 'adventureweddings.ca', tagline: 'Say "I do" in the wild' }

// After (real, from brand.config.ts)
{ domain: 'adventureweddings.love', tagline: 'Unique and intimate elopement experiences for adventure lovers' }
```

Sync technique: grep all `brand.config.ts` files for `domain:` and `tagline:`, then rewrite `businesses.ts` with the corrected values.

```sh
# Extract real domains
grep 'domain:' apps/*/brand.config.ts

# Extract real taglines
grep 'tagline:' apps/*/brand.config.ts
```
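The grep-and-rewrite step amounts to parsing the `domain:` and `tagline:` string literals out of each config and regenerating the registry entries. A minimal regex-based sketch, assuming simple quoted values (the `parseBrandFields` helper is hypothetical):

```typescript
// Parse the domain and tagline string literals out of a brand.config.ts
// source. Assumes plain single- or double-quoted values with no escapes,
// which holds for these configs.
function parseBrandFields(
  source: string
): { domain: string; tagline: string } | null {
  const domain = source.match(/domain:\s*['"]([^'"]+)['"]/);
  const tagline = source.match(/tagline:\s*['"]([^'"]+)['"]/);
  return domain && tagline ? { domain: domain[1], tagline: tagline[1] } : null;
}

// Example input: the adventure-weddings config after the content sync.
const configSource = `
  domain: 'adventureweddings.love',
  tagline: 'Unique and intimate elopement experiences for adventure lovers',
`;

console.log(parseBrandFields(configSource));
```

A real implementation would read each `apps/*/brand.config.ts` from disk and emit the rewritten `businesses.ts`, but the parsing core is this small.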
## Key Design Decisions

### 1. Four-file content pattern

**Why:** Separating concerns across four files means agents can work on one file at a time without conflicts, and each file has a clear purpose: metadata (`brand.config.ts`), AI context (`CLAUDE.md`), visuals (`brand.css`), content (`index.astro`).

### 2. Brand colors from real sources

**Why:** Extracting actual hex values from Lovable apps and existing websites ensures visual continuity. Don't invent colors — use what already exists.

### 3. Registry as derived data

**Why:** `businesses.ts` is derived from the individual `brand.config.ts` files. After bulk updates, always re-sync the registry. The per-site config is the source of truth; the registry is for cross-site display only.

### 4. Parallel agents over sequential

**Why:** 12 sequential content syncs would take 15+ minutes and block the conversation. Parallel agents complete in ~90 seconds with no quality loss, since each site is independent.
## Prevention / Best Practices

- **After bulk content updates:** Always sync `packages/ui/src/data/businesses.ts` with the updated `brand.config.ts` domains and taglines.
- **After any registry change:** Also update the root `CLAUDE.md` businesses table — it's the human-readable reference.
- **Three places to update:** Per-site config → shared registry → root `CLAUDE.md`. Miss one and data drifts.
- **Build after content changes:** Run `pnpm turbo build --filter='./apps/*'` to verify all sites compile. Content changes can introduce Astro template errors.
- **Login-gated sources:** Instagram and LinkedIn won't yield much via scraping. Note what was tried, use user-provided context, and move on.
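The "three places to update" rule can also be checked mechanically. A sketch of a drift detector that compares registry entries against per-site configs (the `Entry` shape and `findDrift` helper are invented for illustration):

```typescript
interface Entry {
  slug: string;
  domain: string;
  tagline: string;
}

// Report registry entries whose domain or tagline no longer matches the
// per-site config. The config is the source of truth, so any mismatch
// means the registry is stale.
function findDrift(configs: Entry[], registry: Entry[]): string[] {
  const bySlug = new Map(configs.map((c) => [c.slug, c] as [string, Entry]));
  return registry
    .filter((r) => {
      const c = bySlug.get(r.slug);
      return !c || c.domain !== r.domain || c.tagline !== r.tagline;
    })
    .map((r) => r.slug);
}

// One site with a stale registry entry (pre-sync placeholder values).
const configs: Entry[] = [
  { slug: "adventure-weddings", domain: "adventureweddings.love", tagline: "Unique and intimate elopement experiences for adventure lovers" },
];
const registry: Entry[] = [
  { slug: "adventure-weddings", domain: "adventureweddings.ca", tagline: 'Say "I do" in the wild' }, // stale
];

// Logs the slug whose registry entry has drifted.
console.log(findDrift(configs, registry));
```

Running a check like this in CI would catch the drift before it ships, instead of relying on remembering all three update sites.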
## Files Modified

Per site (4 files × 12 sites = 48 files):

- `apps/<slug>/brand.config.ts` — real metadata
- `apps/<slug>/CLAUDE.md` — real business context
- `apps/<slug>/src/styles/brand.css` — real brand colors
- `apps/<slug>/src/pages/index.astro` — real page content

Shared:

- `packages/ui/src/data/businesses.ts` — synced domains and taglines
## Cross-References

- See also: `bulk-site-scaffolding-and-shared-components.md` (Phase 1: scaffolding)
- See also: `spa-scraping-and-second-wave-scaffolding.md` (Phase 2: SPA scraping + 10 more sites)
- See also: `.claude/skills/content-sync/SKILL.md` (the skill that maps businesses to sources)