Tags: scaffolding, CLAUDE.md, monorepo, content-pipeline, provenance

Scaffolded Sites Must Track Source Provenance

Problem

After scaffolding 22 business sites from various sources (GitHub repos, Lovable app prototypes, Instagram profiles, external websites), the CLAUDE.md files contained summarized descriptions but no links back to the original source material. This made it impossible to:

  • Trace back to original repos for images, styling, or deeper content
  • Tell which businesses had existing codebases and which were concept-only
  • Pull assets (logos, photos, color palettes) from the originals
  • Understand the technical stack of the source project

Only 2 out of 22 businesses had any source reference, and those were vague mentions buried in descriptions.

Root Cause

The /site-scaffold skill creates brand.config.ts, CLAUDE.md, pages, and styles — but has no step for recording where the source content came from. When content is summarized from a GitHub repo or Lovable app during scaffolding, the original URL is consumed but not persisted.

Solution

Added a ## Sources section to every business CLAUDE.md, placed after the opening description and before any other ## headings.

Source mapping approach

  1. GitHub repos — Match app slug to repo name via gh repo list
  2. Lovable apps — Check CLAUDE.md descriptions for lovable.app mentions
  3. Related repos — Cross-reference by topic (e.g., healthcal has 3 related repos)
  4. Live sites — Check repo homepageUrl field for deployed versions
  5. No source — Mark explicitly as “No external source repository”
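
Steps 1, 4, and 5 above lend themselves to a small script. The following is a minimal sketch, not the code that was actually run: it assumes the JSON printed by the gh repo list command (fields name, url, homepageUrl) has been captured as a string, and the function name map_sources and the slugs in the usage note are hypothetical.

```python
import json


def map_sources(app_slugs, repos_json):
    """Match each app slug to a repo with the same name.

    repos_json is assumed to be the output of
    `gh repo list <owner> --json name,url,homepageUrl`.
    Slugs without a direct match map to None so they can be resolved
    by hand (Lovable app, related repos, or no external source).
    """
    repos = {repo["name"]: repo for repo in json.loads(repos_json)}
    mapping = {}
    for slug in app_slugs:
        repo = repos.get(slug)
        if repo is None:
            mapping[slug] = None
            continue
        entry = {"github": repo["url"]}
        # homepageUrl is an empty string when no deployed site is set.
        if repo.get("homepageUrl"):
            entry["live"] = repo["homepageUrl"]
        mapping[slug] = entry
    return mapping
```

Calling map_sources(["repo-name", "concept-only-app"], output) would return a GitHub (and, where set, Live) entry for repo-name and None for concept-only-app. Steps 2 and 3 still need a manual pass, since a related repo rarely shares the app's slug.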

Format used

## Sources

- GitHub: https://github.com/veganpolice/repo-name
- Live: https://example.com
- Related: https://github.com/veganpolice/related-repo

For businesses with no external source:

## Sources

No external source repository. Existing business.
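
Rendering this format and placing it after the opening description but before the first ## heading could be automated roughly as follows. This is an illustrative sketch under stated assumptions: render_sources and insert_sources are hypothetical names, and the heading detection is naive (a line starting with "## " inside a fenced code block would be mistaken for a heading).

```python
def render_sources(github=None, live=None, related=(),
                   note="No external source repository."):
    """Build the ## Sources block in the format shown above."""
    lines = ["## Sources", ""]
    if github:
        lines.append(f"- GitHub: {github}")
    if live:
        lines.append(f"- Live: {live}")
    for url in related:
        lines.append(f"- Related: {url}")
    if len(lines) == 2:  # nothing was added, so record that explicitly
        lines.append(note)
    return "\n".join(lines)


def insert_sources(claude_md, sources_block):
    """Insert sources_block before the first '## ' heading.

    Returns the text unchanged if a ## Sources section already exists,
    so the function is safe to run more than once.
    """
    lines = claude_md.splitlines()
    if any(line.strip() == "## Sources" for line in lines):
        return claude_md
    # Index of the first second-level heading, or end of file if none.
    first_h2 = next(
        (i for i, line in enumerate(lines) if line.startswith("## ")),
        len(lines),
    )
    head = "\n".join(lines[:first_h2]).rstrip()
    tail = "\n".join(lines[first_h2:])
    parts = [part for part in (head, sources_block, tail) if part]
    return "\n\n".join(parts) + "\n"
```

For a business with no source, render_sources(note="No external source repository. Existing business.") reproduces the second form above.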

Commands used to build the mapping

# List all repos with homepage URLs
gh repo list veganpolice --limit 50 --json name,url,homepageUrl

# Check existing CLAUDE.md files for source mentions
grep -r "github.com\|lovable\|base44" apps/*/CLAUDE.md

Prevention

Update /site-scaffold skill

When scaffolding a new site, the skill should:

  1. Ask for source URL — “Where is the original content? (GitHub URL, website, Lovable app, or ‘new concept’)”
  2. Write it to CLAUDE.md — Add ## Sources section automatically
  3. Validate the URL — Fetch the source to confirm it’s reachable
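
The validation in step 3 might look like the sketch below. It is an assumption about how the skill could do this, not its current behavior: check_source, the injectable fetch parameter, and the User-Agent string are all hypothetical, and the check uses only the Python standard library.

```python
import urllib.request
from urllib.parse import urlparse

NEW_CONCEPT = "new concept"


def _http_status(url, timeout):
    """Fetch url and return the HTTP status code (raises on 4xx/5xx)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "site-scaffold-check"}  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def check_source(answer, fetch=None, timeout=10):
    """Return (ok, message) for the answer to the source-URL prompt.

    fetch is injectable so the check can be exercised without network
    access; it defaults to a real HTTP request.
    """
    answer = answer.strip()
    if answer.lower() == NEW_CONCEPT:
        return True, "No external source"
    parsed = urlparse(answer)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False, f"Not an http(s) URL: {answer!r}"
    fetch = fetch or _http_status
    try:
        status = fetch(answer, timeout)
    except OSError as exc:  # URLError, HTTPError and timeouts are OSErrors
        return False, f"Unreachable: {exc}"
    if status >= 400:
        return False, f"HTTP {status} from {answer}"
    return True, answer
```

Some hosts reject automated requests, so a failed check is probably better treated as a warning to confirm the URL by hand than as a reason to drop the source.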

CLAUDE.md template should include Sources

The Sources section should be a required part of every CLAUDE.md:

# Business Name

Description here.

## Sources

- [Required: at least one source link or "No external source"]

## Brand Voice
...
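
One way to enforce the requirement is a small check that fails when a CLAUDE.md lacks a usable Sources section. The sketch below is hypothetical (the function names are illustrative, and the apps/*/CLAUDE.md layout is assumed from the grep command above). It also rejects the unfilled template placeholder, which would otherwise pass because it contains the words "No external source".

```python
import re
from pathlib import Path

URL_RE = re.compile(r"https?://\S+")


def sources_problem(text):
    """Return None if text has a valid ## Sources section, else a reason."""
    lines = text.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.strip() == "## Sources"),
        None,
    )
    if start is None:
        return "missing ## Sources section"
    # Collect the section body up to the next second-level heading.
    body = []
    for line in lines[start + 1:]:
        if line.startswith("## "):
            break
        body.append(line)
    body_text = "\n".join(body)
    if "[Required:" in body_text:
        return "## Sources still contains the template placeholder"
    if URL_RE.search(body_text) or "no external source" in body_text.lower():
        return None
    return "## Sources has neither a link nor 'No external source'"


def check_repo(root="."):
    """Map each failing apps/*/CLAUDE.md path to its problem."""
    problems = {}
    for path in sorted(Path(root).glob("apps/*/CLAUDE.md")):
        problem = sources_problem(path.read_text(encoding="utf-8"))
        if problem:
            problems[str(path)] = problem
    return problems
```

Run from the monorepo root, check_repo() returns an empty dict when every business file passes, which makes it straightforward to wire into a pre-commit hook or CI step.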

Source Types Encountered

| Type | Count | Example |
|---|---|---|
| GitHub repo (direct match) | 10 | pirate-radi0, adventure-oasis |
| GitHub repo (related) | 5 | lana-ai → healthcal repos |
| Lovable app | 1 | pocket-lunch |
| StackBlitz | 1 | create-makerspace |
| Instagram | 1 | hot-tub-every-day |
| No external source | 6 | sendy-visors, southside-lodge |

References

  • Commit: 54a4368 — “docs: add source links to all business CLAUDE.md files”
  • Related: Bulk Site Scaffolding
  • Skill: .claude/skills/site-scaffold/