SPA Scraping & Second-Wave Scaffolding — Scaling from 14 to 24 Sites
Problem
After scaling from 2 to 14 sites (see bulk-site-scaffolding-and-shared-components.md), the user wanted to import all projects from their portfolio at aaronr.info. The site is a Lovable-built SPA, so WebFetch and curl return only an empty shell, <div id="root"></div>, with no project content.
Investigation & Approach
1. Failed: Standard scraping
WebFetch → "Portfolio" title only, no project data
curl → <div id="root"></div> + JS bundle reference
Root cause: Lovable apps are React SPAs. All content lives in the JavaScript bundle, not the HTML.
2. Solution: JS bundle extraction
Instead of rendering the SPA, we extracted data directly from the compiled JS bundle:
# Step 1: Find the bundle URL from the HTML
curl -sL https://aaronr.info/ | grep -oE 'src="/assets/[^"]*\.js"'
# → src="/assets/index-D_aM2TFg.js"
# Step 2: Extract all URLs from the bundle
curl -sL https://aaronr.info/assets/index-D_aM2TFg.js \
| grep -oE '"https?://[^"]+"' | sort -u
# Step 3: Extract project name mappings from display logic
curl -sL https://aaronr.info/assets/index-D_aM2TFg.js \
| tr ',' '\n' | grep -E 'lovable\.app|base44\.app|netlify|mountainlife'
This revealed the name-mapping function in the bundle:
// Bundle contained display name overrides:
g==="Sendy Visor Vibes"?"Sendy Visors"
g==="Pocket Lunch Ascend"?"Pocket Lunch"
g==="Lana Health Planner"?"HealthCal AI"
c==="ido"?"Weddings Lana"
// etc.
Key insight: for this site, the bundle held all the project data as string literals, so rendering was unnecessary. When scraping an SPA, look for URL arrays, display-name mappings, and config objects in the compiled JS before reaching for a headless browser.
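If the extracted data needs to feed a script rather than a terminal session, the same idea can be sketched in TypeScript. This is an illustration under assumptions, not the method used during the session: `extractUrls` and `extractNameOverrides` are hypothetical helper names, and the override regex assumes the minified ternary shape shown above (`x==="Raw Name"?"Display Name"`), which can change between builds.

```typescript
// Sketch only: helper names are hypothetical and the regexes assume the
// minified patterns quoted in this document.

/** Collect every distinct double-quoted http(s) URL literal in the bundle text. */
function extractUrls(bundle: string): string[] {
  const pattern = /"(https?:\/\/[^"]+)"/g;
  const found = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(bundle)) !== null) {
    found.add(match[1]);
  }
  return Array.from(found).sort();
}

/** Collect display-name overrides of the form x==="Raw Name"?"Display Name". */
function extractNameOverrides(bundle: string): Record<string, string> {
  const pattern = /[A-Za-z_$][\w$]*==="([^"]+)"\?"([^"]+)"/g;
  const overrides: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(bundle)) !== null) {
    overrides[match[1]] = match[2]; // raw name -> display name
  }
  return overrides;
}
```

Both functions take the bundle as a plain string, so the download step (curl, fetch, or a saved file) stays separate and the parsing can be tested offline.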
3. Discovered 18 projects, mapped to existing + new
| Source URL | Display Name | Action |
|---|---|---|
| adventure-oasis.lovable.app | Adventure Oasis | Already exists |
| adventure-weddings.lovable.app | Adventure Weddings | Already exists |
| audacious-art.lovable.app | Audacious Art | Already exists |
| lana-health-planner.lovable.app | HealthCal AI | Already exists |
| healthcalai.netlify.app | Health Cal | Alt version of healthcal |
| mountainlife.tech | Final Fest | Already exists (mountain-life) |
| mountainlife.tech/resume | Resume | Already exists (aaronr) |
| ai.aaronr.info | Aaron Consulting | Already exists (squamish-ai) |
| sendy-visor-vibes.lovable.app | Sendy Visors | New site |
| pocket-lunch-ascend.lovable.app | Pocket Lunch | New site |
| email-health-helper.lovable.app | Lana AI | New site |
| unity-of-sound.com | Unity of Sound | New site |
| adventure-product-school.lovable.app | Adventure Product School | New site |
| festivalweddings.ca | Festival Weddings | New site |
| ido.lovable.app | Weddings Lana | New site |
| maker-spark (base44) | Maker Mentor | New site |
| liquid-donations (base44) | Liquid Donations | New site |
| perplexity.ai app | Imagine CNC | New site |
4. Parallel scaffolding: 10 agents
Same pattern as Phase 1 but with cross-references baked in:
- Read template files from existing site once
- Define per-site config with real scraped content where available
- Launch 10 parallel agents (~90 seconds total)
- Fix any dependency issues (see Gotcha below)
- Update businesses.ts registry (14 → 24 entries)
- Update root CLAUDE.md businesses table
- Build all 24 sites to verify
5. Cross-referencing related businesses
Several new sites share context with existing ones. Added cross-references in each site’s CLAUDE.md:
- Lana AI ↔ HealthCal (health/AI vertical)
- Festival Weddings ↔ Adventure Weddings (wedding vertical)
- Weddings Lana ↔ Adventure Weddings (wedding vertical)
- Unity of Sound ↔ Pirate Radi0 (music/community vertical)
- Maker Mentor ↔ Create Makerspace (maker vertical)
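The pairs above could be kept as data and rendered into each site's CLAUDE.md. The following is a minimal sketch, assuming a hypothetical `relatedSection` helper and a simple bullet layout under a `## Related Businesses` heading; the exact wording of the real sections is not specified in this document.

```typescript
// Sketch only: the Link tuple and relatedSection are hypothetical names.
// Each link is [siteA, siteB, shared vertical] and applies in both directions.
type Link = [string, string, string];

/** Build the "## Related Businesses" block for one site, or "" if it has no links. */
function relatedSection(site: string, links: Link[]): string {
  const lines = links
    .filter(([a, b]) => a === site || b === site)
    .map(([a, b, vertical]) => `- ${a === site ? b : a} (${vertical})`);
  return lines.length === 0 ? "" : ["## Related Businesses", ...lines].join("\n");
}
```

Keeping the links as pairs means each relationship is declared once and appears in both sites' files.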
Gotcha: @fontsource Variable font availability
Problem: One agent chose @fontsource-variable/lato and @fontsource-variable/playfair-display for weddings-lana. pnpm install failed:
ERR_PNPM_FETCH_404 GET https://registry.npmjs.org/@fontsource-variable%2Flato: Not Found - 404
Root cause: Not all Google Fonts have Variable versions on fontsource. Lato has a static package (@fontsource/lato) but no variable package (@fontsource-variable/lato).
Fix: Use the standard Montserrat/Inter variable fonts that all other sites use:
"@fontsource-variable/montserrat": "^5",
"@fontsource-variable/inter": "^5",
Prevention: When scaffolding sites, stick to the known-good font packages unless explicitly requested. If using a custom font, verify the -variable package exists on npm first:
npm view @fontsource-variable/<font-name> version
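A complementary offline check is to flag any variable-font dependency that is not on the known-good list before running pnpm install. The sketch below is an assumption-laden illustration: `KNOWN_GOOD_FONTS` and `findUnverifiedFontDeps` are hypothetical names, and the allowlist contains only the two packages this document says the other sites use.

```typescript
// Sketch only: hypothetical allowlist built from the two packages named above.
const KNOWN_GOOD_FONTS: string[] = [
  "@fontsource-variable/montserrat",
  "@fontsource-variable/inter",
];

/** Return the @fontsource-variable/* dependencies that are not on the allowlist. */
function findUnverifiedFontDeps(
  dependencies: Record<string, string>,
  knownGood: string[] = KNOWN_GOOD_FONTS,
): string[] {
  const allowed = new Set(knownGood);
  return Object.keys(dependencies)
    .filter((pkg) => pkg.startsWith("@fontsource-variable/"))
    .filter((pkg) => !allowed.has(pkg))
    .sort();
}
```

Anything this returns is not necessarily broken; it only marks packages that still need the npm view check above.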
Key Design Decisions
1. JS bundle scraping over browser automation
Why: No need for Puppeteer/Playwright. The compiled bundle contains all project data as string literals. curl + grep is faster and has zero dependencies.
Trade-off: Only works for data embedded in bundles. Dynamically loaded content (API calls) would need actual rendering.
2. Present candidates for user approval
Why: Not all discovered projects should be scaffolded. Some are duplicates, some are alt versions, some are experiments. Always present the full list and let the user decide.
Pattern: Scrape → categorize (existing/new/duplicate) → present table → user picks → scaffold chosen ones.
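The categorize and present steps of this pattern can be sketched as below. This is a hedged illustration, not the session's actual tooling: the `Candidate` shape, the `categorize` and `toMarkdownTable` helpers, and the rule of matching on lowercased display names are all assumptions. Real matching would also need to handle alt versions such as the Netlify HealthCal build, which share a project but not a name.

```typescript
// Sketch only: types and helper names are hypothetical.
type Action = "existing" | "duplicate" | "new";

interface Candidate {
  sourceUrl: string;
  displayName: string;
}

interface Categorized extends Candidate {
  action: Action;
}

/**
 * Label each scraped candidate. `existingNames` holds display names already in
 * the registry; a name seen earlier in the same scrape counts as a duplicate.
 */
function categorize(candidates: Candidate[], existingNames: string[]): Categorized[] {
  const existing = new Set(existingNames.map((n) => n.toLowerCase()));
  const seen = new Set<string>();
  return candidates.map((candidate) => {
    const key = candidate.displayName.toLowerCase();
    let action: Action = "new";
    if (existing.has(key)) action = "existing";
    else if (seen.has(key)) action = "duplicate";
    seen.add(key);
    return { ...candidate, action };
  });
}

/** Render the categorized list as a Markdown table for the user to review. */
function toMarkdownTable(rows: Categorized[]): string {
  const header = ["| Source URL | Display Name | Action |", "|---|---|---|"];
  const body = rows.map((r) => `| ${r.sourceUrl} | ${r.displayName} | ${r.action} |`);
  return header.concat(body).join("\n");
}
```

The function only labels candidates; choosing which ones to scaffold remains a user decision.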
3. Cross-references in CLAUDE.md, not code
Why: Related businesses share context (wedding vertical, maker vertical) but are independent sites. Cross-references in CLAUDE.md give AI assistants context without creating code dependencies.
Prevention / Best Practices
- When scraping Lovable/SPA sites: Don't use WebFetch; extract from the JS bundle directly. Look for the src="/assets/*.js" tag in the HTML shell.
- When scaffolding fonts: Only use @fontsource-variable/* packages that are known to exist. Default to montserrat/inter. Check npm before using custom fonts.
- When scaling to many sites: Update THREE places: (a) site files, (b) packages/ui/src/data/businesses.ts, (c) root CLAUDE.md businesses table.
- When sites share a vertical: Add cross-references in each site's CLAUDE.md under a ## Related Businesses section.
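The "three places" rule lends itself to a quick consistency check. The sketch below assumes the caller has already gathered three lists of site slugs (app folder names, registry entries, and rows in the root CLAUDE.md table); `checkSync` and `SyncReport` are hypothetical names, and how the slugs are read from disk is left out on purpose.

```typescript
// Sketch only: hypothetical helper; slug lists are supplied by the caller.
interface SyncReport {
  missingFromRegistry: string[];
  missingFromDocs: string[];
}

/** Report app folders that have no registry entry or no CLAUDE.md table row. */
function checkSync(
  appSlugs: string[],
  registrySlugs: string[],
  docSlugs: string[],
): SyncReport {
  const registry = new Set(registrySlugs);
  const docs = new Set(docSlugs);
  return {
    missingFromRegistry: appSlugs.filter((s) => !registry.has(s)).sort(),
    missingFromDocs: appSlugs.filter((s) => !docs.has(s)).sort(),
  };
}
```

Two empty arrays mean every scaffolded site is registered and documented.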
Files Created/Modified
New sites (10):
- apps/sendy-visors/
- apps/pocket-lunch/
- apps/lana-ai/
- apps/unity-of-sound/
- apps/adventure-product-school/
- apps/festival-weddings/
- apps/weddings-lana/
- apps/maker-mentor/
- apps/liquid-donations/
- apps/imagine-cnc/
Modified:
- packages/ui/src/data/businesses.ts: 14 → 24 entries
- CLAUDE.md: businesses table updated to 24 entries
- apps/weddings-lana/package.json: fixed font deps
- apps/weddings-lana/src/layouts/SiteLayout.astro: fixed font imports
Cross-References
- See also: bulk-site-scaffolding-and-shared-components.md (Phase 1: 2→14 sites)
- See also: build-errors/pnpm-monorepo-missing-dependency.md (pnpm strict deps)