1) Start the Control Server
Install dependencies (first time only)
npm install
npx playwright install chromium
Launch the control server
npm run control
- Open http://localhost:3100/control in a browser.
- Postgres must be running (set DATABASE_URL).
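A preflight check can catch a missing DATABASE_URL before the server starts. The following is a minimal sketch, not part of the project: the function name check_database_url is hypothetical, and it assumes DATABASE_URL holds a Postgres connection URI starting with postgres:// or postgresql://.

```shell
# Hypothetical helper (not shipped with the project).
# Assumption: DATABASE_URL is a Postgres connection URI.
check_database_url() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set; export it before starting the control server" >&2
    return 1
  fi
  case "$DATABASE_URL" in
    postgres://*|postgresql://*)
      return 0
      ;;
    *)
      echo "DATABASE_URL does not look like a Postgres connection URI" >&2
      return 1
      ;;
  esac
}
```

Usage: run `check_database_url && npm run control` so the server only starts when the variable looks usable. The check does not confirm that Postgres is actually reachable.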
2) Run a Crawl (Control Page)
Pick a target
- Select a site from the dropdown or paste a URL.
- Use the "Use NHG Health" button for https://www.nhghealth.com.sg/.
- Confirm the status pill shows "Ready".
Choose a run mode
- Verify Status: fast HTTP check, no snapshots.
- Archive Snapshot: saves HTML + assets for later review.
Monitor and manage
- Watch progress in Recent Jobs and Job Details.
- Pause/resume a crawl if needed.
- Open output files (CSV, links, snapshots) from the job panel.
3) Review Results (Dashboard)
Open the dashboard
- Navigate to /dashboard.
- Use filters to narrow by site, group, status, and snapshots.
Understand the data
- Summary shows success and error counts.
- Results table lists each crawled URL.
- Root Links section shows discovered top-level links.
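The Summary counts can be cross-checked against a job's CSV output from the command line. This is a rough sketch under stated assumptions: the CSV layout is not documented here, so the header row, the status code in column 2, and the rule that 2xx/3xx counts as success are all hypothetical, as is the function name summarize_results.

```shell
# Hypothetical helper; adjust the column number to match the real CSV.
# Assumptions: first line is a header, column 2 is the HTTP status code,
# and fields contain no quoted commas.
summarize_results() {
  awk -F',' '
    NR > 1 {
      # Treat 2xx and 3xx codes as success; anything else as an error.
      if ($2 ~ /^[23][0-9][0-9]$/) ok++; else err++
    }
    END { printf "success=%d error=%d\n", ok + 0, err + 0 }
  ' "$1"
}
```

Usage: `summarize_results path/to/results.csv`, where the path is whichever CSV the job panel links to. If the numbers differ from the dashboard, the column or success rule assumed here probably does not match the crawler's.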
4) Migration Analytics
Generate comparison data
- Run the comparison script to create migration reports.
- Example: node migration-compare.js
- Reports are written to migration-reports/.
- To load existing reports into Postgres: npm run import:migration-reports
Use the migration views
- Status: /migration/status
- Comparison: /migration/report
- Verification: /migration/verification
Import baseline page lists
- Load URL/*.csv into Postgres for Status tracking.
- Run: npm run import:migration
Troubleshooting
- Missing data? Ensure the crawl finished and outputs exist in runs/.
- No progress? The crawler writes state every few pages; wait a moment.
- Migration reports empty? Re-run migration-compare.js.
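The first and last checks above can be scripted. The sketch below only tests whether runs/ and migration-reports/ exist and contain at least one file; it assumes it is run from the project root, and the function name has_output is hypothetical.

```shell
# Hypothetical helper: succeeds when the directory exists and holds at least one file.
has_output() {
  [ -d "$1" ] && [ -n "$(find "$1" -type f 2>/dev/null | head -n 1)" ]
}

# Assumption: the current directory is the project root.
for dir in runs migration-reports; do
  if has_output "$dir"; then
    echo "$dir: output files found"
  else
    echo "$dir: missing or empty"
  fi
done
```

A "missing or empty" result for runs/ suggests the crawl has not written output yet; for migration-reports/ it suggests the comparison script still needs to run.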