NHG Crawl Help Guide

A step-by-step guide to running crawls, viewing dashboards, and analyzing migration reports.

1) Start the Control Server

Install dependencies (first time only)

  • npm install
  • npx playwright install chromium

Launch the control server

  • npm run control
  • Open http://localhost:3100/control
  • Postgres must be running, with DATABASE_URL set to its connection string.
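
The launch steps above can be guarded with a small preflight check. This is a minimal sketch, not part of the project: check_database_url is a hypothetical helper, it assumes DATABASE_URL is a URL-style Postgres connection string, and the credentials and database name in the usage comment are placeholders.

```shell
# Hypothetical helper: fail early if DATABASE_URL is missing
# or does not look like a Postgres connection URL (an assumption
# about the format this project expects).
check_database_url() {
  case "${DATABASE_URL:-}" in
    postgres://*|postgresql://*)
      return 0 ;;
    "")
      echo "DATABASE_URL is not set" >&2
      return 1 ;;
    *)
      echo "DATABASE_URL does not look like a Postgres URL" >&2
      return 1 ;;
  esac
}

# Usage (placeholder credentials and database name):
#   export DATABASE_URL="postgres://user:password@localhost:5432/crawl"
#   check_database_url && npm run control
```

Checking the variable before npm run control is only a convenience; it does not confirm that Postgres is actually accepting connections.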

2) Run a Crawl (Control Page)

Pick a target

  • Select a site from the dropdown or paste a URL.
  • Use the "Use NHG Health" button for https://www.nhghealth.com.sg/.
  • Confirm the status pill shows "Ready".

Choose a run mode

  • Verify Status: fast HTTP check, no snapshots.
  • Archive Snapshot: saves HTML + assets for later review.
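
For a sense of what a status-only check involves, a comparable manual check can be done with curl outside the control page. This is an illustrative sketch only: http_status and classify_status are hypothetical helpers, and treating 2xx and 3xx responses as "ok" is an assumption, not the crawler's documented rule.

```shell
# Hypothetical helper: map an HTTP status code to a coarse result.
# The 2xx/3xx = ok rule is an assumption made for this sketch.
classify_status() {
  case "$1" in
    2??|3??) echo ok ;;
    4??|5??) echo error ;;
    *)       echo unknown ;;
  esac
}

# Fetch only the status code for a URL; the response body is discarded,
# so nothing is saved (similar in spirit to a run without snapshots).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Usage:
#   classify_status "$(http_status https://www.nhghealth.com.sg/)"
```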

Monitor and manage

  • Watch progress in Recent Jobs and Job Details.
  • Pause/resume a crawl if needed.
  • Open output files (CSV, links, snapshots) from the job panel.

3) Review Results (Dashboard)

Open the dashboard

  • Navigate to /dashboard.
  • Use filters to narrow by site, group, status, and snapshots.

Understand the data

  • Summary shows success and error counts.
  • Results table lists each crawled URL.
  • Root Links section shows discovered top-level links.

4) Migration Analytics

Generate comparison data

  • Run the comparison script to create migration reports.
  • Example: node migration-compare.js
  • Reports are written to migration-reports/.
  • To load existing reports into Postgres: npm run import:migration-reports.
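
The two commands above can be chained so that the import only runs if the comparison succeeds. This is a minimal sketch: run_step, migration_refresh, and the DRY_RUN variable are hypothetical names introduced here, not part of the project.

```shell
# Print each command; execute it unless DRY_RUN=1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Generate reports in migration-reports/, then load them into Postgres.
# The import is skipped if the comparison script fails.
migration_refresh() {
  run_step node migration-compare.js &&
  run_step npm run import:migration-reports
}

# Usage:
#   DRY_RUN=1 migration_refresh   # show the commands without running them
#   migration_refresh             # run both steps
```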

Use the migration views

  • Status: /migration/status
  • Comparison: /migration/report
  • Verification: /migration/verification

Import baseline page lists

  • Load the baseline CSV files (URL/*.csv) into Postgres so the Status view can track them.
  • Run: npm run import:migration
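
Before running the import, it can help to confirm that the URL/ directory actually contains CSV files. This is a minimal sketch: count_baseline_csvs is a hypothetical helper, and it only counts files; it does not validate their contents.

```shell
# Hypothetical helper: count the *.csv files in a directory (default: URL).
count_baseline_csvs() {
  dir="${1:-URL}"
  # Expand the glob into this function's positional parameters.
  set -- "$dir"/*.csv
  # If nothing matched, $1 is the literal pattern and does not exist.
  if [ -e "${1:-}" ]; then
    echo "$#"
  else
    echo 0
  fi
}

# Usage:
#   [ "$(count_baseline_csvs URL)" -gt 0 ] && npm run import:migration
```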

Troubleshooting