NHG Migration System Documentation

Commands & Scripts

URL Processing Scripts

convert-urls-to-nhghealth.js

Purpose: Convert old SharePoint URLs to new NHG Health URLs

How to run:

node convert-urls-to-nhghealth.js

Input: CSV files from URL/ folder

Output: Converted URLs in Migrated-URL/ folder

Conversion Rules:

  • Removes /Pages/ from paths
  • Removes .aspx extensions
  • Converts /Newsroom/ to /news/
  • Converts /News-Releases/ to /releases/
  • Replaces spaces with hyphens
  • Converts to lowercase
  • For labs/testcatalogue pages: appends the ?SID=NUMBER query parameter from the old URL
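
The rules above can be sketched as a single function. This is a minimal illustration, not the actual contents of convert-urls-to-nhghealth.js: the name convertPath, the order in which the rules are applied, the case-insensitive matching, and the test for labs/testcatalogue paths are all assumptions. It returns only the converted path, because the new domain is not stated here, and the sample URL path is made up.

```javascript
// Hypothetical helper: applies the documented conversion rules to one old URL
// and returns the converted path (the new domain is left to the caller).
function convertPath(oldUrl) {
  const url = new URL(oldUrl);

  const path = decodeURIComponent(url.pathname) // turn %20 back into spaces
    .replace(/\/Pages\//gi, "/")                // remove /Pages/
    .replace(/\.aspx$/i, "")                    // remove the .aspx extension
    .replace(/\/Newsroom\//gi, "/news/")        // /Newsroom/ becomes /news/
    .replace(/\/News-Releases\//gi, "/releases/")
    .replace(/ /g, "-")                         // spaces become hyphens
    .toLowerCase();

  // Assumption: the SID is carried over only when it is numeric and the
  // converted path sits under labs/testcatalogue.
  const sid = url.searchParams.get("SID");
  if (path.includes("/labs/testcatalogue") && sid !== null && /^\d+$/.test(sid)) {
    return `${path}?SID=${sid}`;
  }
  return path;
}

console.log(convertPath("https://www.wh.com.sg/Newsroom/Pages/My Story.aspx"));
// → /news/my-story
```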

convert-wh-doctors-to-nhghealth.js

Purpose: Convert WH doctor profile URLs to NHG Health format

How to run:

node convert-wh-doctors-to-nhghealth.js

Input: WH doctor URLs from CSV files

Output: Converted doctor profile URLs

URL Cleaning Commands

Clean and Filter URLs

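Keep only rows under the given path, strip URL fragments (everything from # onward), remove duplicates, and drop /_layouts/ pages. Note that cut truncates the whole line at the first #, so any columns after a fragment are lost, and a header row that does not contain the path is filtered out; re-add the header when formatting the file into the standard CSV layout.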

# For Partners URLs
grep "/partners/" URL-RAW/WHPartnersCustomAndWebPartPages.csv | \
  cut -d'#' -f1 | sort -u | \
  grep -v "/_layouts/" > URL/WHPartnersCustomAndWebPartPages.csv

# For Labs URLs
grep "/labs/" URL-RAW/WHLabsCustomAndWebPartPages.csv | \
  cut -d'#' -f1 | sort -u | \
  grep -v "/_layouts/" > URL/WHLabsCustomAndWebPartPages.csv
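
For environments without grep, cut, and sort, the same cleaning steps can be approximated in Node.js. This is a rough sketch under assumptions: cleanUrlLines is a hypothetical name, the lines are already read into memory, and the sort order follows JavaScript string comparison, which may differ from the locale-aware order that sort produces.

```javascript
// Hypothetical Node.js equivalent of:
//   grep PATH | cut -d'#' -f1 | sort -u | grep -v "/_layouts/"
function cleanUrlLines(lines, pathFilter) {
  const kept = lines
    .filter((line) => line.includes(pathFilter))     // grep "/partners/"
    .map((line) => line.split("#")[0])               // cut -d'#' -f1
    .filter((line) => !line.includes("/_layouts/")); // grep -v "/_layouts/"
  return [...new Set(kept)].sort();                  // sort -u
}
```

Calling cleanUrlLines(lines, "/labs/") would mirror the Labs command in the same way.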

Merge CSV Files

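Merge multiple cleaned CSV files, keeping the header row from the first file only (every input file is expected to start with the header row, otherwise its first data row is dropped):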

awk 'NR==1 || FNR>1' URL/WHPartnersCustomAndWebPartPages.csv \
  URL/WHLabsCustomAndWebPartPages.csv > URL/WHCustomAndWebPartPages.csv
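
The awk condition can be read as "print line 1 overall, plus every line that is not the first line of its file". A small in-memory sketch of the same logic follows; mergeCsvTexts is a hypothetical name, and it works on file contents passed as strings rather than on file paths.

```javascript
// Hypothetical in-memory version of: awk 'NR==1 || FNR>1' file1 file2 ...
function mergeCsvTexts(texts) {
  const merged = [];
  let nr = 0; // global line counter (awk NR)
  for (const text of texts) {
    const lines = text.split(/\r?\n/);
    if (lines[lines.length - 1] === "") lines.pop(); // ignore the trailing newline
    lines.forEach((line, index) => {
      nr += 1;
      const fnr = index + 1; // per-file line counter (awk FNR)
      if (nr === 1 || fnr > 1) merged.push(line);
    });
  }
  return merged.length > 0 ? merged.join("\n") + "\n" : "";
}
```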

Crawling & Comparison Scripts

crawl-from-csv.js

Purpose: Crawl websites based on URLs from CSV files

How to run:

node crawl-from-csv.js <csv-file> <site-code>

Example:

node crawl-from-csv.js example-urls.csv WH

compare-pages.js

Purpose: Compare old and new site pages for migration verification

How to run:

node compare-pages.js

Output: Comparison reports in migration-reports/

migration-compare.js

Purpose: Generate detailed migration comparison reports

How to run:

node migration-compare.js

Server Commands

serve-dashboard.js

Purpose: Start the web dashboard server

How to run:

node serve-dashboard.js

Default port: 3100

Access: http://localhost:3100

crawl-control-server.js

Purpose: Start the crawl control server with UI

How to run:

node crawl-control-server.js

Utility Scripts

verify-images.js

Purpose: Verify image assets in migrated pages

How to run:

node verify-images.js

fix-*-asset-paths.js

Purpose: Fix asset paths for specific sites (IMH, KTPH, TTSH)

How to run:

node fix-imh-asset-paths.js
node fix-ktph-asset-paths.js
node fix-ttsh-asset-paths.js

Features & Workflows

Migration Status Dashboard

Access: http://localhost:3100/migration-status.html

Features:

  • Site Selection: Filter pages by site (WH, IMH, KTPH, etc.)
  • Status Filtering: Filter by migration status (Pending, Migrated, Verified, etc.)
  • Search: Search by name, path, or URL
  • Inline Editing: Edit New URL, Status, and Notes directly in the table
  • Export to Excel: Export the filtered rows as a CSV file that Excel can open (summary section excluded)
  • Check URLs: Verify NHG Health URLs are accessible
  • Auto-save: Changes are saved automatically

Page Classification:

  • Type: Dynamic or Static
    • Dynamic: News, Events, Medical Services, Conditions, Doctors
    • Static: Regular content pages
  • Category: Based on URL depth and type
    • Home, Landing, Detail, SitePages
    • News, Doctor, Events, Specialties-Services, Diseases-Conditions

Special URL Handling:

  • /lab-services-details.aspx?SID=NUMBER → Dynamic, Detail category
  • /Newsroom/ → /news/
  • /News-Releases/ → /releases/
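
The classification described above could look roughly like the sketch below. Only the lab-services-details special case and the category names come from this document. The keyword patterns, the depth thresholds (0 = Home, 1 = Landing, deeper = Detail), the decision to ignore the "Pages" folder when counting depth, and the name classifyPage are assumptions for illustration. The sample URLs in the usage note are made up.

```javascript
// Assumed keyword table: the URL sections are guesses based on the category
// names listed above, not the dashboard's real rules.
const DYNAMIC_SECTIONS = [
  [/\/(newsroom|news-releases)\//i, "News"],
  [/\/doctors?\//i, "Doctor"],
  [/\/events?\//i, "Events"],
  [/\/specialties-services\//i, "Specialties-Services"],
  [/\/diseases-conditions\//i, "Diseases-Conditions"],
];

function classifyPage(rawUrl) {
  const url = new URL(rawUrl);
  const path = url.pathname;

  // Documented special case: lab service detail pages carry an SID parameter.
  if (/\/lab-services-details\.aspx$/i.test(path) && url.searchParams.has("SID")) {
    return { type: "Dynamic", category: "Detail" };
  }
  for (const [pattern, category] of DYNAMIC_SECTIONS) {
    if (pattern.test(path)) return { type: "Dynamic", category };
  }
  if (/\/SitePages\//i.test(path)) return { type: "Static", category: "SitePages" };

  // Assumed depth rule: skip the SharePoint "Pages" folder, then count segments.
  const depth = path.split("/").filter((s) => s && s.toLowerCase() !== "pages").length;
  const category = depth === 0 ? "Home" : depth === 1 ? "Landing" : "Detail";
  return { type: "Static", category };
}
```

For example, classifyPage("https://www.wh.com.sg/Newsroom/Pages/story.aspx") yields a Dynamic page in the News category under these assumptions.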

Migration Comparison

Access: http://localhost:3100/migration-report.html

Features:

  • Side-by-side comparison: Old vs New page content
  • Content extraction: Extracts main content, headings, links
  • Similarity scoring: Calculates content similarity percentage
  • Visual diff: Highlights differences between versions
  • SEO comparison: Compares meta tags, titles, descriptions
  • Link verification: Checks for broken links
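
The document does not say how the similarity percentage is calculated. As one plausible illustration, the sketch below uses Jaccard similarity over the sets of words in the old and new page text; this is a stand-in technique, and similarityPercent is a hypothetical name, not the function used by compare-pages.js.

```javascript
// Stand-in scoring method: Jaccard similarity of the two word sets,
// expressed as a rounded percentage.
function similarityPercent(oldText, newText) {
  const words = (text) => new Set(text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []);
  const oldWords = words(oldText);
  const newWords = words(newText);
  if (oldWords.size === 0 && newWords.size === 0) return 100; // two empty pages match

  let shared = 0;
  for (const word of oldWords) {
    if (newWords.has(word)) shared += 1;
  }
  const union = oldWords.size + newWords.size - shared;
  return Math.round((shared / union) * 100);
}

console.log(similarityPercent("Our lab services", "Our new lab services")); // → 75
```

A set-based score ignores word order and repetition, so it is best treated as a quick signal for missing content rather than a precise diff.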

Migration Verification

Access: http://localhost:3100/migration-verification.html

Features:

  • Bulk URL checking: Verify multiple URLs at once
  • HTTP status codes: Check for 200, 404, 500, etc.
  • Response time tracking: Monitor page load times
  • Redirect detection: Identify redirects and chains
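
A check of this kind can be sketched with the fetch API built into Node.js 18 and later. The function below is an illustration only: checkUrl, its result shape, and the five-hop limit are assumptions, not the dashboard's actual code. It follows redirects manually so that each hop in a chain can be recorded, and it accepts a replaceable fetch function so the logic can be exercised without network access.

```javascript
// Hypothetical checker: reports final status, redirect chain, and elapsed time.
async function checkUrl(url, fetchImpl = fetch, maxHops = 5) {
  const redirects = [];
  const started = Date.now();
  let current = url;
  let status = 0;

  for (let hop = 0; hop <= maxHops; hop++) {
    // redirect: "manual" returns the 3xx response instead of following it.
    const res = await fetchImpl(current, { method: "GET", redirect: "manual" });
    status = res.status;
    const location = res.headers.get("location");
    if (status >= 300 && status < 400 && location) {
      current = new URL(location, current).href; // resolve relative Location values
      redirects.push(current);
      continue;
    }
    break;
  }
  return { url, status, finalUrl: current, redirects, ms: Date.now() - started };
}

// Bulk checking: run the single-URL check over a list, one URL at a time.
async function checkUrls(urls, fetchImpl = fetch) {
  const results = [];
  for (const url of urls) results.push(await checkUrl(url, fetchImpl));
  return results;
}
```

Checking sequentially keeps the load on the target site low; a real tool might add limited concurrency and a request timeout.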

Typical Workflows

1. Processing New URLs

  1. Place raw URLs in URL-RAW/ folder
  2. Run cleaning commands to remove duplicates and filter
  3. Format CSV files to match standard format
  4. Merge files if needed into URL/ folder
  5. Run convert-urls-to-nhghealth.js
  6. Check output in Migrated-URL/ folder
  7. Review in Migration Status dashboard

2. Verifying Migration

  1. Start dashboard: node serve-dashboard.js
  2. Open Migration Status page
  3. Select site and filter by status
  4. Click "Check" to verify URLs
  5. Review results and update status
  6. Export to Excel for reporting

3. Content Comparison

  1. Ensure old and new sites are accessible
  2. Run node compare-pages.js
  3. Open Migration Report page
  4. Review side-by-side comparisons
  5. Check similarity scores
  6. Identify missing content or issues

Data Sources & Files

Folder Structure

URL-RAW/

Purpose: Raw, unprocessed URL lists from SharePoint

Files:

  • WHPartnersCustomAndWebPartPages.csv - WH Partners URLs
  • WHLabsCustomAndWebPartPages.csv - WH Labs URLs

Format: Raw URL rows; may contain duplicates, URL fragments (#...), and SharePoint /_layouts/ paths

URL/

Purpose: Cleaned and formatted URL lists ready for conversion

Files:

  • WHCustomAndWebPartPages.csv - Merged WH URLs
  • WHPartnersCustomAndWebPartPages.csv - Cleaned Partners URLs
  • WHLabsCustomAndWebPartPages.csv - Cleaned Labs URLs
  • NSCCustomAndWebPartPages.csv - NSC URLs
  • Other site-specific CSV files

Format: Standard CSV with columns:

Library,Site,PageType,URL,PageName

Migrated-URL/

Purpose: Converted URLs with old → new mappings

Files: Mirror the URL/ folder structure

Format: CSV with additional columns for new URLs and migration status

migration-reports/

Purpose: Generated comparison reports

Files: JSON and HTML reports from page comparisons

outputs/

Purpose: Crawled page data and screenshots

Structure: Organized by site code and date

*-old/ and *-new/

Purpose: Archived old site data and new site data

Examples:

  • wh-old/, wh-new/
  • imh-old/, imh-new/
  • ktph-old/, ktph-new/
  • ttsh-old/, ttsh-new/
  • nhgp-old/, nhgp-new/

Database Files

*.db files

Purpose: SQLite databases for migration tracking

Tables:

  • sites - Site configurations
  • pages - Page URLs and metadata
  • migration_status - Status tracking

Note: Created automatically by the dashboard server (serve-dashboard.js)

Configuration Files

sites.json

Purpose: Site configuration and URL mappings

Contains:

  • Site codes (WH, IMH, KTPH, TTSH, NSC, etc.)
  • Old and new domain URLs
  • Site-specific settings
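
To make the shape of such a configuration concrete, here is a purely illustrative sketch. The key names (code, oldDomain, newDomain, settings), the new-site domain, and the helper names are all hypothetical placeholders; the real sites.json may be structured differently.

```javascript
// Hypothetical shape for entries loaded from sites.json.
const exampleSites = [
  {
    code: "WH",
    oldDomain: "https://www.wh.com.sg",
    newDomain: "https://new-site.example", // placeholder, not the real NHG Health domain
    settings: {},
  },
];

// Look up a site by its code, ignoring case.
function findSite(sites, code) {
  const site = sites.find((s) => s.code.toLowerCase() === code.toLowerCase());
  if (!site) throw new Error(`Unknown site code: ${code}`);
  return site;
}

// Combine a site's new domain with an already converted path.
function buildNewUrl(site, convertedPath) {
  return new URL(convertedPath, site.newDomain).href;
}
```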

url-matching-rules.js

Purpose: Rules for matching old URLs to new URLs

Contains: Pattern matching logic and transformations

content-extraction-rules.js

Purpose: Rules for extracting content from pages

Contains: CSS selectors and extraction patterns

CSV File Format

Standard URL CSV Format

Library,Site,PageType,URL,PageName
CustomPages,WH,CustomPage,https://www.wh.com.sg/...,PageName

Columns:

  • Library: SharePoint library name
  • Site: Site code (WH, IMH, etc.)
  • PageType: CustomPage, WebPartPage, etc.
  • URL: Full URL of the page
  • PageName: Display name of the page
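
A row in this format can be turned into an object keyed by column name. The sketch below assumes simple rows with no quoted fields or embedded commas, which a real CSV parser would need to handle; parseStandardRow is a hypothetical name and the sample URL path is made up.

```javascript
const STANDARD_COLUMNS = ["Library", "Site", "PageType", "URL", "PageName"];

// Hypothetical parser for one data row of the standard URL CSV.
// Assumption: fields contain no commas or quotes.
function parseStandardRow(line) {
  const fields = line.split(",");
  if (fields.length !== STANDARD_COLUMNS.length) {
    throw new Error(`Expected ${STANDARD_COLUMNS.length} fields, got ${fields.length}`);
  }
  return Object.fromEntries(STANDARD_COLUMNS.map((name, i) => [name, fields[i].trim()]));
}

const sampleRow = parseStandardRow(
  "CustomPages,WH,CustomPage,https://www.wh.com.sg/Pages/Home.aspx,Home"
);
console.log(sampleRow.Site, sampleRow.URL);
```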

Migrated URL CSV Format

Additional columns:

  • nhghealth_url: Converted new URL
  • migration_status: Pending, Migrated, Verified, etc.
  • notes: Migration notes and comments
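
Writing a migrated row then amounts to appending the three extra values to the standard columns. This sketch assumes the additional columns follow the standard ones in the order listed above, which the document does not confirm; toMigratedLine and csvField are hypothetical helpers, and the new URL is a placeholder.

```javascript
// Quote a field only when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical writer: standard columns first, then the three migration columns.
function toMigratedLine(row, nhghealthUrl, migrationStatus = "Pending", notes = "") {
  return [
    row.Library, row.Site, row.PageType, row.URL, row.PageName,
    nhghealthUrl, migrationStatus, notes,
  ].map(csvField).join(",");
}
```

Quoting matters mainly for the notes column, since free-text comments often contain commas.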

Data Flow

URL Processing Pipeline

URL-RAW/*.csv
    ↓ (clean, filter, deduplicate)
URL/*.csv
    ↓ (convert-urls-to-nhghealth.js)
Migrated-URL/*.csv
    ↓ (serve-dashboard.js)
SQLite Database (*.db)
    ↓ (migration-status.html)
Web Dashboard

Comparison Pipeline

Old Site URLs + New Site URLs
    ↓ (crawl-from-csv.js)
outputs/
    ↓ (compare-pages.js)
migration-reports/
    ↓ (migration-report.html)
Comparison Dashboard