Commands & Scripts
URL Processing Scripts
convert-urls-to-nhghealth.js
Purpose: Convert old SharePoint URLs to new NHG Health URLs
How to run:
node convert-urls-to-nhghealth.js
Input: CSV files from URL/ folder
Output: Converted URLs in Migrated-URL/ folder
Conversion Rules:
- Removes /Pages/ from paths
- Removes .aspx extensions
- Converts /Newsroom/ to /news/
- Converts /News-Releases/ to /releases/
- Replaces spaces with hyphens
- Converts to lowercase
- For labs/testcatalogue: appends ?SID=NUMBER from the old URL
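The rules above can be sketched as a single Node function. This is a minimal illustration and not the code in convert-urls-to-nhghealth.js. The function names convertPath and convertUrl, the order in which the rules are applied, the newOrigin parameter and the example hosts old.example and new.example are all assumptions. The real new domain is set by the script and is not shown here.

```javascript
// Hypothetical sketch of the conversion rules; the real script may apply them in a different order.
function convertPath(oldUrl) {
  const url = new URL(oldUrl);
  // Decode %20 and similar escapes so encoded spaces are treated like literal spaces
  return decodeURIComponent(url.pathname)
    .replace(/\/Pages\//gi, "/")                  // drop the SharePoint /Pages/ segment
    .replace(/\.aspx$/i, "")                      // drop the .aspx extension
    .replace(/\/Newsroom\//gi, "/news/")          // /Newsroom/ -> /news/
    .replace(/\/News-Releases\//gi, "/releases/") // /News-Releases/ -> /releases/
    .replace(/ /g, "-")                           // spaces -> hyphens
    .toLowerCase();
}

// newOrigin is a hypothetical parameter standing in for the NHG Health domain.
function convertUrl(oldUrl, newOrigin) {
  const newUrl = new URL(convertPath(oldUrl), newOrigin);
  const sid = new URL(oldUrl).searchParams.get("SID");
  // Lab test catalogue pages keep their ?SID=NUMBER from the old URL
  if (sid && newUrl.pathname.includes("labs/testcatalogue")) {
    newUrl.searchParams.set("SID", sid);
  }
  return newUrl.toString();
}
```

For example, convertUrl("https://old.example/Newsroom/News-Releases/Pages/Annual%20Report.aspx", "https://new.example") gives https://new.example/news/releases/annual-report.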
convert-wh-doctors-to-nhghealth.js
Purpose: Convert WH doctor profile URLs to NHG Health format
How to run:
node convert-wh-doctors-to-nhghealth.js
Input: WH doctor URLs from CSV files
Output: Converted doctor profile URLs
URL Cleaning Commands
Clean and Filter URLs
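Remove duplicates and filter by path. Each pipeline keeps only lines containing the site path, strips URL fragments (everything from # onward), de-duplicates with sort -u, and drops /_layouts/ paths: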
# For Partners URLs
grep "/partners/" URL-RAW/WHPartnersCustomAndWebPartPages.csv | \
  cut -d'#' -f1 | sort -u | \
  grep -v "/_layouts/" > URL/WHPartnersCustomAndWebPartPages.csv

# For Labs URLs
grep "/labs/" URL-RAW/WHLabsCustomAndWebPartPages.csv | \
  cut -d'#' -f1 | sort -u | \
  grep -v "/_layouts/" > URL/WHLabsCustomAndWebPartPages.csv
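If the cleaning needs to run inside a Node script instead of a shell, the same grep | cut | sort -u | grep -v sequence can be sketched as below. The function name cleanUrls and its arguments are hypothetical. JavaScript's default sort compares UTF-16 code units, so the order matches sort only when sort runs with LC_ALL=C.

```javascript
// Hypothetical Node equivalent of the shell cleaning pipeline.
function cleanUrls(lines, pathFilter) {
  const kept = lines
    .filter((line) => line.includes(pathFilter))     // grep "/partners/"
    .map((line) => line.split("#")[0])               // cut -d'#' -f1
    .filter((line) => !line.includes("/_layouts/")); // grep -v "/_layouts/"
  return [...new Set(kept)].sort();                  // sort -u
}
```

Usage: cleanUrls(fs.readFileSync("URL-RAW/WHPartnersCustomAndWebPartPages.csv", "utf8").split("\n"), "/partners/").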
Merge CSV Files
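Merge multiple cleaned CSV files. The awk filter prints line 1 of the first file and skips line 1 of every later file, so each input must start with the same header row: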
awk 'NR==1 || FNR>1' URL/WHPartnersCustomAndWebPartPages.csv \
  URL/WHLabsCustomAndWebPartPages.csv > URL/WHCustomAndWebPartPages.csv
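The awk condition NR==1 || FNR>1 keeps the first line overall (the header of the first file) and every line after line 1 of each file. A sketch of the same merge in Node follows. It assumes every input starts with the same header row. The function name mergeCsv is hypothetical, and the function works on file contents already read into strings.

```javascript
// Sketch of awk 'NR==1 || FNR>1': keep the first file's header, skip line 1 of later files.
function mergeCsv(fileContents) {
  const out = [];
  fileContents.forEach((text, fileIndex) => {
    const lines = text.split(/\r?\n/);
    if (lines[lines.length - 1] === "") lines.pop(); // ignore the trailing newline only
    out.push(...(fileIndex === 0 ? lines : lines.slice(1)));
  });
  return out.join("\n") + "\n";
}
```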
Crawling & Comparison Scripts
crawl-from-csv.js
Purpose: Crawl websites based on URLs from CSV files
How to run:
node crawl-from-csv.js <csv-file> <site-code>
Example:
node crawl-from-csv.js example-urls.csv WH
compare-pages.js
Purpose: Compare old and new site pages for migration verification
How to run:
node compare-pages.js
Output: Comparison reports in migration-reports/
migration-compare.js
Purpose: Generate detailed migration comparison reports
How to run:
node migration-compare.js
Server Commands
serve-dashboard.js
Purpose: Start the web dashboard server
How to run:
node serve-dashboard.js
Default port: 3100
Access: http://localhost:3100
crawl-control-server.js
Purpose: Start the crawl control server with UI
How to run:
node crawl-control-server.js
Utility Scripts
verify-images.js
Purpose: Verify image assets in migrated pages
How to run:
node verify-images.js
fix-*-asset-paths.js
Purpose: Fix asset paths for specific sites (IMH, KTPH, TTSH)
How to run:
node fix-imh-asset-paths.js
node fix-ktph-asset-paths.js
node fix-ttsh-asset-paths.js
Features & Workflows
Migration Status Dashboard
Access: http://localhost:3100/migration-status.html
Features:
- Site Selection: Filter pages by site (WH, IMH, KTPH, etc.)
- Status Filtering: Filter by migration status (Pending, Migrated, Verified, etc.)
- Search: Search by name, path, or URL
- Inline Editing: Edit New URL, Status, and Notes directly in the table
- Export to Excel: Export the filtered rows to a CSV file that opens in Excel (summary section excluded)
- Check URLs: Verify NHG Health URLs are accessible
- Auto-save: Changes are saved automatically
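If you reproduce the Export to Excel output in a script, two CSV details matter for Excel. Fields containing commas, quotes or line breaks must be quoted, with embedded quotes doubled (RFC 4180). A leading UTF-8 byte order mark also helps Excel recognise the encoding of non-ASCII text. The sketch below is not the dashboard's code: csvField and toCsv are hypothetical names, and the columns come from the Migrated URL CSV format.

```javascript
// Hypothetical CSV export sketch; not the dashboard's implementation.
function csvField(value) {
  const s = value == null ? "" : String(value);
  // Quote fields containing commas, quotes or line breaks; double any embedded quotes
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) lines.push(columns.map((c) => csvField(row[c])).join(","));
  // Leading UTF-8 BOM so Excel detects the encoding
  return "\uFEFF" + lines.join("\r\n") + "\r\n";
}
```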
Page Classification:
- Type: Dynamic or Static
  - Dynamic: News, Events, Medical Services, Conditions, Doctors
  - Static: Regular content pages
- Category: Based on URL depth and type
  - Home, Landing, Detail, SitePages
  - News, Doctor, Events, Specialties-Services, Diseases-Conditions
Special URL Handling:
- /lab-services-details.aspx?SID=NUMBER → Dynamic, Detail category
- /Newsroom/ → converts to /news/
- /News-Releases/ → converts to /releases/
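A rough sketch of the Type classification, including the lab-services-details special case, is shown below. The keyword patterns in DYNAMIC_PATTERNS and the function name classifyPage are hypothetical. The depth-based Category rules are not documented here, so the sketch leaves category unset except for the SID case.

```javascript
// Hypothetical keyword patterns for the Dynamic page groups; the dashboard's real rules may differ.
const DYNAMIC_PATTERNS = [/news/i, /event/i, /medical(-|%20)?services/i, /conditions/i, /doctor/i];

function classifyPage(oldUrl) {
  const url = new URL(oldUrl);
  // Special case: lab test detail pages keyed by SID are Dynamic, Detail category
  if (/\/lab-services-details\.aspx$/i.test(url.pathname) && url.searchParams.has("SID")) {
    return { type: "Dynamic", category: "Detail" };
  }
  const isDynamic = DYNAMIC_PATTERNS.some((re) => re.test(url.pathname));
  return { type: isDynamic ? "Dynamic" : "Static", category: null }; // category not sketched
}
```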
Migration Comparison
Access: http://localhost:3100/migration-report.html
Features:
- Side-by-side comparison: Old vs New page content
- Content extraction: Extracts main content, headings, links
- Similarity scoring: Calculates content similarity percentage
- Visual diff: Highlights differences between versions
- SEO comparison: Compares meta tags, titles, descriptions
- Link verification: Checks for broken links
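This document does not say how the similarity percentage is computed. As one possible measure (a swapped-in technique, not necessarily what compare-pages.js uses), the sketch below scores the Jaccard overlap of the two pages' word sets. It assumes the main content has already been extracted as plain text and only counts ASCII letters and digits. The function name similarityPercent is hypothetical.

```javascript
// Hypothetical similarity score: Jaccard overlap of lowercase word sets, as a percentage.
function similarityPercent(oldText, newText) {
  const words = (text) => new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  const a = words(oldText);
  const b = words(newText);
  if (a.size === 0 && b.size === 0) return 100; // two empty pages count as identical
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  const union = a.size + b.size - shared;
  return Math.round((shared / union) * 100);
}
```

A word-set measure ignores word order and repetition, so it flags missing content well but will not catch reordered sections. A visual diff covers that case.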
Migration Verification
Access: http://localhost:3100/migration-verification.html
Features:
- Bulk URL checking: Verify multiple URLs at once
- HTTP status codes: Check for 200, 404, 500, etc.
- Response time tracking: Monitor page load times
- Redirect detection: Identify redirects and chains
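A check that records status codes, response time and redirect chains can be sketched with Node 18+'s global fetch. The request uses redirect: "manual" so that each 3xx hop is visible instead of being followed silently. The names classifyStatus and checkUrl, the HEAD method and the five-hop limit are assumptions. Some servers answer HEAD with 405, and those URLs would need a GET.

```javascript
// Hypothetical URL check sketch (Node 18+); not the verification page's implementation.
function classifyStatus(status) {
  if (status >= 200 && status < 300) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  if (status >= 400 && status < 500) return "client-error";
  if (status >= 500) return "server-error";
  return "unknown";
}

async function checkUrl(url, maxHops = 5) {
  const chain = [];
  const started = performance.now();
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    // Follow redirects by hand so every hop is recorded
    const res = await fetch(current, { method: "HEAD", redirect: "manual" });
    chain.push({ url: current, status: res.status });
    const location = res.headers.get("location");
    if (classifyStatus(res.status) !== "redirect" || !location) break;
    current = new URL(location, current).toString(); // resolve relative Location headers
  }
  const last = chain[chain.length - 1];
  return {
    url,
    finalStatus: last.status,
    result: classifyStatus(last.status),
    redirects: chain.length - 1, // number of hops followed
    chain,
    elapsedMs: Math.round(performance.now() - started), // time for the whole chain
  };
}
```

For bulk checking, Promise.all(urls.map((u) => checkUrl(u))) works for short lists. Longer lists should be processed in batches to avoid flooding the servers.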
Typical Workflows
1. Processing New URLs
- Place raw URLs in the URL-RAW/ folder
- Run the cleaning commands to remove duplicates and filter by path
- Format the CSV files to match the standard format
- Merge files into the URL/ folder if needed
- Run convert-urls-to-nhghealth.js
- Check the output in the Migrated-URL/ folder
- Review the results in the Migration Status dashboard
2. Verifying Migration
- Start the dashboard: node serve-dashboard.js
- Open the Migration Status page
- Select site and filter by status
- Click "Check" to verify URLs
- Review results and update status
- Export to Excel for reporting
3. Content Comparison
- Ensure old and new sites are accessible
- Run node compare-pages.js
- Open the Migration Report page
- Review side-by-side comparisons
- Check similarity scores
- Identify missing content or issues
Data Sources & Files
Folder Structure
URL-RAW/
Purpose: Raw, unprocessed URL lists from SharePoint
Files:
- WHPartnersCustomAndWebPartPages.csv - WH Partners URLs
- WHLabsCustomAndWebPartPages.csv - WH Labs URLs
Format: Raw URLs, may contain duplicates, fragments (#), and /_layouts/ paths
URL/
Purpose: Cleaned and formatted URL lists ready for conversion
Files:
- WHCustomAndWebPartPages.csv - Merged WH URLs
- WHPartnersCustomAndWebPartPages.csv - Cleaned Partners URLs
- WHLabsCustomAndWebPartPages.csv - Cleaned Labs URLs
- NSCCustomAndWebPartPages.csv - NSC URLs
- Other site-specific CSV files
Format: Standard CSV with columns:
Library,Site,PageType,URL,PageName
Migrated-URL/
Purpose: Converted URLs with old → new mappings
Files: Mirror the URL/ folder structure
Format: CSV with additional columns for new URLs and migration status
migration-reports/
Purpose: Generated comparison reports
Files: JSON and HTML reports from page comparisons
outputs/
Purpose: Crawled page data and screenshots
Structure: Organized by site code and date
*-old/ and *-new/
Purpose: Archived old site data and new site data
Examples:
- wh-old/, wh-new/
- imh-old/, imh-new/
- ktph-old/, ktph-new/
- ttsh-old/, ttsh-new/
- nhgp-old/, nhgp-new/
Database Files
*.db files
Purpose: SQLite databases for migration tracking
Tables:
- sites - Site configurations
- pages - Page URLs and metadata
- migration_status - Status tracking
Note: Automatically created by the dashboard server
Configuration Files
sites.json
Purpose: Site configuration and URL mappings
Contains:
- Site codes (WH, IMH, KTPH, TTSH, NSC, etc.)
- Old and new domain URLs
- Site-specific settings
url-matching-rules.js
Purpose: Rules for matching old URLs to new URLs
Contains: Pattern matching logic and transformations
content-extraction-rules.js
Purpose: Rules for extracting content from pages
Contains: CSS selectors and extraction patterns
CSV File Format
Standard URL CSV Format
Library,Site,PageType,URL,PageName
CustomPages,WH,CustomPage,https://www.wh.com.sg/...,PageName
Columns:
- Library: SharePoint library name
- Site: Site code (WH, IMH, etc.)
- PageType: CustomPage, WebPartPage, etc.
- URL: Full URL of the page
- PageName: Display name of the page
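Page names can contain commas, so splitting rows on "," is not safe once a field is quoted. Below is a minimal dependency-free reader for this format. The function names parseCsvLine and parseUrlCsv are hypothetical, and the scripts may use a CSV library instead. The sketch handles double-quoted fields and "" escapes but not line breaks inside quoted fields.

```javascript
// Minimal reader for the standard URL CSV (Library,Site,PageType,URL,PageName).
function parseCsvLine(line) {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (ch === '"') inQuotes = false;                         // closing quote
      else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { fields.push(field); field = ""; }
    else field += ch;
  }
  fields.push(field);
  return fields;
}

// Turns the file text into one object per row, keyed by the header names.
function parseUrlCsv(text) {
  const [headerLine, ...rows] = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  const headers = parseCsvLine(headerLine);
  return rows.map((row) => {
    const values = parseCsvLine(row);
    return Object.fromEntries(headers.map((h, i) => [h, values[i] ?? ""]));
  });
}
```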
Migrated URL CSV Format
Additional columns:
- nhghealth_url: Converted new URL
- migration_status: Pending, Migrated, Verified, etc.
- notes: Migration notes and comments
Data Flow
URL Processing Pipeline
URL-RAW/*.csv
↓ (clean, filter, deduplicate)
URL/*.csv
↓ (convert-urls-to-nhghealth.js)
Migrated-URL/*.csv
↓ (serve-dashboard.js)
SQLite Database (*.db)
↓ (migration-status.html)
Web Dashboard
Comparison Pipeline
Old Site URLs + New Site URLs
↓ (crawl-from-csv.js)
outputs/
↓ (compare-pages.js)
migration-reports/
↓ (migration-report.html)
Comparison Dashboard