HOUSLY

Real-time Monitoring & Analytics Dashboard

đŸĨ Server Health Monitor

Real-time monitoring & diagnostics for the manual ingestion pipeline


🔍 Full Portal Check

Detailed

Deep diagnostics: network, parser, and freshness health for a single portal

Analyze data from the last N hours
â„šī¸ Diagnostic Layers
  • Network: Fetch health, empty HTML, duplicates, error streaks
  • Parser: Parse rate, field completeness, anomalies
  • Freshness: Processing lag, backlog, throughput
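
For intuition, the freshness layer boils down to three numbers: time since the last fetch, pages waiting to be parsed, and pages fetched per hour. The sketch below shows one way to compute them; the import path, the source relation, and the fetched_at / has_ad_manual field names are assumptions, not the real schema.

    # Sketch only: field names and import path are assumptions.
    from datetime import timedelta

    from django.db.models import Max
    from django.utils import timezone

    from extractly.models import NetworkMonitoredPage  # assumed import path


    def freshness_snapshot(portal: str, window_hours: int = 24) -> dict:
        """Rough lag / backlog / throughput figures for one portal."""
        since = timezone.now() - timedelta(hours=window_hours)
        pages = NetworkMonitoredPage.objects.filter(
            source__name=portal, fetched_at__gte=since
        )
        last_fetch = pages.aggregate(last=Max("fetched_at"))["last"]
        return {
            "lag": timezone.now() - last_fetch if last_fetch else None,
            "backlog": pages.filter(has_ad_manual=False).count(),  # fetched, no ad yet
            "throughput": pages.count() / window_hours,            # pages per hour
        }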

🔧 Debug Dump

Advanced

Export HTML, selectors, and error bundles for investigation

Default: 10 pages. With check-selectors ON: ~20-30 seconds per page.
Creates: extractly/debug_dumps/
â„šī¸ Bundle Contents & Filter Explanation

Each bundle contains:

  • page.html - Raw or sliced HTML content
  • selectors.json - Portal selector configuration
  • info.json - Metadata, errors, selector hit counts
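
Bundles can also be inspected programmatically after a dump. A minimal sketch that walks extractly/debug_dumps/ and reports selectors with zero hits is shown below; the "selector_hits" key inside info.json is an assumption, so check a real bundle for the actual key names.

    # Sketch: scan debug bundles and report selectors that matched nothing.
    import json
    from pathlib import Path

    DUMP_ROOT = Path("extractly/debug_dumps")

    for bundle in sorted(p for p in DUMP_ROOT.iterdir() if p.is_dir()):
        info_path = bundle / "info.json"
        if not info_path.exists():
            continue
        info = json.loads(info_path.read_text())
        hits = info.get("selector_hits", {})  # assumed key; inspect a real info.json
        dead = [name for name, count in hits.items() if count == 0]
        if dead:
            print(f"{bundle.name}: 0 hits for {', '.join(dead)}")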

Filter Options - Real Scenarios:

❌ Only unparsed pages (has_ad_manual = false)

When to use: Parser success rate is LOW (e.g., 1000 pages → 10 ads = 1% success)
What it shows: The 990 FAILED pages where the parser ran but created NO ad
Use case: "Why did 990 pages fail? Missing required fields? Broken selectors? Site structure changed?"

âš ī¸ Only pages with errors NetworkPageError records exist

When to use: Error rate is HIGH in logs (lots of 404s, timeouts, exceptions)
What it shows: ONLY pages with logged errors (network failures, parse crashes)
Use case: "What's causing errors? Same pages failing? One portal broken? Need to fix scraper logic?"

🐌 Check selector hits (SLOW): tests all selectors with BeautifulSoup

When to use: Parser creates ads but fields are EMPTY (lat:0, lon:0, price:null)
What it shows: Hit count for EVERY selector (gas:1, lat:0, lon:0, price:1...)
Use case: "Which selectors work vs broken? lat/lon selectors broken? Need to fix CSS selectors?"
âš ī¸ Speed: 20-30 seconds per page (10 pages = 5+ minutes)

💡 Pro Tip: Combine filters! Check "unparsed + check selectors" to see exactly which selectors fail on failed pages.

✅ Selector Linter

Validation

Validate selector configurations against model schema

â„šī¸ What gets validated
  • Unknown field paths (not in AdsManual model)
  • Invalid configuration properties
  • Reports issues per portal for quick fixes
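
A simplified version of the unknown-field-path check could look like the sketch below; the shape of the selector configuration (a dict keyed by field path) and the import path are assumptions.

    # Sketch: flag selector entries whose field path is not a field on AdsManual.
    from extractly.models import AdsManual, SourceManual  # assumed import path

    KNOWN_FIELDS = {f.name for f in AdsManual._meta.get_fields()}

    def lint_portal(source: SourceManual) -> list:
        issues = []
        for field_path in (source.selectors or {}):   # assumed dict: field_path -> selector
            root = field_path.split(".")[0]            # validate the top-level field only
            if root not in KNOWN_FIELDS:
                issues.append(f"{source.name}: unknown field path '{field_path}'")
        return issues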

📚 System Documentation

Health Check Thresholds

Network Layer
  • Empty HTML: <10% healthy, >20% warning
  • Duplicate HTML: >80% critical (captcha/block)
  • Error streak: <5 healthy, >10 warning
Parser Layer
  • Parse rate: >70% healthy, <50% critical
  • Error rate: <15% healthy, >30% critical
  • Critical fields: >75% healthy, <60% warning
Freshness Layer
  • Lag: <4h healthy, >4h critical
  • Backlog: <1000 healthy, >1000 warning
  • Throughput: >5 pages/hour expected
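
As a worked example, the parser-layer thresholds above translate into comparisons like the following; the status function itself is illustrative, the numbers are copied from the table.

    # Sketch: map parser-layer metrics to a status using the thresholds above.
    def parser_status(parse_rate: float, error_rate: float, critical_fields: float) -> str:
        if parse_rate < 0.50 or error_rate > 0.30:
            return "critical"
        if parse_rate > 0.70 and error_rate < 0.15 and critical_fields > 0.75:
            return "healthy"
        return "warning"

    print(parser_status(parse_rate=0.65, error_rate=0.10, critical_fields=0.80))  # -> warning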

CLI Commands

python manage.py manual_health health --all

Quick health snapshot for all portals

python manage.py manual_health run --portal otodom

Full diagnostics for a specific portal

python manage.py manual_health debug-dump --name otodom --check-selectors

Create debug bundles with selector validation

python manage.py manual_health lint --all

Validate selector configurations

Data Models

  • SourceManual: Portal configuration and selectors
  • NetworkMonitoredPage: Fetched pages with HTML content
  • NetworkPageError: Network and page-level errors
  • AdsManual: Parsed advertisement data
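
The descriptions above imply roughly the following relationships between the models; every field in this sketch is illustrative, not the real schema.

    # Sketch only: illustrative relations between the models listed above.
    from django.db import models

    class SourceManual(models.Model):
        name = models.CharField(max_length=100)        # e.g. "otodom"
        selectors = models.JSONField(default=dict)     # selector configuration

    class NetworkMonitoredPage(models.Model):
        source = models.ForeignKey(SourceManual, on_delete=models.CASCADE)
        url = models.URLField()
        html = models.TextField(blank=True)                  # fetched HTML content
        has_ad_manual = models.BooleanField(default=False)   # did parsing create an ad?

    class NetworkPageError(models.Model):
        page = models.ForeignKey(NetworkMonitoredPage, on_delete=models.CASCADE,
                                 related_name="errors")
        message = models.TextField()

    class AdsManual(models.Model):
        page = models.OneToOneField(NetworkMonitoredPage, on_delete=models.CASCADE)
        price = models.DecimalField(max_digits=12, decimal_places=2, null=True)
        lat = models.FloatField(null=True)
        lon = models.FloatField(null=True)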