Quick Health Check
Primary: Get instant health status across all portals or specific ones
What this does
- Command: python manage.py manual_health health
- Computes pages fetched, errors, parse rates, and field fill rates
- Error threshold determines healthy/unhealthy status
- Returns ranked list sorted by error rate
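The ranking step can be sketched in a few lines. This is a minimal illustration, not the command's actual implementation: `PortalStats`, `rank_portals`, the "worst first" sort order, and the 15% default threshold (borrowed from the error-rate line in the thresholds list) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PortalStats:
    """Hypothetical per-portal counters; the real command's fields may differ."""
    name: str
    pages_fetched: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Guard against portals with no fetched pages.
        return self.errors / self.pages_fetched if self.pages_fetched else 0.0


def rank_portals(stats, error_threshold=0.15):
    """Return (name, error_rate, status) tuples, highest error rate first."""
    ranked = sorted(stats, key=lambda s: s.error_rate, reverse=True)
    return [
        (s.name, s.error_rate,
         "healthy" if s.error_rate < error_threshold else "unhealthy")
        for s in ranked
    ]


report = rank_portals([
    PortalStats("otodom", pages_fetched=1000, errors=50),
    PortalStats("example_portal", pages_fetched=200, errors=80),  # made-up portal
])
for name, rate, status in report:
    print(f"{name}: {rate:.1%} errors, {status}")
```

Sorting worst-first puts the portals that need attention at the top of the list; the real command may order them differently.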
Full Portal Check
Detailed: Deep diagnostics covering network, parser, and freshness health for a single portal
Diagnostic Layers
- Network: Fetch health, empty HTML, duplicates, error streaks
- Parser: Parse rate, field completeness, anomalies
- Freshness: Processing lag, backlog, throughput
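As an illustration of the network layer, the sketch below derives an empty-HTML share, a duplicate share, and an error streak from a list of fetched pages. The dict keys (`html`, `error`), the hash-based duplicate test, and the definition of "streak" as the run of consecutive errors at the end of the list are assumptions for the example, not the tool's documented behaviour.

```python
import hashlib


def network_metrics(pages):
    """Summarise fetch health for a list of page dicts.

    Each dict is assumed to have "html" (str) and "error" (bool) keys;
    these names are illustrative, not the real model fields.
    """
    total = len(pages)
    if total == 0:
        return {"empty_share": 0.0, "duplicate_share": 0.0, "error_streak": 0}

    empty = sum(1 for p in pages if not p["html"].strip())

    # A non-empty page counts as a duplicate when its HTML hash was already seen.
    seen, duplicates = set(), 0
    for p in pages:
        if not p["html"].strip():
            continue
        digest = hashlib.sha256(p["html"].encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1
        seen.add(digest)

    # Error streak: consecutive errors at the end of the list
    # (pages are assumed to be ordered oldest to newest).
    streak = 0
    for p in reversed(pages):
        if not p["error"]:
            break
        streak += 1

    return {
        "empty_share": empty / total,
        "duplicate_share": duplicates / total,
        "error_streak": streak,
    }


pages = [
    {"html": "<html>captcha</html>", "error": False},
    {"html": "<html>captcha</html>", "error": False},
    {"html": "", "error": True},
    {"html": "", "error": True},
]
metrics = network_metrics(pages)
print(metrics)
```

A high duplicate share is a useful signal because a captcha or block page tends to return the same HTML for every URL.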
Debug Dump
Advanced: Export HTML, selectors, and error bundles for investigation
Bundle Contents & Filter Explanation
Each bundle contains:
- page.html - Raw or sliced HTML content
- selectors.json - Portal selector configuration
- info.json - Metadata, errors, selector hit counts
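To make the bundle layout concrete, here is a small sketch that writes the three files into a directory and reads `info.json` back. The `write_bundle` helper, the directory name, and the keys inside `info.json` are hypothetical; only the three file names come from the description above.

```python
import json
import tempfile
from pathlib import Path


def write_bundle(directory, html, selectors, info):
    """Write the three bundle files into `directory` and return their paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = {
        "page.html": directory / "page.html",
        "selectors.json": directory / "selectors.json",
        "info.json": directory / "info.json",
    }
    paths["page.html"].write_text(html, encoding="utf-8")
    paths["selectors.json"].write_text(json.dumps(selectors, indent=2), encoding="utf-8")
    paths["info.json"].write_text(json.dumps(info, indent=2), encoding="utf-8")
    return paths


with tempfile.TemporaryDirectory() as tmp:
    bundle = write_bundle(
        Path(tmp) / "bundle_0001",  # hypothetical directory name
        html="<html><body><span class='price'>100</span></body></html>",
        selectors={"price": "span.price", "lat": "span.lat"},
        info={"errors": [], "selector_hits": {"price": 1, "lat": 0}},
    )
    # Read everything back before the temporary directory is removed.
    loaded_info = json.loads(bundle["info.json"].read_text(encoding="utf-8"))
    bundle_files = sorted(p.name for p in bundle["info.json"].parent.iterdir())

print(bundle_files, loaded_info["selector_hits"])
```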
Filter Options - Real Scenarios:
has_ad_manual = false
When to use: Parser success rate is LOW (e.g., 1000 pages → 10 ads = 1% success)
What it shows: The 990 FAILED pages where parser ran but created NO ad
Use case: "Why did 990 pages fail? Missing required fields? Broken selectors? Site structure changed?"
NetworkPageError records exist
When to use: Error rate is HIGH in logs (lots of 404s, timeouts, exceptions)
What it shows: ONLY pages with logged errors (network failures, parse crashes)
Use case: "What's causing errors? Same pages failing? One portal broken? Need to fix scraper logic?"
Tests all selectors with BeautifulSoup
When to use: Parser creates ads but fields are EMPTY (lat:0, lon:0, price:null)
What it shows: Hit count for EVERY selector (gas:1, lat:0, lon:0, price:1...)
Use case: "Which selectors work vs broken? lat/lon selectors broken? Need to fix CSS selectors?"
Speed: 20-30 seconds per page (10 pages take roughly 3-5 minutes)
Pro Tip: Combine filters! Check "unparsed + check selectors" to see exactly which selectors fail on failed pages.
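The combination of "unparsed" and "check selectors" can be sketched as follows. This assumes the `beautifulsoup4` package is installed; the page records, the selector strings, and the `selector_hits` helper are made up for illustration, and only `has_ad_manual` and the idea of per-selector hit counts come from the descriptions above.

```python
from bs4 import BeautifulSoup


def selector_hits(html, selectors):
    """Count how many elements each CSS selector matches in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    return {field: len(soup.select(css)) for field, css in selectors.items()}


# Hypothetical page records; `has_ad_manual` mirrors the filter named above.
pages = [
    {"url": "/ad/1", "has_ad_manual": True,
     "html": "<div><span class='price'>100</span><span class='lat'>52.2</span></div>"},
    {"url": "/ad/2", "has_ad_manual": False,
     "html": "<div><span class='price'>250</span></div>"},
]
selectors = {"price": "span.price", "lat": "span.lat"}  # illustrative selectors

# Keep only pages where the parser ran but produced no ad, then test selectors.
unparsed = [p for p in pages if not p["has_ad_manual"]]
hits = {p["url"]: selector_hits(p["html"], selectors) for p in unparsed}
print(hits)
```

A selector with zero hits on every failed page is the first candidate to inspect when the site structure may have changed.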
Selector Linter
Validation: Validate selector configurations against model schema
What gets validated
- Unknown field paths (not in AdsManual model)
- Invalid configuration properties
- Reports issues per portal for quick fixes
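The checks listed above might look roughly like this. It is a sketch under stated assumptions: `KNOWN_FIELDS` stands in for the AdsManual model's field names, `ALLOWED_PROPERTIES` is an invented set of configuration keys, and the shape of the selector config (field name mapped to a dict of properties) is a guess.

```python
KNOWN_FIELDS = {"price", "lat", "lon", "gas"}           # assumed subset of AdsManual fields
ALLOWED_PROPERTIES = {"selector", "attribute", "type"}  # assumed property names


def lint_selectors(portal, config):
    """Return a list of human-readable issues for one portal's selector config."""
    issues = []
    for field, props in config.items():
        if field not in KNOWN_FIELDS:
            issues.append(f"{portal}: unknown field path '{field}'")
        for prop in props:
            if prop not in ALLOWED_PROPERTIES:
                issues.append(f"{portal}: invalid property '{prop}' on '{field}'")
    return issues


config = {
    "price": {"selector": "span.price"},
    "latitude": {"selector": "span.lat", "regexp": r"\d+"},  # both entries are wrong on purpose
}
issues = lint_selectors("otodom", config)
for issue in issues:
    print(issue)
```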
System Documentation
Health Check Thresholds
- Empty HTML: <10% healthy, >20% warning
- Duplicate HTML: >80% critical (captcha/block)
- Error streak: <10 healthy, >5 warning
- Parse rate: >70% healthy, <50% critical
- Error rate: <15% healthy, >30% critical
- Critical fields: >75% healthy, <60% warning
- Lag: <4h healthy, >4h critical
- Backlog: <1000 healthy, >1000 warning
- Throughput: >5 pages/hour expected
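One way to apply two-sided thresholds such as the parse-rate and error-rate lines above is a small classifier with three bands. The `classify` function is hypothetical, and treating values between the two limits as "warning" is an assumption; the list does not say how that middle band is labelled.

```python
def classify(value, healthy, critical, higher_is_better=True):
    """Map a metric to 'healthy', 'critical', or 'warning' (anything in between)."""
    if not higher_is_better:
        # Flip signs so the same comparisons work for "lower is better" metrics.
        value, healthy, critical = -value, -healthy, -critical
    if value > healthy:
        return "healthy"
    if value < critical:
        return "critical"
    return "warning"


# Parse rate: >70% healthy, <50% critical.
parse_status = classify(0.62, healthy=0.70, critical=0.50)
# Error rate: <15% healthy, >30% critical (lower is better).
error_status = classify(0.35, healthy=0.15, critical=0.30, higher_is_better=False)
print(parse_status, error_status)
```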
CLI Commands
python manage.py manual_health health --all
Quick health snapshot for all portals
python manage.py manual_health run --portal otodom
Full diagnostics for a specific portal
python manage.py manual_health debug-dump --name otodom --check-selectors
Create debug bundles with selector validation
python manage.py manual_health lint --all
Validate selector configurations
Data Models
- SourceManual: Portal configuration and selectors
- NetworkMonitoredPage: Fetched pages with HTML content
- NetworkPageError: Network and page-level errors
- AdsManual: Parsed advertisement data