_verifiability block. It exists for one reason: if you are going to pass this data into a billing decision, a compliance check, or an LLM context window, you need to know when it was extracted, how it was extracted, how confident the extractor is, and where a human can independently verify it. Plain JSON can’t answer those questions. _verifiability can.
The schema
| Field | Meaning |
|---|---|
| source_timestamp | UTC ISO-8601 timestamp of when the extraction was run against the upstream source. On cache hits, this is the original extraction time, not the time you received the response. |
| extraction_confidence | Float between 0.0 and 1.0. Deterministic ratio of required fields successfully extracted. Never hard-coded. See “How confidence is computed” below. |
| raw_data_evidence_url | The URL a human can visit to manually verify this data on the upstream carrier or agency site. For API-mirror extractions, it is the JSON endpoint. For DOM parsing, it is the public-facing HTML page. |
| extraction_method | One of api_mirror, playwright_dom, structured_parse, scraper_api, scraper_api_fallback. Tells you how the data was obtained on this specific request. |
| data_freshness_ttl_seconds | How long this response is considered fresh in the Redis cache. LTL fuel: 604800 (7 days). ABC license: 86400 (24 hours). |
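
To make the schema concrete, here is a minimal Python sketch that parses a _verifiability block into a typed object and rejects values outside the documented ranges. The names `Verifiability` and `parse_verifiability`, and every value in `sample` (including the example.com URL), are hypothetical and are not part of NexusFeed; only the five field names and the allowed extraction_method values come from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# The five extraction_method values listed in the schema table.
ALLOWED_METHODS = {
    "api_mirror",
    "playwright_dom",
    "structured_parse",
    "scraper_api",
    "scraper_api_fallback",
}


@dataclass(frozen=True)
class Verifiability:
    source_timestamp: datetime
    extraction_confidence: float
    raw_data_evidence_url: str
    extraction_method: str
    data_freshness_ttl_seconds: int


def parse_verifiability(block: dict) -> Verifiability:
    """Validate a _verifiability dict and convert it to a typed object."""
    # Accept a trailing "Z" as UTC; older Python versions do not parse it directly.
    ts = datetime.fromisoformat(block["source_timestamp"].replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Assumption: a timestamp without an offset is UTC, as the schema states.
        ts = ts.replace(tzinfo=timezone.utc)

    confidence = float(block["extraction_confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"extraction_confidence out of range: {confidence}")

    method = block["extraction_method"]
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unknown extraction_method: {method}")

    return Verifiability(
        source_timestamp=ts,
        extraction_confidence=confidence,
        raw_data_evidence_url=block["raw_data_evidence_url"],
        extraction_method=method,
        data_freshness_ttl_seconds=int(block["data_freshness_ttl_seconds"]),
    )


# Hypothetical sample values, for illustration only.
sample = {
    "source_timestamp": "2024-05-01T12:00:00Z",
    "extraction_confidence": 1.0,
    "raw_data_evidence_url": "https://example.com/fuel-surcharge",
    "extraction_method": "api_mirror",
    "data_freshness_ttl_seconds": 604800,
}
parsed = parse_verifiability(sample)
```

Parsing the block once at the edge of your pipeline means later checks can rely on a timezone-aware timestamp and a known method value.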
How confidence is computed
effective_date, fuel_surcharge_pct, and doe_diesel_price_usd — three of three extracted cleanly yields 1.0. Its HTML fallback path only requires effective_date and fuel_surcharge_pct (the HTML page doesn’t expose diesel price), so a successful fallback returns 1.0 as well.
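The ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions, not NexusFeed's actual implementation: the function name `extraction_confidence` is hypothetical, a field counts as "extracted" when it is present and not None, and the result is rounded to two decimals to match scores such as 0.67. The field values in the example are made up.

```python
from typing import Mapping, Sequence


def extraction_confidence(extracted: Mapping[str, object],
                          required_fields: Sequence[str]) -> float:
    """Return the share of required fields that were extracted."""
    if not required_fields:
        raise ValueError("required_fields must not be empty")
    # Assumption: a field counts as extracted when it is present and not None.
    found = sum(1 for name in required_fields if extracted.get(name) is not None)
    # Assumption: scores are rounded to two decimals (2 of 3 becomes 0.67).
    return round(found / len(required_fields), 2)


# Three-field path: all three present.
full = extraction_confidence(
    {"effective_date": "2024-05-01", "fuel_surcharge_pct": 31.5,
     "doe_diesel_price_usd": 3.9},
    ["effective_date", "fuel_surcharge_pct", "doe_diesel_price_usd"],
)

# Three-field path: diesel price missing.
partial = extraction_confidence(
    {"effective_date": "2024-05-01", "fuel_surcharge_pct": 31.5,
     "doe_diesel_price_usd": None},
    ["effective_date", "fuel_surcharge_pct", "doe_diesel_price_usd"],
)

# Two-field fallback path: both present.
fallback = extraction_confidence(
    {"effective_date": "2024-05-01", "fuel_surcharge_pct": 31.5},
    ["effective_date", "fuel_surcharge_pct"],
)

print(full, partial, fallback)  # 1.0 0.67 1.0
```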
Because the score is a ratio, you can interpret it as “percentage of required fields successfully extracted on this request.” A 0.67 on a three-field extractor means one field was missing. The missing field could be the one you need, so always log the response and gate appropriately.
Extraction method: what each value means
api_mirror
The carrier or agency publishes a hidden JSON endpoint used by their own web frontend. NexusFeed calls it directly. This is the most reliable method — no parsing fragility, no rendering overhead. Used by ODFL (primary path) and Averitt Express.
playwright_dom
The source renders a public HTML table or page. NexusFeed loads the page in a headless Chromium, waits for the relevant selector, and extracts the fields from the DOM. Used by Saia, Estes, R+L, TForce, XPO, SEFL, and the ODFL HTML fallback. Resources (images, stylesheets, fonts, media) are blocked to cut bandwidth.
structured_parse
The source returns an HTML table or a POST response body that can be parsed with BeautifulSoup or lxml without a browser. Used by California ABC, Florida DBPR, Texas TABC (post-CAPTCHA), and New York SLA’s lookup endpoint.
scraper_api
The upstream source is protected by Akamai, Cloudflare, or similar anti-bot infrastructure that rejects both direct and proxied traffic. NexusFeed routes through ScraperAPI’s render API (a managed browser farm with built-in anti-bot bypass) to obtain the HTML, then parses it. Used by FedEx Freight and Illinois ILCC’s Salesforce Experience Cloud portal.
scraper_api_fallback
The extractor tried a cheaper primary path first (direct httpx or passthrough proxy) and fell back to ScraperAPI after failure. Used by ABF Freight.
How to gate your pipeline
If you are using NexusFeed data in a decision that matters, implement at least one of these gates:
Confidence threshold
Reject any response where _verifiability.extraction_confidence < 0.90. Log the raw response for manual review.
Age check
On cache hits, compute now - source_timestamp. If older than your business requirement (e.g. 48 hours for real-time freight quoting), either accept the staleness explicitly or bypass cache with a fresh request.
Method check
For high-stakes use (billing, invoicing), log the extraction_method. A scraper_api or scraper_api_fallback on a carrier that normally returns api_mirror is a signal the primary source went down, so human review may be warranted.
What confidence 0.0 means
A confidence score of exactly 0.0 indicates the extractor fell back, the upstream source was unreachable, or required fields could not be extracted. NexusFeed converts this to ExtractionError in the router and returns HTTP 503 SOURCE_UNAVAILABLE. You will never receive a 200 response body with extraction_confidence: 0.0. By design, those become errors so your pipeline cannot silently consume empty data.
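The confidence, age, and method gates from “How to gate your pipeline” can be combined into one check that runs on every 200 response. The sketch below is one possible shape, not a NexusFeed client: `check_gates`, the constant names, and the `expected_method` parameter are hypothetical, and the 0.90 and 48-hour limits are the example values from this page that you should replace with your own requirements. A 503 SOURCE_UNAVAILABLE should be handled as an error before this function is ever called.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MIN_CONFIDENCE = 0.90                                  # example threshold from this page
MAX_AGE = timedelta(hours=48)                          # example business requirement
FALLBACK_METHODS = {"scraper_api", "scraper_api_fallback"}


def check_gates(verifiability: dict,
                expected_method: Optional[str] = None,
                now: Optional[datetime] = None) -> List[str]:
    """Return a list of gate failures; an empty list means the data passed."""
    now = now or datetime.now(timezone.utc)
    problems = []

    # Confidence threshold
    if verifiability["extraction_confidence"] < MIN_CONFIDENCE:
        problems.append("confidence below threshold")

    # Age check: source_timestamp is the original extraction time, even on cache hits.
    ts = datetime.fromisoformat(
        verifiability["source_timestamp"].replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    if now - ts > MAX_AGE:
        problems.append("data older than allowed age")

    # Method check: flag a ScraperAPI path on a source that normally uses api_mirror.
    method = verifiability["extraction_method"]
    if expected_method == "api_mirror" and method in FALLBACK_METHODS:
        problems.append("primary source may be down: got " + method)

    return problems
```

Returning the reasons rather than a bare boolean makes it easy to log why a response was held for manual review.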