The verifiability contract

Every NexusFeed response carries a _verifiability block. It exists for one reason: if you are going to pass this data into a billing decision, a compliance check, or an LLM context window, you need to know when it was extracted, how it was extracted, how confident the extractor is, and where a human can independently verify it. Plain JSON can’t answer those questions. _verifiability can.

The schema

{
  "_verifiability": {
    "source_timestamp": "2026-04-10T14:22:11Z",
    "extraction_confidence": 0.97,
    "raw_data_evidence_url": "https://www.abc.ca.gov/licensing/license-lookup/?RPTTYPE=15&DBANAME=TRADER%20JOE",
    "extraction_method": "structured_parse",
    "data_freshness_ttl_seconds": 86400
  }
}

| Field | Meaning |
| --- | --- |
| `source_timestamp` | UTC ISO-8601 timestamp of when the extraction was run against the upstream source. On cache hits, this is the original extraction time, not the time you received the response. |
| `extraction_confidence` | Float between 0.0 and 1.0. Deterministic ratio of required fields successfully extracted. Never hard-coded. See “How confidence is computed” below. |
| `raw_data_evidence_url` | The URL a human can visit to manually verify this data on the upstream carrier or agency site. For API-mirror extractions, it is the JSON endpoint. For DOM parsing, it is the public-facing HTML page. |
| `extraction_method` | One of `api_mirror`, `playwright_dom`, `structured_parse`, `scraper_api`, `scraper_api_fallback`. Tells you how the data was obtained on this specific request. |
| `data_freshness_ttl_seconds` | How long this response is considered fresh in the Redis cache. LTL fuel: 604800 (7 days). ABC license: 86400 (24 hours). |
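
As a minimal sketch, a consumer could parse the block into a typed record before trusting any field. The `Verifiability` dataclass and the `parse_verifiability` helper are hypothetical names, not part of NexusFeed; only the field names and method values come from the schema above.

```python
from dataclasses import dataclass
from datetime import datetime

# Method values listed in the field table above.
KNOWN_METHODS = {
    "api_mirror", "playwright_dom", "structured_parse",
    "scraper_api", "scraper_api_fallback",
}

@dataclass(frozen=True)
class Verifiability:  # hypothetical consumer-side type
    source_timestamp: datetime
    extraction_confidence: float
    raw_data_evidence_url: str
    extraction_method: str
    data_freshness_ttl_seconds: int

def parse_verifiability(response: dict) -> Verifiability:
    """Validate and convert the _verifiability block of a decoded response."""
    block = response["_verifiability"]
    confidence = float(block["extraction_confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    method = block["extraction_method"]
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown extraction_method: {method}")
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    ts = datetime.fromisoformat(block["source_timestamp"].replace("Z", "+00:00"))
    return Verifiability(ts, confidence, block["raw_data_evidence_url"],
                         method, int(block["data_freshness_ttl_seconds"]))
```

Failing loudly on an unknown method or an out-of-range score keeps malformed metadata from reaching the gates described later on this page.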

How confidence is computed

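def compute_confidence(required_fields, found_fields, fallback_triggered=False):
    """Return the fraction of required fields that were extracted (0.0 to 1.0)."""
    # A triggered fallback always scores 0.0, regardless of what was found.
    if fallback_triggered:
        return 0.0
    # Count the required fields present in found_fields, then divide by the total.
    extracted = sum(1 for f in required_fields if f in found_fields)
    return round(extracted / len(required_fields), 2)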

Each extractor has a fixed list of required fields. ODFL’s primary JSON path requires effective_date, fuel_surcharge_pct, and doe_diesel_price_usd — three of three extracted cleanly yields 1.0. Its HTML fallback path only requires effective_date and fuel_surcharge_pct (the HTML page doesn’t expose diesel price), so a successful fallback returns 1.0 as well.

Because the score is a ratio, you can read it as the fraction of required fields successfully extracted on this request. A 0.67 on a three-field extractor means two of the three fields were found (2/3 rounds to 0.67), so one is missing. That missing field could be the one you need, so always log the full response and apply the gates described under “How to gate your pipeline”.
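
The arithmetic behind that 0.67 can be checked directly. The sketch below repeats the ratio in a standalone helper (`field_ratio` is a hypothetical name) and uses the ODFL primary-path field names; the sample payload values are invented for illustration.

```python
def field_ratio(required_fields, found_fields):
    """Share of required fields present, rounded to two places."""
    hits = sum(1 for f in required_fields if f in found_fields)
    return round(hits / len(required_fields), 2)

REQUIRED = ["effective_date", "fuel_surcharge_pct", "doe_diesel_price_usd"]

# Illustrative payload: the diesel price was not extracted.
found = {"effective_date": "2026-04-06", "fuel_surcharge_pct": 31.5}

score = field_ratio(REQUIRED, found)
missing = [f for f in REQUIRED if f not in found]
print(score, missing)  # 0.67 ['doe_diesel_price_usd']
```

The `_verifiability` block shown in the schema carries only the score, not a list of missing fields, so a consumer would still need to check explicitly for the fields it depends on.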

Extraction method — what each value means

api_mirror

The carrier or agency publishes a hidden JSON endpoint used by their own web frontend. NexusFeed calls it directly. This is the most reliable method — no parsing fragility, no rendering overhead. Used by ODFL (primary path) and Averitt Express.

playwright_dom

The source renders a public HTML table or page. NexusFeed loads the page in headless Chromium, waits for the relevant selector, and extracts the fields from the DOM. Used by Saia, Estes, R+L, TForce, XPO, SEFL, and the ODFL HTML fallback. Resources (images, stylesheets, fonts, media) are blocked to cut bandwidth.

structured_parse

The source returns an HTML table or a POST response body that can be parsed with BeautifulSoup or lxml without a browser. Used by California ABC, Florida DBPR, Texas TABC (post-CAPTCHA), and New York SLA’s lookup endpoint.

scraper_api

The upstream source is protected by Akamai, Cloudflare, or similar anti-bot infrastructure that rejects both direct and proxied traffic. NexusFeed routes through ScraperAPI’s render API (a managed browser farm with built-in anti-bot bypass) to obtain the HTML, then parses it. Used by FedEx Freight and Illinois ILCC’s Salesforce Experience Cloud portal.

scraper_api_fallback

The extractor tried a cheaper primary path first (direct httpx or passthrough proxy) and fell back to ScraperAPI after failure. Used by ABF Freight.

How to gate your pipeline

If you are using NexusFeed data in a decision that matters, implement at least one of these gates:

Confidence threshold

Reject any response where _verifiability.extraction_confidence < 0.90. Log the raw response for manual review.
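
A confidence gate along these lines might look like the following sketch. The function name, logger name, and the choice to treat a missing score as a failure are assumptions made for illustration; only the 0.90 threshold comes from the text.

```python
import logging

logger = logging.getLogger("nexusfeed.gate")  # hypothetical logger name
MIN_CONFIDENCE = 0.90  # threshold suggested above; tune to your own risk tolerance

def passes_confidence_gate(response: dict, minimum: float = MIN_CONFIDENCE) -> bool:
    """Return True if the response meets the threshold; log it otherwise."""
    block = response.get("_verifiability", {})
    confidence = block.get("extraction_confidence")
    # Treat a missing or non-numeric score as a failure rather than guessing.
    if not isinstance(confidence, (int, float)) or confidence < minimum:
        logger.warning("rejected for manual review: %r", response)
        return False
    return True
```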

Age check

On cache hits, compute now - source_timestamp. If older than your business requirement (e.g. 48 hours for real-time freight quoting), either accept the staleness explicitly or bypass cache with a fresh request.
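
The age computation is plain datetime arithmetic. In this sketch `extraction_age` is a hypothetical helper, the 48-hour limit is the example figure from the text, and the `now` value is fixed only so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def extraction_age(source_timestamp: str, now: Optional[datetime] = None) -> timedelta:
    """Time elapsed since the extraction ran against the upstream source."""
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    extracted_at = datetime.fromisoformat(source_timestamp.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - extracted_at

MAX_AGE = timedelta(hours=48)  # example business requirement from the text

age = extraction_age(
    "2026-04-10T14:22:11Z",
    now=datetime(2026, 4, 12, 15, 0, 0, tzinfo=timezone.utc),
)
if age > MAX_AGE:
    print(f"stale by {age - MAX_AGE}; accept explicitly or request fresh data")
```

Comparing against `source_timestamp` rather than the time the response arrived matters because, on cache hits, the two can differ by up to the TTL.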

Method check

For high-stakes use (billing, invoicing), log the extraction_method. A scraper_api or scraper_api_fallback on a carrier that normally returns api_mirror is a signal the primary source went down — human review may be warranted.
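
One way to sketch the method check is a small consumer-side table of the method each source normally returns. The table, the `"odfl"` key, and the function name are hypothetical; the only fact taken from this page is that ODFL's primary path is `api_mirror`.

```python
import logging

logger = logging.getLogger("nexusfeed.method")  # hypothetical logger name

# Hypothetical consumer-maintained table of the usual method per source.
EXPECTED_METHOD = {"odfl": "api_mirror"}

def method_needs_review(source: str, extraction_method: str) -> bool:
    """Log the method and flag it when it differs from the usual one."""
    logger.info("source=%s extraction_method=%s", source, extraction_method)
    expected = EXPECTED_METHOD.get(source)
    # Sources without a recorded expectation are logged but never flagged.
    return expected is not None and extraction_method != expected
```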

Evidence URL in audit trail

Store raw_data_evidence_url alongside the extracted value in your own database. When a dispute arises, you can produce both the value and the source URL that was scraped.
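
A minimal sketch of such an audit record, using SQLite from the Python standard library. The table layout, the `record_value` helper, and the `license_status` sample field are all assumptions for illustration; adapt them to your own schema and database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory only for this sketch
conn.execute(
    """CREATE TABLE audit_trail (
           field_name TEXT NOT NULL,
           field_value TEXT NOT NULL,
           raw_data_evidence_url TEXT NOT NULL,
           source_timestamp TEXT NOT NULL
       )"""
)

def record_value(conn, field_name, field_value, verifiability):
    """Persist one extracted value together with its evidence URL."""
    conn.execute(
        "INSERT INTO audit_trail VALUES (?, ?, ?, ?)",
        (field_name, str(field_value),
         verifiability["raw_data_evidence_url"],
         verifiability["source_timestamp"]),
    )
    conn.commit()

# Hypothetical extracted field, paired with the block from the schema example.
record_value(conn, "license_status", "ACTIVE", {
    "raw_data_evidence_url": "https://www.abc.ca.gov/licensing/license-lookup/?RPTTYPE=15&DBANAME=TRADER%20JOE",
    "source_timestamp": "2026-04-10T14:22:11Z",
})
```

Storing `source_timestamp` next to the URL lets you state when the value was extracted as well as where it came from.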

What confidence 0.0 means

A confidence score of exactly 0.0 indicates the extractor fell back, the upstream source was unreachable, or required fields could not be extracted. NexusFeed converts this to ExtractionError in the router and returns HTTP 503 SOURCE_UNAVAILABLE. You will never receive a 200 response body with extraction_confidence: 0.0 — by design, those become errors so your pipeline cannot silently consume empty data.
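
On the consumer side, this contract suggests treating a 503 as "no data" rather than as an empty result. The sketch below assumes you already have the status code and decoded body from your HTTP client; `SourceUnavailable` and `unwrap_response` are hypothetical names, and nothing is assumed about the shape of the error body.

```python
class SourceUnavailable(Exception):
    """Raised by this sketch when the upstream source could not be read."""

def unwrap_response(status_code: int, body: dict) -> dict:
    """Return the body of a usable response, or raise."""
    if status_code == 503:
        raise SourceUnavailable("upstream source unavailable; retry later or escalate")
    if status_code != 200:
        raise RuntimeError(f"unexpected status {status_code}")
    # Defensive only: per the text, a 200 never carries confidence 0.0.
    if body.get("_verifiability", {}).get("extraction_confidence") == 0.0:
        raise SourceUnavailable("200 response reported confidence 0.0")
    return body
```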
