Three research modules. One typed pipeline.
The DOCBOT Framework is decomposed into three cooperating research modules, each governed by an explicit type contract and each independently evaluable through the unit-evaluation harness described in §6. The decomposition is intentionally symmetric: every module exposes an interface boundary, a computational kernel, a provenance fabric and an emission boundary, so the same orchestration, observability and ablation infrastructure can be reused without modification across all three modules. Together they form the canonical configuration of the framework; in isolation, each is a falsifiable scientific contribution in its own right.
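The four-part symmetry described above can be sketched as a shared structural contract. This is only an illustration under stated assumptions: `ResearchModule`, its four method names, `run_module` and the toy `UppercaseModule` are hypothetical and are not taken from the framework's actual API.

```python
from typing import Any, Protocol


class ResearchModule(Protocol):
    """Hypothetical four-part contract; not the framework's actual API."""

    def admit(self, raw: Any) -> Any:
        """Interface boundary: accept and normalise raw input."""
        ...

    def compute(self, admitted: Any) -> Any:
        """Computational kernel: the module's core transformation."""
        ...

    def trace(self, result: Any) -> list[str]:
        """Provenance fabric: describe how the result was produced."""
        ...

    def emit(self, result: Any, provenance: list[str]) -> dict[str, Any]:
        """Emission boundary: package the result for the next stage."""
        ...


def run_module(module: ResearchModule, raw: Any) -> dict[str, Any]:
    """Drive any conforming module through the same four steps."""
    admitted = module.admit(raw)
    result = module.compute(admitted)
    return module.emit(result, module.trace(result))


class UppercaseModule:
    """Toy module used only to exercise the shared driver."""

    def admit(self, raw: Any) -> str:
        return str(raw).strip()

    def compute(self, admitted: str) -> str:
        return admitted.upper()

    def trace(self, result: str) -> list[str]:
        return [f"uppercased {len(result)} chars"]

    def emit(self, result: str, provenance: list[str]) -> dict[str, Any]:
        return {"value": result, "provenance": provenance}


emitted = run_module(UppercaseModule(), "  abc ")
```

Because `run_module` depends only on the four method names, any module with the same shape could be driven, observed or ablated by the same code path, which is the reuse the symmetric decomposition is meant to permit.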
DOCBOT — Document Intelligence
DOCBOT is the document intelligence layer of the framework. It admits heterogeneous source documents — scanned PDFs, semi-structured forms, machine-readable XML and structurally noisy email bodies — through a single typed acquisition interface and emits schema-bound structured records. Extraction is decomposed into three composable subsystems: a layout analyser that recovers the physical structure of the page, a typed-entity extractor that projects the layout onto a domain-specific schema, and a confidence calibrator that attaches a per-field reliability score derived from agreement across redundant decoders. The module is deliberately agnostic to downstream interpretation: its sole responsibility is to convert unstructured evidence into a typed envelope amenable to formal reasoning further down the pipeline.
Formalises document intelligence as a typed, side-effect-free transformation from unstructured corpora to schema-bound representations admitting compositional reasoning, ablation and audit-grade replay.
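One simple way to derive a per-field reliability score from agreement across redundant decoders is a majority vote, sketched below. The function name `calibrate_by_agreement`, the string-valued fields and the sample invoice data are assumptions made for illustration; the source does not specify how the calibrator is implemented, and a raw agreement fraction is not by itself a calibrated probability.

```python
from collections import Counter
from typing import Mapping, Sequence


def calibrate_by_agreement(
    decodings: Sequence[Mapping[str, str]],
) -> dict[str, tuple[str, float]]:
    """Per field, return the majority value and the share of decoders backing it."""
    if not decodings:
        raise ValueError("at least one decoder output is required")
    fields = set().union(*(d.keys() for d in decodings))
    scored: dict[str, tuple[str, float]] = {}
    for field in sorted(fields):
        votes = Counter(d[field] for d in decodings if field in d)
        value, count = votes.most_common(1)[0]
        # Dividing by the total number of decoders means that a decoder
        # which omits the field counts as a disagreement.
        scored[field] = (value, count / len(decodings))
    return scored


# Three hypothetical decoders reading the same invoice.
scores = calibrate_by_agreement([
    {"invoice_no": "A-17", "total": "120.00"},
    {"invoice_no": "A-17", "total": "120.00"},
    {"invoice_no": "A-17", "total": "128.00"},
])
```

Here all three decoders agree on `invoice_no`, so it scores 1.0, while `total` is backed by two of three decoders and scores about 0.67.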
# DOCBOT — Document Intelligence Kernel
from typing import Mapping

from docbot.types import (
    SourceRef, DocumentEnvelope, ExtractedRecord,
    LayoutTree, Typed, ProvenanceTree,
)
# compose, extract_metadata and the stage functions used below are
# assumed to be provided elsewhere in the docbot package.

def acquire(source: SourceRef) -> DocumentEnvelope:
    """Typed, source-agnostic acquisition boundary."""
    raw = source.fetch()
    meta = extract_metadata(raw)
    return DocumentEnvelope(
        payload    = raw,
        mime       = meta.mime,
        provenance = meta.provenance,
    )

# Composable, pure transformations, applied in the order listed
pipeline = compose(
    parse_pdf,
    segment_layout,        # → LayoutTree
    project_to_schema,     # → Mapping[str, Typed]
    calibrate_confidence,  # redundant-decoder agreement
)

envelope = acquire(source)  # source: a SourceRef supplied by the caller
record: ExtractedRecord = pipeline(envelope)
# record.fields     :: Mapping[str, Typed]
# record.confidence :: Mapping[str, float ∈ [0,1]]
# record.layout     :: LayoutTree
# record.provenance :: ProvenanceTree
SYSTEMBOT — Cross-Source Validation
SYSTEMBOT introduces validation as a first-class pipeline stage rather than a post-hoc quality check. Given an extracted record R and an indexed family of independent evidence sources E = {S₁, …, Sₙ}, it computes an agreement functional a(R, E) weighted by an empirically calibrated reliability prior over the sources and returns a confidence-weighted verdict together with a closed provenance subgraph. The module is designed for fan-in/fan-out topologies in which redundant sources attenuate the variance of any individual extractor; disagreement is resolved through a principled arbitration procedure rather than through ad-hoc heuristics, eliminating the silent failure modes characteristic of first-source-wins strategies.
Provides formal consistency guarantees across heterogeneous evidence sources and converts validation from an implicit assumption of the pipeline into an explicit, evaluable kernel.
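A minimal sketch of one possible agreement functional follows, assuming a(R, E) is the reliability-weighted share of sources whose value matches the record: a(R, E) = Σ wᵢ·[Sᵢ agrees with R] / Σ wᵢ. The source does not give the exact definition, so this form, the name `weighted_agreement_sketch`, its single-value signature and the sample weights are all assumptions.

```python
from typing import Optional, Sequence


def weighted_agreement_sketch(
    record_value: str,
    evidence: Sequence[Optional[str]],
    weights: Sequence[float],
) -> float:
    """Share of total reliability weight held by sources that match the record."""
    if len(evidence) != len(weights):
        raise ValueError("evidence and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # A source that returned nothing (None) never matches, so it
    # contributes weight to the denominator only.
    agreeing = sum(w for e, w in zip(evidence, weights) if e == record_value)
    return agreeing / total


# Two sources confirm the extracted value; the third disagrees.
score = weighted_agreement_sketch(
    "A-17",
    evidence=["A-17", "A-17", "A-71"],
    weights=[0.5, 0.25, 0.25],
)
consistent = score >= 0.78  # same cut-off value as THRESHOLD in the kernel
```

With these illustrative weights the score is 0.75, which falls just below the 0.78 cut-off, so the record would be flagged as inconsistent even though a majority of sources agree with it.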
# SYSTEMBOT — Cross-Source Validation Kernel
from systembot.types import (
    ExtractedRecord, Source, Verdict, Evidence,
)
from systembot.priors import RELIABILITY_PRIOR

THRESHOLD: float = 0.78  # empirically calibrated

def validate(
    record: ExtractedRecord,
    sources: list[Source],
) -> Verdict:
    """Agreement functional a(R, E) under heterogeneous reliability."""
    evidence: list[Evidence] = [
        s.lookup(record.key) for s in sources
    ]
    # Weighted agreement under reliability prior
    weights = [RELIABILITY_PRIOR[s.id] for s in sources]
    score = weighted_agreement(record, evidence, weights)
    return Verdict(
        consistent  = score >= THRESHOLD,
        confidence  = score,
        evidence    = evidence,
        arbitration = arbitrate_disagreement(record, evidence),
        provenance  = build_dag(record, evidence, weights),
    )

verdict = validate(record, sources=[s1, s2, s3])
# verdict.consistent :: bool
# verdict.confidence :: float ∈ [0,1]
# verdict.provenance :: DAG[Source, Transform]
RESTRICTIONBOT — Restriction Analysis
RESTRICTIONBOT encodes operational restrictions as a declarative constraint set C and evaluates it symbolically against the validated record R. The output is an auditable decision-support emission carrying the decision status, the set of violated constraints, and a machine-readable rationale paired with a natural-language justification suitable for downstream human review. The constraint system is constructively monotone in C: adding a restriction never converts a reject into an approve. Extending a constraint catalogue can therefore only tighten outcomes; a prior rejection is never overturned, although a prior approval may be downgraded once a new restriction applies. This property is required for stable longitudinal audit and regulatory compliance.
Enables transparent, restriction-aware decision support with full traceability and longitudinal stability under constraint catalogue evolution.
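The monotonicity property can be checked on a toy example. The sketch below assumes constraints are plain predicates over a record and uses the same three-way status rule as the evaluator; `decide`, `positive_amount`, `amount_under_cap` and the sample record are hypothetical names introduced for illustration.

```python
from typing import Callable, FrozenSet, Mapping

Record = Mapping[str, float]
Constraint = Callable[[Record], bool]  # True when the record satisfies the rule

# Ordering used to compare outcomes: a higher number is a stricter outcome.
SEVERITY = {"approve": 0, "review": 1, "reject": 2}


def decide(
    record: Record,
    consistent: bool,
    confidence: float,
    rules: FrozenSet[Constraint],
    review_floor: float = 0.6,
) -> str:
    """Approve only with no violations and a consistent verdict."""
    violated = any(not rule(record) for rule in rules)
    if not violated and consistent:
        return "approve"
    return "review" if confidence >= review_floor else "reject"


def positive_amount(record: Record) -> bool:
    return record["amount"] > 0


def amount_under_cap(record: Record) -> bool:
    return record["amount"] <= 1000


base: FrozenSet[Constraint] = frozenset({positive_amount})
extended = base | {amount_under_cap}  # catalogue grows by one rule

record = {"amount": 5000.0}
before = decide(record, True, 0.9, base)
after = decide(record, True, 0.9, extended)
```

In this example `before` is `"approve"` and `after` is `"review"`: adding a rule moved the outcome in the stricter direction only. A record that was already rejected stays rejected under the extended catalogue, because extra rules can add violations but cannot remove them or raise the confidence.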
# RESTRICTIONBOT — Symbolic Restriction Evaluator
from restrictionbot.types import (
    ExtractedRecord, Verdict, Constraint,
    Decision, Justification, Status,
)
from restrictionbot.catalogue import RESTRICTION_SET

REVIEW_FLOOR: float = 0.6  # below this confidence a non-approved record is rejected

def evaluate(
    record: ExtractedRecord,
    verdict: Verdict,
    rules: frozenset[Constraint] = RESTRICTION_SET,
) -> Decision:
    """Monotone symbolic evaluation of C against (R, verdict)."""
    violations: set[Constraint] = {
        c for c in rules if not c.holds(record, verdict)
    }
    # Approve only with no violations and a consistent verdict;
    # otherwise route to review or reject according to confidence.
    status: Status = (
        Status.APPROVE if not violations and verdict.consistent
        else Status.REVIEW if verdict.confidence >= REVIEW_FLOOR
        else Status.REJECT
    )
    return Decision(
        status     = status,
        violations = frozenset(violations),
        rationale  = Justification.from_violations(violations),
        provenance = verdict.provenance.extend(rules),
    )

decision = evaluate(record, verdict, RESTRICTION_SET)
# decision.status     :: {approve, review, reject}
# decision.violations :: FrozenSet[Constraint]
# decision.rationale  :: Justification