This document makes the funding case for turning Causal Memory Layer (CML) from an early open-source causal audit artifact into a benchmark-backed, externally reproducible research prototype for causal-validity checking in agentic AI workflows.
We request $75k–$100k to expand CML into a reproducible benchmark and external-validation package for detecting causally invalid actions in agentic AI workflows.
AI agents increasingly perform actions, not only generate text. In high-stakes workflows, an action can succeed operationally while lacking valid authorization, approval, intent, or responsibility lineage.
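The gap between "succeeded operationally" and "had a valid lineage" can be illustrated with a minimal sketch. The `Event` schema, the `caused_by` field, and the `has_valid_lineage` helper are hypothetical names chosen for this illustration. They are not CML's actual data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One record in a structured action trace (hypothetical schema)."""
    event_id: str
    kind: str                          # e.g. "approval" or "action"
    succeeded: bool = True             # operational outcome only
    caused_by: Optional[str] = None    # id of the event that authorized this one


def has_valid_lineage(event_id: str, trace: dict[str, Event]) -> bool:
    """Follow caused_by links; valid only if the chain reaches an approval."""
    seen: set[str] = set()
    current = trace.get(event_id)
    while current is not None and current.event_id not in seen:
        if current.kind == "approval":
            return True
        seen.add(current.event_id)
        current = trace.get(current.caused_by) if current.caused_by else None
    return False  # chain ended, dangled, or looped without reaching an approval


trace = {
    "e1": Event("e1", "approval"),
    "e2": Event("e2", "action", caused_by="e1"),
    "e3": Event("e3", "action"),                       # no lineage recorded
    "e4": Event("e4", "action", caused_by="missing"),  # dangling reference
}

for eid in ("e2", "e3", "e4"):
    ev = trace[eid]
    print(eid, "succeeded:", ev.succeeded, "valid lineage:", has_valid_lineage(eid, trace))
```

All three actions report operational success, but only `e2` traces back to an approval. A real checker would need richer rules (scope, expiry, delegation), which this sketch deliberately omits.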
CML already includes a Python causal validation and audit engine, causal chain reconstruction utilities, CLI and API surfaces, deterministic benchmark fixtures, tracked benchmark results, a Docker demo walkthrough, a grant evidence package, explicit non-claims, an LTP/CML architecture bridge, and a reviewer checklist.
- Total cases: 6
- Matched cases: 6
- Mismatches: 0
- Expected passed / failed: 3 / 3
- Predicted passed / failed: 3 / 3
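A summary of this shape could be derived from per-case expected and predicted verdicts roughly as follows. The case ids, the `"pass"`/`"fail"` labels, and the `summarize` function are assumptions made for this sketch and do not reflect the real CML fixtures or runner.

```python
from collections import Counter

# Hypothetical case ids and verdicts; these are not the real CML fixtures.
expected = {
    "case_01": "pass", "case_02": "pass", "case_03": "pass",
    "case_04": "fail", "case_05": "fail", "case_06": "fail",
}
predicted = dict(expected)  # a run in which every prediction matches


def summarize(expected: dict[str, str], predicted: dict[str, str]) -> dict:
    """Compare verdicts case by case and count passes and failures on each side."""
    if expected.keys() != predicted.keys():
        raise ValueError("expected and predicted must cover the same cases")
    matched = sum(expected[c] == predicted[c] for c in expected)
    exp = Counter(expected.values())
    pred = Counter(predicted.values())
    return {
        "total_cases": len(expected),
        "matched_cases": matched,
        "mismatches": len(expected) - matched,
        "expected_passed_failed": (exp["pass"], exp["fail"]),
        "predicted_passed_failed": (pred["pass"], pred["fail"]),
    }


summary = summarize(expected, predicted)
print(summary)
```

Matching is counted per case rather than inferred from the totals, because equal pass/fail counts on both sides do not by themselves guarantee that each individual case was predicted correctly.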
- 30–50 deterministic benchmark fixtures
- 10–12 causal invalidity failure classes
- machine-readable expected findings
- Markdown + JSON benchmark reports
- 2–5 external validation notes
- Docker-based reproducibility path
- short technical report
- clear benchmark limitations and non-claims
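To make "machine-readable expected findings" and "Markdown + JSON benchmark reports" concrete, here is a minimal sketch of one possible shape. The field names (`fixture_id`, `failure_class`, `expected`, `findings`), the failure class label, and both helper functions are illustrative assumptions, not the format CML uses.

```python
import json

# Hypothetical fixture record; field names and labels are illustrative only.
fixture = {
    "fixture_id": "example_001",
    "failure_class": "missing_approval",
    "expected": {"verdict": "fail", "findings": ["action_without_approval"]},
}

# Hypothetical checker output for the same fixture.
observed = {"verdict": "fail", "findings": ["action_without_approval"]}


def compare(fixture: dict, observed: dict) -> dict:
    """Build one report row; a match requires the same verdict and findings."""
    exp = fixture["expected"]
    return {
        "fixture_id": fixture["fixture_id"],
        "failure_class": fixture["failure_class"],
        "expected_verdict": exp["verdict"],
        "observed_verdict": observed["verdict"],
        "matched": (
            exp["verdict"] == observed["verdict"]
            and sorted(exp["findings"]) == sorted(observed["findings"])
        ),
    }


def to_markdown(rows: list[dict]) -> str:
    """Render report rows as a Markdown table."""
    lines = [
        "| Fixture | Failure class | Expected | Observed | Matched |",
        "|---|---|---|---|---|",
    ]
    for r in rows:
        lines.append(
            f"| {r['fixture_id']} | {r['failure_class']} | {r['expected_verdict']} "
            f"| {r['observed_verdict']} | {'yes' if r['matched'] else 'no'} |"
        )
    return "\n".join(lines)


rows = [compare(fixture, observed)]
json_report = json.dumps(rows, indent=2, sort_keys=True)  # stable key order aids diffing
markdown_report = to_markdown(rows)
print(json_report)
print(markdown_report)
```

Generating both formats from the same rows keeps the human-readable report and the machine-readable one from drifting apart.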
A smaller grant can fund maintenance and documentation. A $75k–$100k grant funds a complete evidence package: benchmark taxonomy, fixture expansion, expected finding metadata, report generation, external validation, technical report, API/demo hardening, and integration boundary docs.
The value is not only the code but a reusable evaluation artifact that other researchers and engineers can run, inspect, critique, and extend.
- Move from 6 curated fixtures to 30–50 benchmark fixtures across 10–12 causal failure classes.
- Make benchmark outputs easy to reproduce, compare, and cite.
- Show that reviewers outside the project can reproduce CML results from public instructions.
- Make the demo path reviewer-friendly and reliable on common local environments.
- Publish a short technical report explaining the model, benchmark method, results, and limitations.
- Clarify how CML relates to LTP, T-Trace, CaPU, TTM DB, logs, observability, and runtime policy systems.
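One way an external reviewer might confirm a reproduction is to compare a digest of their results against a published one. The sketch below assumes results can be expressed as a JSON-serializable mapping. The `result_digest` helper and the sample data are hypothetical and are not part of CML.

```python
import hashlib
import json


def result_digest(results: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(results, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical results from the maintainers and from an external validator.
published = {"case_01": "pass", "case_02": "fail"}
reproduced = {"case_02": "fail", "case_01": "pass"}  # same content, different key order
divergent = {"case_01": "pass", "case_02": "pass"}   # one verdict differs

print("reproduced matches published:", result_digest(published) == result_digest(reproduced))
print("divergent matches published:", result_digest(published) == result_digest(divergent))
```

Canonicalizing before hashing means key order and whitespace do not affect the digest, so only a genuine difference in results produces a mismatch.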
Budget at the $75k level:

| Category | Amount | Purpose |
|---|---|---|
| Benchmark expansion | $25k | taxonomy, fixtures, expected findings, controls |
| Runner/reporting | $12k | JSON/Markdown reports, metadata, mismatch summaries |
| External validation | $12k | protocol, validator support, reproduction issue handling |
| Docker/API hardening | $8k | walkthrough reliability, payloads, demo scripts |
| Technical report | $10k | methodology, results, limitations, publication-quality draft |
| Maintenance/community | $8k | contributor review, docs polish, CI support |

Budget at the $100k level:

| Category | Amount | Purpose |
|---|---|---|
| Benchmark expansion | $32k | 50 fixtures, 12 failure classes, stronger controls |
| Runner/reporting | $15k | richer reports, version comparison, machine-readable outputs |
| External validation | $18k | 3–5 validators, validation notes, reproduction issue loop |
| Docker/API hardening | $10k | stronger local demo, API examples, troubleshooting |
| Technical report | $15k | full technical report + publishable artifact |
| Integration boundary docs | $5k | LTP/T-Trace/CaPU boundary clarity |
| Maintenance/community | $5k | contributor coordination and review |
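
As a quick arithmetic check, the line items in the two budget tables above can be summed against the two requested totals. The amounts are copied from the tables. The dictionary layout is just a convenience for this sketch.

```python
# Line items in US dollars, copied from the two budget tables.
budgets = {
    75_000: {
        "Benchmark expansion": 25_000,
        "Runner/reporting": 12_000,
        "External validation": 12_000,
        "Docker/API hardening": 8_000,
        "Technical report": 10_000,
        "Maintenance/community": 8_000,
    },
    100_000: {
        "Benchmark expansion": 32_000,
        "Runner/reporting": 15_000,
        "External validation": 18_000,
        "Docker/API hardening": 10_000,
        "Technical report": 15_000,
        "Integration boundary docs": 5_000,
        "Maintenance/community": 5_000,
    },
}

for total, items in budgets.items():
    line_sum = sum(items.values())
    status = "ok" if line_sum == total else "MISMATCH"
    print(f"requested ${total:,}: line items sum to ${line_sum:,} ({status})")
```

Both tiers reconcile: the first table sums to $75k and the second to $100k.
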
I can run CML locally, reproduce benchmark findings, inspect expected results,
read external validation notes, and understand exactly what the benchmark proves
and does not prove.
This proposal does not claim that CML will solve AI alignment, provide certified compliance, replace IAM/SIEM/EDR/observability stacks, prevent all unsafe actions, or prove that a deployed AI system is safe.
CML will provide a reproducible benchmark-backed research prototype for causal-validity checking in structured agentic action traces.
We request $75,000 to expand CML into a benchmark-backed, externally reproducible research artifact for causal-validity checking in agentic AI workflows, including 30–50 fixtures, report generation, external validation notes, and a technical report.
We request $100,000 to produce a more complete open-source causal-validity evaluation package for agentic AI workflows, including expanded benchmarks, machine-readable expected findings, external validation across multiple reviewers, Docker/API reproducibility, and a publication-ready technical report.