AI Lawmaking and AI Arbitration test results.

This section separates system validation from research papers. It contains benchmark reports, recorded runs, and the public evidence room where the path from input material to final legal artifact can be traced.


AI Lawmaking empirical benchmark

AI Lawmaking: Empirical and GPT Baseline Comparison

This report evaluates twenty final bills produced by the AI Lawmaking pipeline against real legislative regimes with known empirical results. The cases include historically effective practices and historically weak or adverse practices, from the Acid Rain Program and EITC to Prohibition, Three Strikes, demonetization, and No Child Left Behind.

A separate comparison uses outputs from GPT-5.5 Extra high for the same normative requests. The comparison points to a substantive difference in regulatory priority: GPT often selected familiar legislative templates, while AI Lawmaking more often designed around consequences, failure modes, side effects, actors, constraints, and causal mechanisms.

282 pages · 20 legislative cases · GPT-5.5 baseline

Open PDF

AI Lawmaking benchmark

AI Lawmaking Test Results

The report tests JudgeAI’s legislative mode on 22 regulatory tasks. The system receives a normative problem, selects a normative universe, assembles a legislative report, generates strict bill text, passes legislative counsel validation, and produces a methodological appendix.

The validation matters because the output is a full bill with an explainable method of choice. The benchmark records how the system builds regulatory architecture from consequences, constraints, actors, and admissible options for a legal regime.
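The stage order described above can be pictured as a trace that only accepts artifacts in sequence. This is a minimal sketch, not JudgeAI's implementation: the stage identifiers are paraphrased from the report description, and the `LegislativeTrace` class and its methods are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field

# Stage names paraphrase the report's description; they are assumptions,
# not identifiers taken from the JudgeAI platform.
STAGES = (
    "normative_problem",
    "normative_universe",
    "legislative_report",
    "bill_text",
    "counsel_validation",
    "methodological_appendix",
)


@dataclass
class LegislativeTrace:
    """Hypothetical record of the artifacts produced for one regulatory task."""

    task: str
    artifacts: dict = field(default_factory=dict)

    def is_complete(self) -> bool:
        return len(self.artifacts) == len(STAGES)

    def record(self, stage: str, artifact: str) -> None:
        """Store an artifact, rejecting stages that arrive out of order."""
        if self.is_complete():
            raise ValueError("all stages are already recorded")
        expected = STAGES[len(self.artifacts)]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.artifacts[stage] = artifact

    def missing(self) -> list:
        """Stages that have not produced an artifact yet."""
        return [s for s in STAGES if s not in self.artifacts]


trace = LegislativeTrace(task="placeholder task")
trace.record("normative_problem", "placeholder problem statement")
trace.record("normative_universe", "placeholder universe")
print(trace.missing())
```

Enforcing the order at record time means a run that skips a step, such as counsel validation, fails immediately instead of yielding a bill with a gap in its method.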

79 pages · 22 regulatory tasks · full bill text

Open PDF

AI Arbitration benchmark

JudgeAI as AI Judge Test Results

The report tests the arbitration mode on 92 cases from courts, arbitral tribunals, and international dispute-resolution bodies. The data room accepts party positions and documents; the pipeline then extracts claims and defenses and establishes issue filters, evidence criteria, a factual chronology, legal criteria, and a consequence model.

The final artifact is a draft final award structured as a legal decision. The validation shows how the system connects case materials, admissible legal worlds, party behavior, and consequence evaluation into one reproducible route of reasoning.

75 pages · 92 cases · draft final award

Open PDF

Public evidence room

Executable validation trace

PDF reports record the test results, while the public evidence room shows materials inside the platform interface: inputs, intermediate artifacts, final documents, and limitations observed during system runs.
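One way to picture a traceable path from a final document back to its inputs is a provenance map that is walked backwards. The sketch below is illustrative only: the `DERIVED_FROM` map, its item names, and the `trace_to_inputs` function are hypothetical and do not describe how the evidence room stores its materials.

```python
# Hypothetical provenance map: each item lists the items it was derived from.
# Items with an empty list are treated as original inputs.
DERIVED_FROM = {
    "final_document": ["intermediate_artifact"],
    "intermediate_artifact": ["input_a", "input_b"],
    "input_a": [],
    "input_b": [],
}


def trace_to_inputs(artifact: str, derived_from: dict) -> set:
    """Return the original inputs reachable from `artifact`.

    A KeyError signals a broken link: some item refers to a source
    that is not present in the map.
    """
    inputs, seen, stack = set(), set(), [artifact]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = derived_from[node]
        if not parents:
            inputs.add(node)
        stack.extend(parents)
    return inputs


print(sorted(trace_to_inputs("final_document", DERIVED_FROM)))
```

Under these assumptions, a final document counts as traceable when the walk finishes without a missing link and ends only at items that were supplied as inputs.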

Go to evidence room