Independent engineering review

OpenAI Review Suite

A Codex-native engineering review system that inspects real repositories, freezes its findings before reading another model's work, and preserves review, remediation, and verification evidence through GitHub.

What It Does

Independent review became a working system.

What began as a second-model check is now a reusable review suite with its own evidence rules, review modes, blind-first protocol, GitHub exchange, and verification lifecycle.

Codex-native engineering review

The suite reviews local repositories, working-tree diffs, pull requests, full codebases, and release readiness across the engineering domains that apply to each project.

Blind-first independence

OpenAI findings are frozen before earlier reviews or another model's conclusions are opened. Reconciliation happens afterward, with the source of every finding preserved.

Review exchange, not copy-paste

OpenAI and Claude use a GitHub-backed exchange with reviewer-owned records for findings, responses, remediation, and verification. The evidence no longer has to be carried manually between chats.

Workflow

The reviewers stay independent before they cooperate.

The system separates discovery, reconciliation, remediation, and verification so one model cannot quietly rewrite another model's work or claim findings it did not independently make.

Choose the review scope

Targeted mode finds the highest-value review lenses. Diff mode inspects a change. Full mode assesses the codebase. Release mode adds deployment, migration, rollback, and operational readiness.

Probe the repository and run the gates

The suite maps the actual project, identifies applicable domains, runs deterministic checks, and collects file, command, runtime, browser, database, and deployment evidence where available.

Freeze the independent findings

Codex completes and hashes its blind ledger before reading Claude's findings or any prior review conclusions. This protects independent discovery from anchoring and accidental agreement.
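
A minimal sketch of what freezing a ledger with a content hash could look like. The field names (`target_commit`, `findings`, `sha256`) and the use of SHA-256 over canonical JSON are assumptions for illustration, not the suite's actual format.

```python
import hashlib
import json


def freeze_ledger(findings: list[dict], target_commit: str) -> dict:
    """Return a ledger whose hash covers the findings and the target state."""
    body = {"target_commit": target_commit, "findings": findings}
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "sha256": digest}


def is_unmodified(ledger: dict) -> bool:
    """Recompute the hash and compare it with the recorded one."""
    recomputed = freeze_ledger(ledger["findings"], ledger["target_commit"])
    return recomputed["sha256"] == ledger["sha256"]


# Hypothetical finding and commit id, for illustration only.
ledger = freeze_ledger(
    [{"id": "F-001", "title": "Unchecked return value", "evidence": "src/io.py:42"}],
    target_commit="abc1234",
)
print(is_unmodified(ledger))  # True

# Any edit made after the freeze changes the recomputed hash.
ledger["findings"][0]["title"] = "Edited after reading another review"
print(is_unmodified(ledger))  # False
```

Hashing the target commit together with the findings means a ledger cannot be silently reattached to a different state of the repository.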

Reconcile without rewriting history

The two ledgers are compared only after both are frozen. Overlapping findings, prior-only findings, rejected claims, and decisions that held up all keep their original provenance.
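
Reconciliation of this kind can be sketched as a comparison that labels each finding by who reported it and keeps both original records untouched. Keying findings by an exact `file:line` string is a simplifying assumption; matching real findings would likely need something looser.

```python
def reconcile(codex: dict[str, dict], claude: dict[str, dict]) -> list[dict]:
    """Compare two frozen ledgers without merging or rewriting either one."""
    ledgers = {"codex": codex, "claude": claude}
    rows = []
    for key in sorted(codex.keys() | claude.keys()):
        # Keep each reviewer's original record so provenance survives reconciliation.
        originals = {name: ledger[key] for name, ledger in ledgers.items() if key in ledger}
        if len(originals) == 2:
            status = "overlap"
        else:
            status = f"{next(iter(originals))}-only"
        rows.append({"key": key, "status": status, "originals": originals})
    return rows


# Hypothetical ledgers, for illustration only.
codex_ledger = {
    "src/io.py:42": {"title": "Unchecked return value"},
    "src/auth.py:10": {"title": "Token compared with =="},
}
claude_ledger = {
    "src/io.py:42": {"title": "Return code ignored"},
    "docs/README.md:3": {"title": "Setup steps out of date"},
}

for row in reconcile(codex_ledger, claude_ledger):
    print(row["key"], row["status"])
```

The overlap row carries both titles side by side rather than a merged one, so neither reviewer's wording is replaced.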

Remediate, verify, and continue

Findings move through numbered, hash-linked response and verification records. Fixes are checked against primary evidence, and each real review becomes input for improving the suite.
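
One way to picture numbered, hash-linked records is a small append-only chain in which each record stores the hash of the one before it. The record fields (`seq`, `kind`, `body`, `prev_sha256`) are hypothetical; this is a sketch of the general technique, not the suite's schema.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """Hash every field except the stored hash itself."""
    payload = {k: v for k, v in record.items() if k != "sha256"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(chain: list[dict], kind: str, body: str) -> list[dict]:
    """Return a new chain with one more numbered record linked to the previous one."""
    prev = chain[-1]["sha256"] if chain else None
    record = {"seq": len(chain) + 1, "kind": kind, "body": body, "prev_sha256": prev}
    record["sha256"] = record_hash(record)
    return chain + [record]


def verify_chain(chain: list[dict]) -> bool:
    """Check numbering, links, and content hashes from the first record onward."""
    prev = None
    for expected_seq, record in enumerate(chain, start=1):
        if record["seq"] != expected_seq or record["prev_sha256"] != prev:
            return False
        if record_hash(record) != record["sha256"]:
            return False
        prev = record["sha256"]
    return True


chain: list[dict] = []
chain = append_record(chain, "finding", "Rollback script skips the cache table")
chain = append_record(chain, "response", "Accepted; fix planned")
chain = append_record(chain, "verification", "Re-ran rollback; cache table restored")
print(verify_chain(chain))  # True
```

Editing, removing, or reordering an earlier record breaks either its own hash or the link stored in the next record, so the history cannot be rewritten without the check failing.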

Controls

The controls protect independence and provenance.

The goal is not model agreement. It is a review record that shows what each reviewer found, what evidence supported it, how the project responded, and whether the fix held up.

Prior-review quarantine

Earlier conclusions are treated as untrusted review input until the blind pass is complete. Verifying a prior finding and discovering one independently are recorded as different claims.
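
The quarantine can be illustrated with a small guard object that refuses to open prior reviews before the freeze and tags later items as verifications. The class and method names are hypothetical, and the real suite may enforce this through files and scripts rather than an in-memory object.

```python
class BlindFirstRun:
    """Tracks whether the blind pass is complete before prior reviews are opened."""

    def __init__(self) -> None:
        self.frozen = False
        self.claims: list[dict] = []

    def record_discovery(self, title: str) -> None:
        if self.frozen:
            raise RuntimeError("blind ledger is frozen; later items are verifications")
        self.claims.append({"title": title, "claim": "independent_discovery"})

    def freeze(self) -> None:
        self.frozen = True

    def open_prior_review(self, name: str) -> str:
        if not self.frozen:
            raise PermissionError("prior reviews are quarantined until the blind pass is frozen")
        return name

    def record_verification(self, title: str) -> None:
        if not self.frozen:
            raise RuntimeError("verification starts only after the blind pass is frozen")
        self.claims.append({"title": title, "claim": "verification"})


run = BlindFirstRun()
run.record_discovery("Missing rollback step")
try:
    run.open_prior_review("claude-findings.md")  # hypothetical file name
except PermissionError as exc:
    print(exc)

run.freeze()
run.open_prior_review("claude-findings.md")
run.record_verification("Confirmed prior finding about stale cache")
```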

Reviewer-owned evidence

GitHub branch and path rules separate OpenAI findings, Claude submissions, reconciliation, and the later exchange. CI checks ownership, hashes, and target alignment.
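
A CI ownership check along these lines might compare each changed path with the prefix its reviewer owns. The directory layout below is invented for illustration; the actual branch and path rules are not shown on this page.

```python
import posixpath

# Hypothetical layout: each reviewer may only write under its own prefix.
OWNED_PREFIXES = {
    "openai": "reviews/openai/",
    "claude": "reviews/claude/",
}


def ownership_violations(reviewer: str, changed_paths: list[str]) -> list[str]:
    """Return the changed paths that fall outside the reviewer's own prefix."""
    prefix = OWNED_PREFIXES[reviewer]
    violations = []
    for path in changed_paths:
        # Normalize first so "../" segments cannot escape the owned directory.
        if not posixpath.normpath(path).startswith(prefix):
            violations.append(path)
    return violations


changed = [
    "reviews/openai/run-1/ledger.json",
    "reviews/claude/run-1/submission.json",
    "reviews/openai/../claude/run-1/submission.json",
]
print(ownership_violations("openai", changed))
```

Here the second and third paths are reported, since both resolve to the other reviewer's directory.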

Evidence before acceptance

A finding must point to primary evidence or be labeled as an inference. Andrew decides what is fixed, disputed, accepted as a limit, published, or held back.
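
The evidence rule can be expressed as a simple validation step. The `Finding` shape below is an assumption made for the sketch, not the suite's report contract.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    title: str
    evidence: list[str] = field(default_factory=list)  # e.g. "src/app.py:42" or a command transcript
    is_inference: bool = False


def acceptance_problems(finding: Finding) -> list[str]:
    """List the reasons a finding cannot yet be put forward for a decision."""
    problems = []
    if not finding.evidence and not finding.is_inference:
        problems.append("no primary evidence and not labeled as an inference")
    return problems


# Hypothetical findings, for illustration only.
print(acceptance_problems(Finding("Retry loop never backs off", evidence=["src/client.py:88"])))
print(acceptance_problems(Finding("Likely race under load", is_inference=True)))
print(acceptance_problems(Finding("Feels slow")))
```

Only the last finding is reported as a problem: it neither cites evidence nor admits to being an inference.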

Technology

The stack tells the product story.

These pages do not replace the private repositories. They summarize what a reviewer would see there: the architecture choices, evidence surfaces, tests, and boundaries behind the build.

Review modes

Targeted review for the highest-value applicable lenses
Diff review for working-tree changes or pull requests
Full review across the applicable codebase domains
Release review for deployment and operational readiness

Review domains

Correctness, reliability, tests, security, and accessibility
AI governance, data, auditability, and operational controls
Performance, maintainability, architecture, and documentation
Deployment, migration, rollback, recovery, and external state

Evidence model

File-and-line references and reproducible command output
Runtime, browser, database, and deployment observations
Frozen ledgers with content hashes and target-state metadata
Explicit unknowns instead of treating an unrun check as clean
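
The last point, treating an unrun check as unknown rather than clean, can be sketched with a three-state status. The check names and the summary wording are illustrative assumptions.

```python
from enum import Enum


class CheckStatus(Enum):
    PASSED = "passed"
    FAILED = "failed"
    NOT_RUN = "not_run"


def summarize(checks: dict[str, CheckStatus]) -> str:
    """Summarize gate results without ever promoting an unrun check to clean."""
    statuses = list(checks.values())
    if any(s is CheckStatus.FAILED for s in statuses):
        return "failed"
    if not statuses or any(s is CheckStatus.NOT_RUN for s in statuses):
        return "unknown"
    return "clean"


# Hypothetical gate results: the browser check could not run in this environment.
gates = {
    "unit_tests": CheckStatus.PASSED,
    "lint": CheckStatus.PASSED,
    "browser_smoke": CheckStatus.NOT_RUN,
}
print(summarize(gates))  # unknown
```

A two-state pass/fail field would force the unrun browser check into one of the two buckets; the third state keeps the gap visible in the record.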

Continuous improvement

Real projects expose gaps in prompts, probes, and review coverage
Rejected findings help distinguish defects from preferences
Re-review tests whether the proposed fix actually closes the issue
The suite evolves without pretending there is an external standard

Repository Signals

What the repository contains.

For now, the repositories stay private while the public site explains the work. These notes summarize the files, routes, docs, checks, and review artifacts without exposing private configuration, credentials, or organization-specific data.

The repository contains the Codex review skill, a conservative repository probe, review-domain catalog, report contract, prior-review isolation rules, and scripts that create and validate review runs.

JSON schemas and executable tests enforce manifest fields, frozen-ledger hashes, reviewer identity, target alignment, branch ownership, append-only exchange sequencing, and hash-linked records.
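
As a rough illustration of the manifest, reviewer-identity, and target-alignment checks, here is a plain-Python stand-in. It is not JSON Schema and does not reproduce the repository's schemas; the field names and reviewer identifiers are assumptions.

```python
# Hypothetical manifest fields and reviewer identifiers.
REQUIRED_FIELDS = {
    "run_id": str,
    "reviewer": str,
    "target_commit": str,
    "ledger_sha256": str,
}
KNOWN_REVIEWERS = {"openai", "claude"}


def manifest_errors(manifest: dict, checked_out_commit: str) -> list[str]:
    """Return every problem found, rather than stopping at the first one."""
    errors = []
    for name, kind in REQUIRED_FIELDS.items():
        if not isinstance(manifest.get(name), kind):
            errors.append(f"missing or invalid field: {name}")
    reviewer = manifest.get("reviewer")
    if isinstance(reviewer, str) and reviewer not in KNOWN_REVIEWERS:
        errors.append(f"unknown reviewer: {reviewer}")
    target = manifest.get("target_commit")
    if isinstance(target, str) and target != checked_out_commit:
        errors.append("target mismatch: manifest does not describe the checked-out commit")
    return errors


manifest = {
    "run_id": "run-1",
    "reviewer": "openai",
    "target_commit": "abc1234",
    "ledger_sha256": "0" * 64,
}
print(manifest_errors(manifest, checked_out_commit="abc1234"))  # []
print(manifest_errors(manifest, checked_out_commit="def5678"))
```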

The GitHub exchange separates OpenAI blind findings, Claude submissions, reconciliation, and later remediation records onto reviewer-owned branches and paths.

The repository contains committed self-review and Claude engineering-plugin review records. The suite has also produced full and remediation reviews of Horizon Scanning locally; those records are complete but not yet published to the review repository.

The `go` entry point resolves the active project and exchange state so Codex or Claude can take the correct next action without Andrew manually transferring findings between conversations.
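
How such an entry point might pick the next step can be sketched as a lookup on the most recent exchange record. The record kinds, the step order, and which reviewer owns each step are all assumptions here; the page does not describe how `go` is implemented.

```python
from typing import Optional

# Hypothetical record kinds and ownership; the real exchange may differ.
NEXT_STEP: dict[Optional[str], tuple[Optional[str], str]] = {
    None: ("codex", "run the blind review and freeze findings"),
    "findings": ("claude", "respond to the frozen findings"),
    "response": ("claude", "remediate the accepted findings"),
    "remediation": ("codex", "verify the fixes against primary evidence"),
    "verification": (None, "exchange complete"),
}


def next_action(record_kinds: list[str]) -> tuple[Optional[str], str]:
    """Decide who acts next from the kind of the most recent exchange record."""
    last = record_kinds[-1] if record_kinds else None
    return NEXT_STEP[last]


print(next_action([]))
print(next_action(["findings", "response", "remediation"]))
```

Because the decision depends only on the recorded exchange state, either model could resume from the repository alone, without the findings being pasted into a new conversation.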