Turn evals, A/B tests, and human review into a single ship / get approval / block outcome. Rules you control. Full audit trail. One binary — no npm, no dashboards.
So: do you ship or not? Today that call happens in Slack — inconsistent, hard to audit, easy to forget.
Eval improved, engagement dropped, review flagged risk. No single place to reconcile them.
Release calls are ad hoc. No version-controlled rules, no clear pass / block / get-approval outcome.
When compliance asks 'why did this ship?', teams dig through threads and hope someone remembers.
Geval consumes signals (evals, A/B, review) and your policy YAML. Every run returns PASS, REQUIRE_APPROVAL, or BLOCK — with a full audit trail.
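To make the signals-plus-policy idea concrete, here is a minimal Python sketch of how ordered rules could reduce a set of signals to one outcome plus an audit record. It is an illustration only, not Geval's implementation: the `Rule` class, the `decide` function, the `(component, metric)` signal keys, the first-match-wins behaviour, the `require_approval` action name, and the default of PASS when nothing matches are all assumptions. Only the rule fields (priority, name, metric, operator, threshold, action) and the three outcome names come from this page.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Comparison operators a rule may use. Only ">" appears in the policy on this
# page; the others are assumed for illustration.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# Assumed mapping from a rule's action to the outcome names used in the text.
OUTCOMES = {
    "block": "BLOCK",
    "require_approval": "REQUIRE_APPROVAL",
    "pass": "PASS",
}


@dataclass
class Rule:
    """Hypothetical in-memory form of one policy rule."""
    priority: int
    name: str
    metric: str
    op: str
    threshold: float
    action: str
    component: Optional[str] = None


def decide(rules, signals):
    """Return (outcome, audit). Assumes the first matching rule wins."""
    audit = []
    for rule in sorted(rules, key=lambda r: r.priority):
        # Signals are keyed by (component, metric); this keying is hypothetical.
        value = signals.get((rule.component, rule.metric))
        matched = value is not None and OPERATORS[rule.op](value, rule.threshold)
        audit.append({"rule": rule.name, "value": value, "matched": matched})
        if matched:
            return OUTCOMES[rule.action], audit
    # Assumed default when no rule matches.
    return "PASS", audit


rules = [
    Rule(2, "hallucination_guard", "hallucination_rate", ">", 0.05, "block",
         component="generator"),
    Rule(1, "business_block", "engagement_drop", ">", 0, "block"),
]
signals = {
    (None, "engagement_drop"): 0.0,
    ("generator", "hallucination_rate"): 0.08,
}

outcome, audit = decide(rules, signals)
print(outcome)
for entry in audit:
    print(entry)
```

With these made-up signals, the engagement rule is checked first and does not match, then the hallucination rule matches and blocks. Keeping every checked rule in the audit list is one simple way to answer "why did this ship?" later.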
Policy in YAML
Write rules in priority order: metric, threshold, action. Version it. Review it. Commit it.
policy:
  environment: prod
  rules:
    - priority: 1
      name: business_block
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: hallucination_guard
      when:
        component: generator
        metric: hallucination_rate
        operator: ">"
        threshold: 0.05
      then:
        action: block

Single binary for Linux, macOS, and Windows. No npm, no pip. Pick your OS and add to PATH.
# Linux (x86_64)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval
# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval
# Then run
geval check --signals signals.json --policy policy.yaml --env prod