One clear decision for every AI change.

Turn evals, A/B tests, and human review into a single ship / get approval / block outcome. Rules you control. Full audit trail. One binary — no npm, no dashboards.

Single binary · Signals + policy · GitHub Actions
The Problem

Evals say one thing. A/B says another. Review flags an edge case.

So: do you ship or not? Today that call happens in Slack — inconsistent, hard to audit, easy to forget.

Conflicting signals

Eval improved, engagement dropped, review flagged risk. No single place to reconcile them.

Decisions in Slack

Release calls are ad-hoc. No version-controlled rules, no clear pass / block / get-approval.

No audit trail

When compliance asks 'why did this ship?', teams dig through threads and hope someone remembers.

The Solution

One place to write the rules. One clear answer.

Geval consumes signals (evals, A/B, review) and your policy YAML. Every run returns PASS, REQUIRE_APPROVAL, or BLOCK — with a full audit trail.

01

Define

Policy in YAML

Write rules in priority order: metric, threshold, action. Version it. Review it. Commit it.

policy.yaml
policy:
  environment: prod
  rules:
    - priority: 1
      name: business_block
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: hallucination_guard
      when:
        component: generator
        metric: hallucination_rate
        operator: ">"
        threshold: 0.05
      then:
        action: block

Features

Decision layer, not eval runner.

Signals + Policy

Feed evals, A/B, and review as JSON. Define rules in YAML. Priority order, first match wins.

Signals are any evidence you use: eval metrics, A/B results, human review flags. Policy is version-controlled YAML with priority rules. Business over evals, safety over everything — you encode it. No scoring, no ML, just deterministic rules.
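The "priority order, first match wins" behaviour can be illustrated with a small Python sketch. This is not Geval's implementation: the signal shape (`metric`, `value`, optional `component`), the extra operators beyond `">"`, and the default of `pass` when nothing matches are all assumptions made for illustration. The rules mirror the policy.yaml example, written as already-parsed dicts.

```python
import operator

# Only ">" appears in the example policy; the other operators are assumed.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

# The two rules from the policy.yaml example, as parsed dicts.
RULES = [
    {"priority": 1, "name": "business_block",
     "when": {"metric": "engagement_drop", "operator": ">", "threshold": 0},
     "then": {"action": "block"}},
    {"priority": 2, "name": "hallucination_guard",
     "when": {"component": "generator", "metric": "hallucination_rate",
              "operator": ">", "threshold": 0.05},
     "then": {"action": "block"}},
]


def rule_matches(when, signal):
    """True if the signal has the rule's metric (and component) and trips the threshold."""
    if signal.get("metric") != when["metric"]:
        return False
    if "component" in when and signal.get("component") != when["component"]:
        return False
    return OPERATORS[when["operator"]](signal["value"], when["threshold"])


def decide(rules, signals):
    """Return (action, rule_name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if any(rule_matches(rule["when"], s) for s in signals):
            return rule["then"]["action"], rule["name"]
    return "pass", None  # assumed default when no rule matches
```

With a hypothetical signal list in which both rules would fire, `decide` reports only `business_block`, because priority 1 is evaluated first and evaluation stops at the first match.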

Single Binary

Download and run. No npm, no pip, no runtime. One binary for Linux, macOS, Windows.

Geval is a single static binary. Download from GitHub Releases, add to PATH, run in CI or locally. Your evals and scripts produce signals.json; Geval reads it and applies your policy. No dashboards, no APIs, no vendor lock-in.
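As a rough illustration of the "your scripts produce signals.json" step, the Python sketch below collects metric values from an eval run and an A/B test into one file. The schema is hypothetical: the top-level `signals` key and the `source`, `metric`, `value`, and `component` fields are assumptions, not Geval's documented format, so check the project's documentation for the real one.

```python
import json


def to_signals(source, metrics, component=None):
    """Turn a {metric: value} mapping into a list of signal records (assumed shape)."""
    signals = []
    for metric, value in metrics.items():
        record = {"source": source, "metric": metric, "value": value}
        if component is not None:
            record["component"] = component
        signals.append(record)
    return signals


def write_signals(path, signals):
    """Write the collected signals to a JSON file for a later policy check."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"signals": signals}, f, indent=2)
```

A pipeline step might call `to_signals("eval", {"hallucination_rate": 0.03}, component="generator")`, append the A/B results the same way, and pass the combined list to `write_signals("signals.json", ...)`.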

CI/CD Native

Exit codes 0 / 1 / 2. Gate merges on PASS. Works with GitHub Actions, GitLab, any CI.

PASS (0), REQUIRE_APPROVAL (1), BLOCK (2). Use these in your pipeline to allow or block merges. One command in your workflow. No extra services. Works everywhere shell commands run.
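A pipeline script can branch on these exit codes. The Python sketch below wraps the `geval check` command shown in the Install section and maps its exit code to an outcome name. The helper names are hypothetical, and treating any other exit code as an error is an assumption.

```python
import subprocess

# Exit-code mapping as stated on this page.
OUTCOMES = {0: "PASS", 1: "REQUIRE_APPROVAL", 2: "BLOCK"}


def outcome_from_exit_code(code):
    """Translate a geval exit code into its outcome name."""
    try:
        return OUTCOMES[code]
    except KeyError:
        # Assumed: anything outside 0/1/2 is a tool or usage error.
        raise ValueError(f"unexpected geval exit code: {code}") from None


def run_check(signals="signals.json", policy="policy.yaml", env="prod"):
    """Run `geval check` (must be on PATH) and return the outcome name."""
    result = subprocess.run(
        ["geval", "check", "--signals", signals, "--policy", policy, "--env", env]
    )
    return outcome_from_exit_code(result.returncode)
```

A CI step could call `run_check()` and allow the merge only when it returns `"PASS"`, routing `"REQUIRE_APPROVAL"` to a manual approval step.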

Eval-Agnostic

Geval doesn't run evals. It consumes their results. Use Promptfoo, LangSmith, or your own.

You run evals with your tools; you produce signals (JSON). Geval only reads those files and applies your policy. So evals answer 'what happened?'; Geval answers 'given what happened, are we allowed to ship?'

Open Source

MIT. Inspect, audit, fork. No black box on your release pipeline.

The decision engine is fully open source. Audit every rule. Fork and customize. Run air-gapped. Your release decisions are too important for opaque tooling.

Audit Trail

Every run logs policy hash, signals hash, decision, matched rule, timestamp.

Decisions are written to .geval/decisions/. When compliance asks 'why did this ship?' or 'who approved it?', you have the artifact. Policy and signals are hashed; approvals are explicit with reason and timestamp.
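To make the audit fields concrete, here is a Python sketch of what one decision record could contain. It is illustrative only: the field names, the use of SHA-256, and the ISO 8601 timestamp are assumptions, and the actual artifact format under .geval/decisions/ may differ.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex digest of the raw file bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def build_decision_record(policy_bytes, signals_bytes, decision, matched_rule, now=None):
    """Assemble the fields listed above: hashes, decision, matched rule, timestamp."""
    now = now or datetime.now(timezone.utc)
    return {
        "policy_hash": sha256_hex(policy_bytes),
        "signals_hash": sha256_hex(signals_bytes),
        "decision": decision,
        "matched_rule": matched_rule,
        "timestamp": now.isoformat(),
    }


def to_json(record):
    """Serialize with sorted keys so identical records produce identical text."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Because the hashes are computed over the exact bytes of the policy and signals files, any later change to either file yields a different hash, which is what lets a reviewer confirm which inputs produced a given decision.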

Why Geval

Evals answer "What happened?" Geval answers "Are we allowed to ship?"

Eval & observability tools

  • Run evals, show metrics
  • Dashboards & score tracking
  • Manual review workflows
  • Post-hoc analysis

Geval

  • Consumes signals (no eval running)
  • One decision: PASS / REQUIRE_APPROVAL / BLOCK
  • Policy in YAML, version-controlled
  • Audit trail with hashes & timestamps
Install

Download and run

Single binary for Linux, macOS, and Windows. No npm, no pip. Pick your OS and add to PATH.

# Linux (x86_64)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

# macOS (Apple Silicon)

curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

# Then run

geval check --signals signals.json --policy policy.yaml --env prod

1 Binary
0 npm / pip
3 Outcomes (PASS / REQUIRE_APPROVAL / BLOCK)
MIT License