Validation-Driven Development: Let Your Tests Guide the AI

January 20, 2026

ralph-starter maintainers

AI can write code fast. But can it write correct code? It can, if you let your validation pipeline guide it.

The Quality Problem

AI-generated code has a reputation: it looks right but breaks in subtle ways. Missing edge cases. Type errors. Failed tests. The usual response is manual review and iteration.

But what if the AI could iterate itself?

Validation-Driven Loops

ralph-starter runs your validation suite after every coding loop:

ralph-starter run "add payment processing" \
  --test \
  --lint \
  --build \
  --loops 5

Here's what happens:

Loop 1: Implementing payment processing...
  → Running tests... 2 failed
  → Analyzing failures...

Loop 2: Fixing test failures...
  → Running tests... passed
  → Running linter... 3 issues
  → Analyzing issues...

Loop 3: Fixing lint issues...
  → Running tests... passed
  → Running linter... passed
  → Running build... success

✨ Completed in 3 loops

The AI doesn't just generate code. It iterates until validation passes or the loop limit is reached.
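The control flow behind a run like the one above can be sketched in a few lines. This is only an illustration under stated assumptions, not ralph-starter's actual implementation: `run_validation_loop`, the `generate` callback, and the fail-fast ordering of validators are all hypothetical names and choices made for the example.

```python
from typing import Callable, Optional

# Hypothetical sketch; none of these names come from ralph-starter itself.
# A validator returns None on success, or the captured error output on failure.
Validator = Callable[[], Optional[str]]


def run_validation_loop(
    generate: Callable[[Optional[str]], None],
    validators: dict[str, Validator],
    max_loops: int = 5,
) -> int:
    """Alternate between generating code and validating it.

    `generate` receives the previous loop's failure output (None on the first
    loop). Returns the number of loops used, or raises RuntimeError if the
    validators still fail after `max_loops` attempts.
    """
    feedback: Optional[str] = None
    for loop in range(1, max_loops + 1):
        generate(feedback)  # the agent edits the code, guided by feedback
        feedback = None
        for name, validate in validators.items():
            error = validate()
            if error is not None:
                # Assumed fail-fast: later validators wait until this one passes.
                feedback = f"{name} failed:\n{error}"
                break
        if feedback is None:
            return loop  # every validator passed
    raise RuntimeError(f"validation still failing after {max_loops} loops")
```

Stopping at the first failing validator mirrors the log above, where the linter only runs once the tests pass. A real tool could just as well run every validator and report all failures at once.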

Configure Your Validators

Built-in Commands

# ralph.config.yaml
validation:
  test: "npm test"
  lint: "npm run lint"
  build: "npm run build"

Custom Validators

validation:
  test: "pytest -x"
  lint: "ruff check ."
  typecheck: "mypy ."
  security: "bandit -r src/"

Per-Project Overrides

# Override for a single run
ralph-starter run "fix the bug" \
  --test "npm test -- --coverage" \
  --lint "eslint --fix ."
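One way to picture the override behavior is a merge in which values given on the command line win over the config file for that run. The sketch below assumes exactly that precedence. `resolve_validators` is a hypothetical helper, and treating a flag without a value as "use the configured command" is an assumption for illustration, not documented ralph-starter behavior.

```python
from typing import Optional


def resolve_validators(
    config: dict[str, str],
    overrides: dict[str, Optional[str]],
) -> dict[str, str]:
    """Merge config-file validators with per-run overrides (hypothetical helper).

    An override with a command replaces the configured command for this run.
    An override of None (a flag given without a value) keeps the configured one.
    """
    resolved = dict(config)  # copy so the loaded config is not mutated
    for name, command in overrides.items():
        if command is not None:
            resolved[name] = command
    return resolved


config = {"test": "npm test", "lint": "npm run lint", "build": "npm run build"}
overrides = {"test": "npm test -- --coverage", "lint": "eslint --fix ."}
print(resolve_validators(config, overrides))
```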

Error Recovery

When validation fails, ralph-starter:

1. Captures the error output
2. Analyzes what went wrong
3. Generates a fix in the next loop
4. Re-runs validation

Loop 2: Running tests...
  FAIL src/payment.test.ts
    ✕ should handle declined cards (15ms)
      Expected: "DECLINED"
      Received: undefined

Loop 3: Fixing test failure...
  → Added error handling for declined cards
  → Running tests... passed
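Capturing the error output, the first step in the list above, might look roughly like this. It is a sketch under assumptions: `capture_failure`, `ValidationResult`, and the truncation limit are hypothetical, and the real tool may collect and trim output differently.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ValidationResult:
    name: str
    passed: bool
    output: str  # combined stdout and stderr, possibly truncated


def capture_failure(name: str, command: str, max_chars: int = 4000) -> ValidationResult:
    """Run one validator command and keep the tail of its output."""
    proc = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, preserving order
        text=True,
    )
    output = proc.stdout
    if len(output) > max_chars:
        # Test runners usually print the failure summary last, so keep the end.
        output = "...[truncated]...\n" + output[-max_chars:]
    return ValidationResult(name=name, passed=proc.returncode == 0, output=output)
```

Keeping the tail rather than the head is a design choice for this sketch: it bounds how much text is fed into the next loop while retaining the part most likely to name the failing test.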

Best Practices

1. Start with Good Tests

The better your test coverage, the better ralph-starter performs. Tests act as a specification the AI must satisfy.
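As an illustration of tests acting as a specification, here is a small pytest-style example built around the declined-card case from the error output above. Everything in it is hypothetical: the `Card` class, the `charge` function, and the "APPROVED"/"DECLINED" return values are assumptions made for the sketch, not a real payment API.

```python
from dataclasses import dataclass


# Hypothetical payment code, included only so the tests below can run.
@dataclass
class Card:
    number: str
    declined: bool = False


def charge(card: Card, amount_cents: int) -> str:
    """Return "APPROVED" or "DECLINED" for a charge attempt."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if card.declined:
        return "DECLINED"
    return "APPROVED"


# Each test pins down one behavior the generated code must satisfy.
def test_declined_card_returns_declined():
    assert charge(Card("4000000000000002", declined=True), 500) == "DECLINED"


def test_valid_card_is_approved():
    assert charge(Card("4242424242424242"), 500) == "APPROVED"


def test_non_positive_amount_is_rejected():
    try:
        charge(Card("4242424242424242"), 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a zero amount")
```

If the declined-card test were missing, an implementation that returns `undefined` (or `None`) for declined cards would pass validation unnoticed.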

2. Use Strict Linting

Strict linting catches issues early. Enable type checking, unused variable detection, and style enforcement.

3. Set Appropriate Loop Limits

# Simple fixes: 2-3 loops
ralph-starter run "fix typo in header" --loops 2

# Complex features: 5-7 loops
ralph-starter run "add OAuth integration" --loops 7

# Exploratory: 10+ loops
ralph-starter run "refactor auth system" --loops 10

4. Review the Diffs

Even when validation passes, review the changes before committing:

# Don't auto-commit; review first
ralph-starter run "add feature" --test --lint

# Check what changed
git diff

# Then commit manually
git add . && git commit -m "feat: add feature"

The Feedback Loop

Validation-driven development creates a tight feedback loop:

Spec → Generate → Validate → Fix → Validate → Ship

Each iteration moves the code closer to passing. Your test suite tells the AI what "correct" means for your project.

Metrics That Matter

Track your validation-driven runs with the --verbose flag:

ralph-starter run "task" --verbose

Summary:
  Loops: 4
  Test runs: 4 (2 failed, 2 passed)
  Lint runs: 3 (1 failed, 2 passed)
  Build runs: 2 (0 failed, 2 passed)
  Total cost: $0.34
  Time: 2m 15s
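If you record these numbers over time, the per-validator counts can be tallied with a few lines of code. The sketch below assumes you already have one record per validator run. `ValidatorRun` and `summarize` are hypothetical names, and the output format simply imitates the summary above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ValidatorRun:
    validator: str  # e.g. "test", "lint", "build"
    passed: bool


def summarize(runs: list[ValidatorRun]) -> dict[str, str]:
    """Count runs per validator, formatted like "4 (2 failed, 2 passed)"."""
    passed = Counter(r.validator for r in runs if r.passed)
    failed = Counter(r.validator for r in runs if not r.passed)
    summary: dict[str, str] = {}
    # dict.fromkeys de-duplicates while keeping first-seen order.
    for name in dict.fromkeys(r.validator for r in runs):
        total = passed[name] + failed[name]
        summary[name] = f"{total} ({failed[name]} failed, {passed[name]} passed)"
    return summary
```

A rising share of failed runs for one validator across tasks can hint at where the specification is weakest, for example tests that are too loose or lint rules the agent keeps tripping over.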

Ready to let your tests guide the AI? Configure validation.
