The problem nobody likes to debug
If you’ve ever seen a test pass locally but fail in CI, you already know the problem.
Not timing. Not infra. Not “just rerun it”.
It’s almost always test data.
Most test setups rely on:
- random generators
- manually written fixtures
- partially mocked services
They work. Until they don’t.
Why random data breaks CI
Random data introduces hidden state:
- different values per run
- different edge cases per environment
- no way to reproduce failures locally
When a CI job fails, you often can’t recreate the exact conditions that caused it.
That’s a flaky test by definition: the same code produces different outcomes on different runs.
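To make the failure mode concrete, here is a minimal Python sketch of how unseeded random data hides a bug; the function names and the zero-quantity bug are hypothetical, not taken from any real codebase:

```python
import random

def make_order_quantity():
    # Unseeded: a different value on every run, in every environment.
    # A failure triggered by a rare value cannot be reproduced on demand.
    return random.randint(0, 100)

def apply_discount(quantity):
    # Hypothetical bug: blows up only when the generator happens to draw 0.
    return 10 / quantity  # ZeroDivisionError for quantity == 0
```

A test using `make_order_quantity()` passes almost every run and fails only when the generator draws 0, and by then the value that caused it is gone.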
Deterministic data changes the workflow
Deterministic test data means:
- same input → same output
- same seed → same dataset
- local dev behaves exactly like CI
Instead of debugging symptoms, you debug real logic errors.
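The seed-to-dataset property above can be sketched in a few lines of Python; the schema fields (`id`, `quantity`, `total`) are illustrative, not a real API:

```python
import random

def generate_orders(seed: int, count: int) -> list[dict]:
    # An isolated, seeded generator: the same seed always yields
    # the same dataset, whether it runs locally or in CI.
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "quantity": rng.randint(1, 10),
            "total": round(rng.uniform(1.0, 100.0), 2),
        }
        for i in range(count)
    ]
```

Because the generator state lives entirely in the seed, reproducing a CI failure locally is just a matter of reusing the seed from the failing run.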
What we changed
We stopped generating random data at test runtime.
Instead, we:
- defined schemas as the source of truth
- generated data deterministically per seed
- consumed data via HTTP inside CI jobs
```shell
curl "$TESTSEED_URL/api/seeds/seed_orders/generate?count=50" \
  -H "x-api-key: $TESTSEED_API_KEY"
```

Same seed. Same dataset. Every run.
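Inside a test suite, the same endpoint can be consumed without shelling out to curl. A minimal Python sketch, assuming the endpoint shape from the curl command above and a JSON response body (the response format is an assumption):

```python
import json
import os
import urllib.request

def build_seed_url(base_url: str, seed_name: str, count: int) -> str:
    # Mirrors the endpoint shape used in the curl example above.
    return f"{base_url}/api/seeds/{seed_name}/generate?count={count}"

def fetch_seed_dataset(seed_name: str, count: int):
    # Assumes the service returns JSON; adjust parsing if it does not.
    url = build_seed_url(os.environ["TESTSEED_URL"], seed_name, count)
    req = urllib.request.Request(
        url, headers={"x-api-key": os.environ["TESTSEED_API_KEY"]}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A test fixture can call `fetch_seed_dataset("seed_orders", 50)` at setup time and get byte-identical data on every run.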
The result
- CI failures became reproducible
- local debugging matched CI behavior
- no more “just rerun the pipeline”
Flaky tests weren’t a tooling problem.
They were a data problem.