# Evaluations
This chapter explores Tactus evaluations: quantitative assessments of AI workflow performance across datasets.
## Examples in This Chapter
### Simple Eval
A basic evaluation demonstrating core concepts without requiring LLM API calls. This example shows:

- Defining inline datasets with test cases
- Evaluation syntax with `Evaluation({...})`
- Expected output validation
- Success criteria based on exact output matching
- Running evaluations with `tactus eval`
View example →
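Conceptually, exact-output matching works like the sketch below. This is an illustrative Python model of what any such evaluation harness does, not Tactus internals; the names `run_workflow` and `dataset` are invented for the example.

```python
def run_workflow(text: str) -> str:
    # Stand-in for the AI workflow under test; a real evaluation
    # would invoke the workflow (and possibly an LLM) here.
    return text.upper()

# Inline dataset: each test case pairs an input with its expected output.
dataset = [
    {"input": "hello", "expected": "HELLO"},
    {"input": "tactus", "expected": "TACTUS"},
]

# A case succeeds only when the output matches the expectation exactly.
results = [run_workflow(case["input"]) == case["expected"] for case in dataset]
print(all(results))  # True only if every case matches
```

Because matching is exact, any trailing whitespace or casing difference counts as a failure, which is why exact matching suits deterministic workflows best.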
### Success Rate
Demonstrates success rate calculations and aggregated metrics. This example shows:

- Running multiple test cases in a single evaluation
- Calculating success rate across the dataset
- Pass/fail criteria for individual test cases
- Aggregated statistics and reporting
- Using success rate to measure overall quality
View example →
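The aggregation itself is simple: success rate is the fraction of test cases that pass their individual pass/fail criterion. A minimal Python sketch of that calculation (not the Tactus implementation):

```python
def success_rate(outcomes: list[bool]) -> float:
    """Aggregate per-case pass/fail flags into a single metric in [0, 1]."""
    if not outcomes:
        return 0.0  # no cases ran; report zero rather than divide by zero
    return sum(outcomes) / len(outcomes)

# Five test cases, four of which passed their criterion.
outcomes = [True, True, False, True, True]
print(success_rate(outcomes))  # 0.8
```

Reporting one number per evaluation run is what makes the metric comparable across runs, branches, or model versions.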
### Thresholds
Shows how to set minimum acceptable thresholds for metrics. This example demonstrates:

- Defining threshold requirements (e.g., `success_rate >= 0.95`)
- Evaluation failure when thresholds aren't met
- Multiple threshold configurations
- Using thresholds for quality gates in CI/CD
- Balancing strictness with practical tolerance
View example →
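A threshold turns a metric into a pass/fail gate: the evaluation as a whole fails when any aggregated metric falls below its required minimum. A hedged sketch of that check in Python, with invented metric values; the function name and dict layout are illustrative, not Tactus API:

```python
def meets_thresholds(metrics: dict[str, float],
                     thresholds: dict[str, float]) -> bool:
    """Return True only if every metric meets its minimum threshold.

    A metric missing from `metrics` is treated as 0.0, so an absent
    metric fails any positive threshold rather than passing silently.
    """
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())

# A 0.92 success rate fails a 0.95 threshold, so the gate rejects the run.
print(meets_thresholds({"success_rate": 0.92}, {"success_rate": 0.95}))  # False
print(meets_thresholds({"success_rate": 0.97}, {"success_rate": 0.95}))  # True
```

In a CI/CD pipeline, a failed gate would typically translate into a nonzero exit code so the build stops; the threshold value itself is where you trade strictness against practical tolerance for flaky cases.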
## Want to contribute?
The examples repository is open source. Add your own examples or improve existing ones.