
Evaluations

This chapter explores Tactus evaluations: quantitative assessments of AI workflow performance across datasets.

Examples in This Chapter

Simple Eval

Has Specs

A basic evaluation demonstrating core concepts without requiring LLM API calls. This example shows:

- Defining inline datasets with test cases
- Evaluation syntax with `Evaluation({...})`
- Expected output validation
- Success criteria based on exact output matching
- Running evaluations with `tactus eval`
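To make these pieces concrete, here is a minimal sketch of an inline evaluation. The `Evaluation({...})` constructor and the `tactus eval` command come from the summary above; the field names (`name`, `dataset`, `input`, `expected`, `passes`) are illustrative assumptions, not confirmed Tactus API — see the linked example for the real syntax.

```lua
-- Hypothetical sketch: the field names below (name, dataset, input,
-- expected, passes) are assumptions for illustration only; consult the
-- linked example for the actual Tactus evaluation fields.
Evaluation({
  name = "simple_eval",

  -- Inline dataset: each test case pairs an input with its expected output.
  dataset = {
    { input = "ping",         expected = "pong" },
    { input = "what is 2+2?", expected = "4" },
  },

  -- Success criterion: exact match between produced and expected output,
  -- so no LLM API call is needed to judge the result.
  passes = function(case, output)
    return output == case.expected
  end,
})
```

Running `tactus eval` on the file containing a block like this would execute each case and report pass/fail per case.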

View example →

Success Rate

Has Specs · Requires API Keys

Demonstrates success-rate calculation and aggregated metrics. This example shows:

- Running multiple test cases in a single evaluation
- Calculating the success rate across the dataset
- Pass/fail criteria for individual test cases
- Aggregated statistics and reporting
- Using success rate to measure overall quality
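Independent of any Tactus-specific fields, the aggregation itself is simple arithmetic: the success rate is the number of passing cases divided by the total number of cases. A plain-Lua sketch of that calculation:

```lua
-- Success rate = passing cases / total cases. This helper mirrors the
-- aggregation an evaluation report performs; it is plain Lua, not a
-- Tactus API.
local function success_rate(results)
  local passed = 0
  for _, r in ipairs(results) do
    if r.passed then passed = passed + 1 end
  end
  return passed / #results
end

-- Example: 4 of 5 cases pass, so the success rate is 0.8.
print(success_rate({
  { passed = true }, { passed = true }, { passed = true },
  { passed = true }, { passed = false },
}))  -- 0.8
```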

View example →

Thresholds

Has Specs · Requires API Keys

Shows how to set minimum acceptable thresholds for metrics. This example demonstrates:

- Defining threshold requirements (e.g., success_rate >= 0.95)
- Evaluation failure when thresholds aren't met
- Multiple threshold configurations
- Using thresholds as quality gates in CI/CD
- Balancing strictness with practical tolerance
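The gate logic this implies can be sketched in plain Lua, independent of the actual Tactus configuration fields: compare each reported metric against its minimum and fail the evaluation if any falls short.

```lua
-- Plain-Lua sketch of threshold checking, not a Tactus API:
-- an evaluation passes only if every metric clears its minimum.
local function meets_thresholds(metrics, mins)
  for name, min in pairs(mins) do
    if (metrics[name] or 0) < min then
      return false, name  -- report the first metric that fails the gate
    end
  end
  return true
end

local ok, failed = meets_thresholds(
  { success_rate = 0.92 },   -- measured result
  { success_rate = 0.95 }    -- required minimum from this example
)
print(ok, failed)  -- false  success_rate  (0.92 < 0.95, gate fails)
```

Used as a CI/CD quality gate, a threshold miss like this fails the whole evaluation run, which is what stops the pipeline.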

View example →

← Previous Chapter: Specifications · Next Chapter: Advanced Features →

Want to contribute?

The examples repository is open source. Add your own examples or improve existing ones.

View on GitHub
