Simple Eval

Has Specs

A basic evaluation demonstrating core concepts without requiring LLM API calls. This example shows:

- Defining inline datasets with test cases
- Evaluation syntax with `Evaluation({...})`
- Expected output validation
- Success criteria based on exact output matching
- Running evaluations with `tactus eval`

Source Code

-- Simple Pydantic Evals Demo (No LLM calls)
-- Demonstrates evaluation without requiring OpenAI API

-- Simple procedure that just returns a greeting
Procedure {
    input = {
        name = field.string{required = true}
    },
    output = {
        greeting = field.string{required = true},
        length = field.number{required = true}
    },
    function(input)
        local greeting = "Hello, " .. input.name .. "!"

        return {
            greeting = greeting,
            length = string.len(greeting)
        }
    end
}
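
For readers less familiar with Lua, the procedure's logic can be sketched as a plain Python function (a minimal illustration; `run_greeting` is a hypothetical name, not part of Tactus):

```python
def run_greeting(inputs: dict) -> dict:
    """Mirror of the procedure above: build a greeting and report its length."""
    greeting = f"Hello, {inputs['name']}!"
    return {"greeting": greeting, "length": len(greeting)}

print(run_greeting({"name": "Alice"}))
# {'greeting': 'Hello, Alice!', 'length': 13}
```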

Specification([[
Feature: Simple eval procedure

  Scenario: Greets Alice
    Given the procedure has started
    And the input name is "Alice"
    When the procedure runs
    Then the output greeting should be "Hello, Alice!"
    And the output length should be 13
]])

-- Pydantic Evals (output quality)
Evaluation({
    dataset = {
        {
            name = "greet_alice",
            inputs = {name = "Alice"},
            expected_output = {
                greeting = "Hello, Alice!"
            }
        },
        {
            name = "greet_bob",
            inputs = {name = "Bob"},
            expected_output = {
                greeting = "Hello, Bob!"
            }
        },
        {
            name = "greet_charlie",
            inputs = {name = "Charlie"},
            expected_output = {
                greeting = "Hello, Charlie!"
            }
        }
    },

    evaluators = {
        -- Deterministic: Check exact match
        field.equals_expected{},

        -- Deterministic: Check minimum length
        field.min_length{field = "greeting", value = 1},

        -- Deterministic: Check that greeting contains "Hello"
        field.contains{field = "greeting", value = "Hello"}
    },

    runs = 1,
    parallel = true
})
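
The three deterministic evaluators above amount to simple predicates applied to each case's output. A hedged Python sketch of their semantics (the helper names echo the Tactus evaluators but are illustrative assumptions, not the actual implementation):

```python
def greet(name: str) -> dict:
    """Stand-in for the procedure under evaluation."""
    greeting = f"Hello, {name}!"
    return {"greeting": greeting, "length": len(greeting)}

def equals_expected(output: dict, expected: dict) -> bool:
    # Every field named in expected_output must match exactly.
    return all(output.get(k) == v for k, v in expected.items())

def min_length(output: dict, field: str, value: int) -> bool:
    # The named field must be at least `value` characters long.
    return len(str(output.get(field, ""))) >= value

def contains(output: dict, field: str, value: str) -> bool:
    # The named field must contain the given substring.
    return value in str(output.get(field, ""))

dataset = [
    {"name": "greet_alice", "inputs": {"name": "Alice"},
     "expected_output": {"greeting": "Hello, Alice!"}},
    {"name": "greet_bob", "inputs": {"name": "Bob"},
     "expected_output": {"greeting": "Hello, Bob!"}},
]

for case in dataset:
    out = greet(case["inputs"]["name"])
    assert equals_expected(out, case["expected_output"])
    assert min_length(out, "greeting", 1)
    assert contains(out, "greeting", "Hello")
```

Each case passes only if all evaluators return true, which is why exact-match expected outputs make this example fully deterministic.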

Quick Start

Run the example:

$ tactus run 05-evaluations/01-simple-eval.tac

Test with mocks:

$ tactus test 05-evaluations/01-simple-eval.tac --mock
