Use Case

Text Classification

A simple, high-signal workflow: take input text and classify it into a small set of labels, with guardrails that keep the output structured and measurable.

Example: Support inbox triage

Imagine a small team triaging a support inbox. If a billing issue gets labeled as a bug, customers wait longer and engineers lose focus. A lightweight classifier gives you fast routing, but it only works if you keep the output constrained and measurable.

Your Application → Tactus Runtime (embedded) → Triage Procedure → Guardrails → Structured Output

Your application hands raw text (e.g., an email) to the embedded runtime.

Procedure {
  input = {
    message = field.string{required = true, description = "Incoming support message"}
  },
  output = {
    label = field.string{required = true, description = "One of: billing, account, bug, other"},
    retries = field.number{required = true, description = "How many retries were needed"}
  },
  function(input)
    -- One LLM-backed classifier with a closed label set and a domain prompt.
    local triage = Classify {
      name = "support_triage_llm",
      method = "llm",
      classes = {"billing", "account", "bug", "other"},
      prompt = [[
You are a support triage assistant.
Return only one label from: billing, account, bug, other.
If the request is unclear, choose "other".
]],
      model = "openai/gpt-4o-mini",
      temperature = 0,   -- deterministic routing
      max_retries = 3    -- re-ask on invalid labels before failing
    }

    -- The primitive handles parsing, validation, and retries;
    -- we just read the structured result.
    local result = triage(input.message)
    return {label = result.value, retries = result.retry_count}
  end
}

Source: 02-classification/01-support-inbox-triage.tac

Why this works

  • The label set is explicit, so validation is trivial.
  • The prompt is domain-specific, not a generic classifier.
  • temperature = 0 and max_retries = 3 keep output stable.
  • Behavior specs can lock down edge cases like "double charged" being billing.
  • Evaluations let you measure accuracy and drift on real tickets.

What the standard library handles for you

The high-level Classify primitive is designed so you can focus on your task (triage, tagging, routing) instead of rewriting the same reliability plumbing every time. Under the hood, it enforces the contract “the answer must be one of these labels” and does the annoying parts: parsing, validation, and retries. You can use method = "llm" for semantic classification or method = "fuzzy" for fast string similarity matching.

LLM classification guardrails

  • Strict label contract: it only accepts values from your classes list.
  • Defensive parsing: it extracts the classification from the first line, tolerates common formatting (quotes, markdown, punctuation), and handles cases like "billing - because...".
  • Retry loop with feedback: if the model responds with an invalid label, it retries up to max_retries and tells the model exactly what went wrong and what labels are allowed.
  • Structured results: you get value, confidence (optional), explanation, retry_count, and the raw_response for debugging.
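Putting those pieces together, a call site can inspect every part of the structured result. This is a sketch based only on the fields named above (value, confidence, explanation, retry_count, raw_response); which fields are populated may depend on configuration, such as confidence_mode.

```lua
-- Sketch: inspecting the structured result of an LLM classification.
-- Field names follow the list above; treat this as illustrative.
local result = triage("I was charged twice this month")

print(result.value)         -- one of "billing", "account", "bug", "other"
print(result.retry_count)   -- 0 if the first response was already valid
if result.confidence then   -- optional, depending on confidence_mode
  print(result.confidence)
end
-- result.explanation and result.raw_response are there for debugging,
-- e.g. when auditing why a ticket was routed the way it was.
```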

Fuzzy matching (no LLM calls)

  • Fast and offline: string similarity matching is nearly instant and needs no API keys.
  • Two modes: binary expected matching (Yes/No) or multi-class “best match” from a classes list.
  • Tunable behavior: adjust threshold and choose an algorithm like token_set_ratio to handle reordering and extra words.
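As a sketch of the binary mode described above: the multi-class form with a `classes` list appears in the composite example further down, so this shows only the expected-match (Yes/No) variant. The `expected` parameter name is an assumption here, standing in for however the binary target string is supplied.

```lua
-- Sketch: binary fuzzy matching (Yes/No), no LLM call and no API key.
-- The `expected` parameter name is assumed for illustration.
local is_refund = Classify {
  method = "fuzzy",
  expected = "refund request",    -- the target string to match against
  threshold = 0.85,               -- similarity required to count as a match
  algorithm = "token_set_ratio"   -- tolerant of reordering and extra words
}

local result = is_refund("request for a refund, please")
-- result.value would indicate whether the input matched "refund request"
```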

Knobs you can tune (no rewrites)

  • name: a stable identifier for traces and BDD mocking.
  • classes: the allowed output space (and your evaluation labels).
  • prompt: your domain rules and edge-case guidance.
  • max_retries: how hard to push for a valid label before failing.
  • temperature: determinism vs. flexibility (lower = more stable).
  • model: pick a faster/cheaper model for routing, or a stronger one for nuance.
  • confidence_mode: keep the default heuristic confidence, or disable it for pure labels.
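As a sketch, the triage classifier from the first example could be retuned by changing only these knobs; the call site and output contract stay identical. The `confidence_mode = "none"` value is an assumption, standing in for whatever setting disables heuristic confidence.

```lua
-- Sketch: the same triage classifier, retuned without rewriting its logic.
local triage = Classify {
  name = "support_triage_llm",       -- stable ID for traces and BDD mocking
  method = "llm",
  classes = {"billing", "account", "bug", "other"},
  prompt = [[ ...same domain rules as the first example... ]],
  model = "openai/gpt-4o",           -- assumed: a stronger model for nuance
  temperature = 0,                   -- keep routing deterministic
  max_retries = 5,                   -- push harder for a valid label
  confidence_mode = "none"           -- assumed value: pure labels only
}
```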

This behavior is centralized and covered by Tactus’s behavior specs, so you get the improvements without duplicating the logic in every procedure. Want the full API and implementation details? Start with the Classification module docs, then check the Classify primitive and LLM classifier source.

What You’re Building

Given a piece of text (an email, a ticket, a note), return a label like spam, support, or billing, and do it reliably.

Guardrails to use

  • Validation: output must always contain a valid label
  • Behavior specs: hard rules (no forbidden tools, required fields exist)
  • Evaluations: measure accuracy and stability across a dataset

You can also mix strategies. For example, use fuzzy matching to cheaply catch “obvious” cases (typos, reordering, abbreviations), then fall back to the LLM for the messy long tail.

Procedure {
  input = {
    message = field.string{required = true, description = "Incoming support message"}
  },
  output = {
    label = field.string{required = true, description = "One of: billing, account, bug, other"},
    path = field.string{required = true, description = "Which classifier decided: fuzzy or llm"}
  },
  function(input)
    local fuzzy = Classify {
      method = "fuzzy",
      classes = {"double charged", "refund", "invoice", "login", "password reset", "crash", "error"},
      threshold = 0.90,
      algorithm = "token_set_ratio"
    }

    local llm = Classify {
      name = "support_triage_llm_fallback",
      method = "llm",
      classes = {"billing", "account", "bug", "other"},
      prompt = "Classify this support message into one label: billing, account, bug, other.",
      model = "openai/gpt-4o-mini",
      temperature = 0,
      max_retries = 3
    }

    -- Cheap path first: fuzzy matching on high-signal phrases, no API call.
    local fuzzy_result = fuzzy(input.message)
    if fuzzy_result.value ~= "NO_MATCH" then
      -- Map the fine-grained fuzzy classes onto the coarse triage labels.
      local v = fuzzy_result.value
      if v == "double charged" or v == "refund" or v == "invoice" then
        return {label = "billing", path = "fuzzy"}
      end
      if v == "login" or v == "password reset" then
        return {label = "account", path = "fuzzy"}
      end
      if v == "crash" or v == "error" then
        return {label = "bug", path = "fuzzy"}
      end
    end

    -- Messy long tail: fall back to the LLM classifier.
    local llm_result = llm(input.message)
    return {label = llm_result.value, path = "llm"}
  end
}

Source: 02-classification/02-composite-fuzzy-then-llm.tac

More runnable examples

Next we can add a runnable spec suite and a small evaluation dataset to measure stability over time.

Browse more use cases

Pick another workflow pattern to learn.