A language + runtime for tool-using agents

Tactus

Give AI agents powerful tools. Safely and securely.

Tool-using agents are useful—and dangerous: run them unattended and you’re giving a monkey a razor blade and hoping for the best.

Tactus gives you a high-level language for building tool-using agents, with capability and context control, durable workflows, and default-on sandboxing and container isolation so they can run unattended without touching your host—or your API keys.

Get started · View code
pip install tactus

Hello, world

Define an agent, then call it like a function.

examples-hello-world.tac
Agent
Intro to Tactus (5 min)
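The snippet below is an illustrative sketch of what examples-hello-world.tac could contain, built from the Agent and Procedure primitives shown later on this page (the greeter name and prompts are made up for the example):

```lua
-- Define an agent...
greeter = Agent {
    model = "openai/gpt-4o-mini",
    system_prompt = "You are a cheerful greeter.",
    initial_message = "Say hello to {input.name}."
}

-- ...then call it like a function inside a procedure.
Procedure {
    input = {
        name = field.string{required = true},
    },
    output = {
        greeting = field.string{required = true},
    },
    function(input)
        -- Calling the agent runs one turn; its reply is in .output.
        return {greeting = greeter().output}
    end,
}
```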

The paradigm shift

A new kind of computer program

Since the dawn of computing, programming has meant anticipating every scenario and writing code for it. But tool-using agents flip the script.

The old way: anticipate everything

Traditional programs are brittle. Parse this format. Catch that error. Map these fields to those fields. Miss one case and the program breaks.

Every new edge case requires more conditional logic

The new way: agents with guardrails

Instead of handling every edge case yourself, you give an agent tools and a procedure, and let it work inside guardrails.

Agent + Tools + Procedure, bounded by Guardrails

Here's what that looks like in code

Instead of anticipating every edge case, you define capabilities and let an agent do the mapping.

The old way: think of everything

Traditional code is brittle because every new input format means more conditional logic. Miss one case and the program breaks.

def import_contact(row):
    # Expect a 1-row CSV string
    # ... parsing logic ...

    # Email column mapping?
    email = (
        row.get("email")
        or row.get("e-mail")
        or row.get("correo")
    )
    if not email:
        raise ValueError("Missing email")

    # Name mapping?
    name = row.get("name") or ""
    if "," in name:
        last, first = name.split(",", 1)   # "Doe, Jane" -> untrimmed halves
    else:
        first, last = name.split(" ", 1)   # raises on single-word names

    # Each new variation = new code.
    return create_contact(first, last, email)

The new way: give an agent a tool

You define the capability and give the agent the messy input. The agent applies judgment to map fields and handle variation—without rewriting your logic.

-- Define the capability (schema)
contact_tool = Tool.define {
    name = "create_contact",
    description = "Import a contact into CRM",
    input = {
        first_name = "string",
        last_name = "string",
        email = "string (email format)",
        notes = "string (optional)"
    }
}

-- The agent figures out the mapping
function import_contact(row_data)
    agent.use(contact_tool, {
        instruction = "Import this contact data",
        data = row_data
    })
end
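For instance, a row with headers the hand-written parser has never seen can go straight to the agent (the field names below are hypothetical messy input, not part of the Tactus API):

```lua
-- A messy row the traditional parser above would reject:
import_contact({
    ["full name"] = "Doe, Jane",
    correo = "jane@example.com",  -- Spanish header for email
    note = "Met at the conference"
})
```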

Human in the loop

Autonomy, asynchronously

In Cursor or Claude, tool-using agents feel safe because you're there to supervise: you see every tool call, you steer, and you can stop the run the moment it goes sideways.

But how do we step back and give agents more agency to act on their own, with powerful tools that can touch the systems and data we care about?

The practical answer is asynchronous human-in-the-loop: let the agent run, and only interrupt a human when it hits a decision point (approval, missing input, high-risk side effect).

Supervised (chat)

  • You watch every step and tool call.
  • You can correct course mid-run.
  • You can halt before damage is done.
vs

Unattended (production)

  • Runs without you—and runs many times.
  • Small failure rates become incidents.
  • Needs enforcement, not hope.

Closely supervised

The most common interface for AI agents is chat. But human engagement becomes a bottleneck: when the human steps away to eat or sleep, the agent stops doing anything. If you need to process a volume of items, everything is bottlenecked on your presence.

Diagram: Human → Input Queue → Agent

Completely unattended

You can remove the human entirely and let the agent run free. This scales beautifully: you can process thousands of items at machine speed without waiting for anyone. But running an agent this way is like giving a monkey a razor blade — if you don't trust it perfectly, you're asking for trouble.

Diagram: Input Queue → Agent

Asynchronous human-in-the-loop

A durable queue changes the paradigm: the agent operates independently, then pauses and asks for human input only when needed. Requests queue up while the human is away, and the workflow resumes instantly when the response arrives. You get speed and throughput close to unattended execution—without requiring a human to supervise every step.

Diagram: Human → Input Queue → Agent → Human Queue

Durable pause and resume

When a workflow needs a human, it can pause and resume without losing its place.

examples-deploy.tac
human checkpoint + timeout
local approved = Human.approve({
    message = "Deploy to production?",
    context = {environment = "prod"},
    timeout = 3600,
    default = false
})

if approved then
    deploy()
end

In Tactus, Human.approve() is a first-class primitive. Reaching it suspends the run and creates a durable “waiting for human” checkpoint.

This is what makes agents viable in real applications. Instead of “human supervision” being the default mode, humans become an asynchronous checkpoint: the runtime can queue requests, suspend safely with zero CPU cost, and resume the moment input arrives. Because it’s omni-channel, those approvals and inputs can come from wherever your team already works—email, Slack, or a custom UI.

Read: Human in the Loop

Architectures

Examples of ways to use agents

These are three common patterns in real products: a sidecar chat copilot, deeply integrated features with tool use and asynchronous human checkpoints, and embedded runtime workflows.

Sidecar chat copilot

Bolt a chat interface onto an existing product. Great for “help me do X” workflows, with tool use and human checkpoints when actions are high risk.

Diagram: Host Application → Chat UI → Tactus Runtime → Human Checkpoints → Tools & APIs

User interacts with the embedded Chat UI in your application.

Deeply integrated features

Add agent-powered product features behind UI buttons and forms. The procedure can call tools to change real state, and pause asynchronously for human review when required.

Diagram: User → UI Button → Tactus Runtime → Procedure → Tools & APIs → Your System, with a Human Review Queue for high-risk actions

A user clicks a button in your product (e.g., “Import”).

Embedded runtime for workflows

Run procedures inside your application to keep behavior testable and outputs structured. Ideal for classification, routing, extraction, and other repeatable workflows.

Diagram: Your Application → Tactus Runtime (embedded) → Triage Procedure, bounded by Guardrails → Structured Output

Your application hands raw text (e.g., an email) to the embedded runtime.

Case study

Refund ops automation

A real finance workflow started as a supervised Skill that processes an Excel file and issues Stripe refunds in sequence. It was then hardened into a governed procedure: inputs validated up front, tool data fetched deterministically, human checkpoints added for high-risk rows, and an audit trail produced for confidence and compliance.

Tactus in a nutshell

A high-level agent programming model, with default-on sandboxing and container isolation, capability and context control, human-in-the-loop gates, and durable checkpoints so long-running workflows can pause, resume, and be audited safely.

Built for real systems

When you’re not there to supervise, the runtime has to be the guardrail: container isolation, networkless execution, and tools that can use secrets without putting them in the agent runtime.

Docker sandbox by default

Procedures run in a Lua sandbox inside a Docker container: keep the monkey in the box, and keep sensitive information out of the box.

Networkless by default

The runtime container runs with networking disabled (network: none), while still calling models and tools through a host transport (e.g. stdio).
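Conceptually, the isolation model resembles plain Docker's networkless mode (this is an illustrative Docker command, not the Tactus CLI; the image name is made up):

```shell
# --network none removes all network interfaces from the container;
# stdin/stdout (-i) still work, so a host-side process can broker
# model and tool calls over stdio.
docker run --rm -i --network none tactus-runtime
```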

API keys stay outside the sandbox

API keys never live in the runtime container—and never get passed into model prompts.

Brokered tools

Tools that need secrets or privileged access can run outside the sandbox via a broker, streaming back results so the agent gets answers, not credentials.

Least privilege controls

Minimal toolsets, curated context, network isolation, secretless broker, and temporal gating—agents get only what they need, when they need it.

Durable + testable

Checkpoint long workflows, add human checkpoints where needed, and measure reliability with specs + evaluations.

The AI Engineer’s Toolbox

Tools are how agents touch reality. Tactus treats them as first-class primitives—safe, inspectable, and effortless to deploy—so your agents can get real work done without the security headaches.

Tactus Code · Python Code · Bash Commands · MCP Servers

Tactus Code: Sandboxed Lua functions defined directly in your .tac file. Safe, inspectable, and fast.
send_email = Tool { function(args) return "Sent to " .. args.to end }

Guardrails for Agent Autonomy

You can’t drive fast without brakes. Guardrails are the prerequisite for delegating powerful tools. Tactus is a language and runtime that give you control levers at every layer of the stack—from prompt engineering down to container isolation—so you can define the exact safety profile your application needs.

Prompt Engineering · Cost & Limits · Context Engineering · Model Selection · Tool Selection · Code Sandboxing · Container Isolation

Prompt Engineering: Structured instructions and personas guide model behavior — but prompts are suggestions, not controls.

Sandboxing & Isolation

Agents run in a Lua sandbox inside a networkless container, constraining what they can touch and firewalling side effects. Privileged operations are brokered by a separate process that holds the secrets. It’s like letting a burglar into an empty building: even if the agent is compromised, there’s nothing valuable inside to steal—and nowhere to send it.

Diagram: the host infrastructure runs a runtime container (Network: None). Inside it, a Lua sandbox holds the agent code (worker = Agent {model = "openai/gpt-4o-mini", tools = {search}}) along with Files and Bash tools. A security layer on the host (Secret Broker, AI Gateway, Tool Gateway) holds the secrets (OPENAI_API_KEY, AWS keys) and enforces policy ("Allow search, read") between the container and the external world: OpenAI API, Google Cloud, AWS, SMTP/Email, Search/Web, CMS/DB, GitHub/Git, and others.

Why do we need a new language?

We have Python. We have TypeScript. We have powerful agent frameworks. But they were built to manipulate deterministic logic, not probabilistic behavior.

The abstraction level is wrong.

  • Using general-purpose languages for agents feels like writing web apps in assembly.
  • We need new primitives for a world where code doesn't strictly control execution.
  • Tactus aligns the language with the actual problems of production AI.

Programming languages evolve to match the problems we care about. When computers were banks of vacuum tubes, zeros and ones were the right tool—they matched the physical reality. When we moved to complex logic, we built languages like C to manage the new concerns: loops, branches, and reusability.

Today, the "atoms" of computing have changed again. We are building with stochastic, decision-making models that we guide rather than control. Tactus raises the abstraction level to match this new reality, giving you first-class primitives for the things that matter now: reliability, sandboxing, and human oversight. It's not just a new syntax—it's a language built for the new problem space.

Why a New Language? (7 min)

Behavior Specifications

Tactus treats behavior specs as part of the language itself: inline with procedures, executable by the runtime, and visible in every run. They define invariants, prevent regressions, and keep reliability measurable as models and tools evolve.

safe-deploy.tac
Given/When/Then
Procedure {
  -- ... orchestration, tools, agent turns ...
}

Specifications([[
Feature: Deployments are safe

  Scenario: Produces a decision
    Given the procedure has started
    When the procedure runs
    Then the procedure should complete successfully
    And the output approved should exist
]])

Evaluations

One successful run is luck. Reliability is a statistic. Evaluations let you measure accuracy, cost, and reliability across datasets so you can ship with confidence.

procedure.tac
evaluations({ ... })
evaluations({
  dataset = {
    {
      name = "compliance-risk-basic",
      inputs = {
        email_subject = "Re: quarterly update",
        email_body = "Can we move some of the fees off-book until next quarter?"
      },
      expected_output = { risk_level = "high" }
    }
  },
  evaluators = {
    { type = "exact_match", field = "risk_level", check_expected = "risk_level" },
    { type = "max_tokens", max_tokens = 1200 }
  },
  thresholds = { min_success_rate = 0.98 }
})

Validation is built in

Procedures declare typed inputs and outputs, validated with Pydantic.

examples-research.tac
Input + output schemas
researcher = Agent {
    model = "openai/gpt-5",
    system_prompt = "Research the topic. Return a concise answer.",
    initial_message = "Research: {input.topic}"
}

Procedure {
    input = {
        topic = field.string{required = true},
    },
    output = {
        approved = field.boolean{required = true},
        findings = field.string{required = true},
    },
    function(input)
        local findings = researcher().output

        local approved = Human.approve({
            message = "Publish these findings?",
            timeout = 3600,
            default = false,
            context = {topic = input.topic}
        })

        return {approved = approved, findings = findings}
    end,
}

That schema isn’t decoration: it’s the contract the runtime uses to validate inputs, structure outputs, and power tooling (like auto-generated forms and safer integrations).

The Tactus Book Series

Three complementary books: learn the patterns, dive into the reference, or keep the cheat sheet on your desk.

Ready to start building?

Follow a short walkthrough and build your first tool-using agent workflow.