Examples / Classification / Model Train Imdb Naive Bayes

Model Train Imdb Naive Bayes

Has Specs

Train a real (local) sentiment classifier using Naive Bayes + TF-IDF on the IMDB dataset, register it, then call it from a Procedure. Demonstrates: - A single Model block that includes both runtime config and training config - A real training + evaluation loop via the CLI (`tactus train`, `tactus models evaluate`) - CI-safe specs that test your branching logic via `Mocks { ... }`

Source Code

-- Trainable Model Primitive: Naive Bayes sentiment on IMDB
--
-- This example demonstrates the full "Model lifecycle" in one file:
-- - Declare a registry-backed Model with an explicit training configuration
-- - Use the Model inside a Procedure (your code branches on model output)
-- - Test deterministically with Mocks (CI-safe)
-- - Optionally train + evaluate the real model and run the same Procedure/specs
--
-- Install ML extras (required for real training/inference):
--   pip install tactus[ml]
--
-- Train (writes a version to the registry):
--   tactus train 02-classification/04-model-train-imdb-naive-bayes.tac --model imdb_nb
--
-- Evaluate (optional; compares registered versions/tags against the test split):
--   tactus models evaluate 02-classification/04-model-train-imdb-naive-bayes.tac --model imdb_nb
--
-- Test in mock mode (fast, deterministic, no downloads):
--   tactus test 02-classification/04-model-train-imdb-naive-bayes.tac --mock
--
-- Test in real mode (requires you trained the model first):
--   tactus test 02-classification/04-model-train-imdb-naive-bayes.tac

Model "imdb_nb" {
  type = "registry",
  name = "imdb_nb",
  version = "latest",

  input = { text = "string" },
  output = { label = "string", confidence = "float" },

  training = {
    data = {
      source = "hf",
      name = "imdb",
      train = "train",
      test = "test",
      shuffle = { train = true, test = true },
      limit = { train = 25000, test = 25000 },
      seed = 42,
      text_field = "text",
      label_field = "label"
    },
    candidates = {
      {
        name = "nb-tfidf",
        trainer = "naive_bayes",
        hyperparameters = {
          alpha = 1.0,
          max_features = 50000,
          ngram_min = 1,
          ngram_max = 2
        }
      }
    }
  }
}

Procedure {
  input = {
    text = field.string{
      required = true,
      description = "A movie review to classify as positive/negative"
    }
  },
  output = {
    label = field.string{required = true},
    confidence = field.number{required = false},
    decision = field.string{
      required = true,
      description = "yes | no | review (a simple policy driven by the model output)"
    }
  },
  function(input)
    local classifier = Model("imdb_nb")
    local result = classifier({text = input.text})
    local output = result.output or result

    -- This is the point of the example: your procedure stays deterministic,
    -- and you use the model's output as a signal that drives policy.
    local decision = "review"
    if output.confidence ~= nil and output.confidence >= 0.7 then
      if output.label == "positive" then
        decision = "yes"
      else
        decision = "no"
      end
    end

    return {
      label = output.label,
      confidence = output.confidence,
      decision = decision
    }
  end
}

-- Deterministic model responses for CI-safe specs.
Mocks {
  imdb_nb = {
    conditional = {
      {
        when = {text = "A wonderful movie with great acting."},
        returns = {label = "positive", confidence = 0.92}
      },
      {
        when = {text = "This was a terrible movie with bad acting."},
        returns = {label = "negative", confidence = 0.87}
      },
      {
        when = {text = "A confusing movie with uneven pacing."},
        returns = {label = "positive", confidence = 0.42}
      }
    }
  }
}

Specification([[
Feature: Trainable Model primitive (Naive Bayes IMDB)

  Scenario: Positive review routes to yes
    Given the procedure has started
    And the input text is "A wonderful movie with great acting."
    When the procedure runs
    Then the output decision should be "yes"
    And the output label should be "positive"
    And the procedure should complete successfully

  Scenario: Negative review routes to no
    Given the procedure has started
    And the input text is "This was a terrible movie with bad acting."
    When the procedure runs
    Then the output decision should be "no"
    And the output label should be "negative"
    And the procedure should complete successfully

  Scenario: Low confidence routes to review
    Given the procedure has started
    And the input text is "A confusing movie with uneven pacing."
    When the procedure runs
    Then the output decision should be "review"
    And the output confidence should exist
    And the procedure should complete successfully
]])

Quick Start

Run the example:

$tactus run 02-classification/04-model-train-imdb-naive-bayes.tac

Test with mocks:

$tactus test 02-classification/04-model-train-imdb-naive-bayes.tac --mock

View source on GitHub →

Explore more examples

Learn Tactus through practical, runnable examples organized by topic.