
athena-sdk-lite — Overview

Version: 0.1.0
Audience: anyone hearing about this for the first time — stakeholders, new engineers, or partner teams evaluating fit.


What it is

A small Python library for building data + AI workflows as DAGs. You import it, declare the nodes you want (a Postgres read, an AI classification, a branch, a transform), wire them with inputs=, and the library produces a workflow object you can validate, visualize, and run locally.

from athena_sdk_lite import Workflow
from athena_sdk_lite.nodes import postgres, ai_tagging, output

with Workflow("ticket-triage") as wf:
    rows = postgres("load", operation="select", query="SELECT ...", connection={...})
    tagged = ai_tagging("classify", inputs=rows, agent_url="https://...")
    output("results", inputs=tagged, format="json")

print(wf.visualize())   # ASCII DAG
issues = wf.validate()  # [] if good
result = wf.run()       # local, in-process

That's the whole API a normal user sees. No CLI. No backend. No API key. No managers / mixins / codegen.

What problem it solves

Today, "build a pipeline that touches a database and an AI model" is solved by writing scattered scripts — one for the DB pull, one to call the model, one to format output, glued together with cron and a Slack message. Each script is bespoke; there's no shared shape; testing is ad-hoc; reasoning about what runs when is hard.

This SDK gives a single shape — a Workflow of typed nodes — that:

  • Reads top-to-bottom like a script (no hidden framework magic)
  • Validates structurally before you run it (wf.validate())
  • Renders an ASCII diagram on demand (wf.visualize())
  • Runs locally with no service dependency
  • Can be extended without monkey-patching when you outgrow the 11 built-in node helpers

Who uses it

  • End users building one-off or recurring workflows in Python. They write 5–50-line scripts using the 11 starter helpers.
  • Wrapper authors packaging domain logic (e.g. pharma-workflows, marketing-pipelines) on top. They register custom helpers, compose sub-workflows, and add lifecycle hooks. End users of their package never see the wrapping layer.
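
For example, a wrapper package can hide the node wiring behind a domain-level function built from the documented helpers. A hedged sketch (the pharma_workflows package, the literature_triage function, and the pubmed keyword argument are illustrative assumptions, not confirmed API):

# pharma_workflows/triage.py (hypothetical wrapper package)
from athena_sdk_lite import Workflow
from athena_sdk_lite.nodes import pubmed, ai_tagging, output

def literature_triage(query: str, agent_url: str) -> Workflow:
    """Assemble a ready-to-run triage workflow from two domain inputs."""
    with Workflow("literature-triage") as wf:
        papers = pubmed("search", query=query)  # query= is assumed
        tagged = ai_tagging("classify", inputs=papers, agent_url=agent_url)
        output("results", inputs=tagged, format="json")
    return wf

An end user of the wrapper calls literature_triage(...).run() and never touches the node layer.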

Position in the broader stack

                   ┌──────────────────────────────┐
                   │   end-user Python script     │   ← you import & write here
                   │   (or wrapper package)       │
                   └──────────────┬───────────────┘
                   ┌──────────────▼───────────────┐
                   │       athena-sdk-lite        │   ← thin, obvious surface
                   │       (this package)         │
                   └──────────────┬───────────────┘
                   ┌──────────────▼───────────────┐
                   │   vendored _engine/          │   ← workflow execution
                   │   (from athena-sdk)          │
                   └──────────────┬───────────────┘
              ┌───────────────────┼───────────────────┐
              ▼                   ▼                   ▼
        ┌──────────┐        ┌──────────┐        ┌──────────┐
        │ Postgres │        │   S3     │        │  Athena  │
        │          │        │          │        │  agent   │
        └──────────┘        └──────────┘        └──────────┘

The library is local-only by design. There is no service to deploy, no backend to call. The vendored engine inside _engine/ does the actual node execution; the SDK is a typed builder on top.

What's in the box (the starter set)

Helper       Purpose
pubmed       Biomedical literature search
postgres     DB select / insert / update / upsert
s3           Object storage read / write
local_file   Read CSV / JSON / Excel from the local filesystem
http         Generic HTTP request
ai_tagging   Athena agent / classification call
filter       Row filter (eq/gt/contains/...)
transform    User-supplied Python code
output       Terminal sink (json/csv/text)
branch       Two-way conditional (engine if node)
merge        Fan-in (join or concat)
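
To show how the control-flow helpers compose, here is a hedged sketch of a two-way split that fans back in. Only the helper names come from the table above; the branch condition=, the two-handle return, the transform code=, and the merge strategy= parameters are assumptions for illustration:

from athena_sdk_lite import Workflow
from athena_sdk_lite.nodes import local_file, branch, transform, merge, output

with Workflow("split-and-rejoin") as wf:
    rows = local_file("load", path="tickets.csv")  # path= is assumed
    # assumed: branch returns one handle per arm
    hot, cold = branch("by_priority", inputs=rows, condition="priority == 'high'")
    escalated = transform("escalate", inputs=hot, code="row['queue'] = 'urgent'")
    combined = merge("rejoin", inputs=[escalated, cold], strategy="concat")
    output("results", inputs=combined, format="csv")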

For anything outside these 11, the escape hatch wf.add_node(name, type, category, config, inputs) reaches any node type the underlying engine supports.
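
A hedged example of the escape hatch. The add_node signature comes from the line above; the "sftp_read" node type, its config keys, and the returned handle being usable as inputs are assumptions:

from athena_sdk_lite import Workflow
from athena_sdk_lite.nodes import output

with Workflow("escape-hatch") as wf:
    # reach a node type the engine supports but no helper wraps
    fetched = wf.add_node(
        "fetch",        # name
        "sftp_read",    # type (hypothetical)
        "source",       # category
        {"host": "files.example.com", "path": "/exports/latest.csv"},
        None,           # inputs: none for a source node
    )
    output("results", inputs=fetched, format="csv")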

What it does NOT do

Stated up front so expectations are clear:

  • No remote execution. Workflows run in your Python process. Production deployment is a separate concern (see the full athena-sdk or nexus-backend for hosted execution).
  • No scheduler. Cron, Airflow, or a wrapping process supplies the trigger.
  • No registry/UI. Workflows live as Python files in your repo.
  • No state store. Each run is independent; persistence is the user's responsibility (write to Postgres, S3, etc. via the node helpers; see the sketch after this list).
  • No CLI. It's a Python library. Compose with subprocess, make, or your own entrypoint if you need command-line invocation.
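
Persistence therefore lives inside the workflow itself. A minimal sketch, assuming the s3 helper takes operation/bucket/key keywords (those names are illustrative; the starter-set table only documents it as object storage read / write):

from athena_sdk_lite import Workflow
from athena_sdk_lite.nodes import local_file, s3

with Workflow("nightly-export") as wf:
    rows = local_file("load", path="report.csv")  # path= is assumed
    s3("persist", inputs=rows, operation="write",
       bucket="acme-exports", key="nightly/latest.json")

wf.run()  # each run stands alone; the written object is the only state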

When to use it vs. something heavier

Use this when                                      Reach for something heavier when
Workflow runs in one Python process                You need distributed execution across machines
You want to author and test locally                You need a UI / registry / scheduler
The 11 helpers + escape hatch cover your nodes     You need first-class support for many bespoke node types
You want stakeholders to read the workflow code    You need non-engineers to author workflows

Pointers

  • Architecture (how the parts fit): architecture.md
  • Reference / how to use each feature: technical.md
  • Worked examples: examples/01_pubmed_to_ai.py through examples/11_triage_pipeline.py
  • Public-API source files (read these first if you're contributing): src/athena_sdk_lite/workflow.py, nodes.py, _compat.py