The Agent Reliability Stack

Five small, focused libraries that fix the boring problems every long-running AI agent eventually hits. Pure Python, zero runtime deps. BYO LLM.

fit→ guard→ snap→ vet→ cast

The five libraries

🪟 fit

Fit a chat history into a token budget.

Drop-oldest, drop-middle, or priority-based truncation. Pluggable tokenizers. Preserves system + last-N messages.

PyPI · Token counter

🛡️ guard

Network egress firewall for agent tools.

Declarative allow/deny lists for any URL the agent tries to fetch. Throws on violation, before the request leaves your process.

PyPI

📸 snap

Diff two tool-call traces.

Catch silent regressions in agent pipelines: same input, different tools, different output. Snapshot tests for agents.

PyPI · Sample traces

✅ vet

Validate tool-call args before execution.

Wraps any tool. Returns LLM-friendly retry hints when args are wrong, so the model can self-correct.

PyPI

🎯 cast

Extract JSON from messy LLM output.

Tolerant extractor that handles fenced blocks, prose-wrapped JSON, refusals. Validates against a shape.

PyPI · JSON extractor

Install

pip install mk-agentkit         # all five
pip install agentfit-py         # individual

npm install @mukundakatta/agentkit   # all five
npm install @mukundakatta/agentfit   # individual

Try the live demos

agent-stack-demo — all 5 libs in one Space
token-counter — across 5 model families
json-extractor — for messy LLM text
pii-redactor — find emails / secrets / IDs
prompt-injection-detector — heuristic scanner
mcp-config-validator — sanity-check MCP configs

Datasets

13 public datasets covering agent traces, prompt injections, MCP configs, hallucination cases, and more. Browse all on HuggingFace →