harbor-framework/harbor-cookbook

Realistic examples of building evals and optimizing agents using Harbor.

Getting Started

Install Harbor:

Run any task recipe:

harbor run -p harbor_cookbook/recipes/ -a claude-code -m anthropic/claude-opus-4-6
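
As a minimal sketch, the command above can be parameterized per recipe. This assumes each recipe lives in its own directory under harbor_cookbook/recipes/, named as in the Task Recipes list below; that layout is an assumption, not something stated here. `recipe_cmd` is a hypothetical helper that only prints the command (a dry run), so it works without Harbor installed.

```shell
#!/bin/sh
# Hypothetical helper: print the `harbor run` invocation for one recipe.
# Assumption: the recipe name is appended to harbor_cookbook/recipes/.
# The agent and model default to the values used in the command above.
recipe_cmd() {
  printf 'harbor run -p harbor_cookbook/recipes/%s -a %s -m %s\n' \
    "$1" "${2:-claude-code}" "${3:-anthropic/claude-opus-4-6}"
}

# Dry run: show the command for the simple-task recipe.
recipe_cmd simple-task
```

Review the printed command and, if the path matches your checkout, run it as-is. Passing a second or third argument swaps in a different agent or model.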

Task Recipes

Name                  Description
simple-task           Minimal single-container task.
multi-container       Docker Compose task where the agent interacts with a locally hosted REST API.
mcp-tools             Giving the agent custom tools via a locally hosted FastMCP server.
multi-reward          Multiple independent verifiers each producing their own score.
simulated-user        Agent discovers requirements by talking to a simulated user.
computer-use-ubuntu   Computer use reference implementation on an Ubuntu virtual desktop.
computer-use-windows  Computer use reference implementation on a remote Windows desktop (Daytona).
dns-blacklisting      Network-level hostname blacklisting with exact, wildcard, and regex rules.
skills                Giving agents access to custom skills.
multi-step            Ordered multi-step task with per-step instructions, tests, workdir uploads, healthcheck, early stopping, and per-step artifacts.

Optimization Examples

Name       Description
harbor-rl  RL training on Harbor tasks using harbor.rl + Tinker.
gepa       Agent harness optimization for MedAgentBench using Harbor+GEPA.
tinker-rl  RL training on Harbor tasks using the Tinker Cookbook integration.
prime-rl   RL training on Harbor tasks using Prime RL and Verifiers.
sky-rl     RL training on Harbor tasks using SkyRL.