harbor-framework/harbor-cookbook

Realistic examples of building evals and optimizing agents using Harbor.

Getting Started

Install Harbor:

Run any task recipe:

harbor run -p harbor_cookbook/recipes/ -a claude-code -m anthropic/claude-opus-4-6
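To try several recipes in a row, the command above can be wrapped in a small loop. This is only a sketch: it assumes each recipe lives in its own directory under harbor_cookbook/recipes/, named as in the Task Recipes table with plain ASCII hyphens, and it reuses only the -p, -a, and -m flags shown above. The harbor_cmd helper and the DRY_RUN variable are hypothetical names introduced for illustration. By default the script only prints the commands it would run.

```shell
# Sketch: print (or run) one `harbor run` command per recipe.
# Assumption: each recipe is a directory under harbor_cookbook/recipes/
# whose name matches the Task Recipes table (with ASCII hyphens).
set -euo pipefail

RECIPES_DIR="harbor_cookbook/recipes"
AGENT="claude-code"
MODEL="anthropic/claude-opus-4-6"

# Hypothetical helper: print the full command line for one recipe name.
harbor_cmd() {
  printf 'harbor run -p %s/%s -a %s -m %s\n' "$RECIPES_DIR" "$1" "$AGENT" "$MODEL"
}

for recipe in simple-task multi-container; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: show what would be run without invoking Harbor.
    harbor_cmd "$recipe"
  else
    harbor run -p "$RECIPES_DIR/$recipe" -a "$AGENT" -m "$MODEL"
  fi
done
```

Setting DRY_RUN=0 runs the commands instead of printing them; edit the list in the for loop to pick other recipes.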

Task Recipes

Name	Description
simple-task	Minimal single-container task.
multi-container	Docker Compose task where the agent interacts with a locally hosted REST API.
mcp-tools	Giving the agent custom tools via a locally hosted FastMCP server.
multi-reward	Multiple independent verifiers each producing their own score.
simulated-user	Agent discovers requirements by talking to a simulated user.
computer-use-ubuntu	Computer use reference implementation on an Ubuntu virtual desktop.
computer-use-windows	Computer use reference implementation on a remote Windows desktop (Daytona).
dns-blacklisting	Network-level hostname blacklisting with exact, wildcard, and regex rules.
skills	Giving agents access to custom skills.
multi-step	Ordered multi-step task with per-step instructions, tests, workdir uploads, healthcheck, early stopping, and per-step artifacts.

Optimization Examples

Name	Description
harbor-rl	RL training on Harbor tasks using harbor.rl + Tinker.
gepa	Agent harness optimization for MedAgentBench using Harbor + GEPA.
tinker-rl	RL training on Harbor tasks using the Tinker Cookbook integration.
prime-rl	RL training on Harbor tasks using Prime RL and Verifiers.
sky-rl	RL training on Harbor tasks using SkyRL.