Fine-Tuning

Fine-tuning jobs let an agent analyze evaluation results, propose and save an improved agent version, and then run verification against the same evaluation sets, so you can inspect what changed and whether the score improved.

Create agent-led improvement jobs from one or more evaluation sets and verify the generated agent version automatically.

Fine-tuning job model

A job references the agent to improve, the computer that should run the work, one or more evaluation sets, and optional instructions. Verification always runs after the new agent version is generated; there is no verification toggle in the API or SDK.

Choose one agent, one computer, and one or more evaluation sets.

Use optional instructions to focus the improvement on a behavior, rubric, domain, or failure class.

The job stores the thread ID, before runs, after runs, created agent version, score deltas, and CT cost.

Verification always runs after the candidate version is created.
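
The job model above can be sketched as a small request type plus a score-delta helper. This is a minimal illustration, not the actual schema: the class name, the field names (agent_id, computer_id, evaluation_set_ids, instructions), and the score_deltas helper are all hypothetical. There is deliberately no verification field, since verification cannot be turned off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FineTuningJobRequest:
    """Hypothetical request shape; field names are assumptions, not the real schema."""

    agent_id: str                       # the one agent to improve
    computer_id: str                    # the one computer that runs the work
    evaluation_set_ids: list[str]       # one or more evaluation sets
    instructions: Optional[str] = None  # optional focus: behavior, rubric, domain, failure class

    def validate(self) -> None:
        """Check the constraints described in the job model."""
        if not self.agent_id:
            raise ValueError("an agent is required")
        if not self.computer_id:
            raise ValueError("a computer is required")
        if not self.evaluation_set_ids:
            raise ValueError("at least one evaluation set is required")


def score_deltas(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-evaluation-set change in score (after minus before).

    Only sets that appear in both the before and after results are compared.
    """
    return {
        set_id: after[set_id] - before[set_id]
        for set_id in before
        if set_id in after
    }
```

A positive delta for an evaluation set means the generated agent version scored higher on that set than the original did.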

Create and inspect jobs

Use the fine-tuning manager to list jobs, create jobs, inspect progress, cancel active work, and delete stale history.

JavaScript uses client.fineTuning.listJobs(), getJob(), createJob(), cancelJob(), and deleteJob().

Python uses client.fine_tuning.list_jobs(), get_job(), create_job(), cancel_job(), and delete_job().

Direct HTTP uses /v1/fine-tuning/jobs.

Create a fine-tuning job
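
A minimal Python sketch of job creation follows. Only the method name client.fine_tuning.create_job() comes from this page; the wrapper function create_improvement_job and the keyword argument names (agent_id, computer_id, evaluation_set_ids, instructions) are assumptions and may not match the real SDK. The client is passed in, so the sketch makes no claim about how it is constructed.

```python
from typing import Any, Optional


def create_improvement_job(
    client: Any,
    agent_id: str,
    computer_id: str,
    evaluation_set_ids: list[str],
    instructions: Optional[str] = None,
) -> Any:
    """Create a fine-tuning job through client.fine_tuning.create_job().

    The keyword argument names below are assumed for illustration;
    check the SDK reference for the actual parameter names.
    """
    if not evaluation_set_ids:
        raise ValueError("at least one evaluation set is required")

    params: dict[str, Any] = {
        "agent_id": agent_id,
        "computer_id": computer_id,
        "evaluation_set_ids": list(evaluation_set_ids),
    }
    # Instructions are optional, so only send them when provided.
    if instructions:
        params["instructions"] = instructions

    # No verification flag is passed: verification always runs.
    return client.fine_tuning.create_job(**params)
```

Given a configured client, a call might look like create_improvement_job(client, "agent_1", "computer_1", ["evals_a"], instructions="Focus on refund-policy failures"), where every ID is a placeholder.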

Fine-tuning over HTTP
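
The sketch below builds, but does not send, a create request using only the Python standard library. The path /v1/fine-tuning/jobs comes from this page; everything else is an assumption made for illustration: the POST method, the Bearer authorization header, the JSON body field names, and the placeholder base URL.

```python
import json
import urllib.request
from typing import Optional

JOBS_PATH = "/v1/fine-tuning/jobs"  # endpoint path named on this page


def build_create_job_request(
    base_url: str,
    api_key: str,
    agent_id: str,
    computer_id: str,
    evaluation_set_ids: list[str],
    instructions: Optional[str] = None,
) -> urllib.request.Request:
    """Build an HTTP request that would create a fine-tuning job.

    Assumed, not confirmed by the docs: POST method, Bearer auth,
    and the JSON field names in the body.
    """
    if not evaluation_set_ids:
        raise ValueError("at least one evaluation set is required")

    body = {
        "agent_id": agent_id,
        "computer_id": computer_id,
        "evaluation_set_ids": list(evaluation_set_ids),
    }
    if instructions:
        body["instructions"] = instructions

    return urllib.request.Request(
        url=base_url.rstrip("/") + JOBS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen() once the base URL, credentials, and body fields have been checked against the API reference.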