Get Started
Overview
The Navigator API provides a computer-use model family that predicts actions to interact with a browser. Given a task in natural language, the current screenshot, and the full action history, the model predicts the next action to take to accomplish the task. The API follows the OpenAIchat.completions format. The latest model is:
| Model | API model id | Description |
|---|---|---|
| Navigator n1.5 | n1.5-latest | Computer-use model with expanded action space, selectable tool sets, and JSON structured output. Details → |
Screenshot Requirements
Screenshots should capture only the browser content itself. Do not include the operating system UI, window title bars, browser tabs, URL bars, or other chrome elements. For the best performance, render screenshots in WXGA (1280×800, 16:10). The model should generalize well to most other resolutions, but grounding accuracy may degrade with extreme aspect ratios. We recommend using the WebP format for screenshots, as it offers significantly better compression than PNG — especially for multi-step trajectories with many images. The Python SDK provides helpers that handle resizing, WebP conversion, and base64 encoding:yutori.navigator.images for full options (custom resolution, quality settings).
Coordinate System
All Navigator models output normalized coordinates in a 1000×1000 space. Convert to absolute pixel coordinates before executing actions in your browser:yutori.navigator.coordinates for the inverse normalize_coordinates function.
Response Format
Actions are returned via thetool_calls field in the response message:
content field contains the model’s reasoning, tool_calls contains the predicted action(s), and request_id is a unique identifier useful for debugging.
When the model intends to stop, it returns a response with only content text and no tool_calls. This content field is the model’s final response to the task.
Multi-Turn Conversations
The model expects full chat history to best predict the next action. We do not recommend removing any messages when constructing requests. For longer trajectories, we suggest dropping only old screenshots while keeping every message intact. The Python SDK providescreate_trimmed / acreate_trimmed (in yutori.navigator) which strip older screenshots automatically while preserving recent ones and all text.
Include the assistant’s previous response with its tool_calls, followed by tool results with the new screenshot:
extract_elements, execute_js), the raw output of the tool can be provided as the tool result so the model also has visibility of the extracted information.
Custom Tools
You can provide additional tools alongside the built-in browser actions using thetools parameter. Custom tools are appended after the default tool set.
Tool Choice
Control whether tool calls are parsed into thetool_calls array:
"auto"(default): Parses and returns tool calls as a structuredtool_callslist"none": Returns the raw model response as content text (tool calls may still appear inside<tool_call>XML tags in content)
Prompting Guidance
We use a default system prompt when none is provided, and generally do not recommend providing custom system prompts — extra behavioral instructions may degrade results. Instead, place additional instructions in the first user message, after the main task description:Structured Decoding
By default, the API uses astructural_tag response format to enforce valid tool call generation via guided decoding. You do not need to provide this yourself — the API generates it automatically based on the active model and tool set. If custom tools are included in your request, their schemas are automatically incorporated.
We do not recommend overriding the response_format unless you also set tool_choice="none" to work with the raw model output directly.Authorizations
Use Authorization: Bearer <api_key>
Body
Developer-provided instructions that the model should follow, regardless of
messages sent by the user. With o1 models and newer, developer messages
replace the previous system messages.
- ChatCompletionDeveloperMessageParam
- ChatCompletionSystemMessageParam
- ChatCompletionUserMessageParam
- ChatCompletionAssistantMessageParam
- ChatCompletionToolMessageParam
- ChatCompletionFunctionMessageParam
- ChatCompletionToolImageMessageParam
- ChatCompletionObservationMessageParam
n1.5-latest, n1.5-20260428 Penalizes token repetition. 1.0 = no penalty, >1.0 = less repetition. Only supported by vLLM-backed models.
Additional tools to extend the default browser action tools. Tools are merged with the built-in browser actions (left_click, scroll, type, etc.).
Controls whether tool calls are parsed from the response. Model always decides whether to call a tool. 'none' treats the response as text-only, but tool calls may be present inside <tool_call> tags, 'auto' (default) parses tool calls automatically as tool_calls list in response.
An object specifying the format that the model must output.
Named tool set (n1.5+ models only). 'browser_tools_core-20260403' (default): coordinate-based tools. 'browser_tools_expanded-20260403': adds extract_elements, find, set_element_value, execute_js.
List of tool names to remove from the selected tool set (n1.5+ models only).
JSON Schema for structured output (n1.5+ models only). Appended to your task message. Model returns JSON in ```json fences, parsed and returned as 'parsed_json' in the response.
Response
Successful Response