POST /v1/chat/completions
curl --request POST \
  --url https://api.yutori.com/v1/chat/completions \
  --header "Authorization: Bearer <api_key>" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "n1-latest",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the screenshot and search for Yutori."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
            }
          }
        ]
      }
    ]
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.",
        "tool_calls": [
          {
            "id": "chatcmpl-tool-abc123",
            "type": "function",
            "function": {
              "name": "left_click",
              "arguments": "{\"coordinates\": [640, 400]}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 56,
    "total_tokens": 1290
  },
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Overview

n1 is a pixels-to-actions LLM that predicts actions to interact with browser environments. It takes the user’s task in natural language, the current screenshot, and the full action history (including all previous screenshots) as input, and predicts the next action.
Want to try n1 without writing code? Check out the n1 Browser Extension for a quick way to test and explore n1’s capabilities in your own local browser.

Model Versions

  • n1-latest: Points to the latest stable model. Automatically updated when new versions are released. Currently points to n1-20260203.
  • n1-experimental: Points to the latest experimental research model. Model behavior and support may not be stable. May be deprecated silently at any time.
  • n1-20260203: Latest stable release.
We recommend using n1-latest for most use cases to automatically receive improvements as new versions are released.

Capabilities

n1 supports:
  • Mouse control: click, scroll, drag, and hover
  • Keyboard input: type text, use keyboard shortcuts
  • Control actions: wait, refresh, go_back, goto_url
  • Custom tool calls
n1 can be accessed through the OpenAI chat.completions API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ]
        }
    ]
)

Screenshot Requirements

Screenshots should capture only the browser content itself. Do not include operating system UI, window title bars, browser tabs, URL bars, or other chrome elements; n1 is designed strictly as a browser-use model, not a general computer-use model. For best performance, render screenshots at WXGA (1280×800, 16:10). The model should generalize well to most other resolutions, but grounding accuracy may degrade as aspect ratios become more extreme. We recommend the WebP format for screenshots: it compresses significantly better than formats like PNG, which matters especially when sending multi-step trajectories with many images.

Chat Completions API

n1 expects screenshots as URLs or base64 strings. Screenshots can be included in:
  • User messages: For the initial task with the starting screenshot
  • Tool messages: For subsequent screenshots after executing actions
Passing a URL:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ]
        }
    ]
)
Passing a Base64 image string:
from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "example.webp"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

Response Format

n1 returns actions via the tool_calls field in the response message. Example response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.",
        "tool_calls": [
          {
            "id": "chatcmpl-tool-abc123",
            "type": "function",
            "function": {
              "name": "left_click",
              "arguments": "{\"coordinates\": [640, 400]}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
The content field contains the model’s reasoning, while tool_calls contains the predicted action(s) to execute. The request_id is a unique identifier for the request, useful for debugging and support.
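A minimal parsing sketch for this response shape (field names follow the example above; `arguments` arrives as a JSON string and must be decoded):

```python
import json

def parse_action(response):
    """Extract the model's reasoning text and decoded tool calls
    from a chat.completions response dict."""
    message = response["choices"][0]["message"]
    reasoning = message["content"]
    actions = []
    for call in message.get("tool_calls") or []:
        actions.append({
            "id": call["id"],
            "name": call["function"]["name"],
            # arguments is a JSON-encoded string, not a dict
            "arguments": json.loads(call["function"]["arguments"]),
        })
    return reasoning, actions
```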

Multi-Turn Conversations

n1 expects full chat history to best predict the next action. We do not recommend removing any messages or screenshots when constructing requests. For multi-turn conversations, include the assistant’s previous response with its tool_calls, followed by tool results with the new screenshot:
response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        # Initial user message with task and first screenshot
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Search for Yutori on Google."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_1}"
                    }
                }
            ]
        },
        # Assistant's response with tool call
        {
            "role": "assistant",
            "content": "I can see the Google homepage. I'll click on the search bar.",
            "tool_calls": [
                {
                    "id": "chatcmpl-tool-123",
                    "type": "function",
                    "function": {
                        "name": "left_click",
                        "arguments": "{\"coordinates\": [500, 465]}"
                    }
                }
            ]
        },
        # Tool result with current URL and new screenshot
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Current URL: https://www.google.com/"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_2}"
                    }
                }
            ]
        },
        # Assistant's next response
        {
            "role": "assistant",
            "content": "The search box is now active. I'll type the search query.",
            "tool_calls": [
                {
                    "id": "chatcmpl-tool-456",
                    "type": "function",
                    "function": {
                        "name": "type",
                        "arguments": "{\"text\": \"Yutori\", \"press_enter_after\": true}"
                    }
                }
            ]
        },
        # Next tool result with updated screenshot
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-456",
            "content": [
                {
                    "type": "text",
                    "text": "Current URL: https://www.google.com/search?q=Yutori"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_3}"
                    }
                }
            ]
        }
    ]
)
Including the current URL in the tool response for the default browser action tools is recommended (but not required) for better attribution of information sources. For custom tool calls, a custom tool response may be included instead to provide information to the model. Alternatively, the tool response can simply be "Tool call completed." to signal that the tool executed successfully.
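As a sketch, a small helper for building these tool-result messages (the helper name and its arguments are illustrative, not part of the API; `screenshot_b64` and `current_url` come from your own browser environment):

```python
def tool_result(tool_call_id, screenshot_b64, current_url=None):
    """Build a tool message carrying the new screenshot; includes the
    current URL when known, else the generic completion text."""
    text = f"Current URL: {current_url}" if current_url else "Tool call completed."
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {
                "url": f"data:image/webp;base64,{screenshot_b64}"}},
        ],
    }
```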

Prompting Guidance

We use the default system prompt when it is absent from the request, and we generally do not recommend providing custom system prompts, since extra behavioral instructions may degrade results. Instead, place any additional necessary instructions in the first user message, after the main task description. For example:
{
  "messages": [
    {
      "role": "user",
      "content": "Describe the screenshot. Additional instructions: Output a structured summary in Markdown."
    }
  ]
}
Also, we recommend not interrupting trajectory execution with additional user messages. The only exception is when you want the model to stop (e.g., when it has reached the maximum number of steps) and summarize its progress before it decides to stop on its own. The following prompt forces the model to stop and produce a summary:
messages.append(
    {
        "role": "user",
        "content": (
            f"Stop here. Summarize your current progress and list in detail all the findings relevant to the given task: {task}"
        ),
    }
)

Supported Actions

These are all the actions that n1-latest currently supports. Note that n1-latest outputs relative coordinates in 1000×1000, which should be converted to absolute coordinates when executing actions in a browser environment.
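The relative-to-absolute conversion can be sketched as follows (the WXGA viewport size is an assumption taken from the screenshot guidance above; substitute your own dimensions):

```python
def to_absolute(rel_xy, viewport_w=1280, viewport_h=800):
    """Scale coordinates from n1's 1000x1000 relative grid to
    absolute pixel positions in the actual viewport."""
    x, y = rel_xy
    return round(x * viewport_w / 1000), round(y * viewport_h / 1000)
```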
  • left_click: Left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "left_click", "arguments": { "coordinates": [500, 300] } }
  • double_click: Double left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "double_click", "arguments": { "coordinates": [500, 300] } }
  • triple_click: Triple left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "triple_click", "arguments": { "coordinates": [500, 300] } }
  • right_click: Right mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "right_click", "arguments": { "coordinates": [500, 300] } }
  • scroll: Scrolls the page in a given direction by a specified amount, centered around a given position.
    Arguments: coordinates: [x, y]; direction: "up" | "down" | "left" | "right"; amount: int
    Example: { "name": "scroll", "arguments": { "coordinates": [632, 500], "direction": "down", "amount": 3 } }
  • type: Types text into the currently focused input element, optionally clearing first or pressing Enter afterwards.
    Arguments: text: string; press_enter_after: boolean (optional); clear_before_typing: boolean (optional)
    Example: { "name": "type", "arguments": { "text": "example", "press_enter_after": true } }
  • key_press: Sends a keyboard input (Playwright-compatible key combination).
    Arguments: key_comb: string (compatible with Playwright keyboard press)
    Example: { "name": "key_press", "arguments": { "key_comb": "Escape" } }
  • hover: Hovers over a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "hover", "arguments": { "coordinates": [540, 210] } }
  • drag: Drags an element from a starting position to a target position.
    Arguments: start_coordinates: [x, y]; coordinates: [x, y]
    Example: { "name": "drag", "arguments": { "start_coordinates": [63, 458], "coordinates": [273, 458] } }
  • wait: Pauses execution without interacting, allowing the page to update.
    Arguments: (none)
    Example: { "name": "wait", "arguments": {} }
  • goto_url: Navigates directly to the specified URL.
    Arguments: url: string
    Example: { "name": "goto_url", "arguments": { "url": "https://example.com" } }
  • go_back: Navigates back to the previous page in browser history.
    Arguments: (none)
    Example: { "name": "go_back", "arguments": {} }
  • refresh: Reloads the current page.
    Arguments: (none)
    Example: { "name": "refresh", "arguments": {} }
Note that there is no explicitly defined stop tool in the default list. During task execution, when the model intends to stop, it will return a response with only content text and without any tool_calls. This content field is the model’s final response to the initial user task.
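Putting the stop convention together with the multi-turn pattern, a step loop can be sketched as below. This is a sketch, not a prescribed implementation; `execute_action` is a hypothetical callback you supply that runs one tool call in your browser and returns the current URL plus a base64 screenshot:

```python
def run_task(client, messages, execute_action, max_steps=30):
    """Drive n1 step by step until it stops on its own or the
    step budget runs out."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="n1-latest", messages=messages
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # No tool_calls means the model decided to stop;
            # content is its final answer to the task.
            return message.content
        messages.append(message)
        for call in message.tool_calls:
            url, screenshot_b64 = execute_action(call)
            # Tool result carries the current URL and new screenshot
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": [
                    {"type": "text", "text": f"Current URL: {url}"},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/webp;base64,{screenshot_b64}"}},
                ],
            })
    return None  # step budget exhausted without a final answer
```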

Custom Tools

You can provide additional tools alongside n1’s built-in browser actions:
response = client.chat.completions.create(
    model="n1-latest",
    messages=[...],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "extract_content_and_links",
                "description": "Extracts page content and hyperlinks relevant to the user task. This operation is strictly read-only and never interacts with or alters the page",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                }
            }
        },
    ]
)
Custom tool calls can return custom tool responses with information that helps n1 complete the task. Define whatever response seems appropriate after a successful tool call; if unsure, simply providing "Tool call completed." as the tool response is also acceptable.
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Tool call completed."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
For the example custom extract_content_and_links tool call above, an example custom tool response in the message history may look like:
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Visible buttons and corresponding links:\n- About https://about.google/?fg=1&utm_source=google-US&utm_medium=referral&utm_campaign=hp-header \n- Store https://store.google.com/us/?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-US \nCurrent page URL (source): https://www.google.com/"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
Any custom tools provided in tools will be automatically appended to the list of default browser-use actions.
[
    {
        "type": "function",
        "function": {
            "name": "left_click",
            "description": "Left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "double_click",
            "description": "Double left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "triple_click",
            "description": "Triple left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "right_click",
            "description": "Right mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scroll",
            "description": "Scrolls the page in a given direction by a specified amount, centered around a given position.",
            "parameters": {
                "type": "object",
                "properties": {
                    "direction": {
                        "type": "string",
                        "description": "Direction to scroll (e.g., 'down', 'up', 'left', 'right').",
                    },
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location used as the scroll center.",
                        "items": {"type": "integer"},
                    },
                    "amount": {"type": "integer", "description": "Scroll amount (10% of screen size per unit)."},
                },
                "required": ["direction", "coordinates", "amount"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type",
            "description": "Types text into the currently focused input, optionally clearing first or pressing Enter afterwards.",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {"type": "string", "description": "Text to type."},
                    "press_enter_after": {"type": "boolean", "description": "Whether to press Enter after typing."},
                    "clear_before_typing": {
                        "type": "boolean",
                        "description": "Whether to clear the field before typing.",
                    },
                },
                "required": ["text"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "key_press",
            "description": "Sends a keyboard input (Playwright-compatible key combination).",
            "parameters": {
                "type": "object",
                "properties": {"key_comb": {"type": "string", "description": "Keyboard combination to press."}},
                "required": ["key_comb"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "drag",
            "description": "Drags an element from a starting position to a target position.",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_coordinates": {
                        "type": "array",
                        "description": "The [x, y] location of the starting position.",
                        "items": {"type": "integer"},
                    },
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location of the target position.",
                        "items": {"type": "integer"},
                    },
                },
                "required": ["start_coordinates", "coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "hover",
            "description": "Hovers over a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to hover over.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "go_back",
            "description": "Navigates back to the previous page in browser history.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "wait",
            "description": "Pauses execution without interacting, allowing the page to update.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "goto_url",
            "description": "Navigates directly to the specified URL.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string", "description": "Destination URL."}},
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "refresh",
            "description": "Reloads the current page.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
]

Tool Choice

You can control whether n1 parses and returns tool calls as a tool_calls list using the tool_choice parameter:
  • "auto" (default): The model will parse and return tool calls as a tool_calls list in the model response
  • "none": The model will return the model response directly, treating it as raw content text without parsing (though tool calls may still appear inside <tool_call> tags in the content text)
response = client.chat.completions.create(
    model="n1-latest",
    messages=[...],
    tool_choice="none"  # Return model response directly as content text
)
An example model response when tool_choice is set to “none”:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.\n<tool_call>\n{\"name\": \"left_click\", \"arguments\": {\"coordinates\": [640, 400]}}\n</tool_call>",
        "tool_calls": []
      },
      "finish_reason": "tool_calls"
    }
  ],
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
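When working with the raw content in this mode, the embedded tool calls can be recovered with a small parser. This regex-based sketch assumes the `<tool_call>` tag format shown in the example above:

```python
import json
import re

# Match a JSON object wrapped in <tool_call>...</tool_call> tags
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(content):
    """Parse all <tool_call> payloads out of raw content text."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(content)]
```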

Structured Decoding

By default, n1 calls use a response-format schema with structural tags that constrains the model to generating only valid tool calls.
{
    "type": "structural_tag",
    "structures": [
        {
            "begin": "<tool_call>\n",
            "end": "\n</tool_call>",
            "schema": {
                "anyOf": [
                    # Default browser tools — untyped arguments to avoid guided
                    # decoding interference with the model's natural output
                    {
                        "type": "object",
                        "properties": {
                            "name": {
                                "enum": [
                                    "left_click", "double_click", "triple_click",
                                    "right_click", "scroll", "type", "key_press",
                                    "drag", "hover", "go_back", "wait",
                                    "goto_url", "refresh"
                                ]
                            },
                            "arguments": {"type": "object"}
                        },
                        "required": ["name", "arguments"]
                    },
                    # Custom tool definitions are added as strict per-tool branches
                    {
                        "type": "object",
                        "properties": {
                            "name": {"const": "extract_content_and_links"},
                            "arguments": {
                                "type": "object",
                                "additionalProperties": false,
                                "properties": {}
                            }
                        },
                        "required": ["name", "arguments"]
                    }
                ]
            }
        }
    ],
    "triggers": ["<tool_call>\n"]
}
Note: you do not have to provide response_format manually; the API generates the proper response schema by default. The snippet above is an example showing the schema structure. If custom tools are included in your request, our API automatically adds their schemas as additional anyOf branches. We do not recommend providing a custom response_format, which overrides the default response format and may impact model output parsing and degrade performance. If a custom response format is absolutely necessary, we suggest also setting tool_choice="none" and working with the raw model output directly.

Authorizations

Authorization
string
header
required

Use Authorization: Bearer <api_key>

Body

application/json
messages
(ChatCompletionDeveloperMessageParam · object | ChatCompletionSystemMessageParam · object | ChatCompletionUserMessageParam · object | ChatCompletionAssistantMessageParam · object | ChatCompletionToolMessageParam · object | ChatCompletionFunctionMessageParam · object | ChatCompletionToolImageMessageParam · object | ChatCompletionObservationMessageParam · object)[]
model
enum<string>
Available options:
n1-latest,
n1-experimental,
n1-20260203
max_completion_tokens
integer | null
temperature
number | null
default:0.3
top_p
number | null
tools
Tools · object[] | null

Additional tools to extend the default browser action tools. Tools are merged with the built-in browser actions (left_click, scroll, type, etc.).

tool_choice
default:auto

Controls whether tool calls are parsed from the response; the model always decides on its own whether to call a tool. 'auto' (default) parses tool calls into the tool_calls list in the response. 'none' treats the response as text-only, though tool calls may still appear inside <tool_call> tags in the content.

response_format
Response Format · object

An object specifying the format that the model must output.

Response

Successful Response