POST /v1/chat/completions
curl --request POST \
  --url https://api.yutori.com/v1/chat/completions \
  --header "Authorization: Bearer <api_key>" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "n1-latest",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the screenshot and search for Yutori."
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
            }
          }
        ]
      }
    ]
  }'
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.",
        "tool_calls": [
          {
            "id": "chatcmpl-tool-abc123",
            "type": "function",
            "function": {
              "name": "left_click",
              "arguments": "{\"coordinates\": [640, 400]}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 56,
    "total_tokens": 1290
  },
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}

Overview

n1 is a pixels-to-actions LLM that predicts actions to interact with browser environments. It takes the user’s task in natural language, the current screenshot, and the full action history (including all previous screenshots) as input, and predicts the next action.
Want to try n1 without writing code? Check out the n1 Browser Extension for a quick way to test and explore n1’s capabilities in your own local browser.

Model Versions

  • n1-latest: Points to the latest stable model. Automatically updated when new versions are released. Currently points to n1-20260203.
  • n1-experimental: Points to the latest experimental research model. Model behavior and support may not be stable. May be deprecated silently at any time.
  • n1-20260203: Latest stable release.
We recommend using n1-latest for most use cases to automatically receive improvements as new versions are released.

Capabilities

n1 supports:
  • Mouse control: click, scroll, drag, and hover
  • Keyboard input: type text, use keyboard shortcuts
  • Control actions: wait, refresh, go_back, goto_url
  • Custom tool calls
n1 can be accessed through the OpenAI chat.completions API.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ]
        }
    ]
)

Screenshot Requirements

Screenshots should capture only the browser content itself. Do not include operating system UI, window title bars, browser tabs, URL bars, or other chrome elements; n1 is designed strictly as a browser-use model, not a general computer-use model. For best performance, render screenshots at WXGA (1280×800, 16:10). The model should generalize well to most other resolutions, but grounding accuracy may degrade as aspect ratios become more extreme. We recommend the WebP format for screenshots: it compresses significantly better than formats like PNG, which matters especially when sending multi-step trajectories with many images.

Chat Completions API

n1 expects screenshots as URLs or base64 strings. Screenshots can be included in:
  • User messages: For the initial task with the starting screenshot
  • Tool messages: For subsequent screenshots after executing actions
Passing a URL:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/53/Google_homepage_%28as_of_January_2024%29.jpg/1280px-Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ]
        }
    ]
)
Passing a Base64 image string:
from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "example.webp"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

Response Format

n1 returns actions via the tool_calls field in the response message. Example response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.",
        "tool_calls": [
          {
            "id": "chatcmpl-tool-abc123",
            "type": "function",
            "function": {
              "name": "left_click",
              "arguments": "{\"coordinates\": [640, 400]}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
The content field contains the model’s reasoning, while tool_calls contains the predicted action(s) to execute. The request_id is a unique identifier for the request, useful for debugging and support.
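A minimal parsing sketch for this response shape (field names follow the example above; `arguments` arrives as a JSON string and must be decoded):

```python
import json

def parse_action(response):
    """Extract the model's reasoning text and decoded tool calls
    from a chat.completions response dict."""
    message = response["choices"][0]["message"]
    reasoning = message["content"]
    actions = []
    for call in message.get("tool_calls") or []:
        actions.append({
            "id": call["id"],
            "name": call["function"]["name"],
            # arguments is a JSON-encoded string, not a dict
            "arguments": json.loads(call["function"]["arguments"]),
        })
    return reasoning, actions
```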

Multi-Turn Conversations

n1 expects full chat history to best predict the next action. We do not recommend removing any messages or screenshots when constructing requests. For multi-turn conversations, include the assistant’s previous response with its tool_calls, followed by tool results with the new screenshot:
response = client.chat.completions.create(
    model="n1-latest",
    messages=[
        # Initial user message with task and first screenshot
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Search for Yutori on Google."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_1}"
                    }
                }
            ]
        },
        # Assistant's response with tool call
        {
            "role": "assistant",
            "content": "I can see the Google homepage. I'll click on the search bar.",
            "tool_calls": [
                {
                    "id": "chatcmpl-tool-123",
                    "type": "function",
                    "function": {
                        "name": "left_click",
                        "arguments": "{\"coordinates\": [500, 465]}"
                    }
                }
            ]
        },
        # Tool result with current URL and new screenshot
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Current URL: https://www.google.com/"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_2}"
                    }
                }
            ]
        },
        # Assistant's next response
        {
            "role": "assistant",
            "content": "The search box is now active. I'll type the search query.",
            "tool_calls": [
                {
                    "id": "chatcmpl-tool-456",
                    "type": "function",
                    "function": {
                        "name": "type",
                        "arguments": "{\"text\": \"Yutori\", \"press_enter_after\": true}"
                    }
                }
            ]
        },
        # Next tool result with updated screenshot
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-456",
            "content": [
                {
                    "type": "text",
                    "text": "Current URL: https://www.google.com/search?q=Yutori"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_3}"
                    }
                }
            ]
        }
    ]
)
Including the current URL in the tool response for the default browser action tools is recommended (but not required) for better attribution of information sources. For custom tool calls, a custom tool response may be included instead to provide information to the model. Alternatively, the tool response can simply be "Tool call completed." to signal that the tool executed successfully.
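As a sketch, a small helper for building these tool-result messages (the helper name and its arguments are illustrative, not part of the API; `screenshot_b64` and `current_url` come from your own browser environment):

```python
def tool_result(tool_call_id, screenshot_b64, current_url=None):
    """Build a tool message carrying the new screenshot; includes the
    current URL when known, else the generic completion text."""
    text = f"Current URL: {current_url}" if current_url else "Tool call completed."
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {
                "url": f"data:image/webp;base64,{screenshot_b64}"}},
        ],
    }
```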

Prompting Guidance

We use the default system prompt when it is absent from the request, and we generally do not recommend providing custom system prompts, since extra behavioral instructions may degrade results. Instead, place any additional necessary instructions in the first user message, after the main task description. For example:
{
  "messages": [
    {
      "role": "user",
      "content": "Describe the screenshot. Additional instructions: Output a structured summary in Markdown."
    }
  ]
}
Also, we recommend not interrupting trajectory execution with additional user messages. The only exception is when you want the model to stop (e.g., when it has reached the maximum number of steps) and summarize its progress before it decides to stop on its own. The following prompt forces the model to stop and produce a summary:
messages.append(
    {
        "role": "user",
        "content": (
            f"Stop here. Summarize your current progress and list in detail all the findings relevant to the given task: {task}"
        ),
    }
)

Supported Actions

These are all the actions that n1-latest currently supports. Note that n1-latest outputs relative coordinates in 1000×1000, which should be converted to absolute coordinates when executing actions in a browser environment.
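The relative-to-absolute conversion can be sketched as follows (the WXGA viewport size is an assumption taken from the screenshot guidance above; substitute your own dimensions):

```python
def to_absolute(rel_xy, viewport_w=1280, viewport_h=800):
    """Scale coordinates from n1's 1000x1000 relative grid to
    absolute pixel positions in the actual viewport."""
    x, y = rel_xy
    return round(x * viewport_w / 1000), round(y * viewport_h / 1000)
```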
  • left_click: Left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "left_click", "arguments": { "coordinates": [500, 300] } }
  • double_click: Double left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "double_click", "arguments": { "coordinates": [500, 300] } }
  • triple_click: Triple left mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "triple_click", "arguments": { "coordinates": [500, 300] } }
  • right_click: Right mouse click at a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "right_click", "arguments": { "coordinates": [500, 300] } }
  • scroll: Scrolls the page in a given direction by a specified amount, centered around a given position.
    Arguments: coordinates: [x, y]; direction: "up" | "down" | "left" | "right"; amount: int
    Example: { "name": "scroll", "arguments": { "coordinates": [632, 500], "direction": "down", "amount": 3 } }
  • type: Types text into the currently focused input element, optionally clearing first or pressing Enter afterwards.
    Arguments: text: string; press_enter_after: boolean (optional); clear_before_typing: boolean (optional)
    Example: { "name": "type", "arguments": { "text": "example", "press_enter_after": true } }
  • key_press: Sends a keyboard input (Playwright-compatible key combination).
    Arguments: key_comb: string (compatible with Playwright keyboard press)
    Example: { "name": "key_press", "arguments": { "key_comb": "Escape" } }
  • hover: Hovers over a specific point on the page.
    Arguments: coordinates: [x, y]
    Example: { "name": "hover", "arguments": { "coordinates": [540, 210] } }
  • drag: Drags an element from a starting position to a target position.
    Arguments: start_coordinates: [x, y]; coordinates: [x, y]
    Example: { "name": "drag", "arguments": { "start_coordinates": [63, 458], "coordinates": [273, 458] } }
  • wait: Pauses execution without interacting, allowing the page to update.
    Arguments: (none)
    Example: { "name": "wait", "arguments": {} }
  • goto_url: Navigates directly to the specified URL.
    Arguments: url: string
    Example: { "name": "goto_url", "arguments": { "url": "https://example.com" } }
  • go_back: Navigates back to the previous page in browser history.
    Arguments: (none)
    Example: { "name": "go_back", "arguments": {} }
  • refresh: Reloads the current page.
    Arguments: (none)
    Example: { "name": "refresh", "arguments": {} }
Note that there is no explicitly defined stop tool in the default list. During task execution, when the model intends to stop, it will return a response with only content text and without any tool_calls. This content field is the model’s final response to the initial user task.
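Putting the stop convention together with the multi-turn pattern, a step loop can be sketched as below. This is a sketch, not a prescribed implementation; `execute_action` is a hypothetical callback you supply that runs one tool call in your browser and returns the current URL plus a base64 screenshot:

```python
def run_task(client, messages, execute_action, max_steps=30):
    """Drive n1 step by step until it stops on its own or the
    step budget runs out."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model="n1-latest", messages=messages
        )
        message = response.choices[0].message
        if not message.tool_calls:
            # No tool_calls means the model decided to stop;
            # content is its final answer to the task.
            return message.content
        messages.append(message)
        for call in message.tool_calls:
            url, screenshot_b64 = execute_action(call)
            # Tool result carries the current URL and new screenshot
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": [
                    {"type": "text", "text": f"Current URL: {url}"},
                    {"type": "image_url", "image_url": {
                        "url": f"data:image/webp;base64,{screenshot_b64}"}},
                ],
            })
    return None  # step budget exhausted without a final answer
```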

Custom Tools

You can provide additional tools alongside n1’s built-in browser actions:
response = client.chat.completions.create(
    model="n1-latest",
    messages=[...],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "extract_content_and_links",
                "description": "Extracts page content and hyperlinks relevant to the user task. This operation is strictly read-only and never interacts with or alters the page",
                "parameters": {
                    "type": "object",
                    "properties": {},
                    "required": [],
                }
            }
        },
    ]
)
Custom tool calls can return custom tool responses with information that helps n1 complete the task. Define whatever response seems appropriate after a successful tool call; if unsure, simply providing "Tool call completed." as the tool response is also acceptable.
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Tool call completed."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
For the example custom extract_content_and_links tool call above, an example custom tool response in the message history may look like:
        {
            "role": "tool",
            "tool_call_id": "chatcmpl-tool-123",
            "content": [
                {
                    "type": "text",
                    "text": "Visible buttons and corresponding links:\n- About https://about.google/?fg=1&utm_source=google-US&utm_medium=referral&utm_campaign=hp-header \n- Store https://store.google.com/us/?utm_source=hp_header&utm_medium=google_ooo&utm_campaign=GS100042&hl=en-US \nCurrent page URL (source): https://www.google.com/"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ]
        }
Any custom tools provided in tools will be automatically appended to the list of default browser-use actions.
[
    {
        "type": "function",
        "function": {
            "name": "left_click",
            "description": "Left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "double_click",
            "description": "Double left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "triple_click",
            "description": "Triple left mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "right_click",
            "description": "Right mouse click at a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to click.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "scroll",
            "description": "Scrolls the page in a given direction by a specified amount, centered around a given position.",
            "parameters": {
                "type": "object",
                "properties": {
                    "direction": {
                        "type": "string",
                        "description": "Direction to scroll (e.g., 'down', 'up', 'left', 'right').",
                    },
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location used as the scroll center.",
                        "items": {"type": "integer"},
                    },
                    "amount": {"type": "integer", "description": "Scroll amount (10% of screen size per unit)."},
                },
                "required": ["direction", "coordinates", "amount"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "type",
            "description": "Types text into the currently focused input, optionally clearing first or pressing Enter afterwards.",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {"type": "string", "description": "Text to type."},
                    "press_enter_after": {"type": "boolean", "description": "Whether to press Enter after typing."},
                    "clear_before_typing": {
                        "type": "boolean",
                        "description": "Whether to clear the field before typing.",
                    },
                },
                "required": ["text"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "key_press",
            "description": "Sends a keyboard input (Playwright-compatible key combination).",
            "parameters": {
                "type": "object",
                "properties": {"key_comb": {"type": "string", "description": "Keyboard combination to press."}},
                "required": ["key_comb"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "drag",
            "description": "Drags an element from a starting position to a target position.",
            "parameters": {
                "type": "object",
                "properties": {
                    "start_coordinates": {
                        "type": "array",
                        "description": "The [x, y] location of the starting position.",
                        "items": {"type": "integer"},
                    },
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location of the target position.",
                        "items": {"type": "integer"},
                    },
                },
                "required": ["start_coordinates", "coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "hover",
            "description": "Hovers over a specific point on the page.",
            "parameters": {
                "type": "object",
                "properties": {
                    "coordinates": {
                        "type": "array",
                        "description": "The [x, y] location to hover over.",
                        "items": {"type": "integer"},
                    }
                },
                "required": ["coordinates"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "go_back",
            "description": "Navigates back to the previous page in browser history.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "wait",
            "description": "Pauses execution without interacting, allowing the page to update.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "goto_url",
            "description": "Navigates directly to the specified URL.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string", "description": "Destination URL."}},
                "required": ["url"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "refresh",
            "description": "Reloads the current page.",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
]

Tool Choice

You can control whether n1 parses and returns tool calls as a tool_calls list using the tool_choice parameter:
  • "auto" (default): The model will parse and return tool calls as a tool_calls list in the model response
  • "none": The model will return the model response directly, treating it as raw content text without parsing (though tool calls may still appear inside <tool_call> tags in the content text)
response = client.chat.completions.create(
    model="n1-latest",
    messages=[...],
    tool_choice="none"  # Return model response directly as content text
)
An example model response when tool_choice is set to “none”:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "n1-latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "I can see the Google homepage. I'll click on the search bar to begin searching for Yutori.\n<tool_call>\n{\"name\": \"left_click\", \"arguments\": {\"coordinates\": [640, 400]}}\n</tool_call>",
        "tool_calls": []
      },
      "finish_reason": "tool_calls"
    }
  ],
  "request_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
}
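When working with the raw content in this mode, the embedded tool calls can be recovered with a small parser. This regex-based sketch assumes the `<tool_call>` tag format shown in the example above:

```python
import json
import re

# Match a JSON object wrapped in <tool_call>...</tool_call> tags
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(content):
    """Parse all <tool_call> payloads out of raw content text."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(content)]
```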

Structured Decoding

By default, n1 calls use a response-format schema with structural tags that constrains the model to generating only valid tool calls.
{
    "type": "structural_tag",
    "structures": [
        {
            "begin": "<tool_call>\n",
            "end": "\n</tool_call>",
            "schema": {
                "anyOf": [
                    # Default browser tools — untyped arguments to avoid guided
                    # decoding interference with the model's natural output
                    {
                        "type": "object",
                        "properties": {
                            "name": {
                                "enum": [
                                    "left_click", "double_click", "triple_click",
                                    "right_click", "scroll", "type", "key_press",
                                    "drag", "hover", "go_back", "wait",
                                    "goto_url", "refresh"
                                ]
                            },
                            "arguments": {"type": "object"}
                        },
                        "required": ["name", "arguments"]
                    },
                    # Custom tool definitions are added as strict per-tool branches
                    {
                        "type": "object",
                        "properties": {
                            "name": {"const": "extract_content_and_links"},
                            "arguments": {
                                "type": "object",
                                "additionalProperties": false,
                                "properties": {}
                            }
                        },
                        "required": ["name", "arguments"]
                    }
                ]
            }
        }
    ],
    "triggers": ["<tool_call>\n"]
}
Note: you do not have to provide response_format manually; the API generates the proper response schema by default. The snippet above is an example showing the schema structure. If custom tools are included in your request, our API automatically adds their schemas as additional anyOf branches. We do not recommend providing a custom response_format, which overrides the default response format and may impact model output parsing and degrade performance. If a custom response format is absolutely necessary, we suggest also setting tool_choice="none" and working with the raw model output directly.

Authorizations

Authorization
string
header
required

Use Authorization: Bearer <api_key>

Body

application/json
messages
(ChatCompletionDeveloperMessageParam · object | ChatCompletionSystemMessageParam · object | ChatCompletionUserMessageParam · object | ChatCompletionAssistantMessageParam · object | ChatCompletionToolMessageParam · object | ChatCompletionFunctionMessageParam · object | ChatCompletionToolImageMessageParam · object | ChatCompletionObservationMessageParam · object)[]
model
enum<string>
Available options:
n1-latest,
n1-experimental,
n1-20260203
max_completion_tokens
integer | null
temperature
number | null
default:0.3
top_p
number | null
tools
Tools · object[] | null

Additional tools to extend the default browser action tools. Tools are merged with the built-in browser actions (left_click, scroll, type, etc.).

tool_choice
default:auto

Controls whether tool calls are parsed from the response; the model always decides on its own whether to call a tool. 'auto' (default) parses tool calls into the tool_calls list in the response. 'none' treats the response as text-only, though tool calls may still appear inside <tool_call> tags in the content.

response_format
Response Format · object

An object specifying the format that the model must output.

Response

Successful Response