POST /v1/chat/completions
curl --request POST \
  --url https://api.yutori.com/v1/chat/completions \
  --header "Authorization: Bearer <api_key>" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "n1-preview-2025-11",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "Describe the screenshot and search for Yutori."
          }
        ]
      },
      {
        "role": "observation",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/5/53/Google_homepage_%28as_of_January_2024%29.jpg"
            }
          }
        ]
      }
    ],
    "temperature": 0.3
  }'
Validation error response (schema placeholders):
{
  "detail": [
    {
      "loc": [
        "<string>"
      ],
      "msg": "<string>",
      "type": "<string>"
    }
  ]
}

Overview

n1 is a pixels-to-actions LLM that predicts actions to interact with browser environments. It takes the user’s task in natural language, the current screenshot, and the full action history (including all previous screenshots) as input, and predicts the next action. It supports:
  • Mouse control: click, scroll, drag, and hover
  • Keyboard input: type text, use keyboard shortcuts
  • Control actions: wait, refresh, go_back, goto_url, and stop
n1 can be accessed through an OpenAI-compatible chat.completions API.
from openai import OpenAI
import os

# Read the API key from the environment (the variable name here is
# illustrative; use wherever you store your key).
api_key = os.environ["YUTORI_API_KEY"]

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-preview-2025-11",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                }
            ]
        },
        {
            "role": "observation",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/5/53/Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ],
        }
    ]
)
For n1-preview-2025-11, we use the observation role to pass in screenshots.

Screenshot Requirements

Screenshots should capture only the browser content itself. Do not include operating system UI, window title bars, browser tabs, URL bars, or other browser chrome — n1 is designed strictly as a browser-use model, not a general computer-use model. For best performance, render screenshots in WXGA (1280×800, 16:10). Screenshots in other resolutions will not affect reasoning, but grounding accuracy may degrade for n1-preview-2025-11; future models will be robust to other resolutions. We recommend the WebP format for screenshots, as it offers significantly better compression than formats like PNG — especially when sending multi-step trajectories with many images.
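As a sketch of the resizing step (the helper name and the aspect-preserving fit are our own choices, not part of the API), the output dimensions for an arbitrary capture can be computed like this:

```python
def fit_to_wxga(width, height, target=(1280, 800)):
    """Compute output dimensions that fit a capture inside WXGA (1280x800)
    while preserving aspect ratio. Illustrative helper, not part of the API."""
    scale = min(target[0] / width, target[1] / height)
    return (round(width * scale), round(height * scale))
```

A 2560×1600 capture maps exactly onto 1280×800; a 1920×1080 capture fits within the frame at 1280×720.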

Chat Completions API

n1 expects screenshots in observation blocks, passed either as URLs or Base64-encoded strings. Passing a URL:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

response = client.chat.completions.create(
    model="n1-preview-2025-11",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                }
            ]
        },
        {
            "role": "observation",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/5/53/Google_homepage_%28as_of_January_2024%29.jpg"
                    }
                }
            ],
        }
    ]
)
Passing a Base64 image string:
from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.yutori.com/v1",
    api_key=api_key,
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "example.webp"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="n1-preview-2025-11",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                }
            ]
        },
        {
            "role": "observation",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image}"
                    }
                }
            ],
        }
    ]
)
Example response:
{
  "thoughts": "Great. I can see I'm on Google with the main homepage visible. I'll click on the search bar and search for 'Yutori'.",
  "actions": [
    {
      "action_type": "click",
      "center_coordinates": [ 500, 500 ]
    }
  ]
}
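Since the completion content arrives as a JSON string, a typical client parses it before acting on the prediction. A minimal sketch, assuming the content matches the example above (thoughts shortened for brevity):

```python
import json

# Raw assistant content, as returned in choices[0].message.content
raw = (
    '{"thoughts": "I will click the search bar.", '
    '"actions": [{"action_type": "click", "center_coordinates": [500, 500]}]}'
)

prediction = json.loads(raw)
action = prediction["actions"][0]  # the first (and here only) predicted action
```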
n1 expects the full chat history to best predict the next action, so we do not recommend removing any messages or screenshots when constructing requests. The next request then looks like:
response = client.chat.completions.create(
    model="n1-preview-2025-11",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the screenshot and search for Yutori."
                }
            ]
        },
        {
            "role": "observation",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_1}"
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": "{\"thoughts\":\"Great. I can see I'm on Google with the main homepage visible. I'll click on the search bar and search for 'Yutori'.\",\"actions\":[{\"action_type\":\"click\",\"center_coordinates\":[500,500]}]}"
        },
        {
            "role": "observation",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/webp;base64,{base64_image_2}"
                    }
                }
            ]
        }
    ]
)
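One way to maintain this growing history between steps is a small helper that appends the model's raw JSON reply and the fresh screenshot to the running message list (a sketch; `next_messages` and the data-URL placeholder are our own names, not part of the API):

```python
def next_messages(messages, assistant_content, screenshot_data_url):
    """Append the model's raw JSON reply and the latest screenshot so the
    next request carries the full trajectory. Illustrative helper."""
    return messages + [
        {"role": "assistant", "content": assistant_content},
        {
            "role": "observation",
            "content": [
                {"type": "image_url", "image_url": {"url": screenshot_data_url}},
            ],
        },
    ]
```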
Including the URL of the current page in observation messages is recommended—but not required—for better attribution of the information source in stop messages. For example:
{
    "role": "observation",
    "content": [
      {
        "type": "text",
        "text": "Current URL: https://www.amazon.com/"
      },
      {
        "type": "image_url",
        "image_url": {
            "url": f"data:image/webp;base64,{base64_image}"
        }
      }
    ]
}
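A small helper for building such an observation message might look like this (the function name is illustrative):

```python
def observation_with_url(page_url, screenshot_data_url):
    """Pair the current page URL (text block) with the screenshot
    (image block) in a single observation message."""
    return {
        "role": "observation",
        "content": [
            {"type": "text", "text": f"Current URL: {page_url}"},
            {"type": "image_url", "image_url": {"url": screenshot_data_url}},
        ],
    }
```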

Prompting Guidance

If a request does not include a system prompt, the default system prompt is used. We generally do not recommend providing custom system prompts — adding extra behavioral instructions may degrade results. Instead, place any additional instructions in the first user message, after the main task description. For example:
{
  "messages": [
    {
      "role": "user",
      "content": "Describe the screenshot. Additional instructions: Output a structured summary in Markdown."
    }
  ]
}
We also recommend not interrupting trajectory execution with additional user messages. The only exception is when you want the model to stop before it decides to on its own (e.g., when it has reached the maximum number of steps) and summarize its progress. The following prompt forces the model to stop and summarize:
messages.append(
    {
        "role": "user",
        "content": (
            f"Stop here. Summarize your current progress and list in detail all the findings relevant to the given task: {task}"
        ),
    }
)

Function Calling

We currently do not support custom function calling. We’re looking into it for future releases.

Supported Actions

These are all the actions that n1-preview-2025-11 currently supports. Note that n1-preview-2025-11 outputs coordinates on a relative 1000×1000 grid; scale them to absolute pixel coordinates before executing actions in a browser environment.
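For example, a click at relative [500, 300] on a 1280×800 viewport should land at pixel (640, 240). A sketch of the conversion (the helper name is ours):

```python
def to_absolute(center_coordinates, viewport=(1280, 800)):
    """Scale n1's relative 1000x1000 coordinates to absolute viewport pixels."""
    x_rel, y_rel = center_coordinates
    return (round(x_rel * viewport[0] / 1000), round(y_rel * viewport[1] / 1000))
```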
click
  Left mouse click at a specific point on the page.
  Arguments: center_coordinates: [x, y]
  Example: { "action_type": "click", "center_coordinates": [500, 300] }

scroll
  Scrolls the page in a given direction by a specified amount, centered around a given position. We recommend treating each scroll amount as 10-15% of the screen.
  Arguments: direction: string; center_coordinates: [x, y]; amount: int
  Example: { "action_type": "scroll", "direction": "down", "center_coordinates": [632, 500], "amount": 3 }

type
  Types text into the currently focused input element, optionally clearing it first and/or pressing Enter afterward.
  Arguments: text: string; press_enter_after: bool; clear_before_typing: bool
  Example: { "action_type": "type", "text": "example", "press_enter_after": false, "clear_before_typing": true }

key_press
  Sends keyboard input (e.g. Escape).
  Arguments: key_comb: string (compatible with Playwright keyboard press)
  Example: { "action_type": "key_press", "key_comb": "Escape" }

hover
  Moves the mouse pointer to a specific location without clicking.
  Arguments: center_coordinates: [x, y]
  Example: { "action_type": "hover", "center_coordinates": [540, 210] }

drag
  Clicks and holds at a starting coordinate, then moves the cursor to a destination coordinate. Note: center_coordinates is the destination.
  Arguments: start_coordinates: [x, y]; center_coordinates: [x, y]
  Example: { "action_type": "drag", "start_coordinates": [63, 458], "center_coordinates": [273, 458] }

wait
  Pauses without performing any UI action, usually to allow the page/UI to update.
  Arguments: (none)
  Example: { "action_type": "wait" }

refresh
  Reloads the current page (browser refresh).
  Arguments: (none)
  Example: { "action_type": "refresh" }

go_back
  Navigates back to the previous page in browser history.
  Arguments: (none)
  Example: { "action_type": "go_back" }

goto_url
  Navigates directly to a specified URL.
  Arguments: url: string
  Example: { "action_type": "goto_url", "url": "https://example.com" }

read_texts_and_links
  Reads visible on-screen text and saves relevant URLs for citation, without interacting with the page. Implemented as an external VLM call using the current screenshot, the user's task, and a simplified DOM (for links) as inputs.
  Arguments: (none)
  Example: { "action_type": "read_texts_and_links" }

stop
  Ends the current trajectory immediately and returns the final answer or summary.
  Arguments: answer: string
  Example: { "action_type": "stop", "answer": "example" }
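Tying these together, an executor can dispatch predicted actions onto a Playwright-style page object. This is a partial sketch under the assumption of a 1280×800 viewport, covering only a few action types; the function and stub names are our own:

```python
def execute_action(page, action, viewport=(1280, 800)):
    """Dispatch a subset of n1 actions onto a Playwright-style page object.
    Relative 1000x1000 coordinates are scaled to viewport pixels first."""
    def absolute(coords):
        return (coords[0] * viewport[0] / 1000, coords[1] * viewport[1] / 1000)

    kind = action["action_type"]
    if kind == "click":
        x, y = absolute(action["center_coordinates"])
        page.mouse.click(x, y)
    elif kind == "key_press":
        page.keyboard.press(action["key_comb"])
    elif kind == "goto_url":
        page.goto(action["url"])
    elif kind == "go_back":
        page.go_back()
    elif kind == "refresh":
        page.reload()
    elif kind == "stop":
        return action["answer"]  # final answer ends the trajectory
    return None
```

Actions like drag, scroll, and type would be handled analogously with the page's mouse and keyboard APIs.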

Authorizations

Authorization: string, in header, required.

Use Authorization: Bearer <api_key>

Body

application/json
messages: array of message objects (ChatCompletionDeveloperMessageParam | ChatCompletionSystemMessageParam | ChatCompletionUserMessageParam | ChatCompletionAssistantMessageParam | ChatCompletionToolMessageParam | ChatCompletionFunctionMessageParam | ChatCompletionObservationMessageParam)
model: string. Allowed value: "n1-preview-2025-11"
max_completion_tokens: integer | null
temperature: number | null
top_p: number | null
tools: Tools object[] | null. This field will be supported in future releases.
tool_choice: This field will be supported in future releases.

Response

Successful Response