Navigator n1.5 is the latest generation of the Navigator model family. Use the API model id n1.5-latest (or a dated version) in the model field of your chat.completions requests.
Model Versions
| API model id | Description |
|---|---|
| n1.5-latest | Points to the latest stable Navigator n1.5 model. Currently points to n1.5-20260428. |
| n1.5-20260428 | Stable release (2026-04-28). |
Supported Actions
The default tool set, browser_tools_core-20260403, provides 18 coordinate-based browser tools:
| Action | Description | Required Args | Optional Args |
|---|---|---|---|
| left_click | Left mouse click | coordinates | ref, modifier |
| double_click | Double left click | coordinates | ref, modifier |
| triple_click | Triple left click | coordinates | ref, modifier |
| middle_click | Middle mouse click | coordinates | ref, modifier |
| right_click | Right mouse click | coordinates | ref, modifier |
| scroll | Scroll in a direction | coordinates, direction, amount | ref, modifier |
| type | Type text into focused input | text | |
| key_press | Press a key or combination | key | |
| drag | Drag from start to end | start_coordinates, coordinates | |
| mouse_move | Move mouse to a point | coordinates | ref |
| mouse_down | Press and hold left mouse button | coordinates | ref |
| mouse_up | Release left mouse button | coordinates | ref |
| go_back | Browser back | | |
| go_forward | Browser forward | | |
| wait | Pause execution | | duration |
| goto_url | Navigate to URL | url | |
| refresh | Reload page | | |
| hold_key | Hold a key down | key | duration |
Parameter notes:
coordinates is always [x, y] in the normalized 1000x1000 space.
ref is an optional DOM element reference, used as an alternative to coordinates in browser contexts.
modifier is a modifier key held during the action: ctrl, shift, alt, meta, command, or super.
direction for scroll is one of: down, up, left, right.
amount for scroll is an integer where 1 unit is approximately 10% of the screen height.
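As a minimal sketch of the parameter notes above, the helpers below convert a normalized coordinates pair into viewport pixels and turn a scroll amount into a pixel delta. The function names, the rounding choice, and the reuse of the same step size for horizontal scrolling are assumptions made for illustration; they are not part of the Navigator API.

```python
from typing import Sequence, Tuple

NORMALIZED_MAX = 1000  # coordinates arrive in a normalized 1000x1000 space


def to_viewport_pixels(
    coordinates: Sequence[float], viewport_width: int, viewport_height: int
) -> Tuple[int, int]:
    """Scale a normalized [x, y] pair to pixel coordinates in the viewport."""
    x, y = coordinates
    px_x = round(x * viewport_width / NORMALIZED_MAX)
    px_y = round(y * viewport_height / NORMALIZED_MAX)
    return px_x, px_y


def scroll_delta_pixels(
    direction: str, amount: int, viewport_height: int
) -> Tuple[int, int]:
    """Return (delta_x, delta_y) for a scroll action.

    One unit is treated as 10% of the viewport height. Using the same step
    for left/right is an assumption; positive delta_y means "scroll down".
    """
    step = round(amount * viewport_height / 10)
    deltas = {
        "down": (0, step),
        "up": (0, -step),
        "right": (step, 0),
        "left": (-step, 0),
    }
    if direction not in deltas:
        raise ValueError(f"unknown scroll direction: {direction!r}")
    return deltas[direction]
```

With a Playwright page, the resulting pair could be passed to page.mouse.click(px_x, px_y) or page.mouse.wheel(delta_x, delta_y), depending on the action.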
The expanded tool set (browser_tools_expanded-20260403) includes all core tools plus the following DOM/ref-based extras:
| Action | Description | Required Args | Optional Args |
|---|---|---|---|
| extract_elements | Extract a structured ARIA-snapshot-style representation of the page’s interactive and semantic elements. Returns a pageContent string of - role "name" [ref=ref_N] lines (with optional id, href attributes) that the model can read. Each ref is a stable handle that later actions (left_click, set_element_value, …) can target. | | filter (visible, interactive, all) |
| find | Search the page for elements whose ARIA-snapshot line matches a substring. Returns a matches list of - role "name" [ref=ref_N] lines (and a totalMatches count) so the model can target them. | text | |
| set_element_value | Set the value of an <input>, <textarea>, or <select> element directly by ref, dispatching the right input/change events so the page sees the update. | ref, value | |
| execute_js | Run an arbitrary JavaScript expression or statement block against the page and return the (JSON-serialized) result back to the model. Lets the model interact with the page directly when that’s faster or more reliable than the equivalent click/type/scroll sequence: reading hidden state, calling page-internal APIs, or scripting multi-step flows in one shot. | text | |
Like every Navigator tool, these are predicted by the model and executed by your client — the API never touches your browser. What’s specific to the expanded tools is how you execute them: instead of mapping to a Playwright primitive (click, type, scroll), each one needs custom JavaScript evaluated against the page (e.g., via page.evaluate()). The result comes back as a tool message in the next request.
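To make the snapshot format concrete, here is a small sketch that parses - role "name" [ref=ref_N] lines into dictionaries, which can be handy for logging or client-side checks. The parser name, the tolerance for a missing name, and the decision to ignore trailing attributes are assumptions; only the line pattern itself comes from the table above.

```python
import re
from typing import Dict, List

# Matches lines such as:  - button "Submit" [ref=ref_3]
# The quoted name is treated as optional (an assumption), and anything after
# the closing bracket (for example id or href attributes) is ignored.
_SNAPSHOT_LINE = re.compile(
    r'^\s*-\s+(?P<role>\S+)(?:\s+"(?P<name>[^"]*)")?\s+\[ref=(?P<ref>ref_\d+)\]'
)


def parse_snapshot(page_content: str) -> List[Dict[str, str]]:
    """Parse ARIA-snapshot-style lines into {role, name, ref} records."""
    elements = []
    for line in page_content.splitlines():
        match = _SNAPSHOT_LINE.match(line)
        if match is None:
            continue  # skip lines that do not follow the documented pattern
        elements.append(
            {
                "role": match.group("role"),
                "name": match.group("name") or "",
                "ref": match.group("ref"),
            }
        )
    return elements
```

The same function works on the entries of a find result if they are joined with newlines first.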
Reference Implementation
The Yutori Python SDK bundles a reference implementation for each tool, along with an async helper that evaluates them against a Playwright page. Import the script constants and evaluate_tool_script from yutori.navigator.tools:
| Tool | SDK constant | Reference script | Returns |
|---|---|---|---|
| extract_elements | EXTRACT_ELEMENTS_SCRIPT | extract_elements.js | {success, pageContent}; also stores live refs on window.__yutoriElementRefs for later ref-based actions |
| find | FIND_SCRIPT | find.js | {success, matches, totalMatches}; substring filter over the same DOM walk |
| set_element_value | SET_ELEMENT_VALUE_SCRIPT | set_element_value.js | {success, message}; sets the value by ref and dispatches the right input/change events |
| execute_js | EXECUTE_JS_SCRIPT | execute_js.js | {success, hasResult, result}; wraps the model’s snippet in an AsyncFunction so both expressions and statement blocks work |
| left_click / scroll via ref | GET_ELEMENT_BY_REF_SCRIPT | get_element_by_ref.js | {success, coordinates}; resolves a ref to viewport pixel coordinates and scrolls it into view |
The helper evaluate_tool_script(page, SCRIPT, *args) JSON-serializes its arguments, evaluates the script against the page, and returns a Python dict. Three common patterns:
```python
from yutori.navigator.tools import (
    EXECUTE_JS_SCRIPT,
    EXTRACT_ELEMENTS_SCRIPT,
    GET_ELEMENT_BY_REF_SCRIPT,
    evaluate_tool_script,
)

# 1. Read structured DOM for the model; feed `pageContent` back as the tool result.
result = await evaluate_tool_script(page, EXTRACT_ELEMENTS_SCRIPT, "visible")
tool_result_text = result["pageContent"]

# 2. Resolve a `ref` (from extract_elements/find) into viewport pixels before a click.
result = await evaluate_tool_script(page, GET_ELEMENT_BY_REF_SCRIPT, ref)
if result["success"]:
    px_x, px_y = result["coordinates"]

# 3. Run the model's `execute_js` snippet; pass the raw `text` argument, since the
#    script already wraps it in an async IIFE. Surface `result` when hasResult is True.
result = await evaluate_tool_script(page, EXECUTE_JS_SCRIPT, args["text"])
tool_result_text = str(result["result"]) if result.get("hasResult") else "undefined"
```
For the full agent loop — including how each tool’s response envelope feeds back into the next assistant turn — see examples/navigator_n1_5.py in the SDK.
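As a rough illustration of feeding a response envelope back to the model, the sketch below turns the dictionaries listed in the table above into a tool message. The message shape (role, tool_call_id, content) follows the common chat-completions convention and is an assumption here, as are the helper name and the text formatting; the SDK example remains the reference for the real loop.

```python
import json
from typing import Any, Dict


def build_tool_message(tool_call_id: str, envelope: Dict[str, Any]) -> Dict[str, str]:
    """Convert a tool response envelope into a chat tool message (assumed shape)."""
    if not envelope.get("success", False):
        # The failure payload is not documented; fall back to the raw envelope.
        content = "Error: " + str(envelope.get("message") or json.dumps(envelope))
    elif "pageContent" in envelope:  # extract_elements
        content = envelope["pageContent"]
    elif "matches" in envelope:  # find
        total = envelope.get("totalMatches", len(envelope["matches"]))
        content = "\n".join(envelope["matches"]) + f"\n({total} total matches)"
    elif "hasResult" in envelope:  # execute_js
        if envelope["hasResult"]:
            result = envelope["result"]
            content = result if isinstance(result, str) else json.dumps(result)
        else:
            content = "undefined"
    elif "message" in envelope:  # set_element_value
        content = str(envelope["message"])
    else:
        content = json.dumps(envelope)
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```

The returned dictionary would be appended to messages before the next chat.completions request.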
Key Space
Navigator n1.5 uses lowercase key names. Combinations are joined with +, and sequential presses are separated by spaces.
| Category | Key Names |
|---|---|
| Modifiers | ctrl, alt, shift, meta, command, super |
| Common | enter, backspace, delete, tab, esc, space |
| Arrow keys | left, right, up, down |
| Page navigation | pageup, pagedown, home, end |
| Function keys | f1 through f12 |
Examples: ctrl+c, ctrl+shift+t, alt+left, down down down enter
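The key syntax above can be unpacked with a few lines of Python. This sketch splits a key string into sequential presses and translates each name into the key names Playwright's keyboard API accepts. The mapping table and function names are assumptions for illustration (in particular, treating command and super as Meta); the SDK may handle this differently.

```python
import re
from typing import List

# Assumed translation from the lowercase key space to Playwright key names.
PLAYWRIGHT_KEY_NAMES = {
    "ctrl": "Control", "alt": "Alt", "shift": "Shift",
    "meta": "Meta", "command": "Meta", "super": "Meta",
    "enter": "Enter", "backspace": "Backspace", "delete": "Delete",
    "tab": "Tab", "esc": "Escape", "space": "Space",
    "left": "ArrowLeft", "right": "ArrowRight", "up": "ArrowUp", "down": "ArrowDown",
    "pageup": "PageUp", "pagedown": "PageDown", "home": "Home", "end": "End",
}


def to_playwright_key(name: str) -> str:
    """Translate one lowercase key name; unknown names pass through unchanged."""
    if name in PLAYWRIGHT_KEY_NAMES:
        return PLAYWRIGHT_KEY_NAMES[name]
    if re.fullmatch(r"f([1-9]|1[0-2])", name):
        return name.upper()  # f1..f12 -> F1..F12
    return name  # plain characters such as "c" or "t"


def parse_key_sequence(key: str) -> List[str]:
    """Split on spaces into sequential presses, then translate each '+' combo."""
    return [
        "+".join(to_playwright_key(part) for part in combo.split("+"))
        for combo in key.split()
    ]
```

Each returned string could then be passed, in order, to page.keyboard.press().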
Features
Use the tool_set parameter to select which set of browser tools is available to the model:
```python
response = client.chat.completions.create(
    model="n1.5-latest",
    messages=[...],
    extra_body={
        "tool_set": "browser_tools_expanded-20260403",
    },
)
```
Available tool sets:
browser_tools_core-20260403 (default) — coordinate-based visual browser tools
browser_tools_expanded-20260403 — core + DOM-based tools (extract_elements, find, set_element_value, execute_js)
Use the disable_tools parameter to remove specific tools from the active tool set:
```python
response = client.chat.completions.create(
    model="n1.5-latest",
    messages=[...],
    extra_body={
        "disable_tools": ["hold_key", "drag"],
    },
)
```
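A small client-side helper can catch typos in tool names before a request is sent. The sketch below builds an extra_body dictionary and validates disable_tools against the tool names listed on this page. The helper itself is hypothetical, and combining tool_set with disable_tools in a single request is an assumption that this page does not confirm.

```python
from typing import Any, Dict, Iterable

CORE_TOOLS = frozenset({
    "left_click", "double_click", "triple_click", "middle_click", "right_click",
    "scroll", "type", "key_press", "drag", "mouse_move", "mouse_down", "mouse_up",
    "go_back", "go_forward", "wait", "goto_url", "refresh", "hold_key",
})
EXPANDED_EXTRAS = frozenset({"extract_elements", "find", "set_element_value", "execute_js"})

TOOL_SETS = {
    "browser_tools_core-20260403": CORE_TOOLS,
    "browser_tools_expanded-20260403": CORE_TOOLS | EXPANDED_EXTRAS,
}
DEFAULT_TOOL_SET = "browser_tools_core-20260403"


def build_extra_body(
    tool_set: str = DEFAULT_TOOL_SET, disable_tools: Iterable[str] = ()
) -> Dict[str, Any]:
    """Build an extra_body dict, rejecting unknown tool sets and tool names."""
    if tool_set not in TOOL_SETS:
        raise ValueError(f"unknown tool set: {tool_set!r}")
    disabled = list(disable_tools)
    unknown = sorted(set(disabled) - TOOL_SETS[tool_set])
    if unknown:
        raise ValueError(f"tools not in {tool_set}: {unknown}")
    body: Dict[str, Any] = {"tool_set": tool_set}
    if disabled:
        body["disable_tools"] = disabled
    return body
```

The result would be passed as extra_body=build_extra_body(...) in the chat.completions call.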
JSON Structured Output
Provide a json_schema to get structured data extracted from the model’s response. The schema is appended to your task message, and the model returns JSON inside ```json code fences. The API parses this and returns it as a parsed_json field.
```python
response = client.chat.completions.create(
    model="n1.5-latest",
    messages=[...],
    extra_body={
        "json_schema": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["product_name", "price"],
        }
    },
)

# Access the parsed result
parsed = response.parsed_json  # {"product_name": "Widget Pro", "price": 29.99}
```
When json_schema is provided, the API also adds a structural tag for guided decoding of the JSON output, constraining it to match your schema.
If the model doesn’t return valid JSON (e.g., it’s still navigating), the parsed_json field will not be present in the response.
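Because parsed_json may be absent, client code should read it defensively. The sketch below shows one way to do that, along with a rough illustration of what extracting JSON from a json code fence involves. Both helper names are hypothetical, and the fence parser is only an approximation of the behavior described above, not the API's actual implementation.

```python
import json
import re
from typing import Any, Optional

FENCE = "`" * 3  # built programmatically to keep this example readable
_JSON_FENCE = re.compile(
    re.escape(FENCE) + r"json\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def get_parsed_json(response: Any) -> Optional[Any]:
    """Return response.parsed_json, or None when the field is not present."""
    return getattr(response, "parsed_json", None)


def extract_fenced_json(text: str) -> Optional[Any]:
    """Parse the first json-fenced block in text; None if missing or invalid."""
    match = _JSON_FENCE.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

A None from get_parsed_json typically means the model is still navigating, so the agent loop would continue executing tool calls instead of reading a result.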
| Feature | Navigator n1 | Navigator n1.5 |
|---|---|---|
| JSON structured output | Not supported | json_schema param with parsed_json response |
| Tool sets | Fixed | Selectable (browser_tools_core-*, browser_tools_expanded-*) |
| disable_tools | Not supported | Supported |
| Additional tools | N/A | hold_key, middle_click, mouse_down, mouse_up, go_forward |
| Mouse move | hover | mouse_move |
| Key press param | key_comb (Playwright names) | key (lowercase key space) |
| Click modifiers | Not supported | ref, modifier params |
| type extras | press_enter_after, clear_before_typing | Not included |