LumaBrowser Documentation
LumaBrowser is a programmable browser with built-in AI automation, network monitoring, and a full REST API. Download the installer, launch the app, and start automating in minutes.
Installation
Download the installer for your platform from the downloads page. Available for Windows, macOS, and Linux. Run the installer and launch LumaBrowser — the REST API server starts automatically.
Quick Start
Once running, the REST API is available on http://localhost:3000. Try it:
# Check health
curl http://localhost:3000/api/health
# Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Take a screenshot
curl http://localhost:3000/api/browser/tabs/0/screenshot
Configuration
Settings are persisted in SQLite. Key options:
| Setting | Default | Description |
|---|---|---|
core.apiPort | 3000 | REST API listen port |
core.apiEnabled | true | Enable/disable REST server |
core.mcpEnabled | true | Enable/disable MCP server |
Environment variable overrides: API_PORT=8080, API_HOST=0.0.0.0
The LumaBrowser REST API ships with the desktop app and runs on http://localhost:3000. When this doc says "the local API" or uses localhost:3000, assume the app is running on the same machine. The "Free APIs" section at the bottom of this page is a separate surface hosted at https://lumabyte.com — those endpoints are not served by the desktop app. See API hosts & what lives where.
Health & Discovery
| Endpoint | Description |
|---|---|
GET /api/health | Liveness probe. Returns status, timestamp, port, mcpEnabled, and the list of extensions currently loaded. |
GET /api/ | Service discovery. Returns the API name, port, mcpEnabled, and routes — a map of core routes and extension base-path mounts. See note below. |
Health response (actual shape):
{
"status": "ok",
"timestamp": "2026-04-19T12:34:56.000Z",
"port": 3000,
"mcpEnabled": true,
"extensions": ["ai-chat", "network-watcher", "template-builder",
"timed-tasks", "selenium-driver", "puppeteer-driver"]
}
The extensions array reflects what's actually loaded in this build — extensions that ship on disk but are disabled (for example webgpu-llm on a machine without a supported GPU) will not appear.
Discovery response (actual shape):
{
"name": "LumaBrowser API",
"port": 3000,
"mcpEnabled": true,
"routes": {
"core": ["/api/mcp/tools", "/api/mcp/call"],
"extensions": {
"ai-chat": "/api/ai-chat",
"network-watcher": "/api/watchers",
"template-builder": "/api/templates",
"timed-tasks": "/api/timed-tasks",
"selenium-driver": "/api/selenium",
"puppeteer-driver": "/api/puppeteer"
}
},
"aliases": { ... }
}
Discovery lists mount points, not every route. To enumerate individual endpoints under each mount (for example /api/browser/tabs/:id/click), refer to the per-section tables on this page. The MCP routes (/api/mcp/tools, /api/mcp/call) appear under routes.core only when the MCP server is wired.
Response Envelope
Most endpoints wrap their payload in one of two envelopes. Legacy endpoints may omit the wrapper entirely — always check the table below before writing a strongly-typed client.
Canonical envelope (newer / core endpoints):
{
"success": true,
"timestamp": "2026-04-19T12:34:56.000Z",
"data": { ... }, // or sometimes an array
"message": "Optional human-readable context"
}
Payload key by endpoint. Several older extension endpoints use a resource-named key instead of data. Clients that auto-detect the payload key should consult this table to be safe:
| Endpoint | Payload key | Shape |
|---|---|---|
GET /api/browser/tabs | data | array of tab objects |
POST /api/browser/tabs | data | tab object |
POST /api/templates/generate | data | flat template object (see Template Builder) |
POST /api/templates/validate | data | { valid, found, message? } |
POST /api/watchers | watcher | watcher object (root-level, no wrapper) |
GET /api/watchers | watchers | array (root-level) |
POST /api/timed-tasks | task | task object (root-level) |
GET /api/timed-tasks | tasks | array (root-level) |
POST /api/timed-tasks/:id/enable | task | task object (root-level) |
POST /api/timed-tasks/:id/disable | task | task object (root-level) |
POST /api/ai-chat/run | fields at root | no wrapper — success, summary, toolCalls, etc. are top-level |
ID types
Resource identifiers are not uniformly typed. Strongly-typed clients (C#, Kotlin, Swift, Rust) should model each one explicitly or silent deserialization failures will occur.
| Resource | ID field | Type | Example |
|---|---|---|---|
| Browser tab | id, tabId | number (integer) | 32 |
| Network watcher | id | string | "watcher_1776605796008_ze35ugf9e" |
| Timed task | id | string | "task_1776605796275_iw0ooo" |
| Template | id | string (numeric-looking) | "42" |
| Selenium session | sessionId | string (UUID-like) | "a1b2c3d4-..." |
| Puppeteer CDP target | targetId | string (uppercase hex, no dashes) | "4F1B8A09CBEE4C23B0A9DBA1F77DE9F1" |
| Puppeteer CDP session | sessionId | string (uppercase hex, no dashes) | "9A2CFE18E7B64A1CAD05B6E4FF1C02E0" |
Gotcha: Endpoints that take a tab reference expect integer tabId in the JSON body. POST /api/templates/generate historically tolerated a stringified numeric ID; prefer the integer form to avoid surprises.
API hosts — what lives where
There are two distinct surfaces. Pointing a client at the wrong host produces a confusing 404.
| Surface | Host | What it serves |
|---|---|---|
| LumaBrowser local API | http://localhost:3000 | Everything under /api/browser, /api/mcp, /api/watchers, /api/templates, /api/timed-tasks, /api/ai-chat, /api/selenium, /api/puppeteer. Requires the desktop app to be running. |
| LumaByte public helpers | https://lumabyte.com | /api/generateMockRandomPeople, /api/urlToQRCode, /api/analyzeText, /api/sms/*. Always-on, no desktop app required. |
Full programmatic control over browser tabs — navigate, click, fill forms, extract content, and take screenshots.
Tab Management
| Endpoint | Description |
|---|---|
GET /api/browser/tabs | List all open tabs |
POST /api/browser/tabs | Create new tab |
DELETE /api/browser/tabs/:id | Close tab |
PATCH /api/browser/tabs/:id | Navigate, refresh, activate, or execute JavaScript (requires action discriminator — see below) |
POST /api/browser/tabs/:id/activate | Foreground a tab — the preferred path for making a tab the visible rendered surface. See Activate a Tab. |
Create Tab
When silent: true, the tab is created but not rendered to the user-visible surface. This keeps background automation out of the user's way, but see the silent-tab caveats before using it for screenshot or scroll work.
POST /api/browser/tabs
{
"url": "https://example.com",
"silent": false
}
PATCH a Tab — the action discriminator
PATCH /api/browser/tabs/:id requires an action field. A naked {"url": "..."} is rejected with "Action must be one of: navigate, refresh, activate, executeJs". The four supported shapes:
# Navigate an existing tab
PATCH /api/browser/tabs/0
{"action": "navigate", "url": "https://example.com"}
# Refresh
PATCH /api/browser/tabs/0
{"action": "refresh"}
# Foreground (activate) the tab — "focus" is accepted as an alias
PATCH /api/browser/tabs/0
{"action": "activate"}
# Execute JavaScript in the tab
PATCH /api/browser/tabs/0
{"action": "executeJs", "script": "(function(){ return document.title; })()"}
Activate a Tab (make it the visible surface)
Activation foregrounds a tab so it becomes the rendered display surface. This is required before any operation that depends on an actual painted viewport — see silent-tab caveats for the full list. Two equivalent shapes are available:
# Preferred: dedicated endpoint
POST /api/browser/tabs/0/activate
# Or: PATCH with action discriminator (accepts "activate" or "focus" alias)
PATCH /api/browser/tabs/0
{"action": "activate"}
Silent tabs cannot be activated. A tab created with silent: true has no render surface to promote. Calling activate on one returns 400 with {"error": "Silent tabs cannot be activated"}. If you need to foreground work, create the tab with silent: false in the first place.
Reliable screenshot / scroll workflows in multi-tab scripts. Activate the target tab immediately before GET /tabs/:id/screenshot or POST /tabs/:id/scroll. Without this step, operations that require the painted surface can fail on any tab that isn't currently foregrounded.
# Activate, then screenshot — the reliable pattern
curl -X POST http://localhost:3000/api/browser/tabs/2/activate
curl http://localhost:3000/api/browser/tabs/2/screenshot
executeJs: no top-level return. Scripts are evaluated at page top level, where return is a syntax error. A bare "script": "return 2+40" fails with "Script failed to execute". Wrap any return value in an IIFE:
# Wrong
{"action": "executeJs", "script": "return 2+40"}
# Right
{"action": "executeJs", "script": "(function(){ return 2+40; })()"}
# Also fine — last expression is returned
{"action": "executeJs", "script": "2+40"}
Page Inspection
| Endpoint | Description |
|---|---|
GET /api/browser/tabs/:id/source | Page HTML (?type=text|clean|full) |
GET /api/browser/tabs/:id/screenshot | PNG screenshot as base64 — requires a rendered surface, activate the target tab first. |
GET /api/browser/tabs/:id/console | Console log entries |
GET /api/browser/tabs/:id/network | Network request log |
GET /api/browser/tabs/:id/element | Element properties (?selector=css&llmFallback=desc) |
GET /api/browser/tabs/:id/table | Extract table data (?selector=css&llmFallback=desc) |
User Interaction
All selector-based endpoints accept an optional llmFallback parameter — a natural language description of the target element. When the CSS selector fails and an LLM is configured, it resolves the element automatically. See LLM Fallback.
| Endpoint | Payload |
|---|---|
POST /tabs/:id/click | { "selector": "button.submit", "llmFallback": "Click the submit button" } |
POST /tabs/:id/fill | { "fields": [{ "selector": "input", "value": "text", "llmFallback": "Email input" }] } |
POST /tabs/:id/wait | { "selector": ".modal", "llmFallback": "The popup modal", "timeout": 5000 } |
POST /tabs/:id/scroll | { "selector": "#pricing", "llmFallback": "The pricing section" } — activate the tab first for reliable scroll behavior. |
POST /tabs/:id/press-key | { "key": "Enter", "selector": "input.search", "llmFallback": "The search box" } |
POST /tabs/:id/dialog | { "action": "accept" } |
Full Workflow Example
# 1. Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# 2. Fill a form (with LLM fallback per field)
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
-H "Content-Type: application/json" \
-d '{"fields": [
{"selector": "input[name=email]", "value": "[email protected]",
"llmFallback": "The email input field"},
{"selector": "input[name=password]", "value": "secret123",
"llmFallback": "The password input field"}
], "llmFallback": "Fill the login form"}'
# 3. Click submit (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{"selector": "button[type=submit]",
"llmFallback": "Click the login button"}'
# 4. Wait for result (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/wait \
-H "Content-Type: application/json" \
-d '{"selector": ".success-message", "timeout": 10000,
"llmFallback": "The success confirmation message"}'
# 5. Get page content
curl "http://localhost:3000/api/browser/tabs/0/source?type=text"
Silent-tab & surface-dependent operations
LumaBrowser is a single-window, single-visible-tab surface. A tab that is backgrounded (another tab is foregrounded) or was created with silent: true has no rendered display surface. A handful of operations require that surface and will fail on a hidden tab:
GET /api/browser/tabs/:id/screenshot— fails with "Current display surface not available for capture". Foreground the tab first withPOST /api/browser/tabs/:id/activate.POST /api/browser/tabs/:id/scroll— a scroll call returns successfully, butwindow.scrollYon a backgrounded tab doesn't always progress as a user-driven scroll would. Activate the tab first for deterministic scroll behavior.
For deterministic foregrounding in multi-tab automation, call POST /api/browser/tabs/:id/activate (or PATCH with {"action":"activate"}) immediately before any surface-dependent operation. Silent tabs cannot be activated — if you need to screenshot or scroll a tab, create it with silent: false.
Every selector-based browser command accepts an optional llmFallback parameter — a plain-English description of the target element. When a selector fails (or no selector is given at all), LumaBrowser runs your description through a four-stage pipeline that starts deterministic, escalates to the LLM only when needed, validates what the LLM returns, and silently improves future runs by seeding a page template in the background.
The Four-Stage Resolution Pipeline
This pipeline runs whenever a caller invokes any LLM-fallback path — REST llmFallback on click / wait / getElement / fill, the Selenium ai-description locator, /lumabyte/find, /lumabyte/click, the Puppeteer CDP methods Lumabyte.find and Lumabyte.click, and the MCP tools listed below. Every stage has one job, and most requests never reach the LLM at all.
Stage 1 — Deterministic match (0 tokens)
Before any LLM call, the resolver queries the live DOM with a battery of stable attribute strategies derived from your description:
[data-testid]— both the slugified description and the raw string[aria-label][placeholder][data-role]— slugified description- Exact visible-text match on interactable elements
If any strategy produces exactly one interactable element, the resolver uses it immediately. No LLM tokens spent, no latency added. Pages that expose stable semantic attributes resolve at native speed.
Stage 2 — LLM with a live DOM snapshot
If the deterministic stage misses, the LLM is prompted with a compact snapshot of up to 80 interactable elements from the actual current DOM. Each snapshot row carries id, data-*, aria-*, role, name, type, placeholder, disabled, and visible text. Any cached page template is layered on top as additional context.
Implication for dynamically-rendered content. The LLM sees what is in the DOM at the moment of the call. If your target element appears after a delay (lazy hydration, skeleton-swap, modal after click), trigger the delay before invoking wait_for with a description, or allow wait_for to poll until the element mounts. An element that is not in the DOM cannot be resolved.
Stage 3 — Validate and retry
Before the action runs, the LLM's selector is validated against the DOM. If it matches zero elements or more than one, the failure reason is fed back into the prompt as context and the LLM gets one more attempt. Total LLM calls are capped at two per resolution — no runaway costs on a pathological page.
Stage 4 — Async shadow template generation
After any successful resolution (deterministic or LLM), if no page template is cached for the current URL, the resolver fires a background generateForTab. The shared template repository is seeded silently — no explicit user action required, no delay added to the caller's response. The next call on the same URL can skip the LLM entirely.
Why This Is Better
- Free speed on well-built pages. Pages with stable
data-testid,aria-label,placeholder, ordata-roleattributes resolve in Stage 1 with zero LLM cost and sub-millisecond latency. - LLM grounded in current truth. The model sees a snapshot of the real DOM, not a stale dump — so dynamic, client-rendered, and single-page-app content work as long as the target is actually mounted at call time.
- Validated before acting. A hallucinated or ambiguous selector is rejected and retried before it touches the page — no silent misclicks on the wrong element.
- Self-warming cache. Every successful run pays forward: the background template generation means routine workflows converge to Stage 1 speed over time.
Prerequisites
- The
selector-resolverLLM slot, or a globally configured primary LLM (Settings → LLM Providers). See LLM slot configuration below. - Template Builder extension is recommended (enabled by default) — enables the Stage 4 cache.
- Works with OpenAI-compatible, Anthropic, or local WebGPU models.
LLM slot: selector-resolver
LumaBrowser registers a dedicated core-scope LLM slot for this pipeline, labelled Selector Resolver (fast, low-latency model recommended). If the slot is not explicitly configured, the resolver falls back to your global active LLM.
| Property | Value |
|---|---|
| Slot name | selector-resolver |
| Scope | Core (registered at startup) |
| Fallback | Global active LLM if slot is unconfigured |
| Temperature | 0 — hardcoded on the slot's call path. Deterministic selector resolution is non-negotiable. |
| Max tokens | Respects whatever you configure on the slot / model — not hardcoded. |
Deployment recommendation. Configure this slot with a fast, low-latency model: Claude Haiku, Gemini Flash, GPT-4.1 mini, or a similar speed-tier option. Heavier reasoning models are overkill for selector resolution and measurably slow the action path. The temperature: 0 guarantee means a small model performs well here — there is no reasoning budget to spend.
Configure the slot under Settings → LLM Providers → Selector Resolver.
Supported Endpoints
| Endpoint | Fallback Behavior |
|---|---|
POST /tabs/:id/click | Resolves click target. Selector becomes optional when llmFallback is provided. |
POST /tabs/:id/fill | Per-field llmFallback on each field object, plus top-level fallback for the whole form. |
POST /tabs/:id/wait | Resolves wait target. Selector becomes optional when llmFallback is provided. |
POST /tabs/:id/scroll | Resolves scroll-to-element target when a selector is involved. |
POST /tabs/:id/press-key | Resolves focus target element before key press. |
GET /tabs/:id/element | Resolves element to inspect. Selector becomes optional when llmFallback is provided. |
GET /tabs/:id/table | Resolves table container selector. |
Example: Click with Fallback
# Selector works → instant click, llmFallback is ignored
# Selector breaks → LLM resolves the right element automatically
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{
"selector": "button.btn-primary.submit-form",
"llmFallback": "Click the submit button"
}'
# Response on LLM fallback success:
{
"data": { "clicked": true, "resolvedSelector": "form button[type=submit]" },
"success": true,
"message": "Element clicked successfully (resolved by LLM fallback)"
}
Example: Click without Selector (LLM-Only)
# No selector at all — LLM resolves from description alone
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{
"tabId": 0,
"llmFallback": "Click the Sign In button in the navigation bar"
}'
Example: Form Fill with Per-Field Fallback
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
-H "Content-Type: application/json" \
-d '{
"fields": [
{"selector": "input#email", "value": "[email protected]",
"llmFallback": "The email address input"},
{"selector": "input#pass", "value": "secret",
"llmFallback": "The password input"},
{"selector": "select#role", "value": "admin",
"llmFallback": "The role dropdown"}
],
"llmFallback": "Fill the registration form"
}'
Template Builder Integration
When the Template Builder extension is active, cached page templates are layered into the Stage 2 LLM prompt alongside the live DOM snapshot, sharpening match accuracy on pages you've automated before. Successful resolutions also seed the template repository in the background (Stage 4) — no explicit generation call is required. See the Template Builder section for direct template management.
Debugging resolution (operator)
To trace why a specific description failed to resolve, launch LumaBrowser with the environment variable LLM_FALLBACK_DEBUG=1. The resolver writes verbose, stage-by-stage traces to stdout prefixed with [LlmFallback]: which deterministic strategies were tried, what was in the DOM snapshot, what the LLM returned, and why validation accepted or rejected it.
MCP Tools
All corresponding MCP tools also accept llmFallback: click, fill_form, wait_for, scroll, press_key, get_element, get_table. This means AI agents using the MCP server can provide their own fallback descriptions for resilient automation.
Connect Claude Desktop, OpenClaw, or any MCP-compatible AI agent to LumaBrowser's built-in MCP server. Give your agents native browser capabilities without writing glue code.
Claude Desktop / OpenClaw Setup
LumaBrowser includes a built-in MCP server. Add it to your Claude Desktop or OpenClaw config (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/.config/Claude/claude_desktop_config.json on macOS/Linux):
{
"mcpServers": {
"lumabrowser": {
"command": "lumabrowser",
"args": ["--mcp"]
}
}
}
Make sure LumaBrowser is installed and available on your PATH, or use the full path to the executable.
Available MCP Tools
Browser Automation
| Tool | Description |
|---|---|
get_tabs | List all open tabs |
create_tab | Open new tab with URL |
close_tab | Close a tab by ID |
navigate | Navigate tab to URL |
click | Click element by CSS selector (supports llmFallback) |
fill_form | Fill form fields (supports llmFallback) |
press_key | Send keyboard input (supports llmFallback) |
scroll | Scroll page (supports llmFallback) |
get_source | Get page HTML (text/clean/full) |
screenshot | Capture page as PNG |
wait_for | Wait for element to appear (supports llmFallback) |
get_element | Get element properties (supports llmFallback) |
execute_js | Run JavaScript in page context |
Extension Tools
| Tool | Description |
|---|---|
watcher_list | List network watchers |
watcher_add | Create network watcher |
watcher_remove | Delete network watcher |
watcher_toggle | Enable/disable watcher |
get_template | Get cached page template |
generate_template | Generate template via LLM |
task_list | List scheduled tasks |
task_create | Schedule new task |
task_trigger | Run task immediately |
task_delete | Delete task |
ai_chat_run | Execute headless agentic browser task |
selenium_driver_status | Report whether the W3C WebDriver server is running, its host/port, and active Selenium session count |
selenium_driver_start | Start the W3C WebDriver HTTP server (default 127.0.0.1:9515); returns bound port and URL prefix |
selenium_driver_stop | Stop the WebDriver server and release all active Selenium sessions |
puppeteer_driver_status | Report whether the CDP WebSocket server is running, its host/port, browser UUID, and active CDP session/target counts |
puppeteer_driver_start | Start the CDP server (default 127.0.0.1:9222); returns bound port so puppeteer.connect({ browserURL }) can attach |
puppeteer_driver_stop | Stop the CDP server, close every puppeteer-kind tab, and detach all debugger sessions |
Example: Claude Browsing the Web
User: "Go to Hacker News and tell me the top 3 stories."
Claude uses MCP tools:
→ navigate({ tabId: 0, url: "https://news.ycombinator.com" })
→ get_source({ tabId: 0, type: "text" })
→ "Here are the top 3 stories: ..."
Monitor HTTP traffic using Chrome DevTools Protocol (CDP) and forward matched responses to webhooks. No proxy setup or root certificates required.
How It Works
Network Watcher uses CDP to intercept responses at the browser level. You define URL patterns and a webhook destination — when traffic matches, the full request/response is forwarded to your endpoint.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/watchers | List all watchers |
POST /api/watchers | Create watcher |
GET /api/watchers/:id | Get watcher details |
PATCH /api/watchers/:id | Update watcher |
POST /api/watchers/:id/toggle | Enable/disable |
DELETE /api/watchers/:id | Remove watcher |
GET /api/watchers/stats | Trigger statistics |
POST /api/watchers/test | Test configuration |
Create a Watcher
curl -X POST http://localhost:3000/api/watchers \
-H "Content-Type: application/json" \
-d '{
"urlPattern": "*api.example.com/v1/orders*",
"sendTo": "https://your-backend.com/webhook",
"note": "Capture order API calls",
"method": "POST",
"captureHeaders": true,
"captureBody": true
}'
Webhook Payload
When a matching request is intercepted, this payload is POSTed to your sendTo URL. Note that request is a composite envelope: it carries the outbound request metadata and nests the captured response inside it.
{
"watcherId": "watcher_1711632896000_abc",
"note": "Capture order API calls",
"timestamp": "2026-03-28T12:34:56.000Z",
"request": {
"url": "https://api.example.com/v1/orders",
"method": "POST",
"timestamp": "2026-03-28T12:34:55.980Z",
"request": {
"headers": { "content-type": "application/json", "authorization": "Bearer ..." }
},
"response": {
"status": 200,
"statusText": "OK",
"mimeType": "application/json",
"headers": { "content-type": "application/json" },
"body": "eyJvcmRlcklkIjoiMTIzNDUifQ==",
"base64Encoded": true
}
}
}
Where to find each field:
- Request headers:
request.request.headers(double-nested — the outerrequestis the capture envelope, the innerrequestis the browser's outbound HTTP request). - Response status / body / headers:
request.response.*(not at the top level). - Encoding: bodies from non-text MIME types are base64-encoded; check
request.response.base64Encodedbefore decoding.
Watcher Configuration
| Field | Required | Description |
|---|---|---|
urlPattern | Yes | URL pattern with wildcards (*, ?) |
sendTo | Yes | Webhook destination URL |
note | No | Human-readable description |
method | No | HTTP method filter: GET, POST, PUT, DELETE, or * (default) |
captureHeaders | No | Include request/response headers (default: true) |
captureBody | No | Include response body (default: true) |
Toggle a watcher
POST /api/watchers/:id/toggle requires a body — it sets the enabled flag explicitly rather than flipping it. A body-less call fails with a 500. Always supply enabled:
POST /api/watchers/watcher_xxx/toggle
{"enabled": true} // or false
Updating a watcher (PATCH caveat)
PATCH /api/watchers/:id behaves as a full-row update and will reject partial bodies that omit required columns. For example, a PATCH with only {"note": "..."} fails with "NetworkWatcher: urlPattern is required". Re-send every required field (urlPattern, sendTo) in the update body, or use the toggle endpoint for enable/disable changes.
Uses LLM to analyze pages and generate CSS selector maps. Describe what you want in plain English, and the Template Builder returns reliable selectors — even against randomized class names.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/templates | List cached templates |
GET /api/templates/by-url?url=... | Fetch a cached template by URL (prefers exact match, falls back to pattern) |
GET /api/templates/:id | Get cached template by numeric ID |
POST /api/templates | Upsert a template manually (no LLM) |
POST /api/templates/generate | LLM-powered template generation. Latency is highly variable — see below. |
POST /api/templates/repair | LLM-powered selector repair for a previously-generated template |
POST /api/templates/test | Run every selector in the cached template against a live tab and report which ones still match |
POST /api/templates/validate | Check one selector against a live tab. No LLM, fast, cheap. |
DELETE /api/templates?url=... | Remove a cached template |
POST /api/templates/generate
Runs the Template Builder LLM pipeline against a live tab and persists the result. This is the slowest, costliest call in the Template Builder surface — use sparingly and let the cache absorb repeat requests.
Request body:
{
"tabId": 0, // required — integer tab ID of an open tab
"url": "https://example.com/page" // optional — recorded on the template
}
tabIdis required. A request without it fails fast with{"success":false,"error":"tabId is required"}.urlis optional. When omitted, the generator uses the tab's current URL.- Historically this endpoint tolerated a stringified tab ID. Pass the integer form to match the rest of the API.
Response (flat public shape):
{
"success": true,
"timestamp": "2026-04-19T12:34:56.000Z",
"message": "Template generated and saved successfully",
"data": {
"id": "42",
"url": "https://example.com/page",
"createdAt": "2026-04-19 12:34:56",
"pageInfo": {
"title": "Example Checkout",
"description": "Two-step checkout form",
"formFields": ["email", "card", "cvc"],
"navigationStrategy": "template"
},
"elements": [
{
"id": "element_1",
"name": "Submit Button",
"type": "Button",
"clickable": true,
"primarySelector": "button.submit",
"fallbackSelectors": ["button[type=submit]"],
"itemSelector": null,
"childSelectors": null
},
{
"id": "element_7",
"name": "User List",
"type": "RepeatingItem",
"clickable": false,
"primarySelector": "#user-list li.user",
"fallbackSelectors": [],
"itemSelector": "#user-list li.user",
"childSelectors": { "name": "span.name", "email": "span.email" }
}
],
"template": { /* original internal PascalCase shape — for backward compat only */ }
}
}
Consume the flat shape. data.elements[] is the canonical public contract. Each element is camelCase, fallbackSelectors is always an array of strings (possibly empty), and url / createdAt are now populated. The nested data.template preserves the old PascalCase internal shape solely for backward compatibility — new SDKs should ignore it.
Latency & idempotency
Template generation invokes an LLM, so wall-clock times are bounded by the slowest call in the pipeline:
- Typical: 15–30 s when the configured provider is a hosted frontier model.
- Observed worst case: well past 2 minutes with a slow or rate-limited provider. Do not hard-code a 30 s client timeout.
- Cache: subsequent
GET /api/templates/by-urlcalls for the same URL return the cached record in a few milliseconds. - Retry safety: the endpoint is an upsert keyed on URL. Retrying after a timeout is safe; you may get a newer template that overwrites the in-flight one.
POST /api/templates/validate — fast selector check
Answers a single question cheaply: does this CSS selector match anything on this page right now? No LLM, no template, no cache writes — just a scoped document.querySelectorAll against a live tab. Useful for healing workflows that need to decide whether to regenerate a template.
Request body: you must supply selector plus either tabId or url (to locate an open tab by its current URL).
# Validate against an existing tab
POST /api/templates/validate
{
"selector": "#btn-simple",
"tabId": 0
}
# Or validate against whichever open tab is currently at the given URL
POST /api/templates/validate
{
"selector": "button.submit",
"url": "https://example.com/checkout"
}
Response:
{
"success": true,
"timestamp": "2026-04-19T12:34:56.000Z",
"message": "Selector found",
"data": {
"valid": true, // bool — selector parses as valid CSS
"found": true // bool — at least one element matched (not a count)
}
}
# Selector parses but matches zero elements
{
"success": true,
"message": "Selector did not match",
"data": { "valid": true, "found": false }
}
# Selector is malformed CSS
{
"success": true,
"data": { "valid": false, "found": false,
"message": "Invalid selector syntax" }
}
Error responses:
400— missingselector, or neithertabIdnorurlprovided.404— no open tab matches the givenurl, or the giventabIdis not open.503— browser service not yet initialized (immediately after boot).
Caching
Generated templates are persisted in SQLite keyed on URL. Subsequent generate calls for the same URL overwrite the cached record. Reads via GET /api/templates/by-url return the cached version instantly; use that to avoid paying for another LLM call.
Schedule recurring LLM-driven browser automation tasks. Each execution runs an agentic loop (up to 10 tool-call iterations) with full access to the browser automation API.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/timed-tasks | List all tasks |
POST /api/timed-tasks | Create task |
GET /api/timed-tasks/:id | Get task details |
PATCH /api/timed-tasks/:id | Update task — see PATCH semantics before use |
DELETE /api/timed-tasks/:id | Delete task |
POST /api/timed-tasks/:id/trigger | Run immediately |
POST /api/timed-tasks/:id/enable | Preferred way to start scheduling. No body. Flips enabled=1 and restarts the scheduler. |
POST /api/timed-tasks/:id/disable | Preferred way to stop scheduling. No body. Flips enabled=0 and stops the scheduler. |
GET /api/timed-tasks/:id/runs | Execution history |
Create a Scheduled Task
The request body is camelCase. The response (below) is snake_case and uses 0/1 for enabled. This asymmetry is documented rather than fixed to keep existing SDKs working — model it explicitly in your client.
curl -X POST http://localhost:3000/api/timed-tasks \
-H "Content-Type: application/json" \
-d '{
"name": "Check Product Prices",
"requestPrompt": "Navigate to https://store.example.com, find the price of Widget X, and return it as JSON",
"responsePrompt": "Format the result as {\"product\": \"...\", \"price\": \"...\"}",
"repeatInterval": 3600000,
"webhookUrl": "https://my-backend.com/prices",
"enabled": true
}'
Request vs. response shape
Request (camelCase, enabled is boolean):
{
"name": "Check Product Prices",
"requestPrompt": "...",
"responsePrompt": "...",
"repeatInterval": 3600000,
"webhookUrl": "https://my-backend.com/prices",
"enabled": true
}
Response (snake_case, enabled is 0/1):
{
"success": true,
"task": {
"id": "task_1776605796275_iw0ooo",
"name": "Check Product Prices",
"request_prompt": "...",
"response_prompt": "...",
"repeat_interval": 3600000,
"webhook_url": "https://my-backend.com/prices",
"enabled": 1,
"last_run": null,
"next_run": null,
"created_at": "2026-04-19 12:34:56"
}
}
PATCH semantics — read this before updating a task
PATCH /api/timed-tasks/:id currently behaves as PUT: it performs a full row update and nulls any column you don't re-send. A partial-looking PATCH fails loudly on required columns:
PATCH /api/timed-tasks/task_xxx
{"name":"new-name"}
→ 400 NOT NULL constraint failed: timed_tasks.request_prompt
For enabling or disabling a task, use the dedicated endpoints:
# Enable — no body, no "are all required fields set?" gotcha
curl -X POST http://localhost:3000/api/timed-tasks/task_xxx/enable
# Disable
curl -X POST http://localhost:3000/api/timed-tasks/task_xxx/disable
# Response for both
{
"success": true,
"task": { /* full snake_case task object */ },
"message": "Task enabled" // or "Task disabled"
}
For other edits, re-send the full task body — every NOT NULL column (name, request_prompt, response_prompt, repeat_interval) must be present. The same PATCH-as-PUT behavior applies to PATCH /api/watchers/:id.
Execution Lifecycle
- Task fires at the configured interval
- A browser tab is pre-created
- LLM receives the prompt with available browser tools
- Agentic loop: LLM calls tools, gets results, decides next action (up to 10 iterations)
- Final result is stored and forwarded to the webhook URL
A built-in LLM-powered agent that can see and control the browser. Use it interactively via the sidebar, or programmatically via the REST API and MCP tool. Give it a single natural language prompt and it autonomously navigates, clicks, fills forms, and extracts data — returning a structured result.
Capabilities
- Agentic tool execution loop (up to 15 iterations per run, configurable)
- Full access to all browser automation tools
- Template Builder integration for faster, more accurate element targeting
- Headless REST API for programmatic use
- MCP tool (
ai_chat_run) for agent-to-agent orchestration - Optional screenshot capture of final page state
- Tab lifecycle management (auto-create, reuse existing, auto-close)
- Tool allowlisting for read-only or restricted runs
REST API Endpoint
| Endpoint | Description |
|---|---|
POST /api/ai-chat/run | Execute a headless agentic task |
Request
POST /api/ai-chat/run
{
"prompt": "Log into acme.com/login with [email protected] / demo123,
go to settings, change company name to 'Acme Industries'",
"tabId": 0, // optional — use existing tab
"autoCloseTab": true, // optional, default: true
"includeScreenshot": false, // optional, default: false
"maxIterations": 15, // optional, default: 15
"timeout": 120000, // optional, default: 120000 (ms)
"tools": ["navigate", "click", ...], // optional — restrict available tools
"systemPromptAppend": "Return JSON" // optional — extra instructions
}
Request Parameters
| Field | Required | Default | Description |
|---|---|---|---|
prompt | Yes | — | Natural language task description |
tabId | No | new tab | Use an existing browser tab by ID. If omitted, a new tab is created at about:blank. |
autoCloseTab | No | true | Close the tab after completion. Only closes tabs that were auto-created. |
includeScreenshot | No | false | Capture a base64 PNG screenshot of the final page state. |
maxIterations | No | 15 | Maximum agent loop iterations. Lower values keep costs predictable. |
timeout | No | 120000 | Maximum total execution time in ms. Prevents runaway loops. |
tools | No | all | Restrict which tools the agent can use. e.g., ["navigate", "get_source"] for read-only scraping. |
systemPromptAppend | No | — | Extra instructions appended to the system prompt for this run. |
Response
{
"success": true,
"summary": "Logged in and updated company name to Acme Industries",
"finalResponse": "Done! I logged into acme.com, navigated to settings...",
"tabId": 0,
"iterations": 4,
"durationMs": 18420,
"toolCalls": [
{ "tool": "navigate", "params": {"tabId": 0, "url": "..."}, "durationMs": 1200 },
{ "tool": "fill_form", "params": {"tabId": 0, "fields": [...]}, "durationMs": 340 },
{ "tool": "click", "params": {"tabId": 0, "selector": "..."}, "durationMs": 890 },
{ "tool": "wait_for", "params": {"tabId": 0, "selector": "..."}, "durationMs": 1100 }
],
"screenshot": "data:image/png;base64,..."
}
Response Fields
| Field | Description |
|---|---|
summary | LLM-generated one-line summary of what was accomplished |
finalResponse | The agent's full final response text |
tabId | The browser tab ID used (integer). Use this to continue working with the tab (set autoCloseTab: false). |
iterations | How many agent loop iterations ran |
durationMs | Total wall-clock execution time |
toolCalls | Flat array of tool invocations with name, params, durationMs, success, error. See Tool result shapes for the corresponding payload data. |
steps | Per-iteration record that includes each tool's toolResult (the actual data returned — screenshot bytes, element text, extracted HTML, etc.). |
screenshot | Only populated when the agent explicitly took a screenshot and the flag was set. See note on includeScreenshot. |
error | Error message if the run failed (e.g., timeout, LLM failure) |
Note on includeScreenshot
includeScreenshot: true authorizes the agent to take a screenshot — it does not force one. If your prompt doesn't imply visual verification, the agent will often finish without ever calling the screenshot tool, and the top-level screenshot field will be absent.
To reliably capture a final screenshot, add an explicit instruction to the prompt:
{
"prompt": "Log into acme.com, go to the billing page, then TAKE A SCREENSHOT of the final state.",
"includeScreenshot": true
}
If you only want the bytes and don't care about visual verification during the run, call GET /api/browser/tabs/:id/screenshot directly after the run completes (set autoCloseTab: false to keep the tab open). That endpoint returns in under a second for a foregrounded tab.
Tool result shapes
Each entry in toolCalls[] records the invocation metadata:
{
"tool": "screenshot",
"params": { "tabId": 23 },
"durationMs": 66,
"success": true,
"error": null
}
The actual payload produced by each tool lives under steps[N].toolResult (or equivalently steps[N].toolResult.data). Shapes by tool:
| Tool | toolResult.data shape |
|---|---|
screenshot | { "screenshot": "data:image/png;base64,..." } |
get_source | { "url": "...", "title": "...", "source": "..." } |
get_element | { "tag": "button", "text": "Sign In", "attributes": { ... }, "boundingRect": { ... } } |
click | { "clicked": true, "resolvedSelector"?: "...", "urlChanged"?: true, "newUrl"?: "...", "template"?: { ... } } |
fill_form | { "filled": N, "fields": [ { "selector": "...", "value": "..." }, ... ] } |
wait_for | { "matched": true, "resolvedSelector"?: "..." } (or matched: false on timeout) |
scroll | { "scrolled": true, "scrollY": 420 } |
navigate | { "url": "...", "loaded": true } |
create_tab | { "tabId": 7, "url": "..." } |
get_tabs | { "tabs": [ { "id": 0, "url": "...", "title": "..." } ] } |
If you need screenshot bytes that an agent captured mid-run, extract them from steps[].toolResult.data.screenshot rather than the top-level screenshot field.
Click tool schema (what the model sees)
The AI chat agent is instructed that click takes a real CSS selector in selector, and that text is an optional visible-text filter for disambiguating when the selector matches multiple elements. If you're routing your own LLM into LumaBrowser's click tool, align your system prompt to the same shape:
# Single unique match — selector alone
{ "tool": "click", "params": { "tabId": 0, "selector": "#btn-simple" } }
# Multiple matches — use text to pick
{ "tool": "click", "params": { "tabId": 0, "selector": "button", "text": "Sign In" } }
selector must be a CSS selector string — not a description, not the literal word "css".
Example: Scrape Data (Read-Only)
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Go to news.ycombinator.com and return the top 5 story titles as JSON",
"tools": ["navigate", "get_source", "wait_for", "scroll"],
"systemPromptAppend": "Return only a JSON array of strings",
"maxIterations": 5
}'
Example: Reuse a Tab
# Step 1: Create a tab and keep it open
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Navigate to acme.com/dashboard and log in with [email protected] / demo123",
"autoCloseTab": false
}'
# → returns { "tabId": 0, ... }
# Step 2: Continue on the same tab
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Now go to the settings page and change the timezone to UTC",
"tabId": 0,
"autoCloseTab": false,
"includeScreenshot": true
}'
MCP Tool
The same functionality is available as an MCP tool: ai_chat_run. It accepts all the same parameters and returns the same result shape. This enables agent-to-agent orchestration — an outer AI agent can delegate complex browser tasks to LumaBrowser's built-in agent.
Available Browser Tools
The agent can use: navigate, create_tab, get_tabs, click, fill_form, press_key, scroll, get_source, screenshot, wait_for, get_element. When Template Builder is enabled, also: get_template, generate_template.
Drive LumaBrowser from any Selenium client — Python, Java, C#, Ruby, JavaScript — with zero code changes to your existing test suite. The selenium-driver extension embeds a W3C WebDriver Level 2 / ChromeDriver-compatible HTTP server inside LumaBrowser. Point RemoteWebDriver at it and the current browser tabs become your automation target, as if it were Chrome plus ChromeDriver.
The distinctive addition: LLM-powered selector fallback. When a CSS or XPath locator fails to match, or a click is intercepted, the driver can re-resolve the element from a natural-language description and retry automatically. Brittle selectors stop breaking builds.
Defaults
| Setting | Default | Description |
|---|---|---|
port | 9515 | Same port as ChromeDriver. Configurable. |
host | 127.0.0.1 | Local-only by default. Set to 0.0.0.0 to expose on the network. |
autostart | false | Off by default. Enable in the Selenium settings tab or via REST. |
urlPrefix / prefix | empty string | Optional path prefix (for example /wd/hub) prepended to WebDriver routes. Default is no prefix — the server listens directly at http://host:port/session. Point your Selenium client at the bare http://host:port URL (not http://host:port/wd/hub) unless you've explicitly configured a prefix. |
Field naming: The settings endpoint accepts urlPrefix on write; the status endpoint returns the same value as prefix. Treat them as the same field under two names.
Starting the Driver
Two ways to start the WebDriver server:
1. Settings tab. Open Settings → Selenium, check “Start automatically when LumaBrowser launches”, then click Save & Start.
2. REST API. Use LumaBrowser's existing API gateway on port 3000:
| Endpoint | Description |
|---|---|
GET /api/selenium/status | Current server state (running, host, port, urlPrefix) |
POST /api/selenium/start | Start the WebDriver server |
POST /api/selenium/stop | Stop the WebDriver server |
GET /api/selenium/settings | Read current settings |
POST /api/selenium/settings | Update settings (host, port, autostart, urlPrefix) |
# Start the WebDriver server
curl -X POST http://localhost:3000/api/selenium/start
# Check status
curl http://localhost:3000/api/selenium/status
# Change port to 4444 and expose on the network
curl -X POST http://localhost:3000/api/selenium/settings \
-H "Content-Type: application/json" \
-d '{"host": "0.0.0.0", "port": 4444}'
Response shapes: /start vs /status
These two endpoints return different shapes today. If you need the full runtime description after a start, call /status after /start.
# POST /api/selenium/start — minimal
{
"success": true,
"already": false, // true if the server was already running
"port": 9515
}
# GET /api/selenium/status — complete
{
"running": true,
"host": "127.0.0.1",
"port": 9515,
"prefix": "", // urlPrefix as configured; empty by default
"sessions": 0, // active Selenium sessions
"settings": { "host": "127.0.0.1", "port": 9515,
"urlPrefix": "", "autostart": false, ... }
}
Construct the server root as http://{host}:{port}{prefix}. With default settings, that's http://127.0.0.1:9515 with no trailing path.
Python Example
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
opts = Options()
# Opt into LumaBrowser's LLM fallback for this session
opts.set_capability("lumabyte:llmFallback", {
"enabled": True,
"onFindFail": True,
"onClickIntercepted": True,
})
driver = webdriver.Remote("http://127.0.0.1:9515", options=opts)
driver.get("https://example.com")
# Standard Selenium — works exactly as you'd expect
title = driver.find_element(By.CSS_SELECTOR, "h1").text
# LumaBrowser extension: find an element by natural-language description
el = driver.execute(
"lumabyte:find",
{"description": "the main sign-up button near the top"},
)
driver.quit()
LLM Fallback — Four Override Points
The fallback system intercepts WebDriver at four points. Each is independently controllable via session capabilities.
1. Custom locator strategy: "ai-description"
Send a Find Element request whose using is "ai-description". No CSS needed — the LLM resolves the selector from your description.
POST /session/{id}/element
{ "using": "ai-description", "value": "the blue submit button" }
2. Auto-retry on zero matches
When a standard CSS or XPath selector matches nothing and the request included a lumabyte:description hint and fallback is enabled, the driver re-resolves via the LLM and retries once.
3. Click re-resolve
When element.click() fails because the click was intercepted or the element went stale, the driver re-resolves the element's original description and retries.
4. Vendor extension commands
LumaBrowser adds a namespaced command surface at /session/{id}/lumabyte/…:
| Endpoint | Description |
|---|---|
POST /lumabyte/find | Description-only finder. Returns an element handle. |
POST /lumabyte/click | Semantic click by description — optional selector hint for a fast-path attempt. See details. |
POST /lumabyte/dom/snapshot | Bulk page extraction (URL, title, source, optional screenshot) in one call. |
POST /lumabyte/cdp/execute | Raw Chrome DevTools Protocol passthrough via Electron's webContents.debugger. Also available as /goog/cdp/execute. |
POST /session/{id}/lumabyte/click — semantic click with optional selector hint
The click endpoint requires a description and accepts an optional selector hint. If both are provided, the driver tries the selector directly first and only falls through to description-based resolution if the selector fails — giving you the speed of a CSS click with the resilience of natural-language fallback.
| Request body | Behavior |
|---|---|
{ "description": "..." } | Resolve via the resolution pipeline, then click. |
{ "description": "...", "selector": "..." } | Try selector first. On miss, resolve description and click. |
{ "selector": "..." } (no description) | Rejected — description is required on this endpoint. Use the standard WebDriver /element + /element/{id}/click pair for pure-CSS clicks. |
The response body identifies which path won:
POST /session/abc123/lumabyte/click
{ "description": "the add to cart button", "selector": "button.buy-now" }
———
200 OK
{
"value": {
"clicked": true,
"resolvedSelector": "button.buy-now",
"resolvedVia": "selector"
}
}
| Response field | Meaning |
|---|---|
clicked | true when the click dispatched. |
resolvedSelector | The CSS selector actually used for the click — either the hint you passed or the one the resolver produced. |
resolvedVia | "selector" if the hint succeeded on the fast path, "description" if the resolver had to run. |
Session Capabilities
Set these under alwaysMatch or firstMatch when creating a session:
| Capability | Type | Description |
|---|---|---|
lumabyte:llmFallback | boolean or object | true / false, or { enabled, onFindFail, onClickIntercepted, slot }. |
goog:chromeOptions | object | Accepted as pass-through so existing Selenium/ChromeOptions code works without modification. |
After session creation, the driver returns response capabilities that tell the client what's available:
| Response Capability | Shape |
|---|---|
lumabyte:features | { cdpPassthrough, aiDescriptionLocator, domSnapshot } |
lumabyte:llmFallback | The effective, merged fallback config for this session. |
LLM Slot
All Selenium LLM-fallback paths — the ai-description locator, /lumabyte/find, /lumabyte/click, and auto-retry on zero matches — route through the shared core-scope selector-resolver slot. Configure a fast, low-latency model there and every description-based Selenium call benefits automatically. If the slot is unconfigured, the resolver falls back to your global active LLM.
Known v1 Limitations
- Frame switching, shadow-DOM subqueries, async
execute_async_script, and element-scoped screenshot cropping returnunsupported operation. - Keyboard special-key W3C PUA codepoints pass through as literal characters.
POST /session/:id/lumabyte/cdp/executewithPage.captureScreenshotcan hang past 100 s when the target tab is backgrounded. The non-SeleniumGET /api/browser/tabs/:id/screenshotreturns in under a second for a foregrounded tab — prefer it for routine screenshot work and reach for the CDP path only when you genuinely need a CDP-specific capture flag (clip, quality, full page, etc.).
/lumabyte/cdp/execute passthrough unblocks most of the unsupported operations above by letting you issue raw Chrome DevTools Protocol commands. Use it from any language that can POST JSON.
Prefer CDP directly? If your client speaks Chrome DevTools Protocol natively (Puppeteer, Playwright over CDP, chrome-remote-interface), you can skip the WebDriver hop entirely and connect to the Puppeteer Driver on port 9222. Both drivers share the same LLM selector-resolution pipeline, so picking a protocol is a stylistic choice, not a capability tradeoff. See Puppeteer vs Selenium for a side-by-side.
Drive LumaBrowser from any Chrome DevTools Protocol client — Puppeteer, puppeteer-core, chrome-remote-interface, or any tool that speaks CDP over WebSocket — by connecting to the built-in CDP server on port 9222. The puppeteer-driver extension boots a standards-compliant CDP WebSocket server plus the usual /json/* HTTP bootstrap endpoints that puppeteer.connect({ browserURL }) uses for discovery.
Under the hood, the server is a translation layer rather than a from-scratch reimplementation of CDP: it owns the Browser, Target, and custom Lumabyte.* domains, and proxies every other command (Page, Runtime, DOM, Input, Network, Fetch, Emulation, Log, and friends — roughly 290 commands) straight through Electron's webContents.debugger to Chromium. You get the full native protocol surface for free.
The distinctive addition, identical in behavior to the Selenium driver: LLM-powered selector fallback exposed as the custom Lumabyte.* CDP domain. Any Puppeteer session can resolve elements from a natural-language description via Lumabyte.find / Lumabyte.click, and the same orchestrator (deterministic match → LLM attempt → validation → shadow-template regen) runs for both drivers.
Dedicated automation tabs
Unlike the Selenium driver (which drives whichever tab is active), the Puppeteer driver creates its own tabs with kind: 'puppeteer' and only those tabs surface as CDP targets. Automation-owned tabs are rendered in the tab strip with a purple accent and a PUP badge so you can watch the session drive them live. This is a deliberate design choice: webContents.debugger.attach is an exclusive lock, so isolating puppeteer-kind tabs lets the Network Interceptor, AI Chat browsing tools, and other subsystems keep operating on the user's regular tabs without conflict.
Defaults
| Setting | Default | Description |
|---|---|---|
port | 9222 | Chrome's conventional CDP port. Configurable. |
host | 127.0.0.1 | Local-only by default. Set to 0.0.0.0 to expose on the network. |
autostart | false | Off by default. Enable in the Puppeteer settings tab or via REST. |
fallback.defaultEnabled | false | Whether new CDP sessions inherit LLM fallback without an explicit Lumabyte.configureFallback call. |
fallback.onFindFail | true | When fallback is enabled, retry selector misses with the LLM resolver. |
fallback.onClickIntercepted | true | When fallback is enabled, retry intercepted clicks with the LLM resolver. |
fallback.slot | null | Optional named LLM slot. Defaults to the shared selector-resolver slot. |
Starting the Driver
Two ways to start the CDP server:
1. Settings tab. Open Settings → Puppeteer, check “Start automatically when LumaBrowser launches”, then click Save & Start.
2. REST API. Management endpoints live on LumaBrowser's existing API gateway (port 3000), separate from the CDP WebSocket on 9222:
| Endpoint | Description |
|---|---|
GET /api/puppeteer/status | Current server state (running, host, port, browser UUID, active sessions, active targets) |
POST /api/puppeteer/start | Start the CDP server |
POST /api/puppeteer/stop | Stop the CDP server and close every puppeteer-kind tab |
GET /api/puppeteer/settings | Read current settings (host, port, autostart, fallback defaults) |
POST /api/puppeteer/settings | Update settings |
# Start the CDP server
curl -X POST http://localhost:3000/api/puppeteer/start
# Check status
curl http://localhost:3000/api/puppeteer/status
# Bind to all interfaces on a custom port
curl -X POST http://localhost:3000/api/puppeteer/settings \
-H "Content-Type: application/json" \
-d '{"host": "0.0.0.0", "port": 9333}'
CDP bootstrap endpoints (port 9222)
The CDP server also speaks the Chrome-compatible HTTP bootstrap surface that Puppeteer uses to discover WebSocket URLs from just browserURL:
| Endpoint | Description |
|---|---|
GET /json/version | Version metadata plus the webSocketDebuggerUrl Puppeteer attaches to. |
GET /json, /json/list | Per-target descriptors (one entry per puppeteer-kind tab). |
PUT /json/new?<url> | Open a new puppeteer-kind tab and return its target descriptor. |
PUT /json/activate/{id} | Bring the target's tab to the foreground. |
PUT /json/close/{id} | Close the target's tab. |
WebSocket endpoints: ws://host:9222/devtools/browser/{uuid} (root connection, used by puppeteer.connect) and ws://host:9222/devtools/page/{targetId} (page-scoped attach).
First connection — Puppeteer
// npm install puppeteer-core
const puppeteer = require('puppeteer-core');
// LumaBrowser is already running; no bundled Chromium to launch.
const browser = await puppeteer.connect({
browserURL: 'http://127.0.0.1:9222',
});
// Spawns a new purple "PUP" tab inside LumaBrowser.
const page = await browser.newPage();
await page.goto('https://example.com');
const title = await page.title();
console.log(title); // "Example Domain"
await browser.disconnect(); // leaves the tab open; .close() would close it
Lumabyte CDP Domain — five methods
All five route through the same LLM-fallback orchestrator as the Selenium driver. Dispatch them from any CDP session via CDPSession.send('Lumabyte.method', params).
Lumabyte.find — resolve a selector, deterministically first
Accepts a description, a selector, or both. If both are given the server tries the selector against DOM.querySelector first and only invokes the LLM when it misses. Returns the selector that actually resolved, plus which strategy produced it.
| Parameter | Type | Description |
|---|---|---|
description | string, optional | Natural-language description of the element. Required unless selector is supplied. |
selector | string, optional | CSS selector to try deterministically before any LLM call. |
retry | boolean, optional | When false, skip the LLM path entirely. Default true. |
const page = await browser.newPage();
await page.goto('https://example.com');
const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });
const r = await cdp.send('Lumabyte.find', {
description: 'the more information link',
selector: 'a.deprecated-class', // optional hint tried first
});
console.log(r); // { selector: 'a[href*="iana.org"]', strategy: 'description' }
Lumabyte.click — semantic click
Resolves via the same pipeline as Lumabyte.find, then scrolls the element into view with DOM.scrollIntoViewIfNeeded, reads its box model with DOM.getBoxModel, and dispatches a synthetic mouse press/release at the center via Input.dispatchMouseEvent. Returns the winning selector, the strategy that produced it, and the click coordinates.
| Parameter | Type | Description |
|---|---|---|
description | string, optional | Natural-language description. Required unless selector is supplied. |
selector | string, optional | CSS selector tried first. |
button | string, optional | "left" (default), "right", or "middle". |
retry | boolean, optional | When false, skip the LLM path. Default true. |
const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });
const clicked = await cdp.send('Lumabyte.click', {
description: 'the add to cart button',
selector: 'button.buy-now', // fast path; falls through if absent
});
// { clicked: true, selector: 'button.buy-now', strategy: 'selector', x: 412, y: 318 }
Lumabyte.domSnapshot — one round-trip page capture
Collapses DOMSnapshot.captureSnapshot plus optional Accessibility.getFullAXTree and Page.captureScreenshot into one CDP call. Useful for agent loops that want a full page state in a single round-trip instead of orchestrating three separate commands.
| Parameter | Type | Description |
|---|---|---|
includeAxTree | boolean, optional | When true, include the full accessibility tree. Default false. |
includeScreenshot | boolean, optional | When true, include a base64 PNG of the viewport. Default false. |
const cdp = await page.target().createCDPSession();
const { snapshot, axTree, screenshot } = await cdp.send('Lumabyte.domSnapshot', {
includeAxTree: true,
includeScreenshot: true,
});
// snapshot — DOMSnapshot.captureSnapshot result
// axTree — Accessibility.getFullAXTree result (undefined if not requested)
// screenshot — base64 PNG string (undefined if not requested)
Lumabyte.configureFallback — per-session fallback config
CDP has no first-class capability slot (unlike WebDriver's lumabyte:llmFallback), so per-session LLM behavior is configured with this method. Accepts a boolean shortcut or a full config object. Settings persist for the lifetime of the CDP session.
| Parameter | Type | Description |
|---|---|---|
enabled | boolean | Master switch for this session's fallback behavior. |
onFindFail | boolean, optional | Invoke the LLM when Lumabyte.find's selector hint misses. Default true. |
onClickIntercepted | boolean, optional | Invoke the LLM when Lumabyte.click's selector hint misses. Default true. |
slot | string, optional | Named LLM slot override. Falls back to the shared selector-resolver slot when omitted. |
const cdp = await page.target().createCDPSession();
// Full config form
await cdp.send('Lumabyte.configureFallback', {
enabled: true,
onFindFail: true,
onClickIntercepted: true,
slot: null, // null → use the default selector-resolver slot
});
// Boolean shortcut — equivalent to { enabled: true, onFindFail: true, onClickIntercepted: true }
await cdp.send('Lumabyte.configureFallback', true);
Lumabyte.getInfo — fallback capability probe
Reports whether an LLM fallback service is wired on the server side and the current session's effective config. Use it at the start of a session to decide whether Lumabyte.find/Lumabyte.click can safely be called with description-only arguments.
const cdp = await page.target().createCDPSession();
const info = await cdp.send('Lumabyte.getInfo');
// {
// fallbackAvailable: true,
// config: { enabled: false, onFindFail: false, onClickIntercepted: false, slot: null },
// defaults: { defaultEnabled: false, onFindFail: true, onClickIntercepted: true, slot: null }
// }
if (info.fallbackAvailable && !info.config.enabled) {
await cdp.send('Lumabyte.configureFallback', true);
}
LLM Slot
All Puppeteer LLM-fallback paths — Lumabyte.find, Lumabyte.click, and description-based retries triggered by configured fallback — route through the same shared core-scope selector-resolver slot as the Selenium driver. Configure a fast, low-latency model there once and every description-based call from either driver benefits. If the slot is unconfigured, the resolver falls back to your global active LLM.
End-to-end example
Mixing native CDP with Lumabyte.* in a realistic flow:
const puppeteer = require('puppeteer-core');
const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9222' });
const page = await browser.newPage();
await page.goto('https://example.shop/products/coffee-grinder');
const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });
// Semantic click — no brittle CSS needed.
await cdp.send('Lumabyte.click', { description: 'the add-to-cart button' });
await cdp.send('Lumabyte.click', { description: 'the cart icon in the header' });
// One round-trip: DOMSnapshot + AX tree + viewport screenshot.
const snap = await cdp.send('Lumabyte.domSnapshot', {
includeAxTree: true,
includeScreenshot: true,
});
// Native CDP still works — verify via Runtime.evaluate.
const { result } = await cdp.send('Runtime.evaluate', {
expression: 'document.title',
returnByValue: true,
});
console.log(result.value);
await browser.disconnect();
Architecture notes for evaluators
- Only ~10 handwritten handlers. The server implements
Browser.*,Target.*, andLumabyte.*itself; every other command is forwarded throughwebContents.debugger.sendCommand. You get the ~290 commands Chromium implements natively for free, and the translation surface stays small and auditable. - Session model. The server supports flat-mode sessions (
Target.setAutoAttachwithflatten: true), matching modern Puppeteer's default. Multiple sessions can attach to the same target; events fan out to every subscriber. - Browser contexts.
Target.createBrowserContextreturns a context ID that's reused as a persistent partition name (persist:puppeteer-ctx-<id>) when creating targets, so cookies and storage are isolated per context. - Security posture. The CDP server binds to
127.0.0.1by default and is gated by the extension'sautostartsetting. CDP itself has no authentication; do not expose it on an untrusted network.
Related
- Puppeteer vs Selenium — side-by-side comparison of both drivers and when to pick each.
- Selenium Driver — the WebDriver analog of this extension.
- LLM Fallback — shared resolution pipeline both drivers route through.
LumaBrowser is built on a modular extension architecture. All major features (network watcher, template builder, AI chat, timed tasks, WebGPU LLM) are extensions that can be enabled, disabled, or replaced.
Built-in Extensions
| Extension | Description |
|---|---|
notification-interceptor | Captures browser notifications and forwards to webhooks |
network-watcher | CDP-based HTTP traffic monitoring |
ai-chat | LLM-powered sidebar assistant |
template-builder | LLM page analysis and selector generation |
timed-tasks | Scheduled recurring automation |
webgpu-llm | Local GPU inference (no API keys) |
selenium-driver | W3C WebDriver / ChromeDriver-compatible server with AI selector fallback |
puppeteer-driver | Chrome DevTools Protocol server over WebSocket; Puppeteer clients drive dedicated automation tabs with the same AI selector fallback as Selenium |
LLM Providers
Each extension can use a different LLM independently via the slot-based routing system:
- OpenAI-compatible — LM Studio, Ollama, or any OpenAI API endpoint
- Anthropic — Claude API
- WebGPU — Local GPU inference with Qwen 2.5 models (0.5B-7B), no API key
Custom Extensions
LumaBrowser supports a plugin architecture for building custom extensions with their own UI, REST routes, and MCP tools. Extension development documentation will be available soon.
lumabyte.comFree, no-auth-required helper APIs. These are hosted at https://lumabyte.com — they are not part of the LumaBrowser desktop app and will 404 on http://localhost:3000. Use them from any client, no install required. Subject to change, but we'll keep them running as long as possible.
Generates a list of mock person data including names, emails, and phone numbers.
Endpoint
GET /api/generateMockRandomPeople?count={number}
Parameters
- count (optional): Number of people to generate. Default: 10. Max: 1000.
Sample Request
curl https://lumabyte.com/api/generateMockRandomPeople?count=5
Sample Response
[{
"id": 1,
"name": "Ava Smith",
"email": "[email protected]",
"phoneNumber": "1234567890"
}, {
"id": 2,
"name": "Liam Johnson",
"email": "[email protected]",
"phoneNumber": "0987654321"
}]
Takes a URL as input and returns a JSON response with the original URL and a Base64 encoded QR code image.
Endpoint
POST /api/urlToQRCode
Request Payload
{
"url": "https://example.com"
}
Sample Response
{
"url": "https://example.com",
"qrCodeBase64": "data:image/png;base64,iVB..."
}
Demo
Analyzes text and returns metrics: word count, character count, sentence count, reading time, sentiment analysis, and token count (GPT tokenizer).
Endpoint
POST /api/analyzeText
Request Payload
{
"text": "Your text to analyze goes here."
}
Sample Response
{
"wordCount": 5,
"characterCount": 28,
"sentenceCount": 1,
"sentiment": "positive",
"readingTimeMinutes": 0.025,
"tokenCount": 7
}