LumaBrowser Documentation

Getting Started

LumaBrowser is a programmable browser with built-in AI automation, network monitoring, and a full REST API. Download the installer, launch the app, and start automating in minutes.

Installation

Download the installer for your platform from the downloads page. Available for Windows, macOS, and Linux. Run the installer and launch LumaBrowser — the REST API server starts automatically.

Quick Start

Once running, the REST API is available on http://localhost:3000. Try it:

# Check health
curl http://localhost:3000/api/health

# Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Take a screenshot
curl http://localhost:3000/api/browser/tabs/0/screenshot

Configuration

Settings are persisted in SQLite. Key options:

SettingDefaultDescription
core.apiPort3000REST API listen port
core.apiEnabledtrueEnable/disable REST server
core.mcpEnabledtrueEnable/disable MCP server

Environment variable overrides: API_PORT=8080, API_HOST=0.0.0.0

REST API Overview

The LumaBrowser REST API ships with the desktop app and runs on http://localhost:3000. When this doc says "the local API" or uses localhost:3000, assume the app is running on the same machine. The "Free APIs" section at the bottom of this page is a separate surface hosted at https://lumabyte.com — those endpoints are not served by the desktop app. See API hosts & what lives where.

Health & Discovery

EndpointDescription
GET /api/healthLiveness probe. Returns status, timestamp, port, mcpEnabled, and the list of extensions currently loaded.
GET /api/Service discovery. Returns the API name, port, mcpEnabled, and routes — a map of core routes and extension base-path mounts. See note below.

Health response (actual shape):

{
  "status": "ok",
  "timestamp": "2026-04-19T12:34:56.000Z",
  "port": 3000,
  "mcpEnabled": true,
  "extensions": ["ai-chat", "network-watcher", "template-builder",
                 "timed-tasks", "selenium-driver", "puppeteer-driver"]
}

The extensions array reflects what's actually loaded in this build — extensions that ship on disk but are disabled (for example webgpu-llm on a machine without a supported GPU) will not appear.

Discovery response (actual shape):

{
  "name": "LumaBrowser API",
  "port": 3000,
  "mcpEnabled": true,
  "routes": {
    "core": ["/api/mcp/tools", "/api/mcp/call"],
    "extensions": {
      "ai-chat":          "/api/ai-chat",
      "network-watcher":  "/api/watchers",
      "template-builder": "/api/templates",
      "timed-tasks":      "/api/timed-tasks",
      "selenium-driver":  "/api/selenium",
      "puppeteer-driver": "/api/puppeteer"
    }
  },
  "aliases": { ... }
}

Discovery lists mount points, not every route. To enumerate individual endpoints under each mount (for example /api/browser/tabs/:id/click), refer to the per-section tables on this page. The MCP routes (/api/mcp/tools, /api/mcp/call) appear under routes.core only when the MCP server is wired.

Response Envelope

Most endpoints wrap their payload in one of two envelopes. Legacy endpoints may omit the wrapper entirely — always check the table below before writing a strongly-typed client.

Canonical envelope (newer / core endpoints):

{
  "success": true,
  "timestamp": "2026-04-19T12:34:56.000Z",
  "data": { ... },         // or sometimes an array
  "message": "Optional human-readable context"
}

Payload key by endpoint. Several older extension endpoints use a resource-named key instead of data. Clients that auto-detect the payload key should consult this table to be safe:

EndpointPayload keyShape
GET /api/browser/tabsdataarray of tab objects
POST /api/browser/tabsdatatab object
POST /api/templates/generatedataflat template object (see Template Builder)
POST /api/templates/validatedata{ valid, found, message? }
POST /api/watcherswatcherwatcher object (root-level, no wrapper)
GET /api/watcherswatchersarray (root-level)
POST /api/timed-taskstasktask object (root-level)
GET /api/timed-taskstasksarray (root-level)
POST /api/timed-tasks/:id/enabletasktask object (root-level)
POST /api/timed-tasks/:id/disabletasktask object (root-level)
POST /api/ai-chat/runfields at rootno wrapper — success, summary, toolCalls, etc. are top-level

ID types

Resource identifiers are not uniformly typed. Strongly-typed clients (C#, Kotlin, Swift, Rust) should model each one explicitly or silent deserialization failures will occur.

ResourceID fieldTypeExample
Browser tabid, tabIdnumber (integer)32
Network watcheridstring"watcher_1776605796008_ze35ugf9e"
Timed taskidstring"task_1776605796275_iw0ooo"
Templateidstring (numeric-looking)"42"
Selenium sessionsessionIdstring (UUID-like)"a1b2c3d4-..."
Puppeteer CDP targettargetIdstring (uppercase hex, no dashes)"4F1B8A09CBEE4C23B0A9DBA1F77DE9F1"
Puppeteer CDP sessionsessionIdstring (uppercase hex, no dashes)"9A2CFE18E7B64A1CAD05B6E4FF1C02E0"

Gotcha: Endpoints that take a tab reference expect integer tabId in the JSON body. POST /api/templates/generate historically tolerated a stringified numeric ID; prefer the integer form to avoid surprises.

API hosts — what lives where

There are two distinct surfaces. Pointing a client at the wrong host produces a confusing 404.

SurfaceHostWhat it serves
LumaBrowser local APIhttp://localhost:3000Everything under /api/browser, /api/mcp, /api/watchers, /api/templates, /api/timed-tasks, /api/ai-chat, /api/selenium, /api/puppeteer. Requires the desktop app to be running.
LumaByte public helpershttps://lumabyte.com/api/generateMockRandomPeople, /api/urlToQRCode, /api/analyzeText, /api/sms/*. Always-on, no desktop app required.
Browser Automation API

Full programmatic control over browser tabs — navigate, click, fill forms, extract content, and take screenshots.

Tab Management

EndpointDescription
GET /api/browser/tabsList all open tabs
POST /api/browser/tabsCreate new tab
DELETE /api/browser/tabs/:idClose tab
PATCH /api/browser/tabs/:idNavigate, refresh, activate, or execute JavaScript (requires action discriminator — see below)
POST /api/browser/tabs/:id/activateForeground a tab — the preferred path for making a tab the visible rendered surface. See Activate a Tab.

Create Tab

When silent: true, the tab is created but not rendered to the user-visible surface. This keeps background automation out of the user's way, but see the silent-tab caveats before using it for screenshot or scroll work.

POST /api/browser/tabs
{
  "url": "https://example.com",
  "silent": false
}

PATCH a Tab — the action discriminator

PATCH /api/browser/tabs/:id requires an action field. A naked {"url": "..."} is rejected with "Action must be one of: navigate, refresh, activate, executeJs". The four supported shapes:

# Navigate an existing tab
PATCH /api/browser/tabs/0
{"action": "navigate", "url": "https://example.com"}

# Refresh
PATCH /api/browser/tabs/0
{"action": "refresh"}

# Foreground (activate) the tab — "focus" is accepted as an alias
PATCH /api/browser/tabs/0
{"action": "activate"}

# Execute JavaScript in the tab
PATCH /api/browser/tabs/0
{"action": "executeJs", "script": "(function(){ return document.title; })()"}

Activate a Tab (make it the visible surface)

Activation foregrounds a tab so it becomes the rendered display surface. This is required before any operation that depends on an actual painted viewport — see silent-tab caveats for the full list. Two equivalent shapes are available:

# Preferred: dedicated endpoint
POST /api/browser/tabs/0/activate

# Or: PATCH with action discriminator (accepts "activate" or "focus" alias)
PATCH /api/browser/tabs/0
{"action": "activate"}

Silent tabs cannot be activated. A tab created with silent: true has no render surface to promote. Calling activate on one returns 400 with {"error": "Silent tabs cannot be activated"}. If you need to foreground work, create the tab with silent: false in the first place.

Reliable screenshot / scroll workflows in multi-tab scripts. Activate the target tab immediately before GET /tabs/:id/screenshot or POST /tabs/:id/scroll. Without this step, operations that require the painted surface can fail on any tab that isn't currently foregrounded.

# Activate, then screenshot — the reliable pattern
curl -X POST http://localhost:3000/api/browser/tabs/2/activate
curl http://localhost:3000/api/browser/tabs/2/screenshot

executeJs: no top-level return. Scripts are evaluated at page top level, where return is a syntax error. A bare "script": "return 2+40" fails with "Script failed to execute". Wrap any return value in an IIFE:

# Wrong
{"action": "executeJs", "script": "return 2+40"}

# Right
{"action": "executeJs", "script": "(function(){ return 2+40; })()"}

# Also fine — last expression is returned
{"action": "executeJs", "script": "2+40"}

Page Inspection

EndpointDescription
GET /api/browser/tabs/:id/sourcePage HTML (?type=text|clean|full)
GET /api/browser/tabs/:id/screenshotPNG screenshot as base64 — requires a rendered surface, activate the target tab first.
GET /api/browser/tabs/:id/consoleConsole log entries
GET /api/browser/tabs/:id/networkNetwork request log
GET /api/browser/tabs/:id/elementElement properties (?selector=css&llmFallback=desc)
GET /api/browser/tabs/:id/tableExtract table data (?selector=css&llmFallback=desc)

User Interaction

All selector-based endpoints accept an optional llmFallback parameter — a natural language description of the target element. When the CSS selector fails and an LLM is configured, it resolves the element automatically. See LLM Fallback.

EndpointPayload
POST /tabs/:id/click{ "selector": "button.submit", "llmFallback": "Click the submit button" }
POST /tabs/:id/fill{ "fields": [{ "selector": "input", "value": "text", "llmFallback": "Email input" }] }
POST /tabs/:id/wait{ "selector": ".modal", "llmFallback": "The popup modal", "timeout": 5000 }
POST /tabs/:id/scroll{ "selector": "#pricing", "llmFallback": "The pricing section" }activate the tab first for reliable scroll behavior.
POST /tabs/:id/press-key{ "key": "Enter", "selector": "input.search", "llmFallback": "The search box" }
POST /tabs/:id/dialog{ "action": "accept" }

Full Workflow Example

# 1. Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# 2. Fill a form (with LLM fallback per field)
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
  -H "Content-Type: application/json" \
  -d '{"fields": [
    {"selector": "input[name=email]", "value": "[email protected]",
     "llmFallback": "The email input field"},
    {"selector": "input[name=password]", "value": "secret123",
     "llmFallback": "The password input field"}
  ], "llmFallback": "Fill the login form"}'

# 3. Click submit (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{"selector": "button[type=submit]",
       "llmFallback": "Click the login button"}'

# 4. Wait for result (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/wait \
  -H "Content-Type: application/json" \
  -d '{"selector": ".success-message", "timeout": 10000,
       "llmFallback": "The success confirmation message"}'

# 5. Get page content
curl "http://localhost:3000/api/browser/tabs/0/source?type=text"

Silent-tab & surface-dependent operations

LumaBrowser is a single-window, single-visible-tab surface. A tab that is backgrounded (another tab is foregrounded) or was created with silent: true has no rendered display surface. A handful of operations require that surface and will fail on a hidden tab:

  • GET /api/browser/tabs/:id/screenshot — fails with "Current display surface not available for capture". Foreground the tab first with POST /api/browser/tabs/:id/activate.
  • POST /api/browser/tabs/:id/scroll — a scroll call returns successfully, but window.scrollY on a backgrounded tab doesn't always progress as a user-driven scroll would. Activate the tab first for deterministic scroll behavior.

For deterministic foregrounding in multi-tab automation, call POST /api/browser/tabs/:id/activate (or PATCH with {"action":"activate"}) immediately before any surface-dependent operation. Silent tabs cannot be activated — if you need to screenshot or scroll a tab, create it with silent: false.

Natural-language selector resolution (LLM Fallback)

Every selector-based browser command accepts an optional llmFallback parameter — a plain-English description of the target element. When a selector fails (or no selector is given at all), LumaBrowser runs your description through a four-stage pipeline that starts deterministic, escalates to the LLM only when needed, validates what the LLM returns, and silently improves future runs by seeding a page template in the background.

The Four-Stage Resolution Pipeline

This pipeline runs whenever a caller invokes any LLM-fallback path — REST llmFallback on click / wait / getElement / fill, the Selenium ai-description locator, /lumabyte/find, /lumabyte/click, the Puppeteer CDP methods Lumabyte.find and Lumabyte.click, and the MCP tools listed below. Every stage has one job, and most requests never reach the LLM at all.

Stage 1 — Deterministic match (0 tokens)

Before any LLM call, the resolver queries the live DOM with a battery of stable attribute strategies derived from your description:

  • [data-testid] — both the slugified description and the raw string
  • [aria-label]
  • [placeholder]
  • [data-role] — slugified description
  • Exact visible-text match on interactable elements

If any strategy produces exactly one interactable element, the resolver uses it immediately. No LLM tokens spent, no latency added. Pages that expose stable semantic attributes resolve at native speed.

Stage 2 — LLM with a live DOM snapshot

If the deterministic stage misses, the LLM is prompted with a compact snapshot of up to 80 interactable elements from the actual current DOM. Each snapshot row carries id, data-*, aria-*, role, name, type, placeholder, disabled, and visible text. Any cached page template is layered on top as additional context.

Implication for dynamically-rendered content. The LLM sees what is in the DOM at the moment of the call. If your target element appears after a delay (lazy hydration, skeleton-swap, modal after click), trigger the delay before invoking wait_for with a description, or allow wait_for to poll until the element mounts. An element that is not in the DOM cannot be resolved.

Stage 3 — Validate and retry

Before the action runs, the LLM's selector is validated against the DOM. If it matches zero elements or more than one, the failure reason is fed back into the prompt as context and the LLM gets one more attempt. Total LLM calls are capped at two per resolution — no runaway costs on a pathological page.

Stage 4 — Async shadow template generation

After any successful resolution (deterministic or LLM), if no page template is cached for the current URL, the resolver fires a background generateForTab. The shared template repository is seeded silently — no explicit user action required, no delay added to the caller's response. The next call on the same URL can skip the LLM entirely.

Why This Is Better

  • Free speed on well-built pages. Pages with stable data-testid, aria-label, placeholder, or data-role attributes resolve in Stage 1 with zero LLM cost and sub-millisecond latency.
  • LLM grounded in current truth. The model sees a snapshot of the real DOM, not a stale dump — so dynamic, client-rendered, and single-page-app content work as long as the target is actually mounted at call time.
  • Validated before acting. A hallucinated or ambiguous selector is rejected and retried before it touches the page — no silent misclicks on the wrong element.
  • Self-warming cache. Every successful run pays forward: the background template generation means routine workflows converge to Stage 1 speed over time.

Prerequisites

  • The selector-resolver LLM slot, or a globally configured primary LLM (Settings → LLM Providers). See LLM slot configuration below.
  • Template Builder extension is recommended (enabled by default) — enables the Stage 4 cache.
  • Works with OpenAI-compatible, Anthropic, or local WebGPU models.

LLM slot: selector-resolver

LumaBrowser registers a dedicated core-scope LLM slot for this pipeline, labelled Selector Resolver (fast, low-latency model recommended). If the slot is not explicitly configured, the resolver falls back to your global active LLM.

PropertyValue
Slot nameselector-resolver
ScopeCore (registered at startup)
FallbackGlobal active LLM if slot is unconfigured
Temperature0 — hardcoded on the slot's call path. Deterministic selector resolution is non-negotiable.
Max tokensRespects whatever you configure on the slot / model — not hardcoded.

Deployment recommendation. Configure this slot with a fast, low-latency model: Claude Haiku, Gemini Flash, GPT-4.1 mini, or a similar speed-tier option. Heavier reasoning models are overkill for selector resolution and measurably slow the action path. The temperature: 0 guarantee means a small model performs well here — there is no reasoning budget to spend.

Configure the slot under Settings → LLM Providers → Selector Resolver.

Supported Endpoints

EndpointFallback Behavior
POST /tabs/:id/clickResolves click target. Selector becomes optional when llmFallback is provided.
POST /tabs/:id/fillPer-field llmFallback on each field object, plus top-level fallback for the whole form.
POST /tabs/:id/waitResolves wait target. Selector becomes optional when llmFallback is provided.
POST /tabs/:id/scrollResolves scroll-to-element target when a selector is involved.
POST /tabs/:id/press-keyResolves focus target element before key press.
GET /tabs/:id/elementResolves element to inspect. Selector becomes optional when llmFallback is provided.
GET /tabs/:id/tableResolves table container selector.

Example: Click with Fallback

# Selector works → instant click, llmFallback is ignored
# Selector breaks → LLM resolves the right element automatically
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{
    "selector": "button.btn-primary.submit-form",
    "llmFallback": "Click the submit button"
  }'

# Response on LLM fallback success:
{
  "data": { "clicked": true, "resolvedSelector": "form button[type=submit]" },
  "success": true,
  "message": "Element clicked successfully (resolved by LLM fallback)"
}

Example: Click without Selector (LLM-Only)

# No selector at all — LLM resolves from description alone
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{
    "tabId": 0,
    "llmFallback": "Click the Sign In button in the navigation bar"
  }'

Example: Form Fill with Per-Field Fallback

curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
  -H "Content-Type: application/json" \
  -d '{
    "fields": [
      {"selector": "input#email", "value": "[email protected]",
       "llmFallback": "The email address input"},
      {"selector": "input#pass", "value": "secret",
       "llmFallback": "The password input"},
      {"selector": "select#role", "value": "admin",
       "llmFallback": "The role dropdown"}
    ],
    "llmFallback": "Fill the registration form"
  }'

Template Builder Integration

When the Template Builder extension is active, cached page templates are layered into the Stage 2 LLM prompt alongside the live DOM snapshot, sharpening match accuracy on pages you've automated before. Successful resolutions also seed the template repository in the background (Stage 4) — no explicit generation call is required. See the Template Builder section for direct template management.

Debugging resolution (operator)

To trace why a specific description failed to resolve, launch LumaBrowser with the environment variable LLM_FALLBACK_DEBUG=1. The resolver writes verbose, stage-by-stage traces to stdout prefixed with [LlmFallback]: which deterministic strategies were tried, what was in the DOM snapshot, what the LLM returned, and why validation accepted or rejected it.

MCP Tools

All corresponding MCP tools also accept llmFallback: click, fill_form, wait_for, scroll, press_key, get_element, get_table. This means AI agents using the MCP server can provide their own fallback descriptions for resilient automation.

MCP Server Integration

Connect Claude Desktop, OpenClaw, or any MCP-compatible AI agent to LumaBrowser's built-in MCP server. Give your agents native browser capabilities without writing glue code.

Claude Desktop / OpenClaw Setup

LumaBrowser includes a built-in MCP server. Add it to your Claude Desktop or OpenClaw config (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/.config/Claude/claude_desktop_config.json on macOS/Linux):

{
  "mcpServers": {
    "lumabrowser": {
      "command": "lumabrowser",
      "args": ["--mcp"]
    }
  }
}

Make sure LumaBrowser is installed and available on your PATH, or use the full path to the executable.

Available MCP Tools

Browser Automation

ToolDescription
get_tabsList all open tabs
create_tabOpen new tab with URL
close_tabClose a tab by ID
navigateNavigate tab to URL
clickClick element by CSS selector (supports llmFallback)
fill_formFill form fields (supports llmFallback)
press_keySend keyboard input (supports llmFallback)
scrollScroll page (supports llmFallback)
get_sourceGet page HTML (text/clean/full)
screenshotCapture page as PNG
wait_forWait for element to appear (supports llmFallback)
get_elementGet element properties (supports llmFallback)
execute_jsRun JavaScript in page context

Extension Tools

ToolDescription
watcher_listList network watchers
watcher_addCreate network watcher
watcher_removeDelete network watcher
watcher_toggleEnable/disable watcher
get_templateGet cached page template
generate_templateGenerate template via LLM
task_listList scheduled tasks
task_createSchedule new task
task_triggerRun task immediately
task_deleteDelete task
ai_chat_runExecute headless agentic browser task
selenium_driver_statusReport whether the W3C WebDriver server is running, its host/port, and active Selenium session count
selenium_driver_startStart the W3C WebDriver HTTP server (default 127.0.0.1:9515); returns bound port and URL prefix
selenium_driver_stopStop the WebDriver server and release all active Selenium sessions
puppeteer_driver_statusReport whether the CDP WebSocket server is running, its host/port, browser UUID, and active CDP session/target counts
puppeteer_driver_startStart the CDP server (default 127.0.0.1:9222); returns bound port so puppeteer.connect({ browserURL }) can attach
puppeteer_driver_stopStop the CDP server, close every puppeteer-kind tab, and detach all debugger sessions

Example: Claude Browsing the Web

User: "Go to Hacker News and tell me the top 3 stories."

Claude uses MCP tools:
  → navigate({ tabId: 0, url: "https://news.ycombinator.com" })
  → get_source({ tabId: 0, type: "text" })
  → "Here are the top 3 stories: ..."
Network Watcher

Monitor HTTP traffic using Chrome DevTools Protocol (CDP) and forward matched responses to webhooks. No proxy setup or root certificates required.

How It Works

Network Watcher uses CDP to intercept responses at the browser level. You define URL patterns and a webhook destination — when traffic matches, the full request/response is forwarded to your endpoint.

API Endpoints

EndpointDescription
GET /api/watchersList all watchers
POST /api/watchersCreate watcher
GET /api/watchers/:idGet watcher details
PATCH /api/watchers/:idUpdate watcher
POST /api/watchers/:id/toggleEnable/disable
DELETE /api/watchers/:idRemove watcher
GET /api/watchers/statsTrigger statistics
POST /api/watchers/testTest configuration

Create a Watcher

curl -X POST http://localhost:3000/api/watchers \
  -H "Content-Type: application/json" \
  -d '{
    "urlPattern": "*api.example.com/v1/orders*",
    "sendTo": "https://your-backend.com/webhook",
    "note": "Capture order API calls",
    "method": "POST",
    "captureHeaders": true,
    "captureBody": true
  }'

Webhook Payload

When a matching request is intercepted, this payload is POSTed to your sendTo URL. Note that request is a composite envelope: it carries the outbound request metadata and nests the captured response inside it.

{
  "watcherId": "watcher_1711632896000_abc",
  "note":      "Capture order API calls",
  "timestamp": "2026-03-28T12:34:56.000Z",
  "request": {
    "url":       "https://api.example.com/v1/orders",
    "method":    "POST",
    "timestamp": "2026-03-28T12:34:55.980Z",

    "request": {
      "headers": { "content-type": "application/json", "authorization": "Bearer ..." }
    },

    "response": {
      "status":        200,
      "statusText":    "OK",
      "mimeType":      "application/json",
      "headers":       { "content-type": "application/json" },
      "body":          "eyJvcmRlcklkIjoiMTIzNDUifQ==",
      "base64Encoded": true
    }
  }
}

Where to find each field:

  • Request headers: request.request.headers (double-nested — the outer request is the capture envelope, the inner request is the browser's outbound HTTP request).
  • Response status / body / headers: request.response.* (not at the top level).
  • Encoding: bodies from non-text MIME types are base64-encoded; check request.response.base64Encoded before decoding.

Watcher Configuration

FieldRequiredDescription
urlPatternYesURL pattern with wildcards (*, ?)
sendToYesWebhook destination URL
noteNoHuman-readable description
methodNoHTTP method filter: GET, POST, PUT, DELETE, or * (default)
captureHeadersNoInclude request/response headers (default: true)
captureBodyNoInclude response body (default: true)

Toggle a watcher

POST /api/watchers/:id/toggle requires a body — it sets the enabled flag explicitly rather than flipping it. A body-less call fails with a 500. Always supply enabled:

POST /api/watchers/watcher_xxx/toggle
{"enabled": true}   // or false

Updating a watcher (PATCH caveat)

PATCH /api/watchers/:id behaves as a full-row update and will reject partial bodies that omit required columns. For example, a PATCH with only {"note": "..."} fails with "NetworkWatcher: urlPattern is required". Re-send every required field (urlPattern, sendTo) in the update body, or use the toggle endpoint for enable/disable changes.

Template Builder

Uses LLM to analyze pages and generate CSS selector maps. Describe what you want in plain English, and the Template Builder returns reliable selectors — even against randomized class names.

API Endpoints

EndpointDescription
GET /api/templatesList cached templates
GET /api/templates/by-url?url=...Fetch a cached template by URL (prefers exact match, falls back to pattern)
GET /api/templates/:idGet cached template by numeric ID
POST /api/templatesUpsert a template manually (no LLM)
POST /api/templates/generateLLM-powered template generation. Latency is highly variable — see below.
POST /api/templates/repairLLM-powered selector repair for a previously-generated template
POST /api/templates/testRun every selector in the cached template against a live tab and report which ones still match
POST /api/templates/validateCheck one selector against a live tab. No LLM, fast, cheap.
DELETE /api/templates?url=...Remove a cached template

POST /api/templates/generate

Runs the Template Builder LLM pipeline against a live tab and persists the result. This is the slowest, costliest call in the Template Builder surface — use sparingly and let the cache absorb repeat requests.

Request body:

{
  "tabId": 0,                               // required — integer tab ID of an open tab
  "url":   "https://example.com/page"       // optional — recorded on the template
}
  • tabId is required. A request without it fails fast with {"success":false,"error":"tabId is required"}.
  • url is optional. When omitted, the generator uses the tab's current URL.
  • Historically this endpoint tolerated a stringified tab ID. Pass the integer form to match the rest of the API.

Response (flat public shape):

{
  "success": true,
  "timestamp": "2026-04-19T12:34:56.000Z",
  "message": "Template generated and saved successfully",
  "data": {
    "id":   "42",
    "url":  "https://example.com/page",
    "createdAt": "2026-04-19 12:34:56",

    "pageInfo": {
      "title":              "Example Checkout",
      "description":        "Two-step checkout form",
      "formFields":         ["email", "card", "cvc"],
      "navigationStrategy": "template"
    },

    "elements": [
      {
        "id":                "element_1",
        "name":              "Submit Button",
        "type":              "Button",
        "clickable":         true,
        "primarySelector":   "button.submit",
        "fallbackSelectors": ["button[type=submit]"],
        "itemSelector":      null,
        "childSelectors":    null
      },
      {
        "id":                "element_7",
        "name":              "User List",
        "type":              "RepeatingItem",
        "clickable":         false,
        "primarySelector":   "#user-list li.user",
        "fallbackSelectors": [],
        "itemSelector":      "#user-list li.user",
        "childSelectors":    { "name": "span.name", "email": "span.email" }
      }
    ],

    "template": { /* original internal PascalCase shape — for backward compat only */ }
  }
}

Consume the flat shape. data.elements[] is the canonical public contract. Each element is camelCase, fallbackSelectors is always an array of strings (possibly empty), and url / createdAt are now populated. The nested data.template preserves the old PascalCase internal shape solely for backward compatibility — new SDKs should ignore it.

Latency & idempotency

Template generation invokes an LLM, so wall-clock times are bounded by the slowest call in the pipeline:

  • Typical: 15–30 s when the configured provider is a hosted frontier model.
  • Observed worst case: well past 2 minutes with a slow or rate-limited provider. Do not hard-code a 30 s client timeout.
  • Cache: subsequent GET /api/templates/by-url calls for the same URL return the cached record in a few milliseconds.
  • Retry safety: the endpoint is an upsert keyed on URL. Retrying after a timeout is safe; you may get a newer template that overwrites the in-flight one.

POST /api/templates/validate — fast selector check

Answers a single question cheaply: does this CSS selector match anything on this page right now? No LLM, no template, no cache writes — just a scoped document.querySelectorAll against a live tab. Useful for healing workflows that need to decide whether to regenerate a template.

Request body: you must supply selector plus either tabId or url (to locate an open tab by its current URL).

# Validate against an existing tab
POST /api/templates/validate
{
  "selector": "#btn-simple",
  "tabId":    0
}

# Or validate against whichever open tab is currently at the given URL
POST /api/templates/validate
{
  "selector": "button.submit",
  "url":      "https://example.com/checkout"
}

Response:

{
  "success": true,
  "timestamp": "2026-04-19T12:34:56.000Z",
  "message": "Selector found",
  "data": {
    "valid": true,    // bool — selector parses as valid CSS
    "found": true     // bool — at least one element matched (not a count)
  }
}

# Selector parses but matches zero elements
{
  "success": true,
  "message": "Selector did not match",
  "data": { "valid": true, "found": false }
}

# Selector is malformed CSS
{
  "success": true,
  "data": { "valid": false, "found": false,
            "message": "Invalid selector syntax" }
}

Error responses:

  • 400 — missing selector, or neither tabId nor url provided.
  • 404 — no open tab matches the given url, or the given tabId is not open.
  • 503 — browser service not yet initialized (immediately after boot).

Caching

Generated templates are persisted in SQLite keyed on URL. Subsequent generate calls for the same URL overwrite the cached record. Reads via GET /api/templates/by-url return the cached version instantly; use that to avoid paying for another LLM call.

Timed Tasks

Schedule recurring LLM-driven browser automation tasks. Each execution runs an agentic loop (up to 10 tool-call iterations) with full access to the browser automation API.

API Endpoints

EndpointDescription
GET /api/timed-tasksList all tasks
POST /api/timed-tasksCreate task
GET /api/timed-tasks/:idGet task details
PATCH /api/timed-tasks/:idUpdate task — see PATCH semantics before use
DELETE /api/timed-tasks/:idDelete task
POST /api/timed-tasks/:id/triggerRun immediately
POST /api/timed-tasks/:id/enablePreferred way to start scheduling. No body. Flips enabled=1 and restarts the scheduler.
POST /api/timed-tasks/:id/disablePreferred way to stop scheduling. No body. Flips enabled=0 and stops the scheduler.
GET /api/timed-tasks/:id/runsExecution history

Create a Scheduled Task

The request body is camelCase. The response (below) is snake_case and uses 0/1 for enabled. This asymmetry is documented rather than fixed to keep existing SDKs working — model it explicitly in your client.

curl -X POST http://localhost:3000/api/timed-tasks \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Check Product Prices",
    "requestPrompt": "Navigate to https://store.example.com, find the price of Widget X, and return it as JSON",
    "responsePrompt": "Format the result as {\"product\": \"...\", \"price\": \"...\"}",
    "repeatInterval": 3600000,
    "webhookUrl": "https://my-backend.com/prices",
    "enabled": true
  }'

Request vs. response shape

Request (camelCase, enabled is boolean):

{
  "name":           "Check Product Prices",
  "requestPrompt":  "...",
  "responsePrompt": "...",
  "repeatInterval": 3600000,
  "webhookUrl":     "https://my-backend.com/prices",
  "enabled":        true
}

Response (snake_case, enabled is 0/1):

{
  "success": true,
  "task": {
    "id":               "task_1776605796275_iw0ooo",
    "name":             "Check Product Prices",
    "request_prompt":   "...",
    "response_prompt":  "...",
    "repeat_interval":  3600000,
    "webhook_url":      "https://my-backend.com/prices",
    "enabled":          1,
    "last_run":         null,
    "next_run":         null,
    "created_at":       "2026-04-19 12:34:56"
  }
}

PATCH semantics — read this before updating a task

PATCH /api/timed-tasks/:id currently behaves as PUT: it performs a full row update and nulls any column you don't re-send. A partial-looking PATCH fails loudly on required columns:

PATCH /api/timed-tasks/task_xxx
{"name":"new-name"}

→ 400 NOT NULL constraint failed: timed_tasks.request_prompt

For enabling or disabling a task, use the dedicated endpoints:

# Enable — no body, no "are all required fields set?" gotcha
curl -X POST http://localhost:3000/api/timed-tasks/task_xxx/enable

# Disable
curl -X POST http://localhost:3000/api/timed-tasks/task_xxx/disable

# Response for both
{
  "success": true,
  "task":    { /* full snake_case task object */ },
  "message": "Task enabled"    // or "Task disabled"
}

For other edits, re-send the full task body — every NOT NULL column (name, request_prompt, response_prompt, repeat_interval) must be present. The same PATCH-as-PUT behavior applies to PATCH /api/watchers/:id.

Execution Lifecycle

  1. Task fires at the configured interval
  2. A browser tab is pre-created
  3. LLM receives the prompt with available browser tools
  4. Agentic loop: LLM calls tools, gets results, decides next action (up to 10 iterations)
  5. Final result is stored and forwarded to the webhook URL
AI Chat

A built-in LLM-powered agent that can see and control the browser. Use it interactively via the sidebar, or programmatically via the REST API and MCP tool. Give it a single natural language prompt and it autonomously navigates, clicks, fills forms, and extracts data — returning a structured result.

Capabilities

  • Agentic tool execution loop (up to 15 iterations per run, configurable)
  • Full access to all browser automation tools
  • Template Builder integration for faster, more accurate element targeting
  • Headless REST API for programmatic use
  • MCP tool (ai_chat_run) for agent-to-agent orchestration
  • Optional screenshot capture of final page state
  • Tab lifecycle management (auto-create, reuse existing, auto-close)
  • Tool allowlisting for read-only or restricted runs

REST API Endpoint

EndpointDescription
POST /api/ai-chat/runExecute a headless agentic task

Request

POST /api/ai-chat/run
{
  "prompt": "Log into acme.com/login with [email protected] / demo123,
             go to settings, change company name to 'Acme Industries'",
  "tabId": 0,                        // optional — use existing tab
  "autoCloseTab": true,               // optional, default: true
  "includeScreenshot": false,          // optional, default: false
  "maxIterations": 15,                 // optional, default: 15
  "timeout": 120000,                   // optional, default: 120000 (ms)
  "tools": ["navigate", "click", ...], // optional — restrict available tools
  "systemPromptAppend": "Return JSON"  // optional — extra instructions
}

Request Parameters

FieldRequiredDefaultDescription
promptYesNatural language task description
tabIdNonew tabUse an existing browser tab by ID. If omitted, a new tab is created at about:blank.
autoCloseTabNotrueClose the tab after completion. Only closes tabs that were auto-created.
includeScreenshotNofalseCapture a base64 PNG screenshot of the final page state.
maxIterationsNo15Maximum agent loop iterations. Lower values keep costs predictable.
timeoutNo120000Maximum total execution time in ms. Prevents runaway loops.
toolsNoallRestrict which tools the agent can use. e.g., ["navigate", "get_source"] for read-only scraping.
systemPromptAppendNoExtra instructions appended to the system prompt for this run.

Response

{
  "success": true,
  "summary": "Logged in and updated company name to Acme Industries",
  "finalResponse": "Done! I logged into acme.com, navigated to settings...",
  "tabId": 0,
  "iterations": 4,
  "durationMs": 18420,
  "toolCalls": [
    { "tool": "navigate", "params": {"tabId": 0, "url": "..."}, "durationMs": 1200 },
    { "tool": "fill_form", "params": {"tabId": 0, "fields": [...]}, "durationMs": 340 },
    { "tool": "click", "params": {"tabId": 0, "selector": "..."}, "durationMs": 890 },
    { "tool": "wait_for", "params": {"tabId": 0, "selector": "..."}, "durationMs": 1100 }
  ],
  "screenshot": "data:image/png;base64,..."
}

Response Fields

FieldDescription
summaryLLM-generated one-line summary of what was accomplished
finalResponseThe agent's full final response text
tabIdThe browser tab ID used (integer). Use this to continue working with the tab (set autoCloseTab: false).
iterationsHow many agent loop iterations ran
durationMsTotal wall-clock execution time
toolCallsFlat array of tool invocations with name, params, durationMs, success, error. See Tool result shapes for the corresponding payload data.
stepsPer-iteration record that includes each tool's toolResult (the actual data returned — screenshot bytes, element text, extracted HTML, etc.).
screenshotOnly populated when the agent explicitly took a screenshot and the flag was set. See note on includeScreenshot.
errorError message if the run failed (e.g., timeout, LLM failure)

Note on includeScreenshot

includeScreenshot: true authorizes the agent to take a screenshot — it does not force one. If your prompt doesn't imply visual verification, the agent will often finish without ever calling the screenshot tool, and the top-level screenshot field will be absent.

To reliably capture a final screenshot, add an explicit instruction to the prompt:

{
  "prompt": "Log into acme.com, go to the billing page, then TAKE A SCREENSHOT of the final state.",
  "includeScreenshot": true
}

If you only want the bytes and don't care about visual verification during the run, call GET /api/browser/tabs/:id/screenshot directly after the run completes (set autoCloseTab: false to keep the tab open). That endpoint returns in under a second for a foregrounded tab.

Tool result shapes

Each entry in toolCalls[] records the invocation metadata:

{
  "tool":       "screenshot",
  "params":     { "tabId": 23 },
  "durationMs": 66,
  "success":    true,
  "error":      null
}

The actual payload produced by each tool lives under steps[N].toolResult (or equivalently steps[N].toolResult.data). Shapes by tool:

TooltoolResult.data shape
screenshot{ "screenshot": "data:image/png;base64,..." }
get_source{ "url": "...", "title": "...", "source": "..." }
get_element{ "tag": "button", "text": "Sign In", "attributes": { ... }, "boundingRect": { ... } }
click{ "clicked": true, "resolvedSelector"?: "...", "urlChanged"?: true, "newUrl"?: "...", "template"?: { ... } }
fill_form{ "filled": N, "fields": [ { "selector": "...", "value": "..." }, ... ] }
wait_for{ "matched": true, "resolvedSelector"?: "..." } (or matched: false on timeout)
scroll{ "scrolled": true, "scrollY": 420 }
navigate{ "url": "...", "loaded": true }
create_tab{ "tabId": 7, "url": "..." }
get_tabs{ "tabs": [ { "id": 0, "url": "...", "title": "..." } ] }

If you need screenshot bytes that an agent captured mid-run, extract them from steps[].toolResult.data.screenshot rather than the top-level screenshot field.

Click tool schema (what the model sees)

The AI chat agent is instructed that click takes a real CSS selector in selector, and that text is an optional visible-text filter for disambiguating when the selector matches multiple elements. If you're routing your own LLM into LumaBrowser's click tool, align your system prompt to the same shape:

# Single unique match — selector alone
{ "tool": "click", "params": { "tabId": 0, "selector": "#btn-simple" } }

# Multiple matches — use text to pick
{ "tool": "click", "params": { "tabId": 0, "selector": "button", "text": "Sign In" } }

selector must be a CSS selector string — not a description, not the literal word "css".

Example: Scrape Data (Read-Only)

curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Go to news.ycombinator.com and return the top 5 story titles as JSON",
    "tools": ["navigate", "get_source", "wait_for", "scroll"],
    "systemPromptAppend": "Return only a JSON array of strings",
    "maxIterations": 5
  }'

Example: Reuse a Tab

# Step 1: Create a tab and keep it open
curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Navigate to acme.com/dashboard and log in with [email protected] / demo123",
    "autoCloseTab": false
  }'
# → returns { "tabId": 0, ... }

# Step 2: Continue on the same tab
curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Now go to the settings page and change the timezone to UTC",
    "tabId": 0,
    "autoCloseTab": false,
    "includeScreenshot": true
  }'

MCP Tool

The same functionality is available as an MCP tool: ai_chat_run. It accepts all the same parameters and returns the same result shape. This enables agent-to-agent orchestration — an outer AI agent can delegate complex browser tasks to LumaBrowser's built-in agent.

Available Browser Tools

The agent can use: navigate, create_tab, get_tabs, click, fill_form, press_key, scroll, get_source, screenshot, wait_for, get_element. When Template Builder is enabled, also: get_template, generate_template.

Selenium Driver

Drive LumaBrowser from any Selenium client — Python, Java, C#, Ruby, JavaScript — with zero code changes to your existing test suite. The selenium-driver extension embeds a W3C WebDriver Level 2 / ChromeDriver-compatible HTTP server inside LumaBrowser. Point RemoteWebDriver at it and the current browser tabs become your automation target, as if it were Chrome plus ChromeDriver.

The distinctive addition: LLM-powered selector fallback. When a CSS or XPath locator fails to match, or a click is intercepted, the driver can re-resolve the element from a natural-language description and retry automatically. Brittle selectors stop breaking builds.

Defaults

SettingDefaultDescription
port9515Same port as ChromeDriver. Configurable.
host127.0.0.1Local-only by default. Set to 0.0.0.0 to expose on the network.
autostartfalseOff by default. Enable in the Selenium settings tab or via REST.
urlPrefix / prefixempty stringOptional path prefix (for example /wd/hub) prepended to WebDriver routes. Default is no prefix — the server listens directly at http://host:port/session. Point your Selenium client at the bare http://host:port URL (not http://host:port/wd/hub) unless you've explicitly configured a prefix.

Field naming: The settings endpoint accepts urlPrefix on write; the status endpoint returns the same value as prefix. Treat them as the same field under two names.

Starting the Driver

Two ways to start the WebDriver server:

1. Settings tab. Open Settings → Selenium, check “Start automatically when LumaBrowser launches”, then click Save & Start.

2. REST API. Use LumaBrowser's existing API gateway on port 3000:

EndpointDescription
GET /api/selenium/statusCurrent server state (running, host, port, urlPrefix)
POST /api/selenium/startStart the WebDriver server
POST /api/selenium/stopStop the WebDriver server
GET /api/selenium/settingsRead current settings
POST /api/selenium/settingsUpdate settings (host, port, autostart, urlPrefix)
# Start the WebDriver server
curl -X POST http://localhost:3000/api/selenium/start

# Check status
curl http://localhost:3000/api/selenium/status

# Change port to 4444 and expose on the network
curl -X POST http://localhost:3000/api/selenium/settings \
  -H "Content-Type: application/json" \
  -d '{"host": "0.0.0.0", "port": 4444}'

Response shapes: /start vs /status

These two endpoints return different shapes today. If you need the full runtime description after a start, call /status after /start.

# POST /api/selenium/start — minimal
{
  "success": true,
  "already": false,   // true if the server was already running
  "port":    9515
}

# GET /api/selenium/status — complete
{
  "running":  true,
  "host":     "127.0.0.1",
  "port":     9515,
  "prefix":   "",          // urlPrefix as configured; empty by default
  "sessions": 0,           // active Selenium sessions
  "settings": { "host": "127.0.0.1", "port": 9515,
                "urlPrefix": "", "autostart": false, ... }
}

Construct the server root as http://{host}:{port}{prefix}. With default settings, that's http://127.0.0.1:9515 with no trailing path.

Python Example

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

opts = Options()
# Opt into LumaBrowser's LLM fallback for this session
opts.set_capability("lumabyte:llmFallback", {
    "enabled": True,
    "onFindFail": True,
    "onClickIntercepted": True,
})

driver = webdriver.Remote("http://127.0.0.1:9515", options=opts)
driver.get("https://example.com")

# Standard Selenium — works exactly as you'd expect
title = driver.find_element(By.CSS_SELECTOR, "h1").text

# LumaBrowser extension: find an element by natural-language description
el = driver.execute(
    "lumabyte:find",
    {"description": "the main sign-up button near the top"},
)

driver.quit()

LLM Fallback — Four Override Points

The fallback system intercepts WebDriver at four points. Each is independently controllable via session capabilities.

1. Custom locator strategy: "ai-description"

Send a Find Element request whose using is "ai-description". No CSS needed — the LLM resolves the selector from your description.

POST /session/{id}/element
{ "using": "ai-description", "value": "the blue submit button" }

2. Auto-retry on zero matches

When a standard CSS or XPath selector matches nothing and the request included a lumabyte:description hint and fallback is enabled, the driver re-resolves via the LLM and retries once.

3. Click re-resolve

When element.click() fails because the click was intercepted or the element went stale, the driver re-resolves the element's original description and retries.

4. Vendor extension commands

LumaBrowser adds a namespaced command surface at /session/{id}/lumabyte/…:

EndpointDescription
POST /lumabyte/findDescription-only finder. Returns an element handle.
POST /lumabyte/clickSemantic click by description — optional selector hint for a fast-path attempt. See details.
POST /lumabyte/dom/snapshotBulk page extraction (URL, title, source, optional screenshot) in one call.
POST /lumabyte/cdp/executeRaw Chrome DevTools Protocol passthrough via Electron's webContents.debugger. Also available as /goog/cdp/execute.

POST /session/{id}/lumabyte/click — semantic click with optional selector hint

The click endpoint requires a description and accepts an optional selector hint. If both are provided, the driver tries the selector directly first and only falls through to description-based resolution if the selector fails — giving you the speed of a CSS click with the resilience of natural-language fallback.

Request bodyBehavior
{ "description": "..." }Resolve via the resolution pipeline, then click.
{ "description": "...", "selector": "..." }Try selector first. On miss, resolve description and click.
{ "selector": "..." } (no description)Rejected — description is required on this endpoint. Use the standard WebDriver /element + /element/{id}/click pair for pure-CSS clicks.

The response body identifies which path won:

POST /session/abc123/lumabyte/click
{ "description": "the add to cart button", "selector": "button.buy-now" }

———

200 OK
{
  "value": {
    "clicked": true,
    "resolvedSelector": "button.buy-now",
    "resolvedVia": "selector"
  }
}
Response fieldMeaning
clickedtrue when the click dispatched.
resolvedSelectorThe CSS selector actually used for the click — either the hint you passed or the one the resolver produced.
resolvedVia"selector" if the hint succeeded on the fast path, "description" if the resolver had to run.

Session Capabilities

Set these under alwaysMatch or firstMatch when creating a session:

CapabilityTypeDescription
lumabyte:llmFallbackboolean or objecttrue / false, or { enabled, onFindFail, onClickIntercepted, slot }.
goog:chromeOptionsobjectAccepted as pass-through so existing Selenium/ChromeOptions code works without modification.

After session creation, the driver returns response capabilities that tell the client what's available:

Response CapabilityShape
lumabyte:features{ cdpPassthrough, aiDescriptionLocator, domSnapshot }
lumabyte:llmFallbackThe effective, merged fallback config for this session.

LLM Slot

All Selenium LLM-fallback paths — the ai-description locator, /lumabyte/find, /lumabyte/click, and auto-retry on zero matches — route through the shared core-scope selector-resolver slot. Configure a fast, low-latency model there and every description-based Selenium call benefits automatically. If the slot is unconfigured, the resolver falls back to your global active LLM.

Known v1 Limitations

  • Frame switching, shadow-DOM subqueries, async execute_async_script, and element-scoped screenshot cropping return unsupported operation.
  • Keyboard special-key W3C PUA codepoints pass through as literal characters.
  • POST /session/:id/lumabyte/cdp/execute with Page.captureScreenshot can hang past 100 s when the target tab is backgrounded. The non-Selenium GET /api/browser/tabs/:id/screenshot returns in under a second for a foregrounded tab — prefer it for routine screenshot work and reach for the CDP path only when you genuinely need a CDP-specific capture flag (clip, quality, full page, etc.).
Workaround: The /lumabyte/cdp/execute passthrough unblocks most of the unsupported operations above by letting you issue raw Chrome DevTools Protocol commands. Use it from any language that can POST JSON.

Prefer CDP directly? If your client speaks Chrome DevTools Protocol natively (Puppeteer, Playwright over CDP, chrome-remote-interface), you can skip the WebDriver hop entirely and connect to the Puppeteer Driver on port 9222. Both drivers share the same LLM selector-resolution pipeline, so picking a protocol is a stylistic choice, not a capability tradeoff. See Puppeteer vs Selenium for a side-by-side.

Puppeteer Driver

Drive LumaBrowser from any Chrome DevTools Protocol client — Puppeteer, puppeteer-core, chrome-remote-interface, or any tool that speaks CDP over WebSocket — by connecting to the built-in CDP server on port 9222. The puppeteer-driver extension boots a standards-compliant CDP WebSocket server plus the usual /json/* HTTP bootstrap endpoints that puppeteer.connect({ browserURL }) uses for discovery.

Under the hood, the server is a translation layer rather than a from-scratch reimplementation of CDP: it owns the Browser, Target, and custom Lumabyte.* domains, and proxies every other command (Page, Runtime, DOM, Input, Network, Fetch, Emulation, Log, and friends — roughly 290 commands) straight through Electron's webContents.debugger to Chromium. You get the full native protocol surface for free.

The distinctive addition, identical in behavior to the Selenium driver: LLM-powered selector fallback exposed as the custom Lumabyte.* CDP domain. Any Puppeteer session can resolve elements from a natural-language description via Lumabyte.find / Lumabyte.click, and the same orchestrator (deterministic match → LLM attempt → validation → shadow-template regen) runs for both drivers.

Dedicated automation tabs

Unlike the Selenium driver (which drives whichever tab is active), the Puppeteer driver creates its own tabs with kind: 'puppeteer' and only those tabs surface as CDP targets. Automation-owned tabs are rendered in the tab strip with a purple accent and a PUP badge so you can watch the session drive them live. This is a deliberate design choice: webContents.debugger.attach is an exclusive lock, so isolating puppeteer-kind tabs lets the Network Interceptor, AI Chat browsing tools, and other subsystems keep operating on the user's regular tabs without conflict.

Defaults

SettingDefaultDescription
port9222Chrome's conventional CDP port. Configurable.
host127.0.0.1Local-only by default. Set to 0.0.0.0 to expose on the network.
autostartfalseOff by default. Enable in the Puppeteer settings tab or via REST.
fallback.defaultEnabledfalseWhether new CDP sessions inherit LLM fallback without an explicit Lumabyte.configureFallback call.
fallback.onFindFailtrueWhen fallback is enabled, retry selector misses with the LLM resolver.
fallback.onClickInterceptedtrueWhen fallback is enabled, retry intercepted clicks with the LLM resolver.
fallback.slotnullOptional named LLM slot. Defaults to the shared selector-resolver slot.

Starting the Driver

Two ways to start the CDP server:

1. Settings tab. Open Settings → Puppeteer, check “Start automatically when LumaBrowser launches”, then click Save & Start.

2. REST API. Management endpoints live on LumaBrowser's existing API gateway (port 3000), separate from the CDP WebSocket on 9222:

EndpointDescription
GET /api/puppeteer/statusCurrent server state (running, host, port, browser UUID, active sessions, active targets)
POST /api/puppeteer/startStart the CDP server
POST /api/puppeteer/stopStop the CDP server and close every puppeteer-kind tab
GET /api/puppeteer/settingsRead current settings (host, port, autostart, fallback defaults)
POST /api/puppeteer/settingsUpdate settings
# Start the CDP server
curl -X POST http://localhost:3000/api/puppeteer/start

# Check status
curl http://localhost:3000/api/puppeteer/status

# Bind to all interfaces on a custom port
curl -X POST http://localhost:3000/api/puppeteer/settings \
  -H "Content-Type: application/json" \
  -d '{"host": "0.0.0.0", "port": 9333}'

CDP bootstrap endpoints (port 9222)

The CDP server also speaks the Chrome-compatible HTTP bootstrap surface that Puppeteer uses to discover WebSocket URLs from just browserURL:

EndpointDescription
GET /json/versionVersion metadata plus the webSocketDebuggerUrl Puppeteer attaches to.
GET /json, /json/listPer-target descriptors (one entry per puppeteer-kind tab).
PUT /json/new?<url>Open a new puppeteer-kind tab and return its target descriptor.
PUT /json/activate/{id}Bring the target's tab to the foreground.
PUT /json/close/{id}Close the target's tab.

WebSocket endpoints: ws://host:9222/devtools/browser/{uuid} (root connection, used by puppeteer.connect) and ws://host:9222/devtools/page/{targetId} (page-scoped attach).

First connection — Puppeteer

// npm install puppeteer-core
const puppeteer = require('puppeteer-core');

// LumaBrowser is already running; no bundled Chromium to launch.
const browser = await puppeteer.connect({
  browserURL: 'http://127.0.0.1:9222',
});

// Spawns a new purple "PUP" tab inside LumaBrowser.
const page = await browser.newPage();
await page.goto('https://example.com');

const title = await page.title();
console.log(title); // "Example Domain"

await browser.disconnect(); // leaves the tab open; .close() would close it

Lumabyte CDP Domain — five methods

All five route through the same LLM-fallback orchestrator as the Selenium driver. Dispatch them from any CDP session via CDPSession.send('Lumabyte.method', params).

Lumabyte.find — resolve a selector, deterministically first

Accepts a description, a selector, or both. If both are given the server tries the selector against DOM.querySelector first and only invokes the LLM when it misses. Returns the selector that actually resolved, plus which strategy produced it.

ParameterTypeDescription
descriptionstring, optionalNatural-language description of the element. Required unless selector is supplied.
selectorstring, optionalCSS selector to try deterministically before any LLM call.
retryboolean, optionalWhen false, skip the LLM path entirely. Default true.
const page = await browser.newPage();
await page.goto('https://example.com');

const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });

const r = await cdp.send('Lumabyte.find', {
  description: 'the more information link',
  selector:    'a.deprecated-class',  // optional hint tried first
});

console.log(r); // { selector: 'a[href*="iana.org"]', strategy: 'description' }

Lumabyte.click — semantic click

Resolves via the same pipeline as Lumabyte.find, then scrolls the element into view with DOM.scrollIntoViewIfNeeded, reads its box model with DOM.getBoxModel, and dispatches a synthetic mouse press/release at the center via Input.dispatchMouseEvent. Returns the winning selector, the strategy that produced it, and the click coordinates.

ParameterTypeDescription
descriptionstring, optionalNatural-language description. Required unless selector is supplied.
selectorstring, optionalCSS selector tried first.
buttonstring, optional"left" (default), "right", or "middle".
retryboolean, optionalWhen false, skip the LLM path. Default true.
const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });

const clicked = await cdp.send('Lumabyte.click', {
  description: 'the add to cart button',
  selector:    'button.buy-now',  // fast path; falls through if absent
});
// { clicked: true, selector: 'button.buy-now', strategy: 'selector', x: 412, y: 318 }

Lumabyte.domSnapshot — one round-trip page capture

Collapses DOMSnapshot.captureSnapshot plus optional Accessibility.getFullAXTree and Page.captureScreenshot into one CDP call. Useful for agent loops that want a full page state in a single round-trip instead of orchestrating three separate commands.

ParameterTypeDescription
includeAxTreeboolean, optionalWhen true, include the full accessibility tree. Default false.
includeScreenshotboolean, optionalWhen true, include a base64 PNG of the viewport. Default false.
const cdp = await page.target().createCDPSession();

const { snapshot, axTree, screenshot } = await cdp.send('Lumabyte.domSnapshot', {
  includeAxTree:     true,
  includeScreenshot: true,
});

// snapshot   — DOMSnapshot.captureSnapshot result
// axTree     — Accessibility.getFullAXTree result (undefined if not requested)
// screenshot — base64 PNG string (undefined if not requested)

Lumabyte.configureFallback — per-session fallback config

CDP has no first-class capability slot (unlike WebDriver's lumabyte:llmFallback), so per-session LLM behavior is configured with this method. Accepts a boolean shortcut or a full config object. Settings persist for the lifetime of the CDP session.

ParameterTypeDescription
enabledbooleanMaster switch for this session's fallback behavior.
onFindFailboolean, optionalInvoke the LLM when Lumabyte.find's selector hint misses. Default true.
onClickInterceptedboolean, optionalInvoke the LLM when Lumabyte.click's selector hint misses. Default true.
slotstring, optionalNamed LLM slot override. Falls back to the shared selector-resolver slot when omitted.
const cdp = await page.target().createCDPSession();

// Full config form
await cdp.send('Lumabyte.configureFallback', {
  enabled:            true,
  onFindFail:         true,
  onClickIntercepted: true,
  slot:               null,  // null → use the default selector-resolver slot
});

// Boolean shortcut — equivalent to { enabled: true, onFindFail: true, onClickIntercepted: true }
await cdp.send('Lumabyte.configureFallback', true);

Lumabyte.getInfo — fallback capability probe

Reports whether an LLM fallback service is wired on the server side and the current session's effective config. Use it at the start of a session to decide whether Lumabyte.find/Lumabyte.click can safely be called with description-only arguments.

const cdp = await page.target().createCDPSession();
const info = await cdp.send('Lumabyte.getInfo');

// {
//   fallbackAvailable: true,
//   config:   { enabled: false, onFindFail: false, onClickIntercepted: false, slot: null },
//   defaults: { defaultEnabled: false, onFindFail: true, onClickIntercepted: true, slot: null }
// }

if (info.fallbackAvailable && !info.config.enabled) {
  await cdp.send('Lumabyte.configureFallback', true);
}

LLM Slot

All Puppeteer LLM-fallback paths — Lumabyte.find, Lumabyte.click, and description-based retries triggered by configured fallback — route through the same shared core-scope selector-resolver slot as the Selenium driver. Configure a fast, low-latency model there once and every description-based call from either driver benefits. If the slot is unconfigured, the resolver falls back to your global active LLM.

End-to-end example

Mixing native CDP with Lumabyte.* in a realistic flow:

const puppeteer = require('puppeteer-core');

const browser = await puppeteer.connect({ browserURL: 'http://127.0.0.1:9222' });
const page    = await browser.newPage();
await page.goto('https://example.shop/products/coffee-grinder');

const cdp = await page.target().createCDPSession();
await cdp.send('Lumabyte.configureFallback', { enabled: true });

// Semantic click — no brittle CSS needed.
await cdp.send('Lumabyte.click', { description: 'the add-to-cart button' });
await cdp.send('Lumabyte.click', { description: 'the cart icon in the header' });

// One round-trip: DOMSnapshot + AX tree + viewport screenshot.
const snap = await cdp.send('Lumabyte.domSnapshot', {
  includeAxTree:     true,
  includeScreenshot: true,
});

// Native CDP still works — verify via Runtime.evaluate.
const { result } = await cdp.send('Runtime.evaluate', {
  expression: 'document.title',
  returnByValue: true,
});
console.log(result.value);

await browser.disconnect();

Architecture notes for evaluators

  • Only ~10 handwritten handlers. The server implements Browser.*, Target.*, and Lumabyte.* itself; every other command is forwarded through webContents.debugger.sendCommand. You get the ~290 commands Chromium implements natively for free, and the translation surface stays small and auditable.
  • Session model. The server supports flat-mode sessions (Target.setAutoAttach with flatten: true), matching modern Puppeteer's default. Multiple sessions can attach to the same target; events fan out to every subscriber.
  • Browser contexts. Target.createBrowserContext returns a context ID that's reused as a persistent partition name (persist:puppeteer-ctx-<id>) when creating targets, so cookies and storage are isolated per context.
  • Security posture. The CDP server binds to 127.0.0.1 by default and is gated by the extension's autostart setting. CDP itself has no authentication; do not expose it on an untrusted network.

Related

Extension System

LumaBrowser is built on a modular extension architecture. All major features (network watcher, template builder, AI chat, timed tasks, WebGPU LLM) are extensions that can be enabled, disabled, or replaced.

Built-in Extensions

ExtensionDescription
notification-interceptorCaptures browser notifications and forwards to webhooks
network-watcherCDP-based HTTP traffic monitoring
ai-chatLLM-powered sidebar assistant
template-builderLLM page analysis and selector generation
timed-tasksScheduled recurring automation
webgpu-llmLocal GPU inference (no API keys)
selenium-driverW3C WebDriver / ChromeDriver-compatible server with AI selector fallback
puppeteer-driverChrome DevTools Protocol server over WebSocket; Puppeteer clients drive dedicated automation tabs with the same AI selector fallback as Selenium

LLM Providers

Each extension can use a different LLM independently via the slot-based routing system:

  • OpenAI-compatible — LM Studio, Ollama, or any OpenAI API endpoint
  • Anthropic — Claude API
  • WebGPU — Local GPU inference with Qwen 2.5 models (0.5B-7B), no API key

Custom Extensions

LumaBrowser supports a plugin architecture for building custom extensions with their own UI, REST routes, and MCP tools. Extension development documentation will be available soon.


Free APIs — hosted on lumabyte.com

Free, no-auth-required helper APIs. These are hosted at https://lumabyte.com — they are not part of the LumaBrowser desktop app and will 404 on http://localhost:3000. Use them from any client, no install required. Subject to change, but we'll keep them running as long as possible.

Mock Random People API

Generates a list of mock person data including names, emails, and phone numbers.

Endpoint

GET /api/generateMockRandomPeople?count={number}

Parameters

  • count (optional): Number of people to generate. Default: 10. Max: 1000.

Sample Request

curl https://lumabyte.com/api/generateMockRandomPeople?count=5

Sample Response

[{
    "id": 1,
    "name": "Ava Smith",
    "email": "[email protected]",
    "phoneNumber": "1234567890"
}, {
    "id": 2,
    "name": "Liam Johnson",
    "email": "[email protected]",
    "phoneNumber": "0987654321"
}]
URL to QR Code API

Takes a URL as input and returns a JSON response with the original URL and a Base64 encoded QR code image.

Endpoint

POST /api/urlToQRCode

Request Payload

{
    "url": "https://example.com"
}

Sample Response

{
    "url": "https://example.com",
    "qrCodeBase64": "data:image/png;base64,iVB..."
}

Demo



Text Analysis API

Analyzes text and returns metrics: word count, character count, sentence count, reading time, sentiment analysis, and token count (GPT tokenizer).

Endpoint

POST /api/analyzeText

Request Payload

{
    "text": "Your text to analyze goes here."
}

Sample Response

{
    "wordCount": 5,
    "characterCount": 28,
    "sentenceCount": 1,
    "sentiment": "positive",
    "readingTimeMinutes": 0.025,
    "tokenCount": 7
}

Demo