LumaBrowser Documentation

Getting Started

LumaBrowser is a programmable browser with built-in AI automation, network monitoring, and a full REST API. Download the installer, launch the app, and start automating in minutes.

Installation

Download the installer for your platform from the downloads page. Available for Windows, macOS, and Linux. Run the installer and launch LumaBrowser — the REST API server starts automatically.

Quick Start

Once running, the REST API is available on http://localhost:3000. Try it:

# Check health
curl http://localhost:3000/api/health

# Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# Take a screenshot
curl http://localhost:3000/api/browser/tabs/0/screenshot

Configuration

Settings are persisted in SQLite. Key options:

SettingDefaultDescription
core.apiPort3000REST API listen port
core.apiEnabledtrueEnable/disable REST server
core.mcpEnabledtrueEnable/disable MCP server

Environment variable overrides: API_PORT=8080, API_HOST=0.0.0.0

REST API Overview

All endpoints are available at http://localhost:3000 by default. Responses are JSON.

Health & Discovery

EndpointDescription
GET /api/healthServer status, port, loaded extensions
GET /api/List all available routes

Response Format

Successful responses wrap data in a standard envelope:

{
  "data": { ... },
  "count": 5,
  "success": true
}
Browser Automation API

Full programmatic control over browser tabs — navigate, click, fill forms, extract content, and take screenshots.

Tab Management

EndpointDescription
GET /api/browser/tabsList all open tabs
POST /api/browser/tabsCreate new tab
DELETE /api/browser/tabs/:idClose tab
PATCH /api/browser/tabs/:idNavigate, refresh, or execute JS

Create Tab

POST /api/browser/tabs
{
  "url": "https://example.com",
  "silent": false
}

Page Inspection

EndpointDescription
GET /api/browser/tabs/:id/sourcePage HTML (?type=text|clean|full)
GET /api/browser/tabs/:id/screenshotPNG screenshot as base64
GET /api/browser/tabs/:id/consoleConsole log entries
GET /api/browser/tabs/:id/networkNetwork request log
GET /api/browser/tabs/:id/elementElement properties (?selector=css&llmFallback=desc)
GET /api/browser/tabs/:id/tableExtract table data (?selector=css&llmFallback=desc)

User Interaction

All selector-based endpoints accept an optional llmFallback parameter — a natural language description of the target element. When the CSS selector fails and an LLM is configured, it resolves the element automatically. See LLM Fallback.

EndpointPayload
POST /tabs/:id/click{ "selector": "button.submit", "llmFallback": "Click the submit button" }
POST /tabs/:id/fill{ "fields": [{ "selector": "input", "value": "text", "llmFallback": "Email input" }] }
POST /tabs/:id/wait{ "selector": ".modal", "llmFallback": "The popup modal", "timeout": 5000 }
POST /tabs/:id/scroll{ "selector": "#pricing", "llmFallback": "The pricing section" }
POST /tabs/:id/press-key{ "key": "Enter", "selector": "input.search", "llmFallback": "The search box" }
POST /tabs/:id/dialog{ "action": "accept" }

Full Workflow Example

# 1. Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'

# 2. Fill a form (with LLM fallback per field)
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
  -H "Content-Type: application/json" \
  -d '{"fields": [
    {"selector": "input[name=email]", "value": "[email protected]",
     "llmFallback": "The email input field"},
    {"selector": "input[name=password]", "value": "secret123",
     "llmFallback": "The password input field"}
  ], "llmFallback": "Fill the login form"}'

# 3. Click submit (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{"selector": "button[type=submit]",
       "llmFallback": "Click the login button"}'

# 4. Wait for result (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/wait \
  -H "Content-Type: application/json" \
  -d '{"selector": ".success-message", "timeout": 10000,
       "llmFallback": "The success confirmation message"}'

# 5. Get page content
curl "http://localhost:3000/api/browser/tabs/0/source?type=text"
LLM Fallback

Every selector-based browser command accepts an optional llmFallback parameter — a plain-English description of the target element. When the primary CSS selector fails and an LLM provider is configured, LumaBrowser automatically resolves the correct element using AI.

How It Works

  1. Your command runs with the provided CSS selector
  2. If the selector fails (element not found) and llmFallback is present:
  3. If the Template Builder extension is active, LumaBrowser checks for an existing page template — or generates one automatically
  4. The LLM receives the page template and your natural language description, then resolves the correct CSS selector
  5. The command retries with the AI-resolved selector
  6. On success, the response includes resolvedSelector so you can update your scripts

Prerequisites

  • A primary LLM must be configured (Settings → LLM Provider)
  • Template Builder extension is recommended (enabled by default) for better accuracy
  • Works with OpenAI-compatible, Anthropic, or local WebGPU models

Supported Endpoints

EndpointFallback Behavior
POST /tabs/:id/clickResolves click target. Selector becomes optional when llmFallback is provided.
POST /tabs/:id/fillPer-field llmFallback on each field object, plus top-level fallback for the whole form.
POST /tabs/:id/waitResolves wait target. Selector becomes optional when llmFallback is provided.
POST /tabs/:id/scrollResolves scroll-to-element target when a selector is involved.
POST /tabs/:id/press-keyResolves focus target element before key press.
GET /tabs/:id/elementResolves element to inspect. Selector becomes optional when llmFallback is provided.
GET /tabs/:id/tableResolves table container selector.

Example: Click with Fallback

# Selector works → instant click, llmFallback is ignored
# Selector breaks → LLM resolves the right element automatically
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{
    "selector": "button.btn-primary.submit-form",
    "llmFallback": "Click the submit button"
  }'

# Response on LLM fallback success:
{
  "data": { "clicked": true, "resolvedSelector": "form button[type=submit]" },
  "success": true,
  "message": "Element clicked successfully (resolved by LLM fallback)"
}

Example: Click without Selector (LLM-Only)

# No selector at all — LLM resolves from description alone
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
  -H "Content-Type: application/json" \
  -d '{
    "tabId": 0,
    "llmFallback": "Click the Sign In button in the navigation bar"
  }'

Example: Form Fill with Per-Field Fallback

curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
  -H "Content-Type: application/json" \
  -d '{
    "fields": [
      {"selector": "input#email", "value": "[email protected]",
       "llmFallback": "The email address input"},
      {"selector": "input#pass", "value": "secret",
       "llmFallback": "The password input"},
      {"selector": "select#role", "value": "admin",
       "llmFallback": "The role dropdown"}
    ],
    "llmFallback": "Fill the registration form"
  }'

Template Builder Integration

When the Template Builder extension is active, LLM Fallback becomes significantly more accurate. Before calling the LLM, LumaBrowser checks for a cached page template. If none exists, it generates one automatically. The template's element map (names, types, selectors) is included in the LLM prompt, giving the model precise context about available page elements instead of relying on raw HTML analysis.

MCP Tools

All corresponding MCP tools also accept llmFallback: click, fill_form, wait_for, scroll, press_key, get_element, get_table. This means AI agents using the MCP server can provide their own fallback descriptions for resilient automation.

MCP Server Integration

Connect Claude Desktop, OpenClaw, or any MCP-compatible AI agent to LumaBrowser's built-in MCP server. Give your agents native browser capabilities without writing glue code.

Claude Desktop / OpenClaw Setup

LumaBrowser includes a built-in MCP server. Add it to your Claude Desktop or OpenClaw config (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/.config/Claude/claude_desktop_config.json on macOS/Linux):

{
  "mcpServers": {
    "lumabrowser": {
      "command": "lumabrowser",
      "args": ["--mcp"]
    }
  }
}

Make sure LumaBrowser is installed and available on your PATH, or use the full path to the executable.

Available MCP Tools

Browser Automation

ToolDescription
get_tabsList all open tabs
create_tabOpen new tab with URL
close_tabClose a tab by ID
navigateNavigate tab to URL
clickClick element by CSS selector (supports llmFallback)
fill_formFill form fields (supports llmFallback)
press_keySend keyboard input (supports llmFallback)
scrollScroll page (supports llmFallback)
get_sourceGet page HTML (text/clean/full)
screenshotCapture page as PNG
wait_forWait for element to appear (supports llmFallback)
get_elementGet element properties (supports llmFallback)
execute_jsRun JavaScript in page context

Extension Tools

ToolDescription
watcher_listList network watchers
watcher_addCreate network watcher
watcher_removeDelete network watcher
watcher_toggleEnable/disable watcher
get_templateGet cached page template
generate_templateGenerate template via LLM
task_listList scheduled tasks
task_createSchedule new task
task_triggerRun task immediately
task_deleteDelete task
ai_chat_runExecute headless agentic browser task

Example: Claude Browsing the Web

User: "Go to Hacker News and tell me the top 3 stories."

Claude uses MCP tools:
  → navigate({ tabId: 0, url: "https://news.ycombinator.com" })
  → get_source({ tabId: 0, type: "text" })
  → "Here are the top 3 stories: ..."
Network Watcher

Monitor HTTP traffic using Chrome DevTools Protocol (CDP) and forward matched responses to webhooks. No proxy setup or root certificates required.

How It Works

Network Watcher uses CDP to intercept responses at the browser level. You define URL patterns and a webhook destination — when traffic matches, the full request/response is forwarded to your endpoint.

API Endpoints

EndpointDescription
GET /api/watchersList all watchers
POST /api/watchersCreate watcher
GET /api/watchers/:idGet watcher details
PATCH /api/watchers/:idUpdate watcher
POST /api/watchers/:id/toggleEnable/disable
DELETE /api/watchers/:idRemove watcher
GET /api/watchers/statsTrigger statistics
POST /api/watchers/testTest configuration

Create a Watcher

curl -X POST http://localhost:3000/api/watchers \
  -H "Content-Type: application/json" \
  -d '{
    "urlPattern": "*api.example.com/v1/orders*",
    "sendTo": "https://your-backend.com/webhook",
    "note": "Capture order API calls",
    "method": "POST",
    "captureHeaders": true,
    "captureBody": true
  }'

Webhook Payload

When a matching request is intercepted, this payload is POSTed to your sendTo URL:

{
  "watcherId": "watcher_1711632896000_abc",
  "note": "Capture order API calls",
  "timestamp": "2025-03-28T12:34:56.000Z",
  "request": {
    "url": "https://api.example.com/v1/orders",
    "method": "POST",
    "headers": { "content-type": "application/json" }
  },
  "response": {
    "status": 200,
    "statusText": "OK",
    "mimeType": "application/json",
    "headers": { "content-type": "application/json" },
    "body": "{\"orderId\": \"12345\", ...}",
    "base64Encoded": false
  }
}

Watcher Configuration

FieldRequiredDescription
urlPatternYesURL pattern with wildcards (*, ?)
sendToYesWebhook destination URL
noteNoHuman-readable description
methodNoHTTP method filter: GET, POST, PUT, DELETE, or * (default)
captureHeadersNoInclude request/response headers (default: true)
captureBodyNoInclude response body (default: true)
Template Builder

Uses LLM to analyze pages and generate CSS selector maps. Describe what you want in plain English, and the Template Builder returns reliable selectors — even against randomized class names.

API Endpoints

EndpointDescription
GET /api/templatesList cached templates
POST /api/templates/generateGenerate new template (LLM analysis, ~20s)
GET /api/templates/:idGet cached template
DELETE /api/templates/:idRemove cached template
POST /api/templates/validateValidate selector against live page

Generated Template Structure

{
  "url": "https://example.com/page",
  "pageInfo": {
    "title": "Page Title",
    "description": "What the page does",
    "formFields": ["email", "password"],
    "navigationStrategy": "SPA with client-side routing"
  },
  "elements": [
    {
      "id": "element_1",
      "name": "Submit Button",
      "primarySelector": "button.submit",
      "fallbackSelectors": ["button[type=submit]", "input[type=submit]"],
      "type": "button",
      "clickable": true
    },
    {
      "id": "element_2",
      "name": "User List",
      "type": "repeatingItem",
      "itemSelector": "li.user",
      "childSelectors": {
        "name": "span.name",
        "email": "span.email"
      }
    }
  ]
}

Templates are cached in SQLite per URL. Subsequent requests for the same page return the cached version instantly.

Timed Tasks

Schedule recurring LLM-driven browser automation tasks. Each execution runs an agentic loop (up to 10 tool-call iterations) with full access to the browser automation API.

API Endpoints

EndpointDescription
GET /api/timed-tasksList all tasks
POST /api/timed-tasksCreate task
GET /api/timed-tasks/:idGet task details
PATCH /api/timed-tasks/:idUpdate task
DELETE /api/timed-tasks/:idDelete task
POST /api/timed-tasks/:id/triggerRun immediately
POST /api/timed-tasks/:id/enableStart scheduling
POST /api/timed-tasks/:id/disableStop scheduling
GET /api/timed-tasks/:id/runsExecution history

Create a Scheduled Task

curl -X POST http://localhost:3000/api/timed-tasks \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Check Product Prices",
    "requestPrompt": "Navigate to https://store.example.com, find the price of Widget X, and return it as JSON",
    "responsePrompt": "Format the result as {\"product\": \"...\", \"price\": \"...\"}",
    "repeatInterval": "3600000",
    "webhookUrl": "https://my-backend.com/prices",
    "enabled": true
  }'

Execution Lifecycle

  1. Task fires at the configured interval
  2. A browser tab is pre-created
  3. LLM receives the prompt with available browser tools
  4. Agentic loop: LLM calls tools, gets results, decides next action (up to 10 iterations)
  5. Final result is stored and forwarded to the webhook URL
AI Chat

A built-in LLM-powered agent that can see and control the browser. Use it interactively via the sidebar, or programmatically via the REST API and MCP tool. Give it a single natural language prompt and it autonomously navigates, clicks, fills forms, and extracts data — returning a structured result.

Capabilities

  • Agentic tool execution loop (up to 15 iterations per run, configurable)
  • Full access to all browser automation tools
  • Template Builder integration for faster, more accurate element targeting
  • Headless REST API for programmatic use
  • MCP tool (ai_chat_run) for agent-to-agent orchestration
  • Optional screenshot capture of final page state
  • Tab lifecycle management (auto-create, reuse existing, auto-close)
  • Tool allowlisting for read-only or restricted runs

REST API Endpoint

EndpointDescription
POST /api/ai-chat/runExecute a headless agentic task

Request

POST /api/ai-chat/run
{
  "prompt": "Log into acme.com/login with [email protected] / demo123,
             go to settings, change company name to 'Acme Industries'",
  "tabId": 0,                        // optional — use existing tab
  "autoCloseTab": true,               // optional, default: true
  "includeScreenshot": false,          // optional, default: false
  "maxIterations": 15,                 // optional, default: 15
  "timeout": 120000,                   // optional, default: 120000 (ms)
  "tools": ["navigate", "click", ...], // optional — restrict available tools
  "systemPromptAppend": "Return JSON"  // optional — extra instructions
}

Request Parameters

FieldRequiredDefaultDescription
promptYesNatural language task description
tabIdNonew tabUse an existing browser tab by ID. If omitted, a new tab is created at about:blank.
autoCloseTabNotrueClose the tab after completion. Only closes tabs that were auto-created.
includeScreenshotNofalseCapture a base64 PNG screenshot of the final page state.
maxIterationsNo15Maximum agent loop iterations. Lower values keep costs predictable.
timeoutNo120000Maximum total execution time in ms. Prevents runaway loops.
toolsNoallRestrict which tools the agent can use. e.g., ["navigate", "get_source"] for read-only scraping.
systemPromptAppendNoExtra instructions appended to the system prompt for this run.

Response

{
  "success": true,
  "summary": "Logged in and updated company name to Acme Industries",
  "finalResponse": "Done! I logged into acme.com, navigated to settings...",
  "tabId": 0,
  "iterations": 4,
  "durationMs": 18420,
  "toolCalls": [
    { "tool": "navigate", "params": {"tabId": 0, "url": "..."}, "durationMs": 1200 },
    { "tool": "fill_form", "params": {"tabId": 0, "fields": [...]}, "durationMs": 340 },
    { "tool": "click", "params": {"tabId": 0, "selector": "..."}, "durationMs": 890 },
    { "tool": "wait_for", "params": {"tabId": 0, "selector": "..."}, "durationMs": 1100 }
  ],
  "screenshot": "data:image/png;base64,..."
}

Response Fields

FieldDescription
summaryLLM-generated one-line summary of what was accomplished
finalResponseThe agent's full final response text
tabIdThe browser tab ID used. Use this to continue working with the tab (set autoCloseTab: false).
iterationsHow many agent loop iterations ran
durationMsTotal wall-clock execution time
toolCallsArray of tool calls made: name, params, and timing
screenshotBase64 PNG of final page (only if includeScreenshot: true)
errorError message if the run failed (e.g., timeout, LLM failure)

Example: Scrape Data (Read-Only)

curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Go to news.ycombinator.com and return the top 5 story titles as JSON",
    "tools": ["navigate", "get_source", "wait_for", "scroll"],
    "systemPromptAppend": "Return only a JSON array of strings",
    "maxIterations": 5
  }'

Example: Reuse a Tab

# Step 1: Create a tab and keep it open
curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Navigate to acme.com/dashboard and log in with [email protected] / demo123",
    "autoCloseTab": false
  }'
# → returns { "tabId": 0, ... }

# Step 2: Continue on the same tab
curl -X POST http://localhost:3000/api/ai-chat/run \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Now go to the settings page and change the timezone to UTC",
    "tabId": 0,
    "autoCloseTab": false,
    "includeScreenshot": true
  }'

MCP Tool

The same functionality is available as an MCP tool: ai_chat_run. It accepts all the same parameters and returns the same result shape. This enables agent-to-agent orchestration — an outer AI agent can delegate complex browser tasks to LumaBrowser's built-in agent.

Available Browser Tools

The agent can use: navigate, create_tab, get_tabs, click, fill_form, press_key, scroll, get_source, screenshot, wait_for, get_element. When Template Builder is enabled, also: get_template, generate_template.

Extension System

LumaBrowser is built on a modular extension architecture. All major features (network watcher, template builder, AI chat, timed tasks, WebGPU LLM) are extensions that can be enabled, disabled, or replaced.

Built-in Extensions

ExtensionDescription
notification-interceptorCaptures browser notifications and forwards to webhooks
network-watcherCDP-based HTTP traffic monitoring
ai-chatLLM-powered sidebar assistant
template-builderLLM page analysis and selector generation
timed-tasksScheduled recurring automation
webgpu-llmLocal GPU inference (no API keys)

LLM Providers

Each extension can use a different LLM independently via the slot-based routing system:

  • OpenAI-compatible — LM Studio, Ollama, or any OpenAI API endpoint
  • Anthropic — Claude API
  • WebGPU — Local GPU inference with Qwen 2.5 models (0.5B-7B), no API key

Custom Extensions

LumaBrowser supports a plugin architecture for building custom extensions with their own UI, REST routes, and MCP tools. Extension development documentation will be available soon.


Free APIs

Free, no-auth-required APIs you can use in your projects. Subject to change, but we'll keep them running as long as possible.

Mock Random People API

Generates a list of mock person data including names, emails, and phone numbers.

Endpoint

GET /api/generateMockRandomPeople?count={number}

Parameters

  • count (optional): Number of people to generate. Default: 10. Max: 1000.

Sample Request

curl https://lumabyte.com/api/generateMockRandomPeople?count=5

Sample Response

[{
    "id": 1,
    "name": "Ava Smith",
    "email": "[email protected]",
    "phoneNumber": "1234567890"
}, {
    "id": 2,
    "name": "Liam Johnson",
    "email": "[email protected]",
    "phoneNumber": "0987654321"
}]
URL to QR Code API

Takes a URL as input and returns a JSON response with the original URL and a Base64 encoded QR code image.

Endpoint

POST /api/urlToQRCode

Request Payload

{
    "url": "https://example.com"
}

Sample Response

{
    "url": "https://example.com",
    "qrCodeBase64": "data:image/png;base64,iVB..."
}

Demo



Text Analysis API

Analyzes text and returns metrics: word count, character count, sentence count, reading time, sentiment analysis, and token count (GPT tokenizer).

Endpoint

POST /api/analyzeText

Request Payload

{
    "text": "Your text to analyze goes here."
}

Sample Response

{
    "wordCount": 5,
    "characterCount": 28,
    "sentenceCount": 1,
    "sentiment": "positive",
    "readingTimeMinutes": 0.025,
    "tokenCount": 7
}

Demo