LumaBrowser Documentation
LumaBrowser is a programmable browser with built-in AI automation, network monitoring, and a full REST API. Download the installer, launch the app, and start automating in minutes.
Installation
Download the installer for your platform from the downloads page. Installers are available for Windows, macOS, and Linux. Run the installer and launch LumaBrowser; the REST API server starts automatically.
Quick Start
Once LumaBrowser is running, the REST API is available at http://localhost:3000. Try it:
# Check health
curl http://localhost:3000/api/health
# Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# Take a screenshot
curl http://localhost:3000/api/browser/tabs/0/screenshot
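The same calls can be scripted. The following is a minimal Python sketch using only the standard library. The helper names `build_request` and `call` are illustrative and not part of LumaBrowser, and the base URL assumes the default port shown above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # default address from the Quick Start


def build_request(method, path, payload=None):
    """Build (but do not send) a request for the REST API."""
    data = None
    headers = {}
    if payload is not None:
        # JSON bodies need an explicit content type, as in the curl examples.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def call(method, path, payload=None, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_request(method, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With LumaBrowser running, `call("GET", "/api/health")` should return the decoded health response, and `call("POST", "/api/browser/tabs", {"url": "https://example.com"})` should open a tab.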
Configuration
Settings are persisted in SQLite. Key options:
| Setting | Default | Description |
|---|---|---|
core.apiPort | 3000 | REST API listen port |
core.apiEnabled | true | Enable/disable REST server |
core.mcpEnabled | true | Enable/disable MCP server |
Environment variable overrides: API_PORT=8080, API_HOST=0.0.0.0
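If your client scripts run in the same environment as LumaBrowser, they can derive the API address from the same variables. This is a sketch under that assumption; `api_base_url` is a hypothetical helper, and the fallback values mirror the defaults in the table above.

```python
import os


def api_base_url(env=None):
    """Derive the REST API base URL from API_HOST / API_PORT."""
    env = os.environ if env is None else env
    host = env.get("API_HOST", "localhost")
    port = int(env.get("API_PORT", "3000"))
    # 0.0.0.0 is a bind address (all interfaces), not a destination;
    # a client on the same machine connects through localhost instead.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"
```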
All endpoints are available at http://localhost:3000 by default. Responses are JSON.
Health & Discovery
| Endpoint | Description |
|---|---|
GET /api/health | Server status, port, loaded extensions |
GET /api/ | List all available routes |
Response Format
Successful responses wrap data in a standard envelope:
{
"data": { ... },
"count": 5,
"success": true
}
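A client can unwrap this envelope in one place. The sketch below is illustrative: `ApiError` and `unwrap` are hypothetical names, and the shape of a failed response (a `message` or `error` field) is an assumption, since only the success envelope is documented here.

```python
class ApiError(Exception):
    """Raised when a response envelope does not report success."""


def unwrap(envelope):
    """Return envelope["data"], or raise ApiError if success is not true."""
    if not isinstance(envelope, dict):
        raise ApiError("response is not a JSON object")
    if not envelope.get("success"):
        # Assumption: failures carry a human-readable "message" or "error".
        detail = envelope.get("message") or envelope.get("error")
        raise ApiError(detail or "request failed")
    return envelope.get("data")
```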
Full programmatic control over browser tabs — navigate, click, fill forms, extract content, and take screenshots.
Tab Management
| Endpoint | Description |
|---|---|
GET /api/browser/tabs | List all open tabs |
POST /api/browser/tabs | Create new tab |
DELETE /api/browser/tabs/:id | Close tab |
PATCH /api/browser/tabs/:id | Navigate, refresh, or execute JS |
Create Tab
POST /api/browser/tabs
{
"url": "https://example.com",
"silent": false
}
Page Inspection
| Endpoint | Description |
|---|---|
GET /api/browser/tabs/:id/source | Page HTML (?type=text, ?type=clean, or ?type=full) |
GET /api/browser/tabs/:id/screenshot | PNG screenshot as base64 |
GET /api/browser/tabs/:id/console | Console log entries |
GET /api/browser/tabs/:id/network | Network request log |
GET /api/browser/tabs/:id/element | Element properties (?selector=css&llmFallback=desc) |
GET /api/browser/tabs/:id/table | Extract table data (?selector=css&llmFallback=desc) |
User Interaction
All selector-based endpoints accept an optional llmFallback parameter — a natural language description of the target element. When the CSS selector fails and an LLM is configured, it resolves the element automatically. See LLM Fallback.
| Endpoint | Payload |
|---|---|
POST /api/browser/tabs/:id/click | { "selector": "button.submit", "llmFallback": "Click the submit button" } |
POST /api/browser/tabs/:id/fill | { "fields": [{ "selector": "input", "value": "text", "llmFallback": "Email input" }] } |
POST /api/browser/tabs/:id/wait | { "selector": ".modal", "llmFallback": "The popup modal", "timeout": 5000 } |
POST /api/browser/tabs/:id/scroll | { "selector": "#pricing", "llmFallback": "The pricing section" } |
POST /api/browser/tabs/:id/press-key | { "key": "Enter", "selector": "input.search", "llmFallback": "The search box" } |
POST /api/browser/tabs/:id/dialog | { "action": "accept" } |
Full Workflow Example
# 1. Create a tab
curl -X POST http://localhost:3000/api/browser/tabs \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com"}'
# 2. Fill a form (with LLM fallback per field)
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
-H "Content-Type: application/json" \
-d '{"fields": [
{"selector": "input[name=email]", "value": "[email protected]",
"llmFallback": "The email input field"},
{"selector": "input[name=password]", "value": "secret123",
"llmFallback": "The password input field"}
], "llmFallback": "Fill the login form"}'
# 3. Click submit (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{"selector": "button[type=submit]",
"llmFallback": "Click the login button"}'
# 4. Wait for result (with LLM fallback)
curl -X POST http://localhost:3000/api/browser/tabs/0/wait \
-H "Content-Type: application/json" \
-d '{"selector": ".success-message", "timeout": 10000,
"llmFallback": "The success confirmation message"}'
# 5. Get page content
curl "http://localhost:3000/api/browser/tabs/0/source?type=text"
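Steps 2 to 5 of the workflow can also be expressed as data, which makes them easy to replay against any tab ID. This is a sketch only: `login_workflow` is a hypothetical helper that builds the requests without sending them, and the selectors are the ones from the example above.

```python
def login_workflow(tab_id, email, password):
    """Return the workflow as ordered (method, path, payload) steps."""
    base = f"/api/browser/tabs/{tab_id}"
    return [
        ("POST", f"{base}/fill", {
            "fields": [
                {"selector": "input[name=email]", "value": email,
                 "llmFallback": "The email input field"},
                {"selector": "input[name=password]", "value": password,
                 "llmFallback": "The password input field"},
            ],
            "llmFallback": "Fill the login form",
        }),
        ("POST", f"{base}/click", {
            "selector": "button[type=submit]",
            "llmFallback": "Click the login button",
        }),
        ("POST", f"{base}/wait", {
            "selector": ".success-message",
            "timeout": 10000,
            "llmFallback": "The success confirmation message",
        }),
        # The final step is a plain GET, so it carries no payload.
        ("GET", f"{base}/source?type=text", None),
    ]
```

Each tuple can be passed to whichever HTTP client you prefer; stop at the first step whose response does not report success.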
Every selector-based browser command accepts an optional llmFallback parameter — a plain-English description of the target element. When the primary CSS selector fails and an LLM provider is configured, LumaBrowser automatically resolves the correct element using AI.
How It Works
- Your command runs with the provided CSS selector
- If the selector fails (element not found) and llmFallback is present:
  - If the Template Builder extension is active, LumaBrowser checks for an existing page template, or generates one automatically
  - The LLM receives the page template and your natural language description, then resolves the correct CSS selector
  - The command retries with the AI-resolved selector
- On success, the response includes resolvedSelector so you can update your scripts
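The control flow described above can be sketched as follows. This is an illustration of the documented behavior, not LumaBrowser's implementation: `action` stands in for the browser command and `resolve` for the LLM lookup, both hypothetical callables.

```python
def run_with_fallback(action, selector, llm_fallback=None, resolve=None):
    """Try the CSS selector first, then fall back to an LLM-resolved one.

    action(selector) returns True when the element was found and acted on.
    resolve(description) returns a CSS selector, or None if it cannot.
    """
    # 1. Run the command with the provided selector, if there is one.
    if selector and action(selector):
        return {"success": True}
    # 2. Without a description or a configured resolver, there is no fallback.
    if not llm_fallback or resolve is None:
        return {"success": False}
    # 3. Ask the resolver for a selector and retry the command once.
    resolved = resolve(llm_fallback)
    if resolved and action(resolved):
        # 4. Report the selector that worked so the caller can store it.
        return {"success": True, "resolvedSelector": resolved}
    return {"success": False}
```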
Prerequisites
- A primary LLM must be configured (Settings → LLM Provider)
- Template Builder extension is recommended (enabled by default) for better accuracy
- Works with OpenAI-compatible, Anthropic, or local WebGPU models
Supported Endpoints
| Endpoint | Fallback Behavior |
|---|---|
POST /api/browser/tabs/:id/click | Resolves click target. Selector becomes optional when llmFallback is provided. |
POST /api/browser/tabs/:id/fill | Per-field llmFallback on each field object, plus top-level fallback for the whole form. |
POST /api/browser/tabs/:id/wait | Resolves wait target. Selector becomes optional when llmFallback is provided. |
POST /api/browser/tabs/:id/scroll | Resolves scroll-to-element target when a selector is involved. |
POST /api/browser/tabs/:id/press-key | Resolves focus target element before key press. |
GET /api/browser/tabs/:id/element | Resolves element to inspect. Selector becomes optional when llmFallback is provided. |
GET /api/browser/tabs/:id/table | Resolves table container selector. |
Example: Click with Fallback
# Selector works → instant click, llmFallback is ignored
# Selector breaks → LLM resolves the right element automatically
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{
"selector": "button.btn-primary.submit-form",
"llmFallback": "Click the submit button"
}'
# Response on LLM fallback success:
{
"data": { "clicked": true, "resolvedSelector": "form button[type=submit]" },
"success": true,
"message": "Element clicked successfully (resolved by LLM fallback)"
}
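Because the response reports the selector that actually worked, a script can update its own selector map and skip the LLM on the next run. A minimal sketch, assuming your script keeps selectors in a dictionary; `refresh_selector` is a hypothetical helper.

```python
def refresh_selector(selectors, key, response):
    """Store response["data"]["resolvedSelector"] under key if it changed.

    Returns True when the selector map was updated.
    """
    resolved = (response.get("data") or {}).get("resolvedSelector")
    if response.get("success") and resolved and selectors.get(key) != resolved:
        selectors[key] = resolved
        return True
    return False
```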
Example: Click without Selector (LLM-Only)
# No selector at all — LLM resolves from description alone
curl -X POST http://localhost:3000/api/browser/tabs/0/click \
-H "Content-Type: application/json" \
-d '{
"tabId": 0,
"llmFallback": "Click the Sign In button in the navigation bar"
}'
Example: Form Fill with Per-Field Fallback
curl -X POST http://localhost:3000/api/browser/tabs/0/fill \
-H "Content-Type: application/json" \
-d '{
"fields": [
{"selector": "input#email", "value": "[email protected]",
"llmFallback": "The email address input"},
{"selector": "input#pass", "value": "secret",
"llmFallback": "The password input"},
{"selector": "select#role", "value": "admin",
"llmFallback": "The role dropdown"}
],
"llmFallback": "Fill the registration form"
}'
Template Builder Integration
When the Template Builder extension is active, LLM Fallback becomes significantly more accurate. Before calling the LLM, LumaBrowser checks for a cached page template. If none exists, it generates one automatically. The template's element map (names, types, selectors) is included in the LLM prompt, giving the model precise context about available page elements instead of relying on raw HTML analysis.
MCP Tools
All corresponding MCP tools also accept llmFallback: click, fill_form, wait_for, scroll, press_key, get_element, get_table. This means AI agents using the MCP server can provide their own fallback descriptions for resilient automation.
Connect Claude Desktop, OpenClaw, or any MCP-compatible AI agent to LumaBrowser's built-in MCP server. Give your agents native browser capabilities without writing glue code.
Claude Desktop / OpenClaw Setup
LumaBrowser includes a built-in MCP server. Add it to your Claude Desktop or OpenClaw config (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, ~/.config/Claude/claude_desktop_config.json on Linux):
{
"mcpServers": {
"lumabrowser": {
"command": "lumabrowser",
"args": ["--mcp"]
}
}
}
Make sure LumaBrowser is installed and available on your PATH, or use the full path to the executable.
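If you already have other MCP servers configured, merge the entry rather than overwriting the file. The sketch below is illustrative: `add_lumabrowser_server` is a hypothetical helper that works on the parsed JSON, leaving reading and writing the file to you.

```python
import shutil


def add_lumabrowser_server(config, executable=None):
    """Return a copy of config with the lumabrowser MCP entry added."""
    # Prefer an explicit path, then whatever is on PATH, then the bare name.
    command = executable or shutil.which("lumabrowser") or "lumabrowser"
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["lumabrowser"] = {"command": command, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged
```

Load the existing file with `json.load`, pass the result through this function, and write it back with `json.dump(..., indent=2)`.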
Available MCP Tools
Browser Automation
| Tool | Description |
|---|---|
get_tabs | List all open tabs |
create_tab | Open new tab with URL |
close_tab | Close a tab by ID |
navigate | Navigate tab to URL |
click | Click element by CSS selector (supports llmFallback) |
fill_form | Fill form fields (supports llmFallback) |
press_key | Send keyboard input (supports llmFallback) |
scroll | Scroll page (supports llmFallback) |
get_source | Get page HTML (text/clean/full) |
screenshot | Capture page as PNG |
wait_for | Wait for element to appear (supports llmFallback) |
get_element | Get element properties (supports llmFallback) |
execute_js | Run JavaScript in page context |
Extension Tools
| Tool | Description |
|---|---|
watcher_list | List network watchers |
watcher_add | Create network watcher |
watcher_remove | Delete network watcher |
watcher_toggle | Enable/disable watcher |
get_template | Get cached page template |
generate_template | Generate template via LLM |
task_list | List scheduled tasks |
task_create | Schedule new task |
task_trigger | Run task immediately |
task_delete | Delete task |
ai_chat_run | Execute headless agentic browser task |
Example: Claude Browsing the Web
User: "Go to Hacker News and tell me the top 3 stories."
Claude uses MCP tools:
→ navigate({ tabId: 0, url: "https://news.ycombinator.com" })
→ get_source({ tabId: 0, type: "text" })
→ "Here are the top 3 stories: ..."
Monitor HTTP traffic using Chrome DevTools Protocol (CDP) and forward matched responses to webhooks. No proxy setup or root certificates required.
How It Works
Network Watcher uses CDP to intercept responses at the browser level. You define URL patterns and a webhook destination — when traffic matches, the full request/response is forwarded to your endpoint.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/watchers | List all watchers |
POST /api/watchers | Create watcher |
GET /api/watchers/:id | Get watcher details |
PATCH /api/watchers/:id | Update watcher |
POST /api/watchers/:id/toggle | Enable/disable |
DELETE /api/watchers/:id | Remove watcher |
GET /api/watchers/stats | Trigger statistics |
POST /api/watchers/test | Test configuration |
Create a Watcher
curl -X POST http://localhost:3000/api/watchers \
-H "Content-Type: application/json" \
-d '{
"urlPattern": "*api.example.com/v1/orders*",
"sendTo": "https://your-backend.com/webhook",
"note": "Capture order API calls",
"method": "POST",
"captureHeaders": true,
"captureBody": true
}'
Webhook Payload
When a matching request is intercepted, this payload is POSTed to your sendTo URL:
{
"watcherId": "watcher_1711632896000_abc",
"note": "Capture order API calls",
"timestamp": "2025-03-28T12:34:56.000Z",
"request": {
"url": "https://api.example.com/v1/orders",
"method": "POST",
"headers": { "content-type": "application/json" }
},
"response": {
"status": 200,
"statusText": "OK",
"mimeType": "application/json",
"headers": { "content-type": "application/json" },
"body": "{\"orderId\": \"12345\", ...}",
"base64Encoded": false
}
}
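On the receiving side, the body needs decoding before use. The sketch below assumes that `base64Encoded: true` means the body string is base64 (the convention CDP uses for binary responses) and that JSON responses can be recognized by their `mimeType`; `response_body` is a hypothetical helper.

```python
import base64
import json


def response_body(payload):
    """Decode the captured response body from a webhook payload."""
    response = payload["response"]
    body = response.get("body")
    if body is None:
        return None  # body capture disabled or nothing captured
    if response.get("base64Encoded"):
        return base64.b64decode(body)  # binary content, returned as bytes
    if "json" in response.get("mimeType", ""):
        try:
            return json.loads(body)
        except ValueError:
            return body  # declared as JSON but not parseable; keep raw text
    return body
```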
Watcher Configuration
| Field | Required | Description |
|---|---|---|
urlPattern | Yes | URL pattern with wildcards (*, ?) |
sendTo | Yes | Webhook destination URL |
note | No | Human-readable description |
method | No | HTTP method filter: GET, POST, PUT, DELETE, or * (default) |
captureHeaders | No | Include request/response headers (default: true) |
captureBody | No | Include response body (default: true) |
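To check a pattern before creating a watcher, you can approximate the matching locally. This sketch assumes glob-style semantics (`*` matches any run of characters, `?` exactly one) and uses Python's `fnmatch`, which also treats `[...]` as a character class; LumaBrowser's actual matcher may differ in detail. `watcher_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def watcher_matches(watcher, url, method):
    """Approximate whether a watcher config would match a request."""
    wanted = watcher.get("method", "*").upper()  # "*" is the documented default
    if wanted != "*" and wanted != method.upper():
        return False
    # Case-sensitive glob match of the full URL against urlPattern.
    return fnmatchcase(url, watcher["urlPattern"])
```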
Uses an LLM to analyze pages and generate CSS selector maps. Describe what you want in plain English, and the Template Builder returns reliable selectors, even against randomized class names.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/templates | List cached templates |
POST /api/templates/generate | Generate new template (LLM analysis, ~20s) |
GET /api/templates/:id | Get cached template |
DELETE /api/templates/:id | Remove cached template |
POST /api/templates/validate | Validate selector against live page |
Generated Template Structure
{
"url": "https://example.com/page",
"pageInfo": {
"title": "Page Title",
"description": "What the page does",
"formFields": ["email", "password"],
"navigationStrategy": "SPA with client-side routing"
},
"elements": [
{
"id": "element_1",
"name": "Submit Button",
"primarySelector": "button.submit",
"fallbackSelectors": ["button[type=submit]", "input[type=submit]"],
"type": "button",
"clickable": true
},
{
"id": "element_2",
"name": "User List",
"type": "repeatingItem",
"itemSelector": "li.user",
"childSelectors": {
"name": "span.name",
"email": "span.email"
}
}
]
}
Templates are cached in SQLite per URL. Subsequent requests for the same page return the cached version instantly.
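A script can use a template to pick selectors by element name, trying the primary selector first and the fallbacks after it. A minimal sketch over the structure shown above; `candidate_selectors` is a hypothetical helper, and treating `itemSelector` as the only candidate for `repeatingItem` elements is an assumption.

```python
def candidate_selectors(template, name):
    """Return selectors to try for the named element, best first."""
    for element in template.get("elements", []):
        if element.get("name") != name:
            continue
        if element.get("type") == "repeatingItem":
            # Repeating items expose a single selector for each row.
            return [element["itemSelector"]]
        return [element["primarySelector"], *element.get("fallbackSelectors", [])]
    return []  # no element with that name in the template
```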
Schedule recurring LLM-driven browser automation tasks. Each execution runs an agentic loop (up to 10 tool-call iterations) with full access to the browser automation API.
API Endpoints
| Endpoint | Description |
|---|---|
GET /api/timed-tasks | List all tasks |
POST /api/timed-tasks | Create task |
GET /api/timed-tasks/:id | Get task details |
PATCH /api/timed-tasks/:id | Update task |
DELETE /api/timed-tasks/:id | Delete task |
POST /api/timed-tasks/:id/trigger | Run immediately |
POST /api/timed-tasks/:id/enable | Start scheduling |
POST /api/timed-tasks/:id/disable | Stop scheduling |
GET /api/timed-tasks/:id/runs | Execution history |
Create a Scheduled Task
curl -X POST http://localhost:3000/api/timed-tasks \
-H "Content-Type: application/json" \
-d '{
"name": "Check Product Prices",
"requestPrompt": "Navigate to https://store.example.com, find the price of Widget X, and return it as JSON",
"responsePrompt": "Format the result as {\"product\": \"...\", \"price\": \"...\"}",
"repeatInterval": "3600000",
"webhookUrl": "https://my-backend.com/prices",
"enabled": true
}'
Execution Lifecycle
- Task fires at the configured interval
- A browser tab is pre-created
- LLM receives the prompt with available browser tools
- Agentic loop: LLM calls tools, gets results, decides next action (up to 10 iterations)
- Final result is stored and forwarded to the webhook URL
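The bounded loop at the heart of this lifecycle can be sketched as below. This illustrates the pattern only and is not LumaBrowser's code: `decide` stands in for the LLM and `tools` for the browser tools, both hypothetical.

```python
def run_agent(decide, tools, max_iterations=10):
    """Run a tool-calling loop that stops at a final answer or the cap.

    decide(history) returns {"final": value} to finish, or
    {"tool": name, "params": {...}} to request a tool call.
    """
    history = []
    for _ in range(max_iterations):
        step = decide(history)
        if step.get("final") is not None:
            return {"result": step["final"], "iterations": len(history)}
        # Execute the requested tool and feed its output back next round.
        output = tools[step["tool"]](**step.get("params", {}))
        history.append({"tool": step["tool"], "output": output})
    # Iteration cap reached without a final answer.
    return {"result": None, "iterations": len(history)}
```

The cap is what keeps a confused model from looping indefinitely; a run that hits it ends without a result.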
A built-in LLM-powered agent that can see and control the browser. Use it interactively via the sidebar, or programmatically via the REST API and MCP tool. Give it a single natural language prompt and it autonomously navigates, clicks, fills forms, and extracts data — returning a structured result.
Capabilities
- Agentic tool execution loop (up to 15 iterations per run, configurable)
- Full access to all browser automation tools
- Template Builder integration for faster, more accurate element targeting
- Headless REST API for programmatic use
- MCP tool (ai_chat_run) for agent-to-agent orchestration
- Optional screenshot capture of final page state
- Tab lifecycle management (auto-create, reuse existing, auto-close)
- Tool allowlisting for read-only or restricted runs
REST API Endpoint
| Endpoint | Description |
|---|---|
POST /api/ai-chat/run | Execute a headless agentic task |
Request
POST /api/ai-chat/run
{
"prompt": "Log into acme.com/login with [email protected] / demo123, go to settings, change company name to 'Acme Industries'",
"tabId": 0, // optional — use existing tab
"autoCloseTab": true, // optional, default: true
"includeScreenshot": false, // optional, default: false
"maxIterations": 15, // optional, default: 15
"timeout": 120000, // optional, default: 120000 (ms)
"tools": ["navigate", "click", ...], // optional — restrict available tools
"systemPromptAppend": "Return JSON" // optional — extra instructions
}
Request Parameters
| Field | Required | Default | Description |
|---|---|---|---|
prompt | Yes | — | Natural language task description |
tabId | No | new tab | Use an existing browser tab by ID. If omitted, a new tab is created at about:blank. |
autoCloseTab | No | true | Close the tab after completion. Only closes tabs that were auto-created. |
includeScreenshot | No | false | Capture a base64 PNG screenshot of the final page state. |
maxIterations | No | 15 | Maximum agent loop iterations. Lower values keep costs predictable. |
timeout | No | 120000 | Maximum total execution time in ms. Prevents runaway loops. |
tools | No | all | Restrict which tools the agent can use. e.g., ["navigate", "get_source"] for read-only scraping. |
systemPromptAppend | No | — | Extra instructions appended to the system prompt for this run. |
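A small builder can catch misspelled option names before a request is sent. This is a sketch: `build_run_payload` is a hypothetical helper, and the allowed keys are the fields from the table above.

```python
ALLOWED_OPTIONS = {
    "tabId", "autoCloseTab", "includeScreenshot", "maxIterations",
    "timeout", "tools", "systemPromptAppend",
}


def build_run_payload(prompt, **options):
    """Build the JSON body for POST /api/ai-chat/run."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    payload = {"prompt": prompt}
    # Omit unset options so the server applies its documented defaults.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```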
Response
{
"success": true,
"summary": "Logged in and updated company name to Acme Industries",
"finalResponse": "Done! I logged into acme.com, navigated to settings...",
"tabId": 0,
"iterations": 4,
"durationMs": 18420,
"toolCalls": [
{ "tool": "navigate", "params": {"tabId": 0, "url": "..."}, "durationMs": 1200 },
{ "tool": "fill_form", "params": {"tabId": 0, "fields": [...]}, "durationMs": 340 },
{ "tool": "click", "params": {"tabId": 0, "selector": "..."}, "durationMs": 890 },
{ "tool": "wait_for", "params": {"tabId": 0, "selector": "..."}, "durationMs": 1100 }
],
"screenshot": "data:image/png;base64,..."
}
Response Fields
| Field | Description |
|---|---|
summary | LLM-generated one-line summary of what was accomplished |
finalResponse | The agent's full final response text |
tabId | The browser tab ID used. Use this to continue working with the tab (set autoCloseTab: false). |
iterations | How many agent loop iterations ran |
durationMs | Total wall-clock execution time |
toolCalls | Array of tool calls made: name, params, and timing |
screenshot | Base64 PNG of final page (only if includeScreenshot: true) |
error | Error message if the run failed (e.g., timeout, LLM failure) |
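The toolCalls array makes it straightforward to see where a run spent its time. A minimal sketch over the response shape above; `tool_time_by_name` is a hypothetical helper.

```python
def tool_time_by_name(run):
    """Sum durationMs per tool name across a run's toolCalls."""
    totals = {}
    for call in run.get("toolCalls", []):
        name = call["tool"]
        totals[name] = totals.get(name, 0) + call.get("durationMs", 0)
    return totals
```

For the sample response, the four tool calls add up to 3530 ms of the 18420 ms total; the rest was spent outside tool execution.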
Example: Scrape Data (Read-Only)
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Go to news.ycombinator.com and return the top 5 story titles as JSON",
"tools": ["navigate", "get_source", "wait_for", "scroll"],
"systemPromptAppend": "Return only a JSON array of strings",
"maxIterations": 5
}'
Example: Reuse a Tab
# Step 1: Create a tab and keep it open
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Navigate to acme.com/dashboard and log in with [email protected] / demo123",
"autoCloseTab": false
}'
# → returns { "tabId": 0, ... }
# Step 2: Continue on the same tab
curl -X POST http://localhost:3000/api/ai-chat/run \
-H "Content-Type: application/json" \
-d '{
"prompt": "Now go to the settings page and change the timezone to UTC",
"tabId": 0,
"autoCloseTab": false,
"includeScreenshot": true
}'
MCP Tool
The same functionality is available as an MCP tool: ai_chat_run. It accepts all the same parameters and returns the same result shape. This enables agent-to-agent orchestration — an outer AI agent can delegate complex browser tasks to LumaBrowser's built-in agent.
Available Browser Tools
The agent can use: navigate, create_tab, get_tabs, click, fill_form, press_key, scroll, get_source, screenshot, wait_for, get_element. When Template Builder is enabled, also: get_template, generate_template.
LumaBrowser is built on a modular extension architecture. All major features (network watcher, template builder, AI chat, timed tasks, WebGPU LLM) are extensions that can be enabled, disabled, or replaced.
Built-in Extensions
| Extension | Description |
|---|---|
notification-interceptor | Captures browser notifications and forwards to webhooks |
network-watcher | CDP-based HTTP traffic monitoring |
ai-chat | LLM-powered sidebar assistant |
template-builder | LLM page analysis and selector generation |
timed-tasks | Scheduled recurring automation |
webgpu-llm | Local GPU inference (no API keys) |
LLM Providers
Each extension can use a different LLM independently via the slot-based routing system:
- OpenAI-compatible — LM Studio, Ollama, or any OpenAI API endpoint
- Anthropic — Claude API
- WebGPU — Local GPU inference with Qwen 2.5 models (0.5B-7B), no API key
Custom Extensions
LumaBrowser supports a plugin architecture for building custom extensions with their own UI, REST routes, and MCP tools. Extension development documentation will be available soon.
Free, no-auth-required APIs you can use in your projects. Subject to change, but we'll keep them running as long as possible.
Generates a list of mock person data including names, emails, and phone numbers.
Endpoint
GET /api/generateMockRandomPeople?count={number}
Parameters
- count (optional): Number of people to generate. Default: 10. Max: 1000.
Sample Request
curl "https://lumabyte.com/api/generateMockRandomPeople?count=5"
Sample Response
[{
"id": 1,
"name": "Ava Smith",
"email": "[email protected]",
"phoneNumber": "1234567890"
}, {
"id": 2,
"name": "Liam Johnson",
"email": "[email protected]",
"phoneNumber": "0987654321"
}]
Takes a URL as input and returns a JSON response with the original URL and a Base64 encoded QR code image.
Endpoint
POST /api/urlToQRCode
Request Payload
{
"url": "https://example.com"
}
Sample Response
{
"url": "https://example.com",
"qrCodeBase64": "data:image/png;base64,iVB..."
}
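To save the QR code as an image file, strip the data URI prefix and decode the rest. A minimal sketch, assuming the value is always a base64 data URI as in the sample response; `data_uri_to_bytes` is a hypothetical helper.

```python
import base64


def data_uri_to_bytes(uri):
    """Decode a base64 data URI into raw bytes."""
    header, _, encoded = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(encoded)
```

Writing the result to disk is then a single call such as `open("qr.png", "wb").write(data_uri_to_bytes(result["qrCodeBase64"]))`.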
Analyzes text and returns metrics: word count, character count, sentence count, reading time, sentiment analysis, and token count (GPT tokenizer).
Endpoint
POST /api/analyzeText
Request Payload
{
"text": "Your text to analyze goes here."
}
Sample Response
{
"wordCount": 5,
"characterCount": 28,
"sentenceCount": 1,
"sentiment": "positive",
"readingTimeMinutes": 0.025,
"tokenCount": 7
}