Commit c95ac3b

waleedlatif1 and claude authored
improvement(browser-use,stagehand): expose live session URLs (#4314)
* improvement(browser-use,stagehand): expose live session URLs and align with latest API specs

  - Browser Use: switch to v2 camelCase schema, fetch live URL from sessions endpoint, add startUrl/maxSteps/allowedDomains/vision/flashMode/thinking/systemPromptExtension/structuredOutput/metadata params, surface liveUrl/shareUrl/sessionId outputs
  - Stagehand: fetch Browserbase debug URL, add mode/maxSteps params, surface liveViewUrl/sessionId outputs, bump @browserbasehq/stagehand to ^3.2.1, update to claude-sonnet-4-6

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(browser-use): respect API default for highlightElements

  Only send highlightElements when user explicitly toggles it; previously defaulted to true which silently overrode the v2 API default of false.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(browser-use,stagehand): address PR review feedback

  - Browser Use: fetch liveUrl during polling once sessionId is known, instead of immediately after task creation. Handles tasks started without profile_id (where sessionId isn't returned in create response) and ensures session is active before fetching.
  - Stagehand: coerce empty/whitespace maxSteps strings to undefined so they're dropped from the request body instead of failing zod validation as ''.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(stagehand): preserve liveViewUrl and sessionId on agent error

  If the agent throws after Browserbase session init succeeds, callers can still surface the live view / session ID for debugging.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(browser-use): coerce empty maxSteps strings to undefined

  Mirrors the Stagehand block's handling so a cleared field doesn't pass through as ''.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(browser-use): skip metadata when empty

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent ca814f0 commit c95ac3b
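
The review-feedback fix in the commit message describes fetching `liveUrl` during polling once `sessionId` is known, rather than right after task creation. The pattern can be sketched roughly as follows. This is not the code from this commit: `pollTask`, `getTask`, `getSession`, `sleep` and the response shapes are hypothetical stand-ins for the Browser Use task and session calls, which are not shown here.

```typescript
// Hypothetical shapes; the real Browser Use responses are not shown in this diff.
interface TaskStatus {
  status: 'running' | 'finished' | 'failed'
  sessionId?: string
}

interface PollDeps {
  getTask: (taskId: string) => Promise<TaskStatus>
  getSession: (sessionId: string) => Promise<{ liveUrl?: string }>
  sleep: (ms: number) => Promise<void>
}

interface PollResult {
  status: TaskStatus['status']
  sessionId: string | null
  liveUrl: string | null
}

async function pollTask(
  taskId: string,
  deps: PollDeps,
  maxAttempts = 10,
  intervalMs = 1000
): Promise<PollResult> {
  let sessionId: string | null = null
  let liveUrl: string | null = null
  let status: TaskStatus['status'] = 'running'

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const task = await deps.getTask(taskId)
    status = task.status
    sessionId = task.sessionId ?? sessionId

    // Look up the live URL only after a session ID has appeared, and only until it is found.
    if (sessionId && !liveUrl) {
      try {
        liveUrl = (await deps.getSession(sessionId)).liveUrl ?? null
      } catch {
        // A failed lookup is non-fatal; it is retried on the next poll.
      }
    }

    if (status !== 'running') break
    await deps.sleep(intervalMs)
  }

  return { status, sessionId, liveUrl }
}
```

Injecting the three functions keeps the loop testable without network access, and a task created without a session ID in its first response is still covered because the lookup waits for the ID to show up.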

12 files changed

Lines changed: 570 additions & 180 deletions

apps/docs/content/docs/en/tools/browser_use.mdx

Lines changed: 25 additions & 5 deletions
@@ -42,9 +42,18 @@ Runs a browser automation task using BrowserUse
 | Parameter | Type | Required | Description |
 | --------- | ---- | -------- | ----------- |
 | `task` | string | Yes | What should the browser agent do |
-| `variables` | json | No | Optional variables to use as secrets \(format: \{key: value\}\) |
-| `save_browser_data` | boolean | No | Whether to save browser data |
-| `model` | string | No | LLM model to use \(default: gpt-4o\) |
+| `startUrl` | string | No | Initial page URL to start the agent on \(reduces navigation steps\) |
+| `variables` | json | No | Optional secrets injected into the task \(format: \{key: value\}\) |
+| `allowedDomains` | string | No | Comma-separated list of domains the agent is allowed to visit |
+| `maxSteps` | number | No | Maximum number of steps the agent may take \(default 100, max 10000\) |
+| `flashMode` | boolean | No | Enable flash mode \(faster, less careful navigation\) |
+| `thinking` | boolean | No | Enable extended reasoning mode |
+| `vision` | string | No | Vision capability: "true", "false", or "auto" |
+| `systemPromptExtension` | string | No | Optional text appended to the agent system prompt \(max 2000 chars\) |
+| `structuredOutput` | string | No | Stringified JSON schema for the structured output |
+| `highlightElements` | boolean | No | Highlight interactive elements on the page \(default true\) |
+| `metadata` | json | No | Custom key-value metadata \(up to 10 pairs\) for tracking |
+| `model` | string | No | LLM model identifier \(e.g. browser-use-2.0\) |
 | `apiKey` | string | Yes | API key for BrowserUse API |
 | `profile_id` | string | No | Browser profile ID for persistent sessions \(cookies, login state\) |

@@ -54,7 +63,18 @@ Runs a browser automation task using BrowserUse
 | --------- | ---- | ----------- |
 | `id` | string | Task execution identifier |
 | `success` | boolean | Task completion status |
-| `output` | json | Task output data |
-| `steps` | json | Execution steps taken |
+| `output` | json | Final task output \(string or structured\) |
+| `steps` | array | Steps the agent executed \(number, memory, nextGoal, url, actions, duration\) |
+|`number` | number | Sequential step number |
+|`memory` | string | Agent memory at this step |
+|`evaluationPreviousGoal` | string | Evaluation of previous goal completion |
+|`nextGoal` | string | Goal for the next step |
+|`url` | string | Current URL of the browser |
+|`screenshotUrl` | string | Optional screenshot URL |
+|`actions` | array | Stringified JSON actions performed |
+|`duration` | number | Step duration in seconds |
+| `liveUrl` | string | Embeddable live browser session URL \(active during execution\) |
+| `shareUrl` | string | Public shareable URL for the recorded session \(post-run\) |
+| `sessionId` | string | Browser Use session identifier |

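
Read as a type, the Output table above suggests roughly the following shape, with the un-indented rows (`number` through `duration`) being fields of each `steps` entry. This is a sketch inferred from the table only: the interface names, the optional markers and the `totalDuration` helper are assumptions, not part of the commit.

```typescript
// Shape inferred from the Output table; which fields are optional is an assumption.
interface BrowserUseStep {
  number: number
  memory: string
  evaluationPreviousGoal: string
  nextGoal: string
  url: string
  screenshotUrl?: string // documented as optional
  actions: string[] // each entry is a stringified JSON action
  duration: number // seconds
}

interface BrowserUseOutput {
  id: string
  success: boolean
  output: unknown // string or structured JSON
  steps: BrowserUseStep[]
  liveUrl?: string // active during execution
  shareUrl?: string // post-run
  sessionId?: string
}

// Hypothetical helper: total time spent across all steps, in seconds.
function totalDuration(result: BrowserUseOutput): number {
  return result.steps.reduce((sum, step) => sum + step.duration, 0)
}
```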

apps/docs/content/docs/en/tools/stagehand.mdx

Lines changed: 4 additions & 0 deletions
@@ -72,6 +72,8 @@ Run an autonomous web agent to complete tasks and extract structured data
 | `provider` | string | No | AI provider to use: openai or anthropic |
 | `apiKey` | string | Yes | API key for the selected provider |
 | `outputSchema` | json | No | Optional JSON schema defining the structure of data the agent should return |
+| `mode` | string | No | Agent tool mode: dom \(default\), hybrid, or cua |
+| `maxSteps` | number | No | Maximum agent steps \(default 20, max 200\) |

 #### Output

@@ -92,5 +94,7 @@ Run an autonomous web agent to complete tasks and extract structured data
 |`timestamp` | number | Unix timestamp when the action was performed |
 |`timeMs` | number | Time in milliseconds \(for wait actions\) |
 | `structuredOutput` | object | Extracted data matching the provided output schema |
+| `liveViewUrl` | string | Embeddable Browserbase live view URL \(active only while the session is running\) |
+| `sessionId` | string | Browserbase session identifier |


apps/sim/app/api/tools/stagehand/agent/route.ts

Lines changed: 43 additions & 5 deletions
@@ -22,6 +22,8 @@ const requestSchema = z.object({
   variables: z.any(),
   provider: z.enum(['openai', 'anthropic']).optional().default('openai'),
   apiKey: z.string(),
+  mode: z.enum(['dom', 'hybrid', 'cua']).optional().default('dom'),
+  maxSteps: z.number().int().min(1).max(200).optional().default(20),
 })

 /**
@@ -121,7 +123,7 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
   }

   const params = validationResult.data
-  const { task, startUrl: rawStartUrl, outputSchema, provider, apiKey } = params
+  const { task, startUrl: rawStartUrl, outputSchema, provider, apiKey, mode, maxSteps } = params
   const variablesObject = processVariables(params.variables)

   const startUrl = normalizeUrl(rawStartUrl)
@@ -165,8 +167,10 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
     return NextResponse.json({ error: 'Invalid Anthropic API key format' }, { status: 400 })
   }

-  const modelName =
-    provider === 'anthropic' ? 'anthropic/claude-sonnet-4-5-20250929' : 'openai/gpt-5'
+  const modelName = provider === 'anthropic' ? 'anthropic/claude-sonnet-4-6' : 'openai/gpt-5'
+
+  let sessionId: string | null = null
+  let liveViewUrl: string | null = null

   try {
     logger.info('Initializing Stagehand with Browserbase (v3)', { provider, modelName })
@@ -190,6 +194,35 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
     await stagehand.init()
     logger.info('Stagehand initialized successfully')

+    sessionId = stagehand.browserbaseSessionID ?? null
+    if (sessionId) {
+      try {
+        const debugResponse = await fetch(
+          `https://api.browserbase.com/v1/sessions/${sessionId}/debug`,
+          {
+            method: 'GET',
+            headers: {
+              'X-BB-API-Key': BROWSERBASE_API_KEY,
+            },
+          }
+        )
+        if (debugResponse.ok) {
+          const debugData = (await debugResponse.json()) as {
+            debuggerFullscreenUrl?: string
+            debuggerUrl?: string
+          }
+          liveViewUrl = debugData.debuggerFullscreenUrl ?? debugData.debuggerUrl ?? null
+          if (liveViewUrl) {
+            logger.info(`Browserbase live view URL: ${liveViewUrl}`)
+          }
+        } else {
+          logger.warn(`Failed to fetch Browserbase debug URL: ${debugResponse.statusText}`)
+        }
+      } catch (debugError) {
+        logger.warn('Error fetching Browserbase debug URL', { error: debugError })
+      }
+    }
+
     const page = stagehand.context.pages()[0]
     logger.info(`Navigating to ${startUrl}`)
     await page.goto(startUrl, { waitUntil: 'networkidle' })
@@ -223,13 +256,14 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
         apiKey: apiKey,
       },
       systemPrompt: agentInstructions,
+      mode,
     })

-    logger.info('Executing agent task', { task: taskWithVariables })
+    logger.info('Executing agent task', { task: taskWithVariables, mode, maxSteps })

     const agentExecutionResult = await agent.execute({
       instruction: taskWithVariables,
-      maxSteps: 20,
+      maxSteps,
     })

     const agentResult = {
@@ -293,6 +327,8 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
     return NextResponse.json({
       agentResult,
       structuredOutput,
+      liveViewUrl,
+      sessionId,
     })
   } catch (error) {
     logger.error('Stagehand agent execution error', {
@@ -327,6 +363,8 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
       {
         error: errorMessage,
         details: errorDetails,
+        liveViewUrl,
+        sessionId,
       },
       { status: 500 }
     )
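
The live view lookup added in this file is deliberately non-fatal, which is what allows both the success response and the 500 response to carry `liveViewUrl` and `sessionId`. The same behavior can be sketched as a standalone helper. The endpoint, the `X-BB-API-Key` header and the two response fields are taken from the hunk above; the names `fetchLiveViewUrl`, `FetchLike` and `fetchImpl` are hypothetical and exist only so the sketch can run without network access.

```typescript
// Minimal subset of a fetch response that this helper relies on.
interface DebugResponseLike {
  ok: boolean
  statusText: string
  json(): Promise<unknown>
}

type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<DebugResponseLike>

async function fetchLiveViewUrl(
  sessionId: string,
  apiKey: string,
  fetchImpl: FetchLike
): Promise<string | null> {
  try {
    const res = await fetchImpl(`https://api.browserbase.com/v1/sessions/${sessionId}/debug`, {
      method: 'GET',
      headers: { 'X-BB-API-Key': apiKey },
    })
    if (!res.ok) return null

    const data = (await res.json()) as {
      debuggerFullscreenUrl?: string
      debuggerUrl?: string
    }
    // Prefer the fullscreen debugger URL, then fall back to the plain one.
    return data.debuggerFullscreenUrl ?? data.debuggerUrl ?? null
  } catch {
    // The live view is a debugging aid; a failed lookup should not fail the run.
    return null
  }
}
```

Returning `null` on every failure path means a caller can assign the result unconditionally and keep going, as the route does with its own `liveViewUrl` variable.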

apps/sim/app/api/tools/stagehand/extract/route.ts

Lines changed: 3 additions & 9 deletions
@@ -17,8 +17,6 @@ const BROWSERBASE_PROJECT_ID = env.BROWSERBASE_PROJECT_ID
 const requestSchema = z.object({
   instruction: z.string(),
   schema: z.record(z.any()),
-  useTextExtract: z.boolean().optional().default(false),
-  selector: z.string().nullable().optional(),
   provider: z.enum(['openai', 'anthropic']).optional().default('openai'),
   apiKey: z.string(),
   url: z.string().url(),
@@ -51,7 +49,7 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
   }

   const params = validationResult.data
-  const { url: rawUrl, instruction, selector, provider, apiKey, schema } = params
+  const { url: rawUrl, instruction, provider, apiKey, schema } = params
   const url = normalizeUrl(rawUrl)
   const urlValidation = await validateUrlWithDNS(url, 'url')
   if (!urlValidation.isValid) {
@@ -101,8 +99,7 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
   }

   try {
-    const modelName =
-      provider === 'anthropic' ? 'anthropic/claude-sonnet-4-5-20250929' : 'openai/gpt-5'
+    const modelName = provider === 'anthropic' ? 'anthropic/claude-sonnet-4-6' : 'openai/gpt-5'

     logger.info('Initializing Stagehand with Browserbase (v3)', { provider, modelName })

@@ -162,14 +159,11 @@ export const POST = withRouteHandler(async (request: NextRequest) => {
     logger.info('Calling stagehand.extract with options', {
       hasInstruction: !!instruction,
       hasSchema: !!zodSchema,
-      hasSelector: !!selector,
     })

     let extractedData
     if (zodSchema) {
-      extractedData = await stagehand.extract(instruction, zodSchema, {
-        selector: selector || undefined,
-      })
+      extractedData = await stagehand.extract(instruction, zodSchema)
     } else {
       extractedData = await stagehand.extract(instruction)
     }

apps/sim/blocks/blocks/browser_use.ts

Lines changed: 129 additions & 11 deletions
@@ -23,6 +23,12 @@ export const BrowserUseBlock: BlockConfig<BrowserUseResponse> = {
       placeholder: 'Describe what the browser agent should do...',
       required: true,
     },
+    {
+      id: 'startUrl',
+      title: 'Start URL',
+      type: 'short-input',
+      placeholder: 'https://example.com (optional starting URL)',
+    },
     {
       id: 'variables',
       title: 'Variables (Secrets)',
@@ -51,22 +57,85 @@ export const BrowserUseBlock: BlockConfig<BrowserUseResponse> = {
         { label: 'Claude 3.7 Sonnet', id: 'claude-3-7-sonnet-20250219' },
         { label: 'Claude Sonnet 4', id: 'claude-sonnet-4-20250514' },
         { label: 'Claude Sonnet 4.5', id: 'claude-sonnet-4-5-20250929' },
+        { label: 'Claude Sonnet 4.6', id: 'claude-sonnet-4-6' },
         { label: 'Claude Opus 4.5', id: 'claude-opus-4-5-20251101' },
         { label: 'Llama 4 Maverick', id: 'llama-4-maverick-17b-128e-instruct' },
       ],
     },
-    {
-      id: 'save_browser_data',
-      title: 'Save Browser Data',
-      type: 'switch',
-      placeholder: 'Save browser data',
-    },
     {
       id: 'profile_id',
       title: 'Profile ID',
       type: 'short-input',
       placeholder: 'Enter browser profile ID (optional)',
     },
+    {
+      id: 'maxSteps',
+      title: 'Max Steps',
+      type: 'short-input',
+      placeholder: '100',
+      mode: 'advanced',
+    },
+    {
+      id: 'allowedDomains',
+      title: 'Allowed Domains',
+      type: 'short-input',
+      placeholder: 'example.com, docs.example.com',
+      mode: 'advanced',
+    },
+    {
+      id: 'vision',
+      title: 'Vision',
+      type: 'dropdown',
+      options: [
+        { label: 'Auto (default)', id: 'auto' },
+        { label: 'Enabled', id: 'true' },
+        { label: 'Disabled', id: 'false' },
+      ],
+      mode: 'advanced',
+    },
+    {
+      id: 'flashMode',
+      title: 'Flash Mode',
+      type: 'switch',
+      placeholder: 'Faster but less careful navigation',
+      mode: 'advanced',
+    },
+    {
+      id: 'thinking',
+      title: 'Thinking',
+      type: 'switch',
+      placeholder: 'Enable extended reasoning',
+      mode: 'advanced',
+    },
+    {
+      id: 'highlightElements',
+      title: 'Highlight Elements',
+      type: 'switch',
+      placeholder: 'Visually mark interactive elements',
+      mode: 'advanced',
+    },
+    {
+      id: 'systemPromptExtension',
+      title: 'System Prompt Extension',
+      type: 'long-input',
+      placeholder: 'Append custom instructions to the agent system prompt (max 2000 chars)',
+      mode: 'advanced',
+    },
+    {
+      id: 'structuredOutput',
+      title: 'Structured Output Schema',
+      type: 'code',
+      language: 'json',
+      placeholder: 'Stringified JSON schema for structured output',
+      mode: 'advanced',
+    },
+    {
+      id: 'metadata',
+      title: 'Metadata',
+      type: 'table',
+      columns: ['Key', 'Value'],
+      mode: 'advanced',
+    },
     {
       id: 'apiKey',
       title: 'API Key',
@@ -78,19 +147,68 @@ export const BrowserUseBlock: BlockConfig<BrowserUseResponse> = {
   ],
   tools: {
     access: ['browser_use_run_task'],
+    config: {
+      tool: () => 'browser_use_run_task',
+      params: (params) => {
+        const next: Record<string, any> = { ...params }
+        if (typeof next.maxSteps === 'string') {
+          const trimmed = next.maxSteps.trim()
+          if (trimmed === '') {
+            next.maxSteps = undefined
+          } else {
+            const n = Number(trimmed)
+            next.maxSteps = Number.isFinite(n) ? n : undefined
+          }
+        }
+        if (next.vision === 'true') next.vision = true
+        else if (next.vision === 'false') next.vision = false
+        if (next.metadata && Array.isArray(next.metadata)) {
+          const obj: Record<string, string> = {}
+          for (const row of next.metadata as Array<Record<string, any>>) {
+            const key = row?.cells?.Key ?? row?.Key
+            const value = row?.cells?.Value ?? row?.Value
+            if (key) obj[key] = String(value ?? '')
+          }
+          next.metadata = obj
+        }
+        return next
+      },
+    },
   },
   inputs: {
     task: { type: 'string', description: 'Browser automation task' },
+    startUrl: { type: 'string', description: 'Starting URL for the agent' },
     apiKey: { type: 'string', description: 'BrowserUse API key' },
-    variables: { type: 'json', description: 'Task variables' },
-    model: { type: 'string', description: 'AI model to use' },
-    save_browser_data: { type: 'boolean', description: 'Save browser data' },
+    variables: { type: 'json', description: 'Secrets to inject into the task' },
+    model: { type: 'string', description: 'LLM model to use' },
     profile_id: { type: 'string', description: 'Browser profile ID for persistent sessions' },
+    maxSteps: { type: 'number', description: 'Maximum agent steps' },
+    allowedDomains: { type: 'string', description: 'Comma-separated allowed domains' },
+    vision: { type: 'string', description: 'Vision capability (auto / true / false)' },
+    flashMode: { type: 'boolean', description: 'Enable flash mode' },
+    thinking: { type: 'boolean', description: 'Enable extended reasoning' },
+    highlightElements: { type: 'boolean', description: 'Highlight interactive elements' },
+    systemPromptExtension: { type: 'string', description: 'Custom system prompt extension' },
+    structuredOutput: { type: 'string', description: 'Stringified JSON schema' },
+    metadata: { type: 'json', description: 'Custom key-value metadata' },
   },
   outputs: {
     id: { type: 'string', description: 'Task execution identifier' },
     success: { type: 'boolean', description: 'Task completion status' },
-    output: { type: 'json', description: 'Task output data' },
-    steps: { type: 'json', description: 'Execution steps taken' },
+    output: { type: 'json', description: 'Final task output (string or structured)' },
+    steps: {
+      type: 'json',
+      description:
+        'Steps the agent executed (number, memory, evaluationPreviousGoal, nextGoal, url, screenshotUrl, actions, duration)',
+    },
+    liveUrl: {
+      type: 'string',
+      description: 'Embeddable live browser session URL (active during execution)',
+    },
+    shareUrl: {
+      type: 'string',
+      description: 'Public shareable URL for the session (post-run)',
+    },
+    sessionId: { type: 'string', description: 'Browser Use session identifier' },
   },
 }
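
To see what the `params` mapper in the hunk above does with typical block input, the same logic can be lifted into a standalone function. This is a sketch rather than the committed code: `normalizeBrowserUseParams` and the `Row` type are hypothetical names, and the two row shapes (`row.cells.Key` and `row.Key`) are simply the ones the hunk itself checks for.

```typescript
type Row = { cells?: Record<string, unknown> } & Record<string, unknown>

function normalizeBrowserUseParams(params: Record<string, unknown>): Record<string, unknown> {
  const next: Record<string, unknown> = { ...params }

  // A cleared or non-numeric Max Steps field becomes undefined, so JSON serialization omits it.
  const rawMaxSteps = next.maxSteps
  if (typeof rawMaxSteps === 'string') {
    const trimmed = rawMaxSteps.trim()
    const n = Number(trimmed)
    next.maxSteps = trimmed !== '' && Number.isFinite(n) ? n : undefined
  }

  // The dropdown yields strings; 'auto' is passed through unchanged.
  if (next.vision === 'true') next.vision = true
  else if (next.vision === 'false') next.vision = false

  // Table rows become a flat string map; rows without a key are skipped.
  if (Array.isArray(next.metadata)) {
    const obj: Record<string, string> = {}
    for (const row of next.metadata as Row[]) {
      const key = row?.cells?.Key ?? row?.Key
      const value = row?.cells?.Value ?? row?.Value
      if (key) obj[String(key)] = String(value ?? '')
    }
    next.metadata = obj
  }
  return next
}

const result = normalizeBrowserUseParams({
  maxSteps: '  ',
  vision: 'false',
  metadata: [{ cells: { Key: 'run', Value: 42 } }, { Key: '', Value: 'ignored' }],
})
// result.maxSteps is undefined, result.vision is false, result.metadata is { run: '42' }
```

Note that the mapper as shown always produces a `metadata` object once a table is present, even an empty one; the "skip metadata when empty" fix from the commit message is presumably applied elsewhere in the commit, outside the files shown here.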
