
References

Research papers and projects that influenced Lumen's design and implementation.

Projects

| Project | Impact on Lumen |
|---|---|
| Stagehand (github.com/browserbase/stagehand) | CUA mode reference: Playwright-based browser agent with an observe/act/extract API. Its ActCache with DOM fingerprinting inspired Lumen's self-healing deterministic replay. |
| browser-use (github.com/browser-use/browser-use) | Python browser agent: vision + DOM hybrid, multi-tab support, agent chain architecture |
| Claude Code (claude.com/claude-code) | Agentic loop design: tool-use pattern, streaming actions, context compaction strategy |
| Ralph Loop (oh-my-claudecode) | Self-referential execution loop: iterate until verified complete, with an architect verification gate |
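The ActCache idea credited to Stagehand can be sketched as a cache keyed by instruction whose entries are only replayed while a DOM fingerprint still matches. This is a minimal, hypothetical sketch, not Stagehand's or Lumen's API: `Action`, `fingerprint_dom`, `ActionCache`, and the planner callback are illustrative names, and fingerprinting by hashing sorted element descriptors is an assumption (real implementations may hash different DOM features).

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    kind: str        # hypothetical action type, e.g. "click" or "type"
    selector: str    # element the action targets
    text: str = ""   # payload for "type" actions


def fingerprint_dom(elements: List[str]) -> str:
    """Hash a normalized, order-insensitive list of interactive-element descriptors."""
    canonical = "\n".join(sorted(e.strip().lower() for e in elements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class ActionCache:
    """Replays a cached action only while the page fingerprint still matches."""

    def __init__(self) -> None:
        # instruction -> (fingerprint the action was recorded against, action)
        self._entries: Dict[str, Tuple[str, Action]] = {}
        self.hits = 0
        self.misses = 0

    def resolve(
        self,
        instruction: str,
        elements: List[str],
        plan: Callable[[str, List[str]], Action],
    ) -> Action:
        fp = fingerprint_dom(elements)
        cached = self._entries.get(instruction)
        if cached is not None and cached[0] == fp:
            self.hits += 1
            return cached[1]  # deterministic replay: no model call
        # Cache miss or stale fingerprint: re-plan and overwrite the entry ("self-heal").
        self.misses += 1
        action = plan(instruction, elements)
        self._entries[instruction] = (fp, action)
        return action


# Usage with a stand-in planner that records how often the "model" is called.
calls: List[str] = []


def fake_planner(instruction: str, elements: List[str]) -> Action:
    calls.append(instruction)
    return Action("click", "button#submit")


cache = ActionCache()
page = ["button#submit Submit", "input#email"]
a1 = cache.resolve("submit the form", page, fake_planner)        # miss: planner called
a2 = cache.resolve("submit the form", page[::-1], fake_planner)  # same DOM, reordered: replayed
a3 = cache.resolve("submit the form", page + ["div#banner Accept"], fake_planner)  # changed DOM: re-planned
print(cache.hits, cache.misses, len(calls))  # 1 2 2
```

Keying replay on a fingerprint rather than on the instruction alone is what makes the cache safe to reuse: an unchanged page is replayed without a model call, while a changed page falls back to planning and refreshes the stored entry.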

Papers

| Paper | Impact on Lumen |
|---|---|
| Surfer 2: WebVoyager SOTA (97.1%) | StateStore + Verifier + plannerModel: persistent context, validator-based completion gate, orchestrator planning |
| Magnitude: WebVoyager (93.9%) | ActionCache + prompt caching + tier-1 screenshot compression: deterministic caching, observation masking, latency reduction |
| CATTS: Confidence-Aware Test-Time Scaling (2026) | ConfidenceGate: draw multiple samples on hard steps, skip extra compute on easy ones |
| BacktrackAgent: Error Detection + Backtracking (EMNLP 2025) | ActionVerifier: heuristic post-action checks (click target, input focus, goto host) |
| Tree Search with Browser Snapshots (ICLR 2025, CMU) | CheckpointManager: save CDP state every N steps, backtrack once stuck detection reaches level 8+ |
| ColorBrowserAgent: Adaptive Knowledge Base (2026) | SiteKB: domain-specific navigation rules injected into prompts |
| Agent Workflow Memory (ICML 2025), arXiv 2409.07429 | WorkflowMemory: reusable multi-step routines extracted from successful runs |
| AgentFold: Proactive Context Folding (Alibaba 2025), arXiv 2510.24699 | fold action: agent-controlled context compression for completed sub-tasks |
| OpenCUA: Three-Level Reasoning (COLM 2025), arXiv 2508.09123 | Structured reasoning prompts: THINK FIRST, CHECKPOINT PROGRESS every 3-5 steps |
| TTI: Test-Time Interaction Scaling (NeurIPS 2025) | Action-biased prompts: "ACT DECISIVELY", favor exploration over long reasoning chains |
| Reflexion: Verbal Self-Reflection (NeurIPS 2023), arXiv 2303.11366 | Retry with judge feedback: structured reflection injected on retry attempts |
| FormFactory: Form-Filling Benchmark (2025) | Form-specific prompt rules: fill one field at a time, verify after each, use URL params as a fallback |
| Agent Q: Best-of-N Sampling (ICLR 2025), arXiv 2408.07199 | Informed the confidence gate design: scoring vs. agreement-voting tradeoffs |
| SeeAct: Hybrid Vision+DOM Grounding (ICML 2024), arXiv 2401.01614 | Validated the vision-first design: identified pure-vision grounding as the main bottleneck |
| BrowserAgent: Human-Inspired Browsing (TMLR 2025) | writeState persistent memory: explicit cross-page information retention |
| DigiRL: VLM-Based Progress Evaluation (NeurIPS 2024), arXiv 2406.11896 | Informed the RepeatDetector design: progress evaluation beyond pattern matching |
| WAC: World-Model-Augmented Action Correction (2026) | Informed the ModelVerifier: predict the expected outcome, then compare it with the actual one |
| Agent-E: Hierarchical Planner-Executor (2024), arXiv 2407.13032 | delegate action: hand off sub-tasks to a child loop |
| Illusion of Progress (2025) | Eval methodology: test across diverse sites rather than relying on benchmark-specific tuning |
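The ConfidenceGate row (CATTS) and the Agent Q voting tradeoff can be illustrated with a small agreement-voting gate. This is a hypothetical sketch, not Lumen's implementation or the CATTS algorithm itself: it assumes the policy returns an action together with a self-reported confidence in [0, 1], and the names `confidence_gate`, `threshold`, and `extra_samples` are illustrative.

```python
from collections import Counter
from typing import Callable, Tuple

# A proposal is (action, self-reported confidence in [0, 1]); this interface is an assumption.
Proposal = Tuple[str, float]


def confidence_gate(
    sample: Callable[[], Proposal],
    threshold: float = 0.8,
    extra_samples: int = 4,
) -> Tuple[str, int]:
    """Return (chosen action, number of model calls spent)."""
    action, confidence = sample()
    if confidence >= threshold:
        return action, 1  # easy step: accept the first sample, no extra compute
    # Hard step: draw more samples and take the most common action (agreement voting).
    votes = Counter([action])
    for _ in range(extra_samples):
        extra_action, _ = sample()
        votes[extra_action] += 1
    chosen, _count = votes.most_common(1)[0]
    return chosen, 1 + extra_samples


# Usage with scripted "model" outputs instead of a real LLM.
easy = iter([("click #search", 0.95)])
easy_result = confidence_gate(lambda: next(easy))
print(easy_result)  # ('click #search', 1)

hard = iter([
    ("click #next", 0.4),
    ("click #back", 0.5),
    ("click #next", 0.6),
    ("click #next", 0.3),
    ("scroll down", 0.5),
])
hard_result = confidence_gate(lambda: next(hard))
print(hard_result)  # ('click #next', 5)
```

Agreement voting is used here because it needs no separate scorer; a scoring-based variant would instead rank the extra samples with a verifier, which is the scoring vs. agreement-voting tradeoff noted for Agent Q.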