References
Research papers and projects that influenced Lumen's design and implementation.
Projects
| Project | Impact on Lumen |
|---|---|
| Stagehand — github.com/browserbase/stagehand | CUA mode reference — Playwright-based browser agent with observe/act/extract API. ActCache with DOM fingerprinting inspired Lumen's self-healing deterministic replay |
| browser-use — github.com/browser-use/browser-use | Python browser agent — vision + DOM hybrid, multi-tab support, agent chain architecture |
| Claude Code — claude.com/claude-code | Agentic loop design — tool-use pattern, streaming actions, context compaction strategy |
| Ralph Loop — oh-my-claudecode | Self-referential execution loop — iterate until verified complete, with architect verification gate |
Papers
| Paper | Impact on Lumen |
|---|---|
| Surfer 2 — WebVoyager SOTA (97.1%) | StateStore + Verifier + plannerModel — persistent context, validator-based completion gate, orchestrator planning |
| Magnitude — WebVoyager (93.9%) | ActionCache + prompt caching + tier-1 screenshot compression — deterministic caching, observation masking, latency reduction |
| CATTS — Confidence-Aware Test-Time Scaling (2026) | ConfidenceGate — multi-sample on hard steps, skip extra compute on easy ones |
| BacktrackAgent — Error Detection + Backtracking (EMNLP 2025) | ActionVerifier — heuristic post-action checks (click target, input focus, goto host) |
| Tree Search with Browser Snapshots (ICLR 2025, CMU) | CheckpointManager — save CDP state every N steps, backtrack once the stuck level reaches 8 or higher |
| ColorBrowserAgent — Adaptive Knowledge Base (2026) | SiteKB — domain-specific navigation rules injected into prompts |
| Agent Workflow Memory (ICML 2025) arXiv 2409.07429 | WorkflowMemory — reusable multi-step routines from successful runs |
| AgentFold — Proactive Context Folding (Alibaba 2025) arXiv 2510.24699 | fold action — agent-controlled context compression for completed sub-tasks |
| OpenCUA — Three-Level Reasoning (COLM 2025) arXiv 2508.09123 | Structured reasoning prompts — THINK FIRST, CHECKPOINT PROGRESS every 3-5 steps |
| TTI — Test-Time Interaction Scaling (NeurIPS 2025) | Action-biased prompts — "ACT DECISIVELY", favor exploration over long reasoning chains |
| Reflexion — Verbal Self-Reflection (NeurIPS 2023) arXiv 2303.11366 | Retry with judge feedback — structured reflection injected on retry attempts |
| FormFactory — Form-Filling Benchmark (2025) | Form-specific prompt rules — fill one field at a time, verify after each, use URL params as fallback |
| Agent Q — Best-of-N Sampling (ICLR 2025) arXiv 2408.07199 | Informed confidence gate design — scoring vs agreement voting tradeoffs |
| SeeAct — Hybrid Vision+DOM Grounding (ICML 2024) arXiv 2401.01614 | Validated vision-first design — pure vision grounding identified as main bottleneck |
| BrowserAgent — Human-Inspired Browsing (TMLR 2025) | writeState persistent memory — explicit cross-page information retention |
| DigiRL — VLM-Based Progress Evaluation (NeurIPS 2024) arXiv 2406.11896 | Informed RepeatDetector design — progress evaluation beyond pattern matching |
| WAC — World-Model-Augmented Action Correction (2026) | Informed ModelVerifier — predict expected outcome, compare with actual |
| Agent-E — Hierarchical Planner-Executor (2024) arXiv 2407.13032 | delegate action — hand off sub-tasks to a child loop |
| Illusion of Progress (2025) | Eval methodology — test across diverse sites, not just benchmark-specific tuning |
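
To make the ConfidenceGate row (CATTS, together with the Agent Q note on agreement voting) more concrete, here is a minimal sketch of the idea: take one sample on easy steps, and spend extra samples with a majority vote only on hard ones. This is not Lumen's implementation. The names `confidence_gate`, `GateResult`, and `propose`, as well as the `threshold` and `extra_samples` defaults, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class GateResult:
    action: str        # the action the gate settled on
    samples_used: int  # how many model samples were spent on this step
    escalated: bool    # True if the step was treated as "hard"


def confidence_gate(
    propose: Callable[[], Tuple[str, float]],
    threshold: float = 0.8,   # assumed cutoff, not taken from the paper
    extra_samples: int = 4,   # assumed budget for hard steps
) -> GateResult:
    """Accept a confident first sample; otherwise resample and vote.

    `propose` is a hypothetical callable that returns one candidate
    action plus a confidence score in [0, 1].
    """
    first_action, confidence = propose()

    # Easy step: the first sample is confident enough, skip extra compute.
    if confidence >= threshold:
        return GateResult(first_action, 1, False)

    # Hard step: draw more samples and pick the most frequent action
    # (agreement voting; confidence scores of later samples are ignored).
    votes = Counter([first_action])
    for _ in range(extra_samples):
        action, _ = propose()
        votes[action] += 1

    winner, _ = votes.most_common(1)[0]
    return GateResult(winner, 1 + extra_samples, True)
```

In this sketch the hard path uses agreement voting rather than a learned scorer, which keeps it cheap but means ties fall back to whichever action was seen first. A scoring-based variant would replace the `Counter` with a judge call per candidate.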