Vibe Coding Weekly #32

Anthropic held a developer conference, doubled your rate limits, and signed SpaceX to power it. The rest of the week kept up.

May 11, 2026

This week in one satisfying refactor:

The Agent Story: Anthropic shipped three new capabilities for Claude Managed Agents — Dreaming, Outcomes, and Multiagent Orchestration — turning a single-agent tool into a self-improving, parallelizable fleet. Netflix is already running orchestration in production.
The Tool Story: OpenAI’s Codex moved into the browser with a Chrome extension that can test web apps, access DevTools, and use authenticated sessions for Gmail and internal tools — bringing Codex to where most real work actually lives.
The Insight: GitHub’s engineering team published the data on how they cut agent token costs by 19–62% across five production workflows. The key finding was structural, not algorithmic: most agent turns are deterministic data-gathering steps that never needed an LLM in the first place.

If you only read one thing this week: On May 6, Anthropic held Code with Claude 2026 in San Francisco — its first developer conference — and rather than announcing new models, made its existing products dramatically more capable. Claude Code’s five-hour rate limits were doubled for all paid plans. Anthropic signed an agreement to use all compute capacity at SpaceX’s Colossus 1 data center: 300+ megawatts and 220,000+ NVIDIA GPUs available within a month. API volume is up 17x year-on-year. The conference framed the product direction clearly: Claude Code is no longer a terminal tool — it’s a platform, with CLI, IDE, desktop app, Remote Agents, Code Review, and now Routines as surfaces. Read more →

Growing at 20% new subscribers per week.

The stories this week aren’t hard to find. What’s hard is knowing which ones actually matter before your team asks you on Monday.

That’s the only thing Vibe Coding Weekly does: cut through the volume so you arrive at the week with context, not anxiety.

Subscribers also get Change Management in Agentic AI Adoption — the framework for the conversation that always comes after “we should use AI more”: how to actually move an organization that didn’t ask to be moved. Included with every subscription.

Key Takeaways

Claude Managed Agents can now self-improve between sessions: The Dreaming feature (research preview) reviews past agent sessions to extract recurring patterns, mistakes, and shared preferences across a team — updating memory stores automatically or with developer review. Outcomes adds rubric-based evaluation with up to 10% higher task success rates on the hardest tasks. Multiagent Orchestration lets a lead agent delegate subtasks to parallel specialists with their own models and tools, running on shared storage. Read more →
Cursor 3.3 closes the loop between coding and reviewing: The May 7 release ships a full PR review experience inside the IDE — Reviews, Commits, and Changes tabs — alongside “Build in Parallel,” which identifies independent parts of a plan and runs them simultaneously using async subagents. Teams can now go from initial plan to parallel execution to PR review without leaving Cursor. Read more →
The cheapest token is the one you never send: GitHub’s May 7 analysis of five production agentic workflows found 19–62% token reductions came not from better prompting, but from removing LLM calls entirely for steps that didn’t need reasoning. Pruning unused MCP tools saved 8–12 KB of schema context per call; replacing GitHub MCP calls with direct CLI commands eliminated whole agent turns. Read more →
GPT-5.5 Instant cuts hallucinations by 52.5% and replaces GPT-5.3 as ChatGPT’s default: OpenAI’s May 5 release produces fewer hallucinations on high-stakes topics (law, medicine, finance), uses 30% fewer words, and scores 81.2 vs. 65.4 on the AIME 2025 math benchmark. Developers get it as chat-latest in the API, with GPT-5.3 remaining available for three months on paid plans. Read more →
Cursor Enterprise gets model-level spend controls before the GitHub Copilot billing shift: Four weeks before GitHub’s June 1 move to token-based billing, Cursor’s May 4 update gives enterprise admins granular model and provider blocklists, soft spending limits with 50/80/100% alerts, and per-surface usage breakdowns covering Cloud Agents, Automations, and Security Review. The timing is not coincidental — teams that didn’t track AI spend last quarter are now being told they need to. Read more →
Security review is now table stakes for AI coding tools: Windsurf made Devin Review and Quick Review (10x faster bug detection via SWE-check) available to all subscribers; Snyk integrated Claude models into its AI Security Platform; and Opsera embedded DevSecOps agents — Architecture Analyzer, Security/SQL Scanner, Compliance Auditor — directly into Cursor. Three independent moves in the same week signal that security review is converging toward a default layer of the agentic coding stack, not an optional add-on. Read more →
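To make the tool-pruning point from the GitHub item above concrete, here is a minimal Python sketch of the idea: measure how many bytes of tool schema you ship on every call, then keep only the tools a workflow actually uses. The tool shape, the make_tool helper, and the tool names are all hypothetical placeholders, not GitHub's code or any MCP server's real catalog, and the byte counts it prints say nothing about the 8–12 KB figure reported in the analysis.

```python
import json


def make_tool(name: str, n_params: int) -> dict:
    """Build a placeholder tool definition (hypothetical shape, for illustration)."""
    return {
        "name": name,
        "description": f"Hypothetical tool {name}",
        "input_schema": {
            "type": "object",
            "properties": {
                f"param_{i}": {"type": "string", "description": "placeholder"}
                for i in range(n_params)
            },
        },
    }


def schema_bytes(tools: list[dict]) -> int:
    """Bytes the tool schemas occupy once serialized as compact JSON."""
    return len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))


def prune_tools(tools: list[dict], used_names: set[str]) -> list[dict]:
    """Keep only the tools this workflow is known to call."""
    return [tool for tool in tools if tool["name"] in used_names]


# Assumed setup: a catalog of 20 tools, of which this workflow calls two.
ALL_TOOLS = [make_tool(f"tool_{i}", n_params=5) for i in range(20)]
USED_NAMES = {"tool_0", "tool_3"}

pruned = prune_tools(ALL_TOOLS, USED_NAMES)
saved = schema_bytes(ALL_TOOLS) - schema_bytes(pruned)
print(f"kept {len(pruned)} of {len(ALL_TOOLS)} tools, saved {saved} bytes per call")
```

In practice the set of used tool names would come from your own agent logs rather than a hard-coded set. The saving repeats on every call that would otherwise carry the full catalog.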

📦 Releases & News

Gemini CLI v0.41.0 — Real-Time Voice Mode and Gemma 4 Support

The May 5 stable release promotes real-time voice mode, with cloud and local backends, out of preview. It is the most significant UI expansion since the CLI launched. Workspace trust enforcement now secures .env loading in headless mode, and shell command validation is strengthened with a core tools allowlist. Gemma 4 models are now supported via the Gemini API, and the v0.42.0 preview release enables them by default.

Windsurf 2.2.17 — Devin Review and Quick Review for All Subscribers

Devin Review — in-editor code review with bug detection and full codebase context — and Quick Review — 10x faster bug detection powered by SWE-check — are now available to all Windsurf subscribers without a separate Cognition account. The release also improves the agent inbox with a list view, better session sidebar sorting, and fixes MCP server reliability and Devin Local agent stability.

Codex CLI 0.130.0 — Remote Control, Plugin Sharing, Bedrock Auth

The May 8 release adds codex remote-control as a simpler entrypoint for starting a headless, remotely controllable app-server — useful for CI/CD pipelines and background agent runners. Plugin details now expose bundled hooks with sharing and discoverability controls. Bedrock auth gains support for AWS console-login credentials, and view_image resolves files through the selected environment for multi-environment sessions.

OpenCode v1.14.42 — Scout Agent for Repo Research

The May 9 release introduces the Scout agent (a built-in tool for repository research, documentation lookup, and dependency-source inspection), giving agents a structured way to gather codebase context before making changes. The release also adds HTTP API response compression for large non-streaming responses and an interactive split-footer mode for opencode run.

Claude Code v2.1.136 — MCP Stability, Auto Mode Rules, WSL2 Image Paste

The May 8 release brings 50+ fixes aimed at production deployments. Highlights: settings.autoMode.hard_deny for classifier-based rules in auto mode; a fix for MCP servers disappearing after /clear; a fix for OAuth refresh token race conditions under concurrent server refreshes; and WSL2 image paste from the Windows clipboard via a PowerShell fallback. The week’s release cadence (8 releases from May 4–9) reflects the scale of active enterprise deployment across diverse environments.

📚 Tutorials and Resources

Simon Willison: Live Blog of Code with Claude 2026

Simon Willison’s real-time notes from Anthropic’s May 6 San Francisco developer conference capture every announcement as it happened — including the live coding demo with Boris Cherny and Jarred Sumner. The best single-source summary of what Claude Code’s new surfaces (Code Review, Remote Agents, Routines, desktop app) look like in practice, and what “Opus 4.7 has a real taste for visual design” actually means when demonstrated live.

💡 Others

GitHub Copilot in Visual Studio Code — April Releases: Semantic Search, Inline Diffs, and Bring Your Own Keys

The May 6 changelog covering VS Code releases v1.116–v1.119 introduces semantic indexing across all workspaces, so semantic code search and grep-style queries across GitHub repos now work without per-project setup. Code changes from agents now appear as inline diffs in the chat thread, keeping review in context. Teams on Copilot Business and Enterprise can connect their own API keys from Anthropic, OpenAI, or Google, a notable shift that decouples Copilot from GitHub’s model choices for teams with existing provider commitments.

Lenny’s Newsletter: Code with Claude 2026 — The 5 Biggest Updates Explained

Claire Vo’s walkthrough of the five most significant announcements from Anthropic’s May 6 developer conference provides the clearest explanation of what Routines, Outcomes, and Dreaming mean for builders shipping AI products today. The episode reframes Routines as “higher-order prompts” — infrastructure for async developer workflows, not just scheduled scripts — and explains how Outcomes shifts agent evaluation from “did it run?” to “did it succeed against a rubric I defined?” Essential context for anyone building on top of Claude Managed Agents.
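The shift from “did it run?” to “did it succeed against a rubric I defined?” is easy to picture in code. Below is a minimal Python sketch of rubric-based scoring, assuming a rubric is a list of weighted pass/fail checks over the agent's text output. It is not the Outcomes API: Criterion, score, succeeded, the example checks, and the 0.8 threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """One rubric line: a named pass/fail check with a weight (hypothetical)."""
    name: str
    check: Callable[[str], bool]
    weight: float = 1.0


def score(output: str, rubric: list[Criterion]) -> float:
    """Weighted fraction of rubric criteria that the output satisfies."""
    total = sum(c.weight for c in rubric)
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


def succeeded(output: str, rubric: list[Criterion], threshold: float = 0.8) -> bool:
    """Success means meeting the rubric threshold, not merely finishing."""
    return score(output, rubric) >= threshold


# Assumed example rubric for a "summarize your code change" task.
RUBRIC = [
    Criterion("has_summary", lambda out: "Summary:" in out, weight=2.0),
    Criterion("no_todo", lambda out: "TODO" not in out),
    Criterion("cites_file", lambda out: ".py" in out),
]

good = "Summary: fixed the parser in parse.py"
partial = "Summary: done"

print(score(good, RUBRIC), succeeded(good, RUBRIC))
print(score(partial, RUBRIC), succeeded(partial, RUBRIC))
```

Both example runs “ran” without error, but only the first clears the threshold. String checks are the simplest possible criteria; a real rubric would more likely use tests, linters, or a model-graded check in their place.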

Next week, the stack keeps moving. So does this newsletter. Fall behind one week, and you’ll spend the next three catching up.

Every week, a new model drops. A new agent framework ships. A new “this changes everything” thread goes viral. And you still have actual code to write.

Every Monday, you open your inbox and already know what matters. You know that Anthropic just doubled your Claude Code rate limits and signed up all the compute capacity at SpaceX’s Colossus 1 data center to back it up, and what that means for your team’s roadmap. You know that the cheapest token is the one you never send, and exactly which structural changes GitHub used to cut agent token costs by up to 62%. You didn’t spend your weekend reading to know this. I did.

That’s what Vibe Coding Weekly is. For developers, architects, tech leads, and everyone building or managing software in the age of AI.

Clean code and positive vibes,
Angel.
