Vibe Coding Weekly #33

The agent left the IDE. Your metrics didn't follow.

May 25, 2026

This week in one satisfying refactor:

The Big Story: Google I/O dropped Antigravity 2.0, Gemini 3.5 Flash, and Managed Agents — the most comprehensive agentic dev platform announcement since GitHub Copilot went GA.
The Benchmark: SWE-bench Verified is now flagged as contaminated, and a new ranking study shows that scaffolding quality matters as much as the model underneath.
The Workflow Shift: Cursor embeds inside Jira, OpenAI puts Codex on your phone, and GitHub adds one-click CI fixes. The coding agent is no longer a separate tab; it lives inside every tool you already have open.

If you only read one thing this week: The Shai-Hulud/Megalodon supply chain attack is the most technically sophisticated assault on AI developer tooling to date. Wave 1 generated cryptographically valid SLSA Build Level 3 provenance attestations by hijacking legitimate build pipelines — meaning the signed artifacts were technically legitimate outputs from compromised pipelines. Wave 2 then deployed 5,718 malicious commits to 5,561 GitHub repositories in six hours. On May 12, the attackers open-sourced the worm itself, converting a targeted weapon into commodity attack infrastructure available to anyone. Read more →
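Why a valid signature was not enough can be shown with a toy model. The sketch below is an illustration under loud assumptions, not the attackers' method or the SLSA tooling: an HMAC stands in for the real signing scheme, and the key, field names, and commit IDs are all hypothetical. It shows that a signature check only proves who built the artifact, so a separate policy check on what was built is still needed.

```python
import hashlib
import hmac
import json

# Hypothetical key held by the build pipeline (stand-in for a real signing identity).
PIPELINE_KEY = b"demo-pipeline-key"


def sign_attestation(statement: dict) -> str:
    """Sign a provenance statement the way a legitimate pipeline would."""
    payload = json.dumps(statement, sort_keys=True).encode()
    return hmac.new(PIPELINE_KEY, payload, hashlib.sha256).hexdigest()


def signature_is_valid(statement: dict, signature: str) -> bool:
    """Answers only: did the trusted pipeline produce this statement?"""
    return hmac.compare_digest(sign_attestation(statement), signature)


def passes_source_policy(statement: dict, reviewed_commits: set[str]) -> bool:
    """Answers the other question: was the input to the pipeline reviewed?"""
    return statement["source_commit"] in reviewed_commits


# Hypothetical data: one reviewed commit, one attacker-injected commit.
reviewed_commits = {"a1b2c3"}
hijacked_build = {
    "artifact_sha256": hashlib.sha256(b"malicious payload").hexdigest(),
    "builder": "trusted-ci",
    "source_commit": "evil99",
}
signature = sign_attestation(hijacked_build)

print(signature_is_valid(hijacked_build, signature))           # True
print(passes_source_policy(hijacked_build, reviewed_commits))  # False
```

The hijacked build passes the first check and fails the second, which is the gap described above: provenance confirms the pipeline, not the intent of whoever fed it.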

The stories this week aren’t hard to find. What’s hard is knowing which ones actually matter before your team asks you on Monday.

That’s the only thing Vibe Coding Weekly does: cut through the volume so you arrive at the week with context, not anxiety.

Subscribers also get Change Management in Agentic AI Adoption — the framework for the conversation that always comes after “we should use AI more”: how to actually move an organization that didn’t ask to be moved. Included with every subscription.

Key Takeaways

The coding agent is leaving the IDE: Cursor is now inside Jira. Codex is now on your phone. GitHub Copilot fixes your failing CI jobs without you switching tabs. The agent isn’t a tool you open — it’s becoming part of every surface you already use. Read more →
Cursor is becoming a model lab: Composer 2.5 comes within 0.7 points of Claude Opus 4.7 on SWE-Bench Multilingual (79.8% vs. 80.5%) at one-tenth the token cost, and it ships exclusively inside Cursor. The IDE that started as a thin wrapper around OpenAI is now training and shipping its own frontier-grade models. Read more →
89% of engineering leaders say AI improved productivity — but 94% admit their metrics miss the actual costs: The Harness State of Engineering Excellence 2026 report finds 31% of developer time now goes to “invisible work” (reviewing AI code, fixing subtle AI bugs) that legacy productivity metrics don’t track. Read more →
SWE-bench Verified is no longer a reliable benchmark: OpenAI found that 59.4% of the benchmark's tasks had flawed or unsolvable test cases, with contamination across all major frontier models. More importantly, the same model scored up to 17 points apart on identical benchmarks depending on the agent framework, so scaffolding quality now matters as much as the model. Read more →
Google’s I/O was the biggest agentic developer platform drop of the year: Antigravity 2.0 (desktop, CLI, SDK), Gemini 3.5 Flash (4x faster than Gemini 3.1 Pro at half the cost), and Managed Agents (one API call = full isolated Linux sandbox agent) all shipped the same week. Gemini CLI gets sunset for free users on June 18 — replaced by Antigravity CLI. Read more →
The MCP spec is getting its biggest overhaul since launch: The release candidate locked May 21 removes all session state, making MCP servers horizontally scalable behind plain load balancers. MCP Apps and a Tasks extension add UI rendering and long-running work as opt-in capabilities — the protocol is growing up for production. Read more →
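To make the MCP point concrete, here is a minimal sketch of why removing session state lets a plain load balancer work. It is not the MCP wire format or any SDK's API: the Request shape, the context field, and the Replica class are hypothetical names for illustration. The only borrowed detail is the tools/call method name.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass(frozen=True)
class Request:
    method: str
    params: dict
    # Hypothetical: everything the server needs travels with the request.
    context: dict = field(default_factory=dict)


def handle(request: Request) -> dict:
    """Depends only on the request, never on per-connection memory."""
    if request.method == "tools/call":
        return {
            "tool": request.params["name"],
            "locale": request.context.get("locale", "en"),
        }
    return {"error": f"unsupported method: {request.method}"}


class Replica:
    """One server instance behind the load balancer; it stores nothing between calls."""

    def __init__(self, name: str) -> None:
        self.name = name

    def serve(self, request: Request) -> dict:
        return {"served_by": self.name, "result": handle(request)}


# Round-robin stands in for a plain load balancer with no sticky sessions.
replicas = cycle([Replica("a"), Replica("b")])
request = Request("tools/call", {"name": "search"}, {"locale": "de"})

first = next(replicas).serve(request)
second = next(replicas).serve(request)
print(first["result"] == second["result"])  # True
```

Because handle reads nothing but its argument, consecutive requests can land on different replicas and still get the same answer. With per-session state held in server memory, the second call would need to reach the same replica as the first.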

📦 Releases & News

Cursor v3.4 + v3.5: Full-Screen Mode, Cloud Environments, and Automations

v3.4 (May 13) introduced full-screen tab mode with a floating prompt bar, compact/balanced/detailed tool-call density controls, and multi-repo cloud development environments with Dockerfile-based configuration and 70% faster builds via improved layer caching. v3.5 (May 20) added Automations to the Agents Window, multi-repo reasoning for agents, no-repo automations for non-code tasks, and five new marketplace templates. Agent runs are discounted 50% for the first seven days of any new automation.

GitHub Copilot for Eclipse Goes Open Source Under MIT License

GitHub published the full Eclipse plugin source code — including the implementation of chat, code completions, agent mode, Next Edit Suggestions, and MCP support — under the MIT license at github.com/microsoft/copilot-for-eclipse. The stated motivation is “community-driven innovation and increased transparency.” This does not open-source Copilot itself; it exposes one of its IDE front ends, including system prompts, context handling, and agentic workflow logic.

GitHub Copilot Gets Auto Model Selection and One-Click CI Fixes

Two notable Copilot updates landed in VS Code this week. Auto model selection (May 20) now routes each task to a model based on performance metrics, matching model to task type across completions, chat, and agents. One-click CI fixes (May 18) let developers trigger the Copilot cloud agent directly from a failing GitHub Actions job to analyze the failure and propose a fix without leaving the Actions UI.

OpenAI Codex Comes to iPhone and Android

Codex is now available in preview inside the ChatGPT mobile app on iOS and Android, across all plan tiers including Free. The mobile interface is a remote control surface for Codex sessions running on a Mac or devbox: developers can review outputs, approve commands, switch models, and monitor terminal output, diffs, and test results from anywhere. More than 4 million people now use Codex weekly. Windows desktop support is coming; no date has been announced.

Claude Code v2.1.141–v2.1.149: /code-review, Agent Flags, and Background Session Hardening

Eight releases in ten days. The most impactful changes: /simplify is now /code-review with an optional effort level parameter; the claude agents command accepts new flags to configure dispatched background sessions (--add-dir, --settings, --mcp-config, --plugin-dir, --permission-mode, --model, --effort); Fast Mode now defaults to Opus 4.7; /usage shows per-category cost breakdowns (skills, subagents, plugins, MCP); and background sessions now persist through macOS sleep/wake cycles and start faster (15s, down from 75s).

Gemini CLI v0.43.0: Surgical Edits and Session Portability

The May 22 stable release improves code editing precision (the model now defaults to the edit tool for targeted changes), introduces session portability (export active sessions to files and import them later via a CLI flag), and ships an adaptive token calculator for smarter context window management. Note: Google has announced that Gemini CLI will be replaced by Antigravity CLI for free-tier and Google One users on June 18.

Windsurf: Claude Opus 4.7 Fast Mode Available

Windsurf added Claude Opus 4.7 in fast mode on May 12, delivering ~2.5x higher output speeds compared to standard Opus 4.7. The May 17 release (v2.3.9) fixed availability issues with the swe-check model, enhanced terminal processing performance, restored conversation sharing, and repaired Devin Local agent path resolution on WSL.

Zed 1.3.5: Terminal Threads, Parallel Agents, and Gemini 3.5 Flash

Zed’s latest stable release adds Terminal Threads from the sidebar and Agent Panel, Git panel branch history views, inline image and Mermaid diagram rendering inside agents, and a new subagent_model setting enabling parallel agent execution across different parts of a codebase. The follow-up release (1.3.6, May 21) added native Google AI support, including Gemini 3.5 Flash with configurable thinking levels.

📚 Tutorials and Resources

How to Use Gemini 3.5 Flash in GitHub Copilot — Availability, Pricing, and Setup

Gemini 3.5 Flash is now rolling out to GitHub Copilot Pro, Pro+, Business, and Enterprise subscribers across VS Code (1.115.0+), Visual Studio, JetBrains, Xcode, and Eclipse. Enterprise and Business admins must explicitly enable the Gemini 3.5 Flash policy in Copilot settings before users can access it. The model carries a 14x premium request multiplier (tentative). The rollout is gradual, so availability may vary — if you’re not seeing it yet, it’s coming.
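To see what a 14x multiplier means in practice, here is a quick back-of-the-envelope sketch. The monthly allowance of 300 premium requests is an assumption for illustration only; check your own plan's number, and remember the multiplier itself is marked tentative.

```python
def prompts_per_month(allowance: int, multiplier: float) -> int:
    """Whole prompts a premium-request allowance covers at a given multiplier."""
    return int(allowance // multiplier)


# Assumed allowance (hypothetical); 14x is the tentative multiplier quoted above.
ASSUMED_MONTHLY_ALLOWANCE = 300
GEMINI_35_FLASH_MULTIPLIER = 14

print(prompts_per_month(ASSUMED_MONTHLY_ALLOWANCE, GEMINI_35_FLASH_MULTIPLIER))  # 21
print(prompts_per_month(ASSUMED_MONTHLY_ALLOWANCE, 1))                           # 300
```

Under that assumed allowance, the same budget that covers 300 prompts on a 1x model covers 21 on this one, so it is worth reserving for tasks where the model clearly earns the cost.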

💡 Others

GitHub Copilot Now Finds Issues with Natural Language — Semantic Search Goes GA

Semantic issue search launched in Copilot Chat on GitHub.com, allowing developers to find, group, and analyze repository issues using natural language queries powered by a new semantic issues index. The feature is generally available on all Copilot plans. It’s a small but meaningful shift: instead of writing precise filter queries, you describe what you’re looking for and Copilot understands intent and context — the same pattern that is slowly displacing structured query interfaces across every developer workflow.

Next week, the stack keeps moving. So does this newsletter. Fall behind one week, and you’ll spend the next three catching up.

Every week, a new model drops. A new agent framework ships. A new “this changes everything” thread goes viral. And you still have actual code to write.

Every Monday, you open your inbox and already know what matters. You’ve skipped three viral threads that turned out to be nothing. You know that Cursor launched a proprietary model, that Google’s Antigravity 2.0 just made “managed agent” a one-API-call concept, and that the supply chain attack hitting AI dev tools right now is more sophisticated than anything we’ve seen before — and you didn’t have to spend your weekend reading to know this. We did.

That’s what Vibe Coding Weekly is. For developers, architects, tech leads, and everyone building or managing software in the age of AI.

Clean code and positive vibes,
Angel.
