Inside Uber's AI & MCP gateways: blueprint for engineering leaders
Read Time 7 mins | Written by: Cole
Most companies don't have an AI strategy. They have 40 engineers doing 40 different things – different models, different API keys, no oversight, no cost visibility, and no idea what data is leaving the building. It works fine at small scale. It becomes a liability the moment AI usage gets serious.
Uber saw this coming early and built the infrastructure to handle it. Their approach centered on a pair of gateways that sit between every agent and every model. It’s one of the most detailed public examples of enterprise AI governance done right. The Uber Engineering Blog and The Pragmatic Engineer have both covered it in depth, and together they paint a clear picture of what serious AI infrastructure looks like at scale.
It's worth understanding because the problems it solves are universal.
What an AI gateway actually is
An AI gateway is the control layer that sits in front of every model call and every agent request. Every prompt – from every tool, every engineer, every automated workflow – passes through it.
A well-designed gateway handles a few specific jobs:
- Authentication and routing – figuring out who's calling and which model should answer
- Data redaction – stripping out sensitive information before it leaves your network
- Logging and observability – capturing every request for audit, debugging, and cost analysis
- Cost tracking – knowing what each team and each use case is spending
Without one, you have agents talking directly to external APIs with no visibility and no guardrails. With one, you have a single chokepoint you can actually manage.
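The four jobs above can be sketched as a single request-handling function. This is an illustrative toy, not Uber's code: the key table, model routes, and email-only redaction rule are all invented for the example.

```python
import re
import time

API_KEYS = {"team-maps-key": "team-maps"}  # auth: API key -> owning team
MODEL_ROUTES = {"chat": "gpt-4o", "embed": "text-embedding-3-small"}
AUDIT_LOG = []  # every request lands here for audit and cost analysis

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def handle(request: dict) -> dict:
    team = API_KEYS.get(request["api_key"])          # 1. authentication
    if team is None:
        return {"status": 403, "error": "unknown key"}
    model = MODEL_ROUTES[request["task"]]            # 2. routing
    prompt = EMAIL_RE.sub("[REDACTED]", request["prompt"])  # 3. redaction
    AUDIT_LOG.append({"team": team, "model": model,  # 4. logging / cost
                      "chars": len(prompt), "ts": time.time()})
    return {"status": 200, "model": model, "prompt": prompt}
```

The point is the shape, not the details: because every request funnels through one function, adding a new policy (rate limits, spend caps, new redaction rules) is a change in one place rather than forty.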
For CTOs, the gateway is less of a technical decision and more of a governance decision. It's the answer to the question your board and your security team will eventually ask: how do you know what your AI agents are doing?
Uber built two AI gateways: gen AI and MCP

GenAI Gateway (2023–2024): the model gateway
The first piece, documented on the Uber Engineering Blog in 2024, is the GenAI Gateway. It's a Go service that fronts both external models (OpenAI, Vertex AI) and internal LLMs, layering on generic capabilities such as authentication and account management, caching, and observability/monitoring.
Three things make it work:
- OpenAI-compatible interface. Uber's GenAI Gateway closely mirrors OpenAI's interface, which means every open-source library and developer tool built for OpenAI works against Uber's gateway with no changes. Adoption friction drops to near zero.
- PII redaction at the infrastructure layer. The gateway provides PII redaction, scrubbing internal identifiers before requests reach external models like Claude 3.5 or GPT-4o. No team building an AI use case has to remember to do it. The infrastructure handles it.
- A standardized review process. The Engineering Security team reviews each use case against Uber's data handling standard before granting it access to the gateway. The gateway becomes the front door. Teams know exactly how to walk through it.
By the time of the 2024 write-up, the GenAI Gateway was used by close to 30 customer teams and served 16 million queries per month.
MCP Gateway (2025–2026): the agent gateway

The second piece is newer. As agentic workflows took off, Uber needed a different kind of gateway – one designed for the Model Context Protocol (MCP), the emerging standard for connecting AI agents to tools, data sources, and APIs.
Gergely Orosz described it on The Pragmatic Engineer: Uber put together a "tiger team" (a temporary unit that gets things done fast) to design the MCP strategy and build the central MCP Gateway.
The MCP Gateway handles four things:
- Proxy internal endpoints to MCPs: any internal Thrift, Protobuf, or HTTP endpoint can be exposed as an MCP server with a simple configuration change.
- First-party MCPs: internal MCP servers are exposed through a single, consistent interface
- Third-party MCPs: external MCP servers are also exposed via the gateway, which handles all authentication and authorization tasks.
- Platform concerns: the gateway takes care of authorization, telemetry, and logging in one central place.
It also ships with a registry where devs can look up MCP servers and register their own, plus a sandbox for experimenting with MCP servers without lengthy setup.
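The "config change, not code change" idea can be sketched as a translation from an endpoint description into an MCP-style tool definition. The config shape, endpoint URL, and registry here are invented for illustration; only the `name`/`description`/`inputSchema` tool fields follow the MCP spec.

```python
# Hypothetical endpoint description a team might check in as config.
ENDPOINT_CONFIG = {
    "name": "get_trip_status",
    "protocol": "http",
    "url": "https://trips.internal.example/status",
    "params": {"trip_id": "string"},
    "description": "Look up the current status of a trip.",
}

REGISTRY = {}  # central lookup: tool name -> MCP tool definition

def register_as_mcp_tool(cfg: dict) -> dict:
    """Translate an endpoint config into an MCP tool definition."""
    tool = {
        "name": cfg["name"],
        "description": cfg["description"],
        "inputSchema": {  # JSON Schema, per the MCP tool spec
            "type": "object",
            "properties": {p: {"type": t} for p, t in cfg["params"].items()},
            "required": list(cfg["params"]),
        },
    }
    REGISTRY[tool["name"]] = tool
    return tool
```

Because the gateway owns the translation, platform concerns (auth, telemetry, logging) attach once at registration time instead of being reimplemented in every service that wants to be agent-accessible.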
Together, the two gateways are the membrane between Uber's engineers and the outside world. Every model call goes through GenAI Gateway. Every agent-to-tool call goes through MCP Gateway.
The four-layer stack the gateways sit inside

The gateways aren't standalone. They sit inside a broader AI infrastructure stack that Ty Smith and Anshu Chada laid out at The Pragmatic Summit. Uber's "agentic system" for software engineering is actually made up of several systems:
- Internal AI platform: Built on top of Michelangelo, Uber's ML/AI platform. This layer provides things like a model gateway to proxy to frontier models or internally hosted models.
- Internal Uber context: accessing Uber's source code, engineering documentation, Slack information, JIRA tickets, etc. These all serve as "memory" for agents to use.
- Industry agents: Uber's approach is to enable the "latest & greatest" AI agents for engineers, so they support several tools like Claude Code, GitHub Copilot, Codex, and other clients.
- Specialized agents: Uber's background agent platform, the test generation platform, code review agents, and more.
If you build use cases first and governance later, you're retrofitting security onto a system that wasn't designed for it. Uber built the infrastructure before scaling adoption, which is why they could roll out broadly without a data incident making headlines.
What the developer AI adoption numbers tell you
At The Pragmatic Summit in San Francisco earlier this year, Smith and Chada pulled back the curtain on how all of this is performing. The Pragmatic Engineer's full breakdown is worth reading in its entirety – the headline numbers as of March 2026 are striking:
- 84% of devs at Uber are agentic coding users (either using CLI-based agents or making more agentic requests than tab-completion in their IDE)
- 65-72% of code is AI-generated inside IDE-based tools. This number is, naturally, 100% for AI command line tools like Claude Code.
- Claude Code usage nearly doubled in 3 months – from 32% in Dec to 63% in Feb, while IDE-based tools (Cursor, IntelliJ) have plateaued.
- 92% of Uber devs use agents monthly, and 11% of pull requests are opened by agents
- AI-related costs are up 6x since 2024, and token cost optimization is a growing priority.
CLI-based agents are winning because they can be centrally governed. A command-line tool talks to the gateway. An IDE plugin is harder to route, monitor, and control.
As companies move from single-agent workflows to running multiple parallel agents simultaneously – which is where Uber is now – the ability to govern all of that through a single layer becomes the difference between a manageable system and an unmanageable one.
Where most companies get stuck
The main failure mode is sequencing.
Teams build agent use cases, demonstrate ROI, get buy-in to scale, and then discover that scaling requires infrastructure they don't have. By that point they're retrofitting governance onto a sprawling system of direct API connections, inconsistent credentials, and no audit trail.
The right order:
- Gateway first: a proxy that routes to your chosen models, logs every request, redacts sensitive data, and requires teams to register use cases before going live
- Agents second: single-threaded, then parallel, then specialized
- Optimization third: cost controls, model routing, fallback logic, and the rest
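The "optimization third" step is mostly routing policy layered on the gateway. A minimal sketch of cost-aware routing with fallback; model names, tiers, and prices are all illustrative:

```python
MODELS = [  # illustrative names and prices, not real pricing
    {"name": "small-model", "tier": "basic", "usd_per_1k": 0.0005, "healthy": True},
    {"name": "frontier-model", "tier": "frontier", "usd_per_1k": 0.01, "healthy": True},
]

def route(needs_frontier: bool = False) -> str:
    """Pick the cheapest healthy model that meets the capability bar."""
    ok = [m for m in MODELS
          if m["healthy"] and (m["tier"] == "frontier" or not needs_frontier)]
    if not ok:
        raise RuntimeError("no healthy model available")  # fallback exhausted
    return min(ok, key=lambda m: m["usd_per_1k"])["name"]
```

Because the gateway already sees every request, policies like this can be added without touching any agent or team code, which is the payoff of doing the sequencing in this order.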
You don't need Uber's budget or Michelangelo to do this. The minimum viable version is achievable in weeks, not months.
The bottom line for engineering leaders
Uber published all of this publicly – both on the Uber Engineering Blog and through sessions like The Pragmatic Summit – because they believe sharing what works creates more value than keeping it proprietary.
That instinct is right. The companies building the governance layer now are the ones who will run AI agents confidently at scale in 12 months. The ones waiting will be cleaning up data incidents and surprise cloud bills.
If you want to build the infrastructure for AI software development at your company, let's talk.
Cole
Cole is Codingscape's Content Marketing Strategist & Copywriter.
