Research
July 2, 2025

Is RAG Dead or Alive?

Read time: 2 mins
Daniel Warfield
Senior Engineer

As AI models grow larger and context windows expand, some have started questioning whether Retrieval-Augmented Generation (RAG) is still necessary. With million-token prompts now possible, the idea of simply feeding all your data into a model is gaining traction. But is RAG really outdated, or are we overlooking what made it essential in the first place?

There’s an epic battle on the internet. In this corner, RAG. And in the other corner, also RAG. Long live RAG. RAG will never die. Or will it?

Here’s our take.

The Case Against RAG

The argument for why RAG might be dead is simple. As language models grow, their context windows get bigger. You can now fit more information into a single prompt than ever before.

RAG exists because you can’t feed all your documents into a language model directly. So instead, you search a database, usually a vector database of documents, pull out a few relevant pieces, and send them into the model as context. That’s what RAG does. Retrieve and attach.
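The "retrieve and attach" loop is simple enough to sketch in a few lines. Here keyword-overlap scoring stands in for a real vector database, and the documents, query, and stopword list are all illustrative:

```python
import re

# A minimal sketch of RAG's "retrieve and attach" loop.
# Keyword-overlap scoring stands in for real vector search.

STOPWORDS = {"what", "is", "the", "a", "an", "of", "to", "on", "are"}

def tokens(text: str) -> set[str]:
    """Lowercase, strip punctuation, and drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most keywords with the query."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Attach the retrieved documents as context ahead of the question."""
    context = "\n---\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on national holidays.",
    "Refund requests are issued to the original payment method.",
]
print(build_prompt("What is the refund policy?", docs))
```

In production, the scoring function is an embedding similarity search over a vector index, but the shape of the pipeline is the same: score, pick the top few, prepend them to the prompt.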

But if you can fit everything into the prompt, why retrieve anything at all?

That’s the logic. And context windows really are expanding. Gemini 2.5 supports one million tokens. Two million is reportedly coming. Meta’s latest Llama model claims ten million. Just a few years ago, GPT-3.5 topped out at sixteen thousand.

We’ve gone from a few thousand to several million in under two years.

Maybe today you can only fit a few paragraphs. But soon, you’ll be able to fit books. After that, maybe libraries.

At that point, who needs RAG?

Why That Argument Falls Apart

Here’s why we don’t agree.

A million tokens is not that much. It covers maybe a few thousand pages. That might be fine for a small business. But a single civil lawsuit can include one to five million pages. We’re nowhere close to fitting even one of those cases into a model.
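The page math is easy to sanity-check. Assuming roughly 500 tokens per page, which is a rule of thumb rather than a measured figure:

```python
# Back-of-envelope check of how far a million-token window goes.
# TOKENS_PER_PAGE is a rough rule of thumb, not a measured figure.
TOKENS_PER_PAGE = 500

context_window = 1_000_000
pages_that_fit = context_window // TOKENS_PER_PAGE
print(pages_that_fit)            # 2000 pages fit in one window

lawsuit_pages = 5_000_000        # upper end of a large civil case
windows_needed = (lawsuit_pages * TOKENS_PER_PAGE) / context_window
print(windows_needed)            # 2500.0 windows for one lawsuit
```

Even at the optimistic end of that estimate, a single large case is thousands of context windows away.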

And there’s a bigger issue.

This is really about where the world’s information lives. Right now, it’s on hard drives. That’s the cheapest way to store data. But language models run on GPUs. That’s the most expensive compute we have.

To move just one tenth of Google’s cloud onto GPUs, you’d need to build a fleet of new nuclear power plants. That’s not an exaggeration. It’s a scale problem.

Could hardware improve? Sure. Architectures will evolve. But not overnight. For RAG to truly be obsolete, we would need a fundamental shift in infrastructure. That doesn’t happen fast.

RAG Is Still Alive

So, to everyone calling RAG dead: it’s not.

RAG is still very much alive.

And that’s our take.
