Research
July 3, 2025

Is RAG Dead or Alive?

Read time: 2 mins
Daniel Warfield
Senior Engineer

As AI models grow larger and context windows expand, some have started questioning whether Retrieval-Augmented Generation (RAG) is still necessary. With million-token prompts now possible, the idea of simply feeding all your data into a model is gaining traction. But is RAG really outdated, or are we overlooking what made it essential in the first place?

There’s an epic battle on the internet. In this corner: RAG is dead. In the other corner: long live RAG, RAG will never die. Or will it?

Here’s our take.

The Case Against RAG

The argument for why RAG might be dead is simple. As language models grow, their context windows get bigger. You can now fit more information into a single prompt than ever before.

RAG exists because you can’t feed all your documents into a language model directly. So instead, you search a database, usually a vector database of documents, pull out a few relevant pieces, and send them into the model as context. That’s what RAG does. Retrieve and attach.
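To make that loop concrete, here’s a minimal sketch of retrieve-and-attach. It’s purely illustrative: a toy word-overlap score stands in for a real embedding model and vector database, and the documents, query, and function names are all made up.

```python
# Minimal retrieve-and-attach sketch. A real RAG system would embed the
# query and documents and search a vector database; a toy word-overlap
# score stands in for that similarity search here.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words that appear in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / max(len(q_words), 1)

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(documents, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Attach the retrieved passages to the user's question as context."""
    context = "\n\n".join(retrieve(query, documents))
    return f"Use the context below to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The warehouse ships orders every weekday before noon.",
    "Support is available by email around the clock.",
]
print(build_prompt("What is the refund policy?", docs))
```

In a production system, the only real change is that the scoring and retrieval steps are replaced by embedding lookups against a vector index. The shape of the final prompt stays the same: retrieved context plus the user’s question.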

But if you can fit everything into the prompt, why retrieve anything at all?

That’s the logic. And context windows really are expanding. Gemini 2.5 supports one million tokens, with two million reportedly on the way. Meta’s Llama 4 claims ten million. Just a few years ago, GPT models topped out at sixteen thousand.

We’ve gone from a few thousand to several million in under two years.

Maybe today you can only fit a handful of documents. But soon, you’ll be able to fit whole books. After that, maybe entire libraries.

At that point, who needs RAG?

Why That Argument Falls Apart

Here’s why we don’t agree.

A million tokens is not that much. It covers maybe a few thousand pages. That might be fine for a small business. But a single civil lawsuit can include one to five million pages. We’re nowhere close to fitting even one of those cases into a model.
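As a rough sanity check on those numbers, here’s a back-of-envelope estimate. The per-page and per-word figures are assumptions, not measurements, but they show the size of the gap.

```python
# Assumed figures: roughly 300 words per page, ~1.3 tokens per word of English text.
tokens_per_page = 300 * 1.3          # ~390 tokens per page
context_window = 1_000_000           # a one-million-token context window

print(f"Pages that fit in 1M tokens: ~{context_window / tokens_per_page:,.0f}")  # ~2,564

lawsuit_pages = 1_000_000            # low end of a large civil case
print(f"Tokens needed for one such case: ~{lawsuit_pages * tokens_per_page / 1e6:,.0f}M")  # ~390M
```

Even at the low end, a single case needs hundreds of times more tokens than a one-million-token window can hold.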

And there’s a bigger issue.

This is really about where the world’s information lives. Right now, it’s on hard drives. That’s the cheapest way to store data. But language models run on GPUs. That’s the most expensive compute we have.

To move just one tenth of Google’s cloud onto GPUs, you’d need a massive build-out of nuclear power plants. That’s not an exaggeration. It’s a scale problem.

Could hardware improve? Sure. Architectures will evolve. But not overnight. For RAG to truly be obsolete, we would need a fundamental shift in infrastructure. That doesn’t happen fast.

RAG Is Still Alive

So, to everyone calling RAG dead: it’s not.

RAG is still very much alive.

And that’s our take.
