What Is Reranking in RAG?
Reranking in retrieval-augmented generation (RAG) is a second-stage step that reorders the documents returned by initial retrieval to improve relevance. After a fast first-stage retrieval returns candidate documents, a reranker, typically a cross-encoder, scores each candidate against the query and reorders them so the most relevant appear first. Reranking improves the order of results but cannot recover documents that retrieval missed.
How reranking works
First-stage retrieval is optimized for speed and returns a broad candidate set. A reranker examines each query-document pair in detail, which is more accurate but more expensive, and assigns relevance scores used to reorder candidates. Because it runs on every query, reranking adds per-query latency that grows with the number of candidates scored, and total cost that grows with query volume.
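The two stages can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: first_stage, pair_score, and rerank are hypothetical names, the first stage is plain term overlap, and pair_score is a simple stand-in for a cross-encoder that rewards query terms appearing next to each other in the document.

```python
def tokenize(text):
    return text.lower().split()

def first_stage(query, docs, k):
    """Cheap retrieval: rank by number of shared terms, keep the top k ids."""
    q_terms = set(tokenize(query))
    scored = [(len(q_terms & set(tokenize(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best overlap first, ties by id
    return [i for _, i in scored[:k]]

def pair_score(query, doc):
    """Toy stand-in for a cross-encoder: looks at the query and document together.

    Adds a bonus for each adjacent query term pair that is also adjacent in the doc.
    """
    q, d = tokenize(query), tokenize(doc)
    doc_bigrams = set(zip(d, d[1:]))
    overlap = len(set(q) & set(d))
    phrase_hits = sum(1 for bigram in zip(q, q[1:]) if bigram in doc_bigrams)
    return overlap + 2 * phrase_hits

def rerank(query, docs, candidate_ids, top_n):
    """Score every candidate against the query and keep the best top_n."""
    ranked = sorted(candidate_ids, key=lambda i: -pair_score(query, docs[i]))
    return ranked[:top_n]

docs = [
    "vector index tuning guide",
    "index a vector of search results",
    "how to build a vector search index",
    "cooking with cast iron",
]
query = "vector search index"

candidates = first_stage(query, docs, k=3)           # broad, fast pass
reranked = rerank(query, docs, candidates, top_n=2)  # detailed, slower pass
print(candidates, reranked)
```

The reranker calls pair_score once per candidate, so its work grows with k. That is the latency trade-off described above: a larger candidate set gives the reranker more to work with but costs more per query.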
Why reranking exists, and when it becomes optional
Reranking is fundamentally a correction step. It exists because first-stage retrieval over a noisy index returns candidates in imperfect order, so a second model re-sorts them. If the index is clean from the start, the first pass already returns well-ordered, relevant results, and the correction step becomes optional. Green Vectors eliminates redundant vectors at ingestion, so the index is not polluted with near-duplicate noise and first-pass relevance is high. For most production workloads this removes the dependency on a separate reranking stage. Ultra-high-precision applications may still layer reranking on top.
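To make the idea of removing redundant vectors at ingestion concrete, here is a minimal sketch of one generic approach: skip any vector whose cosine similarity to an already kept vector exceeds a threshold. This is an assumption for illustration only. The text above does not describe how Green Vectors detects redundancy, and the function name dedupe_at_ingestion and the 0.98 threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dedupe_at_ingestion(vectors, threshold=0.98):
    """Return the indices of vectors to keep.

    A vector is kept only if it is below the similarity threshold
    against every vector that has already been kept.
    """
    kept = []
    for i, v in enumerate(vectors):
        if all(cosine(v, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return kept

vectors = [
    [1.0, 0.0],
    [0.999, 0.01],  # near-duplicate of the first vector
    [0.0, 1.0],
    [0.7, 0.7],
]
kept = dedupe_at_ingestion(vectors)
print(kept)
```

This pairwise comparison is quadratic in the number of kept vectors, so a real ingestion path would likely use an approximate nearest-neighbor lookup instead of a full scan.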
What reranking can and cannot do
Reranking can fix the order of retrieved results. It cannot improve the recall of the candidate set, meaning it cannot surface a relevant document that first-stage retrieval failed to return. Improving recall requires better first-stage retrieval, such as a better index or a larger candidate set, not a reranker.
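A small worked example shows the limit. The document ids and relevance labels below are made up for illustration, and recall_at_k is a hypothetical helper. Reordering moves a relevant document to the top, but recall over the whole candidate set stays the same because the missing document was never retrieved.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents found in the top k results."""
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)

relevant = {"d2", "d7"}

# First-stage output: d2 is buried at rank 3, d7 was never retrieved.
candidates = ["d5", "d1", "d2", "d9"]

# An ideal reranker can only permute the same four documents.
reranked = ["d2", "d5", "d1", "d9"]

top1_before = recall_at_k(candidates, relevant, k=1)
top1_after = recall_at_k(reranked, relevant, k=1)
full_before = recall_at_k(candidates, relevant, k=len(candidates))
full_after = recall_at_k(reranked, relevant, k=len(reranked))

print(top1_before, top1_after)  # ordering improved at the top
print(full_before, full_after)  # recall over the candidate set unchanged
```

The candidate-set recall is a ceiling on what any reordering can deliver to the generation step.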