Case study · Scale and efficiency

Project Gutenberg: Green Vectors at 15-million-vector scale.

Morphos AI benchmarked Green Vectors against traditional vectorization using the complete Project Gutenberg library, over 50,000 books containing billions of words. The goal was to measure storage efficiency, query speed, and search accuracy at a scale that pushes any system to its limits.

Storage reduction: Up to 99.5%
Faster queries (15M-vector scale): Up to 4x
Improved search quality: Up to 59%

The challenge

A dataset large enough to expose real limits.

Validating a vectorization technology at scale requires a dataset large and complex enough to expose real limits. The complete Project Gutenberg library provided that testbed: a vast public-domain corpus that, under traditional vectorization, produced more than 15 million vectors requiring 260GB of storage.

The approach

Two indexes, one corpus, identical queries.

Two vector databases were built from the same corpus, one using standard vectorization and one using Green Vectors. Identical queries were run against both to compare storage, latency, and accuracy. The benchmark also compared against aggressive quantization to distinguish true efficiency from lossy compression.
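The comparison described above can be sketched as a small harness that sends the same queries to two search backends and records latency and accuracy for each. This is an illustrative sketch, not the harness Morphos AI used: the `SearchFn` interface, the choice of recall@k as the accuracy metric, and the toy index are all assumptions made for the example.

```python
import time
from statistics import median
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical interface: a search function takes a query string and k,
# and returns a ranked list of document IDs.
SearchFn = Callable[[str, int], List[str]]


def recall_at_k(returned: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(returned[:k]) & relevant) / len(relevant)


def run_benchmark(search: SearchFn, queries: Dict[str, Set[str]], k: int = 10) -> Dict[str, float]:
    """Run every query once; report median latency (ms) and mean recall@k."""
    latencies, recalls = [], []
    for query, relevant in queries.items():
        start = time.perf_counter()
        results = search(query, k)
        latencies.append((time.perf_counter() - start) * 1000.0)
        recalls.append(recall_at_k(results, relevant, k))
    return {
        "median_latency_ms": median(latencies),
        "mean_recall_at_k": sum(recalls) / len(recalls),
    }


def compare(baseline: SearchFn, candidate: SearchFn,
            queries: Dict[str, Set[str]], k: int = 10) -> Dict[str, Dict[str, float]]:
    """Run the identical query set against both backends."""
    return {
        "baseline": run_benchmark(baseline, queries, k),
        "candidate": run_benchmark(candidate, queries, k),
    }


# Toy stand-in for a real index, for demonstration only.
toy_index = {"whale": ["moby_dick", "typee"], "detective": ["sherlock", "moby_dick"]}


def toy_search(query: str, k: int) -> List[str]:
    return toy_index.get(query, [])[:k]


toy_queries = {"whale": {"moby_dick"}, "detective": {"sherlock", "poirot"}}
report = compare(toy_search, toy_search, toy_queries, k=2)
```

Keeping the query set and relevance judgments fixed across both backends is what makes the storage, latency, and accuracy numbers comparable. A real run would also repeat each query several times and warm caches before timing.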

The results

260GB to 1.3GB.

Stored vector-index size: 260GB (traditional vectorization) to 1.3GB (Green Vectors)

Source material: Over 50,000 books
Traditional vector count: More than 15 million
Green Vectors vector count: 76,000
Query latency: Up to 4x faster
Search quality: Up to 59% improved across domains
1-bit quantization (comparison): 8.1GB, with accuracy loss

This result comes from one public literary collection under one configuration. Results on other collections or workloads require separate tests.
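The headline percentages follow directly from the reported figures. The short calculation below reproduces them as a sanity check. It uses only numbers stated in this case study; because the traditional vector count is given as "more than 15 million", the value used here is a lower bound and the vector-count reduction is a minimum.

```python
# Figures as reported in this case study.
TRADITIONAL_GB = 260.0
GREEN_VECTORS_GB = 1.3
ONE_BIT_QUANT_GB = 8.1
TRADITIONAL_VECTORS = 15_000_000  # reported as "more than 15 million": a lower bound
GREEN_VECTORS_COUNT = 76_000


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (1.0 - after / before) * 100.0


storage_reduction = reduction_pct(TRADITIONAL_GB, GREEN_VECTORS_GB)
quant_reduction = reduction_pct(TRADITIONAL_GB, ONE_BIT_QUANT_GB)
vector_reduction = reduction_pct(TRADITIONAL_VECTORS, GREEN_VECTORS_COUNT)
quant_vs_green = ONE_BIT_QUANT_GB / GREEN_VECTORS_GB

print(f"Green Vectors storage reduction: {storage_reduction:.1f}%")
print(f"1-bit quantization storage reduction: {quant_reduction:.1f}%")
print(f"Vector count reduction: at least {vector_reduction:.1f}%")
print(f"1-bit index vs. Green Vectors index: {quant_vs_green:.1f}x larger")
```

With the reported figures, this works out to roughly 99.5% storage reduction for Green Vectors, 96.9% for 1-bit quantization, and a 1-bit index about 6.2x the size of the Green Vectors index.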

Why this matters

Efficiency without the accuracy tradeoff.

The significant point is not only the storage and speed gains, but that they were achieved while improving accuracy. Quantization reduces storage by lowering the precision of every vector, losing information. Green Vectors reduces storage by eliminating redundant vectors through semantic transformation, preserving full precision in those that remain. This is a fundamentally more efficient data structure, not compression.
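To make the information loss from quantization concrete, the sketch below applies a simple sign-based 1-bit quantizer to two different vectors. It is an assumption-laden illustration: the case study does not say which 1-bit scheme the benchmark used, the vectors are invented, and the code does not attempt to show how Green Vectors works, since that method is not described here.

```python
from typing import List


def quantize_1bit(vector: List[float]) -> int:
    """Pack the sign of each dimension into one bit (1 if positive, else 0)."""
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two 1-bit codes."""
    return bin(a ^ b).count("1")


# Two invented vectors with different magnitudes but the same sign pattern.
v1 = [0.9, -0.1, 0.4, -0.8]
v2 = [0.1, -0.9, 0.05, -0.2]

code1 = quantize_1bit(v1)
code2 = quantize_1bit(v2)

# At full precision the vectors are distinguishable; after 1-bit
# quantization they collapse to the same code, so a search over the
# codes can no longer tell them apart.
same_after_quantization = code1 == code2
distance = hamming_distance(code1, code2)

# Storage per vector: 32 bits per dimension (assuming float32) vs. 1 bit.
bits_full = len(v1) * 32
bits_quantized = len(v1) * 1
```

In this toy case both vectors map to the code `0b1010`, and storage falls by a factor of 32. That ratio is close to the 260GB to 8.1GB figure reported above, which would be consistent with a float32 baseline, though the case study does not state the baseline precision.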

Frequently asked

Questions about this benchmark.

Q1
How much did Green Vectors reduce storage on Project Gutenberg?
From 260GB to 1.3GB, a reduction of up to 99.5%, by reducing more than 15 million vectors to 76,000.
Q2
How does this compare to quantization?
Aggressive 1-bit quantization on the same dataset required 8.1GB and lost accuracy. Green Vectors achieved 1.3GB while improving search quality.
Q3
Is this compression?
No. Green Vectors eliminates redundant vectors through semantic transformation and preserves full precision in the vectors that remain, rather than lowering the precision of every vector.

Related
Green Vectors
Elastic BBQ Benchmark
Vector Reduction

See Green Vectors on your data.

Bring a dataset worth testing and define the constraint that matters to you.

Get in touch
