Case Study

    Project Gutenberg: Green Vectors at 15-Million-Vector Scale

    Morphos AI benchmarked Green Vectors against traditional vectorization using the complete Project Gutenberg library, over 50,000 books containing billions of words. The goal was to measure storage efficiency, query speed, and search accuracy at a scale that pushes any system to its limits.

    Storage reduction: up to 99.5%
    Query speed (15M-vector scale): up to 4x faster
    Search quality: up to 59% improvement
    Storage, traditional full index: 260 GB
    Storage, Green Vectors: 1.3 GB

    The challenge

    Validating a vectorization technology at scale requires a dataset large and complex enough to expose real limits. The complete Project Gutenberg library provided that testbed: a vast public-domain corpus that, under traditional vectorization, produced more than 15 million vectors requiring 260GB of storage.

    The approach

    Two vector databases were built from the same corpus, one using standard vectorization and one using Green Vectors. Identical queries were run against both to compare storage, latency, and accuracy. The benchmark also compared against aggressive quantization to distinguish true efficiency from lossy compression.
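    A side-by-side comparison of this kind can be sketched as a small harness that sends the same queries to two search functions and records latency and result quality for each. This is a minimal illustration, not the benchmark Morphos AI ran: the `benchmark` function, the two toy search functions, and the use of recall@k as the quality metric are all assumptions, since the case study does not name its metric or tooling.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

# A search function takes (query, k) and returns up to k document ids.
SearchFn = Callable[[str, int], List[str]]


def benchmark(
    search: SearchFn,
    queries: List[str],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> Tuple[float, float]:
    """Return (mean latency in seconds, mean recall@k) over the query set."""
    total_time = 0.0
    total_recall = 0.0
    for query in queries:
        start = time.perf_counter()
        hits = search(query, k)
        total_time += time.perf_counter() - start
        wanted = relevant[query]
        total_recall += len(wanted & set(hits[:k])) / len(wanted)
    n = len(queries)
    return total_time / n, total_recall / n


# Hypothetical stand-ins for the two databases built from the same corpus.
def baseline_search(query: str, k: int) -> List[str]:
    return ["doc1", "doc7", "doc3"][:k]


def candidate_search(query: str, k: int) -> List[str]:
    return ["doc1", "doc2", "doc3"][:k]


queries = ["whale hunting"]
relevant = {"whale hunting": {"doc1", "doc2"}}  # hand-labelled ground truth

baseline_latency, baseline_recall = benchmark(baseline_search, queries, relevant, k=3)
candidate_latency, candidate_recall = benchmark(candidate_search, queries, relevant, k=3)
print(f"baseline:  recall@3={baseline_recall:.2f}")
print(f"candidate: recall@3={candidate_recall:.2f}")
```

    Keeping the query set and ground-truth labels identical for both runs is what makes the latency and quality numbers comparable.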

    The results

    Green Vectors reduced more than 15 million vectors to 76,000, and reduced storage from 260GB to 1.3GB, a 99.5% reduction. Query latency improved by up to 4x. Search quality improved by up to 59% across domains. For comparison, aggressive 1-bit quantization on the same dataset still required 8.1GB and sacrificed accuracy to do it.
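    The headline percentages follow directly from the figures quoted above, and the arithmetic can be checked in a few lines. The constants are taken from the case study; the vector count is a lower bound ("more than 15 million"), and the final check assumes the original vectors were stored as 32-bit floats, which the case study does not state.

```python
# Figures quoted in the case study (sizes in GB).
FULL_INDEX_GB = 260.0
GREEN_VECTORS_GB = 1.3
ONE_BIT_QUANT_GB = 8.1
FULL_VECTOR_COUNT = 15_000_000  # "more than 15 million", so a lower bound
GREEN_VECTOR_COUNT = 76_000


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (1.0 - after / before) * 100.0


storage_reduction = reduction_pct(FULL_INDEX_GB, GREEN_VECTORS_GB)
vector_reduction = reduction_pct(FULL_VECTOR_COUNT, GREEN_VECTOR_COUNT)
quant_reduction = reduction_pct(FULL_INDEX_GB, ONE_BIT_QUANT_GB)
quant_vs_green = ONE_BIT_QUANT_GB / GREEN_VECTORS_GB

# Assumption: 32-bit floats reduced to 1 bit per dimension gives a 32x saving.
implied_one_bit_gb = FULL_INDEX_GB / 32

print(f"storage reduction:        {storage_reduction:.1f}%")
print(f"vector-count reduction:   {vector_reduction:.2f}%")
print(f"1-bit quantization saves: {quant_reduction:.1f}%")
print(f"1-bit index is {quant_vs_green:.1f}x the Green Vectors index")
print(f"implied 1-bit size:       {implied_one_bit_gb:.3f} GB")
```

    Under the 32-bit assumption, the implied 1-bit size of about 8.1GB matches the quantization figure reported in the benchmark.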

    Why this matters

    The storage and speed gains matter, but the more significant point is that they came with better accuracy rather than at its expense. Quantization reduces storage by lowering the precision of every vector, which discards information. Green Vectors reduces storage by eliminating redundant vectors through semantic transformation and keeps the remaining vectors at full precision. The result is a more efficient data structure, not a compressed copy of the original one.
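    The information loss from quantization can be seen in a toy example. The sketch below uses a common form of 1-bit quantization that keeps only the sign of each component; the vectors and function names are illustrative assumptions. It does not attempt to reproduce Green Vectors' semantic transformation, which the case study does not describe.

```python
import math
from typing import List


def quantize_1bit(vec: List[float]) -> List[int]:
    """Keep only the sign of each component: 1 if >= 0, else 0."""
    return [1 if x >= 0.0 else 0 for x in vec]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two full-precision vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two made-up vectors that point in clearly different directions.
a = [0.9, 0.1, -0.2, 0.05]
b = [0.1, 0.9, -0.05, 0.8]

full_precision_similarity = cosine(a, b)
code_a = quantize_1bit(a)
code_b = quantize_1bit(b)
codes_identical = code_a == code_b

print(f"full-precision cosine similarity: {full_precision_similarity:.2f}")
print(f"1-bit codes: {code_a} vs {code_b}, identical: {codes_identical}")
```

    At full precision the two vectors are only weakly similar, yet their 1-bit codes are identical, so a search over the quantized index can no longer tell them apart.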

    FAQ

    Frequently asked questions.

    Green Vectors reduced storage from 260GB to 1.3GB, a reduction of up to 99.5%, by reducing more than 15 million vectors to 76,000.
    Aggressive 1-bit quantization on the same dataset required 8.1GB and lost accuracy. Green Vectors achieved 1.3GB while improving search quality.
    Green Vectors is not compression. It eliminates redundant vectors through semantic transformation while preserving full precision in the remaining vectors.
