Binary quantization compresses embedding vectors by representing each dimension with a single bit instead of a full floating-point number. This dramatically reduces storage requirements and speeds up vector operations, making it practical to use larger, higher-quality embedding models that produce more dimensions.

Why use binary quantization

Larger embedding models (1536+ dimensions) generally produce better semantic search results because they capture more nuance in the meaning of text. However, storing and comparing high-dimensional vectors is expensive in terms of disk space, memory, and CPU time. Binary quantization solves this trade-off:
| Without BQ | With BQ |
|---|---|
| Each dimension stored as a 32-bit float | Each dimension stored as 1 bit |
| 1536-dim vector = 6 KB | 1536-dim vector = 192 bytes |
| Slower indexing at high dimensions | Significantly faster indexing |
| Full-precision similarity | Approximate similarity (still effective) |
The key insight is that a large model with binary quantization often outperforms a small model without it. For example, OpenAI’s text-embedding-3-large (3072 dimensions) with binary quantization typically produces better search results than text-embedding-3-small (1536 dimensions) at full precision, while using less storage.
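The storage figures above follow directly from the bit widths, and can be checked with a quick calculation (the dimension count is the 1536 used in the table):

```shell
# Per-vector storage: 32 bits (4 bytes) per dimension at full precision,
# 1 bit per dimension with binary quantization
dims=1536
full_bytes=$((dims * 4))   # 4 bytes per dimension
bq_bytes=$((dims / 8))     # 8 dimensions per byte
echo "full precision: ${full_bytes} bytes"   # 6144 bytes (~6 KB)
echo "binary quantized: ${bq_bytes} bytes"   # 192 bytes
```

The same arithmetic explains why a larger quantized model can still be smaller on disk: a 3072-dimension vector takes 384 bytes quantized, versus 6 KB for a full-precision 1536-dimension vector.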

When to use it

Binary quantization is most effective when:
  • Your dataset contains more than 1M documents with embeddings
  • You use a model with 1400+ dimensions (the more dimensions, the better BQ works, because there is more information to preserve even after quantization)
  • You want to reduce disk usage and speed up indexing without switching to a smaller model
  • Storage or memory is a constraint in your deployment
Binary quantization is less effective with low-dimensional models (under 512 dimensions), where the information loss from quantization has a more noticeable impact on search quality.

Enable binary quantization

Set binaryQuantized to true in your embedder configuration:
curl \
  -X PATCH 'MEILISEARCH_URL/indexes/products/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary '{
    "default": {
      "binaryQuantized": true
    }
  }'
This works with any embedder source (OpenAI, Cohere, HuggingFace, REST, or user-provided).
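To confirm the setting was applied, you can read back the index's embedder configuration (this sketch assumes the same `products` index and the placeholder URL and key used above):

```shell
# Retrieve the embedder settings and check that binaryQuantized is true
curl \
  -X GET 'MEILISEARCH_URL/indexes/products/settings/embedders' \
  -H 'Authorization: Bearer MEILISEARCH_KEY'
```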

Example: OpenAI with a large model

Use OpenAI’s largest embedding model with binary quantization for the best balance of quality and efficiency:
curl \
  -X PATCH 'MEILISEARCH_URL/indexes/products/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary '{
    "default": {
      "source": "openAi",
      "apiKey": "OPEN_AI_API_KEY",
      "model": "text-embedding-3-large",
      "binaryQuantized": true
    }
  }'
Activating binary quantization is irreversible. Once enabled, Meilisearch converts all vectors and discards the original full-precision data. The only way to recover the original vectors is to re-index all documents in a new embedder without binary quantization.

Impact on search quality

Binary quantization reduces the precision of vector similarity calculations. In practice, the impact on search quality depends on the model and dataset:
  • High-dimensional models (1500+ dims): minimal quality loss, often imperceptible
  • Medium-dimensional models (512-1500 dims): slight quality reduction, acceptable for most use cases
  • Low-dimensional models (under 512 dims): noticeable quality reduction, not recommended
Hybrid search mode mitigates this loss further: keyword matching compensates for any precision lost in the vector component of the ranking.
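A quantized embedder is queried like any other. As a sketch, a hybrid search request against the embedder configured above might look like this (the query text and the `semanticRatio` value are illustrative):

```shell
# Hybrid search: blend keyword and vector ranking via the quantized embedder
curl \
  -X POST 'MEILISEARCH_URL/indexes/products/search' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary '{
    "q": "wireless headphones",
    "hybrid": {
      "embedder": "default",
      "semanticRatio": 0.7
    }
  }'
```

A higher `semanticRatio` leans more on the (quantized) vector similarity; a lower one leans more on keyword matching.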
| Provider | Model | Dimensions | Good with BQ? |
|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | Excellent |
| OpenAI | text-embedding-3-small | 1536 | Good |
| Cohere | embed-english-v3.0 | 1024 | Good |
| Cohere | embed-multilingual-v3.0 | 1024 | Good |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 | Good |
| HuggingFace | BAAI/bge-small-en-v1.5 | 384 | Not recommended |

Next steps

  • Choose an embedder: compare embedding providers for your use case
  • Custom hybrid ranking: tune the balance between keyword and vector search
  • Composite embedders: use different models for indexing and search
  • Performance tuning: optimize overall search performance