Why use binary quantization
Larger embedding models (1536+ dimensions) generally produce better semantic search results because they capture more nuance in the meaning of text. However, storing and comparing high-dimensional vectors is expensive in terms of disk space, memory, and CPU time. Binary quantization solves this trade-off:

| Without BQ | With BQ |
|---|---|
| Each dimension stored as 32-bit float | Each dimension stored as 1 bit |
| 1536-dim vector = 6 KB | 1536-dim vector = 192 bytes |
| Slower indexing at high dimensions | Significantly faster indexing |
| Full precision similarity | Approximate similarity (still effective) |
text-embedding-3-large (3072 dimensions) with binary quantization typically produces better search results than text-embedding-3-small (1536 dimensions) at full precision, while using less storage.
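The storage figures in the table above follow directly from the bit widths, and the quantization step itself can be sketched as keeping only the sign of each dimension (sign-based quantization is the typical scheme; the engine's exact internals aren't detailed on this page):

```python
def binary_quantize(vector):
    # Keep only the sign of each dimension: positive -> 1, else 0.
    # (A common BQ scheme, used here for illustration.)
    return [1 if x > 0 else 0 for x in vector]

DIMS = 1536
full_bytes = DIMS * 4   # 32-bit floats: 4 bytes per dimension -> 6144 B (6 KB)
bq_bytes = DIMS // 8    # 1 bit per dimension, packed -> 192 bytes

print(binary_quantize([0.7, -0.1, 0.0, 2.3]))  # [1, 0, 0, 1]
print(full_bytes, bq_bytes)                    # 6144 192
```

That is a 32× reduction per vector, which is where the storage and indexing savings come from.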
When to use it
Binary quantization is most effective when:

- Your dataset contains more than 1M documents with embeddings
- You use a model with 1400+ dimensions (the more dimensions, the better BQ works, because there is more information to preserve even after quantization)
- You want to reduce disk usage and speed up indexing without switching to a smaller model
- Storage or memory is a constraint in your deployment
Enable binary quantization
Set `binaryQuantized` to `true` in your embedder configuration:
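For example, a settings payload enabling it might look like the following (sketched as a Python dict; the embedder name `default` is illustrative, and the `source`/`model` fields assume the OpenAI provider discussed below):

```python
# Embedder settings with binary quantization enabled.
# "default" is a hypothetical embedder name; binaryQuantized is the
# setting described above.
settings = {
    "embedders": {
        "default": {
            "source": "openAi",
            "model": "text-embedding-3-large",
            "binaryQuantized": True,
        }
    }
}

print(settings["embedders"]["default"]["binaryQuantized"])  # True
```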
Example: OpenAI with a large model
Use OpenAI’s largest embedding model, text-embedding-3-large, with binary quantization for the best balance of quality and efficiency.

Impact on search quality
Binary quantization reduces the precision of vector similarity calculations. In practice, the impact on search quality depends on the model and dataset:

- High-dimensional models (1500+ dims): minimal quality loss, often imperceptible
- Medium-dimensional models (512-1500 dims): slight quality reduction, acceptable for most use cases
- Low-dimensional models (under 512 dims): noticeable quality reduction, not recommended
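As a toy illustration of why the approximation remains effective, the sketch below compares full-precision cosine ordering against a sign-only proxy on synthetic vectors (a hypothetical demo of the general technique, not the engine's actual algorithm):

```python
import math, random

random.seed(0)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sign_match(a, b):
    # 1-bit proxy for cosine: fraction of dimensions with matching signs.
    return sum((x > 0) == (y > 0) for x, y in zip(a, b)) / len(a)

dims, n = 1536, 30
query = [random.gauss(0, 1) for _ in range(dims)]
# Synthetic documents with varying true relevance to the query.
docs = [[(i / n) * q + random.gauss(0, 1) for q in query] for i in range(n)]

cos_sims = [cosine(query, d) for d in docs]
bit_sims = [sign_match(query, d) for d in docs]

# How often do full precision and the 1-bit proxy order a pair the same way?
agree = total = 0
for i in range(n):
    for j in range(i + 1, n):
        c = cos_sims[i] - cos_sims[j]
        b = bit_sims[i] - bit_sims[j]
        if c and b:
            total += 1
            agree += (c > 0) == (b > 0)

print(f"pairwise ranking agreement: {agree / total:.2f}")
```

At 1536 dimensions the sign-only ordering agrees with full-precision cosine on the large majority of pairs; at low dimensions there are too few bits left for the proxy to discriminate, which is why BQ is not recommended there.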
Recommended models with binary quantization
| Provider | Model | Dimensions | Good with BQ? |
|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | Excellent |
| OpenAI | text-embedding-3-small | 1536 | Good |
| Cohere | embed-english-v3.0 | 1024 | Good |
| Cohere | embed-multilingual-v3.0 | 1024 | Good |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 | Good |
| HuggingFace | BAAI/bge-small-en-v1.5 | 384 | Not recommended |
Next steps
- Choose an embedder: compare embedding providers for your use case
- Custom hybrid ranking: tune the balance between keyword and vector search
- Composite embedders: use different models for indexing and search
- Performance tuning: optimize overall search performance