Why use binary quantization
Larger embedding models (1536+ dimensions) generally produce better semantic search results because they capture more nuance in the meaning of text. However, storing and comparing high-dimensional vectors is expensive in terms of disk space, memory, and CPU time. Binary quantization solves this trade-off:

| Without BQ | With BQ |
|---|---|
| Each dimension stored as 32-bit float | Each dimension stored as 1 bit |
| 1536-dim vector = 6 KB | 1536-dim vector = 192 bytes |
| Slower indexing at high dimensions | Significantly faster indexing |
| Full precision similarity | Approximate similarity (still effective) |
text-embedding-3-large (3072 dimensions) with binary quantization typically produces better search results than text-embedding-3-small (1536 dimensions) at full precision, while using less storage.
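The storage figures in the table above follow directly from the bit widths, and the quantization step itself can be sketched as keeping only the sign of each dimension (sign-based quantization is the typical scheme; the engine's exact internals aren't detailed on this page):

```python
def binary_quantize(vector):
    # Keep only the sign of each dimension: positive -> 1, else 0.
    # (A common BQ scheme, used here for illustration.)
    return [1 if x > 0 else 0 for x in vector]

DIMS = 1536
full_bytes = DIMS * 4   # 32-bit floats: 4 bytes per dimension -> 6144 B (6 KB)
bq_bytes = DIMS // 8    # 1 bit per dimension, packed -> 192 bytes

print(binary_quantize([0.7, -0.1, 0.0, 2.3]))  # [1, 0, 0, 1]
print(full_bytes, bq_bytes)                    # 6144 192
```

That is a 32× reduction per vector, which is where the storage and indexing savings come from.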
When to use it
Binary quantization is most effective when:

- Your dataset contains more than 1M documents with embeddings
- You use a model with 1400+ dimensions (the more dimensions, the better BQ works, because there is more information to preserve even after quantization)
- You want to reduce disk usage and speed up indexing without switching to a smaller model
- Storage or memory is a constraint in your deployment
Enable binary quantization
Set `binaryQuantized` to `true` in your embedder configuration:
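For example, a settings payload enabling it might look like the following (sketched as a Python dict; the embedder name `default` is illustrative, and the `source`/`model` fields assume the OpenAI provider discussed below):

```python
# Embedder settings with binary quantization enabled.
# "default" is a hypothetical embedder name; binaryQuantized is the
# setting described above.
settings = {
    "embedders": {
        "default": {
            "source": "openAi",
            "model": "text-embedding-3-large",
            "binaryQuantized": True,
        }
    }
}

print(settings["embedders"]["default"]["binaryQuantized"])  # True
```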
Example: OpenAI with a large model
Use OpenAI’s largest embedding model, text-embedding-3-large, with binary quantization for the best balance of quality and efficiency.

Impact on search quality
Binary quantization reduces the precision of vector similarity calculations. In practice, the impact on search quality depends on the model and dataset:

- High-dimensional models (1500+ dims): minimal quality loss, often imperceptible
- Medium-dimensional models (512-1500 dims): slight quality reduction, acceptable for most use cases
- Low-dimensional models (under 512 dims): noticeable quality reduction, not recommended
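As a toy illustration of why the approximation remains effective, the sketch below compares full-precision cosine ordering against a sign-only proxy on synthetic vectors (a hypothetical demo of the general technique, not the engine's actual algorithm):

```python
import math, random

random.seed(0)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sign_match(a, b):
    # 1-bit proxy for cosine: fraction of dimensions with matching signs.
    return sum((x > 0) == (y > 0) for x, y in zip(a, b)) / len(a)

dims, n = 1536, 30
query = [random.gauss(0, 1) for _ in range(dims)]
# Synthetic documents with varying true relevance to the query.
docs = [[(i / n) * q + random.gauss(0, 1) for q in query] for i in range(n)]

cos_sims = [cosine(query, d) for d in docs]
bit_sims = [sign_match(query, d) for d in docs]

# How often do full precision and the 1-bit proxy order a pair the same way?
agree = total = 0
for i in range(n):
    for j in range(i + 1, n):
        c = cos_sims[i] - cos_sims[j]
        b = bit_sims[i] - bit_sims[j]
        if c and b:
            total += 1
            agree += (c > 0) == (b > 0)

print(f"pairwise ranking agreement: {agree / total:.2f}")
```

At 1536 dimensions the sign-only ordering agrees with full-precision cosine on the large majority of pairs; at low dimensions there are too few bits left for the proxy to discriminate, which is why BQ is not recommended there.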
Recommended models with binary quantization
| Provider | Model | Dimensions | Good with BQ? |
|---|---|---|---|
| OpenAI | text-embedding-3-large | 3072 | Excellent |
| OpenAI | text-embedding-3-small | 1536 | Good |
| Cohere | embed-english-v3.0 | 1024 | Good |
| Cohere | embed-multilingual-v3.0 | 1024 | Good |
| HuggingFace | BAAI/bge-large-en-v1.5 | 1024 | Good |
| HuggingFace | BAAI/bge-small-en-v1.5 | 384 | Not recommended |
Next steps
- Choose an embedder: compare embedding providers for your use case
- Custom hybrid ranking: tune the balance between keyword and vector search
- Composite embedders: use different models for indexing and search
- Performance tuning: optimize overall search performance