When working with datasets containing hundreds of thousands or millions of documents, how you send data to Meilisearch matters. This guide covers batch sizing, supported formats, compression, progress monitoring, and error handling for large imports.
Always configure your index settings before adding documents. If you add documents first and then change settings like ranking rules or filterable attributes, Meilisearch re-indexes the entire dataset. For large imports, this doubles the work.
```sh
curl \
  -X PATCH 'MEILISEARCH_URL/indexes/products/settings' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary '{
    "searchableAttributes": ["title", "description"],
    "filterableAttributes": ["category", "price"],
    "sortableAttributes": ["price", "created_at"]
  }'
```
Wait for this task to complete before sending documents.
Choose the right payload size
A single large payload is faster than many small ones. Each HTTP request creates a task, and Meilisearch processes tasks sequentially. Fewer, larger payloads mean less overhead.
The default maximum payload size is 100 MB. You can adjust this with the --http-payload-size-limit configuration option.
Guidelines:
| Dataset size | Recommended batch size | Why |
|---|---|---|
| Under 100K documents | Send all at once | Fits in a single payload |
| 100K to 1M documents | 50K to 100K per batch | Balances payload size with memory usage |
| Over 1M documents | 50K to 100K per batch | Prevents memory pressure during indexing |
The ideal batch size depends on your document size. If each document is small (under 1 KB), you can send more per batch. If documents are large (10+ KB each with long text fields), use smaller batches.
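As a rough illustration, a small helper can split a document list into fixed-size batches before each one is sent as a separate request. The batch size of 50,000 follows the guidelines above; the function name and sample data are placeholders, not part of any Meilisearch client:

```python
def batch_documents(documents, batch_size=50_000):
    """Yield successive lists of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

# Example: 120,000 small documents become three batches,
# the last one holding the 20,000-document remainder.
docs = [{"id": i, "title": f"Product {i}"} for i in range(120_000)]
batches = list(batch_documents(docs))
print(len(batches))      # number of batches
print(len(batches[-1]))  # size of the final, partial batch
```

Each yielded batch would then be serialized and POSTed to the documents endpoint as shown in this guide.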
Use NDJSON for streaming
For large imports, NDJSON (Newline Delimited JSON) is more efficient than JSON arrays. NDJSON lets you stream documents line by line without loading the entire payload into memory:
```sh
curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/x-ndjson' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.ndjson
```
An NDJSON file has one JSON object per line:
```json
{"id": 1, "title": "Product A", "price": 29.99}
{"id": 2, "title": "Product B", "price": 49.99}
{"id": 3, "title": "Product C", "price": 19.99}
```
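If your data lives in application objects rather than a file, producing NDJSON is one serialization per document joined by newlines. A minimal sketch using Python's standard json module (the sample products are illustrative):

```python
import json

products = [
    {"id": 1, "title": "Product A", "price": 29.99},
    {"id": 2, "title": "Product B", "price": 49.99},
]

# One JSON object per line -- no enclosing array, no trailing commas.
ndjson = "\n".join(json.dumps(p) for p in products)
print(ndjson)
```

The resulting string can be written to a file or streamed directly as the request body with the application/x-ndjson content type.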
Meilisearch also supports CSV for tabular data:
```sh
curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: text/csv' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.csv
```
Compress payloads
Reduce network transfer time by compressing your payloads. Meilisearch supports gzip, deflate, and br (Brotli) encoding:
```sh
gzip products.ndjson

curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/x-ndjson' \
  -H 'Content-Encoding: gzip' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.ndjson.gz
```
Compression is especially effective for text-heavy documents. A typical JSON payload compresses to 10-20% of its original size.
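As a quick sanity check of the savings before wiring compression into an import pipeline, you can compress a synthetic NDJSON payload in memory with Python's standard gzip module (the document shape and repetitive text here are illustrative):

```python
import gzip
import json

# Build a text-heavy NDJSON payload in memory.
payload = "\n".join(
    json.dumps({"id": i, "description": "lorem ipsum " * 50})
    for i in range(1_000)
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

Real-world documents are less repetitive than this sample, so expect savings closer to the 10-20% figure above rather than the extreme ratio a synthetic payload produces.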
Monitor import progress
Each document addition returns a taskUid. Use it to check progress:
```sh
# Send documents
RESPONSE=$(curl -s \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @batch_1.json)
TASK_UID=$(echo "$RESPONSE" | jq -r '.taskUid')

# Check task status
curl \
  -X GET "MEILISEARCH_URL/tasks/$TASK_UID" \
  -H 'Authorization: Bearer MEILISEARCH_KEY'
```
The task response includes timing information:
```json
{
  "uid": 42,
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "details": {
    "receivedDocuments": 50000,
    "indexedDocuments": 50000
  },
  "duration": "PT12.453S",
  "enqueuedAt": "2024-01-15T10:00:00Z",
  "startedAt": "2024-01-15T10:00:01Z",
  "finishedAt": "2024-01-15T10:00:13Z"
}
```
For batch imports, filter tasks by index to see all pending work:
```sh
curl \
  -X GET 'MEILISEARCH_URL/tasks?indexUids=products&statuses=enqueued,processing' \
  -H 'Authorization: Bearer MEILISEARCH_KEY'
```
Handle errors in batches
If a batch fails, the task status is failed with an error description. Common errors during large imports:
| Error | Cause | Solution |
|---|---|---|
| payload_too_large | Batch exceeds payload size limit | Reduce batch size or increase --http-payload-size-limit |
| invalid_document_id | A document has an invalid primary key | Fix the offending documents and resend the batch |
| missing_document_id | Documents are missing the primary key field | Add the primary key field or set it using the primaryKey query parameter |
When a batch fails, only that batch is affected. Other batches continue processing normally.
Retry strategy
For automated imports, implement a simple retry pattern:
- Send a batch and record the taskUid
- Poll the task status until it reaches succeeded or failed
- If failed, log the error, fix the data if needed, and resend
- If succeeded, move to the next batch
Do not resend a batch before its task has completed. Sending duplicate documents is safe (Meilisearch deduplicates by primary key), but it creates unnecessary work in the task queue.
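The polling step above can be sketched as a small loop. Here fetch_status is an injected callable standing in for a GET /tasks/{taskUid} request, and the function name, interval, and max_polls cap are illustrative choices, not part of the Meilisearch API:

```python
import time

def wait_for_task(task_uid, fetch_status, interval=1.0, max_polls=600):
    """Poll a task until it reaches a terminal status.

    fetch_status(task_uid) should return the task's status string:
    "enqueued", "processing", "succeeded", or "failed".
    """
    for _ in range(max_polls):
        status = fetch_status(task_uid)
        if status in ("succeeded", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_uid} did not reach a terminal status")
```

Calling wait_for_task between batches enforces the rule above: the next batch is only sent once the previous task has finished.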
Trim documents before importing
Remove fields that are not searchable, filterable, sortable, or displayed. Smaller documents index faster and use less disk space. If your source data has 50 fields but users only search on 5, extract those 5 fields before sending to Meilisearch.
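A trimming step can be as simple as a dictionary comprehension that keeps an allowlist of fields. The field names below are hypothetical examples of searchable versus internal-only data:

```python
def trim(document, keep=("id", "title", "description", "category", "price")):
    """Keep only the fields Meilisearch actually needs."""
    return {field: document[field] for field in keep if field in document}

raw = {
    "id": 7,
    "title": "Product G",
    "price": 12.5,
    "internal_sku": "X-7",      # internal-only: dropped
    "warehouse_row": 14,        # internal-only: dropped
}
print(trim(raw))
```

Running trim over each document before serialization keeps payloads small without touching the source data.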
Next steps
- Indexing best practices: additional tips for efficient indexing
- Monitor tasks: track task status and progress
- Design primary keys: choose the right primary key for your documents