When working with datasets containing hundreds of thousands or millions of documents, how you send data to Meilisearch matters. This guide covers batch sizing, supported formats, compression, progress monitoring, and error handling for large imports.
Always configure your index settings before adding documents. If you add documents first and then change settings like ranking rules or filterable attributes, Meilisearch re-indexes the entire dataset. For large imports, this doubles the work.
```sh
curl \
  -X PATCH 'MEILISEARCH_URL/indexes/products/settings' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary '{
    "searchableAttributes": ["title", "description"],
    "filterableAttributes": ["category", "price"],
    "sortableAttributes": ["price", "created_at"]
  }'
```
Wait for this task to complete before sending documents.
Choose the right payload size
A single large payload is faster than many small ones. Each HTTP request creates a task, and Meilisearch processes tasks sequentially. Fewer, larger payloads mean less overhead.
The default maximum payload size is 100 MB. You can adjust this with the --http-payload-size-limit configuration option.
Guidelines:
| Dataset size | Recommended batch size | Why |
|---|---|---|
| Under 100K documents | Send all at once | Fits in a single payload |
| 100K to 1M documents | 50K to 100K per batch | Balances payload size with memory usage |
| Over 1M documents | 50K to 100K per batch | Prevents memory pressure during indexing |
The ideal batch size depends on your document size. If each document is small (under 1 KB), you can send more per batch. If documents are large (10+ KB each with long text fields), use smaller batches.
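As a rough illustration, a small helper can split a document list into fixed-size batches before each one is sent as a separate request. The batch size of 50,000 follows the guidelines above; the function name and sample data are placeholders, not part of any Meilisearch client:

```python
def batch_documents(documents, batch_size=50_000):
    """Yield successive lists of at most batch_size documents."""
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]

# Example: 120,000 small documents become three batches,
# the last one holding the 20,000-document remainder.
docs = [{"id": i, "title": f"Product {i}"} for i in range(120_000)]
batches = list(batch_documents(docs))
print(len(batches))      # number of batches
print(len(batches[-1]))  # size of the final, partial batch
```

Each yielded batch would then be serialized and POSTed to the documents endpoint as shown in this guide.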
Use NDJSON for streaming
For large imports, NDJSON (Newline Delimited JSON) is more efficient than JSON arrays. NDJSON lets you stream documents line by line without loading the entire payload into memory:
```sh
curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/x-ndjson' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.ndjson
```
An NDJSON file has one JSON object per line:
```json
{"id": 1, "title": "Product A", "price": 29.99}
{"id": 2, "title": "Product B", "price": 49.99}
{"id": 3, "title": "Product C", "price": 19.99}
```
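If your data lives in application objects rather than a file, producing NDJSON is one serialization per document joined by newlines. A minimal sketch using Python's standard json module (the sample products are illustrative):

```python
import json

products = [
    {"id": 1, "title": "Product A", "price": 29.99},
    {"id": 2, "title": "Product B", "price": 49.99},
]

# One JSON object per line -- no enclosing array, no trailing commas.
ndjson = "\n".join(json.dumps(p) for p in products)
print(ndjson)
```

The resulting string can be written to a file or streamed directly as the request body with the application/x-ndjson content type.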
Meilisearch also supports CSV for tabular data:
```sh
curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: text/csv' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.csv
```
Compress payloads
Reduce network transfer time by compressing your payloads. Meilisearch supports gzip, deflate, and br (Brotli) encoding:
```sh
gzip products.ndjson

curl \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/x-ndjson' \
  -H 'Content-Encoding: gzip' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @products.ndjson.gz
```
Compression is especially effective for text-heavy documents. A typical JSON payload compresses to 10-20% of its original size.
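As a quick sanity check of the savings before wiring compression into an import pipeline, you can compress a synthetic NDJSON payload in memory with Python's standard gzip module (the document shape and repetitive text here are illustrative):

```python
import gzip
import json

# Build a text-heavy NDJSON payload in memory.
payload = "\n".join(
    json.dumps({"id": i, "description": "lorem ipsum " * 50})
    for i in range(1_000)
).encode("utf-8")

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%})")
```

Real-world documents are less repetitive than this sample, so expect savings closer to the 10-20% figure above rather than the extreme ratio a synthetic payload produces.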
Monitor import progress
Each document addition returns a taskUid. Use it to check progress:
```sh
# Send documents
RESPONSE=$(curl -s \
  -X POST 'MEILISEARCH_URL/indexes/products/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_KEY' \
  --data-binary @batch_1.json)
TASK_UID=$(echo "$RESPONSE" | jq -r '.taskUid')

# Check task status
curl \
  -X GET "MEILISEARCH_URL/tasks/$TASK_UID" \
  -H 'Authorization: Bearer MEILISEARCH_KEY'
```
The task response includes timing information:
```json
{
  "uid": 42,
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "details": {
    "receivedDocuments": 50000,
    "indexedDocuments": 50000
  },
  "duration": "PT12.453S",
  "enqueuedAt": "2024-01-15T10:00:00Z",
  "startedAt": "2024-01-15T10:00:01Z",
  "finishedAt": "2024-01-15T10:00:13Z"
}
```
For batch imports, filter tasks by index to see all pending work:
```sh
curl \
  -X GET 'MEILISEARCH_URL/tasks?indexUids=products&statuses=enqueued,processing' \
  -H 'Authorization: Bearer MEILISEARCH_KEY'
```
Handle errors in batches
If a batch fails, the task status is failed with an error description. Common errors during large imports:
| Error | Cause | Solution |
|---|---|---|
| payload_too_large | Batch exceeds payload size limit | Reduce batch size or increase --http-payload-size-limit |
| invalid_document_id | A document has an invalid primary key | Fix the offending documents and resend the batch |
| missing_document_id | Documents are missing the primary key field | Add the primary key field or set it using the primaryKey query parameter |
When a batch fails, only that batch is affected. Other batches continue processing normally.
Retry strategy
For automated imports, implement a simple retry pattern:
- Send a batch and record the taskUid
- Poll the task status until it reaches succeeded or failed
- If failed, log the error, fix the data if needed, and resend
- If succeeded, move to the next batch
Do not resend a batch before its task has completed. Sending duplicate documents is safe (Meilisearch deduplicates by primary key), but it creates unnecessary work in the task queue.
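The polling step above can be sketched as a small loop. Here fetch_status is an injected callable standing in for a GET /tasks/{taskUid} request, and the function name, interval, and max_polls cap are illustrative choices, not part of the Meilisearch API:

```python
import time

def wait_for_task(task_uid, fetch_status, interval=1.0, max_polls=600):
    """Poll a task until it reaches a terminal status.

    fetch_status(task_uid) should return the task's status string:
    "enqueued", "processing", "succeeded", or "failed".
    """
    for _ in range(max_polls):
        status = fetch_status(task_uid)
        if status in ("succeeded", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"task {task_uid} did not reach a terminal status")
```

Calling wait_for_task between batches enforces the rule above: the next batch is only sent once the previous task has finished.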
Trim documents before importing
Remove fields that are not searchable, filterable, sortable, or displayed. Smaller documents index faster and use less disk space. If your source data has 50 fields but users only search on 5, extract those 5 fields before sending to Meilisearch.
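A trimming step can be as simple as a dictionary comprehension that keeps an allowlist of fields. The field names below are hypothetical examples of searchable versus internal-only data:

```python
def trim(document, keep=("id", "title", "description", "category", "price")):
    """Keep only the fields Meilisearch actually needs."""
    return {field: document[field] for field in keep if field in document}

raw = {
    "id": 7,
    "title": "Product G",
    "price": 12.5,
    "internal_sku": "X-7",      # internal-only: dropped
    "warehouse_row": 14,        # internal-only: dropped
}
print(trim(raw))
```

Running trim over each document before serialization keeps payloads small without touching the source data.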
Next steps
- Indexing best practices: additional tips for efficient indexing
- Monitor tasks: track task status and progress
- Design primary keys: choose the right primary key for your documents