How dimensions affect cost
Vector dimension is the length of each embedding's output array. A 1536-dim model produces vectors of 1536 floating-point numbers; a 3072-dim model doubles that. Storage and similarity-search compute scale linearly with dimension.
For a 1M-document corpus at float32: 1536-dim ≈ 6 GB, 3072-dim ≈ 12 GB. Add HNSW or IVF index overhead and the gap widens. Halving dimensions can also speed up cosine queries 2× — meaningful at scale.
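The arithmetic is simple enough to sanity-check inline. A minimal sketch in TypeScript, assuming plain float32 vectors and decimal GB; it covers only the raw vectors, not index overhead:

```ts
// Raw float32 vector storage in decimal GB; excludes HNSW/IVF index overhead.
function rawStorageGB(numVectors: number, dims: number): number {
  return (numVectors * dims * 4) / 1e9; // 4 bytes per float32
}

console.log(rawStorageGB(1_000_000, 1536).toFixed(1)); // "6.1"
console.log(rawStorageGB(1_000_000, 3072).toFixed(1)); // "12.3"
```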
OpenAI's text-embedding-3-large supports Matryoshka truncation: ask the API for fewer dimensions and the model returns a prefix-truncated vector that retains most of the semantic signal. Common choices: 3072 (full), 1536, 1024, 512, 256.
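For example, with the official openai Node SDK, the `dimensions` parameter requests the truncation server-side:

```ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Ask the API to Matryoshka-truncate the 3072-dim model down to 1024 dims.
const res = await openai.embeddings.create({
  model: 'text-embedding-3-large',
  input: 'hello world',
  dimensions: 1024,
});

console.log(res.data[0].embedding.length); // 1024
```

The API re-normalizes the truncated vector for you; if you instead truncate a stored 3072-dim vector yourself, re-normalize it before computing cosine similarity.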
FAQ
- Why does dimension matter?
- It dictates index size and query speed in your vector store. A 3072-dim embedding takes 2× the disk and ~2× the cosine-similarity compute of a 1536-dim one. Higher dims aren't universally better — they trade cost for marginal recall gains.
- Can I shorten OpenAI's 3072-dim embeddings?
- Yes. `text-embedding-3-large` supports the `dimensions` parameter to truncate the output. The model was trained with Matryoshka representations, so the truncated vector retains most of the semantic signal. 1536, 1024, even 256 all work.
- Are dimensions comparable across providers?
- No. A 1536-dim OpenAI embedding and a 1536-dim Cohere embedding live in completely different vector spaces — you cannot compare them with cosine similarity. Stick to one provider per index.
- How do I migrate from `ada-002` to `3-small`?
- Re-embed everything. Even though both are 1536-dim, the underlying spaces differ. Migration is a one-time backfill: re-embed your corpus, build a new index, swap traffic; a sketch follows this list.
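A sketch of what that backfill might look like with the openai Node SDK. `fetchAllDocs` and `newIndex` are hypothetical stand-ins for your own document source and vector store, not a real library API:

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Hypothetical stand-ins: fetchAllDocs() streams the corpus in batches,
// newIndex.upsert() writes to the freshly built index. Swap in your own.
declare function fetchAllDocs(): AsyncIterable<{ id: string; text: string }[]>;
declare const newIndex: {
  upsert(rows: { id: string; vector: number[] }[]): Promise<void>;
};

for await (const batch of fetchAllDocs()) {
  // Embed each batch with the new model; old ada-002 vectors are discarded.
  const res = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: batch.map((d) => d.text),
  });
  // res.data is ordered to match the input array.
  await newIndex.upsert(
    res.data.map((e, i) => ({ id: batch[i].id, vector: e.embedding })),
  );
}
// Build and verify the new index offline, then swap read traffic to it.
```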
Common pitfalls
- Mixing embeddings from different providers in the same index — vectors aren't comparable across vector spaces.
- Treating `ada-002` and `3-small` as drop-in compatible just because both are 1536-dim — they're trained differently.
- Defaulting to the largest dimension "for accuracy" without measuring recall — for many use cases 1536 matches 3072 within margin.
- Forgetting the max input token limit. Embedding APIs truncate or reject input over the limit; chunk first (see the sketch after this list).
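A rough character-based chunker as a starting point. The 8192-token default roughly matches the ~8k input limit of OpenAI's text-embedding-3 models, and 4 chars per token is a heuristic for English, not a tokenizer-exact count (use a real tokenizer like tiktoken for exact splits):

```ts
// Rough character-based chunker. ~4 chars/token is a heuristic for English;
// use a real tokenizer (e.g. tiktoken) when you need exact counts.
function chunkText(text: string, maxTokens = 8192, charsPerToken = 4): string[] {
  const maxChars = maxTokens * charsPerToken;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```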
In your code
```sh
npm i openai
```

```js
import OpenAI from 'openai';

const openai = new OpenAI();

const res = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'hello world',
});

console.log(res.data[0].embedding.length); // 1536
```

```sh
pip install voyageai
```

```python
import voyageai

vo = voyageai.Client()
result = vo.embed(['hello world'], model='voyage-3')
print(len(result.embeddings[0]))
```

Related tools
- Token Counter
Count tokens for GPT-4o, GPT-4, GPT-3.5 and more — tokenizer-exact, runs in your browser.
- Multi-model Token Compare
Side-by-side token counts for the same input across GPT-4o, GPT-4 Turbo, GPT-3.5, Claude, Gemini.
- Context Window Calculator
Pick a model, paste your prompt, see how much context you have left after reserving output tokens.
- RAG Chunk Estimator
Estimate chunk count and embedding token spend from chunk size + overlap + corpus size.