
Rate Limit / TPM Calculator

Sanity-check whether your traffic stays within RPM and TPM ceilings for a given model and tier.

Requests per minute: 60 / 500 RPM (12% of limit)
Tokens per minute: 144,000 / 30,000 TPM (over limit)
Verdict

Over the limit on TPM. Reduce traffic, queue, or request a tier upgrade.

Source: platform.openai.com/account/limits. Limits depend on your account tier and usage history. Verify in your provider dashboard.

How rate limits work

Rate limits are enforced per minute over a sliding window. Provider APIs return 429 Too Many Requests, usually with a retry-after header, when you exceed a limit. Two limits typically apply concurrently: requests per minute (RPM) and tokens per minute (TPM). You're throttled by whichever you hit first.

At the start of an account, or for a newly released model, default limits are conservative. They scale with cumulative usage and time. For predictable production loads, file a limit-increase request before launch; responses usually arrive within a day.

FAQ

Why are limits structured as both RPM and TPM?
Either can throttle you. RPM caps the number of calls per minute regardless of size; TPM caps the total tokens flowing through. A workload with many small calls hits RPM first; one with few large calls hits TPM first.
How do I get a tier upgrade?
OpenAI and Anthropic both upgrade tiers automatically based on accumulated spend and account age. A tier ladder is usually published in the account dashboard. For larger jumps, contact sales.
What happens if I exceed the limit?
You get a 429 response. Most SDKs retry with exponential backoff. Persistent breaches risk temporary suspension; the dashboard surfaces breach counts.
Why are the numbers in this tool so different from what I see in my dashboard?
These are public defaults. Your actual limits depend on tier, region, and any custom limits the provider granted. Always verify with your account.
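Rather than reacting to 429s, you can stay under both ceilings proactively with a client-side sliding-window throttle. This is a minimal single-threaded sketch, not tied to any provider SDK; the class name and limits are illustrative:

```python
import time
from collections import deque

class MinuteWindowThrottle:
    """Blocks until a call of `tokens` size fits under both per-minute ceilings."""

    def __init__(self, rpm_limit: int, tpm_limit: int):
        self.rpm_limit = rpm_limit
        self.tpm_limit = tpm_limit
        self.events = deque()  # (timestamp, tokens) for the last 60 seconds

    def acquire(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            # Drop events older than the 60-second sliding window.
            while self.events and now - self.events[0][0] >= 60:
                self.events.popleft()
            used_tokens = sum(t for _, t in self.events)
            if len(self.events) < self.rpm_limit and used_tokens + tokens <= self.tpm_limit:
                self.events.append((now, tokens))
                return
            time.sleep(0.1)  # wait for the window to free up
```

Call `acquire(estimated_tokens)` before each API request; it returns immediately while you have headroom and blocks once either ceiling would be breached.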

Common pitfalls

  • Sizing infrastructure for a startup-tier limit when you should be at a higher tier already (check the dashboard).
  • Forgetting parallelism — 10 workers each making 50 RPM = 500 RPM aggregate.
  • Not implementing exponential backoff. A naive retry loop on 429 makes things worse.
  • Treating short bursts as average traffic. Limits are per-minute, but burst-traffic patterns can briefly push 10× the average.
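The backoff pitfall is cheap to avoid. A minimal retry wrapper with exponential backoff and full jitter might look like the sketch below; the exception class is a placeholder for whatever your SDK raises on 429, not a specific library's API:

```python
import random
import time

class RateLimitError(Exception):
    """Placeholder for a provider SDK's 429 exception."""

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `fn` on rate-limit errors, doubling the delay cap each attempt."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Full jitter: sleep a random fraction of the exponential cap,
            # so parallel workers don't all retry at the same instant.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

If the response includes a retry-after header, honoring it directly is better than guessing; the jittered backoff is the fallback when it's absent.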

Related tools