Handling rate limits with an astrology API

To handle rate limits with the Vedika astrology API, read the rate-limit headers on every response, retry 429 responses with exponential backoff plus jitter while honouring the Retry-After header, and cache the computations that never change for a given birth chart. Because most astrology workloads are bursty — a batch of kundli generations, a matchmaking sweep, or a daily transit refresh — the bulk of your throughput problems disappear once you separate one-time computed data from per-request AI calls and queue the rest.

This guide covers how rate limiting works on the Vedika API, how to read the headers, a production-ready retry implementation, and the caching and batching patterns that keep you well under any ceiling while controlling cost.

How rate limiting works on the Vedika API

Rate limits exist to keep the service responsive for everyone and to protect you from a runaway loop quietly draining your wallet. The Vedika API applies limits per API key, so your vk_live_* key has its own budget that is not affected by other customers. Two distinct ceilings matter:

Request rate — how many calls per window your key may make. This is what produces a 429 Too Many Requests response when exceeded.
Wallet balance — every query costs money (roughly $0.01–$0.05 depending on the path), so a key with an exhausted balance is rejected even when it is comfortably under the request ceiling. Plan tiers map to wallet credits: Starter at $12/mo, Professional at $60, Business at $120, Enterprise at $240.
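To make the balance ceiling concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a plan's monthly price becomes the same amount of wallet credit and that every query falls somewhere in the quoted $0.01 to $0.05 range; neither assumption is confirmed above, and the names `PLAN_CREDIT_CENTS` and `query_budget` are hypothetical. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical figures: plan price treated as monthly wallet credit, in cents.
PLAN_CREDIT_CENTS = {
    "Starter": 1200,
    "Professional": 6000,
    "Business": 12000,
    "Enterprise": 24000,
}

# Quoted per-query cost range ($0.01 to $0.05), in cents.
COST_RANGE_CENTS = (1, 5)


def query_budget(plan):
    """Return (fewest, most) queries the assumed credit could cover."""
    credit = PLAN_CREDIT_CENTS[plan]
    cheapest, priciest = COST_RANGE_CENTS
    return credit // priciest, credit // cheapest


print(query_budget("Starter"))  # → (240, 1200)
```

If a batch job needs more queries than the lower bound, it may run into a 402 well before it runs into a 429, however carefully it is paced.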

Higher plans carry more generous request windows alongside the larger wallet. If you are throughput-bound rather than balance-bound, moving up a tier or talking to us about an Enterprise window is usually cheaper than engineering around a low ceiling. The pricing page lists current tiers.

Which endpoints count differently

Not every operation carries the same weight. It helps to think in three buckets:

Pure computation (/v2/astrology/*) — divisional charts, dashas, ashtakavarga, panchang. These are deterministic for a given birth input and are among the lowest-priced and fastest to serve.
AI answer path (/api/v1/astrology/query) — a natural-language question grounded on the computed chart. This is the heaviest path; the optional speed:"fast" flag routes to Vedika Swift for lower latency.
Streaming (/api/v1/astrology/query/stream) — the same answer path delivered over Server-Sent Events. One SSE connection is one request against your rate budget, not one per token.

Read the rate-limit headers

Every response carries headers that tell you exactly where you stand, so you should never have to guess or hard-code a number. The rate-limit headers are an explicitly supported part of our public API surface, so they are safe to rely on in client code.

Header	Meaning
`X-RateLimit-Limit`	Maximum requests allowed in the current window.
`X-RateLimit-Remaining`	Requests left before you hit the ceiling.
`X-RateLimit-Reset`	When the window resets (epoch seconds).
`Retry-After`	Present on a `429`; seconds to wait before retrying.

The disciplined pattern is to slow down before you are throttled. When X-RateLimit-Remaining drops near zero, pause your sender until X-RateLimit-Reset rather than firing the requests that will bounce. A quick way to inspect the headers on a single call:

curl -i -X POST https://api.vedika.io/api/v1/astrology/query \
  -H "x-api-key: vk_live_yourkey" \
  -H "Content-Type: application/json" \
  -d '{
    "question": "What does my Moon sign say about my temperament?",
    "birthDetails": {
      "datetime": "1990-05-15T08:30:00",
      "latitude": 18.5204,
      "longitude": 73.8567,
      "timezone": "Asia/Kolkata"
    }
  }'
# Inspect the X-RateLimit-* headers in the response before scaling up.
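The "slow down before you are throttled" rule can be sketched as a small Python helper. This is a minimal illustration rather than an official client: the function name `seconds_until_safe` and the `floor` threshold are assumptions, and it relies only on the header names and the epoch-seconds reset format listed in the table above.

```python
import time


def seconds_until_safe(headers, now=None, floor=1):
    """Return how many seconds to pause before sending the next request.

    `headers` is any mapping of response headers, for example `res.headers`
    from the requests library (which is case-insensitive). A plain dict must
    use the exact header names shown here.
    """
    now = time.time() if now is None else now
    try:
        remaining = int(headers.get("X-RateLimit-Remaining", ""))
        reset_at = float(headers.get("X-RateLimit-Reset", ""))
    except ValueError:
        return 0.0  # headers missing or unparseable: do not block the caller
    if remaining > floor:
        return 0.0  # still budget left in this window
    # Budget is (nearly) spent: wait until the window resets.
    return max(0.0, reset_at - now)
```

A sender would call this after each response and `time.sleep()` for the returned value before the next request. Raising `floor` makes the client back off earlier, which is useful when several workers share one key.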

Retry 429s with backoff and jitter

When you do get a 429, the correct response is to wait and retry — but not on a fixed delay, and never in a tight loop. Fixed delays cause synchronised retry storms where every worker wakes at the same moment and overwhelms the window again. Exponential backoff with random jitter spreads the retries out. Always prefer the server's Retry-After value when it is present; fall back to computed backoff when it is not.

Node.js

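const BASE_URL = "https://api.vedika.io";

// Statuses worth retrying: 429 plus transient server errors.
// (Which 5xx codes count as transient is an assumption; adjust as needed.)
const RETRYABLE = new Set([429, 500, 502, 503, 504]);

async function vedikaQuery(body, { maxRetries = 5 } = {}) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(`${BASE_URL}/api/v1/astrology/query`, {
      method: "POST",
      headers: {
        "x-api-key": process.env.VEDIKA_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

    // Success, or an error that retrying cannot fix (400, 401, 402, ...)
    if (!RETRYABLE.has(res.status)) return res;
    if (attempt === maxRetries) break; // no point sleeping after the last attempt

    // Honour Retry-After if the server sent it, else exponential backoff.
    // headers.get() returns null when the header is absent, and Number(null)
    // is 0, so check for presence before converting.
    const header = res.headers.get("retry-after");
    const retryAfter =
      header === null || header.trim() === "" ? NaN : Number(header);
    const backoff =
      Number.isFinite(retryAfter) && retryAfter >= 0
        ? retryAfter * 1000
        : Math.min(2 ** attempt * 500, 30000);
    const jitter = Math.random() * 250;
    await new Promise((r) => setTimeout(r, backoff + jitter));
  }
  throw new Error("Retries exhausted after repeated 429/5xx responses");
}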

Python

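import os
import random
import time

import requests

BASE_URL = "https://api.vedika.io"

# 429 plus transient server errors.
# (Which 5xx codes count as transient is an assumption; adjust as needed.)
RETRYABLE = {429, 500, 502, 503, 504}


def vedika_query(body, max_retries=5):
    headers = {
        "x-api-key": os.environ["VEDIKA_API_KEY"],
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries + 1):
        res = requests.post(
            f"{BASE_URL}/api/v1/astrology/query",
            json=body,
            headers=headers,
            timeout=30,  # never wait forever on a stalled connection
        )
        # Success, or an error that retrying cannot fix (400, 401, 402, ...)
        if res.status_code not in RETRYABLE:
            return res
        if attempt == max_retries:
            break  # no point sleeping after the final attempt

        # Honour Retry-After when it is a number of seconds; if it is absent
        # or not numeric, fall back to exponential backoff.
        try:
            backoff = float(res.headers["Retry-After"])
        except (KeyError, ValueError):
            backoff = min(2 ** attempt * 0.5, 30)
        time.sleep(max(0.0, backoff) + random.uniform(0, 0.25))
    raise RuntimeError("Retries exhausted after repeated 429/5xx responses")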

Only retry on 429 and on transient 5xx responses. A 400 (bad birth data), 401 (key problem), or 402 (insufficient wallet balance) will never succeed on retry. Retrying them just wastes time, and a balance error in particular is a signal to top up rather than to loop.
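The retry rule and the delay calculation can also be pulled out into two small pure functions, which makes them easy to unit test without touching the network. The sketch below is one possible shape, not part of any Vedika SDK. The names `should_retry` and `retry_delay` are hypothetical, the set of "transient" 5xx codes is an assumption, and the HTTP-date branch is included only because the HTTP specification permits that form of Retry-After.

```python
import math
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed set of transient statuses; 400, 401 and 402 are deliberately absent.
RETRYABLE = {429, 500, 502, 503, 504}


def should_retry(status):
    """True only for statuses where waiting and retrying can help."""
    return status in RETRYABLE


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, now=None):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    delay = None
    if retry_after:
        try:
            delay = float(retry_after)  # delta-seconds form, e.g. "7"
        except ValueError:
            try:
                # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
                when = parsedate_to_datetime(retry_after)
                now = now or datetime.now(timezone.utc)
                delay = (when - now).total_seconds()
            except (TypeError, ValueError):
                delay = None  # unparseable: fall through to backoff
    if delay is None or not math.isfinite(delay) or delay < 0:
        delay = min(base * 2 ** attempt, cap)  # capped exponential backoff
    return delay + random.uniform(0, 0.25)  # jitter de-synchronises workers
```

With the defaults, the fallback delays grow as 0.5 s, 1 s, 2 s, 4 s and so on up to the 30 s cap, each with up to 250 ms of jitter added.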

Cache what never changes

The single biggest lever for staying under a rate limit is not calling the API at all. A birth chart is fixed: a person's natal positions, divisional charts (D1 through D60), Vimshottari dasha sequence, and ashtakavarga bindus do not change after birth. Compute them once via /v2/astrology/* and store the result keyed on the normalised birth input.

Cache key — hash the tuple of datetime + latitude + longitude + timezone (and the system: Vedic, Western, or KP). Identical input always yields identical output, so the same key is safe forever.
Permanent cache — natal charts, divisional charts, dashas. No TTL needed; they are immutable.
Time-bounded cache — transits (gochar), daily panchang, and current-dasha context change with the clock. A TTL of a day for panchang and a few hours for transit positions is usually fine.
Do not blindly cache AI answers across different questions. The computed chart underneath is cacheable; the natural-language response to a specific question is not interchangeable.
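As an illustration of the cache key described above, the following Python sketch normalises the birth input and hashes it. The function name `chart_cache_key`, the four-decimal rounding of coordinates, and the lower-cased system label are all assumptions chosen for the example; pick whatever normalisation matches how your application collects birth data.

```python
import hashlib
import json


def chart_cache_key(datetime_iso, latitude, longitude, timezone, system="vedic"):
    """Build a stable cache key from normalised birth input."""
    normalised = {
        "datetime": datetime_iso.strip(),
        # Rounding (an assumption) stops 18.5204 and 18.52040 producing
        # different keys for the same place.
        "latitude": round(float(latitude), 4),
        "longitude": round(float(longitude), 4),
        "timezone": timezone.strip(),
        "system": system.strip().lower(),
    }
    # sort_keys and fixed separators make the serialisation deterministic.
    payload = json.dumps(normalised, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Because the key depends only on the normalised input, two requests that differ in whitespace or trailing zeros map to the same cached chart, while a different system (for example KP instead of Vedic) gets its own entry.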

In practice, teams that cache the computed layer find their actual API call volume drops sharply, because repeat visits to the same chart are served locally. That keeps you under the request ceiling and trims the per-query spend at the same time.
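One way to serve repeat visits locally is a small in-process cache that applies a different lifetime to each kind of data. The sketch below assumes a hypothetical `ChartCache` class and a `TTL_SECONDS` table. The one-day and three-hour values are only illustrative readings of "a day" and "a few hours", and a production system would more likely back this with Redis or a database than with a dict.

```python
import time

# None means "never expires" (immutable natal data); numbers are seconds.
TTL_SECONDS = {
    "natal": None,
    "divisional": None,
    "dasha": None,
    "panchang": 24 * 3600,  # assumed: one day
    "transit": 3 * 3600,    # assumed: "a few hours"
}


class ChartCache:
    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable so expiry can be tested

    def get_or_compute(self, kind, key, compute):
        """Return the cached value, calling compute() only on a miss or expiry."""
        entry = self._store.get((kind, key))
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return value  # served locally, no API call made
        value = compute()  # this is where the real API request would happen
        ttl = TTL_SECONDS[kind]
        expires_at = None if ttl is None else self._clock() + ttl
        self._store[(kind, key)] = (value, expires_at)
        return value
```

Here `compute` would be a zero-argument function that performs the actual `/v2/astrology/*` request, so the API is only touched when the cache has nothing fresh to offer.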

Batch and queue bursty workloads

Matrimony platforms running compatibility sweeps, and dashboards refreshing transits for thousands of users, generate spiky load. Rather than firing every request the moment a job starts, push the work through a queue with a bounded concurrency limit tuned to your X-RateLimit-Limit.

Set a concurrency cap below your per-window limit — for example, run 5–10 workers rather than unbounded parallelism.
Spread overnight jobs. A daily transit refresh for a large user base does not need to finish in one minute; pace it across the window and you will never see a 429.
Pause the queue on backpressure. When X-RateLimit-Remaining approaches zero, hold queued work until X-RateLimit-Reset instead of letting workers hammer the wall.
Prefer computation endpoints for bulk. If a sweep only needs scores or yogas, use the /v2/astrology/* compute path; reserve the heavier AI query path for the moments a user actually asks a question.
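The queueing advice above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `RateGate` and `run_batch` are hypothetical names, and `call` stands in for your own function that sends one request and returns the result together with the values read from X-RateLimit-Remaining and X-RateLimit-Reset.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateGate:
    """Shared pause switch: workers wait here while the window is exhausted."""

    def __init__(self, clock=time.time, sleep=time.sleep):
        self._lock = threading.Lock()
        self._resume_at = 0.0
        self._clock = clock
        self._sleep = sleep

    def wait(self):
        """Block until the recorded reset time has passed."""
        while True:
            with self._lock:
                delay = self._resume_at - self._clock()
            if delay <= 0:
                return
            self._sleep(delay)

    def update(self, remaining, reset_at, floor=1):
        """Record a pause when the remaining budget is at or below `floor`."""
        if remaining <= floor:
            with self._lock:
                self._resume_at = max(self._resume_at, reset_at)


def run_batch(jobs, call, max_workers=5, gate=None):
    """Run call(job) for every job with at most `max_workers` in flight.

    `call` must return (result, remaining, reset_at).
    """
    gate = gate or RateGate()

    def worker(job):
        gate.wait()  # honour any pause set by another worker
        result, remaining, reset_at = call(job)
        gate.update(remaining, reset_at)
        return result

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, jobs))
```

The thread pool provides the concurrency cap, and the gate provides the backpressure: as soon as one worker sees the budget run out, every worker waits for the reset before sending again. Requests already in flight at that moment can still bounce, so this complements the 429 retry logic rather than replacing it.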

You can prototype all of this against the free sandbox, which needs no API key, so you can validate your backoff and queue logic before a single real request is metered.

Key facts

Rate limits are applied per API key; your vk_live_* key has an isolated budget.
Read X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset on every response; honour Retry-After on a 429.
Retry only 429 and transient 5xx, using exponential backoff with jitter. Never retry 400, 401, or 402.
A 402 means an exhausted wallet, not a rate problem — top up or upgrade your plan.
Cache immutable computed data (natal and divisional charts, dashas) permanently; cache transits and panchang with a short TTL.
Plan tiers carry both wider request windows and larger wallets: Starter $12, Professional $60, Business $120, Enterprise $240.
Computation endpoints (/v2/astrology/*) are lighter than the AI answer path (/api/v1/astrology/query); batch bulk work through the compute path.

FAQ

What HTTP status does the Vedika API return when I exceed the rate limit?

A 429 Too Many Requests, accompanied by a Retry-After header indicating how many seconds to wait. The X-RateLimit-* headers on every response let you back off before you ever reach that point.

How is a 429 different from a 402?

A 429 means you sent too many requests too quickly — wait and retry. A 402 means your wallet balance is insufficient to cover the query — retrying will not help; add funds or move to a higher plan. They are separate ceilings.

Does streaming count differently against my rate limit?

One streaming connection to /api/v1/astrology/query/stream counts as a single request, not one per token or SSE event. Streaming changes how the answer is delivered, not how the request is metered.

How do I avoid hitting limits on a large batch job?

Cache the immutable computed layer so repeat charts never hit the API, run the job through a queue with bounded concurrency tuned to your X-RateLimit-Limit, and use the lighter /v2/astrology/* computation endpoints for bulk work. Pace overnight refreshes across the window rather than firing them all at once.

Can I test my retry logic without spending money?

Yes. The free sandbox at vedika.io/sandbox exposes mock endpoints with no API key required, so you can validate backoff, jitter, and queue behaviour before any real request is metered. See the API docs for the full endpoint reference.