Voice API

Audio-in, audio-out astrology. Send a spoken question, receive a spoken answer — one API call handles speech recognition, AI reasoning, and speech synthesis.

Production-ready. Both /api/v1/voice (buffered) and /api/v1/voice/stream (SSE streaming) are live at api.vedika.io.

How It Works

1. Upload audio — POST a multipart form with the caller's recorded audio (webm, mp3, wav, m4a, ogg, aac, or flac; max 25 MB).
2. Speech-to-text — Vedika AI transcribes the audio and detects the spoken language automatically.
3. AI astrology — The transcribed question is routed through the Vedika Intelligence pipeline (fast mode) with birth chart context, yielding a personalized astrology response.
4. Text-to-speech — The response is synthesized into natural-sounding speech (MP3) in the detected language.
5. Audio response — The MP3 binary is returned directly (or streamed as base64 SSE events on the streaming endpoint).

Endpoints

POST /api/v1/voice

Buffered voice query. Accepts multipart audio, returns a complete MP3 file as the response body (audio/mpeg).

Best for: mobile apps, IVR systems, any client that plays audio after full download.

POST /api/v1/voice/stream

Streaming voice query (SSE). Same multipart input, but the response is a text/event-stream that emits audio chunks as they are generated — sub-200ms time-to-first-audio on the Jarvis tier.

Best for: real-time voice assistants, conversational UIs, and any client that can play audio progressively.

Authentication

Authenticate with your Vedika API key in the Authorization header:

Authorization: Bearer vk_live_your_api_key

Voice requires a live API key (vk_live_* or vk_ent_*). Test keys (vk_test_*) are not accepted. A minimum wallet balance of $0.15 is required per call.
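A minimal client-side guard for the key rules above can save a round trip. The helper below is ours (not part of any Vedika SDK); it only encodes the stated rule that Voice accepts vk_live_* and vk_ent_* keys and rejects vk_test_*:

```python
def is_voice_capable_key(api_key: str) -> bool:
    """Hypothetical pre-flight check: Voice accepts only live keys
    (vk_live_* or vk_ent_*); test keys (vk_test_*) are rejected."""
    return api_key.startswith(("vk_live_", "vk_ent_"))

print(is_voice_capable_key("vk_live_abc123"))  # True
print(is_voice_capable_key("vk_test_abc123"))  # False
```

Note this does not check the $0.15 minimum wallet balance — that is enforced server-side (402 INSUFFICIENT_BALANCE).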

Request Format

Content-Type: multipart/form-data

Field | Type | Required | Description
audio | File | required | Audio file. Max 25 MB. Accepted formats: webm, mp3, wav, m4a, ogg, aac, flac.
birthDetails | JSON string | optional | Birth data for personalized chart-based answers. Also accepts birth_details (snake_case alias). Example: {"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}
language | String | optional | Hint for speech recognition. ISO 639-1 code: en, hi, ta, te, kn, ml, bn, gu, mr, etc. Auto-detected if omitted.
speed | String | optional | Must be "fast" or omitted. Voice only supports fast mode. Sending "standard" returns a 400 error.
tier | String | optional | Voice quality tier. One of: vedika-native, vedika-standard, vedika-jarvis. Auto-selected by language if omitted. See Voice Tiers.
conversationId | String | optional | Pass a previous conversation ID for multi-turn follow-up questions. The AI will reference prior context.
signal | String | optional | Deprecated; use tier instead. Backward compat: "b2c"/"free" → vedika-native, "jarvis" → vedika-jarvis. New integrations should set tier directly.
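As a sketch of assembling these fields, the helper below builds the non-file form data for a request. The function name and its argument names are ours; it serializes birthDetails to a JSON string as required and rejects the one speed value the API refuses with a 400:

```python
import json

def build_voice_form(birth_details=None, language=None,
                     tier=None, speed="fast", conversation_id=None):
    """Hypothetical helper assembling the non-file form fields for
    POST /api/v1/voice. Fails early on the value the API rejects
    with 400 VOICE_REQUIRES_FAST_MODE."""
    if speed != "fast":
        raise ValueError('Voice only supports speed="fast"')
    form = {"speed": speed}
    if birth_details is not None:
        form["birthDetails"] = json.dumps(birth_details)  # must be a JSON string
    if language:
        form["language"] = language          # ISO 639-1 hint, e.g. "hi"
    if tier:
        form["tier"] = tier                  # e.g. "vedika-native"
    if conversation_id:
        form["conversationId"] = conversation_id
    return form
```

Pass the result as the `data` argument alongside the `files={"audio": ...}` part, as in the Python example further below.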

Voice Tiers

Three B2B voice tiers, all available on Business ($120/mo) and Enterprise ($240/mo) plans. Default is vedika-standard if tier is omitted. Starter / Pro plans receive 403 VOICE_PLAN_REQUIRED.

Tier | Cost / call | Latency | Languages | Best for
vedika-standard (default) | $0.072 | ~1s | 15+ (Hindi, English, Tamil, Telugu, Kannada, Bengali, Gujarati, Marathi, Malayalam, Punjabi, and more) | Balanced quality + latency. Use this unless you have a specific reason to pick another tier.
vedika-native (budget) | $0.040 | ~800ms | 600+ via audio-native pipeline | High-volume apps where cost-per-call matters more than the last 10% of audio polish. Skips the transcription step entirely.
vedika-jarvis (real-time) | $0.080 | <500ms voice-to-voice | 10 languages: hi/en/ta/te/gu/mr/bn/kn/ml/pa | Live assistants, IVR, in-call agents. Use with POST /api/v1/voice/stream for SSE streaming audio chunks.

Rate limits (separate from text query limits — voice costs 3–10× more):

  • Business ($120/mo): 30 calls/min · 2,000 calls/day
  • Enterprise ($240/mo): 100 calls/min · 10,000 calls/day

On breach: 429 Too Many Requests with a Retry-After header. Every response carries the headers X-Vedika-Voice-RateLimit-Minute, X-Vedika-Voice-RateLimit-Remaining-Minute, X-Vedika-Voice-RateLimit-Day, and X-Vedika-Voice-RateLimit-Remaining-Day.

All prices above are per-call. Exact cost per call is returned in the response X-Vedika-Voice-Meta header (base64 JSON) as costUsd. AI reasoning is billed separately for vedika-standard and vedika-jarvis; vedika-native is a single flat-rate call.
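For budgeting, the published per-call prices can be turned into a rough monthly floor. The table and function below are our own sketch using the prices from the tier table; remember that vedika-standard and vedika-jarvis bill AI inference on top, so the estimate is a lower bound for those tiers:

```python
# Published per-call prices from the tier table above (USD).
TIER_PRICE_USD = {
    "vedika-native": 0.040,
    "vedika-standard": 0.072,
    "vedika-jarvis": 0.080,
}

def estimate_voice_spend(calls_per_day: int, tier: str = "vedika-standard") -> float:
    """Rough 30-day floor for voice charges at a given volume.
    Excludes the separately billed AI inference for vedika-standard
    and vedika-jarvis."""
    return round(calls_per_day * 30 * TIER_PRICE_USD[tier], 2)

print(estimate_voice_spend(100, "vedika-native"))  # 120.0
```

For exact figures, always reconcile against the costUsd value returned per call.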

Response: Buffered Endpoint

Success (audio available)

Content-Type: audio/mpeg
The response body is raw MP3 binary. Save it directly as an .mp3 file or pipe it to an audio player.

Response Headers

Header | Description
Content-Type | audio/mpeg
Content-Length | Size of the MP3 in bytes
X-Vedika-Voice-Meta | Base64-encoded JSON with transcription, language, tier, billing, and processing time. Decode with atob() / base64.b64decode().
X-Vedika-Voice-Tier | Public tier label: vedika-native, vedika-standard, or vedika-jarvis
X-Vedika-Voice-Lang | Detected language ISO code (e.g., hi, en)
X-Vedika-Transcription | URL-encoded transcription of the input audio (max 2000 chars)
X-Vedika-Signature | HMAC watermark for response integrity verification

X-Vedika-Voice-Meta (decoded)

{
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "tierSource": "auto",
  "processingMs": 4200,
  "sttDurationSec": 3.2,
  "ttsDurationSec": 12.5,
  "engine": "vedika-voice",
  "costUsd": 0.072000
}
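Decoding the two encoded headers is a one-liner each: X-Vedika-Voice-Meta is base64 JSON, X-Vedika-Transcription is URL-encoded. The sample value below is built locally for illustration, not a real API response:

```python
import base64
import json
from urllib.parse import unquote

def decode_voice_meta(meta_b64: str) -> dict:
    """Decode the base64 JSON carried in the X-Vedika-Voice-Meta header."""
    return json.loads(base64.b64decode(meta_b64))

# Locally built sample header value for demonstration:
sample = base64.b64encode(json.dumps(
    {"transcription": "What does my chart say?", "costUsd": 0.072}
).encode()).decode()

meta = decode_voice_meta(sample)
print(meta["costUsd"])  # 0.072

# X-Vedika-Transcription is URL-encoded, not base64:
print(unquote("What%20about%20career%3F"))  # What about career?
```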

Fallback (TTS failed)

If speech synthesis fails, the endpoint degrades gracefully to a JSON response with audio: null and the text answer:

{
  "success": true,
  "audio": null,
  "response": "Your chart shows a strong period for career growth...",
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "billing": { ... }
}
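Because the buffered endpoint can return either raw MP3 or this JSON fallback, clients should branch on Content-Type before touching the body. A minimal sketch (the helper name is ours, not part of the API):

```python
import json

def handle_buffered_response(content_type: str, body: bytes):
    """Sketch of the two buffered outcomes documented above:
    audio/mpeg is raw MP3; application/json with audio null means
    TTS failed and only the text answer is available."""
    if content_type.startswith("audio/mpeg"):
        return "audio", body                     # write straight to an .mp3 file
    payload = json.loads(body)
    if payload.get("audio") is None:
        return "text-only", payload["response"]  # degrade to the text answer
    return "unknown", payload
```

With requests, the equivalent check is `resp.headers["Content-Type"].startswith("audio/mpeg")`.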

Response: Streaming Endpoint

Content-Type: text/event-stream (Server-Sent Events). The stream emits the following event types in order:

event: started

Fired after successful speech recognition. Contains the transcription and detected language.

data: {"transcription":"What about my marriage?","language":"hi","tier":"vedika-jarvis","sttMs":820}
event: text

The complete AI-generated text answer. Emitted as a single frame (the AI pipeline returns the full answer, not token-by-token).

data: {"delta":"Your Venus is exalted in Pisces...","done":true}
event: audio

Base64-encoded MP3 chunks. Multiple audio events are emitted in sequence. Decode each chunk and append to a buffer or MediaSource for progressive playback.

data: {"bytesBase64":"//uQxAAAAAANIAAAAAExBTUUzLjEw...","seq":0}
data: {"bytesBase64":"AAAAIGZ0eXBpc29t...","seq":1}
data: {"bytesBase64":"...","seq":2}
event: completed

Final event with billing summary and total processing time.

data: {"processingMs":2400,"sttDurationSec":1.2,"ttsDurationSec":8.5,"totalChunks":14,"costUsd":0.080000}
event: error

Emitted on failure at any stage. The stream closes after this event.

data: {"code":"STT_FAILED","message":"Could not transcribe the submitted audio."}
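The event framing above is plain SSE: an `event:` line names the type and the following `data:` line carries JSON. A toy parser over a complete stream text illustrates the pairing (a production client must also buffer partial lines across network reads, as the streaming examples below do):

```python
import json

def parse_sse(text: str):
    """Toy parser: pairs each `data:` line with the most recent
    `event:` line and yields (event, payload) tuples."""
    event = None
    for line in text.splitlines():
        if line.startswith("event: "):
            event = line[len("event: "):]
        elif line.startswith("data: "):
            yield event, json.loads(line[len("data: "):])

stream = (
    "event: started\n"
    'data: {"transcription":"What about my marriage?","language":"hi"}\n'
    "event: audio\n"
    'data: {"bytesBase64":"AAAA","seq":0}\n'
)
events = list(parse_sse(stream))
print(events[0][0])  # started
```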

Error Codes

HTTP | Code | Description
400 | NO_AUDIO | The audio multipart field is missing or empty.
400 | VOICE_REQUIRES_FAST_MODE | speed was set to something other than "fast". Voice only supports fast mode.
400 | BAD_BIRTH_DETAILS_JSON | birthDetails is not valid JSON.
401 | NO_API_KEY | Missing or invalid API key.
402 | INSUFFICIENT_BALANCE | Wallet balance is below $0.15. Top up via the dashboard.
403 | VOICE_PLAN_REQUIRED | Voice requires a Business or Enterprise plan (see Voice Tiers).
422 | STT_FAILED | Speech recognition failed. The audio may be corrupted, silent, or in an unsupported format.
502 | LLM_FAILED | AI pipeline failed after successful transcription. The transcription field is included so the client can retry via the text API.
502 | EMPTY_LLM | AI returned an empty response. Rare; a retry typically resolves it.
500 | VOICE_INTERNAL | Unexpected server error.

On the streaming endpoint, errors are emitted as SSE event: error instead of HTTP status codes (since the stream starts with HTTP 200). Check the code field in the error event data.
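On the buffered endpoint, a simple retry policy falls out of the table: honor Retry-After on 429, retry once on the transient 502s, and treat everything else as final. This is our suggested policy, not an API contract:

```python
def should_retry(status, headers):
    """Sketch of a buffered-endpoint retry policy based on the error
    table above. Returns a delay in seconds, or None for no retry."""
    if status == 429:
        # Rate limited: wait as instructed by the Retry-After header.
        return float(headers.get("Retry-After", 1))
    if status == 502:
        # LLM_FAILED / EMPTY_LLM: a single retry typically resolves these.
        return 0.0
    # 4xx client errors and 500 are not worth automatic retries.
    return None
```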

Code Examples

cURL

curl -X POST https://api.vedika.io/api/v1/voice \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F 'birthDetails={"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}' \
  -F "language=hi" \
  -F "speed=fast" \
  -F "tier=vedika-native" \
  --output response.mp3

The --output flag saves the MP3 binary to a file. To also read the metadata header, add -D headers.txt.

JavaScript (Browser / Node.js)

const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('language', 'hi');
form.append('speed', 'fast');
form.append('tier', 'vedika-native');

const res = await fetch('https://api.vedika.io/api/v1/voice', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

// Play the audio
const audioBuffer = await res.arrayBuffer();
const audio = new Audio(URL.createObjectURL(
  new Blob([audioBuffer], { type: 'audio/mpeg' })
));
audio.play();

// Read metadata from header
const meta = JSON.parse(atob(
  res.headers.get('X-Vedika-Voice-Meta')
));
console.log('Transcription:', meta.transcription);
console.log('Cost:', meta.costUsd);

JavaScript (Streaming)

const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('tier', 'vedika-jarvis');
form.append('speed', 'fast');

const res = await fetch('https://api.vedika.io/api/v1/voice/stream', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let eventType = ''; // persists across reads: an event line and its data line may arrive in different chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Parse SSE events
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep incomplete line

  for (const line of lines) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7);
    } else if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));

      switch (eventType) {
        case 'started':
          console.log('Transcribed:', data.transcription);
          break;
        case 'text':
          console.log('Answer:', data.delta);
          break;
        case 'audio': {
          // Decode base64 and queue for playback
          const bytes = Uint8Array.from(
            atob(data.bytesBase64), c => c.charCodeAt(0)
          );
          // Append to MediaSource or Web Audio buffer
          break;
        }
        case 'completed':
          console.log('Done in', data.processingMs, 'ms');
          break;
        case 'error':
          console.error('Voice error:', data.code);
          break;
      }
    }
  }
}

Python

import requests
import base64
import json

url = "https://api.vedika.io/api/v1/voice"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

birth = json.dumps({
    "datetime": "1992-08-20T14:30:00",
    "latitude": 12.97,
    "longitude": 77.59,
    "timezone": "Asia/Kolkata"
})

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {
        "birthDetails": birth,
        "language": "hi",
        "speed": "fast",
        "tier": "vedika-native"
    }
    resp = requests.post(url, headers=headers, files=files, data=data)

if resp.status_code == 200:
    # Save audio
    with open("response.mp3", "wb") as out:
        out.write(resp.content)

    # Read metadata
    meta_b64 = resp.headers.get("X-Vedika-Voice-Meta", "")
    if meta_b64:
        meta = json.loads(base64.b64decode(meta_b64))
        print("Transcription:", meta["transcription"])
        print("Cost: $", meta["costUsd"])
else:
    print("Error:", resp.status_code, resp.json())

Python (Streaming)

import requests
import json
import base64

url = "https://api.vedika.io/api/v1/voice/stream"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {"tier": "vedika-jarvis", "speed": "fast"}
    resp = requests.post(url, headers=headers, files=files,
                         data=data, stream=True)

audio_chunks = []
event_type = None  # set by each "event: " line before its "data: " line

for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue
    if line.startswith("event: "):
        event_type = line[7:]
    elif line.startswith("data: "):
        payload = json.loads(line[6:])

        if event_type == "started":
            print("Transcribed:", payload["transcription"])
        elif event_type == "text":
            print("Answer:", payload["delta"][:100], "...")
        elif event_type == "audio":
            chunk = base64.b64decode(payload["bytesBase64"])
            audio_chunks.append(chunk)
        elif event_type == "completed":
            print(f"Done: {payload['processingMs']}ms, "
                  f"{payload['totalChunks']} chunks")
        elif event_type == "error":
            print("Error:", payload["code"], payload.get("message"))

# Save assembled audio
with open("response.mp3", "wb") as f:
    f.write(b"".join(audio_chunks))
print(f"Saved {len(audio_chunks)} chunks to response.mp3")

Supported Languages

Language | Code | Available Tiers
English | en | All tiers
Hindi | hi | All tiers
Tamil | ta | vedika-standard
Telugu | te | vedika-standard
Kannada | kn | vedika-standard
Malayalam | ml | vedika-standard
Bengali | bn | vedika-standard
Gujarati | gu | vedika-standard
Marathi | mr | vedika-standard
Punjabi | pa | vedika-standard
Odia | or | vedika-standard
Urdu | ur | vedika-standard
Arabic | ar | vedika-standard
Russian | ru | vedika-standard
Spanish | es | vedika-standard
French | fr | vedika-standard
German | de | vedika-standard
Chinese | zh | vedika-standard
Japanese | ja | vedika-standard
Korean | ko | vedika-standard
Thai | th | vedika-standard

Speech recognition supports 50+ languages. If the language hint is omitted, the system auto-detects from the audio. For best accuracy with short clips, provide the language hint.

Billing

Voice calls are deducted from your wallet balance after the response is generated. Each tier has a flat per-call price (see Voice Tiers); vedika-standard and vedika-jarvis additionally incur standard Vedika Intelligence inference charges for the AI answer, while vedika-native is a single flat-rate call.

The total charged amount is returned as costUsd in the X-Vedika-Voice-Meta header (buffered endpoint) or in the completed SSE event (streaming endpoint).

The deducted amount also appears in your wallet history in the dashboard.

Best Practices

Audio Quality

  • Use WebM at 48kHz for best speech recognition accuracy at small file sizes.
  • MP3 and WAV are fully supported but produce larger uploads.
  • Keep recordings under 60 seconds for optimal response time.

Multi-turn Conversations

  • Save the conversationId from the first response metadata and pass it in subsequent calls.
  • The AI will reference prior questions and answers for contextual follow-ups.
  • Birth details only need to be sent on the first call — they persist in the conversation.

Choosing a Tier

  • High-volume B2C: Use vedika-native (or signal=b2c) for the lowest cost per query.
  • Premium experience: Use vedika-standard for the most natural Hindi and English voices.
  • Multi-language: Use vedika-standard for Tamil, Telugu, Bengali, and other Indic languages.
  • Real-time assistant: Use vedika-jarvis with the /voice/stream endpoint for sub-200ms time-to-first-audio.
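The guidance above reduces to a small lookup. The mapping below is our own summary of these bullets (the use-case labels are ours, not API values):

```python
def pick_tier(use_case: str) -> str:
    """Hypothetical tier chooser condensing the guidance above."""
    return {
        "high-volume": "vedika-native",      # lowest cost per query
        "premium": "vedika-standard",        # most natural voice
        "multi-language": "vedika-standard", # Tamil, Telugu, Bengali, ...
        "real-time": "vedika-jarvis",        # pair with /voice/stream
    }.get(use_case, "vedika-standard")       # the API default
```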

Error Handling

  • Always check for audio: null in buffered responses — it means TTS failed but the text answer is available.
  • On the streaming endpoint, handle the error event gracefully and close the EventSource.
  • If LLM_FAILED is returned, the transcription field contains the recognized text — retry via the text /query endpoint as a fallback.

Get Started

Sign up for an API key and start building voice-powered astrology experiences.
