Voice API
Audio-in, audio-out astrology. Send a spoken question, receive a spoken answer — one API call handles speech recognition, AI reasoning, and speech synthesis.
Production-ready. Both /api/v1/voice (buffered) and /api/v1/voice/stream (SSE streaming) are live at api.vedika.io.
Endpoints
POST /api/v1/voice
Buffered voice query. Accepts multipart audio, returns a complete MP3 file as the response body (audio/mpeg).
Best for: mobile apps, IVR systems, any client that plays audio after full download.
POST /api/v1/voice/stream
Streaming voice query (SSE). Same multipart input, but the response is a text/event-stream that emits audio chunks as they are generated — sub-200ms time-to-first-audio on the Jarvis tier.
Best for: real-time voice assistants, conversational UIs, and any client that can play audio progressively.
Authentication
Authenticate with your Vedika API key in the `Authorization` header:

```
Authorization: Bearer vk_live_your_api_key
```
Voice requires a live API key (vk_live_* or vk_ent_*). Test keys (vk_test_*) are not accepted. A minimum wallet balance of $0.15 is required per call.
Request Format
Content-Type: `multipart/form-data`

| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | REQUIRED | Audio file. Max 25 MB. Accepted formats: webm, mp3, wav, m4a, ogg, aac, flac. |
| `birthDetails` | JSON string | optional | Birth data for personalized chart-based answers. Also accepts `birth_details` (snake_case alias). Example: `{"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}` |
| `language` | String | optional | Hint for speech recognition. ISO 639-1 code: en, hi, ta, te, kn, ml, bn, gu, mr, etc. Auto-detected if omitted. |
| `speed` | String | optional | Must be `"fast"` or omitted. Voice only supports fast mode; sending `"standard"` returns a 400 error. |
| `tier` | String | optional | Voice quality tier. One of: `vedika-native`, `vedika-standard`, `vedika-jarvis`. Auto-selected by language if omitted. See Voice Tiers. |
| `conversationId` | String | optional | Pass a previous conversation ID for multi-turn follow-up questions. The AI will reference prior context. |
| `signal` | String | optional | Deprecated; use `tier` instead. For backward compatibility: `"b2c"`/`"free"` → `vedika-native`, `"jarvis"` → `vedika-jarvis`. New integrations should set `tier` directly. |
Voice Tiers
Three B2B voice tiers, all available on Business ($120/mo) and Enterprise ($240/mo) plans. Default is vedika-standard if tier is omitted. Starter / Pro plans receive 403 VOICE_PLAN_REQUIRED.
| Tier | Cost / call | Latency | Languages | Best for |
|---|---|---|---|---|
| `vedika-standard` (default) | $0.072 | ~1s | 15+ (Hindi, English, Tamil, Telugu, Kannada, Bengali, Gujarati, Marathi, Malayalam, Punjabi, and more) | Balanced quality and latency. Use this unless you have a specific reason to pick another tier. |
| `vedika-native` (budget) | $0.040 | ~800ms | 600+ via audio-native pipeline | High-volume apps where cost per call matters more than the last 10% of audio polish. Skips the transcription step entirely. |
| `vedika-jarvis` (real-time) | $0.080 | <500ms voice-to-voice | 10 Indic languages (hi, en, ta, te, gu, mr, bn, kn, ml, pa) | Live assistants, IVR, in-call agents. Use with POST /api/v1/voice/stream for SSE streaming audio chunks. |
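A quick way to compare tiers is to estimate monthly spend from the flat per-call prices in the table above. A rough sketch (`estimate_monthly_cost` is an illustrative helper, not part of the API; note that AI inference on `vedika-standard` / `vedika-jarvis` is billed separately):

```python
# Flat per-call prices from the tier table above (USD)
TIER_PRICE_USD = {
    "vedika-standard": 0.072,
    "vedika-native": 0.040,
    "vedika-jarvis": 0.080,
}

def estimate_monthly_cost(calls_per_day, tier):
    """Rough voice spend over a 30-day month at the flat per-call rate."""
    return round(calls_per_day * 30 * TIER_PRICE_USD[tier], 2)
```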
Rate limits (separate from text query limits — voice costs 3–10× more):
- Business ($120/mo): 30 calls/min · 2,000 calls/day
- Enterprise ($240/mo): 100 calls/min · 10,000 calls/day
On breach: 429 Too Many Requests with Retry-After header. Headers X-Vedika-Voice-RateLimit-Minute, X-Vedika-Voice-RateLimit-Remaining-Minute, X-Vedika-Voice-RateLimit-Day, X-Vedika-Voice-RateLimit-Remaining-Day on every response.
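A client-side backoff sketch for these limits (the 429 status and `Retry-After` header are from this doc; the helper names and retry policy are illustrative):

```python
import time

def retry_wait_seconds(headers, attempt):
    """Seconds to wait before retrying: prefer the server's Retry-After
    header, else exponential backoff (1s, 2s, 4s, ...)."""
    retry_after = headers.get("Retry-After")
    return int(retry_after) if retry_after is not None else 2 ** attempt

def post_with_backoff(do_post, max_retries=3):
    """do_post() performs the HTTP call (e.g. a requests.post wrapper) and
    returns a response with .status_code and .headers. Retries on 429."""
    for attempt in range(max_retries):
        resp = do_post()
        if resp.status_code != 429:
            return resp
        time.sleep(retry_wait_seconds(resp.headers, attempt))
    return do_post()  # final attempt after exhausting retries
```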
All prices above are per-call. Exact cost per call is returned in the response X-Vedika-Voice-Meta header (base64 JSON) as costUsd. AI reasoning is billed separately for vedika-standard and vedika-jarvis; vedika-native is a single flat-rate call.
Response: Buffered Endpoint
Success (audio available)
Content-Type: audio/mpeg
The response body is raw MP3 binary. Save it directly as an .mp3 file or pipe it to an audio player.
Response Headers
| Header | Description |
|---|---|
Content-Type | audio/mpeg |
Content-Length | Size of the MP3 in bytes |
X-Vedika-Voice-Meta | Base64-encoded JSON with transcription, language, tier, billing, and processing time. Decode with atob() / base64.b64decode(). |
X-Vedika-Voice-Tier | Public tier label: vedika-native, vedika-standard, or vedika-jarvis |
X-Vedika-Voice-Lang | Detected language ISO code (e.g., hi, en) |
X-Vedika-Transcription | URL-encoded transcription of the input audio (max 2000 chars) |
X-Vedika-Signature | HMAC watermark for response integrity verification |
X-Vedika-Voice-Meta (decoded)
```json
{
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "tierSource": "auto",
  "processingMs": 4200,
  "sttDurationSec": 3.2,
  "ttsDurationSec": 12.5,
  "engine": "vedika-voice",
  "costUsd": 0.072000
}
```
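Decoding the base64 metadata header and the URL-encoded transcription header needs only the standard library (helper names are illustrative):

```python
import base64
import json
from urllib.parse import unquote

def decode_voice_meta(meta_b64):
    """Decode the base64-encoded JSON in X-Vedika-Voice-Meta."""
    return json.loads(base64.b64decode(meta_b64))

def decode_transcription(header_value):
    """Decode the URL-encoded X-Vedika-Transcription header."""
    return unquote(header_value)
```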
Fallback (TTS failed)
If speech synthesis fails, the endpoint degrades gracefully to a JSON response with audio: null and the text answer:
```json
{
  "success": true,
  "audio": null,
  "response": "Your chart shows a strong period for career growth...",
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "billing": { ... }
}
```
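Since the buffered endpoint can answer with either MP3 bytes or this fallback JSON, clients should branch on the Content-Type. A minimal sketch (the helper name and return convention are illustrative):

```python
import json

def handle_voice_response(content_type, body_bytes):
    """Return ("audio", mp3_bytes) on success, ("text", answer) when TTS
    failed but a text answer is present, or ("error", payload) otherwise."""
    if content_type.startswith("audio/mpeg"):
        return ("audio", body_bytes)
    payload = json.loads(body_bytes)
    if payload.get("success") and payload.get("audio") is None:
        # Graceful degradation: TTS failed, text answer still usable
        return ("text", payload["response"])
    return ("error", payload)
```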
Response: Streaming Endpoint
Content-Type: text/event-stream (Server-Sent Events). The stream emits the following event types in order:
event: started
Fired after successful speech recognition. Contains the transcription and detected language.
data: {"transcription":"What about my marriage?","language":"hi","tier":"vedika-jarvis","sttMs":820}
event: text
The complete AI-generated text answer. Emitted as a single frame (the AI pipeline returns the full answer, not token-by-token).
data: {"delta":"Your Venus is exalted in Pisces...","done":true}
event: audio
Base64-encoded MP3 chunks. Multiple audio events are emitted in sequence. Decode each chunk and append to a buffer or MediaSource for progressive playback.
data: {"bytesBase64":"//uQxAAAAAANIAAAAAExBTUUzLjEw...","seq":0}
data: {"bytesBase64":"AAAAIGZ0eXBpc29t...","seq":1}
data: {"bytesBase64":"...","seq":2}
event: completed
Final event with billing summary and total processing time.
data: {"processingMs":2400,"sttDurationSec":1.2,"ttsDurationSec":8.5,"totalChunks":14,"costUsd":0.080000}
event: error
Emitted on failure at any stage. The stream closes after this event.
data: {"code":"STT_FAILED","message":"Could not transcribe the submitted audio."}
Error Codes
| HTTP | Code | Description |
|---|---|---|
| 400 | NO_AUDIO | The audio multipart field is missing or empty. |
| 400 | VOICE_REQUIRES_FAST_MODE | speed was set to something other than "fast". Voice only supports fast mode. |
| 400 | BAD_BIRTH_DETAILS_JSON | birthDetails is not valid JSON. |
| 401 | NO_API_KEY | Missing or invalid API key. |
| 402 | INSUFFICIENT_BALANCE | Wallet balance is below $0.15. Top up via the dashboard. |
| 422 | STT_FAILED | Speech recognition failed. The audio may be corrupted, silent, or in an unsupported format. |
| 502 | LLM_FAILED | AI pipeline failed after successful transcription. The transcription field is included so the client can retry via the text API. |
| 502 | EMPTY_LLM | AI returned an empty response. Rare; retry typically resolves it. |
| 500 | VOICE_INTERNAL | Unexpected server error. |
On the streaming endpoint, errors are emitted as SSE event: error instead of HTTP status codes (since the stream starts with HTTP 200). Check the code field in the error event data.
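The LLM_FAILED case can be turned into an automatic fallback to the text API, since the error body includes the transcription. A sketch (the /query payload shape below is an assumption; check the text API docs for the real field names):

```python
def text_fallback_payload(error_json, birth_details=None):
    """Build a retry payload for the text /query endpoint from an
    LLM_FAILED error body, which includes the recognized transcription."""
    if error_json.get("code") != "LLM_FAILED":
        return None  # only this error carries a usable transcription
    payload = {"question": error_json["transcription"]}
    if birth_details:
        payload["birthDetails"] = birth_details
    return payload
```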
Code Examples
cURL
```bash
curl -X POST https://api.vedika.io/api/v1/voice \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@question.webm" \
  -F 'birthDetails={"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}' \
  -F "language=hi" \
  -F "speed=fast" \
  -F "tier=vedika-native" \
  --output response.mp3
```
The --output flag saves the MP3 binary to a file. To also read the metadata header, add -D headers.txt.
JavaScript (Browser / Node.js)
```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('language', 'hi');
form.append('speed', 'fast');
form.append('tier', 'vedika-native');

const res = await fetch('https://api.vedika.io/api/v1/voice', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

// Play the audio
const audioBuffer = await res.arrayBuffer();
const audio = new Audio(URL.createObjectURL(
  new Blob([audioBuffer], { type: 'audio/mpeg' })
));
audio.play();

// Read metadata from header
const meta = JSON.parse(atob(
  res.headers.get('X-Vedika-Voice-Meta')
));
console.log('Transcription:', meta.transcription);
console.log('Cost:', meta.costUsd);
```
JavaScript (Streaming)

```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('tier', 'vedika-jarvis');
form.append('speed', 'fast');

const res = await fetch('https://api.vedika.io/api/v1/voice/stream', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let eventType = ''; // declared outside the loop so an event/data pair split across chunks is not lost

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Parse SSE events line by line
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the incomplete trailing line
  for (const line of lines) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7);
    } else if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      switch (eventType) {
        case 'started':
          console.log('Transcribed:', data.transcription);
          break;
        case 'text':
          console.log('Answer:', data.delta);
          break;
        case 'audio': {
          // Decode base64 and queue for playback
          const bytes = Uint8Array.from(
            atob(data.bytesBase64), c => c.charCodeAt(0)
          );
          // Append bytes to a MediaSource or Web Audio buffer
          break;
        }
        case 'completed':
          console.log('Done in', data.processingMs, 'ms');
          break;
        case 'error':
          console.error('Voice error:', data.code);
          break;
      }
    }
  }
}
```
Python
```python
import requests
import base64
import json

url = "https://api.vedika.io/api/v1/voice"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

birth = json.dumps({
    "datetime": "1992-08-20T14:30:00",
    "latitude": 12.97,
    "longitude": 77.59,
    "timezone": "Asia/Kolkata"
})

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {
        "birthDetails": birth,
        "language": "hi",
        "speed": "fast",
        "tier": "vedika-native"
    }
    resp = requests.post(url, headers=headers, files=files, data=data)

if resp.status_code == 200:
    # Save audio
    with open("response.mp3", "wb") as out:
        out.write(resp.content)
    # Read metadata
    meta_b64 = resp.headers.get("X-Vedika-Voice-Meta", "")
    if meta_b64:
        meta = json.loads(base64.b64decode(meta_b64))
        print("Transcription:", meta["transcription"])
        print("Cost: $", meta["costUsd"])
else:
    print("Error:", resp.status_code, resp.json())
```
Python (Streaming)

```python
import requests
import json
import base64

url = "https://api.vedika.io/api/v1/voice/stream"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {"tier": "vedika-jarvis", "speed": "fast"}
    resp = requests.post(url, headers=headers, files=files,
                         data=data, stream=True)

audio_chunks = []
event_type = None  # set by each "event:" line before its "data:" line
for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue
    if line.startswith("event: "):
        event_type = line[7:]
    elif line.startswith("data: "):
        payload = json.loads(line[6:])
        if event_type == "started":
            print("Transcribed:", payload["transcription"])
        elif event_type == "text":
            print("Answer:", payload["delta"][:100], "...")
        elif event_type == "audio":
            chunk = base64.b64decode(payload["bytesBase64"])
            audio_chunks.append(chunk)
        elif event_type == "completed":
            print(f"Done: {payload['processingMs']}ms, "
                  f"{payload['totalChunks']} chunks")
        elif event_type == "error":
            print("Error:", payload["code"], payload.get("message"))

# Save assembled audio
with open("response.mp3", "wb") as f:
    f.write(b"".join(audio_chunks))
print(f"Saved {len(audio_chunks)} chunks to response.mp3")
```
Supported Languages
| Language | Code | Available Tiers |
|---|---|---|
| English | en | All tiers |
| Hindi | hi | All tiers |
| Tamil | ta | All tiers |
| Telugu | te | All tiers |
| Kannada | kn | All tiers |
| Malayalam | ml | All tiers |
| Bengali | bn | All tiers |
| Gujarati | gu | All tiers |
| Marathi | mr | All tiers |
| Punjabi | pa | All tiers |
| Odia | or | vedika-native, vedika-standard |
| Urdu | ur | vedika-native, vedika-standard |
| Arabic | ar | vedika-native, vedika-standard |
| Russian | ru | vedika-native, vedika-standard |
| Spanish | es | vedika-native, vedika-standard |
| French | fr | vedika-native, vedika-standard |
| German | de | vedika-native, vedika-standard |
| Chinese | zh | vedika-native, vedika-standard |
| Japanese | ja | vedika-native, vedika-standard |
| Korean | ko | vedika-native, vedika-standard |
| Thai | th | vedika-native, vedika-standard |
Speech recognition supports 50+ languages. If the language hint is omitted, the system auto-detects from the audio. For best accuracy with short clips, provide the language hint.
Billing
Voice calls are deducted from your wallet balance after the response is generated. Each tier has a flat per-call price (see Voice Tiers); `vedika-standard` and `vedika-jarvis` additionally incur standard Vedika Intelligence inference charges for the AI answer, while `vedika-native` is a single flat-rate call.
The total charged amount is returned as `costUsd` in the X-Vedika-Voice-Meta header (buffered endpoint) or in the `completed` SSE event (streaming endpoint).
The deducted amount is also reflected in your dashboard wallet balance.
Best Practices
Audio Quality
- Use WebM at 48kHz for best speech recognition accuracy at small file sizes.
- MP3 and WAV are fully supported but produce larger uploads.
- Keep recordings under 60 seconds for optimal response time.
Multi-turn Conversations
- Save the `conversationId` from the first response metadata and pass it in subsequent calls.
- The AI will reference prior questions and answers for contextual follow-ups.
- Birth details only need to be sent on the first call; they persist in the conversation.
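The multi-turn flow above can be sketched as a small helper (stdlib only; the assumption that the decoded X-Vedika-Voice-Meta JSON carries a `conversationId` key follows from the description above, but verify it against a real response):

```python
import base64
import json

def next_turn_fields(meta_b64, followup_fields):
    """Carry the conversationId from a previous call's X-Vedika-Voice-Meta
    header into the form fields of the next call."""
    meta = json.loads(base64.b64decode(meta_b64))
    fields = dict(followup_fields)
    conversation_id = meta.get("conversationId")
    if conversation_id:
        fields["conversationId"] = conversation_id
    return fields
```

Pass the returned dict as the form fields of the next call, alongside the new audio file.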
Choosing a Tier
- High-volume B2C: use `vedika-native` (or `signal=b2c`) for the lowest cost per query.
- Premium experience: use `vedika-standard` for the most natural Hindi and English voice.
- Multi-language: use `vedika-standard` for Tamil, Telugu, Bengali, and other Indic languages.
- Real-time assistant: use `vedika-jarvis` with the `/voice/stream` endpoint for sub-200ms time-to-first-audio.
Error Handling
- Always check for `audio: null` in buffered responses; it means TTS failed but the text answer is available.
- On the streaming endpoint, handle the `error` event gracefully and close the stream.
- If `LLM_FAILED` is returned, the `transcription` field contains the recognized text; retry via the text `/query` endpoint as a fallback.