Voice API
Audio-in, audio-out astrology. Send a spoken question, receive a spoken answer — one API call handles speech recognition, AI reasoning, and speech synthesis.
Production-ready. Both /api/v1/voice (buffered) and /api/v1/voice/stream (SSE streaming) are live at api.vedika.io.
Endpoints
POST /api/v1/voice
Buffered voice query. Accepts multipart audio, returns a complete MP3 file as the response body (audio/mpeg).
Best for: mobile apps, IVR systems, any client that plays audio after full download.
POST /api/v1/voice/stream
Streaming voice query (SSE). Same multipart input, but the response is a text/event-stream that emits audio chunks as they are generated — sub-200ms time-to-first-audio on the Jarvis tier.
Best for: real-time voice assistants, conversational UIs, and any client that can play audio progressively.
Authentication
Authenticate with your Vedika API key in the `Authorization` header:

```
Authorization: Bearer vk_live_your_api_key
```
Voice requires a live API key (vk_live_* or vk_ent_*). Test keys (vk_test_*) are not accepted. A minimum wallet balance of $0.15 is required per call.
Request Format
Content-Type: `multipart/form-data`

| Field | Type | Required | Description |
|---|---|---|---|
| `audio` | File | REQUIRED | Audio file. Max 25 MB. Accepted formats: webm, mp3, wav, m4a, ogg, aac, flac. |
| `birthDetails` | JSON string | optional | Birth data for personalized chart-based answers. Also accepts `birth_details` (snake_case alias). Example: `{"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}` |
| `language` | String | optional | Hint for speech recognition. ISO 639-1 code: en, hi, ta, te, kn, ml, bn, gu, mr, etc. Auto-detected if omitted. |
| `speed` | String | optional | Must be `"fast"` or omitted. Voice only supports fast mode; sending `"standard"` returns a 400 error. |
| `tier` | String | optional | Voice quality tier. One of: `vedika-native`, `vedika-standard`, `vedika-jarvis`. Auto-selected by language if omitted. See Voice Tiers. |
| `conversationId` | String | optional | Pass a previous conversation ID for multi-turn follow-up questions. The AI will reference prior context. |
| `signal` | String | optional | Deprecated; use `tier` instead. For backward compatibility: `"b2c"`/`"free"` → `vedika-native`, `"jarvis"` → `vedika-jarvis`. New integrations should set `tier` directly. |
Voice Tiers
Three B2B voice tiers, all available on Business ($120/mo) and Enterprise ($240/mo) plans. Default is vedika-standard if tier is omitted. Starter / Pro plans receive 403 VOICE_PLAN_REQUIRED.
| Tier | Cost / call | Latency | Languages | Best for |
|---|---|---|---|---|
| `vedika-standard` (default) | $0.072 | ~1s | 15+ (Hindi, English, Tamil, Telugu, Kannada, Bengali, Gujarati, Marathi, Malayalam, Punjabi, and more) | Balanced quality and latency. Use this unless you have a specific reason to pick another tier. |
| `vedika-native` (budget) | $0.040 | ~800ms | 600+ via audio-native pipeline | High-volume apps where cost per call matters more than the last 10% of audio polish. Skips the transcription step entirely. |
| `vedika-jarvis` (real-time) | $0.080 | <500ms voice-to-voice | 10 Indic languages (hi, en, ta, te, gu, mr, bn, kn, ml, pa) | Live assistants, IVR, in-call agents. Use with POST /api/v1/voice/stream for SSE streaming audio chunks. |
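A quick way to compare tiers is to estimate monthly spend from the flat per-call prices in the table above. A rough sketch (`estimate_monthly_cost` is an illustrative helper, not part of the API; note that AI inference on `vedika-standard` / `vedika-jarvis` is billed separately):

```python
# Flat per-call prices from the tier table above (USD)
TIER_PRICE_USD = {
    "vedika-standard": 0.072,
    "vedika-native": 0.040,
    "vedika-jarvis": 0.080,
}

def estimate_monthly_cost(calls_per_day, tier):
    """Rough voice spend over a 30-day month at the flat per-call rate."""
    return round(calls_per_day * 30 * TIER_PRICE_USD[tier], 2)
```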
Rate limits (separate from text query limits — voice costs 3–10× more):
- Business ($120/mo): 30 calls/min · 2,000 calls/day
- Enterprise ($240/mo): 100 calls/min · 10,000 calls/day
On breach: 429 Too Many Requests with Retry-After header. Headers X-Vedika-Voice-RateLimit-Minute, X-Vedika-Voice-RateLimit-Remaining-Minute, X-Vedika-Voice-RateLimit-Day, X-Vedika-Voice-RateLimit-Remaining-Day on every response.
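A client-side backoff sketch for these limits (the 429 status and `Retry-After` header are from this doc; the helper names and retry policy are illustrative):

```python
import time

def retry_wait_seconds(headers, attempt):
    """Seconds to wait before retrying: prefer the server's Retry-After
    header, else exponential backoff (1s, 2s, 4s, ...)."""
    retry_after = headers.get("Retry-After")
    return int(retry_after) if retry_after is not None else 2 ** attempt

def post_with_backoff(do_post, max_retries=3):
    """do_post() performs the HTTP call (e.g. a requests.post wrapper) and
    returns a response with .status_code and .headers. Retries on 429."""
    for attempt in range(max_retries):
        resp = do_post()
        if resp.status_code != 429:
            return resp
        time.sleep(retry_wait_seconds(resp.headers, attempt))
    return do_post()  # final attempt after exhausting retries
```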
All prices above are per-call. Exact cost per call is returned in the response X-Vedika-Voice-Meta header (base64 JSON) as costUsd. AI reasoning is billed separately for vedika-standard and vedika-jarvis; vedika-native is a single flat-rate call.
Response: Buffered Endpoint
Success (audio available)
Content-Type: audio/mpeg
The response body is raw MP3 binary. Save it directly as an .mp3 file or pipe it to an audio player.
Response Headers
| Header | Description |
|---|---|
Content-Type | audio/mpeg |
Content-Length | Size of the MP3 in bytes |
X-Vedika-Voice-Meta | Base64-encoded JSON with transcription, language, tier, billing, and processing time. Decode with atob() / base64.b64decode(). |
X-Vedika-Voice-Tier | Public tier label: vedika-native, vedika-standard, or vedika-jarvis |
X-Vedika-Voice-Lang | Detected language ISO code (e.g., hi, en) |
X-Vedika-Transcription | URL-encoded transcription of the input audio (max 2000 chars) |
X-Vedika-Signature | HMAC watermark for response integrity verification |
X-Vedika-Voice-Meta (decoded)
```json
{
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "tierSource": "auto",
  "processingMs": 4200,
  "sttDurationSec": 3.2,
  "ttsDurationSec": 12.5,
  "engine": "vedika-voice",
  "costUsd": 0.072000
}
```
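Decoding the base64 metadata header and the URL-encoded transcription header needs only the standard library (helper names are illustrative):

```python
import base64
import json
from urllib.parse import unquote

def decode_voice_meta(meta_b64):
    """Decode the base64-encoded JSON in X-Vedika-Voice-Meta."""
    return json.loads(base64.b64decode(meta_b64))

def decode_transcription(header_value):
    """Decode the URL-encoded X-Vedika-Transcription header."""
    return unquote(header_value)
```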
Fallback (TTS failed)
If speech synthesis fails, the endpoint degrades gracefully to a JSON response with audio: null and the text answer:
```json
{
  "success": true,
  "audio": null,
  "response": "Your chart shows a strong period for career growth...",
  "transcription": "What does my chart say about career?",
  "language": "en",
  "tier": "vedika-standard",
  "billing": { ... }
}
```
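Since the buffered endpoint can answer with either MP3 bytes or this fallback JSON, clients should branch on the Content-Type. A minimal sketch (the helper name and return convention are illustrative):

```python
import json

def handle_voice_response(content_type, body_bytes):
    """Return ("audio", mp3_bytes) on success, ("text", answer) when TTS
    failed but a text answer is present, or ("error", payload) otherwise."""
    if content_type.startswith("audio/mpeg"):
        return ("audio", body_bytes)
    payload = json.loads(body_bytes)
    if payload.get("success") and payload.get("audio") is None:
        # Graceful degradation: TTS failed, text answer still usable
        return ("text", payload["response"])
    return ("error", payload)
```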
Response: Streaming Endpoint
Content-Type: text/event-stream (Server-Sent Events). The stream emits the following event types in order:
event: started
Fired after successful speech recognition. Contains the transcription and detected language.
data: {"transcription":"What about my marriage?","language":"hi","tier":"vedika-jarvis","sttMs":820}
event: text
The complete AI-generated text answer. Emitted as a single frame (the AI pipeline returns the full answer, not token-by-token).
data: {"delta":"Your Venus is exalted in Pisces...","done":true}
event: audio
Base64-encoded MP3 chunks. Multiple audio events are emitted in sequence. Decode each chunk and append to a buffer or MediaSource for progressive playback.
data: {"bytesBase64":"//uQxAAAAAANIAAAAAExBTUUzLjEw...","seq":0}
data: {"bytesBase64":"AAAAIGZ0eXBpc29t...","seq":1}
data: {"bytesBase64":"...","seq":2}
event: completed
Final event with billing summary and total processing time.
data: {"processingMs":2400,"sttDurationSec":1.2,"ttsDurationSec":8.5,"totalChunks":14,"costUsd":0.080000}
event: error
Emitted on failure at any stage. The stream closes after this event.
data: {"code":"STT_FAILED","message":"Could not transcribe the submitted audio."}
Error Codes
| HTTP | Code | Description |
|---|---|---|
| 400 | NO_AUDIO | The audio multipart field is missing or empty. |
| 400 | VOICE_REQUIRES_FAST_MODE | speed was set to something other than "fast". Voice only supports fast mode. |
| 400 | BAD_BIRTH_DETAILS_JSON | birthDetails is not valid JSON. |
| 401 | NO_API_KEY | Missing or invalid API key. |
| 402 | INSUFFICIENT_BALANCE | Wallet balance is below $0.15. Top up via the dashboard. |
| 422 | STT_FAILED | Speech recognition failed. The audio may be corrupted, silent, or in an unsupported format. |
| 502 | LLM_FAILED | AI pipeline failed after successful transcription. The transcription field is included so the client can retry via the text API. |
| 502 | EMPTY_LLM | AI returned an empty response. Rare; retry typically resolves it. |
| 500 | VOICE_INTERNAL | Unexpected server error. |
On the streaming endpoint, errors are emitted as SSE event: error instead of HTTP status codes (since the stream starts with HTTP 200). Check the code field in the error event data.
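The LLM_FAILED case can be turned into an automatic fallback to the text API, since the error body includes the transcription. A sketch (the /query payload shape below is an assumption; check the text API docs for the real field names):

```python
def text_fallback_payload(error_json, birth_details=None):
    """Build a retry payload for the text /query endpoint from an
    LLM_FAILED error body, which includes the recognized transcription."""
    if error_json.get("code") != "LLM_FAILED":
        return None  # only this error carries a usable transcription
    payload = {"question": error_json["transcription"]}
    if birth_details:
        payload["birthDetails"] = birth_details
    return payload
```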
Code Examples
cURL
```bash
curl -X POST https://api.vedika.io/api/v1/voice \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "audio=@question.webm" \
  -F 'birthDetails={"datetime":"1992-08-20T14:30:00","latitude":12.97,"longitude":77.59,"timezone":"Asia/Kolkata"}' \
  -F "language=hi" \
  -F "speed=fast" \
  -F "tier=vedika-native" \
  --output response.mp3
```
The --output flag saves the MP3 binary to a file. To also read the metadata header, add -D headers.txt.
JavaScript (Browser / Node.js)
```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('language', 'hi');
form.append('speed', 'fast');
form.append('tier', 'vedika-native');

const res = await fetch('https://api.vedika.io/api/v1/voice', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

// Play the audio
const audioBuffer = await res.arrayBuffer();
const audio = new Audio(URL.createObjectURL(
  new Blob([audioBuffer], { type: 'audio/mpeg' })
));
audio.play();

// Read metadata from header
const meta = JSON.parse(atob(
  res.headers.get('X-Vedika-Voice-Meta')
));
console.log('Transcription:', meta.transcription);
console.log('Cost:', meta.costUsd);
```
JavaScript (Streaming)

```javascript
const form = new FormData();
form.append('audio', audioBlob, 'question.webm');
form.append('birthDetails', JSON.stringify({
  datetime: '1992-08-20T14:30:00',
  latitude: 12.97,
  longitude: 77.59,
  timezone: 'Asia/Kolkata'
}));
form.append('tier', 'vedika-jarvis');
form.append('speed', 'fast');

const res = await fetch('https://api.vedika.io/api/v1/voice/stream', {
  method: 'POST',
  headers: { 'Authorization': 'Bearer YOUR_API_KEY' },
  body: form
});

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
let eventType = ''; // declared outside the loop so an event/data pair split across chunks is not lost

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Parse SSE events line by line
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the incomplete trailing line
  for (const line of lines) {
    if (line.startsWith('event: ')) {
      eventType = line.slice(7);
    } else if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      switch (eventType) {
        case 'started':
          console.log('Transcribed:', data.transcription);
          break;
        case 'text':
          console.log('Answer:', data.delta);
          break;
        case 'audio': {
          // Decode base64 and queue for playback
          const bytes = Uint8Array.from(
            atob(data.bytesBase64), c => c.charCodeAt(0)
          );
          // Append bytes to a MediaSource or Web Audio buffer
          break;
        }
        case 'completed':
          console.log('Done in', data.processingMs, 'ms');
          break;
        case 'error':
          console.error('Voice error:', data.code);
          break;
      }
    }
  }
}
```
Python
```python
import requests
import base64
import json

url = "https://api.vedika.io/api/v1/voice"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

birth = json.dumps({
    "datetime": "1992-08-20T14:30:00",
    "latitude": 12.97,
    "longitude": 77.59,
    "timezone": "Asia/Kolkata"
})

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {
        "birthDetails": birth,
        "language": "hi",
        "speed": "fast",
        "tier": "vedika-native"
    }
    resp = requests.post(url, headers=headers, files=files, data=data)

if resp.status_code == 200:
    # Save audio
    with open("response.mp3", "wb") as out:
        out.write(resp.content)
    # Read metadata
    meta_b64 = resp.headers.get("X-Vedika-Voice-Meta", "")
    if meta_b64:
        meta = json.loads(base64.b64decode(meta_b64))
        print("Transcription:", meta["transcription"])
        print("Cost: $", meta["costUsd"])
else:
    print("Error:", resp.status_code, resp.json())
```
Python (Streaming)

```python
import requests
import json
import base64

url = "https://api.vedika.io/api/v1/voice/stream"
headers = {"Authorization": "Bearer YOUR_API_KEY"}

with open("question.webm", "rb") as f:
    files = {"audio": ("question.webm", f, "audio/webm")}
    data = {"tier": "vedika-jarvis", "speed": "fast"}
    resp = requests.post(url, headers=headers, files=files,
                         data=data, stream=True)

audio_chunks = []
event_type = None  # set by each "event:" line before its "data:" line
for line in resp.iter_lines(decode_unicode=True):
    if not line:
        continue
    if line.startswith("event: "):
        event_type = line[7:]
    elif line.startswith("data: "):
        payload = json.loads(line[6:])
        if event_type == "started":
            print("Transcribed:", payload["transcription"])
        elif event_type == "text":
            print("Answer:", payload["delta"][:100], "...")
        elif event_type == "audio":
            chunk = base64.b64decode(payload["bytesBase64"])
            audio_chunks.append(chunk)
        elif event_type == "completed":
            print(f"Done: {payload['processingMs']}ms, "
                  f"{payload['totalChunks']} chunks")
        elif event_type == "error":
            print("Error:", payload["code"], payload.get("message"))

# Save assembled audio
with open("response.mp3", "wb") as f:
    f.write(b"".join(audio_chunks))
print(f"Saved {len(audio_chunks)} chunks to response.mp3")
```
Supported Languages
| Language | Code | Available Tiers |
|---|---|---|
| English | en | All tiers |
| Hindi | hi | All tiers |
| Tamil | ta | All tiers |
| Telugu | te | All tiers |
| Kannada | kn | All tiers |
| Malayalam | ml | All tiers |
| Bengali | bn | All tiers |
| Gujarati | gu | All tiers |
| Marathi | mr | All tiers |
| Punjabi | pa | All tiers |
| Odia | or | vedika-native, vedika-standard |
| Urdu | ur | vedika-native, vedika-standard |
| Arabic | ar | vedika-native, vedika-standard |
| Russian | ru | vedika-native, vedika-standard |
| Spanish | es | vedika-native, vedika-standard |
| French | fr | vedika-native, vedika-standard |
| German | de | vedika-native, vedika-standard |
| Chinese | zh | vedika-native, vedika-standard |
| Japanese | ja | vedika-native, vedika-standard |
| Korean | ko | vedika-native, vedika-standard |
| Thai | th | vedika-native, vedika-standard |
Speech recognition supports 50+ languages. If the language hint is omitted, the system auto-detects from the audio. For best accuracy with short clips, provide the language hint.
Billing
Voice calls are deducted from your wallet balance after the response is generated. Each tier has a flat per-call price (see Voice Tiers); `vedika-standard` and `vedika-jarvis` additionally incur standard Vedika Intelligence inference charges for the AI answer, while `vedika-native` is a single flat-rate call.
The total charged amount is returned as `costUsd` in the X-Vedika-Voice-Meta header (buffered endpoint) or in the `completed` SSE event (streaming endpoint).
The deducted amount is also reflected in your dashboard wallet balance.
Best Practices
Audio Quality
- Use WebM at 48kHz for best speech recognition accuracy at small file sizes.
- MP3 and WAV are fully supported but produce larger uploads.
- Keep recordings under 60 seconds for optimal response time.
Multi-turn Conversations
- Save the `conversationId` from the first response metadata and pass it in subsequent calls.
- The AI will reference prior questions and answers for contextual follow-ups.
- Birth details only need to be sent on the first call; they persist in the conversation.
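The multi-turn flow above can be sketched as a small helper (stdlib only; the assumption that the decoded X-Vedika-Voice-Meta JSON carries a `conversationId` key follows from the description above, but verify it against a real response):

```python
import base64
import json

def next_turn_fields(meta_b64, followup_fields):
    """Carry the conversationId from a previous call's X-Vedika-Voice-Meta
    header into the form fields of the next call."""
    meta = json.loads(base64.b64decode(meta_b64))
    fields = dict(followup_fields)
    conversation_id = meta.get("conversationId")
    if conversation_id:
        fields["conversationId"] = conversation_id
    return fields
```

Pass the returned dict as the form fields of the next call, alongside the new audio file.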
Choosing a Tier
- High-volume B2C: use `vedika-native` (or `signal=b2c`) for the lowest cost per query.
- Premium experience: use `vedika-standard` for the most natural Hindi and English voice.
- Multi-language: use `vedika-standard` for Tamil, Telugu, Bengali, and other Indic languages.
- Real-time assistant: use `vedika-jarvis` with the `/voice/stream` endpoint for sub-200ms time-to-first-audio.
Error Handling
- Always check for `audio: null` in buffered responses; it means TTS failed but the text answer is available.
- On the streaming endpoint, handle the `error` event gracefully and close the stream.
- If `LLM_FAILED` is returned, the `transcription` field contains the recognized text; retry via the text `/query` endpoint as a fallback.