Get Started
Overview
The VDK Service simplifies the development and deployment of voice-enabled applications by exposing voice processing capabilities through REST and WebSocket APIs. It allows you to run your voice logic locally via a standalone binary, making it easier to integrate voice technologies into a wide range of platforms and tech stacks — without being tied to C++ or Java (Android).
Whether you're building for Linux, Windows, or Android, this guide will walk you through the setup, usage patterns, and best practices to help you get the most out of the VDK Service.
Get Started
To generate the VDK Service, follow these steps:
Create a project in VDK-Studio:
Initiate a new custom project and specify the 'VDK Service' interaction mode during project creation.
Integrate technologies:
Integrate the desired technologies into your project and configure them accordingly.
Export your project:
This will generate these directories for Linux/Windows:
.
├── config/
├── data/
└── vdk-service/
For Android, the export is slightly different:
assets/
├── jniLibs/
└── vsdk/
    ├── config/
    └── data/
Run VDK Service
Windows/Linux
Android
We also provide sample code, available in Console → Project Settings → Downloads:
📦 vdk-android-samples-service-application
Loading the configuration
The VDK Service’s configuration can be loaded either via a command-line argument at startup, as shown above, or later through its REST API. For example, below we start the service and then set the configuration with the corresponding REST API route, localhost:39806/v1/configuration-path:
./vdk-service/bin/vdk-service.exe &
curl --header "Content-Type: application/json" \
--request POST \
--data '{"path":"config/vsdk.json"}' \
http://localhost:39806/v1/configuration-path
The configuration path can only be loaded once.
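The same request can be issued from Python. The following is a minimal sketch using only the standard library; the function names `build_configuration_request` and `set_configuration_path` are illustrative and not part of the VDK Service, and the host and port defaults are simply the values used in the curl example.

```python
import json
import urllib.request


def build_configuration_request(config_path, host="localhost", port=39806):
    """Build the POST request that sets the configuration path."""
    url = f"http://{host}:{port}/v1/configuration-path"
    body = json.dumps({"path": config_path}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_configuration_path(config_path, **kwargs):
    """Send the request and return the HTTP status code.

    urllib raises urllib.error.HTTPError if the service answers with an
    HTTP error status.
    """
    request = build_configuration_request(config_path, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

With a running service that has no configuration loaded yet, `set_configuration_path("config/vsdk.json")` sends the same payload as the curl command.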
VDK Service CLI options
| Option | Value range | Default value | Description |
|---|---|---|---|
| | - | - | Prints usage. |
| | [1-65535] | | Port the web server will listen to (HTTP & WebSocket). |
| | [0, 2-255] | | Number of logical cores to use (0 uses all available hardware cores). |
| --websocket-close-delay | [0-65535] | | The number of milliseconds to wait before closing the WebSocket after no data has been sent. |
| | - | | Path to the initial VSDK configuration file to load at startup. |
| | - | | Path to the dynamic libraries. |
| | [off, trace, debug, info, warning, error, critical] | | Minimum log level that can be printed. |
| | - | - | Output version. |
Stop VDK Service
There are two available methods for safely terminating the VDK service:
Using the /quit route:
curl --header "Content-Type: application/json" \
--request POST \
--data '{}' \
http://localhost:39806/v1/quit
Sending SIGINT or SIGTERM to the application.
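When the service is launched from a Python script, the signal-based method can be sketched as follows. This is an illustration under assumptions: `stop_gracefully` and `demo_with_stand_in_child` are hypothetical helper names, and the demo uses a sleeping Python child as a stand-in for the service binary so that it runs anywhere.

```python
import subprocess
import sys


def stop_gracefully(process, timeout=5.0):
    """Ask a child process to terminate and wait for it to exit.

    On POSIX, Popen.terminate() sends SIGTERM. On Windows it calls
    TerminateProcess instead, so the /quit route is the safer choice there.
    """
    process.terminate()
    try:
        return process.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Last resort: the process did not exit in time, so force it to stop.
        process.kill()
        return process.wait()


def demo_with_stand_in_child():
    """Start a stand-in child process and stop it, returning its exit code."""
    # Replace this command with the path to the vdk-service binary.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    return stop_gracefully(child)
```

The forced `kill()` is only a fallback for a process that does not exit within the timeout; it is not one of the two safe methods listed above.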
Streaming and receiving audio
To stream audio, you first need to retrieve a token via a call to the REST API, then use it to open a WebSocket connection through the WebSocket API.
Here is an example in Python:
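import requests
import websocket  # provided by the websocket-client package

# Request a token from the REST API.
request_uri = "http://localhost:39806/v1/voice-recognition/recognize"
request_data = {
    "models": {
        "model-1": {}
    }
}
response = requests.post(request_uri, json=request_data)
response_data = response.json()
token = response_data["token"]

def handle_message(ws, message):
    # Print every message received from the service.
    print(message)

# Open the WebSocket connection using the token.
web_socket_url = f"ws://localhost:39806/v1/ws/{token}"
ws = websocket.WebSocketApp(web_socket_url, on_message=handle_message)
# Run the event loop so that handle_message receives incoming messages.
ws.run_forever()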
When the current task is done, the WebSocket connection can remain open for a short period before closing. By default, it closes immediately. The delay can be changed with the --websocket-close-delay parameter, which helps avoid losing the last packet sent just before the connection closes.
When streaming audio from a file, you must stream it in real time. Failing to do so may lead to inaccurate results. Additionally, if the audio ends abruptly, you may not receive any output. To prevent this, consider adding a short period of silence (about 1 second) at the end of the file.
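The pacing and trailing silence described above can be sketched as follows, assuming mono 16-bit PCM at 16,000 Hz. All function names here are illustrative, and the 100 ms chunk size is an arbitrary choice. How the service expects audio to be framed on the WebSocket is not covered in this section, so the `send` callable is left as a parameter.

```python
import time

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pad_with_silence(pcm, sample_rate=16000, seconds=1.0):
    """Append `seconds` of silence (zero samples) to mono 16-bit PCM bytes."""
    silence_samples = int(sample_rate * seconds)
    return pcm + b"\x00" * (silence_samples * BYTES_PER_SAMPLE)


def split_into_chunks(pcm, sample_rate=16000, chunk_ms=100):
    """Split PCM bytes into chunks of `chunk_ms` milliseconds each."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]


def stream_in_real_time(pcm, send, sample_rate=16000, chunk_ms=100,
                        sleep=time.sleep):
    """Send padded audio through `send`, paced at playback speed."""
    padded = pad_with_silence(pcm, sample_rate)
    for chunk in split_into_chunks(padded, sample_rate, chunk_ms):
        send(chunk)
        # Wait for the duration of the audio that was just sent.
        sleep(len(chunk) / (sample_rate * BYTES_PER_SAMPLE))
```

The `sleep` parameter exists so the pacing can be replaced in tests; in normal use the default `time.sleep` keeps the stream at real-time speed.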
Audio Format
Audio Format Requirements
Format: 16-bit signed PCM
Byte Order: Little-Endian
Sample Rate
Text-to-Speech (TTS / csdk):
Output sample rate: 22,050 Hz
Voice Recognition, Speech Enhancement, Voice Biometrics:
Input sample rate: 16,000 Hz
Channel Requirements
Speech Enhancement:
Supports mono and stereo input
Voice Recognition, Voice Biometrics, and other technologies:
Require mono input only
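Before streaming a WAV file to voice recognition, its header can be checked against the requirements above. The sketch below uses Python's standard `wave` module; `check_recognition_wav` and `make_wav_bytes` are hypothetical helper names, and the check covers only sample width, sample rate, and channel count (16-bit PCM WAV data is signed and little-endian by definition of the format).

```python
import io
import wave


def check_recognition_wav(source, expected_rate=16000):
    """Return a list of format problems for a voice recognition input.

    `source` is a file path or a binary file-like object containing WAV data.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            bits = wav.getsampwidth() * 8
            problems.append(f"sample width is {bits} bits, expected 16")
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, "
                f"expected {expected_rate}"
            )
        if wav.getnchannels() != 1:
            problems.append(
                f"{wav.getnchannels()} channels found, expected mono"
            )
    return problems


def make_wav_bytes(pcm, sample_rate=16000, channels=1):
    """Wrap raw 16-bit PCM bytes in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

An empty list means the file matches the voice recognition input format. For Speech Enhancement, which also accepts stereo, the channel check would need to be relaxed.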