Get Started

Overview

The VDK Service simplifies the development and deployment of voice-enabled applications by exposing voice processing capabilities through REST and WebSocket APIs. It runs your voice logic locally as a standalone binary, which makes it easier to integrate voice technologies into a wide range of platforms and tech stacks without being tied to C++ or Java (Android).

Whether you're building for Linux, Windows, or Android, this guide will walk you through the setup, usage patterns, and best practices to help you get the most out of the VDK Service.

Get Started

To generate the vdk-service, follow these steps:

1. Create a project in VDK-Studio: start a new custom project and select the 'VDK Service' interaction mode during project creation.
2. Integrate technologies: add the desired technologies to your project and configure them.
3. Export your project.

For Linux/Windows, the export generates the following directories:

BASH

.
├── config/
├── data/
└── vdk-service/

For Android, the exported layout is slightly different:

BASH

assets/
├── jniLibs/
└── vsdk/
    ├── config/
    └── data/

Run VDK Service

Windows/Linux

How to run?

Use the command below to start the service:

BASH

./vdk-service/bin/vdk-service.exe -f config/vsdk.json # Windows
./vdk-service/bin/vdk-service -f config/vsdk.json # Linux
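
When the service is started from a script, it can help to wait until its web server accepts connections before sending the first request. The sketch below is one way to do that with the Python standard library; the helper name `wait_for_port` and the timeout values are illustrative assumptions, not part of the VDK Service.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that something is listening,
    not that the VDK Service has finished loading its configuration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means the web server is listening.
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    return False
```

A typical use would be to launch the binary with `subprocess.Popen` using the command above, then call `wait_for_port("127.0.0.1", 39806)` before calling the API.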

Android

Set up the Android project and run the service

Steps Overview

Move files to your project
Add INTERNET permission
Add networkSecurityConfig
Create a native Service class
Move assets to internal storage
Call the .start(…) method to run the service

Move Files

First, move the data exported from VDK Studio into your Android project:

Move the assets directory (without jniLibs) to app/src/main/
Move the jniLibs directory to app/src/main/

Permissions

Next, configure network access in your Android manifest.

Add the permission below to AndroidManifest.xml:

XML

<uses-permission android:name="android.permission.INTERNET" />

Network Security

Next, configure network security. Create a file named network_security_config.xml in res/xml and paste the content below into it:

XML

<?xml version="1.0" encoding="utf-8"?>
<network-security-config>
    <domain-config cleartextTrafficPermitted="true">
        <domain includeSubdomains="true">127.0.0.1</domain>
    </domain-config>
</network-security-config>

Finally, add the following attribute to the <application> tag in the manifest:

XML

android:networkSecurityConfig="@xml/network_security_config"

This permits cleartext HTTP traffic to 127.0.0.1, where the vdk-service runs its HTTP web server.

Native Class

Now that everything is configured, create the Service class that you will use to start and stop the vdk-service.

Make sure to keep the same package name and class name, because the JNI library binds its native methods to this exact package and class.

JAVA

package com.vivoka.vdk;

// Declared public so that it can be called from your application's own package.
public class Service {
    static {
        System.loadLibrary("vdk-service-jni");
    }

    public static native int start(String configPath, int port);
    public static native void stop();
}

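Start Service

The path below assumes that the vsdk directory from your assets has already been copied to the app's internal storage (getFilesDir()), as listed in the steps overview.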

JAVA

final String configPath = getFilesDir().getAbsolutePath() + "/vsdk/config/vsdk.json";
com.vivoka.vdk.Service.start(configPath, 39806);

The vdk-service should now run and start a webserver listening on 127.0.0.1:39806.

Sample code is also available in Console → Project Settings → Downloads:

📦 vdk-android-samples-service-application

Loading the configuration

The VDK Service's configuration can be loaded either through a command-line argument at startup, as shown above, or later through its REST API. The example below starts the service without a configuration and then sets it with the corresponding REST API route, localhost:39806/v1/configuration-path:

BASH

./vdk-service/bin/vdk-service.exe &
curl --header "Content-Type: application/json" \
  --request POST \
  --data '{"path":"config/vsdk.json"}' \
  http://localhost:39806/v1/configuration-path

The configuration path can only be loaded once.
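
The same call can be made from Python. The following is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the service listens on the default port 39806. Because the response body is not described in this guide, the sketch only returns the HTTP status code.

```python
import json
import urllib.request

BASE_URL = "http://localhost:39806"  # assumption: default port, service on this machine


def build_configuration_request(config_path: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that sets the configuration path."""
    body = json.dumps({"path": config_path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/configuration-path",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_configuration(config_path: str, base_url: str = BASE_URL) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_configuration_request(config_path, base_url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `load_configuration("config/vsdk.json")` mirrors the curl command above.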

VDK Service CLI options

Option	Value range	Default value	Description
`-h`, `--help`	-	-	Prints usage.
`-p`, `--port`	[1-65535]	`39806`	Port the web server listens on (HTTP & WebSocket).
`-c`, `--cpu-count`	[0, 2-255]	`0`	Number of logical cores to use (0 uses all available hardware cores).
`-d`, `--websocket-close-delay`	[0-65535]	`0`	The number of milliseconds to wait before closing the WebSocket after no data has been sent.
`-f`, `--config-path`	-	`""`	Path to the initial VSDK configuration file to load at startup.
`-L`, `--plugin-path`	-	`../plugins/`	Path to the dynamic libraries.
`-v`, `--verbosity`	[off, trace, debug, info, warning, error, critical]	`debug`	Minimum log level that can be printed.
`-V`, `--version`	-	-	Output version.
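
When the service is launched from a script, the value ranges in the table can be checked before the process starts. The sketch below builds an argument list from validated values. The function name is hypothetical, and it assumes each short option takes its value as the next argument, as `-f config/vsdk.json` does in the run command.

```python
from typing import List, Optional

VERBOSITY_LEVELS = ("off", "trace", "debug", "info", "warning", "error", "critical")


def build_service_args(
    port: int = 39806,
    cpu_count: int = 0,
    websocket_close_delay_ms: int = 0,
    config_path: Optional[str] = None,
    verbosity: str = "debug",
) -> List[str]:
    """Validate values against the documented ranges and return CLI arguments."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    # 0 means "all available cores"; otherwise the documented range is 2-255.
    if cpu_count != 0 and not 2 <= cpu_count <= 255:
        raise ValueError(f"cpu_count must be 0 or 2-255: {cpu_count}")
    if not 0 <= websocket_close_delay_ms <= 65535:
        raise ValueError(f"close delay out of range: {websocket_close_delay_ms}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity}")

    args = ["-p", str(port), "-c", str(cpu_count),
            "-d", str(websocket_close_delay_ms), "-v", verbosity]
    if config_path:
        args += ["-f", config_path]
    return args
```

The returned list can be appended to the path of the binary and passed to `subprocess.Popen`.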

Stop VDK Service

There are two ways to terminate the VDK Service safely:

Using the /quit route:

BASH

curl --header "Content-Type: application/json" \
  --request POST \
  --data '{}' \
  http://localhost:39806/v1/quit

Sending SIGINT or SIGTERM to the application.
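
The two methods can be combined in a script that owns the service process: try the /quit route first, then fall back to a signal. This is a sketch under assumptions: the function names are hypothetical, and `Popen.terminate()` sends SIGTERM on POSIX systems but uses a different mechanism on Windows.

```python
import subprocess
import urllib.request


def build_quit_request(base_url: str = "http://localhost:39806") -> urllib.request.Request:
    """Build the POST request for the /v1/quit route."""
    return urllib.request.Request(
        url=f"{base_url}/v1/quit",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stop_service(proc: subprocess.Popen,
                 base_url: str = "http://localhost:39806",
                 timeout_s: float = 5.0) -> int:
    """Ask the service to quit over REST; fall back to terminate(). Returns the exit code."""
    try:
        urllib.request.urlopen(build_quit_request(base_url), timeout=timeout_s).close()
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        proc.terminate()
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()
        return proc.wait()
```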

Streaming and receiving audio

To stream audio, you first need to retrieve a token via a call to the REST API, then use it to open a WebSocket connection through the WebSocket API.

Here is an example in Python:

PY

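import requests
import websocket  # provided by the websocket-client package

# Ask the REST API for a recognition session; the response contains the token.
request_uri = "http://localhost:39806/v1/voice-recognition/recognize"
request_data = {
    "models": {
        "model-1": {}
    }
}
response = requests.post(request_uri, json=request_data)
response.raise_for_status()
response_data = response.json()
token = response_data["token"]

def handle_message(ws, message):
    # Print every message received from the service.
    print(message)

# Open the WebSocket with the token and process messages until it closes.
web_socket_url = f"ws://localhost:39806/v1/ws/{token}"
ws = websocket.WebSocketApp(web_socket_url, on_message=handle_message)
ws.run_forever()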

When the current task is done, the WebSocket connection stays open for a configurable delay before closing. The default delay is 0 ms, so the connection closes immediately. You can change it with the --websocket-close-delay option, which helps avoid losing the last packet sent just before the connection closes.

When streaming audio from a file, you must stream it in real time. Failing to do so may lead to inaccurate results. Additionally, if the audio ends abruptly, you may not receive any output. To prevent this, consider adding a short period of silence (about 1 second) at the end of the file.
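
One way to follow both recommendations is to split the raw PCM data into small chunks, pace them against the clock, and append zero-valued samples at the end. The sketch below assumes 16 kHz, 16-bit mono input. The `send` callable is a placeholder for whatever writes to the WebSocket, since the message format is not described in this section, and the chunk size of 100 ms is an arbitrary choice.

```python
import time
from typing import Callable, Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz input
BYTES_PER_SAMPLE = 2     # 16-bit signed PCM, mono


def with_trailing_silence(pcm: bytes, seconds: float = 1.0) -> bytes:
    """Append `seconds` of digital silence (zero samples) to raw PCM audio."""
    return pcm + b"\x00" * (int(SAMPLE_RATE_HZ * seconds) * BYTES_PER_SAMPLE)


def iter_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks that each hold chunk_ms of audio (the last may be shorter)."""
    chunk_bytes = SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


def stream_realtime(pcm: bytes, send: Callable[[bytes], None],
                    chunk_ms: int = 100, silence_s: float = 1.0) -> None:
    """Call send(chunk) at roughly the pace the audio would play back."""
    start = time.monotonic()
    sent_bytes = 0
    for chunk in iter_chunks(with_trailing_silence(pcm, silence_s), chunk_ms):
        send(chunk)
        sent_bytes += len(chunk)
        # Sleep until the moment this much audio would have finished playing.
        due = start + sent_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
        time.sleep(max(0.0, due - time.monotonic()))
```

Pacing against the total number of bytes sent, rather than sleeping a fixed time per chunk, keeps the stream from drifting when `send` itself takes time.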

Audio Format

Audio Format Requirements

Format: 16-bit signed PCM
Byte Order: Little-Endian

Sample Rate

Text-to-Speech (TTS / csdk):
- Output sample rate: 22,050 Hz
Voice Recognition, Speech Enhancement, Voice Biometrics:
- Input sample rate: 16,000 Hz

Channel Requirements

Speech Enhancement:
- Supports mono and stereo input
Voice Recognition, Voice Biometrics, and other technologies:
- Require mono input only
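
If your input audio is stored in WAV files, the requirements above can be checked before streaming. This is a minimal sketch using Python's standard `wave` module, which reads PCM WAV files (16-bit samples in that format are signed and little-endian). The function name is hypothetical, and whether the WAV header must be stripped before sending is not covered by this guide.

```python
import wave
from typing import BinaryIO, List, Union


def check_input_wav(source: Union[str, BinaryIO], allow_stereo: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the file matches the input format.

    Set allow_stereo=True for Speech Enhancement, which accepts mono and stereo.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        max_channels = 2 if allow_stereo else 1
        if not 1 <= wav.getnchannels() <= max_channels:
            problems.append(f"expected at most {max_channels} channel(s), got {wav.getnchannels()}")
    return problems
```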