WebSocket API
Establishing a WebSocket Connection
When calling asynchronous routes in the REST API, you first obtain a token that grants access to a WebSocket channel.
Using the WebSocket protocol, connect to the following endpoint:
ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN}
Each WebSocket instance is bound to the specific task triggered by its corresponding route.
Its behavior may vary depending on which endpoint issued the token.
As of now, this is the only route in the WebSocket API, since all other interactions occur through the socket itself.
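As a minimal sketch of how the endpoint template above can be filled in, the helper below builds the connection URL. The function name `build_ws_url` and the example host and port are hypothetical, and percent-encoding the token is an assumption made for safety, not something the API is known to require.

```python
from urllib.parse import quote


def build_ws_url(hostname: str, port: int, token: str) -> str:
    """Fill in the ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN} template."""
    # Assumption: the token is percent-encoded so that unexpected characters
    # cannot alter the path. Plain alphanumeric tokens are left unchanged.
    return f"ws://{hostname}:{port}/v1/ws/{quote(token, safe='')}"
```

For example, `build_ws_url("localhost", 8080, "abc123")` yields `ws://localhost:8080/v1/ws/abc123`, which can then be passed to any WebSocket client.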
Working with WebSocket Routes
The socket exchanges data in JSON format.
You may encounter up to four top-level objects in the messages you receive.
Objects: Event, Error, Result, Data
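One possible way to tell the four top-level objects apart is to check which key is present in the decoded JSON. The sketch below assumes the keys are the lowercase names used in the message templates (`event`, `error`, `result`, `data`); the function name `classify_message` is hypothetical.

```python
import json

# Lowercase keys as they appear in the message templates of this document.
TOP_LEVEL_KEYS = ("event", "error", "result", "data")


def classify_message(raw: str) -> str:
    """Return the first known top-level key found in a raw JSON message."""
    obj = json.loads(raw)
    for key in TOP_LEVEL_KEYS:
        if key in obj:
            return key
    # Some routes return a bare object with no wrapper key.
    return "unknown"
```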
Audio can be either sent or received, and it is Base64-encoded for transport through the socket. The same message structure applies in both directions, sending and receiving:
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
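The audio chunk structure above can be produced and consumed with the standard library alone. This is a sketch under the assumption that the audio is raw PCM bytes; the helper names `encode_audio_chunk` and `decode_audio_chunk` are hypothetical.

```python
import base64
import json

# Data URI prefix taken from the message template.
PREFIX = "data:audio/pcm;base64,"


def encode_audio_chunk(pcm: bytes, last: bool) -> str:
    """Serialize raw PCM bytes into a JSON audio chunk message."""
    payload = PREFIX + base64.b64encode(pcm).decode("ascii")
    return json.dumps({"data": payload, "last": last})


def decode_audio_chunk(message: str) -> tuple[bytes, bool]:
    """Return (pcm_bytes, last) from a JSON audio chunk message."""
    obj = json.loads(message)
    data = obj["data"]
    if not data.startswith(PREFIX):
        raise ValueError("unexpected data URI prefix")
    return base64.b64decode(data[len(PREFIX):]), bool(obj["last"])
```

Setting `last` to true on the final chunk tells the receiver that no more audio follows.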
Below, we describe in detail how the socket behaves for each supported technology.
ROUTE Recognize
/v1/advanced-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Asr Result Message
{
"result": {
"technology": "asr",
"model_name": <string>,
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
| Fields | Possible values | Description |
|---|---|---|
| model_name | - | The model name associated with the result. |
| type | [ | The result type as an int value. |
| type_string | [ | The result type as a string value. |
| is_final | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise, it is an interim result and may be updated later. |
| begin_time | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
| end_time | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
| confidence | [ | Indicates the likelihood that the recognized words are correct. |
| hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Asr Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| start_rule | Represents the entry point of the grammar (<main>). |
| items | A JSON array containing all the matched tokens. An item object is either of type terminal or of type tag. |
DETAILS Asr Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
| Fields | Description |
|---|---|
| orthography | The matched terminal token. |
DETAILS Asr Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| name | Represents the name given as an attribute to the tag. |
| items | Same as the items array of a hypothesis: a JSON array of nested item objects. |
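Because tag items can nest further items, recovering the recognized text from a hypothesis is naturally recursive. The following sketch walks the structure described above and joins the `orthography` of every terminal item; the function names are hypothetical, and joining with single spaces is an assumption about how the tokens should be rendered.

```python
def collect_orthography(items: list[dict]) -> list[str]:
    """Collect terminal tokens in order, descending into tag items."""
    words = []
    for item in items:
        if item.get("type") == "terminal":
            words.append(item["orthography"])
        elif item.get("type") == "tag":
            # A tag wraps a nested list of items with the same shape.
            words.extend(collect_orthography(item.get("items", [])))
    return words


def hypothesis_text(hypothesis: dict) -> str:
    """Join all terminal tokens of one hypothesis into a single string."""
    return " ".join(collect_orthography(hypothesis.get("items", [])))
```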
RECEIVE Biometrics Result Message
{
"result": {
"technology": "biometrics",
"model_name": <string>,
"id": <string>,
"probability": <double>,
"score": <double>
}
}
RECEIVE Event Message
{
"event": {
"technology": <string>,
"model_name": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
| Technologies | Description |
|---|---|
| asr | Voice recognition |
| biometrics | Voice biometrics |
| Asr event code | Description |
|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
RECEIVE Error Message
{
"error": {
"technology": <string>,
"model_name": <string>,
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Recognize
/v1/voice-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"result": {
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
| Fields | Possible values | Description |
|---|---|---|
| type | [ | The result type as an int value. |
| type_string | [ | The result type as a string value. |
| is_final | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise, it is an interim result and may be updated later. |
| begin_time | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
| end_time | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
| confidence | [ | Indicates the likelihood that the recognized words are correct. |
| hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| start_rule | Represents the entry point of the grammar (<main>). |
| items | A JSON array containing all the matched tokens. An item object is either of type terminal or of type tag. |
DETAILS Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
| Fields | Description |
|---|---|
| orthography | The matched terminal token. |
DETAILS Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| name | Represents the name given as an attribute to the tag. |
| items | Same as the items array of a hypothesis: a JSON array of nested item objects. |
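Since results may be interim or final and each carries several hypotheses, a client usually needs to pick one. The sketch below ignores interim results by default and selects the hypothesis with the highest `confidence`. It assumes that a higher confidence value means a more likely hypothesis; the function name `best_hypothesis` is hypothetical.

```python
import json
from typing import Optional


def best_hypothesis(raw: str, final_only: bool = True) -> Optional[dict]:
    """Return the highest-confidence hypothesis of a result message, if any."""
    result = json.loads(raw).get("result")
    if result is None:
        return None  # not a result message (event, error, ...)
    if final_only and not result.get("is_final", False):
        return None  # interim result, may still be updated later
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    # Assumption: larger confidence values indicate a more likely hypothesis.
    return max(hypotheses, key=lambda h: h.get("confidence", 0))
```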
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
| Event code | Description |
|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
| 4 | Indicates that the recognizer reached the end of the audio stream. |
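For logging, the event codes listed above can be mapped to short descriptions. This is only an illustrative sketch: the dictionary paraphrases the table, and the name `describe_event` and the output format are hypothetical. The `code_string` sent by the service remains the authoritative label.

```python
# Paraphrased from the event code table of the voice-recognition route.
EVENT_DESCRIPTIONS = {
    0: "recognizer has started processing speech input",
    1: "recognizer is no longer processing speech input",
    2: "recognizer detects input that it can identify as speech",
    3: "recognizer is receiving silence or non-speech",
    4: "recognizer reached the end of the audio stream",
}


def describe_event(message: dict) -> str:
    """Format an event message as a single log line."""
    event = message["event"]
    code = event["code"]
    text = EVENT_DESCRIPTIONS.get(code, "unknown event code")
    return f"[{event.get('timestamp')}] code {code}: {text}"
```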
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Synthesize
/v1/voice-synthesis/synthesize
Messages
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
| Fields | Possible values |
|---|---|
| code | [ |
| code_string | [ |
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
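On the synthesize route, the audio arrives as a sequence of chunk messages that the client has to reassemble. The sketch below concatenates the decoded chunks until one is marked `last`, skips event messages, and raises on an error message. It assumes the messages were already parsed from JSON into dictionaries; the name `assemble_audio` is hypothetical.

```python
import base64
from typing import Iterable

PREFIX = "data:audio/pcm;base64,"


def assemble_audio(messages: Iterable[dict]) -> bytes:
    """Concatenate decoded audio chunks up to and including the last one."""
    pcm = bytearray()
    for msg in messages:
        if "error" in msg:
            raise RuntimeError(msg["error"].get("message", "synthesis error"))
        if "data" not in msg:
            continue  # for example an event message
        pcm += base64.b64decode(msg["data"].removeprefix(PREFIX))
        if msg.get("last"):
            break  # no more audio is expected after this chunk
    return bytes(pcm)
```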
ROUTE Authenticate
/v1/voice-biometrics/authenticate
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"score": <double>
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
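The biometrics result appears in two shapes in this document: wrapped in a `result` object on the advanced-recognition route and as a bare object on the authenticate route. A small normalizing helper, sketched below, can accept both. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BiometricsResult:
    id: str
    probability: float
    score: float


def parse_biometrics_result(message: dict) -> BiometricsResult:
    """Accept either {"result": {...}} or the bare {...} form."""
    body = message.get("result", message)
    return BiometricsResult(
        id=body["id"],
        probability=float(body["probability"]),
        score=float(body["score"]),
    )
```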
ROUTE Identify
/v1/voice-biometrics/identify
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"type": <string>
}
ROUTE Enroll
/v1/voice-biometrics/enroll
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"accepted": <bool>,
"progress": <int>,
"speech_duration": <double>,
"utterances": [ <utterance>, ... ]
}
DETAILS Result utterance
"utterances": [
{
"accepted": <bool>,
"contains_speech": <bool>,
"enough_speech": <bool>,
"is_band_limited": <bool>,
"is_consistent": <bool>,
"is_peak_clipped": <bool>,
"is_snr_ok": <bool>,
"snr": <double>,
"speech_duration": <double>
},
...
]
| Fields | Description |
|---|---|
| accepted | Indicates that an utterance is valid and could be added to the enrollment profile. |
| contains_speech | Indicates whether the audio contains speech. |
| enough_speech | Indicates whether the speech duration is long enough to pass the enrollment process checks. |
| is_band_limited | Indicates whether the utterance is band-limited. |
| is_consistent | Indicates whether the utterance is consistent with the previous utterances. |
| is_peak_clipped | Indicates whether the degree of peak clipping is below a certain threshold. |
| is_snr_ok | Indicates whether the SNR value is sufficiently high. |
| snr | Represents the signal-to-noise ratio of the enrollment utterance, measured in dB. |
| speech_duration | Represents the speech duration within an audio input. |
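An enrollment result can be condensed into a short summary that shows overall progress and which utterances were rejected. The sketch below relies only on the `accepted`, `progress`, and `utterances` fields described above; the function name and the shape of the returned dictionary are hypothetical.

```python
def summarize_enrollment(result: dict) -> dict:
    """Summarize an enroll result message (already parsed from JSON)."""
    utterances = result.get("utterances", [])
    # Positions of the utterances that were not accepted.
    rejected = [i for i, u in enumerate(utterances) if not u.get("accepted", False)]
    return {
        "accepted": result.get("accepted", False),
        "progress": result.get("progress", 0),
        "utterance_count": len(utterances),
        "rejected_indices": rejected,
    }
```

The per-utterance flags of a rejected entry can then be inspected to explain to the user why that recording was refused.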
RECEIVE Event Message
{
"event": {
"code": 0,
"code_string": "INFO",
"message": <string>,
"timestamp": <int>
}
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Enhance
/v1/speech-enhancement/enhance
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>,
"is_reference": <bool>
}
| Fields | Description |
|---|---|
| is_reference | Indicates whether the audio you send should be treated as reference audio. This is used for Acoustic Echo Cancellation (AEC), enabling the system to remove signals such as TTS playback from the microphone input. |
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
SEND Audio Chunk Message
{
"pipeline": "myPipelineId",
"is_reference": <bool>,
"modifier": "myModifierId",
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
| Fields | Required | Default Value | Description |
|---|---|---|---|
| pipeline | Yes | - | The pipeline id used when creating the pipeline. |
| is_reference | - | False | Indicates whether the audio you send should be treated as reference audio. This is used for Acoustic Echo Cancellation (AEC), enabling the system to remove signals such as TTS playback from the microphone input. |
| modifier | - | - | The modifier id used when creating the pipeline. Required when |
| data | Yes | - | Base64-encoded audio data. |
| last | Yes | - | True to indicate the last chunk of audio data. |
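The required and optional fields in the table above can be enforced when building the message. In this sketch, `pipeline`, `data`, and `last` are always emitted, `is_reference` defaults to false, and `modifier` is only included when given. The function name `build_pipeline_chunk` is hypothetical, and the sketch does not check the condition under which `modifier` becomes required.

```python
import base64
import json
from typing import Optional


def build_pipeline_chunk(pipeline: str, pcm: bytes, last: bool,
                         is_reference: bool = False,
                         modifier: Optional[str] = None) -> str:
    """Build a pipeline audio chunk message as a JSON string."""
    if not pipeline:
        raise ValueError("pipeline is required")
    message = {
        "pipeline": pipeline,
        "is_reference": is_reference,
        "data": "data:audio/pcm;base64," + base64.b64encode(pcm).decode("ascii"),
        "last": last,
    }
    if modifier is not None:
        message["modifier"] = modifier  # optional field, omitted by default
    return json.dumps(message)
```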
RECEIVE Audio Chunk Message
{
"type": "Audio",
"context": {
"pipeline": <string>,
"module_type": "Consumer",
"module_id": <string>
},
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
| Fields | Description |
|---|---|
| pipeline | The pipeline id used when creating the pipeline. |
| module_type | The module type that generated the event. |
| module_id | The module id that generated the event. |
| data | Base64-encoded audio data. |
| last | True to indicate the last chunk of audio data. |
RECEIVE Event Message
{
"type": "Event",
"context": {
"pipeline": <string>,
"module_type": <string>,
"module_id": <string>
},
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
| Fields | Description |
|---|---|
| type | The message type: Event. |
| context | The pipeline context of the dispatched event. |
| module_type | The module type that generated the event. |
| module_id | The module id that generated the event. |
| code | The event code as an integer value. Each module defines its own enum values for the same code. VoiceRecognition values: VoiceSynthesis values: Voice Biometrics values: |
| code_string | The event code as a string value. |
| message | The event message. |
| timestamp | The event timestamp. |
RECEIVE Warning Message
{
"type": "Warning",
"context": {
"pipeline": <string>,
"module_type": <string>,
"module_id": <string>
},
"warning": {
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
RECEIVE Error Message
{
"type": "Error",
"context": {
"pipeline": <string>,
"module_type": <string>,
"module_id": <string>
},
"error": {
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
RECEIVE Exception Message
module_type and module_id are optional: they may be absent when the exception occurs in the pipeline outside of any module.
{
"type": "Exception",
"context": {
"pipeline": <string>,
"module_type": <string>,
"module_id": <string>
},
"exception": {
"line": <int>,
"file": <string>,
"function": <string>,
"code": <int>,
"message": <string>,
"backtrace": [
{
"line": <int>,
"file": <string>,
"function": <string>,
"code": <int>,
"message": <string>
}
]
}
}
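An exception message can be flattened into readable log lines, taking into account that `module_type` and `module_id` may be absent from the context. The layout of the output below is an arbitrary choice for illustration, and the name `format_exception` is hypothetical.

```python
def format_exception(message: dict) -> str:
    """Render an exception message and its backtrace as text."""
    context = message.get("context", {})
    exc = message["exception"]
    origin = context.get("pipeline", "?")
    # The module fields are optional when the failure is outside any module.
    if "module_type" in context and "module_id" in context:
        origin += f"/{context['module_type']}:{context['module_id']}"
    lines = [
        f"{origin}: {exc['message']} (code {exc['code']}) "
        f"at {exc['file']}:{exc['line']} in {exc['function']}"
    ]
    for frame in exc.get("backtrace", []):
        lines.append(f"  from {frame['file']}:{frame['line']} in {frame['function']}")
    return "\n".join(lines)
```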
RECEIVE VoiceBiometrics Result Message
{
"type": "Result",
"context": {
"pipeline": <string>,
"module_type": "Consumer",
"module_id": <string>
},
"result": {
"id": <string>,
"probability": <double>,
"score": <double>
}
}
RECEIVE VoiceRecognition Result Message
{
"type": "Result",
"context": {
"pipeline": <string>,
"module_type": "Consumer",
"module_id": <string>
},
"result": {
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
| Fields | Possible values | Description |
|---|---|---|
| type | [ | The result type as an int value. |
| type_string | [ | The result type as a string value. |
| is_final | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise, it is an interim result and may be updated later. |
| begin_time | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
| end_time | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
| confidence | [ | Indicates the likelihood that the recognized words are correct. |
| hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| start_rule | Represents the entry point of the grammar (<main>). |
| items | A JSON array containing all the matched tokens. An item object is either of type terminal or of type tag. |
DETAILS Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
| Fields | Description |
|---|---|
| orthography | The matched terminal token. |
DETAILS Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| name | Represents the name given as an attribute to the tag. |
| items | Same as the items array of a hypothesis: a JSON array of nested item objects. |
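The pipeline messages shown above all carry a top-level `type` field (Audio, Event, Warning, Error, Exception, Result), which makes them easy to route to dedicated handlers. The class below is a minimal sketch of such a router; `PipelineRouter` and its methods are hypothetical names and are not part of the API.

```python
from collections import defaultdict
from typing import Callable

# Values of the "type" field seen in the pipeline message templates.
MESSAGE_TYPES = {"Audio", "Event", "Warning", "Error", "Exception", "Result"}


class PipelineRouter:
    """Dispatch parsed pipeline messages to handlers registered per type."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, message_type: str, handler: Callable[[dict], None]) -> None:
        if message_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {message_type}")
        self._handlers[message_type].append(handler)

    def dispatch(self, message: dict) -> int:
        """Call every handler for the message type; return how many ran."""
        handlers = self._handlers.get(message.get("type"), [])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

A handler can read `message["context"]` to find out which pipeline and module produced the message.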