WebSocket API
Establishing a WebSocket Connection
When calling asynchronous routes in the REST API, you first obtain a token that grants access to a WebSocket channel.
Using the WebSocket protocol, connect to the following endpoint:
ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN}
Each WebSocket instance is bound to the specific task triggered by its corresponding route.
Its behavior may vary depending on which endpoint issued the token.
As of now, this is the only route in the WebSocket API, since all other interactions occur through the socket itself.
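As a minimal sketch, the endpoint above can be assembled from its three placeholders. The function name `build_ws_url` and the sample host, port, and token are illustrative assumptions; only the URL pattern comes from this page.

```python
from urllib.parse import quote


def build_ws_url(hostname: str, port: int, token: str) -> str:
    """Assemble ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN}."""
    if not token:
        raise ValueError("a token obtained from an asynchronous REST route is required")
    # Percent-encode the token so it always stays a single path segment.
    return f"ws://{hostname}:{port}/v1/ws/{quote(token, safe='')}"


# Hypothetical values, for illustration only.
print(build_ws_url("localhost", 8080, "abc123"))
# prints: ws://localhost:8080/v1/ws/abc123
```

The resulting URL can then be handed to any WebSocket client library.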
Working with WebSocket Routes
The socket exchanges data in JSON format.
You may encounter up to four top-level objects in the messages you receive.
Objects: Event, Error, Result, Data
Audio can be sent to or received from the socket, and it is encoded in Base64 for transport. The same message structure applies in both directions, sending and receiving:
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
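The audio message above could be built and parsed along these lines. This is a sketch using only the Python standard library; the helper names `encode_audio_message` and `decode_audio_message` are hypothetical and not part of the API.

```python
import base64
import json
from typing import Tuple

DATA_URI_PREFIX = "data:audio/pcm;base64,"


def encode_audio_message(pcm: bytes, last: bool) -> str:
    """Wrap raw PCM bytes in the JSON audio message shown above."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"data": DATA_URI_PREFIX + payload, "last": last})


def decode_audio_message(raw: str) -> Tuple[bytes, bool]:
    """Extract the raw PCM bytes and the `last` flag from an audio message."""
    message = json.loads(raw)
    data = message["data"]
    if not data.startswith(DATA_URI_PREFIX):
        raise ValueError("unexpected data URI prefix")
    return base64.b64decode(data[len(DATA_URI_PREFIX):]), bool(message["last"])


raw = encode_audio_message(b"\x00\x01\x02", last=True)
print(raw)
```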
Below, we describe in detail how the socket behaves for each supported technology.
ROUTE Recognize
/v1/advanced-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
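A longer recording is typically sent as several of these messages. The sketch below splits a PCM buffer into chunk messages, assuming that `last` marks the final chunk of the stream, which this page implies but does not state. The function name and the default chunk size of 3200 bytes are arbitrary illustrative choices.

```python
import base64
import json
from typing import Iterator


def iter_audio_messages(pcm: bytes, chunk_size: int = 3200) -> Iterator[str]:
    """Yield JSON audio chunk messages; only the final one has "last": true."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty buffer still yields one (empty) final message.
    total = max(1, -(-len(pcm) // chunk_size))
    for index in range(total):
        chunk = pcm[index * chunk_size:(index + 1) * chunk_size]
        yield json.dumps({
            "data": "data:audio/pcm;base64," + base64.b64encode(chunk).decode("ascii"),
            "last": index == total - 1,
        })


for message in iter_audio_messages(bytes(range(10)), chunk_size=4):
    print(message)
```

Each yielded string would be sent as one text frame on the socket.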
RECEIVE Asr Result Message
{
"result": {
"technology": "asr",
"model_name": <string>,
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
| Fields | Possible values | Description |
|---|---|---|
| `model_name` | - | The model name associated with the result. |
| `type` | [ | The result type as an int value. |
| `type_string` | [ | The result type as a string value. |
| `is_final` | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise it is an interim result and may be updated later. |
| `begin_time` | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
| `end_time` | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
| `confidence` | [ | Indicates the likelihood that the recognized words are correct. |
| `hypotheses` | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Asr Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| `start_rule` | Represents the entry point of the grammar (<main>). |
| `items` | A JSON array containing all the matched tokens. An item object is either of type `terminal` or of type `tag`. |
DETAILS Asr Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
| Fields | Description |
|---|---|
| `orthography` | The matched terminal token. |
DETAILS Asr Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| `name` | Represents the name given as an attribute to the tag. |
| `items` | Same as the `items` array of a hypothesis. |
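Because tag items nest further items, recovering the recognized words means walking the structure recursively. The following sketch assumes only the `type`, `orthography`, and `items` fields shown above; the function name and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def collect_orthography(items: List[Dict[str, Any]]) -> List[str]:
    """Return the terminal tokens of an `items` array in order, descending into tags."""
    words: List[str] = []
    for item in items:
        if item.get("type") == "terminal":
            words.append(item["orthography"])
        elif item.get("type") == "tag":
            # A tag carries its own nested `items` array.
            words.extend(collect_orthography(item.get("items", [])))
    return words


# Hypothetical hypothesis items, not taken from a real recognition result.
sample_items = [
    {"type": "terminal", "orthography": "call", "confidence": 9000,
     "begin_time": 0, "end_time": 300},
    {"type": "tag", "name": "contact", "confidence": 8500,
     "begin_time": 300, "end_time": 900,
     "items": [
         {"type": "terminal", "orthography": "john", "confidence": 8600,
          "begin_time": 300, "end_time": 600},
         {"type": "terminal", "orthography": "smith", "confidence": 8400,
          "begin_time": 600, "end_time": 900},
     ]},
]
print(" ".join(collect_orthography(sample_items)))
# prints: call john smith
```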
RECEIVE Biometrics Result Message
{
"result": {
"technology": "biometrics",
"model_name": <string>,
"id": <string>,
"probability": <double>,
"score": <double>
}
}
RECEIVE Event Message
{
"event": {
"technology": <string>,
"model_name": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
| Technologies | Description |
|---|---|
| `asr` | Voice recognition |
| `biometrics` | Voice biometrics |
Asr Events
| Code | Description |
|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
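A client might translate these codes into readable log lines. The sketch below relies only on the four codes and descriptions listed above; the dictionary, the function name, and the sample `code_string` value are hypothetical.

```python
from typing import Any, Dict

# Codes and meanings as listed in the Asr Events table.
ASR_EVENT_DESCRIPTIONS = {
    0: "recognizer started processing speech input",
    1: "recognizer stopped processing speech input",
    2: "recognizer detected speech input",
    3: "recognizer is receiving silence or non-speech",
}


def describe_asr_event(message: Dict[str, Any]) -> str:
    """Return a human-readable description for an ASR event message."""
    code = message["event"]["code"]
    return ASR_EVENT_DESCRIPTIONS.get(code, f"unknown event code {code}")


# Hypothetical event; the code_string value is a placeholder.
event = {"event": {"technology": "asr", "model_name": "demo", "code": 2,
                   "code_string": "PLACEHOLDER", "message": "", "timestamp": 0}}
print(describe_asr_event(event))
# prints: recognizer detected speech input
```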
RECEIVE Error Message
{
"error": {
"technology": <string>,
"model_name": <string>,
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
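Since every message on this route is identified by its top-level key, incoming frames can be routed with a small classifier. This is a sketch under that assumption; the function name is hypothetical, and messages from routes that do not wrap their payload in one of these keys would need separate handling.

```python
import json
from typing import Any, Dict, Tuple

TOP_LEVEL_KEYS = ("event", "error", "result", "data")


def classify_message(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Return (kind, payload) for a raw JSON frame received from the socket."""
    message = json.loads(raw)
    for key in TOP_LEVEL_KEYS:
        if key in message:
            # Audio messages keep `last` next to `data`, so return them whole.
            payload = message if key == "data" else message[key]
            return key, payload
    raise ValueError("message has none of the expected top-level keys")


kind, payload = classify_message('{"error": {"type": "x", "code": 1}}')
print(kind, payload)
```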
ROUTE Recognize
/v1/voice-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"result": {
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
| Fields | Possible values | Description |
|---|---|---|
| `type` | [ | The result type as an int value. |
| `type_string` | [ | The result type as a string value. |
| `is_final` | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise it is an interim result and may be updated later. |
| `begin_time` | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
| `end_time` | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
| `confidence` | [ | Indicates the likelihood that the recognized words are correct. |
| `hypotheses` | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| `start_rule` | Represents the entry point of the grammar (<main>). |
| `items` | A JSON array containing all the matched tokens. An item object is either of type `terminal` or of type `tag`. |
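When a final result carries several hypotheses, a client will often keep only the most likely one. The sketch below assumes that a higher `confidence` value means a more likely hypothesis, which matches the field's description but is not spelled out on this page; the function name and sample values are illustrative.

```python
from typing import Any, Dict, Optional


def best_final_hypothesis(message: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the highest-confidence hypothesis of a final result, else None."""
    result = message.get("result")
    if not result or not result.get("is_final"):
        return None  # not a result message, or only an interim result
    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda hypothesis: hypothesis["confidence"])


# Hypothetical result message with made-up values.
sample = {"result": {"type": 0, "type_string": "PLACEHOLDER", "is_final": True,
                     "begin_time": 0, "end_time": 900,
                     "hypotheses": [
                         {"confidence": 4200, "start_rule": "main", "items": []},
                         {"confidence": 8700, "start_rule": "main", "items": []},
                     ]}}
print(best_final_hypothesis(sample)["confidence"])
# prints: 8700
```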
DETAILS Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
| Fields | Description |
|---|---|
| `orthography` | The matched terminal token. |
DETAILS Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
| Fields | Description |
|---|---|
| `name` | Represents the name given as an attribute to the tag. |
| `items` | Same as the `items` array of a hypothesis. |
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
Events
| Code | Description |
|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Synthesize
/v1/voice-synthesis/synthesize
Messages
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
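On the receiving side, the synthesized audio can be rebuilt by concatenating the decoded chunks until one arrives with `last` set. This is a sketch assuming that `last` marks the final chunk; the function name and the sample frames are hypothetical, and a real client would also act on event and error messages instead of skipping them.

```python
import base64
import json
from typing import Iterable


def collect_audio(raw_messages: Iterable[str]) -> bytes:
    """Concatenate the PCM payload of audio chunk messages up to the last one."""
    audio = bytearray()
    for raw in raw_messages:
        message = json.loads(raw)
        if "data" not in message:
            continue  # event or error message; ignored in this sketch
        # Drop the "data:audio/pcm;base64," prefix and decode the remainder.
        audio += base64.b64decode(message["data"].split(",", 1)[1])
        if message.get("last"):
            break
    return bytes(audio)


# Hypothetical frames: two audio chunks with an event in between.
frames = [
    '{"data": "data:audio/pcm;base64,AAEC", "last": false}',
    '{"event": {"code": 0, "code_string": "PLACEHOLDER", "message": "", "timestamp": 0}}',
    '{"data": "data:audio/pcm;base64,AwQF", "last": true}',
]
print(collect_audio(frames))
# prints: b'\x00\x01\x02\x03\x04\x05'
```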
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
Fields | Possible values |
|---|---|
| [ |
| [ |
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Authenticate
/v1/voice-biometrics/authenticate
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"score": <double>
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Identify
/v1/voice-biometrics/identify
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"type": <string>
}
ROUTE Enroll
/v1/voice-biometrics/enroll
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"accepted": <bool>,
"progress": <int>,
"speech_duration": <double>,
"utterances": [ <utterance>, ... ]
}
DETAILS Result utterance
"utterances": [
{
"accepted": <bool>,
"contains_speech": <bool>,
"enough_speech": <bool>,
"is_band_limited": <bool>,
"is_consistent": <bool>,
"is_peak_clipped": <bool>,
"is_snr_ok": <bool>,
"snr": <double>,
"speech_duration": <double>
},
...
]
| Fields | Description |
|---|---|
| `accepted` | Indicates that an utterance is valid and could be added to the enrollment profile. |
| `contains_speech` | Indicates whether the audio contains speech. |
| `enough_speech` | Indicates whether the given speech duration is enough to pass the enrollment process checks. |
| `is_band_limited` | Indicates whether the utterance is band-limited. |
| `is_consistent` | Indicates whether the utterance is consistent with the previous utterances. |
| `is_peak_clipped` | Indicates whether the degree of peak clipping is below a certain threshold. |
| `is_snr_ok` | Indicates whether the SNR value is sufficiently high. |
| `snr` | Represents the signal-to-noise ratio of the enrollment utterance, measured in dB. |
| `speech_duration` | Represents the speech duration within an audio input. |
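These per-utterance flags make it possible to tell a user why an enrollment utterance was rejected. The sketch below inspects only the four flags whose failing value is unambiguous here (`contains_speech`, `enough_speech`, `is_consistent`, `is_snr_ok`); `is_band_limited` and `is_peak_clipped` are left out because this page does not make clear which value signals a problem. Function names and sample data are illustrative.

```python
from typing import Any, Dict, List

# Flags for which a false value clearly indicates a failed check.
CHECKS = ("contains_speech", "enough_speech", "is_consistent", "is_snr_ok")


def failed_checks(utterance: Dict[str, Any]) -> List[str]:
    """List the quality checks that an utterance did not pass."""
    return [flag for flag in CHECKS if not utterance.get(flag, True)]


def rejected_utterances(result: Dict[str, Any]) -> Dict[int, List[str]]:
    """Map the index of each rejected utterance to its failed checks."""
    return {
        index: failed_checks(utterance)
        for index, utterance in enumerate(result.get("utterances", []))
        if not utterance.get("accepted", False)
    }


# Hypothetical enrollment result with made-up values.
sample_result = {
    "accepted": False, "progress": 50, "speech_duration": 3.1,
    "utterances": [
        {"accepted": True, "contains_speech": True, "enough_speech": True,
         "is_consistent": True, "is_snr_ok": True, "snr": 24.0, "speech_duration": 2.4},
        {"accepted": False, "contains_speech": True, "enough_speech": False,
         "is_consistent": True, "is_snr_ok": False, "snr": 6.5, "speech_duration": 0.7},
    ],
}
print(rejected_utterances(sample_result))
# prints: {1: ['enough_speech', 'is_snr_ok']}
```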
RECEIVE Event Message
{
"event": {
"code": 0,
"code_string": "INFO",
"message": <string>,
"timestamp": <int>
}
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Enhance
/v1/speech-enhancement/enhance
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}