
WebSocket API

Establishing a WebSocket Connection

Calling an asynchronous route in the REST API gives you a token that grants access to a WebSocket channel.
Using the WebSocket protocol, connect to the following endpoint:

CODE
ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN}

Each WebSocket instance is bound to the specific task triggered by its corresponding route.
Its behavior may vary depending on which endpoint issued the token.

Currently, this is the only route in the WebSocket API; all other interactions take place over the socket itself.
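
A minimal Python sketch of assembling the endpoint URL is shown below. The helper name build_ws_url is hypothetical, and the host, port and token values are placeholders; the token is percent-encoded defensively, which changes nothing if it only contains URL-safe characters.

```python
from urllib.parse import quote


def build_ws_url(hostname: str, port: int, token: str) -> str:
    """Assemble ws://{HOSTNAME}:{VDK_SERVICE_PORT}/v1/ws/{TOKEN} (hypothetical helper)."""
    # Percent-encode the token defensively; "/" is deliberately not treated as safe.
    return f"ws://{hostname}:{port}/v1/ws/{quote(token, safe='')}"


# Placeholder values: the real host, port and token come from your deployment and REST call.
print(build_ws_url("localhost", 8080, "abc123"))  # prints ws://localhost:8080/v1/ws/abc123
```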


Working with WebSocket Routes

The socket exchanges data in JSON format.
Received messages can contain up to four kinds of top-level object, identified by the following keys:

  • Event ("event")

  • Error ("error")

  • Result ("result")

  • Data ("data")
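
As an illustration, a client can dispatch each received message on these keys. The sketch below is a minimal Python example with a hypothetical message_kind helper. Some routes documented below (for example the Voice Biometrics results) return their fields at the top level without a wrapping object, so the sketch reports those as "unknown" and leaves them to route-specific handling.

```python
import json
from typing import Any

# Top-level keys listed above; "data" marks an audio chunk message.
KNOWN_KINDS = ("event", "error", "result", "data")


def message_kind(raw: str) -> str:
    """Return which documented top-level object a received JSON message carries."""
    message: dict[str, Any] = json.loads(raw)
    for kind in KNOWN_KINDS:
        if kind in message:
            return kind
    # No documented key found (e.g. an unwrapped biometrics result).
    return "unknown"
```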

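Audio can be either sent to the socket (streamed) or received from it, and it is Base64-encoded for transport. The same message structure applies in both directions: the data field holds the Base64 audio behind a data:audio/pcm;base64, prefix, and last marks the final chunk of the stream.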

CODE
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}
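
The sketch below shows one way to split raw PCM audio into chunk messages and to decode them again. It is a minimal Python example under stated assumptions: the helper names are hypothetical, the chunk size of 3200 bytes is an arbitrary choice rather than a documented requirement, and the sample rate and sample format expected by the service are not specified on this page. An empty input still produces one message with "last": true so that the end of the stream is signalled.

```python
import base64
import json

DATA_URI_PREFIX = "data:audio/pcm;base64,"


def encode_chunks(pcm: bytes, chunk_size: int = 3200) -> list[str]:
    """Split raw PCM bytes into JSON audio chunk messages; only the final one has last=true."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [pcm[i:i + chunk_size] for i in range(0, len(pcm), chunk_size)] or [b""]
    messages = []
    for index, piece in enumerate(pieces):
        payload = DATA_URI_PREFIX + base64.b64encode(piece).decode("ascii")
        messages.append(json.dumps({"data": payload, "last": index == len(pieces) - 1}))
    return messages


def decode_chunk(raw: str) -> tuple[bytes, bool]:
    """Decode an audio chunk message into (pcm_bytes, last)."""
    message = json.loads(raw)
    data: str = message["data"]
    if not data.startswith(DATA_URI_PREFIX):
        raise ValueError("unexpected data URI prefix")
    return base64.b64decode(data[len(DATA_URI_PREFIX):]), bool(message["last"])
```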

Below, we describe in detail how the socket behaves for each supported technology.


Advanced Recognition

ROUTE Recognize

CODE
/v1/advanced-recognition/recognize   

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Asr Result Message

JSON
{
  "result": {
    "technology": "asr",
    "model_name": <string>,
    "type": <int>,
    "type_string": <string>,
    "is_final": <bool>,
    "begin_time": <int>,
    "end_time": <int>,
    "hypotheses": [ <hypothesis>, ... ]
  }
}

Field         Possible values   Description
model_name    -                 The name of the model associated with the result.
type          [0, 1]            The result type as an integer value.
type_string   [ASR, NLU]        The result type as a string value.
is_final      [false, true]     Indicates whether this result is final. If true, this result will not be returned again; if false, it is an interim result and may be updated later.
begin_time    [0, INT_MAX]      The system time, in milliseconds, at the start of the hypothesis recognition operation.
end_time      [0, INT_MAX]      The system time, in milliseconds, at the end of the hypothesis recognition operation.
confidence    [0, 10000]        Indicates the likelihood that the recognized words are correct. This field is carried by each hypothesis and each item, not by the result object itself.
hypotheses    -                 A JSON array containing all the hypotheses of the recognized speech content.
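
For example, a client that only needs the most likely interpretation can ignore interim results and pick the hypothesis with the highest confidence, rather than relying on the array order, which this page does not specify. A minimal Python sketch with a hypothetical best_hypothesis helper, taking an already parsed message:

```python
from typing import Any, Optional


def best_hypothesis(message: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the highest-confidence hypothesis of a final ASR result, or None."""
    result = message.get("result")
    # Skip non-result messages and interim results (is_final == false).
    if not result or not result.get("is_final"):
        return None
    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None
    # confidence lies in [0, 10000]; a higher value means the words are more likely correct.
    return max(hypotheses, key=lambda h: h.get("confidence", 0))
```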

DETAILS Asr Result Hypothesis
JSON
  "hypotheses": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "start_rule": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Field         Description
start_rule    The entry point of the grammar (<main>).
items         A JSON array containing all the matched tokens. Each item is either a tag or a terminal, as given by its "type" field.

DETAILS Asr Result Item (Orthography)
JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "terminal",
      "orthography": <string>
    },
    ...
  ]

Field         Description
orthography   The matched terminal token.

DETAILS Asr Result Item (Tag)
JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "tag",
      "name": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Field         Description
name          The name given as an attribute to the !tag directive. When vsdk-csdk is selected, this name is the concatenation of the grammar name and the actual tag name.
items         A JSON array of nested items, each of which is either a tag or a terminal, with the same structure as the items of a hypothesis.
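
Because tag items can nest further items, reading a hypothesis amounts to a depth-first walk of the items tree. The following minimal Python sketch (hypothetical helper names) collects each terminal's orthography together with the names of the tags that enclose it; joining the words with spaces in transcript is a simplification that may not suit every language.

```python
from typing import Any


def flatten_items(items: list[dict[str, Any]], path: tuple[str, ...] = ()) -> list[tuple[str, tuple[str, ...]]]:
    """Return (orthography, enclosing_tag_names) pairs, one per terminal, in document order."""
    words = []
    for item in items:
        if item.get("type") == "terminal":
            words.append((item["orthography"], path))
        elif item.get("type") == "tag":
            # Tags nest further items; record the tag name for every terminal beneath it.
            words.extend(flatten_items(item.get("items", []), path + (item["name"],)))
    return words


def transcript(hypothesis: dict[str, Any]) -> str:
    """Join the terminals of a hypothesis with spaces (a simplification)."""
    return " ".join(word for word, _ in flatten_items(hypothesis.get("items", [])))
```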

RECEIVE Biometrics Result Message

JSON
{
  "result": {
    "technology": "biometrics",
    "model_name": <string>,
    "id": <string>,
    "probability": <double>,
    "score": <double>
  }
}

RECEIVE Event Message

JSON
{
  "event": {
    "technology": <string>,
    "model_name": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

Technologies  Description
asr           Voice recognition
biometrics    Voice biometrics

Asr Events           Code   Description
RECOGNIZER_STARTED   0      Indicates that the recognizer has started processing speech input.
RECOGNIZER_STOPPED   1      Indicates that the recognizer is no longer processing speech input.
SPEECH_DETECTED      2      Indicates that the recognizer has detected input that it identifies as speech.
SILENCE_DETECTED     3      Indicates that the recognizer is receiving silence or non-speech input.
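
These codes map naturally onto an enumeration. The Python sketch below (hypothetical names) converts an event message into an AsrEvent value and rejects events from other technologies. Since the Voice Recognition events further down use the same codes and omit the technology field, it treats a missing technology as "asr".

```python
import json
from enum import IntEnum


class AsrEvent(IntEnum):
    """ASR event codes from the table above."""

    RECOGNIZER_STARTED = 0
    RECOGNIZER_STOPPED = 1
    SPEECH_DETECTED = 2
    SILENCE_DETECTED = 3


def parse_asr_event(raw: str) -> AsrEvent:
    """Return the AsrEvent for an event message; raise ValueError otherwise."""
    event = json.loads(raw)["event"]
    # Voice Recognition events carry no "technology" field, so default to "asr".
    if event.get("technology", "asr") != "asr":
        raise ValueError(f"not an ASR event: {event.get('technology')}")
    # AsrEvent(code) raises ValueError for codes outside the documented table.
    return AsrEvent(event["code"])
```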

RECEIVE Error Message

JSON
{
  "error": {
    "technology": <string>,
    "model_name": <string>,
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
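
A client will usually want to stop processing when an error object arrives. One possible approach, sketched in Python below, wraps the documented error fields in an exception. VdkServiceError and check_message are hypothetical names, and technology and model_name are read only when present, since the error messages of the other routes omit them.

```python
import json
from typing import Any


class VdkServiceError(Exception):
    """Hypothetical exception wrapping the fields of an error message."""

    def __init__(self, error: dict[str, Any]) -> None:
        self.code = error.get("code")
        self.code_string = error.get("code_string")
        self.type = error.get("type")
        self.technology = error.get("technology")  # only on Advanced Recognition errors
        self.model_name = error.get("model_name")  # only on Advanced Recognition errors
        super().__init__(f"[{self.code_string} ({self.code})] {error.get('message', '')}")


def check_message(raw: str) -> dict[str, Any]:
    """Parse a received message and raise VdkServiceError if it carries an error object."""
    message = json.loads(raw)
    if "error" in message:
        raise VdkServiceError(message["error"])
    return message
```
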

Voice Recognition

ROUTE Recognize

CODE
/v1/voice-recognition/recognize   

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "result": {
    "type": <int>,
    "type_string": <string>,
    "is_final": <bool>,
    "begin_time": <int>,
    "end_time": <int>,
    "hypotheses": [ <hypothesis>, ... ]
  }
}

Field         Possible values   Description
type          [0, 1]            The result type as an integer value.
type_string   [ASR, NLU]        The result type as a string value.
is_final      [false, true]     Indicates whether this result is final. If true, this result will not be returned again; if false, it is an interim result and may be updated later.
begin_time    [0, INT_MAX]      The system time, in milliseconds, at the start of the hypothesis recognition operation.
end_time      [0, INT_MAX]      The system time, in milliseconds, at the end of the hypothesis recognition operation.
confidence    [0, 10000]        Indicates the likelihood that the recognized words are correct. This field is carried by each hypothesis and each item, not by the result object itself.
hypotheses    -                 A JSON array containing all the hypotheses of the recognized speech content.
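
Because interim results may be updated later, a client typically keeps only the most recent interim result and appends final ones as they arrive. The minimal Python sketch below (hypothetical ResultTracker class) takes the "result" object of each Result Message. It assumes that a final result supersedes whichever interim result was pending, which the is_final description implies but does not state outright.

```python
from typing import Any, Optional


class ResultTracker:
    """Keep the latest interim result and the list of final results (hypothetical helper)."""

    def __init__(self) -> None:
        self.finals: list[dict[str, Any]] = []
        self.interim: Optional[dict[str, Any]] = None

    def update(self, result: dict[str, Any]) -> None:
        """Record one "result" object from a Result Message."""
        if result.get("is_final"):
            self.finals.append(result)
            # Assumption: a final result replaces the pending interim one.
            self.interim = None
        else:
            # A newer interim result replaces the previous interim result.
            self.interim = result
```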

DETAILS Result Hypothesis
JSON
  "hypotheses": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "start_rule": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Field         Description
start_rule    The entry point of the grammar (<main>).
items         A JSON array containing all the matched tokens. Each item is either a tag or a terminal, as given by its "type" field.

DETAILS Result Item (Orthography)
JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "terminal",
      "orthography": <string>
    },
    ...
  ]

Field         Description
orthography   The matched terminal token.

DETAILS Result Item (Tag)
JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "tag",
      "name": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Field         Description
name          The name given as an attribute to the !tag directive. When vsdk-csdk is selected, this name is the concatenation of the grammar name and the actual tag name.
items         A JSON array of nested items, each of which is either a tag or a terminal, with the same structure as the items of a hypothesis.

RECEIVE Event Message

JSON
{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

Events               Code   Description
RECOGNIZER_STARTED   0      Indicates that the recognizer has started processing speech input.
RECOGNIZER_STOPPED   1      Indicates that the recognizer is no longer processing speech input.
SPEECH_DETECTED      2      Indicates that the recognizer has detected input that it identifies as speech.
SILENCE_DETECTED     3      Indicates that the recognizer is receiving silence or non-speech input.

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

Voice Synthesis

ROUTE Synthesize

CODE
/v1/voice-synthesis/synthesize

Messages


RECEIVE Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}
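
To obtain the complete synthesized audio, a client can decode and concatenate the chunks until one is flagged as last. A minimal Python sketch follows; collect_audio is a hypothetical helper that raises on an error message and skips any other message without a data field, such as the events described next.

```python
import base64
import json
from typing import Iterable

PREFIX = "data:audio/pcm;base64,"


def collect_audio(messages: Iterable[str]) -> bytes:
    """Concatenate synthesized PCM chunks until one arrives with "last": true."""
    pcm = bytearray()
    for raw in messages:
        message = json.loads(raw)
        if "error" in message:
            raise RuntimeError(message["error"].get("message", "synthesis error"))
        if "data" not in message:
            continue  # event messages may arrive alongside audio; skip them here
        payload = message["data"]
        if payload.startswith(PREFIX):
            payload = payload[len(PREFIX):]
        pcm += base64.b64decode(payload)
        if message.get("last"):
            break
    return bytes(pcm)
```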

RECEIVE Event Message

JSON
{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

Field         Possible values
code          [0, 7]
code_string   [NativeEvent, GenerationStarted, GenerationEnded, ProcessFinished, TextRewritten, Marker, WordMarkerStart, WordMarkerEnd]
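
This page does not state which numeric code corresponds to which code_string, so the Python sketch below (hypothetical synthesis_event_name helper) keys on code_string and only range-checks code against [0, 7].

```python
import json

# Documented code_string values for Voice Synthesis events.
SYNTHESIS_EVENTS = frozenset({
    "NativeEvent", "GenerationStarted", "GenerationEnded", "ProcessFinished",
    "TextRewritten", "Marker", "WordMarkerStart", "WordMarkerEnd",
})


def synthesis_event_name(raw: str) -> str:
    """Return the code_string of a synthesis event after checking it against the documented values."""
    event = json.loads(raw)["event"]
    if not 0 <= event["code"] <= 7:
        raise ValueError(f"code out of documented range: {event['code']}")
    if event["code_string"] not in SYNTHESIS_EVENTS:
        raise ValueError(f"undocumented code_string: {event['code_string']}")
    return event["code_string"]
```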

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

Voice Biometrics

ROUTE Authenticate

CODE
/v1/voice-biometrics/authenticate

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Result Message

JSON
{
  "id": <string>,
  "probability": <double>,
  "score": <double>
}

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

ROUTE Identify

CODE
/v1/voice-biometrics/identify

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "id": <string>,
  "probability": <double>,
  "type": <string>
}

ROUTE Enroll

CODE
/v1/voice-biometrics/enroll

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "accepted": <bool>,
  "progress": <int>,
  "speech_duration": <double>,
  "utterances": [ <utterance>, ... ]
}

DETAILS Result Utterance
JSON
  "utterances": [
    {
      "accepted": <bool>,
      "contains_speech": <bool>,
      "enough_speech": <bool>,
      "is_band_limited": <bool>,
      "is_consistent": <bool>,
      "is_peak_clipped": <bool>,
      "is_snr_ok": <bool>,
      "snr": <double>,
      "speech_duration": <double>
    },
    ...
  ]

Field             Description
accepted          Indicates whether the utterance is valid and can be added to the enrollment profile.
contains_speech   Indicates whether the audio contains speech.
enough_speech     Indicates whether the speech duration is sufficient to pass the enrollment checks.
is_band_limited   Indicates whether the utterance is band-limited.
is_consistent     Indicates whether the utterance is consistent with the previous utterances.
is_peak_clipped   Indicates whether the degree of peak clipping is below a certain threshold.
is_snr_ok         Indicates whether the SNR value is high enough.
snr               The signal-to-noise ratio (SNR) of the enrollment utterance, in dB.
speech_duration   The duration of speech within the audio input.
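
A client can use these flags to tell the user why an utterance was rejected. The minimal Python sketch below uses hypothetical helper names and assumes that true means "passed" for contains_speech, enough_speech, is_consistent and is_snr_ok. It leaves out is_band_limited and is_peak_clipped because their polarity is ambiguous from the field names and descriptions alone.

```python
from typing import Any

# Assumption: for these flags, true means the utterance passed the check.
PASSING_WHEN_TRUE = ("contains_speech", "enough_speech", "is_consistent", "is_snr_ok")


def failed_checks(utterance: dict[str, Any]) -> list[str]:
    """List the assumed pass/fail checks that an enrollment utterance did not pass."""
    return [name for name in PASSING_WHEN_TRUE if not utterance.get(name, False)]


def report(result: dict[str, Any]) -> list[str]:
    """Summarize each rejected utterance of an enrollment result message."""
    lines = []
    for index, utterance in enumerate(result.get("utterances", [])):
        if not utterance.get("accepted"):
            reasons = ", ".join(failed_checks(utterance)) or "no listed check failed"
            lines.append(f"utterance {index}: rejected ({reasons}, snr={utterance.get('snr')} dB)")
    return lines
```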


RECEIVE Event Message

JSON
{
  "event": {
    "code": 0,
    "code_string": "INFO",
    "message": <string>,
    "timestamp": <int>
  }
}

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

Speech Enhancement


ROUTE Enhance

CODE
/v1/speech-enhancement/enhance

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}