WebSocket API

Advanced Recognition

ROUTE Recognize

CODE
/v1/advanced-recognition/recognize   

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 
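
As a minimal sketch of how the audio chunk message above could be assembled, the following Python helper base64-encodes raw PCM bytes into the `data:audio/pcm;base64,` URI and sets the `last` flag. The function name `make_audio_chunk` is an assumption made for illustration, and the sample rate and sample format expected by the server are not stated in this reference.

```python
import base64
import json


def make_audio_chunk(pcm: bytes, last: bool) -> str:
    """Serialize one audio chunk message as a JSON string (hypothetical helper)."""
    # The payload is a data URI: a fixed prefix followed by base64-encoded PCM.
    encoded = base64.b64encode(pcm).decode("ascii")
    message = {
        "data": "data:audio/pcm;base64," + encoded,
        "last": last,  # True only for the final chunk of the stream
    }
    return json.dumps(message)
```

The returned string can be sent as a text frame by whichever WebSocket client is in use.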

RECEIVE Asr Result Message

JSON
{
  "result": {
    "technology": "asr",
    "model_name": <string>,
    "type": <int>,
    "type_string": <string>,
    "is_final": <bool>,
    "begin_time": <int>,
    "end_time": <int>,
    "hypotheses": [ <hypothesis>, ... ]
  }
}

| Fields | Possible values | Description |
| --- | --- | --- |
| model_name | - | The model name associated with the result. |
| type | [0, 1] | The result type as an int value. |
| type_string | [ ASR, NLU ] | The result type as a string value. |
| is_final | [ false, true ] | Indicates whether this result is final. If true, this is the last time this result will be returned; if false, this is an interim result that may be updated later. |
| begin_time | [0, INT_MAX] | The system time in milliseconds at the start of the hypothesis recognition operation. |
| end_time | [0, INT_MAX] | The system time in milliseconds at the end of the hypothesis recognition operation. |
| confidence | [0, 10000] | Indicates the likelihood that the recognized words are correct. |
| hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
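
The fields above can be consumed roughly as follows. This is a sketch only: the helper name `best_hypothesis` is hypothetical, and because this reference does not say whether `hypotheses` is sorted, the code picks the entry with the highest `confidence` explicitly instead of assuming the first one is best.

```python
import json
from typing import Optional


def best_hypothesis(raw_message: str) -> Optional[dict]:
    """Return the highest-confidence hypothesis of a final ASR result, else None."""
    message = json.loads(raw_message)
    result = message.get("result")
    if not result or result.get("technology") != "asr":
        return None  # not an ASR result message
    if not result.get("is_final"):
        return None  # interim results may still be updated later
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    # Assumption: ordering is unspecified, so compare confidences explicitly.
    return max(hypotheses, key=lambda h: h.get("confidence", 0))
```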

DETAILS Asr Result Hypothesis

JSON
  "hypotheses": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "start_rule": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| start_rule | Represents the entry point of the grammar (<main>). |
| items | A JSON array containing all the matched tokens. Each item object is of type either tag or terminal. |

DETAILS Asr Result Item (Orthography)

JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "terminal",
      "orthography": <string>
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| orthography | The matched terminal token. |

DETAILS Asr Result Item (Tag)

JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "tag",
      "name": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| name | Represents the name given as an attribute to the !tag directive. Note that when choosing vsdk-csdk, this name becomes the concatenation of the grammar name and the actual tag name. |
| items | Same as Asr Result Item (Orthography). |
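
Because a tag item carries its own nested `items`, turning a hypothesis into plain text requires a recursive walk. The sketch below illustrates one way to do this; the helper name `items_to_text` is hypothetical, and joining tokens with a single space is an assumption about how the orthographies should be combined.

```python
def items_to_text(items: list) -> str:
    """Flatten terminal and tag items into a space-separated string (sketch)."""
    words = []
    for item in items:
        if item.get("type") == "terminal":
            # Terminal items carry the matched token directly.
            words.append(item["orthography"])
        elif item.get("type") == "tag":
            # Tag items wrap further items, so recurse into them.
            nested = items_to_text(item.get("items", []))
            if nested:
                words.append(nested)
    return " ".join(words)
```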

RECEIVE Biometrics Result Message

JSON
{
  "result": {
    "technology": "biometrics",
    "model_name": <string>,
    "id": <string>,
    "probability": <double>,
    "score": <double>
  }
}

RECEIVE Event Message

JSON
{
  "event": {
    "technology": <string>,
    "model_name": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

| Technologies | Description |
| --- | --- |
| asr | Voice recognition |
| biometrics | Voice biometrics |

| Asr Events | Code | Description |
| --- | --- | --- |
| RECOGNIZER_STARTED | 0 | Indicates that the recognizer has started processing speech input. |
| RECOGNIZER_STOPPED | 1 | Indicates that the recognizer is no longer processing speech input. |
| SPEECH_DETECTED | 2 | Indicates that the recognizer detects input that it can identify as speech. |
| SILENCE_DETECTED | 3 | Indicates that the recognizer is receiving silence or non-speech. |
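
The ASR event codes listed above map naturally onto an enumeration. The following is a sketch under that reading; the class name `AsrEvent` and the helper `parse_asr_event` are hypothetical names, not part of the API.

```python
from enum import IntEnum
from typing import Optional


class AsrEvent(IntEnum):
    """ASR event codes as listed in this section."""
    RECOGNIZER_STARTED = 0
    RECOGNIZER_STOPPED = 1
    SPEECH_DETECTED = 2
    SILENCE_DETECTED = 3


def parse_asr_event(message: dict) -> Optional[AsrEvent]:
    """Return the AsrEvent of an event message, or None if it is not a known ASR event."""
    event = message.get("event")
    if not event or event.get("technology") != "asr":
        return None
    try:
        return AsrEvent(event["code"])
    except (KeyError, ValueError):
        return None  # missing or unlisted code
```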

RECEIVE Error Message

JSON
{
  "error": {
    "technology": <string>,
    "model_name": <string>,
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
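
Since this route can deliver result, event, and error messages for more than one technology, a client typically needs to sort incoming frames first. The sketch below dispatches on the top-level key and reads the `technology` field; the function name `classify_message` is an assumption made for illustration.

```python
import json
from typing import Optional, Tuple


def classify_message(raw: str) -> Tuple[str, Optional[str], dict]:
    """Return (kind, technology, body) for an incoming message (sketch)."""
    message = json.loads(raw)
    # Each message documented for this route wraps its payload in exactly one of these keys.
    for kind in ("result", "event", "error"):
        if kind in message:
            body = message[kind]
            return kind, body.get("technology"), body
    return "unknown", None, message
```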
Voice Recognition

ROUTE Recognize

CODE
/v1/voice-recognition/recognize   

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "result": {
    "type": <int>,
    "type_string": <string>,
    "is_final": <bool>,
    "begin_time": <int>,
    "end_time": <int>,
    "hypotheses": [ <hypothesis>, ... ]
  }
}

| Fields | Possible values | Description |
| --- | --- | --- |
| type | [0, 1] | The result type as an int value. |
| type_string | [ ASR, NLU ] | The result type as a string value. |
| is_final | [ false, true ] | Indicates whether this result is final. If true, this is the last time this result will be returned; if false, this is an interim result that may be updated later. |
| begin_time | [0, INT_MAX] | The system time in milliseconds at the start of the hypothesis recognition operation. |
| end_time | [0, INT_MAX] | The system time in milliseconds at the end of the hypothesis recognition operation. |
| confidence | [0, 10000] | Indicates the likelihood that the recognized words are correct. |
| hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |

DETAILS Result Hypothesis

JSON
  "hypotheses": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "start_rule": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| start_rule | Represents the entry point of the grammar (<main>). |
| items | A JSON array containing all the matched tokens. Each item object is of type either tag or terminal. |

DETAILS Result Item (Orthography)

JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "terminal",
      "orthography": <string>
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| orthography | The matched terminal token. |

DETAILS Result Item (Tag)

JSON
  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "tag",
      "name": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| name | Represents the name given as an attribute to the !tag directive. Note that when choosing vsdk-csdk, this name becomes the concatenation of the grammar name and the actual tag name. |
| items | Same as Result Item (Orthography). |

RECEIVE Event Message

JSON
{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

| Events | Code | Description |
| --- | --- | --- |
| RECOGNIZER_STARTED | 0 | Indicates that the recognizer has started processing speech input. |
| RECOGNIZER_STOPPED | 1 | Indicates that the recognizer is no longer processing speech input. |
| SPEECH_DETECTED | 2 | Indicates that the recognizer detects input that it can identify as speech. |
| SILENCE_DETECTED | 3 | Indicates that the recognizer is receiving silence or non-speech. |

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
Voice Synthesis

ROUTE Synthesize

CODE
/v1/voice-synthesis/synthesize

Messages


RECEIVE Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}
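
On the receiving side, the synthesized audio arrives as a sequence of chunk messages ending with `last` set to true. The following sketch reassembles them into one PCM buffer; the names `DATA_PREFIX` and `collect_audio` are hypothetical, and skipping messages without a `data` key (events, errors) is a simplification made for this example.

```python
import base64
import json
from typing import Iterable

DATA_PREFIX = "data:audio/pcm;base64,"


def collect_audio(raw_messages: Iterable[str]) -> bytes:
    """Concatenate decoded audio chunks until the chunk flagged as last (sketch)."""
    audio = bytearray()
    for raw in raw_messages:
        message = json.loads(raw)
        if "data" not in message:
            continue  # event and error messages are ignored in this sketch
        data = message["data"]
        if not data.startswith(DATA_PREFIX):
            raise ValueError("unexpected data URI prefix")
        audio.extend(base64.b64decode(data[len(DATA_PREFIX):]))
        if message.get("last"):
            break  # final chunk received
    return bytes(audio)
```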

RECEIVE Event Message

JSON
{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

| Fields | Possible values |
| --- | --- |
| code | [0, 7] |
| code_string | [ NativeEvent, GenerationStarted, GenerationEnded, ProcessFinished, TextRewritten, Marker, WordMarkerStart, WordMarkerEnd ] |

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
Voice Biometrics

ROUTE Authenticate

CODE
/v1/voice-biometrics/authenticate

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Result Message

JSON
{
  "id": <string>,
  "probability": <double>,
  "score": <double>
}

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
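
Unlike the recognition routes, the result message here is not wrapped in a `result` object, so a client can tell it apart from an error by checking for the `error` key. A minimal sketch, assuming that convention holds; `AuthenticationResult` and `parse_authentication_message` are hypothetical names, and no interpretation of `probability` or `score` is implied.

```python
import json
from dataclasses import dataclass


@dataclass
class AuthenticationResult:
    id: str
    probability: float
    score: float


def parse_authentication_message(raw: str) -> AuthenticationResult:
    """Parse a result message, raising RuntimeError for error messages (sketch)."""
    message = json.loads(raw)
    if "error" in message:
        err = message["error"]
        raise RuntimeError(f"{err.get('code_string')}: {err.get('message')}")
    return AuthenticationResult(
        id=message["id"],
        probability=float(message["probability"]),
        score=float(message["score"]),
    )
```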

ROUTE Identify

CODE
/v1/voice-biometrics/identify

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "id": <string>,
  "probability": <double>,
  "type": <string>
}

ROUTE Enroll

CODE
/v1/voice-biometrics/enroll

Messages


SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 

RECEIVE Result Message

JSON
{
  "accepted": <bool>,
  "progress": <int>,
  "speech_duration": <double>,
  "utterances": [ <utterance>, ... ]
}

DETAILS Result utterance

JSON
  "utterances": [
    {
      "accepted": <bool>,
      "contains_speech": <bool>,
      "enough_speech": <bool>,
      "is_band_limited": <bool>,
      "is_consistent": <bool>,
      "is_peak_clipped": <bool>,
      "is_snr_ok": <bool>,
      "snr": <double>,
      "speech_duration": <double>
    },
    ...
  ]

| Fields | Description |
| --- | --- |
| accepted | Indicates that an utterance is valid and could be added to the enrollment profile. |
| contains_speech | Indicates whether the audio contains speech. |
| enough_speech | Indicates whether the speech duration is long enough to pass the enrollment process checks. |
| is_band_limited | Indicates whether the utterance is band-limited. |
| is_consistent | Indicates whether the utterance is consistent with the previous utterances. |
| is_peak_clipped | Indicates whether the degree of peak clipping is below a certain threshold. |
| is_snr_ok | Indicates whether the SNR value is sufficiently high. |
| snr | Represents the signal-to-noise ratio of the enrollment utterance, measured in dB. |
| speech_duration | Represents the speech duration within the audio input. |
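
To illustrate how the enrollment result and its per-utterance fields fit together, the sketch below condenses one result message into a short summary. The helper name `summarize_enrollment` and the keys of the returned dictionary are assumptions made for this example; the code only counts and sums documented fields and does not interpret the individual quality flags.

```python
def summarize_enrollment(result: dict) -> dict:
    """Condense an enroll result message into a few counters (sketch)."""
    utterances = result.get("utterances", [])
    accepted = [u for u in utterances if u.get("accepted")]
    return {
        "accepted": bool(result.get("accepted")),
        "progress": result.get("progress"),
        "accepted_utterances": len(accepted),
        "rejected_utterances": len(utterances) - len(accepted),
        # Total speech duration over accepted utterances only.
        "accepted_speech_duration": sum(
            u.get("speech_duration", 0.0) for u in accepted
        ),
    }
```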


RECEIVE Event Message

JSON
{
  "event": {
    "code": 0,
    "code_string": "INFO",
    "message": <string>,
    "timestamp": <int>
  }
}

RECEIVE Error Message

JSON
{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
Speech Enhancement


ROUTE Enhance

CODE
/v1/speech-enhancement/enhance

Messages

SEND Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
} 
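
When a whole recording is already in memory, it has to be split into several chunk messages, with `last` set only on the final one. A minimal sketch of that splitting follows; the name `iter_audio_chunks` and the default `chunk_size` of 3200 bytes are assumptions, as this reference does not state a required chunk size or sample format.

```python
import base64
import json
from typing import Iterator


def iter_audio_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[str]:
    """Yield JSON audio chunk messages covering the whole buffer (sketch)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # An empty buffer still produces one (empty) chunk flagged as last.
    total = max(len(pcm), 1)
    for start in range(0, total, chunk_size):
        piece = pcm[start:start + chunk_size]
        yield json.dumps({
            "data": "data:audio/pcm;base64," + base64.b64encode(piece).decode("ascii"),
            "last": start + chunk_size >= len(pcm),
        })
```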

RECEIVE Audio Chunk Message

JSON
{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}