WebSocket API

Voice Recognition

ROUTE Recognize

CODE

/v1/voice-recognition/recognize

Messages

SEND Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}
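As a minimal sketch of how a client might build these messages, the Python helper below splits raw PCM bytes into chunks and wraps each one in the documented data URI. The function name `audio_chunk_messages` and the default `chunk_size` of 3200 bytes are illustrative assumptions, not part of the API; the actual transport (opening the WebSocket and sending each string as a text frame) is left out.

```python
import base64
import json


def audio_chunk_messages(pcm: bytes, chunk_size: int = 3200):
    """Yield JSON strings shaped like the Audio Chunk Message above.

    `chunk_size` is a hypothetical default; choose whatever suits the audio format.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Iterate at least once so that empty input still produces a "last" message.
    for start in range(0, max(len(pcm), 1), chunk_size):
        chunk = pcm[start:start + chunk_size]
        payload = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({
            "data": "data:audio/pcm;base64," + payload,
            # True only for the chunk that reaches the end of the buffer.
            "last": start + chunk_size >= len(pcm),
        })
```

Each yielded string can then be sent as one message on the open connection, with the final one carrying `"last": true`.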

RECEIVE Result Message

JSON

{
  "result": {
    "type": <int>,
    "type_string": <string>,
    "is_final": <bool>,
    "begin_time": <int>,
    "end_time": <int>,
    "hypotheses": [ <hypothesis>, ... ]
  }
}

Fields	Possible values	Description
`type`	[`0,1`]	The result type as an int value
`type_string`	[ `ASR`, `NLU` ]	The result type as a string value
`is_final`	[ `false`, `true` ]	Indicates whether this result is final. If `true`, this is the last time the result is returned; if `false`, it is an interim result and may be updated later.
`begin_time`	[`0, INT_MAX`]	The system time in milliseconds at the start of the hypothesis recognition operation.
`end_time`	[`0, INT_MAX`]	The system time in milliseconds at the end of the hypothesis recognition operation.
`confidence`	[`0, 10000`]	Indicates the likelihood that the recognized words are correct. Reported on each hypothesis and item.
`hypotheses`	-	A JSON array containing all the hypotheses of the recognized speech content.

DETAILS Result Hypothesis

JSON

  "hypotheses": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "start_rule": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Fields	Description
`start_rule`	Represents the entry point of the grammar (<main>).
`items`	A JSON array containing all the matched tokens. Each item is of type either `tag` or `terminal`.
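The following Python sketch shows one way a client might pick a hypothesis from a Result Message: it ignores interim results and returns the hypothesis with the highest `confidence`. The function name `best_final_hypothesis` is hypothetical, and treating the highest confidence as "best" is an assumption about how a client would want to rank hypotheses.

```python
import json


def best_final_hypothesis(raw: str):
    """Return the highest-confidence hypothesis of a final result, else None."""
    message = json.loads(raw)
    result = message.get("result")
    # Skip non-result messages and interim results.
    if not result or not result.get("is_final"):
        return None
    hypotheses = result.get("hypotheses") or []
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda h: h.get("confidence", 0))
```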

DETAILS Result Item (Orthography)

JSON

  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "terminal",
      "orthography": <string>
    },
    ...
  ]

Fields	Description
`orthography`	The matched terminal token.

DETAILS Result Item (Tag)

JSON

  "items": [
    {
      "confidence": <int>,
      "begin_time": <int>,
      "end_time": <int>,
      "type": "tag",
      "name": <string>,
      "items": [ <item>, ... ]
    },
    ...
  ]

Fields	Description
`name`	Represents the name given as an attribute to the `!tag` directive. Note that when choosing `vsdk-csdk` this name becomes the concatenation of the grammar name and the actual tag name.
`items`	Same as `Result item (Orthography)`
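Because `tag` items contain their own `items` array, recovering the recognized text means walking the tree. The Python sketch below collects the `orthography` of every `terminal` item in order. The function name `collect_orthography` is hypothetical, and the sketch assumes that tags may nest to any depth, which the schema suggests but does not state outright.

```python
def collect_orthography(items):
    """Return the terminal tokens found in `items`, depth first, in order."""
    words = []
    for item in items:
        kind = item.get("type")
        if kind == "terminal":
            words.append(item["orthography"])
        elif kind == "tag":
            # A tag wraps further items; recurse into them.
            words.extend(collect_orthography(item.get("items", [])))
    return words
```

Joining the returned list with spaces gives a plain transcript for a hypothesis, for example `" ".join(collect_orthography(hypothesis["items"]))`.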

RECEIVE Event Message

JSON

{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

Events	Code	Description
`RECOGNIZER_STARTED`	0	Indicates that the recognizer has started processing speech input.
`RECOGNIZER_STOPPED`	1	Indicates that the recognizer is no longer processing speech input.
`SPEECH_DETECTED`	2	Indicates that the recognizer detects input that it can identify as speech.
`SILENCE_DETECTED`	3	Indicates that the recognizer is receiving silence or non-speech.
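A client can mirror the table above with an enumeration so that handlers compare against names rather than bare integers. This Python sketch is one possible shape; the names `RecognizerEvent` and `parse_event` are hypothetical, while the codes are taken from the table.

```python
import json
from enum import IntEnum


class RecognizerEvent(IntEnum):
    RECOGNIZER_STARTED = 0
    RECOGNIZER_STOPPED = 1
    SPEECH_DETECTED = 2
    SILENCE_DETECTED = 3


def parse_event(raw: str):
    """Return (RecognizerEvent, message) for a recognition Event Message.

    Raises ValueError if the code is not one of the four listed above.
    """
    event = json.loads(raw)["event"]
    return RecognizerEvent(event["code"]), event.get("message", "")
```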

RECEIVE Error Message

JSON

{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}
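The three messages received on this route are distinguished by their top-level key (`result`, `event`, or `error`), so a receive loop can dispatch on that key. The Python sketch below illustrates the idea; `classify_message` is a hypothetical helper, and the `"unknown"` fallback is an assumption for messages that match none of the documented shapes.

```python
import json


def classify_message(raw: str):
    """Return (kind, body), where kind is 'error', 'event', 'result' or 'unknown'."""
    message = json.loads(raw)
    for kind in ("error", "event", "result"):
        if kind in message:
            return kind, message[kind]
    # Not one of the documented shapes; hand the whole message back.
    return "unknown", message
```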

Voice Synthesis

ROUTE Synthesize

CODE

/v1/voice-synthesis/synthesize

Messages

RECEIVE Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}
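On the receiving side, the audio bytes are recovered by stripping the data URI prefix and decoding the base64 payload. The Python sketch below shows this; `decode_audio_chunk` and the `PREFIX` constant are hypothetical names, and rejecting any other prefix is an assumption about how strict a client should be.

```python
import base64
import json

PREFIX = "data:audio/pcm;base64,"


def decode_audio_chunk(raw: str):
    """Return (pcm_bytes, is_last) for a received Audio Chunk Message."""
    message = json.loads(raw)
    data = message["data"]
    if not data.startswith(PREFIX):
        raise ValueError("unexpected data URI prefix")
    return base64.b64decode(data[len(PREFIX):]), bool(message["last"])
```

A client would typically append each returned byte string to a buffer until `is_last` is true.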

RECEIVE Event Message

JSON

{
  "event": {
    "code": <int>,
    "code_string": <string>,
    "message": <string>,
    "timestamp": <int>
  }
}

Fields	Possible values
`code`	[`0,7`]
`code_string`	[ `NativeEvent`, `GenerationStarted`, `GenerationEnded`, `ProcessFinished`, `TextRewritten`, `Marker`, `WordMarkerStart`, `WordMarkerEnd` ]

RECEIVE Error Message

JSON

{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

Voice Biometrics

ROUTE Authenticate

CODE

/v1/voice-biometrics/authenticate

Messages

SEND Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Result Message

JSON

{
  "id": <string>,
  "probability": <double>,
  "score": <double>
}

RECEIVE Error Message

JSON

{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

ROUTE Identify

CODE

/v1/voice-biometrics/identify

Messages

SEND Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Result Message

JSON

{
  "id": <string>,
  "probability": <double>,
  "type": <string>
}

ROUTE Enroll

CODE

/v1/voice-biometrics/enroll

Messages

SEND Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Result Message

JSON

{
  "accepted": <bool>,
  "progress": <int>,
  "speech_duration": <double>,
  "utterances": [ <utterance>, ... ]
}

DETAILS Result Utterance

JSON

  "utterances": [
    {
      "accepted": <bool>,
      "contains_speech": <bool>,
      "enough_speech": <bool>,
      "is_band_limited": <bool>,
      "is_consistent": <bool>,
      "is_peak_clipped": <bool>,
      "is_snr_ok": <bool>,
      "snr": <double>,
      "speech_duration": <double>
    },
    ...
  ]

Fields	Description
`accepted`	Indicates that an utterance is valid and could be added to the enrollment profile.
`contains_speech`	Indicates whether the audio contains speech.
`enough_speech`	Indicates whether the given speech duration is enough to pass the enrollment process checks.
`is_band_limited`	Indicates whether the utterance is band-limited.
`is_consistent`	Indicates whether the utterance is consistent with the previous utterances.
`is_peak_clipped`	Indicates whether the degree of peak clipping is below a certain threshold.
`is_snr_ok`	Indicates whether the SNR value is sufficiently high.
`snr`	Represents the signal-to-noise ratio (SNR) of the enrollment utterance, measured in dB.
`speech_duration`	Represents the speech duration within an audio input.
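When an utterance is not accepted, the boolean checks can tell the user what to fix. The Python sketch below lists the checks that an utterance explicitly failed. The helper name `failed_checks` is hypothetical, and the sketch only covers the four flags for which `true` clearly means the check passed; how to interpret `is_band_limited` and `is_peak_clipped` is left to the caller.

```python
# Flags from the table above for which true means the check passed (an assumption
# based on their descriptions).
PASS_WHEN_TRUE = ("contains_speech", "enough_speech", "is_consistent", "is_snr_ok")


def failed_checks(utterance: dict):
    """Return the names of the checks reported as false for this utterance."""
    # Missing flags are treated as not reported rather than as failures.
    return [name for name in PASS_WHEN_TRUE if utterance.get(name) is False]
```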

RECEIVE Event Message

JSON

{
  "event": {
    "code": 0,
    "code_string": "INFO",
    "message": <string>,
    "timestamp": <int>
  }
}

RECEIVE Error Message

JSON

{
  "error": {
    "type": <string>,
    "code": <int>,
    "code_string": <string>,
    "message": <string>
  }
}

Speech Enhancement

ROUTE Enhance

CODE

/v1/speech-enhancement/enhance

Messages

SEND Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}

RECEIVE Audio Chunk Message

JSON

{
  "data": "data:audio/pcm;base64,<base64_audio>",
  "last": <bool>
}