WebSocket API
ROUTE Recognize
/v1/advanced-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
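As a minimal sketch of how a client might build this message, the helper below base64-encodes raw PCM bytes and wraps them in the `data:audio/pcm;base64,` prefix shown above. The function names and the `chunk_size` default are hypothetical. It also assumes that `last` is set to true only on the final chunk of a stream, which the field name suggests but this reference does not state.

```python
import base64
import json


def make_audio_chunk(pcm_bytes: bytes, last: bool = False) -> str:
    """Serialize raw PCM bytes as an Audio Chunk Message (JSON text)."""
    encoded = base64.b64encode(pcm_bytes).decode("ascii")
    return json.dumps({"data": "data:audio/pcm;base64," + encoded, "last": last})


def iter_audio_chunks(pcm_bytes: bytes, chunk_size: int = 3200):
    """Yield one message per chunk; only the final chunk is flagged as last.

    chunk_size is an arbitrary illustrative value, not a documented limit.
    """
    for start in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[start:start + chunk_size]
        is_last = start + chunk_size >= len(pcm_bytes)
        yield make_audio_chunk(chunk, last=is_last)
```

Each yielded string can be sent as one text frame with any WebSocket client library.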
RECEIVE Asr Result Message
{
"result": {
"technology": "asr",
"model_name": <string>,
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
Fields | Possible values | Description |
---|---|---|
model_name | - | The model name associated with the result. |
type | [ | The result type as an int value. |
type_string | [ | The result type as a string value. |
is_final | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise it is an interim result and may be updated later. |
begin_time | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
end_time | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
confidence | [ | Indicates the likelihood that the recognized words are correct. |
hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Asr Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
Fields | Description |
---|---|
start_rule | Represents the entry point of the grammar (<main>). |
items | A JSON array containing all the matched tokens. An item object is either of type "terminal" or of type "tag". |
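A result can carry several hypotheses, so a client usually has to choose one. The sketch below picks the hypothesis with the highest `confidence`, on the assumption that a larger value means a more likely match. The function name is hypothetical, and `result` is the object found under the `"result"` key of an already parsed message.

```python
def best_hypothesis(result: dict):
    """Return the hypothesis with the highest confidence, or None if there is none.

    Assumes a larger "confidence" value means a more likely match.
    """
    hypotheses = result.get("hypotheses", [])
    if not hypotheses:
        return None
    return max(hypotheses, key=lambda hypothesis: hypothesis["confidence"])
```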
DETAILS Asr Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
Fields | Description |
---|---|
orthography | The matched terminal token. |
DETAILS Asr Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
Fields | Description |
---|---|
name | Represents the name given as an attribute to the |
items | Same as |
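Because tag items can nest further items, recovering the recognized words takes a recursive walk. The sketch below collects the `orthography` of every terminal item in document order. The function name is hypothetical, and items with any other `type` value are skipped.

```python
def collect_orthography(items: list) -> list:
    """Return the orthography of every terminal item, descending into tag items."""
    words = []
    for item in items:
        if item.get("type") == "terminal":
            words.append(item["orthography"])
        elif item.get("type") == "tag":
            # A tag item wraps a nested list of items of either type.
            words.extend(collect_orthography(item.get("items", [])))
    return words
```

Joining the returned list with spaces gives a plain transcript of one hypothesis.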
RECEIVE Biometrics Result Message
{
"result": {
"technology": "biometrics",
"model_name": <string>,
"id": <string>,
"probability": <double>,
"score": <double>
}
}
RECEIVE Event Message
{
"event": {
"technology": <string>,
"model_name": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
Technologies | Description |
---|---|
asr | Voice recognition |
biometrics | Voice biometrics |
Asr Events | Code | Description |
---|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
RECEIVE Error Message
{
"error": {
"technology": <string>,
"model_name": <string>,
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
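Every message received on this route wraps its payload in a single top-level key: `result`, `event`, or `error`. A client can therefore dispatch on that key. The sketch below shows one way to do this; the function name and the `"unknown"` fallback are hypothetical.

```python
import json


def classify_message(raw: str):
    """Return (kind, payload) for a received JSON text frame.

    kind is "result", "event", or "error", or "unknown" when none of
    those top-level keys is present.
    """
    message = json.loads(raw)
    for kind in ("result", "event", "error"):
        if kind in message:
            return kind, message[kind]
    return "unknown", message
```

For a result payload, the `technology` field ("asr" or "biometrics") then tells the client which result layout to expect.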
ROUTE Recognize
/v1/voice-recognition/recognize
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"result": {
"type": <int>,
"type_string": <string>,
"is_final": <bool>,
"begin_time": <int>,
"end_time": <int>,
"hypotheses": [ <hypothesis>, ... ]
}
}
Fields | Possible values | Description |
---|---|---|
type | [ | The result type as an int value. |
type_string | [ | The result type as a string value. |
is_final | [ | Indicates whether this result is final. If true, this is the last time this result will be returned; otherwise it is an interim result and may be updated later. |
begin_time | [ | The system time in milliseconds at the start of the hypothesis recognition operation. |
end_time | [ | The system time in milliseconds at the end of the hypothesis recognition operation. |
confidence | [ | Indicates the likelihood that the recognized words are correct. |
hypotheses | - | A JSON array containing all the hypotheses of the recognized speech content. |
DETAILS Result Hypothesis
"hypotheses": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"start_rule": <string>,
"items": [ <item>, ... ]
},
...
]
Fields | Description |
---|---|
start_rule | Represents the entry point of the grammar (<main>). |
items | A JSON array containing all the matched tokens. An item object is either of type "terminal" or of type "tag". |
DETAILS Result Item (Orthography)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "terminal",
"orthography": <string>
},
...
]
Fields | Description |
---|---|
orthography | The matched terminal token. |
DETAILS Result Item (Tag)
"items": [
{
"confidence": <int>,
"begin_time": <int>,
"end_time": <int>,
"type": "tag",
"name": <string>,
"items": [ <item>, ... ]
},
...
]
Fields | Description |
---|---|
name | Represents the name given as an attribute to the |
items | Same as |
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
Events | Code | Description |
---|---|---|
| 0 | Indicates that the recognizer has started processing speech input. |
| 1 | Indicates that the recognizer is no longer processing speech input. |
| 2 | Indicates that the recognizer detects input that it can identify as speech. |
| 3 | Indicates that the recognizer is receiving silence or non-speech. |
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Synthesize
/v1/voice-synthesis/synthesize
Messages
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
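On the receiving side, a client has to strip the data URI prefix and decode the base64 payload before the audio can be played or saved. The sketch below decodes individual chunks and concatenates them until a chunk arrives with `last` set to true, assuming that flag marks the end of the synthesized stream. The function names are hypothetical.

```python
import base64
import json

PREFIX = "data:audio/pcm;base64,"


def decode_audio_chunk(raw: str):
    """Return (pcm_bytes, last) for one received Audio Chunk Message."""
    message = json.loads(raw)
    data = message["data"]
    if not data.startswith(PREFIX):
        raise ValueError("unexpected data URI prefix")
    return base64.b64decode(data[len(PREFIX):]), bool(message["last"])


def assemble_audio(raw_messages) -> bytes:
    """Concatenate decoded chunks, stopping after the chunk flagged as last."""
    audio = bytearray()
    for raw in raw_messages:
        pcm, last = decode_audio_chunk(raw)
        audio.extend(pcm)
        if last:
            break
    return bytes(audio)
```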
RECEIVE Event Message
{
"event": {
"code": <int>,
"code_string": <string>,
"message": <string>,
"timestamp": <int>
}
}
Fields | Possible values |
---|---|
| [ |
| [ |
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Authenticate
/v1/voice-biometrics/authenticate
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"score": <double>
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Identify
/v1/voice-biometrics/identify
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"id": <string>,
"probability": <double>,
"type": <string>
}
ROUTE Enroll
/v1/voice-biometrics/enroll
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Result Message
{
"accepted": <bool>,
"progress": <int>,
"speech_duration": <double>,
"utterances": [ <utterance>, ... ]
}
DETAILS Result Utterance
"utterances": [
{
"accepted": <bool>,
"contains_speech": <bool>,
"enough_speech": <bool>,
"is_band_limited": <bool>,
"is_consistent": <bool>,
"is_peak_clipped": <bool>,
"is_snr_ok": <bool>,
"snr": <double>,
"speech_duration": <double>
},
...
]
Fields | Description |
---|---|
accepted | Indicates that the utterance is valid and could be added to the enrollment profile. |
contains_speech | Indicates whether the audio contains speech. |
enough_speech | Indicates whether the speech duration is long enough to pass the enrollment process checks. |
is_band_limited | Indicates whether the utterance is band-limited. |
is_consistent | Indicates whether the utterance is consistent with the previous utterances. |
is_peak_clipped | Indicates whether the degree of peak clipping is below a certain threshold. |
is_snr_ok | Indicates whether the SNR value is sufficiently high. |
snr | Represents the signal-to-noise ratio of the enrollment utterance, measured in dB. |
speech_duration | Represents the speech duration within the audio input. |
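When an utterance is not accepted, the boolean flags indicate which check failed. The sketch below reports the failed checks for each rejected utterance in an enrollment result. It covers only the four flags whose passing value is clearly true (`contains_speech`, `enough_speech`, `is_consistent`, `is_snr_ok`). It leaves out `is_band_limited` and `is_peak_clipped` because this reference does not say which of their values counts as passing. All function and constant names are hypothetical.

```python
# Flags for which a value of true means the check passed.
POSITIVE_CHECKS = ("contains_speech", "enough_speech", "is_consistent", "is_snr_ok")


def failed_checks(utterance: dict) -> list:
    """Return the names of the positive checks that are explicitly false."""
    return [name for name in POSITIVE_CHECKS if utterance.get(name) is False]


def rejected_utterances(result: dict) -> list:
    """Return (index, failed_checks) for every utterance that was not accepted."""
    return [
        (index, failed_checks(utterance))
        for index, utterance in enumerate(result.get("utterances", []))
        if not utterance.get("accepted")
    ]
```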
RECEIVE Event Message
{
"event": {
"code": 0,
"code_string": "INFO",
"message": <string>,
"timestamp": <int>
}
}
RECEIVE Error Message
{
"error": {
"type": <string>,
"code": <int>,
"code_string": <string>,
"message": <string>
}
}
ROUTE Enhance
/v1/speech-enhancement/enhance
Messages
SEND Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}
RECEIVE Audio Chunk Message
{
"data": "data:audio/pcm;base64,<base64_audio>",
"last": <bool>
}