Sessions - Audio Processing Pipeline

Introduction

The Audio Processing Pipeline API provides a flexible framework for building real-time audio processing workflows. The system is based on the concept of sessions, which encapsulate one or more pipelines that process audio data through a chain of modules.

Each pipeline represents a directed processing flow that starts with a producer, optionally passes through modifiers, and ends with one or more consumers. This architecture allows developers to construct complex audio processing scenarios such as recording, streaming, enhancement, recognition, or playback.

Figure: Pipeline Structure

The API is designed to support both real-time communication via WebSockets and control operations via REST endpoints.

In session mode:

You create and configure a session.
Within that session, you define and manage one or more pipelines.
Pipelines can be started, paused, resumed, or modified in real time.
Producers and consumers can be changed dynamically without recreating the underlying resources.
Multiple pipelines can run concurrently inside the same session.
Audio can be streamed continuously without requiring the session to be restarted after each result.

This architecture enables a more efficient and natural workflow:

Resources are instantiated once and reused.
Operations are no longer constrained to single-technology or single-shot tasks.
Input becomes continuous rather than transactional.
System behavior can evolve at runtime without tearing down the execution context.

From an input perspective, session mode allows uninterrupted audio flow. Results can be retrieved while the session remains active, eliminating the need to repeatedly stop and restart processing.

For a quick overview of pipeline basics (core components, data flow, and key processing steps), see the Google Slides presentation:
https://docs.google.com/presentation/d/1B6vSKgeedHH5JlT9M4HE1-KyqyW3PCvg11k67g0vn1A/

This feature was introduced recently in Vivoka’s ecosystem, and your feedback is essential to refine it further. If certain aspects of the workflow, configuration model, or runtime control could better match your needs, we encourage you to share your experience.

Vocabulary

Session

Sessions allow users to group multiple pipelines that share the same lifecycle and communication channel. A session acts as a logical container that organizes and manages the pipelines involved in an audio processing workflow.

When a session is created, the system does not immediately instantiate the processing engines or allocate runtime resources for the pipelines. Instead, the API stores the configuration parameters that describe the pipelines and their associated modules. This configuration includes the pipeline structure, the module types (producer, modifiers, and consumers), and their respective settings.

By deferring initialization, the system avoids allocating unnecessary resources before they are actually needed. The stored configuration serves as a blueprint that the system can use later to construct the full processing pipeline.

The actual initialization of modules and allocation of processing resources occurs only when the pipeline load operation is requested. At that moment, the system reads the previously stored configuration, instantiates the required modules, validates their parameters, and prepares the pipeline for execution. This separation between configuration and runtime initialization enables more efficient resource management and provides greater flexibility when defining or modifying pipelines before they are executed.

A session provides a WebSocket connection through which data can be transferred in real time. This connection can be used to:

Send audio data to pipelines that require streaming input
Receive processed audio
Receive processing events
Receive results (such as recognition output)
Receive error notifications

It’s important to connect to the WebSocket early so you can receive any errors and detect issues as soon as they occur.

Users can create multiple sessions simultaneously and manage them independently. Sessions must be deleted manually when they are no longer needed.

ASR recognizers, TTS channels, and speech enhancement enhancers cannot be loaded in multiple pipelines simultaneously.

Session Lifecycle

A typical session lifecycle includes:

Creating the session with one or more pipelines
Loading and starting pipelines
Streaming audio (if required)
Stopping pipelines
Deleting the session
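The lifecycle above can be sketched as an ordered list of REST calls. This is only a sketch: the create, load, and start routes are the ones documented on this page, while the `stop` path and the `DELETE` route are assumptions modelled on them, so check the Swagger documentation for the exact routes.

```python
def lifecycle_calls(session_id: str, pipeline_id: str) -> list[tuple[str, str]]:
    """Return (HTTP method, path) pairs in the order of a typical session lifecycle."""
    pipeline = f"/v1/sessions/{session_id}/pipelines/{pipeline_id}"
    return [
        ("POST", "/v1/sessions"),                  # create the session with its pipelines
        ("POST", f"{pipeline}/load"),              # allocate the pipeline resources
        ("POST", f"{pipeline}/start"),             # begin processing (stream audio if required)
        ("POST", f"{pipeline}/stop"),              # assumed path, by analogy with load/start
        ("DELETE", f"/v1/sessions/{session_id}"),  # assumed route for manual deletion
    ]


for method, path in lifecycle_calls("<sessionId>", "myPipeline"):
    print(method, path)
```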

Sequence diagram for session using voice synthesis

Pipeline

A pipeline is a processing chain that routes audio from a source to one or more processing endpoints.

State transitions are controlled through the pipeline lifecycle endpoints (load, start, pause, resume, stop, unload).

Each pipeline is composed of three categories of components:

Producer → Modifiers → Consumers

Pipeline states

Each pipeline has an internal life-cycle state:

Figure: pipeline life-cycle states

Unloaded: The pipeline is not initialized and holds no resources.
Loading: Resources are being allocated.
Loaded: Resources are ready, but the pipeline is not running.
Starting: The pipeline is transitioning to a running state.
Running: The pipeline is actively processing the incoming audio from the producer.
Pausing: The pipeline is transitioning to a paused state.
Paused: Processing is temporarily halted.
Resuming: The pipeline is transitioning back to running.
Stopping: The pipeline is stopping execution.
Stopped: The pipeline has stopped but resources may still be allocated.
Unloading: Resources are being released.
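On the client side, the states listed above can be modelled with a small enum. The sketch below maps each transitional state to the steady state it leads to according to the descriptions above. It is an illustration, not the service's internal state machine, and it does not model failure cases (for example, a load that fails).

```python
from enum import Enum


class PipelineState(Enum):
    UNLOADED = "Unloaded"
    LOADING = "Loading"
    LOADED = "Loaded"
    STARTING = "Starting"
    RUNNING = "Running"
    PAUSING = "Pausing"
    PAUSED = "Paused"
    RESUMING = "Resuming"
    STOPPING = "Stopping"
    STOPPED = "Stopped"
    UNLOADING = "Unloading"


# Transitional state -> steady state it settles into when the operation succeeds.
SETTLES_TO = {
    PipelineState.LOADING: PipelineState.LOADED,
    PipelineState.STARTING: PipelineState.RUNNING,
    PipelineState.PAUSING: PipelineState.PAUSED,
    PipelineState.RESUMING: PipelineState.RUNNING,
    PipelineState.STOPPING: PipelineState.STOPPED,
    PipelineState.UNLOADING: PipelineState.UNLOADED,
}


def is_transitional(state: PipelineState) -> bool:
    """True while the pipeline is between two steady states."""
    return state in SETTLES_TO
```

`PipelineState("Loaded")` converts the `state` string of an event into the enum.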

Producer

Producers generate the initial audio stream that enters the pipeline.

They act as the source of audio data and can either capture, read, synthesize, or receive audio.

Available producers:

Producer Type	Description
AudioRecorder	Captures audio from a recording device such as a microphone.
File	Reads audio data from a file stored on disk.
Stream	Accepts audio streamed from the client through the session WebSocket.
VoiceSynthesis	Generates speech audio from text input using speech synthesis technology.

Modifiers

Modifiers process or transform the audio stream between the producer and consumers.

They are optional and can be chained together to build complex transformations.

Available modifiers:

Modifier Type	Description
ChannelExtractor	Extracts a specific audio channel from multi-channel audio input.
SpeechEnhancement	Improves audio quality by reducing noise or enhancing speech clarity.

Consumers

Consumers represent the endpoints of a pipeline. They receive processed audio data and perform actions such as playback, storage, or analysis.

A pipeline can have multiple consumers, enabling parallel outputs.

Available consumers:

Consumer Type	Description
AudioPlayer	Plays audio through an output device.
File	Writes processed audio to a file.
Stream	Streams processed audio back to the client.
VoiceBiometrics	Performs speaker identification or verification.
VoiceRecognition	Converts speech to text using speech recognition.

Usage

Running the VDK Service

The first step is to ensure that the VDK service is properly configured for your tasks. This means having the appropriate plugins installed (ASR, TTS, Biometrics) and the correct technology configuration (models, voices, enrolled users).

This configuration is done through VDK Studio. Refer to the corresponding documentation for detailed instructions.

Once your service is running, we recommend verifying that it is correctly configured by requesting the list of available models or voices, depending on the technologies you rely on.

If a request fails or an internal error occurs, the HTTP response will contain a JSON object with an "errors" field describing what went wrong.
We recommend using a dedicated tool such as Postman to interact with and test the API.
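A client can surface these errors with a small helper such as the one below. This is a sketch: the exact shape of the "errors" field is not described here, so the helper accepts either a list or a single value, and `extract_errors` is a hypothetical name.

```python
import json


def extract_errors(body: str) -> list[str]:
    """Return the content of the "errors" field as a list of strings, or [] if absent."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict) or "errors" not in payload:
        return []
    errors = payload["errors"]
    # Shape of "errors" is an assumption: accept a list or a single value.
    if isinstance(errors, list):
        return [str(error) for error in errors]
    return [str(errors)]
```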

Creating a session

A sample project is available in both C# and Python to help you get started quickly. Since the usage depends on your service configuration, you will need to adjust the placeholder configuration included in the sample project to match your setup.

The route used to create a session is the following.

[POST] /v1/sessions

The request must include a JSON body describing the session configuration with at least one pipeline. In the following example, the producer is an Audio File Producer and the consumer is Voice Recognition (ASR).

The pipeline therefore reads the file and processes it using speech recognition.

JSON

# Request JSON body
{
    "pipelines": {
        "myPipeline": {
            "producer": {
                "type": "File",
                "path": "path/to/myWav.wav",
                "sample_rate": 16000,
                "channels": 1
            },
            "consumers": {
                "myConsumer": {
                    "type": "VoiceRecognition",
                    "recognizer": "myRecognizer",
                    "models": [ "myModel" ],
                    "confidence": 0
                }
            }
        }
    }
}

If your configuration is valid, the response contains the identifier of the new session.

JSON

# Response body (OK)
{
    "session_id": "7bc9e16c79-4442-2eac-bffc-22c30769bd"
}
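The request and response above could be handled from Python roughly as follows, using only the standard library. `BASE_URL` is a placeholder for wherever your VDK Service listens, and the helper names are hypothetical. The request is built but not sent here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder: adjust to your service address


def build_create_session_request(base_url: str, pipelines: dict) -> urllib.request.Request:
    """Build the POST /v1/sessions request for the given pipelines."""
    body = json.dumps({"pipelines": pipelines}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/sessions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def parse_session_id(response_body: str) -> str:
    """Extract the session identifier from the response body."""
    return json.loads(response_body)["session_id"]


pipelines = {
    "myPipeline": {
        "producer": {"type": "File", "path": "path/to/myWav.wav",
                     "sample_rate": 16000, "channels": 1},
        "consumers": {
            "myConsumer": {"type": "VoiceRecognition", "recognizer": "myRecognizer",
                           "models": ["myModel"], "confidence": 0}
        },
    }
}
request = build_create_session_request(BASE_URL, pipelines)
```

To send it, pass `request` to `urllib.request.urlopen` and give the decoded response body to `parse_session_id`.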

The new session should now appear in the response of the session list route.

[GET] /v1/sessions

If you are already familiar with the basic routes of the VDK Service, the process is the same. You connect to the session using the WebSocket API, which serves as the entry point for sending and receiving audio.

Running a pipeline

By default, pipelines are unloaded. You must first load them and then start them.

The loading phase serves two purposes:

Resolve potential resource conflicts: when a pipeline is loaded, it reserves the required resources.
Keep resources ready for fast restarts, preventing them from being released and reallocated, which would otherwise impact performance and startup time.

[POST] /v1/sessions/{sessionId}/pipelines/{pipelineId}/load

The request is synchronous, meaning it performs the loading operation and then returns an HTTP response code indicating whether the operation was successful.

If you are already connected to the session through a WebSocket, you will also receive an event on the socket similar to the following.

{'state': 'Loaded', 'type': 'PipelineStateChanged'}

Once the pipeline is loaded, you can start it. Depending on the consumer, it may stop automatically, and if you’re using a Stream producer, you will have to send the audio through the WebSocket.

[POST] /v1/sessions/{sessionId}/pipelines/{pipelineId}/start
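If you rely on the socket event rather than the HTTP response, a helper like the one below can scan incoming messages for a given state. This is a sketch that assumes events arrive as JSON text frames with the `type` and `state` fields shown above. `messages` can be any iterable of strings, for example the frames read from your WebSocket client.

```python
import json
from typing import Iterable


def wait_for_state(messages: Iterable[str], expected: str) -> bool:
    """Return True as soon as a PipelineStateChanged event reports `expected`."""
    for raw in messages:
        event = json.loads(raw)
        if not isinstance(event, dict):
            continue  # ignore anything that is not a JSON object
        if event.get("type") == "PipelineStateChanged" and event.get("state") == expected:
            return True
    return False
```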

Sending and receiving audio

You will receive audio on the WebSocket, so make sure you are connected to the session.

Since a session can contain multiple pipelines, you must specify which pipeline you are sending audio to. Likewise, the service will indicate from which pipeline each event originates.

JSON

{
  "pipeline": "myPipeline",
  "data": "<base64AudioData>",
  "last": false|true
}
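A sketch of how raw PCM audio could be split into such messages is shown below. The chunk size is an arbitrary choice (3200 bytes is 100 ms of 16 kHz mono audio, assuming 16-bit samples), and `audio_messages` is a hypothetical helper name.

```python
import base64
import json
from typing import Iterator


def audio_messages(pipeline: str, pcm: bytes, chunk_size: int = 3200) -> Iterator[str]:
    """Yield one JSON text frame per chunk; only the final chunk has "last": true."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(pcm), chunk_size):
        chunk = pcm[offset:offset + chunk_size]
        yield json.dumps({
            "pipeline": pipeline,
            "data": base64.b64encode(chunk).decode("ascii"),
            "last": offset + chunk_size >= len(pcm),
        })
```

Each yielded string can be sent as a text frame with the WebSocket client of your choice.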

There is also a special case for Acoustic Echo Cancellation (AEC): to send the echo/reference signal, you must stream it through the socket with additional fields:

is_reference: Indicates that this audio contains only the echo/reference signal.
modifier: The name of the speech enhancement modifier in the pipeline.

JSON

{
  "pipeline": "myPipeline",
  "data": "<base64AudioData>",
  "last": false|true,
  "is_reference": true,
  "modifier": "my_speech_enhancer_name"
}
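Building the reference message differs from a regular audio message only by the two extra fields. A minimal sketch, with `reference_message` as a hypothetical helper name:

```python
import base64
import json


def reference_message(pipeline: str, modifier: str, chunk: bytes, last: bool) -> str:
    """Build a JSON text frame carrying an echo/reference audio chunk for AEC."""
    return json.dumps({
        "pipeline": pipeline,
        "data": base64.b64encode(chunk).decode("ascii"),
        "last": last,
        "is_reference": True,   # marks the chunk as echo/reference signal
        "modifier": modifier,   # name of the speech enhancement modifier to target
    })
```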

The WebSocket JSON request format:

JSON Schema

JSON

{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Description of a WebSocket request",
    "type": "object",
    "properties": {
        "pipeline": {
            "description": "The pipeline to target",
            "type": "string"
        },
        "data": {
            "description": "A chunk of audio data to process in base64 audio/pcm",
            "type": "string"
        },
        "last": {
            "description": "Whether the audio chunk is the last one",
            "type": "boolean"
        },
        "modifier": {
            "description": "The modifier to target",
            "type": "string"
        },
        "is_reference": {
            "description": "Is the audio chunk a reference audio chunk",
            "type": "boolean"
        }
    },
    "required": ["pipeline", "data", "last"]
}
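Before sending a message, a client may want to check it against the constraints of this schema. The sketch below hand-codes only the required fields and the property types from the schema, using the standard library. A full JSON Schema validator would be more thorough.

```python
FIELD_TYPES = {
    "pipeline": str,
    "data": str,
    "last": bool,
    "modifier": str,
    "is_reference": bool,
}
REQUIRED_FIELDS = ("pipeline", "data", "last")


def validate_request(message: dict) -> list[str]:
    """Return a list of problems; an empty list means the message looks valid."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in message]
    for name, expected in FIELD_TYPES.items():
        if name in message and not isinstance(message[name], expected):
            problems.append(f"{name} must be of type {expected.__name__}")
    return problems
```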

Updating a pipeline

Each component of a pipeline may be updated depending on its type. For example, with Speech Recognition, you may want to change the model, or with a File producer, update the file being streamed into the pipeline.

Each update request needs the relevant identifiers in the path and a JSON request body that matches the update configuration of the component.

[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/producer
[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/modifiers/{modifierId}
[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/consumers/{consumerId}
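The three routes follow the same pattern, so a client can build them with one helper. A minimal sketch, with `update_path` as a hypothetical name:

```python
from typing import Optional


def update_path(session_id: str, pipeline_id: str, kind: str,
                component_id: Optional[str] = None) -> str:
    """Return the PUT path for a producer, modifier, or consumer update."""
    base = f"/v1/sessions/{session_id}/pipelines/{pipeline_id}"
    if kind == "producer":
        return f"{base}/producer"  # a pipeline has a single producer, so no identifier
    if kind in ("modifiers", "consumers"):
        if not component_id:
            raise ValueError(f"{kind} updates require a component identifier")
        return f"{base}/{kind}/{component_id}"
    raise ValueError(f"unknown component kind: {kind}")
```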

Additional routes

A pipeline can also be paused and resumed, and you can dynamically create new pipelines or delete existing ones. These routes are documented in the HTTP API → Sessions Swagger documentation.

Sample project

To help you get started and experiment with this feature, please refer to our sample project available in the examples.

Configurations

For each component, there are two types of configuration:

Creation configuration: used during pipeline creation.
Update configuration: applied after the pipeline has been created.

You can update every field except the type. Some parameters cannot be modified while the pipeline is running; the fields highlighted in blue are those that can still be updated while the pipeline is running.
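Because the update configuration accepts the same fields as the creation configuration minus `type`, an update body can be derived from a creation configuration. A minimal sketch, with `to_update_config` as a hypothetical helper:

```python
def to_update_config(creation_config: dict) -> dict:
    """Copy a creation configuration without its "type" field, which cannot be updated."""
    return {key: value for key, value in creation_config.items() if key != "type"}


creation = {"type": "File", "path": "path/to/myWav.wav", "sample_rate": 16000}
update = to_update_config(creation)
update["path"] = "path/to/another.wav"  # change only what needs to change
```

In practice you would usually send only the fields you want to change. The helper is mainly useful when reusing an existing configuration object.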

The available enum values and minimum/maximum constraints are defined in the raw JSON schema below.

Producers

Audio Recorder

Component type: AudioRecorder

For Linux/Windows:

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `AudioRecorder`
device	string	-	`default`	Device to record
host_api	string	-	''	Top level grouping for audio devices
sample_rate	enum	-	16000	Sample Rate (Hz) Values: `16000`, `22050`
channels	enum	-	1	Channels count (Mono/Stereo) Values: `1`, `2`
buffer_size	integer	-	0 (any)	An option to buffer audio in the producer until size is reached

You can list the available devices using the following route:

[GET] /v1/audio/input-devices

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioRecorder"] },
    "device": { "type": "string", "minLength": 1, "default": "default" },
    "host_api": { "type": "string", "minLength": 1, "default": "" },
    "sample_rate": {"type": "integer","enum": [16000, 22050],"default": 16000},
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 0, "default": 0 }
  },
  "required": ["type"]
}
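To see what an omitted optional field resolves to, the defaults of the creation schema can be merged with a partial configuration. The sketch below only illustrates how to read the schema. The service applies its own defaults, and `with_defaults` is a hypothetical helper.

```python
# Defaults copied from the AudioRecorder creation schema above (Linux/Windows).
AUDIO_RECORDER_PROPERTIES = {
    "device": {"default": "default"},
    "host_api": {"default": ""},
    "sample_rate": {"default": 16000},
    "channels": {"default": 1},
    "buffer_size": {"default": 0},
}


def with_defaults(config: dict, properties: dict) -> dict:
    """Return `config` completed with the schema defaults for any missing field."""
    resolved = {name: spec["default"] for name, spec in properties.items()
                if "default" in spec}
    resolved.update(config)  # explicit values win over defaults
    return resolved


effective = with_defaults({"type": "AudioRecorder", "channels": 2}, AUDIO_RECORDER_PROPERTIES)
```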

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "device": { "type": "string", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1 },
    "sample_rate": {"type": "integer","enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 0 }
  }
}

For Android:

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `AudioRecorder`
audio_source	enum	-	`VoiceCommunication`	Defines where recorded audio comes from. Values: `Camcorder`, `Default`, `Mic`, `RemoteSubmix`, `VoiceCall`, `VoiceCommunication`, `VoiceDownlink`, `VoicePerformance`, `VoiceRecognition`, `VoiceUplink`
sample_rate	enum	-	16000	Sample Rate (Hz) Values: `16000`, `22050`
channels	enum	-	1	Channels count (Mono/Stereo) Values: `1`, `2`
buffer_size	integer	-	1024	An option to buffer audio in the producer until size is reached

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioRecorder"] },
    "sample_rate": { "type": "integer", "enum": [16000, 22050], "default": 16000 },
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 100, "default": 1024 },
    "audio_source": {
      "type": "string",
      "enum": [
        "Camcorder", "Default", "Mic", "RemoteSubmix", "VoiceCall",
        "VoiceCommunication", "VoiceDownlink", "VoicePerformance",
        "VoiceRecognition", "VoiceUplink"
      ],
      "default": "VoiceCommunication"
    }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "sample_rate": { "type": "integer", "enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 100 },
    "audio_source": {
      "type": "string",
      "enum": [
        "Camcorder", "Default", "Mic", "RemoteSubmix", "VoiceCall",
        "VoiceCommunication", "VoiceDownlink", "VoicePerformance",
        "VoiceRecognition", "VoiceUplink"
      ]
    }
  }
}

File Producer

Component type: File

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `File`
path	string	yes	-	Path to the audio file
sample_rate	enum	-	16000	Sample Rate (Hz) Values: `16000`, `22050`
channels	enum	-	1	Channels count (Mono/Stereo) Values: `1`, `2`
buffer_size	integer	-	1024	An option to buffer audio in the producer until size is reached
acceleration_rate	number	-	1.0	A way to stream audio faster than real time

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["File"] },
    "path": { "type": "string", "minLength": 1 },
    "sample_rate": { "type": "integer", "enum": [16000, 22050], "default": 16000 },
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 100, "default": 1024 },
    "acceleration_rate": { "type": "number", "minimum": 1.0, "default": 1.0 }
  },
  "required": ["type", "path"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "path": { "type": "string", "minLength": 1 },
    "sample_rate": { "type": "integer", "enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 100 },
    "acceleration_rate": { "type": "number", "minimum": 1.0 }
  }
}

Voice Synthesis (TTS)

Component type: VoiceSynthesis

channel, voice and text cannot be updated while synthesizing text.

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `VoiceSynthesis`
channel	string	yes	-	TTS Channel name
voice	string	yes	-	TTS Voice name
text	string	-	''	Text to synthesize

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["VoiceSynthesis"] },
    "channel": { "type": "string", "minLength": 1 },
    "voice": { "type": "string", "minLength": 1 },
    "text": { "type": "string", "minLength": 1, "default": "" }
  },
  "required": ["type", "channel", "voice"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "channel": { "type": "string", "minLength": 1 },
    "voice": { "type": "string", "minLength": 1 },
    "text": { "type": "string", "minLength": 1 }
  }
}
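As an illustration, a text-to-speech pipeline could pair this producer with an AudioPlayer consumer and set the text later through an update. The channel, voice, pipeline, and consumer names below are placeholders, and the flow is one possible arrangement rather than a prescribed one.

```python
import json

# Body for [POST] /v1/sessions: synthesize speech and play it on the default device.
session_body = {
    "pipelines": {
        "ttsPipeline": {
            "producer": {
                "type": "VoiceSynthesis",
                "channel": "myChannel",  # placeholder TTS channel name
                "voice": "myVoice",      # placeholder TTS voice name
            },
            "consumers": {
                "speaker": {"type": "AudioPlayer"},
            },
        }
    }
}

# Body for [PUT] /v1/sessions/{sessionId}/pipelines/ttsPipeline/producer.
# "type" is omitted because it cannot be updated.
update_body = {"text": "Hello from the pipeline."}

print(json.dumps(session_body, indent=2))
print(json.dumps(update_body))
```

Remember that `channel`, `voice`, and `text` cannot be updated while text is being synthesized.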

Stream producer

Component type: Stream

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `Stream`
sample_rate	enum	-	16000	Sample Rate (Hz) Values: `16000`, `22050`
channels	enum	-	1	Channels count (Mono/Stereo) Values: `1`, `2`

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["Stream"] },
    "sample_rate": { "type": "integer", "enum": [16000, 22050], "default": 16000 },
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "sample_rate": { "type": "integer", "enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] }
  }
}

Modifiers

Speech Enhancement

Component type: SpeechEnhancement

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `SpeechEnhancement`
enhancer	string	yes	-	Enhancer name

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["SpeechEnhancement"] },
    "enhancer": { "type": "string", "minLength": 1 }
  },
  "required": ["type", "enhancer"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "enhancer": { "type": "string", "minLength": 1 }
  }
}

Channel Extractor

Component type: ChannelExtractor

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `ChannelExtractor`
channel_index	integer	-	0	Channel to extract

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["ChannelExtractor"] },
    "channel_index": { "type": "integer", "minimum": 0 , "default": 0 }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "channel_index": { "type": "integer", "minimum": 0 }
  }
}
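Conceptually, extracting a channel means keeping one sample out of every `channels` samples of an interleaved stream. The sketch below illustrates that idea on 16-bit interleaved PCM (an assumed sample format). It is not the service's implementation.

```python
from array import array


def extract_channel(pcm: bytes, channels: int, channel_index: int) -> bytes:
    """Return the samples of one channel from interleaved 16-bit PCM."""
    if not 0 <= channel_index < channels:
        raise ValueError("channel_index out of range")
    samples = array("h")  # signed 16-bit samples (assumed format)
    samples.frombytes(pcm)
    # Interleaved layout: ch0, ch1, ..., ch0, ch1, ... so step by the channel count.
    return samples[channel_index::channels].tobytes()


stereo = array("h", [10, -10, 20, -20, 30, -30]).tobytes()
first_channel = array("h")
first_channel.frombytes(extract_channel(stereo, channels=2, channel_index=0))
print(first_channel.tolist())
```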

Consumers

Audio Player

Component type: AudioPlayer

For Linux/Windows:

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `AudioPlayer`
device	string	-	`default`	Output device name
host_api	string	-	''	Top level grouping for audio devices
volume	number	-	1.0	Output volume (ranging from 0.0 to 1.0)

You can list the available devices using the following route:

[GET] /v1/audio/output-devices

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioPlayer"] },
    "device": { "type": "string", "default": "default", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1, "default": "" },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0, "default": 1.0 }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "device": { "type": "string", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1 },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0 }
  }
}

For Android:

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `AudioPlayer`
content_type	enum	-	`Speech`	Defines the type of the sound. Values: `Unknown`, `Movie`, `Music`, `Sonification`, `Speech`
usage	enum	-	`Media`	The usage type. Values: `Unknown`, `Media`, `VoiceCommunication`, `VoiceCommunicationSignalling`, `Alarm`, `Notification`, `NotificationRingtone`, `NotificationCommunicationRequest`, `NotificationCommunicationInstant`, `NotificationCommunicationDelayed`, `NotificationEvent`, `AssistanceAccessibility`, `AssistanceNavigationGuidance`, `AssistanceSonification`, `Game`, `Assistant`, `CallAssistant`, `Emergency`, `Safety`, `VehicleStatus`, `Announcement`, `SpeakerCleanup`
volume	number	-	1.0	Output volume (ranging from 0.0 to 1.0)

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioPlayer"] },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0, "default": 1.0 },
    "content_type": { "type": "string", "enum": [
        "Unknown", "Movie", "Music", "Sonification", "Speech"
    ], "default": "Speech" },
    "usage": { "type": "string", "enum": [
      "Unknown", "Media", "VoiceCommunication", "VoiceCommunicationSignalling", "Alarm",
      "Notification", "NotificationRingtone", "NotificationCommunicationRequest",
      "NotificationCommunicationInstant", "NotificationCommunicationDelayed",
      "NotificationEvent", "AssistanceAccessibility", "AssistanceNavigationGuidance",
      "AssistanceSonification", "Game", "Assistant", "CallAssistant", "Emergency",
      "Safety", "VehicleStatus", "Announcement", "SpeakerCleanup"
    ], "default": "Media" }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "content_type": { "type": "string", "enum": [
        "Unknown", "Movie", "Music", "Sonification", "Speech"
    ]},
    "usage": { "type": "string", "enum": [
      "Unknown", "Media", "VoiceCommunication", "VoiceCommunicationSignalling", "Alarm",
      "Notification", "NotificationRingtone", "NotificationCommunicationRequest",
      "NotificationCommunicationInstant", "NotificationCommunicationDelayed",
      "NotificationEvent", "AssistanceAccessibility", "AssistanceNavigationGuidance",
      "AssistanceSonification", "Game", "Assistant", "CallAssistant", "Emergency",
      "Safety", "VehicleStatus", "Announcement", "SpeakerCleanup"
    ]},
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0 }
  }
}

File Consumer

Component type: File

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `File`
path	string	yes	-	Path to the audio file
truncate	boolean	-	true	Overwrite existing

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["File"] },
    "path": { "type": "string", "minLength": 1 },
    "truncate": { "type": "boolean", "default": true }
  },
  "required": ["type", "path"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "path": { "type": "string", "minLength": 1 },
    "truncate": { "type": "boolean" }
  }
}

Stream Consumer

Component type: Stream

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `Stream`
buffer_size	integer	-	0 (any)	If set, buffers until the size reaches the threshold or the last buffer is received

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["Stream"] },
    "buffer_size": { "type": "integer", "minimum": 0, "default": 0 }
  },
  "required": ["type"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "buffer_size": { "type": "integer", "minimum": 0 }
  }
}
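The buffering rule described for `buffer_size` can be pictured with the small class below. It is a sketch under two assumptions not stated above: the size is counted in bytes, and everything accumulated is emitted at once when the threshold is reached. The service may behave differently on both points.

```python
class ThresholdBuffer:
    """Accumulate chunks and release them once a size threshold or the last chunk is reached."""

    def __init__(self, buffer_size: int = 0):
        if buffer_size < 0:
            raise ValueError("buffer_size must be >= 0")
        self.buffer_size = buffer_size  # 0 means "any": forward chunks as they come
        self._pending = bytearray()

    def push(self, chunk: bytes, last: bool = False) -> list[bytes]:
        """Add a chunk and return the list of buffers ready to be emitted."""
        self._pending += chunk
        if self.buffer_size == 0 or last or len(self._pending) >= self.buffer_size:
            ready, self._pending = bytes(self._pending), bytearray()
            return [ready] if ready else []
        return []
```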

Voice Biometrics

Component type: VoiceBiometrics

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `VoiceBiometrics`
model_name	string	yes	-	Biometrics model name
model_type	enum	yes	-	Values: `TextDependent`, `TextIndependent`
mode	enum	yes	-	Values: `Authentication`, `Identification`
username	string	-	-	Required for authentication

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "enum": ["VoiceBiometrics"] },
    "model_name": { "type": "string", "minLength": 1 },
    "model_type": { "type": "string", "enum": ["TextDependent", "TextIndependent"] },
    "mode": { "type": "string", "enum": ["Authentication", "Identification"] },
    "username": { "type": "string", "minLength": 1 }
  },
  "required": ["type", "model_name", "model_type", "mode"]
}
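The schema alone does not express the rule from the table that `username` is required for authentication, so a client may want to check it before creating the pipeline. A minimal sketch, with `check_biometrics_config` as a hypothetical helper:

```python
REQUIRED_FIELDS = ("type", "model_name", "model_type", "mode")


def check_biometrics_config(config: dict) -> list[str]:
    """Return a list of problems found in a VoiceBiometrics creation configuration."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED_FIELDS if name not in config]
    # Rule taken from the field table: authentication needs a username.
    if config.get("mode") == "Authentication" and not config.get("username"):
        problems.append("username is required when mode is Authentication")
    return problems
```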

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "model_name": { "type": "string", "minLength": 1 },
    "model_type": { "type": "string", "enum": ["TextDependent", "TextIndependent"] },
    "mode": { "type": "string", "enum": ["Authentication", "Identification"] },
    "username": { "type": "string", "minLength": 1 }
  }
}

Voice Recognition

Component type: VoiceRecognition

Field	type	Required	Default value	Description
type	string	yes	-	Component type: `VoiceRecognition`
recognizer	string	yes	-	Recognizer name
models	array	yes	-	ASR models to run
confidence	integer	-	0	Minimum confidence threshold for results
stop_at_first_result	boolean	-	false	Stop pipeline at first result
models_settings	object	-	-	ASR models configuration object
vec	object	-	-	Post-processor VEC configuration object
start_time	integer	-	-	Available only in the update configuration, when setting the model(s).

models_settings.<model_name>

Field	type	Required	Default value	Description
user	string	-	-	Optional UserWord user
slots	object	-	-	Slot configuration for the model

models_settings.<model_name>.slots.<slot_name>

Field	type	Required	Default value	Description
values	array	-	-	List of values for the slot

Raw JSON Schema

Creation schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["VoiceRecognition"] },
    "recognizer": { "type": "string", "minLength": 1 },
    "vec": {
      "description": "Post processing configuration.",
      "type": "object",
      "properties": {
        "accent": {
          "description": "The accent to use for VEC.",
          "type": "string",
          "default": ""
        },
        "context": {
          "description": "The context to use for VEC.",
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "models": { "type": "array", "minLength": 1, "items": { "type": "string" } },
    "confidence": { "type": "integer", "minimum": 0, "maximum": 10000, "default": 0 },
    "stop_at_first_result": { "type": "boolean", "default": false },
    "models_settings": {
      "type": "object",
      "properties": {
        "user": { "type": "string", "minLength": 1 },
        "slots": {
          "type": "object",
          "properties": {
            "slot": {
              "type": "object",
              "properties": {
                "values": { "type": "array", "items": { "type": "string" } }
              }
            }
          }
        }
      }
    }
  },
  "required": ["type", "recognizer", "models"]
}

Update schema

JSON

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "recognizer": { "type": "string", "minLength": 1 },
    "vec": {
      "description": "Post processing configuration.",
      "type": "object",
      "properties": {
        "accent": {
          "description": "The accent to use for VEC.",
          "type": "string",
          "default": ""
        },
        "context": {
          "description": "The context to use for VEC.",
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "start_time": { "type": "integer" },
    "models": { "type": "array", "minLength": 1, "items": { "type": "string" } },
    "confidence": { "type": "integer", "minimum": 0, "maximum": 10000 },
    "stop_at_first_result": { "type": "boolean" },
    "models_settings": {
      "type": "object",
      "properties": {
        "user": { "type": "string", "minLength": 1 },
        "slots": {
          "type": "object",
          "properties": {
            "slot": {
              "type": "object",
              "properties": {
                "values": { "type": "array", "items": { "type": "string" } }
              }
            }
          }
        }
      }
    }
  }
}

A complete configuration example is provided below for clarity.

JSON

{
    "type": "VoiceRecognition",
    "recognizer": "rec_eng-US",
    "models": [ "csdk-dynamic-drinks" ],
    "confidence": 0,
    "stop_at_first_result": false,
    "models_settings": {
        "csdk-dynamic-drinks": {
            "slots": {
                "drink": {
                    "values": [ "coffee", "tea" ]
                }
            },
            "user": "emmanuel"
        }
    },
    "vec": {
        "context": ["a 1 3", "b 1 4"],
        "accent": "eng"
    }
}
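The nested `models_settings` object can be tedious to write by hand. The sketch below builds it from a plain mapping of slot names to values, following the structure of the example above. `model_settings` is a hypothetical helper, and the extra slot value is made up for illustration.

```python
from typing import Optional


def model_settings(slots: dict, user: Optional[str] = None) -> dict:
    """Build the settings object of one model from {slot_name: [values]}."""
    settings = {
        "slots": {name: {"values": list(values)} for name, values in slots.items()}
    }
    if user is not None:
        settings["user"] = user  # optional UserWord user
    return settings


# Update body that changes the values of the "drink" slot for one model.
update_body = {
    "models_settings": {
        "csdk-dynamic-drinks": model_settings({"drink": ["coffee", "tea", "water"]}),
    }
}
```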