Skip to main content
Skip table of contents

Sessions - Audio Processing Pipeline

Introduction

The Audio Processing Pipeline API provides a flexible framework for building real-time audio processing workflows. The system is based on the concept of sessions, which encapsulate one or more pipelines that process audio data through a chain of modules.

Each pipeline represents a directed processing flow that starts with a producer, optionally passes through modifiers, and ends with one or more consumers. This architecture allows developers to construct complex audio processing scenarios such as recording, streaming, enhancement, recognition, or playback.

Screenshot from 2026-03-10 13-38-19-20260310-123819.png

Pipeline Structure

The API is designed to support both real-time communication via WebSockets and control operations via REST endpoints.

With this mode:

  • You create and configure a session.

  • Within that session, you define and manage one or more pipelines.

  • Pipelines can be started, paused, resumed, or modified in real time.

  • Producers and consumers can be changed dynamically without recreating the underlying resources.

  • Multiple pipelines can run concurrently inside the same session.

  • Audio can be streamed continuously without requiring the session to be restarted after each result.

This architecture enables a more efficient and natural workflow:

  • Resources are instantiated once and reused.

  • Operations are no longer constrained to single-technology or single-shot tasks.

  • Input becomes continuous rather than transactional.

  • System behavior can evolve at runtime without tearing down the execution context.

From an input perspective, session mode allows uninterrupted audio flow. Results can be retrieved while the session remains active, eliminating the need to repeatedly stop and restart processing.

This feature has been introduced recently within Vivoka’s ecosystem. Your feedback is essential to refine it further. If certain aspects of the workflow, configuration model, or runtime control could better match your needs, we encourage you to share your experience.This feature has been introduced recently, please help us make it a better fit to your needs by telling us what may be improved.

See the Google Slides for a quick overview of the pipeline basics and how it works:
https://docs.google.com/presentation/d/1B6vSKgeedHH5JlT9M4HE1-KyqyW3PCvg11k67g0vn1A/
Covers core components, data flow, and key processing steps.

Vocabulary

Session

Sessions allow users to group multiple pipelines that share the same lifecycle and communication channel. A session acts as a logical container that organizes and manages the pipelines involved in an audio processing workflow.

When a session is created, the system does not immediately instantiate the processing engines or allocate runtime resources for the pipelines. Instead, the API stores the configuration parameters that describe the pipelines and their associated modules. This configuration includes the pipeline structure, the module types (producer, modifiers, and consumers), and their respective settings.

By deferring initialization, the system avoids allocating unnecessary resources before they are actually needed. The stored configuration serves as a blueprint that the system can use later to construct the full processing pipeline.

The actual initialization of modules and allocation of processing resources occurs only when the pipeline load operation is requested. At that moment, the system reads the previously stored configuration, instantiates the required modules, validates their parameters, and prepares the pipeline for execution. This separation between configuration and runtime initialization enables more efficient resource management and provides greater flexibility when defining or modifying pipelines before they are executed.

A session provide a WebSocket connection through which we can transfer data in real time. This connection can be used to:

  • Send audio data to pipelines that require streaming input

  • Receive processed audio

  • Receive processing events

  • Receive results (such as recognition output)

  • Receive error notifications

Users can create multiple sessions simultaneously and manage them independently. Sessions must be deleted manually when they are no longer needed.

ASR recognizers, TTS channels, and speech enhancement enhancers, cannot be loaded simultaneously in multiple pipelines.

service_streaming.png

VDK Service - Session mode


Session Lifecycle

A typical session lifecycle includes:

  1. Creating the session with one or more pipelines

  2. Loading and starting pipelines

  3. Streaming audio (if required)

  4. Stopping pipelines

  5. Deleting the session

image.png

Sequence diagram for session using voice synthesis


Pipeline

A pipeline is a processing chain that routes audio from a source to one or more processing endpoints.

State transitions are controlled through the pipeline lifecycle endpoints (load, start, pause, resume, stop, unload).

Each pipeline is composed of three categories of components:

CODE
Producer → Modifiers → Consumers
pipeline_schema.png

VSDK Pipeline

Pipeline states

Each pipeline has an internal life-cycle state:

Screenshot from 2026-03-10 15-41-24-20260310-144124.png
  • Unloaded: The pipeline is not initialized and holds no resources.

  • Loading: Resources are being allocated.

  • Loaded: Resources are ready, but the pipeline is not running.

  • Starting: The pipeline is transitioning to a running state.

  • Running: The pipeline is actively processing the incoming audio from producer.

  • Pausing: The pipeline is transitioning to a paused state.

  • Paused: Processing is temporarily halted.

  • Resuming: The pipeline is transitioning back to running.

  • Stopping: The pipeline is stopping execution.

  • Stopped: The pipeline has stopped but resources may still be allocated.

  • Unloading: Resources are being released.


Producer

Producers generate the initial audio stream that enters the pipeline.

They act as the source of audio data and can either capture, read, synthesize, or receive audio.

Available producers:

Producer Type

Description

AudioRecorder

Captures audio from a recording device such as a microphone.

File

Reads audio data from a file stored on disk.

Stream

Accepts audio streamed from the client through the session WebSocket.

VoiceSynthesis

Generates speech audio from text input using speech synthesis technology.


Modifiers

Modifiers process or transform the audio stream between the producer and consumers.

They are optional and can be chained together to build complex transformations.

Available modifiers:

Modifier Type

Description

ChannelExtractor

Extracts a specific audio channel from multi-channel audio input.

SpeechEnhancement

Improves audio quality by reducing noise or enhancing speech clarity.


Consumers

Consumers represent the endpoints of a pipeline. They receive processed audio data and perform actions such as playback, storage, or analysis.

A pipeline can have multiple consumers, enabling parallel outputs.

Available consumers:

Consumer Type

Description

AudioPlayer

Plays audio through an output device.

File

Writes processed audio to a file.

Stream

Streams processed audio back to the client.

VoiceBiometrics

Performs speaker identification or verification.

VoiceRecognition

Converts speech to text using speech recognition.


Usage

Running the VDK Service

The first step is to ensure that the VDK service is properly configured for your tasks. This means having the appropriate plugins installed (ASR, TTS, Biometrics) and the correct technology configuration (models, voices, enrolled users).

This configuration is done through VDK Studio. Refer to the corresponding documentation for detailed instructions.

Once your service is running, it is recommended to verify that the VDK Service is correctly configured by requesting the list of available models or voices depending on the relying technologies you’ve requested.

vdk_service_verification_step.png

VDK Service - Verification Step

If a request fails or an internal error occurs, the HTTP response will contain a JSON object with an "errors" field describing what went wrong.
We recommend using a dedicated tool such as Postman to interact with and test the API.

Creating a session

A sample project is available in both C# and Python to help you get started quickly. Since the usage depends on your service configuration, you will need to adjust the placeholder configuration included in the sample project to match your setup.

The route used to create a session is the following.

CODE
[POST] /v1/sessions

The request must include a JSON body describing the session configuration with at least one pipeline. In the following example, the producer is an Audio File Producer and the consumer is Voice Recognition (ASR).

The pipeline therefore reads the file and processes it using speech recognition.

JSON
# Request JSON body
{
    "pipelines": {
        "myPipeline": {
            "producer": {
                "type": "File",
                "path": "path/to/myWav.wav",
                "sample_rate": 16000,
                "channels": 1
            },
            "consumers": {
                "myConsumer": {
                    "type": "VoiceRecognition",
                    "recognizer": "myRecognizer",
                    "models": [ "myModel" ],
                    "confidence": 0
                }
            }
        }
    }
}

If your configuration is valid, you should retrieve a token from the response.

JSON
# Response body (OK)
{
    "session_id": "7bc9e16c79-4442-2eac-bffc-22c30769bd"
}

It should be listed in the session list route.

CODE
[GET] /v1/sessions

If you are already familiar with the basic routes of the VDK Service, the process is the same. You connect to the session using the WebSocket API, which serves as the entry point for sending and receiving audio.

Running a pipeline

By default, pipelines are unloaded. You must first load them and then start them.

The loading phase serves two purposes:

  • Resolve potential resource conflicts: when a pipeline is loaded, it reserves the required resources.

  • Keep resources ready for fast restarts, preventing them from being released and reallocated, which would otherwise impact performance and startup time.

CODE
[POST] /v1/sessions/{sessionId}/pipelines/{pipelineId}/load

The request is synchronous, meaning it performs the loading operation and then returns an HTTP response code indicating whether the operation was successful.

If you are already connected to the session through a WebSocket, you will also receive an event on the socket similar to the following.

CODE
{'state': 'Loaded', 'type': 'PipelineStateChanged'}

Once the pipeline is loaded, you can start it. Depending on the consumer, it may stop automatically, and if you’re using a Stream producer, you will have to send the audio through the WebSocket.

CODE
[POST] /v1/sessions/{sessionId}/pipelines/{pipelineId}/start

Sending and receiving audio

You will receive audio on the WebSocket so be sure to be connected to the session.

Since a session can contain multiple pipelines, you must specify which pipeline you are sending audio to. Likewise, the service will indicate from which pipeline each event originates.

JSON
{
  "pipeline": "myPipeline"
  "data": "<base64AudioData>"
  "last": false|true
}

There is also a special case for Acoustic Echo Cancellation (AEC): to send the echo/reference signal, you must stream it through the socket with additional fields:

  • is_reference: Indicates that this audio contains only the echo/reference signal.

  • modifier: The name of the speech enhancement modifier in the pipeline.

JSON
{
  "pipeline": "myPipeline"
  "data": "<base64AudioData>"
  "last": false|true,
  "is_reference": true,
  "modifier": "my_speech_enhancer_name"
}

The WebSocket JSON request format:

JSON Schema
JSON
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Description of a WebSocket request",
    "type": "object",
    "properties": {
        "pipeline": {
            "description": "The pipeline to target",
            "type": "string"
        },
        "data": {
            "description": "A chunk of audio data to process in base64 audio/pcm",
            "type": "string"
        },
        "last": {
            "description": "Whether the audio chunk is the last one",
            "type": "boolean"
        },
        "modifier": {
            "description": "The modifier to target",
            "type": "string"
        },
        "is_reference": {
            "description": "Is the audio chunk a reference audio chunk",
            "type": "boolean"
        }
    },
    "required": ["pipeline", "data", "last"]
})

Updating a pipeline

Each component of a pipeline may be updated depending on its type. For example, with Speech Recognition, you may want to change the model, or with a File producer, update the file being streamed into the pipeline.

You need identifiers and a corresponding JSON request body.

CODE
[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/producer
[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/modifiers/{modifierId}
[PUT] /v1/sessions/{sessionId}/pipelines/{pipelineId}/consumers/{consumerId}

Additional routes

It is also worth noting that a pipeline can be paused and resumed. You can also dynamically create new pipelines or delete existing ones. Those routes are documented in the HTTP Api → Sessions swagger documentation.

Sample project

To help you get started and experiment with this feature, please refer to our sample project available in the examples.

Configurations

For each component, there are two types of configuration:

  • Creation configuration: used during pipeline creation.

  • Update configuration: applied after the pipeline has been created.

You can update all fields except the type. Some parameters cannot be modified while the pipeline is running. The fields highlighted in blue indicate which parameters can still be updated when the pipeline is running

The available enum values and minimum/maximum constraints are defined in the raw JSON schema below.

Producers

Audio Recorder

Component type: AudioRecorder

For Linux/Windows:

Field

type

Required

Default value

Description

type

string

yes

-

Component type: AudioRecorder

device

string

-

default

Device to record

host_api

string

-

''

Top level grouping for audio devices

sample_rate

enum

-

16000

Sample Rate (Hz)
Values: 16000, 22050

channels

enum

-

1

Channels count (Mono/Stereo)
Values: 1, 2

buffer_size

integer

-

0 (any)

An option to buffer audio in the producer until size is reached

You can find a list of available devices using the routes:

CODE
[GET] /v1/audio/input-devices
Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioRecorder"] },
    "device": { "type": "string", "minLength": 1, "default": "default" },
    "host_api": { "type": "string", "minLength": 1, "default": "" },
    "sample_rate": {"type": "integer","enum": [16000, 22050],"default": 16000},
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 0, "default": 0 }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "device": { "type": "string", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1 },
    "sample_rate": {"type": "integer","enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 0 }
  }
}
For Android

Field

type

Required

Default value

Description

type

string

yes

-

Component type: AudioRecorder

audio_source

enum

-

VoiceCommunication

Defines where recorded audio comes from (more).

Values: Camcorder, Default, Mic, RemoteSubmix, VoiceCall, VoiceCommunication, VoiceDownlink, VoicePerformance, VoiceRecognition, VoiceUplink

sample_rate

enum

-

16000

Sample Rate (Hz)
Values: 16000, 22050

channels

enum

-

1

Channels count (Mono/Stereo)
Values: 1, 2

buffer_size

integer

-

1024

An option to buffer audio in the producer until size is reached

Raw JSON Schema
  • Creation schema

JSON
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
    "type": { "type": "string", "enutgm": ["AudioRecorder"] },
    "sample_rate": {"type": "integer","enum": [16000, 22050],"default": 16000},
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 100, "default": 1024 },
    "audio_source": {
        "type": "string",
        "enum": [
        "Camcorder", "Default", "Mic", "RemoteSubmix", "VoiceCall",
        "VoiceCommunication", "VoiceDownlink", "VoicePerformance",
        "VoiceRecognition", "VoiceUplink"
        ],
        "default": "VoiceCommunication"
    }
    },
    "required": ["type"]
}
  • Update schema

JSON
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "description": "Pipeline configuration",
    "type": "object",
    "properties": {
    "sample_rate": {"type": "integer","enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 100 },
    "audio_source": {
        "type": "string",
        "enum": [
        "Camcorder", "Default", "Mic", "RemoteSubmix", "VoiceCall",
        "VoiceCommunication", "VoiceDownlink", "VoicePerformance",
        "VoiceRecognition", "VoiceUplink"
        ]
    }
    }
}

File Producer

Component type: File

Field

type

Required

Default value

Description

type

string

yes

-

Component type: File

path

string

yes

-

Path to the audio file

sample_rate

enum

-

16000

Sample Rate (Hz)
Values: 16000, 22050

channels

enum

-

1

Channels count (Mono/Stereo)
Values: 1, 2

buffer_size

integer

-

1024

An option to buffer audio in the producer until size is reached

acceleration_rate

number

-

1.0

A way to stream audio faster than real time

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["File"] },
    "path": { "type": "string", "minLength": 1 },
    "sample_rate": { "type": "integer", "enum": [16000, 22050], "default": 16000 },
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 },
    "buffer_size": { "type": "integer", "minimum": 100, "default": 1024 },
    "acceleration_rate": { "type": "number", "minimum": 1.0, "default": 1.0 }
  },
  "required": ["type", "path"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "path": { "type": "string", "minLength": 1 },
    "sample_rate": { "type": "integer", "enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] },
    "buffer_size": { "type": "integer", "minimum": 100 },
    "acceleration_rate": { "type": "number", "minimum": 1.0 }
  }
}

Voice Synthesis (TTS)

Component type: VoiceSynthesis

channel, voice and text cannot be updated while synthesizing text.

Field

type

Required

Default value

Description

type

string

yes

-

Component type: VoiceSynthesis

channel

string

yes

-

TTS Channel name

voice

string

yes

-

TTS Voice name

text

string

-

''

Text to synthesize

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["VoiceSynthesis"] },
    "channel": { "type": "string", "minLength": 1 },
    "voice": { "type": "string", "minLength": 1 },
    "text": { "type": "string", "minLength": 1, "default": "" }
  },
  "required": ["type", "channel", "voice"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "channel": { "type": "string", "minLength": 1 },
    "voice": { "type": "string", "minLength": 1 },
    "text": { "type": "string", "minLength": 1 }
  }
}

Stream producer

Component type: Stream

Field

type

Required

Default value

Description

type

string

yes

-

Component type: Stream

sample_rate

enum

-

16000

Sample Rate (Hz)
Values: 16000, 22050

channels

enum

-

1

Channels count (Mono/Stereo)
Values: 1, 2

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["Stream"] },
    "sample_rate": { "type": "integer", "enum": [16000, 22050], "default": 16000 },
    "channels": { "type": "integer", "enum": [1, 2], "default": 1 }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "sample_rate": { "type": "integer", "enum": [16000, 22050] },
    "channels": { "type": "integer", "enum": [1, 2] }
  }
}

Modifiers

Speech Enhancement

Component type: SpeechEnhancement

Field

type

Required

Default value

Description

type

string

yes

-

Component type: SpeechEnhancement

enhancer

string

yes

-

Enhancer name

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["SpeechEnhancement"] },
    "enhancer": { "type": "string", "minLength": 1 }
  },
  "required": ["type", "enhancer"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "enhancer": { "type": "string", "minLength": 1 }
  }
}

Channel Extractor

Component type: ChannelExtractor

Field

type

Required

Default value

Description

type

string

yes

-

Component type

channel_index

integer

-

0

Channel to extract

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["ChannelExtractor"] },
    "channel_index": { "type": "integer", "minimum": 0 , "default": 0 }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "channel_index": { "type": "integer", "minimum": 0 }
  }
}

Consumers

Audio Player

Component type: AudioPlayer

For Linux/Windows:

Field

type

Required

Default value

Description

type

string

yes

-

Component type: AudioPlayer

device

string

-

default

Output device name

host_api

string

-

''

Top level grouping for audio devices

volume

number

-

1.0

Output volume (ranging from 0.0 to 1.0)

You can find a list of available devices using the routes:

CODE
[GET] /v1/audio/output-devices
Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioPlayer"] },
    "device": { "type": "string", "default": "default", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1, "default": "" },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0, "default": 1.0 }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "device": { "type": "string", "minLength": 1 },
    "host_api": { "type": "string", "minLength": 1 },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0 }
  }
}
For Android:

Field

type

Required

Default value

Description

type

string

yes

-

Component type

content_type

enum

-

Speech

Defines the type of the sound (more)
Values: Unknown, Movie, Music, Sonification, Speech

usage

enum

-

Media

The usage type (more)
Values: Unknown, Media, VoiceCommunication, VoiceCommunicationSignalling, Alarm,
Notification, NotificationRingtone, NotificationCommunicationRequest,
NotificationCommunicationInstant, NotificationCommunicationDelayed,
NotificationEvent, AssistanceAccessibility, AssistanceNavigationGuidance,
AssistanceSonification, Game, Assistant, CallAssistant, Emergency,
Safety, VehicleStatus, Announcement, SpeakerCleanup

volume

number

-

1.0

Output volume (ranging from 0.0 to 1.0)

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["AudioPlayer"] },
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0, "default": 1.0 },
    "content_type": { "type": "string", "enum": [
        "Unknown", "Movie", "Music", "Sonification", "Speech"
    ], "default": "Speech" },
    "usage": { "type": "string", "enum": [
      "Unknown", "Media", "VoiceCommunication", "VoiceCommunicationSignalling", "Alarm",
      "Notification", "NotificationRingtone", "NotificationCommunicationRequest",
      "NotificationCommunicationInstant", "NotificationCommunicationDelayed",
      "NotificationEvent", "AssistanceAccessibility", "AssistanceNavigationGuidance",
      "AssistanceSonification", "Game", "Assistant", "CallAssistant", "Emergency",
      "Safety", "VehicleStatus", "Announcement", "SpeakerCleanup"
    ], "default": "Media" }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "content_type": { "type": "string", "enum": [
        "Unknown", "Movie", "Music", "Sonification", "Speech"
    ]},
    "usage": { "type": "string", "enum": [
      "Unknown", "Media", "VoiceCommunication", "VoiceCommunicationSignalling", "Alarm",
      "Notification", "NotificationRingtone", "NotificationCommunicationRequest",
      "NotificationCommunicationInstant", "NotificationCommunicationDelayed",
      "NotificationEvent", "AssistanceAccessibility", "AssistanceNavigationGuidance",
      "AssistanceSonification", "Game", "Assistant", "CallAssistant", "Emergency",
      "Safety", "VehicleStatus", "Announcement", "SpeakerCleanup"
    ]},
    "volume": { "type": "number", "minimum": 0.0, "maximum": 1.0 }
  }
}

File Consumer

Component type: File

Field

type

Required

Default value

Description

type

string

yes

-

Component type: File

path

string

yes

-

Path to the audio file

truncate

boolean

-

true

Overwrite existing

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["File"] },
    "path": { "type": "string", "minLength": 1 },
    "truncate": { "type": "boolean", "default": true }
  },
  "required": ["type", "path"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "path": { "type": "string", "minLength": 1 },
    "truncate": { "type": "boolean" }
  }
}

Stream Consumer

Component type: Stream

Field

type

Required

Default value

Description

type

string

yes

-

Component type: Stream

buffer_size

integer

-

0 (any)

If set, buffers until the size reaches the threshold or the last buffer is received

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["Stream"] },
    "buffer_size": { "type": "integer", "minimum": 0, "default": 0 }
  },
  "required": ["type"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "buffer_size": { "type": "integer", "minimum": 0 }
  }
}

Voice Biometrics

Component type: VoiceBiometrics

Field

type

Required

Default value

Description

type

string

yes

-

Component type: VoiceBiometrics

model_name

string

yes

-

Biometrics model name

model_type

enum

yes

-

Values: TextDependent, TextIndependent

mode

enum

yes

-

Values: Authentication, Identification

username

string

-

-

Required for authentication

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "enum": ["VoiceBiometrics"] },
    "model_name": { "type": "string", "minLength": 1 },
    "model_type": { "type": "string", "enum": ["TextDependent", "TextIndependent"] },
    "mode": { "type": "string", "enum": ["Authentication", "Identification"] },
    "username": { "type": "string", "minLength": 1 }
  },
  "required": ["type", "model_name", "model_type", "mode"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "model_name": { "type": "string", "minLength": 1 },
    "model_type": { "type": "string", "enum": ["TextDependent", "TextIndependent"] },
    "mode": { "type": "string", "enum": ["Authentication", "Identification"] },
    "username": { "type": "string", "minLength": 1 }
  }
}

Voice Recognition

Component type: VoiceRecognition

Field

type

Required

Default value

Description

type

string

yes

-

Component type: VoiceRecognition

recognizer

string

yes

-

Recognizer name

models

array

yes

-

ASR models to run

confidence

integer

-

0

Minimum confidence threshold for results

stop_at_first_result

boolean

-

false

Stop pipeline at first result

models_settings

object

ASR models configuration object

vec

object

-

-

Post-processor VEC configuration object

start_time

integer

-

-

Is available only in update to set the model(s).

models_settings.<model_name>

Field

type

Required

Default value

Description

user

string

-

-

Optional UserWord user

slots

object

-

-

Slot configuration for the model

models_settings.<model_name>.slots.<slot_name>

Field

type

Required

Default value

Description

values

array

-

-

List of values for the slot

Raw JSON Schema
  • Creation schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "type": { "type": "string", "enum": ["VoiceRecognition"] },
    "recognizer": { "type": "string", "minLength": 1 },
    "vec": {
      "description": "Post processing configuration.",
      "type": "object",
      "properties": {
        "accent": {
          "description": "The accent to use for VEC.",
          "type": "string",
          "default": ""
        },
        "context": {
          "description": "The context to use for VEC.",
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "models": { "type": "array", "minLength": 1, "items": { "type": "string" } },
    "confidence": { "type": "integer", "minimum": 0, "maximum": 10000, "default": 0 },
    "stop_at_first_result": { "type": "boolean", "default": false },
    "models_settings": {
      "type": "object",
      "properties": {
        "user": { "type": "string", "minLength": 1 },
        "slots": {
          "type": "object",
          "properties": {
            "slot": {
              "type": "object",
              "properties": {
                "values": { "type": "array", "items": { "type": "string" } }
              }
            }
          }
        }
      }
    }
  },
  "required": ["type", "recognizer", "models"]
}
  • Update schema

JSON
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "description": "Pipeline configuration",
  "type": "object",
  "properties": {
    "recognizer": { "type": "string", "minLength": 1 },
    "vec": {
      "description": "Post processing configuration.",
      "type": "object",
      "properties": {
        "accent": {
          "description": "The accent to use for VEC.",
          "type": "string",
          "default": ""
        },
        "context": {
          "description": "The context to use for VEC.",
          "type": "array",
          "items": { "type": "string" }
        }
      }
    },
    "start_time": { "type": "integer" },
    "models": { "type": "array", "minLength": 1, "items": { "type": "string" } },
    "confidence": { "type": "integer", "minimum": 0, "maximum": 10000 },
    "stop_at_first_result": { "type": "boolean" },
    "models_settings": {
      "type": "object",
      "properties": {
        "user": { "type": "string", "minLength": 1 },
        "slots": {
          "type": "object",
          "properties": {
            "slot": {
              "type": "object",
              "properties": {
                "values": { "type": "array", "items": { "type": "string" } }
              }
            }
          }
        }
      }
    }
  }
}

A complete configuration example is provided below for clarity.

JSON
{
    "type": "VoiceRecognition",
    "recognizer": "rec_eng-US",
    "models": [ "csdk-dynamic-drinks" ],
    "confidence": 0,
    "stop_at_first_result": false,
    "models_settings": {
        "csdk-dynamic-drinks": {
            "slots": {
                "drink": {
                    "values": [ "coffee", "tea" ]
                }
            },
            "user": "emmanuel"
        }
    },
    "vec": {
        "context": ["a 1 3", "b 1 4"],
        "accent": "eng"
    }
}

JavaScript errors detected

Please note, these errors can depend on your browser setup.

If this problem persists, please contact our support.