
Speech Recognition (ASR)

Automatic Speech Recognition (ASR) is the technology that converts spoken language into written text.

ASR is especially valuable in environments where:

  • Speed matters — speaking is often faster than typing.

  • Mobility is essential — workers need to move freely without relying on a screen or keyboard.

  • Hygiene or security is a concern — touchless interaction improves safety in sterile or controlled environments such as healthcare.

  • Hands or eyes are already occupied — for example during warehouse picking, field inspections, or surgical procedures.

Key features

Multilingual Support:
Supports 41+ languages and regional dialects for global coverage.

Offline Operation:
Runs fully offline, preserving privacy and keeping the engine functional without an internet connection.

Efficient Performance:
Optimized for low-power embedded devices, maintaining high accuracy with minimal resource usage.

Confidence Scoring:
Provides confidence scores and interim results to support accurate post-processing and error correction.

Real-Time Recognition:
Low-latency engine ideal for interactive applications.
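The following is a minimal sketch of how an application might act on confidence scores and interim results. The `RecognitionResult` fields, the 0.0 to 1.0 confidence scale, and both threshold values are assumptions for illustration, not the engine's actual API. Thresholds should be tuned on your own recordings.

```python
from dataclasses import dataclass

# Hypothetical result shape: field names are illustrative, not the engine's API.
@dataclass
class RecognitionResult:
    text: str
    confidence: float  # assumed to be normalized to 0.0..1.0
    is_final: bool     # False for interim (partial) results

# Assumed threshold values; tune them on your own test data.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50

def decide(result: RecognitionResult) -> str:
    """Map a recognition result to an application action."""
    if not result.is_final:
        # Interim results are useful for live feedback, not for acting on.
        return "show_interim"
    if result.confidence >= ACCEPT_THRESHOLD:
        return "accept"
    if result.confidence >= CONFIRM_THRESHOLD:
        # Medium confidence: ask the user to confirm before proceeding.
        return "confirm"
    # Low confidence: discard the result and ask the user to repeat.
    return "reprompt"

if __name__ == "__main__":
    stream = [
        RecognitionResult("pick four", 0.42, False),
        RecognitionResult("pick four items", 0.91, True),
        RecognitionResult("confirm location", 0.63, True),
        RecognitionResult("next", 0.21, True),
    ]
    for r in stream:
        print(f"{r.text!r}: {decide(r)}")
```

Interim results are only displayed; decisions wait for the final result, and medium-confidence results trigger a confirmation step rather than an outright rejection.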

Performance

We’ve built our solutions on best-in-class technology to ensure high accuracy and responsiveness, even under challenging conditions.

Optimized for Verticals

The engine is optimized for the core verticals we serve, including:

  • Logistics (e.g., voice picking and inventory control)

  • Field Services (e.g., maintenance reporting, diagnostics)

  • Healthcare (e.g., hands-free device control)

Adaptability for Other Applications

For use cases outside our primary focus areas, our ASR engine still performs well, but accuracy may vary depending on the environment, speech patterns, and the complexity of the interaction. In such cases, we recommend testing early to confirm that recognition meets your requirements.
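This page does not prescribe an accuracy metric. A common choice in ASR evaluation is word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognized text into a reference transcript, divided by the number of reference words. A minimal sketch, assuming you have human-verified reference transcripts for a set of test recordings:

```python
from __future__ import annotations

def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1]

def word_error_rate(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    errors = total = 0
    for reference, hypothesis in pairs:
        # Minimal normalization: lowercase and split on whitespace.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += word_edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / total

if __name__ == "__main__":
    test_set = [
        ("pick four items", "pick for items"),     # one substitution
        ("confirm location", "confirm location"),  # exact match
    ]
    print(f"WER: {word_error_rate(test_set):.2%}")  # prints "WER: 20.00%"
```

Summing errors and reference words across the whole test set, rather than averaging per-utterance rates, keeps short utterances from being over-weighted. The sketch only lowercases and splits on whitespace; when comparing dictation output, normalize punctuation and number formats consistently as well.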

Recognition Modes

Our ASR engine offers three primary recognition modes to serve different application scenarios:

  • Grammar-Based Recognition
    Designed for applications where commands or specific phrases are predefined. The grammar-based recognizer uses a custom grammar to match spoken input exactly. This mode is ideal for scenarios where precision and control are crucial, such as voice picking commands or industrial interfaces.

  • Continuous Recognition
    Continuous mode processes unrestricted speech in real time. This mode is best for applications requiring fluid conversation or dictation.

  • Dictation
    Dictation extends continuous recognition by also handling punctuation and other linguistic nuances. This mode is ideal for tasks such as taking notes or writing emails and documents, where correct punctuation matters.
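To illustrate the accept-or-reject behavior of grammar-based recognition, here is a toy matcher over text. It is a sketch, not the engine's grammar format: the templates, the `{digit}` slot syntax, and the `match` function are hypothetical. A real grammar-based recognizer applies the grammar during decoding, constraining which word sequences it can output at all, rather than filtering free text afterwards.

```python
from __future__ import annotations
from typing import Optional

# Hypothetical slot vocabulary: "{digit}" in a template matches one of these words.
SLOTS = {
    "digit": {"zero", "one", "two", "three", "four",
              "five", "six", "seven", "eight", "nine"},
}

# Hypothetical command templates for a voice-picking workflow.
TEMPLATES = [
    "pick {digit}",
    "location {digit} {digit}",
    "confirm",
    "repeat",
]

def match(utterance: str) -> Optional[tuple[str, list[str]]]:
    """Return (template, slot values) on an exact match, else None."""
    words = utterance.lower().split()
    for template in TEMPLATES:
        parts = template.split()
        if len(parts) != len(words):
            continue
        values = []
        for part, word in zip(parts, words):
            if part.startswith("{") and part.endswith("}"):
                if word not in SLOTS[part[1:-1]]:
                    break  # word is not allowed in this slot
                values.append(word)
            elif part != word:
                break  # fixed word does not match
        else:
            # Every position matched.
            return template, values
    return None

if __name__ == "__main__":
    for said in ["pick four", "location one two", "pick four items"]:
        print(f"{said!r} -> {match(said)}")
```

Anything outside the templates, such as "pick four items", returns `None`. This closed set of valid inputs is what makes grammar-based recognition predictable for command interfaces.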

Choosing Between Continuous and Grammar-Based

Choose Grammar-Based if:

  • The expected input is limited to a set of predefined commands.

  • Accuracy and reliability are highly important.

  • Use cases include industrial workflows, voice picking, or command-based interfaces where users are trained to say specific terms.

Choose Continuous if:

  • The user is expected to speak in free-flowing language.

  • Your application must understand varied and unpredictable input.

  • You're building virtual assistants, conversational interfaces, or dictation tools where flexibility and user experience are key.

For hybrid needs, these modes can be combined or switched dynamically depending on the context.
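A minimal sketch of switching modes by application context. The workflow states, the `RecognitionMode` enum, and `select_mode` are hypothetical names for illustration; how a selected mode is actually applied to the engine depends on its API, which is not shown here.

```python
from enum import Enum

class RecognitionMode(Enum):
    GRAMMAR = "grammar"
    CONTINUOUS = "continuous"
    DICTATION = "dictation"

# Hypothetical workflow states mapped to the mode that suits each one.
MODE_FOR_STATE = {
    "awaiting_command": RecognitionMode.GRAMMAR,   # short, predefined commands
    "open_question": RecognitionMode.CONTINUOUS,   # varied, unpredictable phrasing
    "writing_note": RecognitionMode.DICTATION,     # punctuation matters
}

def select_mode(state: str) -> RecognitionMode:
    """Pick a recognition mode for the current workflow state."""
    # Unknown states fall back to the most constrained mode.
    return MODE_FOR_STATE.get(state, RecognitionMode.GRAMMAR)

if __name__ == "__main__":
    for state in ["awaiting_command", "writing_note", "unexpected_state"]:
        print(state, "->", select_mode(state).value)
```

Falling back to grammar-based recognition for unknown states is one possible design choice: it keeps unexpected situations in the most constrained, predictable mode.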

Customization Capabilities

Custom Grammar Support

Users can create and upload their own grammars to define exactly what the ASR engine should recognize. This is ideal for structured commands, controlled vocabularies, or industry-specific jargon. Grammars can be designed using our intuitive tools and documentation.

How-to: Create your grammar
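The actual grammar format is covered in the how-to guide and is not reproduced here. As a format-agnostic sketch, the rule below uses a hypothetical representation that lists the allowed words at each position, with an empty string marking an optional position, and expands it to the closed set of phrases it accepts. Enumerating the set like this is a quick way to review a grammar for gaps or unintended phrases before uploading it.

```python
from __future__ import annotations
from itertools import product

# Hypothetical rule: each inner list holds the alternatives for one position;
# "" marks the position as optional.
RULE = [
    ["", "please"],
    ["start", "stop"],
    ["pump", "conveyor", "fan"],
    ["", "now"],
]

def expand(rule: list[list[str]]) -> set[str]:
    """Enumerate every phrase the rule accepts."""
    phrases = set()
    for choice in product(*rule):
        # Drop empty strings left by skipped optional positions.
        phrases.add(" ".join(word for word in choice if word))
    return phrases

if __name__ == "__main__":
    accepted = expand(RULE)
    print(len(accepted), "phrases")
    for phrase in sorted(accepted)[:5]:
        print(" ", phrase)
```

This rule yields 2 × 2 × 3 × 2 = 24 distinct phrases, which shows how quickly a compact rule grows and why reviewing the expanded set is worthwhile.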

Model Training

At this stage, users cannot train entirely new ASR models. However, the system allows for extensive customization via grammars, lexicons, and context adaptation to significantly improve recognition for specific use cases.
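A lexicon maps words to pronunciations. The sketch below assumes, as in many ASR systems, that every word a grammar uses needs at least one pronunciation entry. It merges custom entries for domain jargon into a base lexicon and reports grammar words that still lack one. The lexicon layout, the ARPAbet-style phoneme strings, and the function names are illustrative assumptions, not this engine's lexicon format.

```python
from __future__ import annotations

# Illustrative base lexicon: word -> list of pronunciation variants.
BASE_LEXICON: dict[str, list[str]] = {
    "pick": ["P IH K"],
    "four": ["F AO R"],
    "confirm": ["K AH N F ER M"],
}

# Hypothetical custom entry for jargon, covering two ways workers say "SKU".
CUSTOM_LEXICON: dict[str, list[str]] = {
    "sku": ["S K Y UW", "EH S K EY Y UW"],
}

def merge_lexicons(base: dict[str, list[str]],
                   custom: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return a new lexicon with custom variants added; inputs are not mutated."""
    merged = {word: list(prons) for word, prons in base.items()}
    for word, prons in custom.items():
        variants = merged.setdefault(word, [])
        for pron in prons:
            if pron not in variants:
                variants.append(pron)
    return merged

def missing_words(phrases: list[str], lexicon: dict[str, list[str]]) -> list[str]:
    """List words used in the phrases that have no pronunciation entry."""
    words = {w for phrase in phrases for w in phrase.lower().split()}
    return sorted(words - set(lexicon))

if __name__ == "__main__":
    lexicon = merge_lexicons(BASE_LEXICON, CUSTOM_LEXICON)
    print(missing_words(["pick four", "confirm sku", "skip location"], lexicon))
```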

Speaker Adaptation

Speaker adaptation will be introduced soon, aimed especially at extreme cases involving heavily accented or non-standard speech. It will help improve recognition performance across diverse workforces. The first version will be available for grammar-based ASR.
