Features

iApp Whisper ASR Engine Version 3.0 is an AI-powered tool for fast automatic speech recognition (70 times real-time) in Thai and English. With parallel processing, it achieves a 98.5% accuracy rate on low-noise audio. It also supports automatic speaker diarization. You can try it at https://ai.iapp.co.th/product/speech_to_text_asr

iApp Whisper ASR Engine is built upon OpenAI Whisper and further trained on a diverse audio dataset of over 500,000 Thai and English transcription pairs. The models provide voice activity detection, accurate speech timing information, and speaker diarization, and they support parallel processing.
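To put the "70 times real-time" figure in concrete terms, the short sketch below estimates how long a recording might take to transcribe. The function name and the default factor are illustrative assumptions, not part of the iApp API, and real throughput will likely vary with audio quality, file length, and server load.

```python
def estimated_processing_seconds(audio_seconds: float, speed_factor: float = 70.0) -> float:
    """Estimate transcription time for a recording.

    speed_factor is how many seconds of audio are processed per second of
    wall-clock time (70.0 is the figure quoted above; treat it as nominal).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# A one-hour recording at the nominal 70x speed takes a little under a minute.
one_hour = estimated_processing_seconds(3600)
print(f"{one_hour:.1f} seconds")
```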

Features

🇹🇭🇬🇧 Supports Thai and English

🎯 Very high accuracy: tested on 10,000 audio files from Common Voice 11, spoken by over 1,000 people (male, female, and child voices), with an average Word Error Rate (WER) of 1.5% on single-speaker audio files without background noise

⚡ī¸ High-speed transcription at 70 times real-time

🪶 Faster backend that parallelizes the answer selection process across 5 simultaneous processes

🎯 Accurate word-level timestamps using audio-to-text alignment (wav2vec2 alignment)

đŸ‘¯â€â™‚ī¸ Multi-speaker recognition using speaker diarization from pyannote-audio (with labels for each speaker)

đŸ—Ŗī¸ Voice Activity Detection (VAD) pre-processing helps reduce false predictions and process batches without compromising accuracy

đŸ—Ŗī¸ Phoneme-based ASR uses an improved set of models to distinguish the smallest units of sound in speech, helping to differentiate one word from another, such as the 'p' sound in "tap"

đŸ—Ŗī¸ Supports both gRPC streaming and file upload with REST API
