
Whisper ASR Webservice

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. For more details: github.com/openai/whisper

Run (Development Environment)

Create and activate a virtual environment:

python3.9 -m venv venv
source venv/bin/activate

Install poetry with the following command:

pip3 install poetry==1.2.0

Install packages:

poetry install

Start the webservice:

poetry run whisper_asr

Quick start

After starting the Docker container or running poetry run whisper_asr, the interactive Swagger API documentation is available at localhost:9000/docs

There are two endpoints available:

  • /asr
  • /detect-language

Automatic speech recognition service /asr

With the transcribe task, the service transcribes the uploaded sound file. You can provide the language, or it will be detected automatically. With the translate task, it returns an English transcript regardless of the language spoken.

Returns a JSON object with the following fields:

  • text: the full transcript
  • segments: one entry per segment; each entry provides timestamps, transcript, token IDs and other metadata
  • language: the detected or provided language (as a language code)
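
As a rough illustration of working with this response, the sketch below builds a request URL for /asr and prints one line per segment. It rests on assumptions not stated in this README: that the task and language are passed as query parameters named task and language, and that each segment carries start, end and text keys (as in Whisper's own transcription output). The sample payload is invented for illustration only.

```python
from urllib.parse import urlencode


def build_asr_url(base_url, task="transcribe", language=None):
    """Build the /asr URL. The query parameter names are assumptions."""
    params = {"task": task}
    if language:
        params["language"] = language
    return f"{base_url.rstrip('/')}/asr?{urlencode(params)}"


def format_segments(result):
    """Render each segment as '[start -> end] text'.

    Assumes every segment has 'start', 'end' and 'text' keys.
    """
    lines = []
    for seg in result.get("segments", []):
        lines.append(
            f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text'].strip()}"
        )
    return lines


# Hypothetical response, shaped after the fields listed above.
sample = {
    "text": " Hello world. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello world."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
    "language": "en",
}

for line in format_segments(sample):
    print(line)
```

Check the Swagger documentation at localhost:9000/docs for the actual parameter names before relying on this.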

Language detection service /detect-language

Detects the language spoken in the uploaded sound file. For longer files, only the first 30 seconds are processed.

Returns a JSON object with the following fields:

  • detected_language
  • langauge_code
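
A minimal client sketch for this endpoint, using only the Python standard library, might look like the following. The multipart form field name audio_file and the helper names are assumptions made for illustration; the Swagger page at localhost:9000/docs shows the field the service actually expects.

```python
import json
import os
import urllib.request
import uuid


def build_upload_request(url, filename, data, field="audio_file"):
    """Build a multipart/form-data POST carrying one file.

    The form field name 'audio_file' is an assumption.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
        ).encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def detect_language(path, base_url="http://localhost:9000"):
    """Upload a sound file to /detect-language and return the parsed JSON."""
    with open(path, "rb") as f:
        req = build_upload_request(
            f"{base_url}/detect-language", os.path.basename(path), f.read()
        )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the webservice running, detect_language("sample.wav") would return a dictionary with the fields listed above.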

Build

Run:

poetry build

Configuring the Model

export ASR_MODEL=base
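
For illustration, a service could read this variable along the following lines. This is a sketch, not the webservice's actual code: the function name and the fallback to base when the variable is unset are assumptions.

```python
import os


def resolve_model_name(env=os.environ, default="base"):
    """Return the Whisper model name from ASR_MODEL.

    Falling back to 'base' when the variable is unset is an assumption.
    """
    return env.get("ASR_MODEL", default)


print(resolve_model_name({"ASR_MODEL": "small"}))
```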

Docker

Build Image

docker build -t whisper-asr-webservice .

Run Container

docker run -d -p 9000:9000 whisper-asr-webservice
# or
docker run -d -p 9000:9000 -e ASR_MODEL=base whisper-asr-webservice

TODO

  • Detailed README file
  • GitHub pipeline
  • Unit tests
  • CUDA version of Docker image
  • Hosted Swagger documentation with descriptions