whisper : add option to speed up the audio tempo by x2

Use a Phase Vocoder to speed up the audio tempo by scaling down the frequencies in the frequency domain. This reduces the computation in the Encoder by a factor of 2. Transcription accuracy is degraded, but for slow to normal speech it still seems to be very good. I think this can be useful for real-time transcription, e.g. the "stream" example.
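
A rough sketch of where the factor of 2 comes from, assuming the Encoder processes fixed 30-second windows and that the cost per window does not change: at 2x tempo each window covers twice as much of the original audio, so half as many Encoder passes are needed. The helper `encoder_windows` is hypothetical and is not part of whisper.h.

```c
// Hypothetical helper (not part of whisper.cpp): number of fixed-size
// Encoder windows needed to cover `audio_sec` seconds of input audio
// after its tempo has been multiplied by `tempo`.
static int encoder_windows(double audio_sec, double tempo) {
    const double window_sec = 30.0;              // Whisper Encoder input window
    const double effective  = audio_sec / tempo; // duration after tempo change

    int n = (int)(effective / window_sec);       // whole windows
    if (n * window_sec < effective) {
        n++;                                     // partial last window still costs a pass
    }
    return n;
}
```

For two minutes of audio, `encoder_windows(120.0, 1.0)` gives 4 and `encoder_windows(120.0, 2.0)` gives 2. This only counts Encoder passes; it says nothing about the accuracy trade-off mentioned above.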
2023-11-04 02:52:44 +03:00 · 2022-11-12 18:03:49 +02:00
parent 41b48ab7f1
commit 83c742f1a7
4 changed files with 64 additions and 10 deletions
--- a/whisper.h
+++ b/whisper.h
@@ -202,6 +202,9 @@ extern "C" {
        float thold_ptsum;      // timestamp token sum probability threshold (~0.01)
        int   max_len;          // max segment length in characters

+        // [EXPERIMENTAL] speed-up techniques
+        bool speed_up; // speed-up the audio by 2x using Phase Vocoder
+
        const char * language;

        struct {