whisper : add option to speed up the audio tempo by x2

Using a Phase Vocoder for speeding up the audio tempo by scaling down
the frequencies in the frequency domain.

This reduces the computation in the Encoder by a factor of 2.
The transcription accuracy is degraded, but for slow to normal speech -
it seems to be still very good.

I think this can find application for real-time transcription - i.e. the
"stream" example.
This commit is contained in:
Georgi Gerganov
2022-11-12 18:03:49 +02:00
parent 41b48ab7f1
commit 83c742f1a7
4 changed files with 64 additions and 10 deletions

View File

@@ -202,6 +202,9 @@ extern "C" {
float thold_ptsum; // timestamp token sum probability threshold (~0.01)
int max_len; // max segment length in characters
// [EXPERIMENTAL] speed-up techniques
bool speed_up; // speed-up the audio by 2x using Phase Vocoder
const char * language;
struct {