OpenAI has paved the way for developers to build AI tools on top of its models, including ChatGPT, DALL-E and Whisper. Now, an optimized implementation of its Whisper model known as Whisper JAX has been touted as the fastest speech-to-text API on the market.
Early last month in a blog post, OpenAI announced that both Whisper and ChatGPT were available through its API, which should ease the development of AI tools and give developers a way to prompt with speech rather than text alone.
Developers have been quick to take advantage of the tool, with AI Breakfast noting on Twitter that OpenAI’s speech-to-text API “Whisper” got a major upgrade.
“OpenAI’s speech-to-text API Whisper just got supercharged: This tool transcribes audio 70x faster than Whisper. A 2-hour podcast can now be transcribed in 30 seconds using Whisper JAX: The Fastest Whisper API,” posted AI Breakfast.
On its blog in September 2022, OpenAI described Whisper as an automatic speech recognition (ASR) system trained on about 680,000 hours of multilingual and multitask supervised data. The benefit of training Whisper on many languages is that it can understand different dialects and translate other languages into English.
What is Whisper JAX?
Whisper JAX is essentially an optimized implementation of OpenAI’s Whisper model, which runs on JAX with a TPU v4-8 in the backend. Compared to PyTorch on an A100 GPU, it is over 70 times faster, making it the fastest Whisper API currently available.
The latest version, which has already won over users with its speed, is also available to developers. Those keen to skip the queue on the website can run the code themselves from the repository.
Life is too short to wait for slow transcription models 🥱
That's why we've made Whisper **70x faster**
Whisper JAX ⚡️ is a highly optimised Whisper implementation for both GPU and TPU
Try it here: https://t.co/JaROauBaJc
And transcribe a 1 hour of audio in under 15 seconds! pic.twitter.com/g6jzhE8TRp
— Sanchit Gandhi (@sanchitgandhi99) April 20, 2023
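To put the quoted figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers cited in this article; the function name real_time_factor is ours, and the baseline estimate assumes that the "30 seconds" and "70x" figures describe the same audio and setup, which the sources do not confirm.

```python
def real_time_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are transcribed per second of compute."""
    return audio_seconds / processing_seconds


# AI Breakfast: a 2-hour podcast transcribed in 30 seconds.
podcast_rtf = real_time_factor(2 * 3600, 30)   # 240.0

# Sanchit Gandhi: 1 hour of audio in under 15 seconds (15 s is the upper bound).
hour_rtf = real_time_factor(3600, 15)          # 240.0

# Assumption: if Whisper JAX is 70x faster than the baseline on the same
# podcast, the baseline would need roughly 70 * 30 seconds.
implied_baseline_minutes = 70 * 30 / 60        # 35.0

print(podcast_rtf, hour_rtf, implied_baseline_minutes)
```

Both claims work out to about 240 times faster than real time. That is a different measure from the "70x" headline, which compares Whisper JAX against a PyTorch baseline rather than against the length of the audio.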
One Twitter user shared how the tool was able to understand Arabic, even though they had not used the formal Arabic dialect.
“I just tried it with Arabic. Recognizing the words doesn’t compete with what I have on gboard. I had to do another take with a slower pace and stressing on letters. But I spoke not in formal Arabic and it understood me and transcribed correctly (the second time),” said Father Of Sarah.
The Whisper JAX tool has already made a strong impression on the tech community, with users who have tested it reporting that it is markedly faster than the alternatives.
AI Breakfast compared Whisper to Otter, which is another speech-to-text tool available on iOS and Android platforms. This was after one user asked if Whisper would be a good fit for transcribing lecture notes and other study materials.
“Probably one of the best uses for it. Otter is a speech-to-text app on iOS and Android that segments speakers from recorded audio and is the best consumer version available imo,” said AI Breakfast.
Trying Whisper JAX
To try Whisper JAX for yourself, visit the website and select the input format: record from a microphone, upload an audio file, or paste a YouTube link. Since it is still a demo platform, you may face queues during busy spells.
According to the website, once you submit a request your position in the queue is displayed in the demo pane. When you reach the front, your audio file is transcribed, with progress shown on a progress bar.
To skip the queue, you may wish to create your own inference endpoint, details for which can be found in the Whisper JAX repository.
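For readers who would rather run the model themselves, the following is a rough Python sketch based on the pipeline class in the Whisper JAX repository. It runs the model locally rather than setting up the hosted endpoint the repository describes. It assumes the whisper_jax package and JAX are installed, that "audio.mp3" is a placeholder for your own file, and that timestamped output arrives as a list of "chunks" with a (start, end) tuple each; the helper format_chunks is ours, not part of the library.

```python
def transcribe(audio_path, checkpoint="openai/whisper-large-v2"):
    """Transcribe one audio file with Whisper JAX (assumes whisper_jax is installed)."""
    # Imported lazily so the rest of this sketch loads without JAX installed.
    import jax.numpy as jnp
    from whisper_jax import FlaxWhisperPipline  # spelling follows the repository

    # Half precision and batching are the speed-ups suggested in the repository;
    # the batch size of 16 here is an illustrative choice.
    pipeline = FlaxWhisperPipline(checkpoint, dtype=jnp.bfloat16, batch_size=16)
    return pipeline(audio_path, task="transcribe", return_timestamps=True)


def format_chunks(result):
    """Render timestamped chunks as one '[start - end] text' line each.

    Assumes result looks like:
    {"text": "...", "chunks": [{"timestamp": (start, end), "text": "..."}]}
    """
    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{start:.1f}s - {end:.1f}s] {chunk['text'].strip()}")
    return "\n".join(lines)


# Usage, once whisper_jax is installed and audio.mp3 exists:
#   result = transcribe("audio.mp3")
#   print(format_chunks(result))
```

The first call is slow because JAX compiles the model; the repository notes that later calls reuse the compiled function and run much faster.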
While the tool has been hyped particularly for its speed, some users who have tried it believe it still needs work, especially on accuracy.
What model is this using? becasue after making different tests, the accuracy was downgraded. I hope this can be fixed in next updates, or else, I prefer to stay with WhisperX which is not so fast, but the accuracy is better.
— Jimmy Neutrino (@_JimmyNeutrino) April 24, 2023