Today we would like to talk about open source speech recognition software.
Speech recognition is a technology that converts audio into written text, which can then be used for any purpose. You could use it, for example, for:
Writing articles and content with your voice.
Creating transcripts of a meeting you had with your colleagues.
Converting a podcast into a written article.
Helping deaf or hard-of-hearing users read any spoken content you have.
Much more.
One of our most read articles of all time on FOSS Post is the “Top 10 Open Source Speech Recognition/Speech-to-Text Systems” article, but that article does not present ready-made programs for end users. Instead, it presents libraries and toolkits that can be used to develop such programs.
Unfortunately, it has come to our attention that there don’t seem to be any open source speech recognition programs for the desktop. By “programs” we mean finished, ready-to-use GUI applications that can easily transcribe any audio file to text, without requiring the user to know programming or how to train speech recognition models.
There are some command line tools like nerd-dictation, which uses the Vosk API, but we couldn’t find anything other than that.
What are the possible reasons for that?
Perhaps there is not much interest in this use case? It doesn’t seem so: our article (which currently ranks #1 in Google results) receives hundreds of visitors searching for “open source speech recognition”.
Maybe such a program is hard to develop? Well, there are dozens of libraries and toolkits, which we have covered before, and many of them already ship language models as well. So the only remaining effort is to build a GUI app on top of one of them, which shouldn’t be hard.
If you are a desktop app developer and currently have some free time for a hobby open source project, then perhaps this idea could be of use to you! Simply use any of the speech toolkits we mentioned as your underlying engine, and build a simple GTK/Qt/Electron app that lets the user choose an .mp3/.wav file, or record audio, and transcribe it to text.
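To give a rough idea of how small such an app could start out, here is a minimal sketch in Python. It rests on several assumptions of ours: it uses Tkinter from the standard library instead of GTK/Qt/Electron, it reads only .wav files, and the names `Engine`, `placeholder_engine` and `transcribe_wav` are hypothetical. The placeholder engine does no recognition at all; it only marks the spot where a call to a real toolkit would go.

```python
import wave
from typing import Callable

# Hypothetical engine interface: raw PCM bytes plus sample rate in, text out.
# A real toolkit's API will differ; adapt this wrapper to whichever one you pick.
Engine = Callable[[bytes, int], str]


def placeholder_engine(pcm: bytes, sample_rate: int) -> str:
    """Stand-in for a real recognizer; it only describes the audio it received."""
    return f"[{len(pcm)} bytes of audio at {sample_rate} Hz: plug a real engine in here]"


def transcribe_wav(path: str, engine: Engine) -> str:
    """Read a .wav file and hand its raw audio to the chosen engine."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        pcm = wav.readframes(wav.getnframes())
    return engine(pcm, sample_rate)


def main() -> None:
    """Open a small window with a file picker and a text box for the result."""
    # Imported here so the transcription code above also works without a display.
    import tkinter as tk
    from tkinter import filedialog, scrolledtext

    root = tk.Tk()
    root.title("Transcriber sketch")
    output = scrolledtext.ScrolledText(root, width=80, height=20)

    def choose_file() -> None:
        path = filedialog.askopenfilename(filetypes=[("WAV audio", "*.wav")])
        if path:  # empty when the user cancels the dialog
            output.delete("1.0", tk.END)
            output.insert(tk.END, transcribe_wav(path, placeholder_engine))

    tk.Button(root, text="Open .wav file", command=choose_file).pack()
    output.pack(fill=tk.BOTH, expand=True)
    root.mainloop()
```

Calling `main()` opens the window. A real app would swap `placeholder_engine` for a wrapper around an actual toolkit, and would probably run the transcription in a background thread so the window stays responsive on long recordings.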
If you know an actual open source desktop program that enables speech recognition for end users, then please let us know in the comments below. Perhaps we overlooked it.
New Articles on FOSS Post
Here are some articles which were published on our website since the last newsletter:
5 Ways a Linux User Can Make Use of ChatGPT: We talk about the new ChatGPT bot from OpenAI, and how a Linux user can put it to good use. An example of our creative writing (*cough* humble down *cough*).
Bypassing Government Censorship Around NordVPN on Linux: If you are a NordVPN user on Linux and you live under a bad government that blocks access to VPN services, then use this guide to bypass the censorship.
Last Week’s Open Source News
We follow open source news from all over the Internet so you don’t have to.
Cosmic DE is a new desktop environment from System76, currently under development and written in the Rust programming language. In this blog post, they share updates on their progress so far.
OnlyOffice is an open source office suite that serves as a good alternative to Microsoft Office (it even looks similar in terms of UI). Version 7.3 was released this week with tons of updates.
Version 7.5 of the famous LibreOffice suite was released with many changes and bug fixes.
In the upcoming Ubuntu 23.04, Telegram will become a Snap package. A very sad change.
Version 8.0 of the not-a-Windows-Emulator Wine has been released. It brings us a step closer to being able to run 32-bit Windows applications on 64-bit Linux hosts without needing to install 32-bit libraries.
Interesting Stuff from the Web
These discussions and articles might be interesting for you, all related to either Linux or open source software:
Gaming benchmarks still show that Xorg sometimes performs better than Wayland. So if you are a Linux gamer, perhaps you want to stick to Xorg for the moment.
Benchmarks have also shown that turning off security mitigations for Intel CPUs (Skylake architecture specifically) may give you an extra 16% in performance. This means that the security patches applied for Intel CPUs over the last 5 years, since the discovery of the Spectre and Meltdown hardware vulnerabilities, are greatly impacting desktop performance.
Source code for the operating system of the Apple Lisa computer from 1984 is now open source.
Many open source contributors were affected by the latest layoffs at Google. These people were key to many of Google’s contributions to the open source world, and laying them off could mean a setback in the company’s commitment to open source software.
The EU hosted a summit about its policy towards open source software. The summit is now finished and we are still waiting for videos of the discussions to be uploaded.
Talking about conferences… FOSDEM 2023 is now live streaming.
Mark your Pledge
So here ends the first newsletter sent to you from FOSS Post Substack! Hope you liked it.
Kindly don’t forget to share this digest with your friends and relatives.
Additionally, if you really like this content and would like us to continue making it, you can use the “pledge” feature in Substack to mark your financial support.
You won’t be charged instantly, but it will let us know how many people are willing to support us if we continue making such newsletters and improve their quality in the future.
Until our next digest… Goodbye!
Check out Buzz, a GUI for Whisper: https://github.com/chidiwilliams/buzz