7

I'm looking for offline speech recognition software for Linux that also handles German and is easy to use and configure.

I have already tried CMU Sphinx and a few others, but they all had one thing in common: they were far too complicated to install and use, mainly because they lack a good manual and because of a very crude concept (I am trying to avoid the word "usability" in this context).

So, is there speech recognition software that can be set up and configured in finite time, can execute scripts on recognised commands, and works fully offline, meaning it does not need a cloud service or remote server to analyse spoken words? I'm also willing to pay for a working and usable solution!

Every hint and idea is welcome!

Thanks!

PS: I'm aware of the thread "Is there any decent speech recognition software for Linux?", but the answers given there do NOT point to offline solutions!

AdminBee
  • 22,803

3 Answers

2

It's worth keeping an eye on what Michael Sheldon is doing: http://blog.mikeasoft.com/2017/12/30/speech-recognition-mozillas-deepspeech-gstreamer-and-ibus/

Caveat: in my opinion it is not yet of any practical use. But after a long struggle with the configuration I was eventually able to get it to recognise spoken words (in English; I have no idea about German).

Mike Sheldon is using the DeepSpeech model from Mozilla, which sounds good.

The comments on that page (my comment no. 100 was when I managed to get some speech recognition) seem to have stopped in July 2018. I have no idea whether he's still working on it.

mike rodent
  • 1,132
2

Try nerd-dictation (demo video).

I ran into the same problem and ended up writing my own tool. While it makes some opinionated decisions, I find it generally works well for basic dictation needs (it is based on the excellent VOSK-API).
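For the "execute scripts on recognised commands" part of the question, here is a minimal sketch of how recognised text could be mapped to commands. It assumes the recogniser hands back a JSON string with a `text` field, which is the shape Vosk recognisers return. The phrase table, the `echo` commands and the function names are hypothetical placeholders, not part of nerd-dictation or Vosk.

```python
import json
import shlex
import subprocess

# Hypothetical phrase-to-command table; replace the phrases and commands with your own.
COMMANDS = {
    "licht an": "echo light-on",
    "licht aus": "echo light-off",
}


def extract_text(result_json):
    """Return the recognised phrase from a Vosk-style JSON result string."""
    return json.loads(result_json).get("text", "").strip().lower()


def lookup_command(result_json, commands=COMMANDS):
    """Return the argv list mapped to the recognised phrase, or None if unknown."""
    command = commands.get(extract_text(result_json))
    return shlex.split(command) if command else None


def run_command(result_json, commands=COMMANDS, runner=subprocess.run):
    """Run the mapped command; return True if a command matched, else False."""
    argv = lookup_command(result_json, commands)
    if argv is None:
        return False
    runner(argv, check=False)
    return True
```

In a real setup you would call `run_command` with each final result string produced by the recogniser. Exact-match lookup is the simplest choice; anything fuzzier (partial phrases, parameters) needs more logic than this sketch shows.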

ideasman42
  • 1,211
0

A post I created recently covers some of this in a little more detail (credit to geb and adabru for some of the information below). It may be helpful to read, bookmark and check back for updates: Eye Gaze Tracking With Head Tracking Solutions On Linux

One of the more productive and easier options to set up, according to adabru, https://handsfreecoding.org/ and many others I've come across online, is Talon: https://talonvoice.com

It appears to work offline for analysing spoken words (see 7. Privacy): https://talonvoice.com/EULA.txt

At the time of writing, you can use the Vosk engine in Talon for German support if you pay $25/month for the Beta version (see Vosk and the Talon community wiki for the languages supported):

https://alphacephei.com/vosk/

https://talon.wiki/speech_engines/

https://talon.wiki/faq/#are-languages-other-than-english-supported

There is also a free version of Talon, but keep in mind that Talon isn't entirely open source.

I would give Numen a hard look. It's free and open source software that uses Vosk, which supports German. It looks like a very good option if you primarily use keyboard-centric programs (some are listed in the link): https://git.sr.ht/%7Egeb/numen

There may be other Vosk projects that suit your needs at: https://alphacephei.com/vosk/integrations

You can use Dragon with Talon, but Dragon is native to Windows. So as far as I know, you would likely need a Linux virtual machine in Windows or have to use Cygwin in Windows (see https://handsfreecoding.org/using-dragon-with-linux). This is probably not what you're looking for, but Dragon supports German, and I think I remember Nuance telling me that Dragon works offline for analysing spoken words (I would double-check this). You could also use Dragon with Dragonfly, which is mentioned at https://handsfreecoding.org/. Dragon is proprietary and is going to cost you about $300-$500 (see https://talon.wiki/speech_engines/). Based on my experience with it, I personally wouldn't recommend Dragon, and it wouldn't be my first consideration.