Title
GitHub - petewarden/spchcat: Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.
Update Time
2022-10-09 16:23:22

# Spchcat

Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.

## Description

spchcat is a command-line tool that reads in audio from .WAV files, a microphone, or system audio inputs and converts any speech found into text. It runs locally on your machine, with no web API calls or network activity, and is open source. It is built on top of Coqui's speech to text library, TensorFlow, KenLM, and data from Mozilla's Common Voice project.

It supports multiple languages thanks to Coqui's library of models. The accuracy of the recognized text will vary widely depending on the language, since some have only small amounts of training data. You can help improve future models by contributing your voice.

## Installation

### x86

On Debian-based x86 Linux systems like Ubuntu you should be able to install the latest .deb package by downloading and double-clicking it. Other distributions are currently unsupported.
The tool requires PulseAudio, which is already present on most desktop systems, but can be installed manually.

There's a notebook you can run in Colab at notebooks/install.ipynb that shows all installation steps.

### Raspberry Pi

To install on a Raspberry Pi, download the latest .deb installer package and either double-click on it from the desktop, or run `dpkg -i ~/Downloads/spchcat_0.0-2_armhf.deb` from the terminal. It will take several minutes to unpack all the language files. This version has only been tested on the latest release of Raspbian, released October 30th 2021, and on a Raspberry Pi 4. It's expected to fail on the Raspberry Pi 1 and Raspberry Pi Zero, due to their CPU architecture.

## Usage

After installation, you should be able to run it with no arguments to start capturing audio from the default microphone source, with the results output to the terminal:

    spchcat

After you've run the command, start speaking, and you should see the words you're saying appear. The speech recognition is still a work in progress, and the accuracy will depend a lot on the noise levels, your accent, and the complexity of the words, but hopefully you should see something close enough to be useful for simple note taking or other purposes.

### System Audio

If you don't have a microphone attached, or want to transcribe audio coming from another program, you can set the --source argument to 'system'. This will attempt to listen to the audio that your machine is playing, including any videos or songs, and transcribe any speech found.

    spchcat --source=system

### WAV Files

One of the most common audio file formats is WAV. If you don't have any to test with, you can download Coqui's test set to try this option out. If you need to convert files from another format like '.mp3', I recommend using FFmpeg. As with the other source options, spchcat will attempt to find any speech in the files and convert it into a transcript.
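
If you have FFmpeg installed, the conversion mentioned above can be wrapped in a small shell helper. This is only a sketch: `to_wav` and the `DRY_RUN` switch are hypothetical names, not part of spchcat, and the mono 16 kHz settings are an assumption based on what Coqui's models are commonly trained on, not something this README specifies.

```bash
# Hypothetical helper: convert an audio file to WAV using ffmpeg.
# Usage: to_wav INPUT [OUTPUT]
# Set DRY_RUN=1 to print the ffmpeg command instead of running it.
to_wav() {
  local input="$1"
  # Default output name: same path with the extension replaced by .wav
  local output="${2:-${input%.*}.wav}"

  # -ac 1     : mix down to a single (mono) channel   (assumed setting)
  # -ar 16000 : resample to 16 kHz                    (assumed setting)
  set -- ffmpeg -i "$input" -ac 1 -ar 16000 "$output"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

With the helper defined, something like `to_wav interview.mp3 && spchcat interview.wav` should convert a file and then transcribe the result.
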
You don't have to explicitly set the --source argument; as long as file names are present on the command line, that will be the default.

    spchcat audio/8455-210777-0068.wav

If you're using the audio file from the test set, you should see output like the following:

    TensorFlow: v2.3.0-14-g4bdd3955115
    Coqui STT: v1.1.0-0-gf3605e23
    your power is sufficient i said

You can also specify a folder instead of a single filename, and all .wav files within that directory will be transcribed.

### Language Support

So far this documentation has assumed you're using American English, but the tool will default to looking for the language your system has been configured to use. It first looks for the one specified in the LANG environment variable. If no model for that language is found, it will default back to 'en_US'. You can override this by setting the --language argument on the command line, for example:

    spchcat --language=de_DE

This works independently of --source and other options, so you can transcribe microphone, system audio, or files in any of the supported languages. Note that some languages have very small amounts of training data, so their quality may suffer.

If you don't care about country-specific variants, you can also just specify the language part of the code, for example --language=en. This will pick any model that supports the language, regardless of country. The same thing happens if a particular language and country pair isn't found: the tool will log a warning and fall back to any country that supports the language.
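
The lookup order described above can be sketched as a shell function. This illustrates the documented behavior and is not spchcat's actual code: `resolve_language` is a hypothetical name, the list of available models is passed in by hand, and stripping an encoding suffix such as `.UTF-8` from the requested code is an assumption.

```bash
# Hypothetical sketch of the documented fallback order.
# Usage: resolve_language REQUESTED AVAILABLE_CODE...
resolve_language() {
  # Drop an encoding suffix such as ".UTF-8" (assumed LANG format).
  local requested="${1%%.*}"
  shift
  # Language part only, e.g. "en" from "en_GB".
  local lang="${requested%%_*}"
  local code

  # 1. Exact language_COUNTRY match.
  for code in "$@"; do
    if [ "$code" = "$requested" ]; then
      echo "$code"
      return 0
    fi
  done

  # 2. Any country variant of the same language.
  for code in "$@"; do
    if [ "${code%%_*}" = "$lang" ]; then
      # Only warn when a specific country was asked for and not found.
      if [ "$requested" != "$lang" ]; then
        echo "warning: no model for '$requested', using '$code'" >&2
      fi
      echo "$code"
      return 0
    fi
  done

  # 3. Nothing matched: fall back to American English.
  echo "en_US"
}
```

Calling `resolve_language "$LANG" en_US de_DE` would then print the code that could be passed to `--language`.
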
For example, if 'en_GB' is specified but only 'en_US' is present, 'en_US' will be used.

| Code | Language Name |
|------|---------------|
| am_ET | Amharic |
| bn_IN | Bengali |
| br_FR | Breton |
| ca_ES | Catalan |
| cnh_MM | Hakha-Chin |
| cs_CZ | Czech |
| cv_RU | Chuvash |
| cy_GB | Welsh |
| de_DE | German |
| dv_MV | Dhivehi |
| el_GR | Greek |
| en_US | English |
| et_EE | Estonian |
| eu_ES | Basque |
| fi_FI | Finnish |
| fr_FR | French |
| fy_NL | Frisian |
| ga_IE | Irish |
| hu_HU | Hungarian |
| id_ID | Indonesian |
| it_IT | Italian |
| ka_GE | Georgian |
| ky_KG | Kyrgyz |
| lg_UG | Luganda |
| lt_LT | Lithuanian |
| lv_LV | Latvian |
| mn_MN | Mongolian |
| mt_MT | Maltese |
| nl_NL | Dutch |
| or_IN | Odia |
| pt_PT | Portuguese |
| rm_CH | Romansh-Sursilvan |
| ro_RO | Romanian |
| ru_RU | Russian |
| rw_RW | Kinyarwanda |
| sah_RU | Sakha |
| sb_DE | Upper-Sorbian |
| sl_SI | Slovenian |
| sw_KE | Swahili-Congo |
| ta_IN | Tamil |
| th_TH | Thai |
| tr_TR | Turkish |
| tt_RU | Tatar |
| uk_UK | Ukrainian |
| wo_SN | Wolof |
| yo_NG | Yoruba |

All of these models have been collected by Coqui, and contributed by organizations like Inclusive Technology for Marginalized Languages or individuals. All are using the conventions for Coqui's STT library, so custom models could potentially be used, but training and deployment of those is outside the scope of this document. The models themselves are provided under a variety of open source licenses, which can be inspected in their source folders (typically inside /etc/spchcat/models/).

### Saving Output

By default spchcat writes any recognized text to the terminal, but it's designed to behave like a normal Unix command-line tool, so the text can also be written to a file using redirection like this:

    spchcat audio/8455-210777-0068.wav > /tmp/transcript.txt

If you then run `cat /tmp/transcript.txt` (or open it in an editor) you should see 'your power is sufficient i said'. You can also pipe the output to another command. Unfortunately you can't currently pipe audio into the tool from another executable.

There is one subtle difference between writing to a file and to the terminal.
The transcription itself can take some time to settle into a final form, especially when waiting for long words to finish, so when it's being run live in a terminal you'll often see the last couple of words change. This isn't useful when writing to a file, so instead the output is finalized before it's written. This can introduce a small delay when writing live microphone or system audio input.

## Build from Source

### Tool

It's possible to build all dependencies from source, but I recommend downloading binary versions of Coqui's STT, TensorFlow Lite, and KenLM libraries from github.com/coqui-ai/STT/releases/download/v1.1.0/native_client.tflite.Linux.tar.xz. Extract this to a folder, and then from inside a folder containing this repo run the following to build the spchcat tool itself:

    make spchcat LINK_PATH_STT=-L../STT_download

You should replace ../STT_download with the path to the Coqui library folder. After this you should see a spchcat executable binary in the repo folder. Because it relies on shared libraries, you'll need to specify a path to these too using LD_LIBRARY_PATH unless you have copies in system folders.

    LD_LIBRARY_PATH=../STT_download ./spchcat

### Models

The previous step only built the executable binary itself, but for the complete tool you also need data files for each language. If you have the gh GitHub command line tool you can run the download_models.py script to fetch Coqui's releases into the build/models folder in your local repo. You can then run your locally-built tool against these models using the --languages_dir option:

    LD_LIBRARY_PATH=../STT_download ./spchcat --languages_dir=build/models/

### Installer

After you have the tool built and the model data downloaded, create_deb_package.sh will attempt to package them into a Debian installer archive.
It will take several minutes to run, and the result ends up in spchcat_0.0-2_amd64.deb.

### Release Process

There's a notebook at notebooks/build.pynb that runs through all the build steps needed to download dependencies and data, build the executable, and create the final package. These steps are run inside an Ubuntu 18.04 Docker image to create the binaries that are released.

    sudo docker run -it -v`pwd`:/spchcat ubuntu:bionic bash

## Contributors

Tool code written by Pete Warden, [email protected], heavily based on Coqui's STT example. It's a pretty thin wrapper on top of Coqui's speech to text library, so the Coqui team should get credit for their amazing work. Also relies on TensorFlow, KenLM, data from Mozilla's Common Voice project, and all the contributors to Coqui's model zoo.

## License

Tool code is licensed under the Mozilla Public License Version 2.0, see LICENSE in this folder.

All other libraries and model data are released under their own licenses, see the relevant folders for more details.