CMU Sphinx Configuration File
Robust Group Tutorial

Carnegie Mellon University's Open Source Tutorial: Learning to use the CMU SPHINX Automatic Speech Recognition system.

Introduction

In this tutorial, you will learn to handle a complete state-of-the-art HMM-based speech recognition system. The system you will use is the SPHINX system, designed at Carnegie Mellon University. SPHINX is one of the best and most versatile recognition systems in the world today. An HMM-based system, like all other speech recognition systems, functions by first learning the characteristics (or parameters) of a set of sound units, and then using what it has learned about the units to find the most probable sequence of sound units for a given speech signal. The process of learning about the sound units is called training. The process of using the knowledge acquired to deduce the most probable sequence of units in a given signal is called decoding, or simply recognition.
You can find full examples of Sphinx-4 configuration files in the sources. For example, check the file sphinx4/src/apps/edu/cmu/sphinx/demo/transcriber/config.xml.
Accordingly, you will need those components of the SPHINX system that you can use for training and for recognition. In other words, you will need the SPHINX trainer and a SPHINX decoder. You will be given instructions on how to download, compile, and run the components needed to build a complete speech recognition system, namely SphinxTrain and one of the SPHINX decoders.
Please check the CMUSphinx project page for more details on available decoders and their applications. This tutorial does not instruct you on how to build a language model; the CMUSphinx documentation provides an excellent manual for that. At the end of this tutorial, you will be in a position to train and use this system for your own recognition tasks. More importantly, through your exposure to this system, you will have learned about several important issues involved in using a real HMM-based ASR system.
Important note for members of the Sphinx group: this tutorial uses the publicly available scripts and tools. The internal, csh-based setup is still available, though its use is discouraged.

Components provided for training

The SPHINX trainer consists of a set of programs, each responsible for a well-defined task, and a set of scripts that organizes the order in which the programs are called. You have to compile the code on your platform of choice.
The trainer learns the parameters of the models of the sound units using a set of sample speech signals, called a training database. A choice of training databases will be provided to you. The trainer also needs to be told which sound units you want it to learn the parameters of, and, at the very least, the sequence in which they occur in every speech signal in your training database.
This information is provided to the trainer through a file called the transcript file, in which the sequence of words and non-speech sounds is written exactly as it occurred in a speech signal, followed by a tag which can be used to associate this sequence with the corresponding speech signal. The trainer then looks into a dictionary, which maps every word to a sequence of sound units, to derive the sequence of sound units associated with each signal. Thus, in addition to the speech signals, you will also be given a set of transcripts for the database (in a single file) and two dictionaries: one in which legitimate words in the language are mapped to sequences of sound units (or sub-word units), and another in which non-speech sounds are mapped to corresponding non-speech or speech-like sound units.
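To make this concrete, here is a sketch of what these files typically look like. The word, phones, and utterance ID below are made up for illustration; the exact conventions depend on the database you use:

   <s> HELLO WORLD </s> (utt001)      (transcript file: the utterance, with a tag identifying the signal)

   HELLO   HH AH L OW                 (language dictionary: each word mapped to a sequence of sound units)
   WORLD   W ER L D

   <sil>   SIL                        (filler dictionary: a non-speech event mapped to a filler unit)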
We will refer to the former as the language dictionary and the latter as the filler dictionary. In summary, the components provided to you for training will be:

- The trainer source code
- The acoustic signals
- The corresponding transcript file
- A language dictionary
- A filler dictionary

Components provided for decoding

The decoder also consists of a set of programs, which have been compiled to give a single executable that will perform the recognition task, given the right inputs.
The inputs that need to be given are: the trained acoustic models, a model-index file, a language model, a language dictionary, a filler dictionary, and the set of acoustic signals that need to be recognized. The data to be recognized are commonly referred to as test data. In summary, the components provided to you for decoding will be:

- The decoder source code
- The language dictionary
- The filler dictionary
- The language model
- The test data

In addition to these components, you will need the acoustic models that you have trained for recognition. You will have to provide these to the decoder. While you train the acoustic models, the trainer will generate appropriately named model-index files. A model-index file simply contains numerical identifiers for each state of each HMM, which are used by the trainer and the decoder to access the correct sets of parameters for those HMM states. With any given set of acoustic models, the corresponding model-index file must be used for decoding. If you would like to know more about the structure of the model-index file, a description is available in the SphinxTrain documentation.

Setting up your system

You will have to download and build several components to set up the complete system.
Provided you have all the necessary software, you will have to download the data package, the trainer, and one of the SPHINX decoders. The following instructions detail the steps.

Required software

You will need Perl to run the provided scripts, and a C compiler to compile the source code.

Perl

You will need Perl to use the scripts provided. Linux usually comes with some version of Perl.
If you do not have Perl installed, you can download it for free from the Perl web site. For Windows, a popular version, ActivePerl, is available from ActiveState. If you are using Windows, even if you have cygwin installed, ActivePerl is better at handling the end-of-line character, and it is faster than cygwin's Perl. Additionally, if a package is missing from the distribution, you can easily download and install it using the ppm utility. For example, to install the File::Copy module, all you have to do is:

   ppm install File::Copy

C Compiler

SphinxTrain and SPHINX-3 use GNU autoconf to find out basic information about your system, and should compile on most Unix and Unix-like systems, and certainly on Linux. The code compiles using GNU's make and GNU's C compiler (gcc), available in all Linux distributions and available for free for most platforms. We also provide files supporting compilation using Microsoft's Visual C++, i.e., the solution (.sln) and project (.vcproj) files needed to compile the code in native Windows format.
Word Alignment

You will need a word alignment program if you want to measure the accuracy of a decoder. A commonly used one, available from the National Institute of Standards and Technology (NIST), is sclite, provided as part of their scoring packages, which you can find on the NIST web site. The software is available for those in the speech group at robust/archive/thirdpartypackages/NISTscoringtools/sctk/linux/bin/sclite. Internally, at CMU, you may also want to use the align program, which does the same job as the NIST program but lacks some of its features. You can find it in the robust home directory at robust/archive/thirdpartypackages/align/linux/align.

Setting up the data

The Sphinx Group makes available two audio databases that can be used with this tutorial.
Each has its peculiarities, and both are provided purely as a convenience. The data provided are not sufficient to build a high-performance speech recognition system; they are only intended to help you learn how to use the system. The databases are available from the download page. Choose either:

- AN4 includes the audio, but it is a very small database. Choose it if you want to include the creation of feature files in your experiments.
- RM1 is a little larger, resulting in a system with slightly better performance. Audio is not provided, since it is licensed material; we provide the feature files used directly by the trainer and decoders.
For more information about RM1, please check with the Linguistic Data Consortium (LDC). The steps involved:

1. Create a directory for the system, and move to that directory:

   mkdir tutorial
   cd tutorial

2. Download the audio tarball for the database you chose (an4sphere.tar.gz or rm1cepstra.tar.gz) by clicking on the link and choosing 'Save' when the dialog window appears. Save it to the same tutorial directory you just created. For those not familiar with the term, a tarball in our context is a file with extension .tar.gz.

3. Extract the contents as follows.
In Windows, using the Windows Explorer, go to the tutorial directory, right-click the audio tarball, and choose 'Extract to here' in the WinZip menu. In Linux/Unix:

   # If you are using AN4
   gunzip -c an4sphere.tar.gz | tar xf -

   # If you are using RM1
   gunzip -c rm1cepstra.tar.gz | tar xf -

By the time you finish this, you will have a tutorial directory with the following contents:

   tutorial/
       an4/
       an4sphere.tar.gz

or:

   tutorial/
       rm1/
       rm1cepstra.tar.gz

Setting up the trainer

Code retrieval

SphinxTrain can be retrieved using Subversion (svn) or by downloading a tarball. Svn makes it easier to update the code as new changes are added to the repository, but requires you to install svn. The tarball is more readily available.
You can find more information about svn at the Subversion project web site.
Building an application with sphinx4

Caution! This tutorial uses the sphinx4 API from the 5prealpha release.
The API described here is not supported in earlier versions.

Overview

Sphinx4 is a pure Java speech recognition library. It provides a quick and easy API to convert speech recordings into text with the help of CMUSphinx acoustic models. It can be used on servers and in desktop applications. Besides speech recognition, Sphinx4 helps to identify speakers, to adapt models, to align an existing transcription to audio for timestamping, and more. Sphinx4 supports US English and many other languages.
Using sphinx4 in your projects

As with any Java library, all you need to do to use sphinx4 is add the jars to the dependencies of your project, and then you can write code using the API. The easiest way to use sphinx4 is with a modern build tool like Apache Maven or Gradle. Sphinx-4 is available as a maven package in the Sonatype OSS repository. In Gradle you need lines like the following in build.gradle (the snapshot version implies you also need a snapshot repository such as Sonatype OSS):

   repositories {
       mavenLocal()
       maven { url "https://oss.sonatype.org/content/repositories/snapshots" }
   }

   dependencies {
       compile group: "edu.cmu.sphinx", name: "sphinx4-core", version: "5prealpha-SNAPSHOT"
       compile group: "edu.cmu.sphinx", name: "sphinx4-data", version: "5prealpha-SNAPSHOT"
   }

Many IDEs like Eclipse, NetBeans, or IntelliJ IDEA have support for Gradle, either through plugins or with built-in features. In that case you can just include the sphinx4 libraries in your project with the help of your IDE.
Please check the relevant part of your IDE documentation. You can also use Sphinx4 in a non-maven project. In this case you need to download the jars manually. You might also need to download the dependencies (which we try to keep small) and include them in your project. You need the sphinx4-core jar, and the sphinx4-data jar if you are going to use the US English acoustic model.

Basic Usage

To quickly start with sphinx4, create a Java project as described above, add the required dependencies, and type the following simple code (the model paths shown assume the default US English models shipped in the sphinx4-data jar):

   Configuration configuration = new Configuration();

   // Point the recognizer at the default US English models bundled in sphinx4-data.
   configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
   configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
   configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

   LiveSpeechRecognizer recognizer = new LiveSpeechRecognizer(configuration);
   // Start recognition process, pruning previously cached data.
   recognizer.startRecognition(true);
   SpeechResult result = recognizer.getResult();
   // Pause recognition process. It can be resumed later with startRecognition(false).
   recognizer.stopRecognition();

StreamSpeechRecognizer

The StreamSpeechRecognizer uses an InputStream as the speech source. You can pass the data from a file, a network socket, or an existing byte array.
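For example, a minimal sketch of decoding a file with StreamSpeechRecognizer might look like the following. The file name speech.wav is a placeholder; the stream must contain audio in a format matching the acoustic model, typically 16 kHz, 16-bit mono PCM:

   import java.io.File;
   import java.io.FileInputStream;
   import java.io.InputStream;

   import edu.cmu.sphinx.api.Configuration;
   import edu.cmu.sphinx.api.SpeechResult;
   import edu.cmu.sphinx.api.StreamSpeechRecognizer;

   public class TranscriberDemo {
       public static void main(String[] args) throws Exception {
           Configuration configuration = new Configuration();
           configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
           configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
           configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

           StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
           InputStream stream = new FileInputStream(new File("speech.wav"));

           // Feed the stream to the recognizer and print each recognized utterance.
           recognizer.startRecognition(stream);
           SpeechResult result;
           while ((result = recognizer.getResult()) != null) {
               System.out.format("Hypothesis: %s%n", result.getHypothesis());
           }
           recognizer.stopRecognition();
       }
   }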