Download corpus

Corpus CIEMPIESS

The CIEMPIESS Corpus was designed to create acoustic models for automatic speech recognition. It consists of 17 hours of radio programs featuring spontaneous speech between the radio moderator and guests. The entire corpus was taken from Radio-IUS (UNAM). It includes text transcriptions and the files needed to run experiments with the CMU-Sphinx recognition system.

CIEMPIESS_Statistics

README.txt file

Click Here For More Information

How to cite?
License: Creative Commons License
CIEMPIESS Corpus by Carlos Daniel Hernandez Mena is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://odin.fi-b.unam.mx/CIEMPIESS-UNAM/.

Download CIEMPIESS

HTK2SPHINX-CONVERTER

HTK2SPHINX-CONVERTER is a software tool written in Python 2.7 that lets the user run the HTK speech recognition system in almost the same way as the CMU-SPHINX 3 speech recognition system, using the same input files.

HTK2SPHINX-CONVERTER can also perform "live decoding" using the speech recognition system Julius.

The two main differences between HTK2SPHINX-CONVERTER and CMU-SPHINX 3 are that the former is a grammar-based, speaker-dependent recognition system, whereas the latter can use a language model and can be speaker independent.

Click Here For More Information

© Copyright 2014 Carlos Daniel Hernandez Mena

Download HTK2SPHINX-CONVERTER

Fonetica2 Library v2

The fonetica2 library contains functions to perform phonetic and phonological transcriptions of Spanish words.

© Copyright 2014 Carlos Daniel Hernandez Mena

Download Fonetica2 Library v2

HTK-BENCHMARK

HTK-BENCHMARK is a software tool written in Python 2.7 that lets the user run the HTK speech recognition system in almost the same way as the CMU-SPHINX 3 speech recognition system, using the same input files.

HTK-BENCHMARK does not perform "live decoding".

HTK-BENCHMARK is based on recognition using a 3-gram language model in ARPA format compatible with SPHINX3.
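
For readers unfamiliar with the ARPA format, the sketch below shows one way to check that a language model file really is a 3-gram model by reading the n-gram counts declared in its `\data\` header. It is an illustration only and is not part of HTK-BENCHMARK: the function name `read_arpa_counts` and the toy header in `SAMPLE` are hypothetical, and the counts are made up.

```python
import re

def read_arpa_counts(lines):
    # Collect {n-gram order: declared count} from the \data\ header
    # of an ARPA-format language model.
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (e.g. \1-grams:) ends the header.
                break
    return counts

# Hypothetical toy header; a real model would list its own counts.
SAMPLE = """\\data\\
ngram 1=4
ngram 2=3
ngram 3=2

\\1-grams:
"""

if __name__ == "__main__":
    counts = read_arpa_counts(SAMPLE.splitlines())
    print("highest n-gram order:", max(counts))
```

To inspect a real file, the same function can be given an open file object instead of `SAMPLE.splitlines()`.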


© Copyright 2015 Carlos Daniel Hernandez Mena

Download HTK-BENCHMARK

Fonetica3 Library

The fonetica3 library contains functions to perform phonetic and phonological transcriptions of Spanish words.

© Copyright 2017 Carlos Daniel Hernandez Mena

Download Fonetica3 Library

CORPUS CHM150

CHM150 is a corpus of Mexican Spanish microphone speech recorded from 75 male and 75 female speakers in a "quiet office" noise environment, with a total duration of 1.63 hours.

Speakers were encouraged either to answer one of several preselected open questions or to describe a painting shown to them on a computer monitor. As a result, the speech is completely spontaneous, which is reflected in the transcription file: it captures disfluencies and mispronunciations orthographically.

The CHM150 corpus contains a total of 2663 utterances classified by speaker, and a small vocabulary of 1898 unique words. For these reasons, CHM150 may be too small for speech recognition, but it is well suited to spoken term detection and forensic speaker identification.
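
To give a rough sense of scale, the figures quoted above can be turned into per-utterance and per-speaker averages. This is a back-of-the-envelope sketch only: the helper `corpus_averages` is hypothetical, and it assumes the 1.63 hours are spread evenly, which the real recordings need not be.

```python
# Figures taken from the corpus description above.
TOTAL_HOURS = 1.63
UTTERANCES = 2663
SPEAKERS = 75 + 75

def corpus_averages(total_hours, utterances, speakers):
    # Derive simple averages from the corpus totals.
    total_seconds = total_hours * 3600
    return {
        "seconds_per_utterance": total_seconds / utterances,
        "utterances_per_speaker": utterances / speakers,
        "seconds_per_speaker": total_seconds / speakers,
    }

if __name__ == "__main__":
    averages = corpus_averages(TOTAL_HOURS, UTTERANCES, SPEAKERS)
    for name, value in averages.items():
        print(f"{name}: {value:.2f}")
```

Under that assumption, each utterance lasts a little over 2 seconds on average and each speaker contributes about 18 utterances, or well under a minute of speech.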

You can download it from the Linguistic Data Consortium (LDC) website. You only need to create an account; then you can request the corpus by email.

Download from LDC website

CORPUS CIEMPIESS LIGHT

The CIEMPIESS LIGHT Corpus is an enhanced version of the CIEMPIESS Corpus (LDC item LDC2015S07).

CIEMPIESS LIGHT is "light" because it omits many of the files included in the first version of CIEMPIESS, and it is "enhanced" because it has many improvements, some of them suggested by our community of users, that make this version more convenient for newer speech recognition engines such as Kaldi (http://kaldi-asr.org/).

You can download it from the Linguistic Data Consortium (LDC) website. You only need to create an account; then you can request the corpus by email.

Download from LDC website

Download Zone

In this section you can download the tools and language resources developed by the CIEMPIESS-UNAM Project. All of our content is released under international licenses that make it free of charge to the public, so you can modify, distribute, and adapt our creations to your particular needs at no cost. If you find bugs in our software, please tell us; if you improve it, please share it!
If you download our tools for academic use, please cite us. It helps us a lot!