CIEMPIESS-UNAM Project News
In this section we share the latest news related to the CIEMPIESS-UNAM Project Read this to find out our new software releases, tutorials, language resources, documentation, publications and so on.
Updated Transcriptions for CIEMPIESS-TEST
Carlos Mena
2023-03-03
We have released an update of the CIEMPIESS-TEST transcripts.
Special thanks to Mónica Alejandra Ruíz López who is responsible for this update....
NEW Corpus Published: Wikipedia Spanish Corpus
Carlos Mena
2021-08-18
Few days ago (August 16th, 2021) the Linguistic Data Consortium (LDC) has released the Wikipedia Spanish Corpus developed by the CIEMPIESS-UNAM Project through the Social Service Students be...
The first corpus in Maltese for ASR
Carlos Mena
2020-01-22
The MASRI Project at the University of Malta has released today the first dataset in Maltese language for the ASR task.
This corpus can be requested at the MASRI website
LibriVox Spanish Corpus Published by LDC
Carlos Mena
2020-01-16
Today was published the Librivox Spanish Corpus by the Linguistic Data Consortium. It is a corpus with a size of 73 hours of clean speech taken from the audio-books of the LibriVox Proyect....
Releasing of the CIEMPIESS-PNPD in the OpenCor2019 Conference
Carlos Mena
2019-10-14
The CIEMPIESS Proper-Names Pronouncing Dictionary (CIEMPIESS-PNPD) is a pronouncing dictionary with more than 200 thousand entries.
It was released in October 8th at the OpenCor 2019 Confer...
Releasing of our PocketSphinx Models by the CMU Sphinx Group
Carlos Mena
2019-08-24
The CMU Sphinx Group has released our PocketSphinx Models in Spanish made out of 581 hour of audio recordings. The download link is the following:
https://sourceforge.net/project...
Publication of the CIEMPIESS Experimentation Package
Carlos Mena
2019-05-15
CIEMPIESS Experimentation is a set of three different data sets; specifically the CIEMPIESS Complementary, the CIEMPIESS Fem and the CIEMPIESS Test.
It was published today at the Linguisti...
The TEDx Spanish Corpus is released
Carlos Mena
2019-05-10
The TEDx Spanish Corpus is a dataset of 24 hours of duration destined to create acoustic models for automatic speech recognition. It was published at the OPEN SLR Website ( http://www.opensl...
23th IEEE International Meeting
Carlos Mena
2019-05-10
We presented a paper about the Corpus CHM150 at the ROC&C Conference organized by the IEEE. This event was reported at the "Gaceta DIgital" of the Faculty of Engineering ( época 2, año V, nú...
portal development
Frederick Álvarez
2014-04-15
With the spread of agencies work done in the laboratory of voice processing to other education institution and thereby grow the software community is provided.
The portal will hav...