CIEMPIESS-UNAM

CIEMPIESS-UNAM Project News

In this section we share the latest news related to the CIEMPIESS-UNAM Project Read this to find out our new software releases, tutorials, language resources, documentation, publications and so on.

Updated Transcriptions for CIEMPIESS-TEST

Carlos Mena
2023-03-03
We have released an update of the CIEMPIESS-TEST transcripts. Special thanks to Mónica Alejandra Ruíz López who is responsible for this update....

Full article

NEW Corpus Published: Wikipedia Spanish Corpus

Carlos Mena
2021-08-18
Few days ago (August 16th, 2021) the Linguistic Data Consortium (LDC) has released the Wikipedia Spanish Corpus developed by the CIEMPIESS-UNAM Project through the Social Service Students be...

Full article

The first corpus in Maltese for ASR

Carlos Mena
2020-01-22
The MASRI Project at the University of Malta has released today the first dataset in Maltese language for the ASR task.

This corpus can be requested at the MASRI website

Full article

LibriVox Spanish Corpus Published by LDC

Carlos Mena
2020-01-16
Today was published the Librivox Spanish Corpus by the Linguistic Data Consortium. It is a corpus with a size of 73 hours of clean speech taken from the audio-books of the LibriVox Proyect....

Full article

Releasing of the CIEMPIESS-PNPD in the OpenCor2019 Conference

Carlos Mena
2019-10-14
The CIEMPIESS Proper-Names Pronouncing Dictionary (CIEMPIESS-PNPD) is a pronouncing dictionary with more than 200 thousand entries. It was released in October 8th at the OpenCor 2019 Confer...

Full article

Releasing of our PocketSphinx Models by the CMU Sphinx Group

Carlos Mena
2019-08-24
The CMU Sphinx Group has released our PocketSphinx Models in Spanish made out of 581 hour of audio recordings. The download link is the following:

https://sourceforge.net/project...

Full article

Publication of the CIEMPIESS Experimentation Package

Carlos Mena
2019-05-15
CIEMPIESS Experimentation is a set of three different data sets; specifically the CIEMPIESS Complementary, the CIEMPIESS Fem and the CIEMPIESS Test. It was published today at the Linguisti...

Full article

The TEDx Spanish Corpus is released

Carlos Mena
2019-05-10
The TEDx Spanish Corpus is a dataset of 24 hours of duration destined to create acoustic models for automatic speech recognition. It was published at the OPEN SLR Website ( http://www.opensl...

Full article

23th IEEE International Meeting

Carlos Mena
2019-05-10
We presented a paper about the Corpus CHM150 at the ROC&C Conference organized by the IEEE. This event was reported at the "Gaceta DIgital" of the Faculty of Engineering ( época 2, año V, nú...

Full article

portal development

Frederick Álvarez
2014-04-15
With the spread of agencies work done in the laboratory of voice processing to other education institution and thereby grow the software community is provided.

The portal will hav...

Full article