COMPARISON TABLE OF DIFFERENT CORPORA TAKEN FROM THE LDC



Search criteria used on the LDC website:

Language(s): Mandarin

DCMI Type(s): Sound

Application(s): speech recognition



No. | LDC Catalog No. | Item Name | Author(s) | Release Date | Member Year(s) | DCMI Type(s) | Sample Type | Sample Rate (Hz) | Data Source(s) | Application(s) | Language(s)
1 | LDC2001S91 | 1997 HUB4 Broadcast News Evaluation Non-English Test Material | Jonathan Fiscus, John Garofolo, Mark Przybocki, William Fisher, David Pallett | Not Specified | 2001 | Sound | Not Specified | Not Specified | broadcast news | speech recognition | Spanish, Mandarin Chinese
2 | LDC2001S93 | TDT2 Mandarin Audio Corpus | David Graff | Not Specified | 2001 | Sound | Not Specified | Not Specified | broadcast news | topic detection and tracking, speech recognition | Mandarin Chinese
3 | LDC2001S95 | TDT3 Mandarin Audio | David Graff | Not Specified | 2001 | Sound | 1-channel pcm | 16000 | broadcast news | topic detection and tracking, speech recognition | Mandarin Chinese
4 | LDC2002S12 | 2001 HUB5 Mandarin Evaluation | David Graff, Alvin Martin, David Miller, Mark Przybocki, Kevin Walker | April 30, 2002 | 2002 | Sound | 2-channel ulaw | 8000 | telephone conversations | speech recognition | Mandarin Chinese
5 | LDC2012S01 | 2006 NIST Speaker Recognition Evaluation Test Set Part 2 | NIST Multimodal Information Group | January 19, 2012 | 2012 | Sound | ulaw | 8000 | telephone speech, microphone speech | speech recognition | Yue Chinese, Urdu, Thai, Spanish, Russian, Korean, Hindi, Persian, English, Mandarin Chinese, Bengali, Standard Arabic, Dari, Iranian Persian, Chinese, Arabic
6 | LDC2011S05 | 2008 NIST Speaker Recognition Evaluation Training Set Part 1 | NIST Multimodal Information Group | August 15, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech, microphone speech | speech recognition | Yue Chinese, Wu Chinese, Vietnamese, Uzbek, Urdu, Tigrinya, Thai, Tagalog, Spanish, Russian, Panjabi, Min Nan Chinese, Lao, Korean, Central Khmer, Georgian, Japanese, Italian, Hindi, Persian, English, Mandarin Chinese, Bengali, Egyptian Arabic, Moroccan Arabic, Northern Khmer, Dari, Iranian Persian, Chinese, Arabic
7 | LDC94S17 | OGI Multilanguage Corpus | Ronald Cole, Yeshwant Muthusamy | Not Specified | 1994 | Sound | 1-channel pcm compressed | 8000 | telephone speech | speech recognition | Vietnamese, Tamil, Korean, Japanese, Hindi, French, English, German, Spanish, Mandarin Chinese, Persian, Dari, Iranian Persian
8 | LDC96S34 | CALLHOME Mandarin Chinese Speech | Alexandra Canavan, George Zipperlen | Not Specified | 1996, 1997 | Sound | 2-channel ulaw | 8000 | telephone conversations | speech recognition | Mandarin Chinese
9 | LDC98S69 | HUB5 Mandarin Telephone Speech Corpus | Not Specified | Not Specified | 1998 | Sound | 2-channel ulaw | 8000 | telephone conversations | speech recognition | Mandarin Chinese
10 | LDC98S72 | Taiwanese Putonghua Speech and Transcripts | San Duanmu, Gregory Wakefield, Yi-ping Hsu, Shan-ping Qui, Guevara Rowena Cristina | Not Specified | 1998 | Sound | 1-channel pcm | 16000 | microphone speech | speech recognition | Mandarin Chinese
11 | LDC98S73 | 1997 Mandarin Broadcast News Speech (HUB4-NE) | Shudong Huang, Jing Liu, Xuling Wu, Lei Wu, Yongmin Yan, Zhoakai Qin | Not Specified | 1998 | Sound | 1-channel pcm | 16000 | broadcast news | speech recognition | Mandarin Chinese
12 | LDC2007S09 | Mandarin Affective Speech | Yingchun Yang, Zhaohui Wu, Tian Wu, Dongdong Li | July 17, 2007 | 2007 | Sound | pcm | 22050 | microphone speech | prosody, pronunciation modeling, speech recognition | Mandarin Chinese
13 | LDC2006S31 | 2003 NIST Language Recognition Evaluation | Alvin Martin, Mark Przybocki | June 15, 2006 | 2006 | Sound | ulaw | 8000 | telephone conversations | speech recognition | Vietnamese, Tamil, Spanish, Iranian Persian, Korean, Japanese, Hindi, French, English, German, Mandarin Chinese, Egyptian Arabic
14 | LDC2008S05 | 2005 NIST Language Recognition Evaluation | Audrey Le, Alvin Martin, Hannah Hadfield, Jacques de Villiers, John-Paul Hosom, Jan van Santen | June 16, 2008 | 2008 | Sound | ulaw | 8000 | telephone conversations | speech recognition, language identification | Tamil, Korean, Japanese, Hindi, English, Spanish, Mandarin Chinese
15 | LDC2011S01 | 2005 NIST Speaker Recognition Evaluation Training Data | NIST Multimodal Information Group | May 24, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech | speech recognition | Spanish, Russian, English, Mandarin Chinese, Arabic
16 | LDC2011S04 | 2005 NIST Speaker Recognition Evaluation Test Data | NIST Multimodal Information Group | July 15, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech | speech recognition | Spanish, Russian, English, Mandarin Chinese, Arabic
17 | LDC2011S07 | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | NIST Multimodal Information Group | September 15, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech, microphone speech | speech recognition | Yue Chinese, Wu Chinese, Vietnamese, Uzbek, Urdu, Tigrinya, Thai, Tagalog, Spanish, Russian, Panjabi, Min Nan Chinese, Lao, Korean, Central Khmer, Georgian, Japanese, Italian, Hindi, Persian, English, Mandarin Chinese, Bengali, Egyptian Arabic, Moroccan Arabic, Northern Khmer, Dari, Iranian Persian, Chinese, Arabic
18 | LDC2011S08 | 2008 NIST Speaker Recognition Evaluation Test Set | NIST Multimodal Information Group | October 21, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech, microphone speech | speech recognition | Yue Chinese, Wu Chinese, Vietnamese, Uzbek, Urdu, Thai, Tagalog, Tamil, Russian, Panjabi, Min Nan Chinese, Lao, Korean, Japanese, Italian, Hindi, Persian, Mandarin Chinese, Bengali, Egyptian Arabic, Moroccan Arabic, Dari, Iranian Persian, English, Chinese, Arabic
19 | LDC2011S09 | 2006 NIST Speaker Recognition Evaluation Training Set | NIST Multimodal Information Group | November 16, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech | speech recognition | Yue Chinese, Urdu, Thai, Russian, Korean, Hindi, English, Mandarin Chinese, Bengali, Standard Arabic, Chinese, Arabic
20 | LDC2011S10 | 2006 NIST Speaker Recognition Evaluation Test Set Part 1 | NIST Multimodal Information Group | December 15, 2011 | 2011 | Sound | ulaw | 8000 | telephone speech, microphone speech | speech recognition | Yue Chinese, Urdu, Thai, Spanish, Russian, Korean, Hindi, Persian, English, Mandarin Chinese, Bengali, Standard Arabic, Dari, Iranian Persian, Chinese, Arabic
21 | LDC2013S04 | GALE Phase 2 Chinese Broadcast Conversation Speech | Kevin Walker, Christopher Caruso, Kazuaki Maeda, Denise DiPersio, Stephanie Strassel | April 15, 2013 | 2013 | Sound | pcm | 16000 | broadcast conversation | speech recognition | Mandarin Chinese, Chinese
22 | LDC2013S08 | GALE Phase 2 Chinese Broadcast News Speech | Kevin Walker, Christopher Caruso, Kazuaki Maeda, Denise DiPersio, Stephanie Strassel | October 16, 2013 | 2013 | Sound | pcm | 16000 | broadcast news | speech recognition | Mandarin Chinese, Chinese
23 | LDC2014S08 | United Nations Proceedings Speech | Kevin Chay, Cecilia Elizalde, Michal Ziemski | October 15, 2014 | 2014 | Sound | flac | 22050 | microphone speech | speech recognition, language identification | English, Mandarin Chinese, Standard Arabic, French, Russian, Spanish
24 | LDC2014S09 | GALE Phase 3 Chinese Broadcast Conversation Speech Part 1 | Kevin Walker, Christopher Caruso, Kazuaki Maeda, Denise DiPersio, Stephanie Strassel | December 15, 2014 | 2014 | Sound | pcm | 16000 | broadcast conversation | speech recognition | Mandarin Chinese, Chinese