BABEL Speech Corpus

The BABEL speech corpus is a corpus of recorded speech materials from five Central and Eastern European languages. Intended for use in speech technology applications, it was funded by a grant from the European Union and completed in 1998. It is distributed by the European Language Resources Association.

Development of the BABEL Project

Following the SAM project's creation of a speech corpus of European Union languages, the European Union funded a corpus built along similar lines for languages of Central and Eastern Europe, under the name BABEL.
The initial impetus came from the SAM project funded by the European Union as ESPRIT Project #1541 in 1987–89. This project was conducted by an international group of phoneticians, and was applied in the first instance to the European Communities languages Danish, Dutch, English, French, German, and Italian. SAM produced many speech research tools and a corpus of recorded speech material distributed on CD-ROM. A proposal was made to the European Union under the Copernicus initiative in 1994, with the objective of creating a corpus of spoken Bulgarian, Estonian, Hungarian, Polish and Romanian, and Grant #1304 was awarded for this. A pilot project to create a small corpus of spoken Bulgarian was carried out jointly by the Universities of Sofia and Reading. The initial meeting of the whole project team took place at the University of Reading in 1995.

Recorded material

Since the objective was to produce material suitable for use in speech technology applications, the digital recordings were made in strictly controlled conditions in recording studios. For each language the material had the following composition:

Many-talker set: 30 males and 30 females each read 100 numbers and either 3 connected-speech passages plus 5 "filler" sentences, or 4 passages if no fillers were needed.
Few-talker set: 5 males and 5 females, normally selected from the above group, each read 5 blocks of 100 numbers, 15 passages and 25 filler sentences, plus 5 lists of syllables.
Very-few-talker set: 1 male and 1 female selected from the above read 5 blocks of syllables, with and without carrier sentences.

Membership of the BABEL Project

Project Director: P. Roach

Project leaders in Central and Eastern Europe

Bulgaria: initially, A. Misheva until her death in 1995, then S. Dimitrova.
Estonia: E. Meister
Hungary: K. Vicsi
Poland: R. Gubrynowicz and W. Gonet
Romania: M. Boldea

Project members in Western Europe

France: L. Lamel and A. Marchal
Germany: W. Barry and K. Marasek
United Kingdom: J. Wells and P. Roach

Project outcomes

An intermediate project assessment meeting was held in Lublin, Poland, in 1996. Work then continued until a final assessment and presentation of outcomes at the First International Conference on Language Resources and Evaluation in Granada, Spain, in 1998. The project was completed in December 1998. The resulting set of corpora was then supplied to the European Language Resources Association (ELRA), which is exclusively responsible for distributing the material to users via its website.
At the time of its completion, BABEL was the largest high-quality speech database available for research purposes in languages such as Hungarian and Estonian. It has been used for research into topics such as pronunciation modeling and automatic speech recognition. The project was also part of what has been called the most significant recent development in corpus linguistics – the increasing range of languages covered by corpus data, which promises to bring to a wider range of languages the benefits that corpus linguistics has brought to the study of Western European languages.
