Spoken English Corpus

The Spoken English Corpus (SEC) is a speech corpus used in corpus linguistics. It consists of a collection of recordings of spoken British English compiled during the period 1984-7 through a collaboration, funded by IBM, between the Unit for Computer Research on the English Language at the University of Lancaster and the IBM Scientific Centre in Winchester. The corpus comprises 53 recorded passages, mainly recorded from the BBC, spoken in the accent usually referred to as Received Pronunciation, or RP. It covers categories such as commentary, news broadcast, lecture and dialogue. The corpus contains 52,637 words in a recording time of 339 minutes. The compilation of the corpus is described by Lita Taylor in her 1996 article "The Compilation of the Spoken English Corpus." The whole corpus is available in print from Routledge or Book Depository.

Transcription of the recordings

A system was devised for transcribing the intonation of the material in the recordings, and two transcribers, Gerry Knowles and Briony Williams, analysed the entire corpus. The transcription system is explained by Williams. Brian Pickering conducted an experiment to assess the degree of agreement between the two transcribers on a section of the corpus, containing around 1000 tone-units, that both had transcribed. Good agreement was found.
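The text above does not say which agreement measure was used in the experiment. As an illustration only, the following sketch assumes that each tone-unit receives a single tone label from each transcriber and computes two common measures: raw percentage agreement and Cohen's kappa, which corrects for agreement expected by chance. The tone labels and the data are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of items on which two label sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both transcribers pick the same
    # label if each labels at random according to their own label frequencies.
    labels = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[l] * counts_b[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both transcribers used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tone labels for six tone-units (not real corpus data).
transcriber_1 = ["fall", "rise", "fall", "fall-rise", "level", "fall"]
transcriber_2 = ["fall", "rise", "fall", "rise", "level", "fall"]

print(round(percent_agreement(transcriber_1, transcriber_2), 3))
print(round(cohens_kappa(transcriber_1, transcriber_2), 3))
```

In this toy example the transcribers disagree on one tone-unit out of six, so raw agreement is 5/6 and kappa is 0.75.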

Other analysis

Grammatical tagging of each word was added to the text of the SEC by an automatic process; the fact that this tagging was in machine-readable form made it possible to relate grammatical and prosodic information in the texts. Subsequent work used probabilistic models to develop the grammatical tagging further and to produce automatic parsing techniques.

Machine-Readable Spoken English Corpus (MARSEC)

Although the text and its associated tagging existed in machine-readable form, the recordings themselves existed only as tape recordings. A collaboration, funded by the Economic and Social Research Council in 1992-4, between speech scientists at the Universities of Lancaster and Leeds in the United Kingdom set out to produce a version of the corpus which contained the recordings in digital form, time-linked to the text. The principal researchers were Gerry Knowles, Tamas Varadi, Peter Roach and Simon Arnfield. The outline of the project is set out in Knowles, and the automatic time-alignment is described by Roach and Arnfield. The digitized recordings were stored on CD-ROM; the corpus was subsequently made available for downloading for research purposes from Leeds University, though this facility is no longer supported.

Aix-MARSEC

The work on MARSEC in Lancaster and Leeds finished around 1995, but the corpus has subsequently been the object of a considerable amount of further development at the University of Aix-en-Provence, France, under the direction of Daniel Hirst. The database consists of two major components: the digitized recordings from MARSEC and the annotations. Annotations have so far been undertaken at nine levels, including phonemes, syllables, words, stress feet, rhythm units and minor and major turn units. Two supplementary levels, the grammatical annotation by CLAWS and a Property Grammar system developed at Aix-en-Provence, are to be integrated soon. A possible disadvantage of this treatment is that the corpus can only be searched using specially written scripts. The database, together with tools, is available under GNU GPL licensing at the Aix-MARSEC project site.
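To give an idea of what a specially written search script might involve, the following sketch assumes that each annotation level has already been loaded as a list of labelled time intervals. The Interval type, the function names and the sample data are hypothetical; they do not reproduce the actual Aix-MARSEC file format or its tools.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """One labelled stretch of the recording (times in seconds)."""
    start: float
    end: float
    label: str


def overlapping(tier: List[Interval], start: float, end: float) -> List[Interval]:
    """Return the intervals of a tier that overlap the span [start, end)."""
    return [iv for iv in tier if iv.start < end and iv.end > start]


def group_words_by_unit(
    words: List[Interval], units: List[Interval]
) -> List[Tuple[str, List[str]]]:
    """For each unit on a higher level, list the words that fall inside it."""
    return [
        (unit.label, [w.label for w in overlapping(words, unit.start, unit.end)])
        for unit in units
    ]


# Hypothetical two-level annotation of a short stretch of speech.
word_tier = [
    Interval(0.00, 0.31, "good"),
    Interval(0.31, 0.78, "morning"),
    Interval(0.95, 1.20, "the"),
    Interval(1.20, 1.62, "news"),
]
unit_tier = [
    Interval(0.00, 0.78, "unit1"),
    Interval(0.95, 1.62, "unit2"),
]

for label, unit_words in group_words_by_unit(word_tier, unit_tier):
    print(label, unit_words)
```

Because all levels share the same time axis, a query that relates one level to another (for example, words to the larger units containing them) reduces to comparing interval boundaries.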
