Corpus of Contemporary American English

The Corpus of Contemporary American English (COCA) is a corpus of more than 560 million words of American English. It was created by Mark Davies, Professor of Corpus Linguistics at Brigham Young University.

Content

The corpus is composed of more than 560 million words from 220,225 texts, including 20 million words from each of the years 1990 through 2017. The most recent update was made in December 2017. The corpus is used by tens of thousands of people each month, which may make it the most widely used "structured" corpus currently available.
For each year, the corpus is evenly divided between the following five genres: spoken, fiction, popular magazines, newspapers, and academic journals. The texts come from a variety of sources:

Spoken: Transcripts of unscripted conversation from nearly 150 different TV and radio programs.
Fiction: Short stories and plays, first chapters of books 1990–present, and movie scripts.
Popular magazines: Nearly 100 different magazines, from a range of domains such as news, health, home and gardening, women's, financial, religion, and sports.
Newspapers: Ten newspapers from across the US, with text from different sections of the newspapers, such as local news, opinion, sports, and the financial section.
Academic journals: Nearly 100 different peer-reviewed journals. These were selected to cover the entire range of the Library of Congress Classification system.
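
The composition figures in this section can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming only the round numbers quoted above (20 million words per year from 1990 through 2017, split evenly across five genres). The variable names are illustrative, and the results are nominal targets rather than exact counts.

```python
# Nominal composition of the corpus, using the round figures quoted in this section.
WORDS_PER_YEAR = 20_000_000
FIRST_YEAR, LAST_YEAR = 1990, 2017
GENRES = ["spoken", "fiction", "popular magazines", "newspapers", "academic journals"]

# Both end years are included, hence the +1.
years = LAST_YEAR - FIRST_YEAR + 1
total_words = WORDS_PER_YEAR * years

# "Evenly divided" means each genre gets the same share of each year.
words_per_genre_per_year = WORDS_PER_YEAR // len(GENRES)
words_per_genre = words_per_genre_per_year * years

print(f"{years} years, about {total_words:,} words in total")
print(f"about {words_per_genre_per_year:,} words per genre per year")
print(f"about {words_per_genre:,} words per genre overall")
```

Under these assumptions the corpus covers 28 years, which gives a nominal total of 560 million words and about 4 million words per genre per year.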

Availability

The corpus is free to search through its web interface, with a limit on the number of queries per day, and less-restricted access is available at cost.
The full corpus texts are available for a further fee.

Queries

The interface is the same as the BYU-BNC interface for the 100-million-word British National Corpus, the 100-million-word TIME Magazine corpus, and the 400-million-word Corpus of Historical American English (1810s–2000s)
Queries by word, phrase, alternates, substring, part of speech, lemma, synonyms, and customized lists
The corpus is tagged with CLAWS, the same part-of-speech tagger that was used for the BNC and the TIME corpus
Chart listings and table listings
Full collocates searching
Re-sortable concordances, showing the most common words/strings to the left and right of the searched word
Comparisons between genres or time periods
One-step comparisons of collocates of related words, to study semantic or cultural differences between words
Users can include semantic information from a 60,000-entry thesaurus directly as part of the query syntax
Users can also create their own 'customized' word lists, and then re-use these as part of subsequent queries
Note that, because of copyright restrictions, free access to the corpus is limited to the web interface.
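
To make the ideas of a re-sortable concordance and a collocates search concrete, here is a minimal sketch in Python. It is not how the corpus interface is implemented: the functions `tokenize`, `concordance`, and `collocates`, the sample sentence, and the window size are all hypothetical, chosen only to illustrate the concepts on a toy text.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, window=3):
    """Return (left context, node, right context) triples for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, window=3):
    """Count the words that occur within `window` tokens of `node`."""
    counts = Counter()
    for left, _, right in concordance(tokens, node, window):
        counts.update(left)
        counts.update(right)
    return counts

# Toy text standing in for corpus data (hypothetical example).
sample = "The strong tea was hot. She likes strong coffee and strong tea."
tokens = tokenize(sample)

# "Re-sorting" here means ordering the hits by their right-hand context.
hits = sorted(concordance(tokens, "strong", window=2), key=lambda line: line[2])
for left, node, right in hits:
    print(" ".join(left).rjust(15), node.upper(), " ".join(right))

# Most frequent neighbours of the node word.
print(collocates(tokens, "strong", window=2).most_common(3))
```

Sorting by `line[0]` instead of `line[2]` would order the same hits by their left-hand context, which is the kind of re-sorting a concordance view offers.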

Related

The Corpus of Global Web-Based English (GloWbE) contains about 1.9 billion words of text from twenty different countries. This makes it about 100 times as large as other corpora such as the International Corpus of English, and it allows for many types of searches that would not otherwise be possible. In addition to the online interface, full-text data from the corpus can be downloaded.
GloWbE is unique in that it allows comparisons between different varieties of English. It is related to many other corpora of English.
