Indigenous Tweets


Indigenous Tweets is a website that catalogs Twitter messages written in minority and indigenous languages so that speakers of those languages can find and communicate with each other online. It was founded in March 2011 by Kevin Scannell, who researches computational linguistics in the Department of Mathematics and Computer Science at Saint Louis University in St. Louis, Missouri, United States.
The homepage lists the minority languages the website has cataloged. Selecting a language brings up a table of the Twitter users who tweet in that language. The table shows each user's profile picture, number of followers, and the percentage of the user's tweets written in each language. The site also lists trending topics in the various minority languages.
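The per-user language breakdown comes down to simple arithmetic over tweets that have already been labeled by language. The Python sketch below illustrates only that arithmetic and is not the site's actual code. The function name `language_percentages`, the ISO 639-1 style labels, and the sample user are hypothetical.

```python
from collections import Counter


def language_percentages(tweet_languages: list[str]) -> dict[str, float]:
    """Return the share (in percent) of a user's tweets in each language.

    `tweet_languages` holds one language label per tweet (hypothetical
    labels such as "eu" for Basque), assumed to come from an earlier
    language-identification step.
    """
    if not tweet_languages:
        return {}
    counts = Counter(tweet_languages)
    total = len(tweet_languages)
    # most_common() orders languages from most to least used.
    return {
        lang: round(100 * n / total, 1)
        for lang, n in counts.most_common()
    }


# Hypothetical user: 6 Basque tweets, 3 Spanish, 1 English.
labels = ["eu"] * 6 + ["es"] * 3 + ["en"]
print(language_percentages(labels))  # → {'eu': 60.0, 'es': 30.0, 'en': 10.0}
```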

History

When the website launched in March 2011, it cataloged 35 languages. On April 16, 2011, it recorded tweets in 76 minority languages, and by April 26, 2011, it supported 82. The cataloged languages range from the "esoteric" Gamilaraay to the "better-known" Haitian Creole and Basque, which have the most and second-most users on the site, respectively. Welsh ranks third.
Kapampangan, which was ranked seventh in the last week of April 2011, was the first Philippine language supported by the website.

Data mining

Indigenous Tweets locates speakers of minority languages using a data bank of words and phrases in those languages. In an April 2011 interview with BBC News, Scannell said that he had spent eight years building a data bank covering around 500 languages by reviewing blogs, newspapers, and websites.
The website gathers data by querying Twitter's API for these words and phrases. Its search cannot determine a tweet's language from a word that occurs in more than one language, so Scannell avoids the ambiguity by choosing words that are unique to each language.
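The idea of keeping only language-unique words can be illustrated with a small Python sketch. It is not Scannell's code. The toy word lists, the function names `unique_words` and `tag_tweet`, and the rule of picking the language with the most marker-word hits are all hypothetical. They serve only to show why words shared between languages are discarded. The Twitter API query itself is omitted.

```python
import re
from typing import Optional


def unique_words(lexicons: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only the words that appear in exactly one language's list."""
    result = {}
    for lang, words in lexicons.items():
        # Union of every other language's word list.
        others = set().union(
            *(w for other_lang, w in lexicons.items() if other_lang != lang)
        )
        result[lang] = words - others
    return result


def tag_tweet(text: str, markers: dict[str, set[str]]) -> Optional[str]:
    """Guess a tweet's language from the unique marker words it contains.

    Returns the language with the most marker hits, or None when no marker
    word occurs, so the tweet is left unclassified rather than guessed.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = {lang: sum(t in words for t in tokens) for lang, words in markers.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else None


# Toy word lists (hypothetical); "internet" stands in for a shared word.
lexicons = {
    "eu": {"eta", "kaixo", "internet"},   # Basque
    "ht": {"bonjou", "mèsi", "internet"},  # Haitian Creole
}
markers = unique_words(lexicons)  # "internet" is dropped from both lists

print(tag_tweet("Kaixo! Internet eta lagunak", markers))  # → eu
print(tag_tweet("Bonjou, mèsi anpil", markers))           # → ht
print(tag_tweet("internet", markers))                     # → None
```

The last call mirrors the limitation described above: a tweet containing only a word found in several languages gives no evidence about which language it is in.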