LRE Map

The LRE Map is a large, freely accessible database of resources dedicated to natural language processing. Its distinctive feature is that the records are collected during the submission process of several major natural language processing conferences. The records are then cleaned and gathered into a global database called the "LRE Map".
The LRE Map is intended to be an instrument for collecting information about language resources and to become, at the same time, a community for users, a place to share and discover resources, discuss opinions, provide feedback, discover new trends, etc. It is an instrument for discovering, searching and documenting language resources, here intended in a broad sense, as both data and tools.
The large amount of information contained in the Map can be analyzed in many different ways. For instance, the LRE Map can provide information about the most frequent type of resource, the most represented language, the applications for which resources are used or are being developed, the proportion of new resources vs. already existing ones, or the way in which resources are distributed to the community.

Context

Several institutions worldwide maintain catalogues of language resources.
However, it has been estimated that only 10% of existing resources are known, either through distribution catalogues or via direct publicity by providers. The rest remain hidden and surface only briefly, when a resource is presented in a research paper or report at a conference. Even then, a resource may stay in the background simply because the focus of the research is not on the resource per se.

History

The LRE Map originated under the name "LREC Map" during the preparation of the LREC 2010 conference. More specifically, the idea was discussed within the FLaReNet project, and in collaboration with and the , the Map was put in place at LREC 2010. The LREC organizers asked authors to provide some basic information about all the resources, whether used or created, described in their papers. All these descriptors were then gathered in a global matrix called the LREC Map.
The same methodology and requirements for authors were then applied and extended to other conferences, namely COLING-2010, EMNLP-2010, RANLP-2011, LREC 2012, LREC 2014 and LREC 2016.
After this generalization to other conferences, the LREC Map was renamed the LRE Map.

Size and content

The size of the database increases over time. The data collected amount to 4776 entries.
Each resource is described according to the following attributes:

Resource type, e.g. lexicon, annotation tool, tagger/parser.
Resource production status, e.g. newly created finished, existing-updated.
Resource availability, e.g. freely available, from data center.
Resource modality, e.g. speech, written, sign language.
Resource use, e.g. named entity recognition, language identification, machine translation.
Resource language, e.g. English, 23 European Union languages, official languages of India.
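
The six attributes above can be read as the fields of a simple record. The following is a minimal sketch, assuming a hypothetical Python representation: the class name, field names, the extra "name" field and the sample records are all invented for illustration and are not the actual LRE Map schema or data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ResourceRecord:
    """One entry of the Map; field names are illustrative assumptions."""
    name: str                    # hypothetical identifier, not a listed attribute
    resource_type: str           # e.g. "lexicon", "annotation tool"
    production_status: str       # e.g. "existing-updated"
    availability: str            # e.g. "freely available"
    modality: str                # e.g. "speech", "written"
    use: str                     # e.g. "machine translation"
    languages: Tuple[str, ...]   # e.g. ("English",)


def most_common_value(records, attribute):
    """Return (value, count) for the most frequent value of one attribute."""
    counts = Counter(getattr(record, attribute) for record in records)
    return counts.most_common(1)[0]


# Invented sample records, for demonstration only.
SAMPLE_RECORDS = [
    ResourceRecord("toy-lexicon-a", "lexicon", "newly created finished",
                   "freely available", "written", "machine translation",
                   ("English",)),
    ResourceRecord("toy-tagger", "tagger/parser", "existing-updated",
                   "from data center", "written", "named entity recognition",
                   ("French",)),
    ResourceRecord("toy-lexicon-b", "lexicon", "existing-updated",
                   "freely available", "speech", "language identification",
                   ("English", "German")),
]
```

With these sample records, most_common_value(SAMPLE_RECORDS, "resource_type") returns ("lexicon", 2), which is the kind of "most frequent type of resource" question the Map is meant to answer.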

Uses

The LRE Map is an important tool for charting the NLP field. Unlike other studies based on subjective scoring, the LRE Map is built from factual descriptions supplied by the authors themselves.
Beyond being an information-gathering tool, the Map has potential for many uses:

It is a great instrument for monitoring the evolution of the field, if applied in different contexts and times.
It can be seen as a huge joint effort, the beginning of an even larger cooperative action not just among a few leaders but among all researchers.
It is also an "educational" means towards the broad acknowledgment of the need for meta-research activities with the active involvement of many.
It is also instrumental in introducing the new notion of "citation of resources" that could provide an award and a means of scholarly recognition for researchers engaged in resource creation.
It is used to help organize conferences in the field, such as LREC.

Derived matrices

The collected data were cleaned and sorted by Joseph Mariani and Gil Francopoulo in order to compute the various matrices of the final FLaReNet reports. One of them, the matrix for written data at LREC 2010, is as follows:

Language	Corpus	Lexicon	Ontology	Grammar/Language Model	Terminology
Bulgarian	7	6	1	1	1
Czech	12	7	2	1	1
Danish	6	2	0	2	0
Dutch	17	8	2	1	2
English	206	77	18	11	10
Estonian	3	1	0	0	1
Finnish	3	2	0	1	0
French	44	24	3	4	5
German	43	15	4	2	3
Greek	10	3	2	0	0
Hungarian	8	4	0	1	1
Irish	1	0	0	0	0
Italian	32	16	4	2	0
Latvian	9	0	0	0	1
Lithuanian	4	0	2	0	1
Maltese	1	0	0	1	0
Polish	7	2	1	2	1
Portuguese	19	6	1	1	0
Romanian	12	7	1	1	0
Slovak	2	0	0	1	0
Slovene	5	1	0	0	0
Spanish	29	19	4	5	2
Swedish	19	4	0	1	0
Other Europe	19	11	3	3	2
Regional Europe	18	8	0	1	3
Multilingual	5	3	1	0	1
Language independent	9	3	16	2	1
Non applicable	2	0	2	1	0
Total	552	229	67	45	36

English is by far the most represented language, followed by French and German, and then by Italian and Spanish.
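
The ranking can be checked directly from the matrix. The following is a minimal sketch that copies five rows from the table above and sums each row across the resource types; the "Language" header label and the function names are assumptions added here for parsing, not part of the original reports.

```python
import csv
import io

# Five rows copied from the LREC 2010 written-data matrix (tab-separated).
# The "Language" header label is an assumption added so the rows can be parsed.
TABLE = (
    "Language\tCorpus\tLexicon\tOntology\tGrammar/Language Model\tTerminology\n"
    "English\t206\t77\t18\t11\t10\n"
    "French\t44\t24\t3\t4\t5\n"
    "German\t43\t15\t4\t2\t3\n"
    "Italian\t32\t16\t4\t2\t0\n"
    "Spanish\t29\t19\t4\t5\t2\n"
)


def row_totals(tsv_text):
    """Map each language to the sum of its counts over all resource types."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    totals = {}
    for row in reader:
        language = row.pop("Language")
        totals[language] = sum(int(count) for count in row.values())
    return totals


def rank_languages(tsv_text):
    """Return (language, total) pairs, most represented language first."""
    return sorted(row_totals(tsv_text).items(),
                  key=lambda item: item[1], reverse=True)


RANKING = rank_languages(TABLE)
```

RANKING evaluates to English (322), French (80), German (67), Spanish (59), Italian (54). Summed over all five resource types, Spanish comes slightly ahead of Italian, whereas in the Corpus column alone Italian (32) is ahead of Spanish (29).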

Future

The LRE Map has been extended to the Language Resources and Evaluation journal and to other conferences.
