Information Retrieval Facility

The Information Retrieval Facility (IRF), founded in 2006 and located in Vienna, Austria, was a research platform for networking and collaboration among professionals in the field of information retrieval. It ceased operations in 2012.
The IRF had members in the following categories:

Researchers in information retrieval or related scientific areas
Industrial/corporate information management professionals
Patent authorities and governmental institutions
Students of one of the above

Scientific Board

Maristella Agosti, Professor
Gerhard Budin, Director
Jamie Callan, Professor
Yves Chiaramella, Professor Emeritus
Kilnam Chon, Professor, Computer Science Department
W. Bruce Croft, Distinguished Professor, University of Massachusetts Amherst
Hamish Cunningham, Research Professor
Norbert Fuhr, Chairman of the Scientific Board, Professor
David Hawking, Science Leader, Project Leader
Noriko Kando, Professor
Arcot Desai Narasimhalu, Associate Dean
John Tait, Chief Scientific Officer of the IRF
Benjamin T'sou, Director
C. J. van Rijsbergen

Scientific goals

Modelling innovative and specialised information retrieval systems for global patent document collections.
Investigating and developing an adequate technical infrastructure that allows interactive experimentation with formal, mathematical retrieval concepts for very large-scale document collections.
Studying the usability of multimodal user interfaces to very large-scale information retrieval systems.
Integrating real users with actual information needs into the research process of modelling information retrieval systems to allow accurate performance evaluation.
Creating different views of patent data depending on the focus of the information need.
Defining standardised methods for benchmarking the information retrieval process in patent document collections.
Handling text and non-text parts of a patent in a coherent manner.
Designing, experimenting with and evaluating search engines able to retrieve structured and semi-structured documents in very large-scale patent collections.
Integrating the temporal dimension of patent documents in retrieval strategies.
Improving effectiveness and precision of patent retrieval, based on ontologies and natural-language understanding techniques.
Refining IR methods that allow unstructured querying by exploiting available structure within the patent documents.
Formally identifying and specifying relevant business information needs in the field of intellectual property information.
Investigating efficient scaling mechanisms for information retrieval taking into account the characteristics of patent data.
Investigating and experimenting with computing architectures for very high-capacity information management.
Establishing an open eScience platform that enables a standardised and easy way of creating and performing IR experiments on a common research infrastructure.
Discovering and investigating novel use cases and business applications deriving from intellectual property information.
Enabling formal information retrieval, natural-language and semantic processing research to grow into the field of applied sciences in a global, industrial context.
Developing and integrating different information access methods.
Researching effective methods for interactive information retrieval.

Semantic supercomputing

Current technologies for extracting concepts from unstructured documents are extremely computationally intensive. To allow interactive experimentation with large, rich text corpora, the IRF has built a high-performance computing environment that incorporates the latest technological advances:

multi-node clusters
highest speed interconnect technology
single system image with large compound memory
fully integrated configurable computing

Combining these HPC features to accelerate text mining is what the IRF calls semantic supercomputing.

The World Patent Corpus

The IRF aims to bring state-of-the-art information retrieval technology to the community of patent information professionals. It expects information retrieval technology to become the focus of information technology very soon. All industry sectors can profit from applying modern and future text mining processes to the special requirements of patent research. Although the ideas and concepts apply to all sorts of intellectual property information, patents require the most sophistication and pose challenging technical and organisational problems.
The entire body of patent-related documents possibly constitutes the largest corpus of compound documents, making it a rewarding target for text mining scientists and end-users alike. Moreover, patents have become a crucial issue, in particular for large global corporations and universities. The industrial users of patent data are among the most demanding and important information professionals. As a consequence, they could benefit the most from technology that eases the burden of searching the large body of patent information.

Research collections

The IRF provides a number of test data collections developed either by the IRF, by one of its members, or by third parties. These data collections can be used freely for scientific experimentation.
The MAtrixware REsearch Collection (MAREC) is the first standardised patent data corpus for research purposes. It consists of 19 million patent documents in different languages, normalised to a highly specific XML format. The collection was developed by Matrixware for the IRF.
The ClueWeb09 collection is a 25-terabyte dataset of about 1 billion web pages crawled in January and February 2009. It was created by the Language Technologies Institute at Carnegie Mellon University to support research on information retrieval and related human language technologies.
