Basis Technology


Basis Technology Corp. is a software company specializing in applying artificial intelligence techniques to understanding documents and unstructured data written in different languages. It has headquarters in Cambridge, Massachusetts and offices in San Francisco, Washington, D.C., London, and Tokyo.
The company was founded in 1995 by graduates of the Massachusetts Institute of Technology to apply artificial intelligence techniques to understanding the many different languages that humans use. Its software focuses on finding structure inside text so that algorithms can better understand the meaning of the words. The tools identify different forms of names and phrases. A single name, such as Albert P. Jones, can appear in many different ways: some texts will call him "Al Jones", others "Mr. Jones", and others "Albert Paul Jons". Basis Technology's software can match all of these instances.
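The kind of name matching described above can be illustrated with a minimal sketch. This is not Basis Technology's method or API; the function names, the list of honorifics, and the 0.8 similarity threshold are all assumptions chosen for illustration. The sketch treats the last token as the surname, compares surnames with a fuzzy string ratio from Python's standard difflib module, and treats initials and short forms of given names as compatible.

```python
from difflib import SequenceMatcher

# Hypothetical list of honorifics to ignore; a real system would need far more.
TITLES = {"mr", "mrs", "ms", "dr"}


def tokens(name):
    """Lowercase the name, replace punctuation with spaces, drop honorifics."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return [t for t in cleaned.split() if t not in TITLES]


def similar(a, b, threshold=0.8):
    """Fuzzy comparison that tolerates small misspellings (assumed threshold)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def given_compatible(a, b):
    """An initial or short form is compatible with a longer given name."""
    return a.startswith(b) or b.startswith(a)


def same_person(name_a, name_b):
    """Return True if two name strings plausibly refer to the same person."""
    ta, tb = tokens(name_a), tokens(name_b)
    if not ta or not tb:
        return False
    # Assumption: the last token is the surname.
    if not similar(ta[-1], tb[-1]):
        return False
    # Given names are compared position by position; missing ones are allowed.
    return all(given_compatible(x, y) for x, y in zip(ta[:-1], tb[:-1]))


if __name__ == "__main__":
    reference = "Albert P. Jones"
    for variant in ["Al Jones", "Mr. Jones", "Albert Paul Jons", "Bob Jones"]:
        print(variant, same_person(reference, variant))
```

A sketch like this handles only Latin-script names in a Western name order. Matching names across scripts and naming conventions is a much harder problem than this example suggests.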
Its software enhances parsing tools by classifying the role of each word and providing that metadata to other algorithms. Software from Basis Technology will, for instance, identify the language of an incoming stream of characters and then identify the parts of each sentence, such as the subject or the direct object.
The company is best known for its Rosette Linguistics Platform, which uses natural language processing techniques to improve information retrieval, text mining, search engines and other applications. Major search engines and translators use the tool to create normalized forms of text. Basis Technology software is also used by forensic analysts to search through files for words, tokens, phrases or numbers that may be important to investigators.

Rosette

The Rosette Linguistics Platform consists of a component library for multilingual text retrieval and analysis. Rosette provides automatic language identification, linguistic analysis, entity extraction, and entity translation from unstructured text. It can be integrated into applications to help analyze large volumes of unstructured text.
The Rosette Linguistics Platform is composed of these modules:
The Rosette Platform is used both by United States government offices to support translation and by major Internet infrastructure firms such as search engines.

Digital forensics

Basis Technology develops open-source digital forensics tools, The Sleuth Kit and Autopsy, to help identify and extract clues from data storage devices such as hard disks and flash cards, as well as devices such as smartphones and iPods. The open-source licensing model allows the tools to be used as the foundation for larger projects, such as a Hadoop-based tool for massively parallel forensic analysis of very large data collections.
The digital forensics tool set is used to analyze file systems, new media types, new file types and file system metadata. The tools can search for particular patterns in the files, allowing them to target significant files or usage profiles. They can, for instance, look for common files using hash functions and also deconstruct the data structures of important operating system log files.
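The idea of looking for common files using hash functions can be illustrated with a short sketch. It does not use The Sleuth Kit or Autopsy and does not reflect their interfaces; the function names and the choice of SHA-256 are assumptions made for illustration. Each file under a directory is hashed, and the digest is looked up in a set of hashes of already-known files, so that an analyst can set aside known files and concentrate on the rest.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_known_files(root, known_hashes):
    """Walk a directory tree and separate files whose hash is already known.

    known_hashes is a hypothetical set of hex digests, for example the
    hashes of standard operating system files.
    """
    known, unknown = [], []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            bucket = known if sha256_of(path) in known_hashes else unknown
            bucket.append(path)
    return known, unknown
```

A caller would pass the directory to examine and the set of known digests, then review only the files returned in the second list.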
The tools are designed to be customizable through an open plugin architecture. Basis Technology helps manage a large and diverse community of developers who use the tools in investigations.

Highlight

Highlight is transliteration software designed to help linguists and analysts standardize the names of people and places, allowing them to concentrate on "connecting the dots". Highlight is a plug-in for Microsoft Office Excel and Word. Key features include:
Highlight can: