Tag (metadata)


In information systems, a tag is a keyword or term assigned to a piece of information. This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are generally chosen informally and personally by the item's creator or by its viewer, depending on the system, although they may also be chosen from a controlled vocabulary.
Tagging was popularized by websites associated with Web 2.0 and is an important feature of many Web 2.0 services. It is now also part of other database systems, desktop applications, and operating systems.

Overview

People use tags to aid classification, mark ownership, note boundaries, and indicate online identity. Tags may take the form of words, images, or other identifying marks. An analogous example of tags in the physical world is museum object tagging. People were using textual keywords to classify information and objects long before computers. Computer based search algorithms made the use of such keywords a rapid way of exploring records.
Tagging gained popularity due to the growth of social bookmarking, image sharing, and social networking websites. These sites allow users to create and manage labels that categorize content using simple keywords. Websites that include tags often display collections of tags as tag clouds, as do some desktop applications. On websites that aggregate the tags of all users, an individual user's tags can be useful both to them and to the larger community of the website's users.
Tagging systems have sometimes been classified into two kinds: top-down and bottom-up. Top-down taxonomies are created by an authorized group of designers, whereas bottom-up taxonomies are created by all users. This definition of "top down" and "bottom up" should not be confused with the distinction between a single hierarchical tree structure versus multiple non-hierarchical sets ; the structure of both top-down and bottom-up taxonomies may be either hierarchical, non-hierarchical, or a combination of both. Some researchers and applications have experimented with combining hierarchical and non-hierarchical tagging to aid in information retrieval. Others are combining top-down and bottom-up tagging, including in some large library catalogs such as WorldCat.
When tags or other taxonomies have further properties such as relationships and attributes, they constitute an ontology.
Metadata tags as described in this article should not be confused with the use of the word "tag" in some software to refer to an automatically generated cross-reference; examples of the latter are tags tables in Emacs and smart tags in Microsoft Office.

History

The use of keywords as part of an identification and classification system long predates computers. Paper data storage devices, notably edge-notched cards, that permitted classification and sorting by multiple criteria were already in use prior to the twentieth century, and faceted classification has been used by libraries since the 1930s.
In the late 1970s and early 1980s, the Unix text editor Emacs offered a companion software program called Tags that could automatically build a table of cross-references called a tags table that Emacs could use to jump between a function call and that function's definition. This use of the word "tag" did not refer to metadata tags, but was an early use of the word "tag" in software to refer to a word index.
Online databases and early websites deployed keyword tags as a way for publishers to help users find content. In the early days of the World Wide Web, the keywords meta element was used by web designers to tell web search engines what the web page was about, but these keywords were only visible in a web page's source code and were not modifiable by users.
In 1997, the collaborative portal "A Description of the Equator and Some ØtherLands" produced by documenta X, Germany, used the folksonomic term Tag for its co-authors and guest authors on its Upload page. In "The Equator" the term Tag for user-input was described as an abstract literal or keyword to aid the user. However, users defined singular Tags, and did not share Tags at that point.
In 2003, the social bookmarking website Delicious provided a way for its users to add "tags" to their bookmarks ; Delicious also provided browseable aggregated views of the bookmarks of all users featuring a particular tag. Within a couple of years, the photo sharing website Flickr allowed its users to add their own text tags to each of their pictures, constructing flexible and easy metadata that made the pictures highly searchable. The success of Flickr and the influence of Delicious popularized the concept, and other social software websites—such as YouTube, Technorati, and Last.fm—also implemented tagging. In 2005, the Atom web syndication standard provided a "category" element for inserting subject categories into web feeds, and in 2007 Tim Bray proposed a "tag" URN.

Examples

Within a blog

Many systems allow authors to add free-form tags to a post, along with placing the post into a predetermined category. For example, a post may display that it has been tagged with baseball and tickets. Each of those tags is usually a web link leading to a index page listing all of the posts associated with that tag. The blog may have a sidebar listing all the tags in use on that blog, with each tag leading to an index page. To reclassify a post, an author edits its list of tags. All connections between posts are automatically tracked and updated by the blog software; there is no need to relocate the page within a complex hierarchy of categories.

Within application software

Some desktop applications and web applications feature their own tagging systems, such as email tagging in Gmail and Mozilla Thunderbird, bookmark tagging in Firefox, audio tagging in iTunes or Winamp, and photo tagging in various applications. Some of these applications display collections of tags as tag clouds.

Assigned to computer files

There are various systems for applying tags to the files in a computer's file system. In Apple's macOS, the operating system has allowed users to assign multiple arbitrary tags as extended file attributes to any file or folder ever since OS X 10.9 was released in 2013, and before that time the open-source OpenMeta standard provided similar tagging functionality in macOS. Several semantic file systems that implement tags are available for the Linux kernel, including Tagsistant. Microsoft Windows allows users to set tags only on Microsoft Office documents and some kinds of picture files.
Cross-platform file tagging standards include Extensible Metadata Platform, an ISO standard for embedding metadata into popular image, video and document file formats, such as JPEG and PDF, without breaking their readability by applications that do not support XMP. XMP largely supersedes the earlier IPTC Information Interchange Model. Exif is a standard that specifies the image and audio file formats used by digital cameras, including some metadata tags. TagSpaces is an open-source cross-platform application for tagging files; it inserts tags into the filename.

For an event

An official tag is a keyword adopted by events and conferences for participants to use in their web publications, such as blog entries, photos of the event, and presentation slides. Search engines can then index them to make relevant materials related to the event searchable in a uniform way. In this case, the tag is part of a controlled vocabulary.

In research

A researcher may work with a large collection of items in digital form. If he/she wishes to associate each with a small number of themes, then a group of tags for these themes can be attached to each of the items in the larger collection. In this way, freeform classification allows the author to manage what would otherwise be unwieldy amounts of information.

Special types

Triple tags

A triple tag or machine tag uses a special syntax to define extra semantic information about the tag, making it easier or more meaningful for interpretation by a computer program. Triple tags comprise three parts: a namespace, a, and a value. For example, geo:long=50.123456 is a tag for the geographical longitude coordinate whose value is 50.123456. This triple structure is similar to the Resource Description Framework model for information.
The triple tag format was first devised for geolicious in November 2004, to map Delicious bookmarks, and gained wider acceptance after its adoption by Mappr and GeoBloggers to map Flickr photos. In January 2007, Aaron Straup Cope at Flickr introduced the term machine tag as an alternative name for the triple tag, adding some questions and answers on purpose, syntax, and use.
Specialized metadata for geographical identification is known as geotagging; machine tags are also used for other purposes, such as identifying photos taken at a specific event or naming species using binomial nomenclature.

Hashtags

A hashtag is a kind of metadata tag marked by the prefix #, sometimes known as a "hash" symbol. This form of tagging is used on microblogging and social networking services such as Twitter, Facebook, Google+, VK and Instagram.

Knowledge tags

A knowledge tag is a type of meta-information that describes or defines some aspect of a piece of information. Knowledge tags are more than traditional non-hierarchical keywords or terms; they are a type of metadata that captures knowledge in the form of descriptions, categorizations, classifications, semantics, comments, notes, annotations, hyperdata, hyperlinks, or references that are collected in tag profiles. These tag profiles reference an information resource that resides in a distributed, and often heterogeneous, storage repository.
Knowledge tags are part of a knowledge management discipline that leverages Enterprise 2.0 methodologies for users to capture insights, expertise, attributes, dependencies, or relationships associated with a data resource. Different kinds of knowledge can be captured in knowledge tags, including factual knowledge, conceptual knowledge, expectational knowledge, and methodological knowledge. These forms of knowledge often exist outside the data itself and are derived from personal experience, insight, or expertise. Knowledge tags are considered an expansion of the information itself that adds additional value, context, and meaning to the information. Knowledge tags are valuable for preserving organizational intelligence that is often lost due to turnover, for sharing knowledge stored in the minds of individuals that is typically isolated and unharnessed by the organization, and for connecting knowledge that is often lost or disconnected from an information resource.

Advantages and disadvantages

In a typical tagging system, there is no explicit information about the meaning or semantics of each tag, and a user can apply new tags to an item as easily as applying older tags. Hierarchical classification systems can be slow to change, and are rooted in the culture and era that created them; in contrast, the flexibility of tagging allows users to classify their collections of items in the ways that they find useful, but the personalized variety of terms can present challenges when searching and browsing.
When users can freely choose tags, the resulting metadata can include homonyms and synonyms, which may lead to inappropriate connections between items and inefficient searches for information about a subject. For example, the tag "orange" may refer to the fruit or the color, and items related to a version of the Linux kernel may be tagged "Linux", "kernel", "Penguin", "software", or a variety of other terms. Users can also choose tags that are different inflections of words, which can contribute to navigation difficulties if the system does not include stemming of tags when searching or browsing. Larger-scale folksonomies address some of the problems of tagging, in that users of tagging systems tend to notice the current use of "tag terms" within these systems, and thus use existing tags in order to easily form connections to related items. In this way, folksonomies may collectively develop a partial set of tagging conventions.

Complex system dynamics

Despite the apparent lack of control, research has shown that a simple form of shared vocabulary emerges in social bookmarking systems. Collaborative tagging exhibits a form of complex systems dynamics. Thus, even if no central controlled vocabulary constrains the actions of individual users, the distribution of tags converges over time to stable power law distributions. Once such stable distributions form, simple folksonomic vocabularies can be extracted by examining the correlations that form between different tags. In addition, research has suggested that it is easier for machine learning algorithms to learn tag semantics when users tag "verbosely"—when they annotate resources with a wealth of freely associated, descriptive keywords.

Spamming

Tagging systems open to the public are also open to tag spam, in which people apply an excessive number of tags or unrelated tags to an item in order to attract viewers. This abuse can be mitigated using human or statistical identification of spam items. The number of tags allowed may also be limited to reduce spam.

Syntax

Some tagging systems provide a single text box to enter tags, so to be able to tokenize the string, a must be used. Two popular separators are the space character and the comma. To enable the use of separators in the tags, a system may allow for higher-level separators or escape characters. Systems can avoid the use of separators by allowing only one tag to be added to each input widget at a time, although this makes adding multiple tags more time-consuming.
A syntax for use within HTML is to use the rel-tag microformat which uses the rel attribute with value "tag" to indicate that the linked-to page acts as a tag for the current context.