EIDR


EIDR, or the Entertainment Identifier Registry, is a globally unique identifier system for a broad array of audiovisual objects, including motion pictures, television programs, and radio programs. The identification system resolves an identifier to a metadata record that is associated with top-level titles, edits, DVDs, encodings, clips, and mash-ups. EIDR also provides identifiers for video service providers, such as broadcast and cable networks.
As of June 2020, EIDR contained over 2 million records, including almost 400,000 movies and almost one million episodes from over 40,000 TV series.
EIDR is an implementation of a Digital Object Identifier.

History

Media asset identification systems have existed for decades. The common motivation for their creation is to enable the management of media assets by assigning a unique ID to a set of metadata representing the salient characteristics of each asset. Over time, such systems tend to proliferate, with each arising to deal with a specific set of issues. As a result, there is considerable variation between systems in terms of which assets are categorized, which metadata is associated with each asset, and the very definition of an asset. To name a few examples, should a “director’s cut” of a movie be distinct from the original theatrical release? How should regional variations be accounted for? Further complications include the procedures for adding new assets, editing existing assets, and creating derivative assets.
EIDR was created to address these issues, as well as others encountered in video asset workflows, both in a B2B context and the intramural post-production activities of content producers. EIDR has the following characteristics:
EIDR is intended to supplement, not replace, existing asset identification systems. Indeed, a key feature is that an EIDR record can include references to that asset’s ID under other systems. This feature is particularly useful for film and television archives, making it easy for them to cross-reference their holdings with other sources for the work and metadata about it. By design, EIDR does not replicate features of other asset ID systems, e.g. commercial systems that seek to add value through enhanced metadata. Tracking ownership and rights information is also a non-goal, although such tracking can be implemented by applications that use the EIDR ID.

Content Model

EIDR is built on a collection of records that are stored in a central registry. These records are referenced externally by DOIs. A DOI is assigned when a record is created and is immutable thereafter. The identifier resolution system underlying DOIs is the Handle System, so each native EIDR Content ID is a handle that conforms, in increasing specificity, to the Handle, DOI, and EIDR standards.

Content ID Format

The canonical form of an EIDR Content ID is an instance of a Handle and has the format:
where
There is also a 96-bit Compact Binary form that is intended for embedding in small payloads such as watermarks. This form is generated from the canonical format as follows:
The URN form for an EIDR ID is specified in IETF RFC 7302.
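The conversions between these forms can be sketched in a few lines of Python. This is a minimal illustration, not the normative algorithm: it assumes the canonical layout `10.<sub-prefix>/XXXX-XXXX-XXXX-XXXX-XXXX-C` (20 hexadecimal digits plus one check character), assumes the 96 bits are a 16-bit sub-prefix followed by the 80-bit suffix with the check character dropped, and assumes the URN replaces the slash with a colon under the `urn:eidr:` namespace. The function names are hypothetical.

```python
import re

# Assumed canonical layout: 10.<4-digit sub-prefix>/<5 groups of 4 hex digits>-<check>
CANONICAL = re.compile(
    r"^10\.(\d{4})/((?:[0-9A-F]{4}-){4}[0-9A-F]{4})-([0-9A-Z])$"
)


def to_compact_binary(eidr_id: str) -> bytes:
    """Pack an ID into 12 bytes (96 bits): 2-byte sub-prefix + 10-byte suffix.

    The byte layout is an assumption made for this sketch.
    """
    match = CANONICAL.match(eidr_id.upper())
    if match is None:
        raise ValueError(f"not a canonical EIDR ID: {eidr_id!r}")
    sub_prefix, suffix, _check = match.groups()
    # 20 hex digits -> 10 bytes; the check character is not stored.
    return int(sub_prefix).to_bytes(2, "big") + bytes.fromhex(suffix.replace("-", ""))


def to_urn(eidr_id: str) -> str:
    """Build the URN form, assuming '/' becomes ':' after 'urn:eidr:'."""
    prefix, suffix = eidr_id.split("/", 1)
    return f"urn:eidr:{prefix}:{suffix}"


example = "10.5240/0000-0000-0000-0000-0000-X"  # illustrative input only
compact = to_compact_binary(example)
urn = to_urn(example)
```

For the illustrative input, `compact` is 12 bytes long, which matches the 96-bit size stated above.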
For use on the web an EIDR content ID can be represented as a URI in one of these forms:
There are four types of content records, each associated with a reserved prefix:
The sub-prefixes 5237, 5238, 5239, and 5240 are all assigned to the EIDR Association.
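A syntactic check of a Content ID can be sketched as follows. The sketch assumes that the suffix consists of 20 hexadecimal digits in five hyphen-separated groups followed by a single check character, and that the check character is computed over those 20 digits with the ISO 7064 Mod 37,36 scheme. Both points come from public EIDR documentation rather than from the text above, so treat them as assumptions. The names `check_character` and `is_valid_content_id` are hypothetical.

```python
import re

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Assumed layout of a Content ID under the 10.5240 prefix.
ID_PATTERN = re.compile(
    r"^10\.5240/((?:[0-9A-F]{4}-){4}[0-9A-F]{4})-([0-9A-Z])$"
)


def check_character(digits: str) -> str:
    """Compute an ISO 7064 Mod 37,36 check character for 0-9/A-Z input."""
    modulus = 36
    product = modulus
    for char in digits:
        total = (product + ALPHABET.index(char)) % modulus
        if total == 0:
            total = modulus
        product = (total * 2) % (modulus + 1)
    return ALPHABET[(modulus + 1 - product) % modulus]


def is_valid_content_id(eidr_id: str) -> bool:
    """Return True if the ID matches the assumed layout and its check character."""
    match = ID_PATTERN.match(eidr_id)
    if match is None:
        return False
    suffix, check = match.groups()
    return check_character(suffix.replace("-", "")) == check
```

Under these assumptions an all-zero suffix yields the check character `X`, so `is_valid_content_id("10.5240/0000-0000-0000-0000-0000-X")` returns `True`, while the same suffix with any other check character is rejected.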

Content Records

Content records are objects categorized by their types and relationships. Each record is classified by three different kinds of type:
The following fields comprise the base object data of a content record:
An EIDR ID must always be resolvable, so under normal circumstances the corresponding Content Record is permanent. Two mechanisms are available to deal with errors or other unusual circumstances. The preferred one is aliasing, whereby an EIDR ID is transparently redirected to another content record. Aliasing is commonly employed when an asset has been registered twice.
The other mechanism is the use of tombstone records. This is employed when the Content Record is corrupted, or an otherwise invalid asset was accidentally registered. In this case the ID will be aliased to a special tombstone record. The tombstone can be recognized by applications because its EIDR ID field will be set to the distinguished value “”. Note that “X” means the 24th letter of the Latin alphabet.
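The two mechanisms can be illustrated with a small in-memory model. This is only a sketch of the behaviour described above: the `ContentRecord` class, the `alias_of` field, the `resolve` function, and the placeholder `TOMBSTONE_ID` are all hypothetical, and the real registry is accessed through its own interfaces rather than a Python dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for the distinguished tombstone value.
TOMBSTONE_ID = "<tombstone>"


@dataclass
class ContentRecord:
    eidr_id: str
    alias_of: Optional[str] = None  # set when this ID redirects elsewhere


def resolve(registry: Dict[str, ContentRecord], eidr_id: str,
            max_hops: int = 8) -> ContentRecord:
    """Follow alias redirections until a non-aliased record is reached."""
    current = eidr_id
    for _ in range(max_hops):
        record = registry[current]
        if record.alias_of is None:
            return record
        current = record.alias_of
    raise RuntimeError(f"alias chain starting at {eidr_id!r} is too long")


def is_tombstone(record: ContentRecord) -> bool:
    """A tombstone is recognized by its EIDR ID field."""
    return record.eidr_id == TOMBSTONE_ID


registry = {
    "id-original": ContentRecord("id-original"),
    # Registered twice: the duplicate is aliased to the surviving record.
    "id-duplicate": ContentRecord("id-duplicate", alias_of="id-original"),
    # Invalid registration: aliased to the tombstone record.
    "id-invalid": ContentRecord("id-invalid", alias_of=TOMBSTONE_ID),
    TOMBSTONE_ID: ContentRecord(TOMBSTONE_ID),
}
```

In this model every ID still resolves: `resolve(registry, "id-duplicate")` returns the surviving record, and `resolve(registry, "id-invalid")` returns a record for which `is_tombstone` is true.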

Alternate ID

Having a rich set of Alternate IDs for content is one of the primary goals of EIDR. This allows EIDR IDs to be used everywhere in content workflows; if an alternate ID is needed, it can be found in the metadata for the EIDR ID. EIDR supports the inclusion of both proprietary and other standard ID references. Additional Alternate IDs can be added when needed. If an Alternate ID is resolvable algorithmically, for example by placing it appropriately in a template URL, EIDR makes that link available. Below is an example of the Alternate IDs recorded for one EIDR asset.
Alternate ID #1. Type: ISAN
Alternate ID #2. Type: IVA
Alternate ID #3. Type: Proprietary; Domain: amazon.com
Alternate ID #4. Type: Proprietary; Domain: flixster.com
Alternate ID #5: 15042. Type: Proprietary; Domain: thecinemasource.com
Alternate ID #6. Type: IMDB; Relation: IsSameAs
Alternate ID #7: E0087486000. Type: Proprietary; Domain: spe.sony.com/MPM
Alternate ID #8: 3929. Type: Proprietary; Domain: spe.sony.com/ProductID
Alternate ID #9: 2002029. Type: Proprietary; Domain: warnerbros.com/MPM
Alternate ID #10: 389785. Type: Proprietary; Domain: veronicamagazine.nl
Alternate ID #11. Type: Proprietary; Domain: amazon.com
Alternate ID #12. Type: Proprietary; Domain: bfi.org.uk

Alternate IDs are partitioned into non-proprietary and proprietary. The former have distinguished, predefined types, whereas proprietary IDs are all of type “Proprietary” and are further distinguished by an associated DNS domain. As of July 2017, there were over 2 million Alternate IDs directly available through EIDR.
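The partitioning and the template-based link resolution can be modelled roughly as below. The `AlternateID` class, the `partition` and `resolve_url` functions, and the URL template are hypothetical names invented for this sketch. The sample values come from the list above, except the `example.com` domain and the `tt0000000` value, which are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AlternateID:
    value: str
    id_type: str                   # e.g. "ISAN", "IMDB", or "Proprietary"
    domain: Optional[str] = None   # only meaningful for proprietary IDs


def partition(ids: List[AlternateID]) -> Tuple[List[AlternateID], List[AlternateID]]:
    """Split IDs into (non-proprietary, proprietary)."""
    standard = [i for i in ids if i.id_type != "Proprietary"]
    proprietary = [i for i in ids if i.id_type == "Proprietary"]
    return standard, proprietary


def resolve_url(alt: AlternateID,
                templates: Dict[Tuple[str, Optional[str]], str]) -> Optional[str]:
    """Fill the ID into a URL template, if one is known for its type and domain."""
    template = templates.get((alt.id_type, alt.domain))
    return template.format(id=alt.value) if template else None


# Hypothetical template table; real templates are maintained by the registry.
TEMPLATES = {("Proprietary", "example.com"): "https://example.com/title/{id}"}

ids = [
    AlternateID("15042", "Proprietary", "thecinemasource.com"),
    AlternateID("E0087486000", "Proprietary", "spe.sony.com/MPM"),
    AlternateID("12345", "Proprietary", "example.com"),  # placeholder entry
    AlternateID("tt0000000", "IMDB"),                    # placeholder value
]
standard, proprietary = partition(ids)
```

Keying the template table on the pair of type and domain reflects the rule that proprietary IDs are distinguished only by their domain. An ID with no matching template simply has no link.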

Relationships Between Objects

Content objects can be related to each other according to the following table. These relations are expressed as additional fields in the content record and are thus relative to that object. Note that the subject object is the child and the target is the parent. Additional constraints are noted in the table.

Use in Standards & Applications

EIDR has been incorporated into many standards. A few of the more significant ones are listed here:
EIDR identifiers have found their way into an increasing number of commercial applications. The following are illustrative of some of the advantages of using EIDR:
EIDR is administered by the non-profit EIDR Association, which was founded in October 2010 by MovieLabs, CableLabs, Comcast and Rovi. Membership has grown steadily since then: as of late 2014, it had 79 members divided between the Industry Promoter and Industry Contributor levels. The fastest-growing category was non-US companies, which accounted for about 20% of membership.
The EIDR Association operates two EIDR registries: Production and Sandbox. The former is the official registry, and the latter is reserved for testing and development. Both are publicly available online, but the contents of the Sandbox are not guaranteed to be correct, complete, or even to refer to assets that exist. Only members of the EIDR Association may modify the registry.

Registration

Registration of new assets can be done individually or in bulk. In either case, the workflow comprises a combination of automated and manual processes. It is also iterative, as the initial matching process may identify a variety of gaps and errors that need to be dealt with.
Registering new assets is a complex process that requires some preparation, particularly in the case of bulk submission. The automated processes check syntax, verify that the basic metadata is supplied, and confirm that any dependencies are honored. Manual steps include making sure the correct Parties are associated with the asset. One of the most important steps is ensuring that a new asset does not already exist in the registry: this is covered in the next section.
To register a new asset, a user must be associated with a Party that has been granted the “Registrant” role by the EIDR operator. A registrant may be a principal agent, such as a studio or an encoding house, but it may also be a Party doing bulk registration of back‐catalogue items, or a Party acting on behalf of someone else. A registrant must also be an EIDR member. In general, content ownership, metadata authority, and registration capability are separate and unrelated concepts.

Deduplication

Deduplication refers to classifying each asset submitted to the registry into one of the following three categories:
This assessment is based on applying a set of rules to the candidate asset, which results in a numerical score. Bucketing occurs as the result of comparing the score to two thresholds:
Assets falling between the low and high thresholds are deemed to have a high possibility of being a duplicate: the proposed record addition or modification will not proceed until it has been manually reviewed by EIDR operations staff.
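The two-threshold bucketing can be sketched as a simple comparison. The category names, the threshold values, the handling of scores that fall exactly on a threshold, and the convention that a higher score means a more likely duplicate are all assumptions made for illustration. Only the middle band (manual review) is taken directly from the description above.

```python
from enum import Enum


class Bucket(Enum):
    UNIQUE = "unique"                # assumed name: score below the low threshold
    MANUAL_REVIEW = "manual review"  # score between the two thresholds
    DUPLICATE = "duplicate"          # assumed name: score above the high threshold


def bucket(score: float, low: float, high: float) -> Bucket:
    """Classify a candidate asset by comparing its score to two thresholds."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return Bucket.UNIQUE
    if score > high:
        return Bucket.DUPLICATE
    return Bucket.MANUAL_REVIEW


# Hypothetical thresholds, chosen only to exercise the three outcomes.
LOW, HIGH = 40.0, 85.0
results = [bucket(s, LOW, HIGH) for s in (12.0, 60.0, 97.5)]
```

With these illustrative thresholds, the three sample scores fall into the unique, manual review, and duplicate buckets respectively.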

Architecture

The components of the EIDR system are shown below.
The principal functional blocks are as follows:
An EIDR ID is a specialized example of a Digital Object Identifier, which in turn is built on top of the Handle System developed by the Corporation for National Research Initiatives. The EIDR-specific aspects of the lower layers are described in more detail below.

Digital Object Identifier (EIDR Aspects)

A Digital Object Identifier, standardized as ISO 26324, seeks to uniquely identify a wide range of digital artifacts including books, recordings, research data, and other digital content. The goal is for the IDs to be not just unique, but also persistent and immutable. Unlike URLs, DOIs stay the same even if the objects move to another location or become owned by another organization. Here are some of the characteristics of DOI:
The DOI data model provides the means to associate metadata with each object, as well as policies governing its use. In the words of the DOI Handbook, metadata may include “names, identifiers, descriptions, types, classifications, locations, times, measurements, relationships and any other kind of information related to .” Metadata flows between the following entities:
To foster interoperability between Registration Agencies (RAs), DOI has the concept of a metadata kernel. This is a core set of metadata that all objects stored within the DOI framework should have. The full set may be found in the DOI Handbook. Interoperability is a large topic extending beyond the scope of EIDR, but the following subset is particularly relevant to EIDR assets:
EIDR metadata is available in standard DOI kernel metadata format as well as EIDR-specific formats. The DOI for the DOI metadata schema is .

Handle System (EIDR Aspects)

DOI is in turn implemented on top of the Handle System, a distributed, highly scalable, name resolution service. A handle is defined as:
The Naming Authority is globally unique and defines both an administrative space and the syntax of the Handle Local Name. For EIDR, “10.5240” is the Naming Authority in the definition above, and it is responsible for resolving the suffix. The Handle System allows a more general range of Naming Authorities than DOI employs.
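Splitting a handle into its two parts can be sketched as follows. The sketch assumes the general handle syntax of a Naming Authority and a Local Name separated by the first slash. The `Handle` type and `split_handle` function are hypothetical names, and the suffix in the example is an illustrative value only.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    naming_authority: str
    local_name: str


def split_handle(handle: str) -> Handle:
    """Split on the first '/', since the Local Name may itself contain slashes."""
    authority, separator, local_name = handle.partition("/")
    if not separator or not authority or not local_name:
        raise ValueError(f"not a handle: {handle!r}")
    return Handle(authority, local_name)


parts = split_handle("10.5240/0000-0000-0000-0000-0000-X")
```

Here `parts.naming_authority` is `"10.5240"`, the value identified above as the EIDR Naming Authority, and `parts.local_name` is the remainder that the authority is responsible for resolving.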
The distributed nature of the Handle System allows each local namespace to be hosted on multiple geographically distributed service sites. This is a federated model in which each local namespace has complete control over the placement and operation of its service sites. Furthermore, each service site may contain multiple resolution servers: requests directed to a particular service site are dispatched evenly across its constituent servers.
The data model of the Handle System is simple but flexible. An arbitrary number of values may be associated with each handle. Over time, these values may be created, modified, and destroyed. Each such datum has the following attributes:
The Handle System is accessed via a wire protocol defined in RFC 3652. Because of the layering of protocols, EIDR applications do not have to be concerned with it.