Digital curation
Digital curation is the selection, preservation, maintenance, collection and archiving of digital assets.
Digital curation establishes, maintains and adds value to repositories of digital data for present and future use. This is often accomplished by archivists, librarians, scientists, historians, and scholars. Enterprises are starting to use digital curation to improve the quality of information and data within their operational and strategic processes. Successful digital curation will mitigate digital obsolescence, keeping the information accessible to users indefinitely. Digital curation includes digital asset management, data curation, digital preservation, and electronic records management.
Word History
Much as the word archive has layered meanings and uses, the word curation is both a noun and a verb used originally in the field of museology to represent a wide range of activities, most often associated with collection care, long-term preservation, and exhibition design. Curation can refer to physical repositories that store cultural heritage or natural resource collections, or to the varied policies and processes involved in the long-term care and management of heritage collections, digital archives, and research data. Yet curation is also associated with short-term objectives and processes of selection and interpretation for the purposes of presentation, such as for gallery exhibitions and websites, which contribute to knowledge creation. It has also been applied to interaction with social media, including compiling digital images, web links and movie files.
The term curation entered the legal framework through federal historic preservation laws, starting with the National Historic Preservation Act of 1966, and was further defined and coded into federal regulations through 36 CFR Part 79: Curation of Federally-owned and Administered Archaeological Collections. Curation has since permeated an array of disciplines but remains closely tied to heritage and information management.
Core Principles and Activities
The term “digital curation” was first used in the e-science and biological science fields to differentiate the additional suite of activities ordinarily employed by library and museum curators to add value to their collections and enable their reuse from the smaller subtask of simply preserving the data, a significantly narrower archival task. Additionally, the historical understanding of the term “curator” demands more than simple care of the collection. A curator is expected to command academic mastery of the subject matter as a requisite part of appraisal and selection of assets and any subsequent adding of value to the collection through application of metadata.
Principles
There are five commonly accepted principles that govern the occupation of digital curation:
- Manage the complete birth-to-retirement life cycle of the digital asset.
- Evaluate and cull assets for inclusion in the collection.
- Apply preservation methods to strengthen the asset’s integrity and reusability for future users.
- Act proactively throughout the asset life cycle to add value to both the digital asset and the collection.
- Facilitate the appropriate degree of access to users.
Methodology
Sequential Actions:
- Conceptualize: Consider what digital material you will be creating and develop storage options. Take into account websites, publications, and email, among other types of digital output.
- Create: Produce digital material and attach all relevant metadata. Typically, the more metadata, the more accessible the information.
- Appraise and select: Consult the mission statement of the institution or private collection and determine what digital data is relevant. There may also be legal guidelines in place that will guide the decision process for a particular collection.
- Ingest: Send digital material to the predetermined storage solution. This may be an archive, repository or other facility.
- Preservation action: Employ measures to maintain the integrity of the digital material.
- Store: Secure data within the predetermined storage facility.
- Access, use and reuse: Determine the level of accessibility for the range of digital material created. Some material may be accessible only by password and other material may be freely accessible to the public. Routinely check that material is still accessible for the intended audience and that the material has not been compromised through multiple uses.
- Transform: If desirable or necessary the material may be transferred into a different digital format.
- Dispose: Discard any digital material that is not deemed necessary to the institution.
- Reappraise: Reevaluate material to ensure that it is still relevant and is true to its original form.
- Migrate: Migrate data to another format in order to keep it usable in the future.
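One common way to support the preservation, access-checking and reappraisal steps above is a fixity check: a checksum is recorded when material is ingested and recomputed later to confirm that the material is unchanged. The following is a minimal sketch in Python, not a prescribed method. The function names, the in-memory manifest and the asset identifier are all hypothetical, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 hex digest of an asset's bytes."""
    return hashlib.sha256(content).hexdigest()


def ingest(manifest: dict, asset_id: str, content: bytes) -> None:
    """Record the asset's checksum at ingest time (hypothetical manifest)."""
    manifest[asset_id] = fingerprint(content)


def is_intact(manifest: dict, asset_id: str, content: bytes) -> bool:
    """Recompute the checksum and compare it with the recorded one."""
    return manifest.get(asset_id) == fingerprint(content)


# Illustrative data only: an in-memory asset standing in for a stored file.
manifest = {}
original = b"field notes, 2021 survey"
ingest(manifest, "notes-001", original)

print(is_intact(manifest, "notes-001", original))           # unchanged copy
print(is_intact(manifest, "notes-001", original + b"!"))    # altered copy
```

A real repository would persist the manifest alongside the holdings and run the comparison on a schedule, but the comparison itself stays the same.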
Related terms
Data curation is another term that is often used interchangeably with digital curation; however, common usage of the two terms differs. While “data” is a more all-encompassing term that can be used generally to indicate anything recorded in binary form, the term “data curation” is most common in scientific parlance and usually refers to accumulating and managing information relative to the process of research. Data-driven research in education has called on information professionals to gradually extend their traditional digital services to data curation, particularly the management of digital research data. So, while documents and other discrete digital assets are technically a subset of the broader concept of data, in the context of scientific vernacular digital curation represents a broader purview of responsibilities than data curation due to its interest in preserving and adding value to digital assets of any kind.
Challenges
Rate of creation of new data and data sets
The ever-lowering cost and increasing prevalence of entirely new categories of technology have led to a quickly growing flow of new data sets. These come from well-established sources such as business and government, but the trend is also driven by new styles of sensors becoming embedded in more areas of modern life. This is particularly true of consumers, whose production of digital assets is no longer confined to work. Consumers now create a wider range of digital assets, including videos, photos, location data, purchases, and fitness tracking data, and share them across a wider range of social platforms.
Additionally, the advance of technology has introduced new ways of working with data. Examples include international partnerships that leverage astronomical data to create “virtual observatories”; similar partnerships have leveraged data resulting from research at the Large Hadron Collider at CERN and the database of protein structures at the Protein Data Bank.
Storage format evolution and obsolescence
By comparison, archiving of analog assets is notably passive in nature, often limited to simply ensuring a suitable storage environment. Digital preservation requires a more proactive approach. Today’s artifacts of cultural significance are notably transient in nature and prone to obsolescence when social trends or dependent technologies change. This rapid progression of technology occasionally makes it necessary to migrate digital asset holdings from one file format to another in order to mitigate the dangers of hardware and software obsolescence which would render the asset unusable.
Underestimation of human labor costs
Modern tools for program planning often underestimate the human labor costs required for adequate digital curation of large collections. As a result, cost-benefit assessments often paint an inaccurate picture of both the amount of work involved and the true cost to the institution of both successful outcomes and failures.
The concept of cost is more obvious in the business field. A variety of business systems run daily operations. For example, human resources systems deal with recruitment and payroll, communication systems manage internal and external email, and administration systems handle finance, marketing and other aspects. However, business systems in institutions are not initially designed for long-term information preservation. In some instances, business systems are revised to become digital curation systems for preserving transaction information because of cost considerations. One example of such business systems is Enterprise Content Management (ECM) applications, which designated groups of people, such as business executives and customers, use for information management that supports key organizational processes. In the long run, transferring digital content from ECM applications to digital curation applications is likely to become a trend in large organizations, both domestically and internationally. Improved maturity models for ECM and digital curation may add value to information that requires cost reduction and extensive use for further modification.
Standardization and coordination between institutions
An absence of coordination across different sectors of society and industry in areas such as the standardization of semantic and ontological definitions, and in forming partnerships for proper stewardship of assets, has resulted in a lack of interoperability between institutions and a partial breakdown in digital curation practice from the standpoint of the ordinary user. One example of coordination is the Open Archival Information System (OAIS). The OAIS Reference Model allows professionals, organizations and individuals to contribute to the OAIS open forums for developing international standards for long-term access to archival information.
Digitization of analog materials
The curation of digital objects is not limited to strictly born-digital assets. Many institutions have engaged in monumental efforts to digitize analog holdings in an effort to increase access to their collections. Examples of these materials are books, photographs, maps, audio recordings, and more. The process of converting printed resources into digital collections has been epitomized to some degree by librarians and related specialists. For example, the Digital Curation Centre is claimed to be a "world leading centre of expertise in digital information curation" that assists higher education research institutions in such conversions.
New representational formats
For some topics, knowledge is embodied in forms that have not been conducive to print; for example, the choreography of dance or the motion of skilled workers or artisans is difficult to encode. New digital approaches such as 3D holograms and other computer-programmed expressions are developing.
For mathematics, it seems possible for a new common language to be developed that would express mathematical ideas in ways that can be digitally stored, linked, and made accessible. The Global Digital Mathematics Library is a project to define and develop such a language.
Accessibility
The ability of the intended user community to access the repository’s holdings is of equal importance to all the preceding curatorial tasks. This must take into account not only the user community’s format and communication preferences, but also communities that should not have access for various legal or privacy reasons.
Access can be increased by providing information about open access status with open data and open source methods such as the OAI-PMH endpoints of an open archive, which are then aggregated by databases and search engines like BASE, CORE and Unpaywall for academic papers.
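As an illustration of how an aggregator might address an OAI-PMH endpoint, the sketch below builds a ListRecords request URL in Python. The verb and metadataPrefix arguments and the oai_dc prefix come from the OAI-PMH protocol; the base URL, the set name and the helper function are hypothetical, and the sketch only constructs the URL without contacting any server.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set; set names vary per repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint used for illustration only.
url = list_records_url("https://repository.example.org/oai")
print(url)

url_with_set = list_records_url(
    "https://repository.example.org/oai", set_spec="open_access"
)
print(url_with_set)
```

A harvester would fetch such a URL, parse the XML response, and follow any resumption token the repository returns to page through the remaining records.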
Responses to challenges
- Specialized research institutions
- Academic courses
- Dedicated symposia
- Peer reviewed technical and industry journals
Approaches
Many approaches to digital curation exist, and have evolved over time in response to the changing technological landscape. Two examples of this are sheer curation and channelization.
Sheer curation is an approach to digital curation where curation activities are quietly integrated into the normal work flow of those creating and managing data and other digital assets. The word sheer is used to emphasize the lightweight and virtually transparent nature of these curation activities. The term sheer curation was coined by Alistair Miles in the ImageStore project, and the UK Digital Curation Centre's SCARP project. The approach depends on curators having close contact or 'immersion' in data creators' working practices. An example is the case study of a neuroimaging research group by Whyte et al., which explored ways of building its digital curation capacity around the apprenticeship style of learning of neuroimaging researchers, through which they share access to datasets and re-use experimental procedures.
Sheer curation depends on the hypothesis that good data and digital asset management at the point of creation and primary use is also good practice in preparation for sharing, publication and/or long-term preservation of these assets. Therefore, sheer curation attempts to identify and promote tools and good practices in local data and digital asset management in specific domains, where those tools and practices add immediate value to the creators and primary users of those assets. Curation can best be supported by identifying existing practices of sharing, stewardship and re-use that add value, and augmenting them in ways that both have short-term benefits, and in the longer term reduce risks to digital assets or provide new opportunities to sustain their long-term accessibility and re-use value.
The aim of sheer curation is to establish a solid foundation for other curation activities which may not directly benefit the creators and primary users of digital assets, especially those required to ensure long-term preservation. By providing this foundation, further curation activities may be carried out by specialists at appropriate institutional and organisation levels, whilst causing the minimum of interference to others.
A similar idea is curation at source, used in the context of Laboratory Information Management Systems (LIMS). This refers more specifically to automatic recording of metadata or information about data at the point of capture, and has been developed to apply semantic web techniques to integrate laboratory instrumentation and documentation systems. Sheer curation and curation at source can be contrasted with post hoc digital preservation, where a project is initiated to preserve a collection of digital assets that have already been created and are beyond the period of their primary use.
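The idea of recording metadata automatically at the point of capture can be sketched in a few lines of Python. This is an illustrative assumption rather than how any particular LIMS works: the function name, the record fields and the instrument identifier are hypothetical, and the semantic web integration mentioned above is not modelled.

```python
import hashlib
from datetime import datetime, timezone


def capture_with_metadata(instrument_id: str, reading: bytes) -> dict:
    """Build a metadata record at the moment a reading is captured."""
    return {
        "instrument_id": instrument_id,
        # Timestamp in UTC so records from different sites are comparable.
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": len(reading),
        "sha256": hashlib.sha256(reading).hexdigest(),
    }


# Illustrative reading standing in for real instrument output.
record = capture_with_metadata("spectrometer-7", b"\x00\x01\x02\x03")
print(record["instrument_id"], record["size_bytes"])
```

Because the record is produced by the capture step itself, the data creator does no extra work, which is the property that sheer curation and curation at source both rely on.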
Channelization is curation of digital assets on the web, often by brands and media companies, into continuous flows of content, turning the user experience from a lean-forward interactive medium to a lean-back passive medium. The curation of content can be done by an independent third party that selects media from any number of on-demand outlets from across the globe and adds them to a playlist to offer a digital "channel" dedicated to certain subjects, themes, or interests, so that the end user sees or hears a continuous stream of content.