Microdata (statistics)


In the study of survey and census data, microdata is information at the level of individual respondents. For instance, a national census might collect age, home address, educational level, employment status, and many other variables, recorded separately for every person who responds; this is microdata.

Advantages

Survey/census results are most commonly published as aggregates, both for privacy reasons and because of the large quantities of data involved; microdata for one census can easily contain millions of records, each with several dozen data items.
However, summarizing results at an aggregate level loses information. For instance, if statistics for education and employment are aggregated separately, they cannot be used to explore the relationship between these two variables. Access to microdata gives researchers much more freedom to investigate such interactions and perform detailed analysis.
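The information loss can be illustrated with a minimal Python sketch. The two small datasets, the category labels, and the function names below are hypothetical and chosen only for illustration. Both datasets produce identical separate aggregates for education and employment, yet the relationship between the two variables differs, and that difference is visible only in the respondent-level records.

```python
from collections import Counter

# Hypothetical microdata: one (education, employment) record per respondent.
dataset_a = [
    ("degree", "employed"),
    ("degree", "employed"),
    ("no degree", "unemployed"),
    ("no degree", "unemployed"),
]
dataset_b = [
    ("degree", "employed"),
    ("degree", "unemployed"),
    ("no degree", "employed"),
    ("no degree", "unemployed"),
]


def marginals(records):
    """Aggregate each variable separately, as two independent summary tables."""
    education = Counter(edu for edu, _ in records)
    employment = Counter(emp for _, emp in records)
    return education, employment


def crosstab(records):
    """Count respondents for each (education, employment) combination."""
    return Counter(records)


# The separate aggregates cannot tell the two datasets apart...
same_aggregates = marginals(dataset_a) == marginals(dataset_b)
# ...but the cross-tabulation built from microdata can.
same_crosstab = crosstab(dataset_a) == crosstab(dataset_b)
print(same_aggregates, same_crosstab)
```

In the first dataset education and employment coincide exactly, while in the second they are unrelated; the separately aggregated tables are the same in both cases.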

Availability

Because of these advantages, some statistical organizations allow access to microdata for research purposes. Controls are generally imposed to limit the risk that the data may be misused or lead to a loss of privacy. For example, the Integrated Public Use Microdata Series requires researchers to implement security measures, avoid redistributing microdata, use microdata only for noncommercial research and education purposes, and make no attempt to identify the individuals recorded. Names and fine-level geographical data are removed, some data items are altered as necessary to make it impossible to identify individuals, and small ethnic categories are merged.
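As a rough illustration of this kind of control, the following Python sketch applies three of the measures mentioned above to a handful of made-up records. It is not the procedure used by any particular organization: the field names, the two-character region code, and the `min_category_size` threshold are all assumptions made for the example, and it does not cover the alteration of individual data items.

```python
from collections import Counter


def anonymize(records, min_category_size=2):
    """Return new records with simple, illustrative disclosure controls applied."""
    # Count how many respondents fall into each ethnic category.
    counts = Counter(r["ethnicity"] for r in records)
    released = []
    for r in records:
        ethnicity = r["ethnicity"]
        if counts[ethnicity] < min_category_size:
            # Merge small categories into a single combined group.
            ethnicity = "other"
        released.append({
            # The "name" field is deliberately not copied.
            # Fine-level geography is coarsened to a broad region code
            # (assumed here to be the first two characters of the postcode).
            "region": r["postcode"][:2],
            "ethnicity": ethnicity,
            "age": r["age"],
        })
    return released


# Hypothetical respondent-level records.
records = [
    {"name": "A. Example", "postcode": "AB101", "ethnicity": "group 1", "age": 34},
    {"name": "B. Example", "postcode": "AB102", "ethnicity": "group 1", "age": 51},
    {"name": "C. Example", "postcode": "CD201", "ethnicity": "group 2", "age": 27},
]
released = anonymize(records)
```

Steps this simple do not by themselves guarantee that individuals cannot be identified, which is why access conditions such as those described above are imposed as well.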
The International Household Survey Network has developed tools and guidelines to help interested statistical agencies improve their microdata management practices. The Microdata Management Toolkit, a DDI (Data Documentation Initiative) metadata editor, is now used in about 80 countries. Its use is supported by the Accelerated Data Program, which is implemented by the PARIS21 Secretariat, the World Bank, and other partners in the context of the Marrakech Action Plan for Statistics.