Data journalism


Data journalism is "a way of enhancing reporting and news writing with the use and examination of statistics in order to provide a deeper insight into a news story and to highlight relevant data. One trend in the digital era of journalism has been to disseminate information to the public via interactive online content through data visualization tools such as tables, graphs, maps, infographics, microsites, and visual worlds. The in-depth examination of such data sets can lead to more concrete results and observations regarding timely topics of interest. In addition, data journalism may reveal hidden issues that seemingly were not a priority in the news coverage". Data journalism is a type of journalism reflecting the increased role that numerical data plays in the production and distribution of information in the digital era. It reflects the increased interaction between content producers and several other fields such as design, computer science and statistics. From the point of view of journalists, it represents "an overlapping set of competencies drawn from disparate fields".
Data journalism has been widely used to unite several concepts and link them to journalism. Some see these as levels or stages leading from the simpler to the more complex uses of new technologies in the journalistic process.
Designers are not always part of the process. According to author and data journalism trainer Henk van Ess, "Data journalism can be based on any data that has to be processed first with tools before a relevant story is possible. It doesn't include visualization per se."
However, one of the problems in defining data journalism is that many definitions are not clear enough and focus on describing the computational methods of optimization, analysis, and visualization of information.

Areas covered

  1. Cybercrime reporting.
  2. Computer-assisted reporting and data-driven journalism, where journalists make use of large databases to produce stories.
  3. Infographics.
  4. Data visualization.
  5. Interactive visualization.
  6. Serious games, in the sense that they take interaction a step further.
  7. Database journalism or structured journalism, an information management system where pieces of information are organized in a database.

Emergence as a concept

One of the earliest examples of using computers with journalism dates back to a 1952 endeavor by CBS to use a mainframe computer to predict the outcome of the presidential election, but it wasn't until 1967 that using computers for data analysis began to be more widely adopted.
Working for the Detroit Free Press at the time, Philip Meyer used a mainframe to improve reporting on the riots spreading throughout the city. With a new precedent set for data analysis in journalism, Meyer collaborated with Donald Barlett and James Steele to look at patterns in conviction sentencing in Philadelphia during the 1970s. Meyer later wrote a book titled Precision Journalism that advocated the use of these techniques for integrating data analysis into journalism.
Toward the end of the 1980s, significant events began to occur that helped to formally organize the field of computer-assisted reporting (CAR). Investigative reporter Bill Dedman of The Atlanta Journal-Constitution won a Pulitzer Prize in 1989 for The Color of Money, his 1988 series of stories using CAR techniques to analyze racial discrimination by banks and other mortgage lenders in middle-income black neighborhoods. The National Institute for Computer-Assisted Reporting (NICAR) was formed at the Missouri School of Journalism in collaboration with Investigative Reporters and Editors. The first conference dedicated to CAR was organized by NICAR in conjunction with James Brown at Indiana University and held in 1990. The NICAR conferences have been held annually since and are now the single largest gathering of data journalists.
Although data journalism has been used informally by practitioners of computer-assisted reporting for decades, the first recorded use by a major news organization is The Guardian, which launched its Datablog in March 2009. And although the paternity of the term is disputed, it has been widely used since WikiLeaks' Afghan War documents leak in July 2010.
The Guardian's coverage of the war logs took advantage of free data visualization tools such as Google Fusion Tables, another common aspect of data journalism. Facts are Sacred by The Guardian Datablog editor Simon Rogers describes data journalism like this: