Australian Web Archive

The Australian Web Archive (AWA) is a publicly available online database of archived Australian websites, hosted by the National Library of Australia (NLA) on its Trove platform, an online library database aggregator. It comprises the NLA's own PANDORA archive, the Australian Government Web Archive (AGWA) and the National Library of Australia's ".au" domain collections. Access is through a single public interface in Trove. The Australian Web Archive was created in March 2019, and is one of the largest web archives in the world. Its purpose is to provide a resource for historians and researchers, now and into the future.

History of the three components

The PANDORA service started archiving websites in October 1996.
In 2005, the NLA started archiving annual snapshots of the entire Australian web domain, collected via large crawl harvests. Later, the earliest websites from the .au web domain, dating back to 1996, were obtained from the Internet Archive. In 2019 this content was first made publicly accessible through Trove.
The PANDORA infrastructure, which works well for selective, small-scale archiving, does not adapt to large-scale "bulk harvesting" of web content. A new technical system therefore had to be developed: a web archiving service that integrates the delivery of archived websites within a live website interface, presenting them seamlessly to the user, which is technically difficult to achieve.

AGWA

Australian Government websites are Commonwealth records, and are therefore publications to be managed in accordance with the Archives Act 1983.
The Australian Government Web Archive consists of bulk archiving of Commonwealth Government websites. The NLA began regular harvests of the websites in June 2011. A significant obstacle had been overcome in May 2010, when an administrative agreement allowed the NLA to collect, preserve and make accessible government websites without having to seek prior permission for each website or document, as had previously been required. The service uses the Heritrix web crawler for harvesting, WARC files for storage and OpenWayback for delivery. The government publishes a huge amount of material, but preserving it poses many challenges, such as the sudden disappearance of content. In March 2014, the AGWA was made publicly accessible.
The AGWA meets the preservation and retention requirements for websites as "retain as national archives" material under the Archives Act; however, videos and document files are not always captured, so these must be managed separately.
As of early 2015, the AGWA's content, dating from 2005, amounted to about 144 million files occupying 15 terabytes. It included only Commonwealth Government websites, collected through bulk harvests of nearly 1000 seed URLs. A routine harvest schedule had not yet been established, but harvests were being conducted roughly three times per year.

Amalgamation

In 2017, the AGWA and the PANDORA archive were amalgamated with the other web archive collections to form the Trove web archive collection. After further development and the creation of the Australian Web Archive, government websites archived via AGWA and now included in the AWA can still be searched separately using the "Advanced Search" option.

Description of AWA

A web archive is described by the NLA as a "collection of snapshots of websites captured while they are accessible on the web, and then preserved in a static copy". The collection archived in the AWA is "relevant to the cultural, social, political, research and commercial life and activities of Australia and Australians". The AWA collects web material through scheduled archiving of selected websites and publications, as well as some ad hoc harvesting relating to significant events.
At its launch in March 2019, the AWA already contained around 600 terabytes of data, with 9 billion records. It offers more functionality than the Wayback Machine, hosted by the Internet Archive, allowing full-text searching using a search engine built in-house. The developers also devised techniques to filter out unwanted "noise". The data remains on the Library's servers, although a move to the cloud is envisaged as content grows. Usability for a wide range of users, and in particular the search functionality, was a major focus during development.
The archive is fully searchable, based on a combination of techniques used by the developers. The team created a unique and complex search algorithm by adapting a version of Google's page ranking algorithm, modified to favour better, high-quality resources. Other technologies include a Bayesian filter, a Not Safe For Work classifier from Yahoo, and machine learning.
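The NLA's modified ranking algorithm is not described in detail here. As a rough illustration of the link-based ranking idea it was adapted from, the following is a minimal sketch of textbook PageRank by power iteration in Python. The link graph, the page names and the damping factor of 0.85 are illustrative assumptions, not details of the Trove implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (not the NLA's modified variant).

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the same baseline share of rank.
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks


# Hypothetical link graph of three archived pages (assumed for illustration).
example_links = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home"],
}
example_ranks = pagerank(example_links)
```

In this hypothetical graph, "home" is linked to by both other pages and so ends up with the highest score. A production search engine would combine such a link-based score with text relevance and filtering signals.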
A "Limit to the gov.au web domain" option is available before searching, and government websites archived via AGWA can still be searched separately using the "Advanced Search" option. Other Advanced Search options limit results by the timespan of the snapshots, domain and file type.
With many of the earlier websites from the 1990s now lost, mainly because of frequent changes of web platforms, the Australian Web Archive is a significant initiative that will help to save current and future web pages, especially Australian content. Material will continue to be added to the Archive, and other online material will be collected in accordance with the National Library Act 1960, the legal deposit provisions of the Copyright Act 1968 and the NLA's digital collections selection policy.

Asia/Pacific websites

Websites in the Asia-Pacific region are not included in the AWA, but the NLA partners with the Internet Archive to collect and preserve "selected Asia/Pacific websites related to specific events or socio-political groups".
