Monday 12 October 2009

Archiving the web: a record of history

Anyone visiting the Amazon website last week would have been struck by a letter on the home page from Jeff Bezos informing UK shoppers that they could now purchase Amazon's e-reader, the Kindle, for shipping to the UK.

Cue a flurry of commentary about the death of books and the demise of culture.

Inevitably we are moving towards a world of digital content. Blu-ray may well be the last ever 'physical' entertainment media format. But as we move away from information stored on physical formats (CDs, DVDs, paper, books) to data stored on the internet, how, in the future, are we going to look back and understand the state of knowledge at a particular time in history?

The written word, stored on parchment and paper and filling libraries and archives, has always provided historians with an evolving and largely permanent record of human history. Digital content, in contrast, is more fluid and more concerned with the state of information at the present time. New content trumps old content. Older information on the web is largely confined to out-of-date blogs and websites rather than preserved through any systematic approach to archiving.

To put this into context, the journal Science has found that 13% of Internet references in scholarly articles were inactive after only 27 months.

However, since 1996 a non-profit organisation called the Internet Archive has been archiving digital content with the goal of building an 'internet library'. It has archived over 150 billion web pages and hundreds of thousands of moving images, live music files, audio and document texts. Its "Wayback Machine" is a useful tool that provides a snapshot of selected major sites over the years. Here is Apple.com back in 1997.

The Internet Archive has also partnered with eleven national libraries (Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway, Sweden, the British Library and the US Library of Congress) to create the International Internet Preservation Consortium (IIPC). The mission of the IIPC is to acquire, preserve and make accessible knowledge and information from the Internet for future generations.

However, no single project can ever hope to archive the entire web. The approach to preserving digital content may not be a major concern today because the web is simply too new. But as the internet becomes the de facto reference point for all human knowledge, it will become critically important.
