Libraries begin UK digital harvesting project

April 5, 2013 | 10:49

Tags: #archiveorg #ebooks #internet #libraries #library #world-wide-web

Companies: #british-library #internet-archive #research

The British Library has announced a project to archive blogs, eBooks and all .uk domain content as a means of preserving digital data for future generations.

Thanks to new regulations which come into force on Saturday, the British Library has been given the legal go-ahead to start archiving electronic content in much the same way as it holds copies of every physical book, newspaper and magazine ever officially published in the UK. In collaboration with the National Library of Scotland, the National Library of Wales, the Bodleian Libraries, Cambridge University Library and Trinity College Library Dublin, copies of all electronic publications made in the UK - including eBooks, blog posts, and potentially even content from UK-based social networking services - will be stored for generations to come.

Part of an extension to the 'legal deposit' regulations, the project aims to ensure that ephemera like websites can be preserved for future research. 'Legal deposit arrangements remain vitally important,' claimed culture minister Ed Vaizey MP at the announcement. 'Preserving and maintaining a record of everything that has been published provides a priceless resource for the researchers of today and the future. So it’s right that these long-standing arrangements have now been brought up to date for the 21st century, covering the UK’s digital publications for the first time. The Joint Committee on Legal Deposit has worked very successfully in creating practical policies and processes so that digital content can now be effectively archived and our academic and literary heritage preserved, in whatever form it takes.'

The new regulations will allow the named libraries to receive and store copies of all electronic publications made either online or in CD-ROM, DVD-ROM or other 'offline' formats, along with an initial archive of 4.8 million websites from the .uk top-level domain (TLD). These materials will be made available for viewing at specialised reading rooms located at each of the libraries - although only recognised researchers will be granted access at first, while the libraries involved work out the technical details of providing access to such a large corpus of digital data.

'Ten years ago, there was a very real danger of a black hole opening up and swallowing our digital heritage, with millions of web pages, e-publications and other non-print items falling through the cracks of a system that was devised primarily to capture ink and paper,' claimed Roly Keating, chief executive of the British Library, in support of the project. 'The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve – collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats.'

The process of spidering and mirroring the chosen sites from the .uk TLD will take until the end of the year, the British Library has stated, after which the content will be made available to researchers.

While the move to storing digital content is a boon for the British libraries recognised in the Legal Deposit Libraries Act, it's far from the first project to create a cache of digital data: the Internet Archive is a non-profit organisation which has been providing access to digital and digitised analogue content for many years, and operates the Wayback Machine for accessing stored website content from any available historical period.

More information on the content covered under legal deposit regulations can be found on the British Library website.