The Internet Archive saves nine million Wikipedia references

October 3, 2018 | 10:52

Tags: #archiving #iabot #mark-graham #maximilian-doerr #preservation #stephen-balbach #the-wayback-machine

Companies: #the-internet-archive #wikipedia

The Internet Archive, a not-for-profit project with no lesser aim than the digitisation of all the world's media, has announced a milestone achievement: the restoration of more than nine million broken references on the collaborative encyclopedia Wikipedia.

The Internet Archive has been making great strides in both preserving and providing access to otherwise unavailable media: from vintage computing and arcade games playable directly in the browser to the Wayback Machine, which automatically saves copies of websites and which recently expanded its crawling efforts, the site provides a home for content that would otherwise have been lost to the ages.

This week, the Internet Archive's Mark Graham announced a milestone for the Wayback Machine: the restoration of reference content for more than nine million URLs that were used as citations on the Wikipedia collaborative encyclopedia project but have since been deleted or made unavailable, leaving the citations impossible to verify.

'For more than five years, the Internet Archive has been archiving nearly every URL referenced in close to 300 wikipedia sites as soon as those links are added or changed at the rate of about 20 million URLs/week,' Graham explains. 'And for the past three years, we have been running a software robot called IABot on 22 Wikipedia language editions looking for broken links (URLs that return a "404", or "Page Not Found"). When broken links are discovered, IABot searches for archives in the Wayback Machine and other web archives to replace them with. Restoring links ensures Wikipedia remains accurate and verifiable and thus meets one of Wikipedia’s three core content policies: "Verifiability."'
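The lookup step Graham describes, finding an archived copy for a dead link, can be sketched against the Wayback Machine's public availability endpoint. This is a minimal illustration and not IABot's own code: the function names are hypothetical, and it assumes the endpoint's JSON response has the `archived_snapshots` / `closest` shape shown in the comments.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is an optional YYYYMMDD[hhmmss] string; both helper names
    in this sketch are hypothetical, chosen for illustration only.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived URL from a response payload, or None.

    Assumed response shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_replacement(url: str, timestamp: Optional[str] = None) -> Optional[str]:
    """Query the endpoint over the network for a replacement link."""
    with urlopen(availability_query(url, timestamp), timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

A link checker built along these lines would call `find_replacement` only for URLs it has already found to be dead, passing the date the citation was added as `timestamp` so that the snapshot reflects what the editor originally saw.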

The effort is partly automated, partly manual: Graham explains that around six million broken links have been repaired by IABot, while a further three million have been fixed manually by Wikipedia community members. The result is impressive: Wikipedia's internal traffic figures for its primary English site show that the saved pages are receiving more than 25,000 clicks per day, putting the Internet Archive at the very top of the external site chart by quite some margin.

Graham has confirmed that the project will continue, archiving additional external resources and checking more Wikipedia sites, while also extending the link-checking and repairing efforts, led by Maximilian Doerr and Stephen Balbach, to additional media.
