2006

Archiving & Preserving the Web

Kristine Hanna was the main speaker for this session, and Linda Freuh also contributed. Both are from the Internet Archive. The session opened with a brief outline of the history of the Internet Archive. Founded in 1996, it is a nonprofit organization dedicated to, well, archiving the Internet. It crawls two billion pages a month, plus other media files like audio clips. These snapshots are then stored and made available online. Currently the archive holds 55 billion pages from 55 million sites! To put this in perspective, Kristine estimated that, if printed out, the pages would reach to the moon and back 19 times. IA makes no distinction between what should be archived and what shouldn’t – the web is so ephemeral that they’re focused on just grabbing the data for now. All software used in the process is open source and developed from partnerships between IA…