
The Internet Archive uses the PetaBox, a storage system built to hold and serve petabytes of data. A PetaBox can store and process one petabyte (one million gigabytes) of data and draws relatively little power: about 6 kW per rack and 60 kW for the entire storage cluster. The design is also shipping-container friendly, so it can run inside a 20' by 8' by 8' shipping container[1].
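The figures in this paragraph can be cross-checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the rack count is an inference from the two quoted power figures rather than a number stated in the source, and the watts-per-terabyte value assumes the 60 kW cluster is the one that holds the quoted one petabyte.

```python
# Figures quoted in the text; everything derived from them is an inference.
GB_PER_PB = 1_000_000        # decimal units: 1 PB = one million GB
RACK_POWER_KW = 6            # quoted power draw per rack
CLUSTER_POWER_KW = 60        # quoted power draw for the whole storage cluster
CLUSTER_CAPACITY_PB = 1      # assumption: the 60 kW cluster holds one petabyte


def implied_rack_count(cluster_kw: float, rack_kw: float) -> float:
    """Number of racks implied by dividing cluster power by per-rack power."""
    return cluster_kw / rack_kw


def watts_per_terabyte(cluster_kw: float, capacity_pb: float) -> float:
    """Power draw per terabyte stored, using decimal units (1 PB = 1000 TB)."""
    terabytes = capacity_pb * 1_000
    return cluster_kw * 1_000 / terabytes


if __name__ == "__main__":
    print("implied racks:", implied_rack_count(CLUSTER_POWER_KW, RACK_POWER_KW))
    print("watts per TB:", watts_per_terabyte(CLUSTER_POWER_KW, CLUSTER_CAPACITY_PB))
```

Under these assumptions the quoted numbers imply a cluster of ten racks drawing roughly 60 W per terabyte stored.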
The first PetaBox became operational in Amsterdam in June 2004, and by 2007 the Internet Archive's data center housed approximately three petabytes of PetaBox storage. Later generations increased density considerably, reaching up to 480 TB of raw storage per PetaBox rack[1].
As of December 2021, the Internet Archive's storage system consisted of four data centers with 745 nodes and approximately 28,000 spinning disks. The Wayback Machine, a service provided by the Internet Archive, held over 57 petabytes of data, and the other collections held a further 42 petabytes, for a total of 99 petabytes of unique data. Because content is backed up in multiple locations, total storage in use came to 212 petabytes[1].
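The December 2021 totals can be checked, and a few rough ratios derived from them, with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the "copies" figure is simply total storage divided by unique data rather than an official replication factor, and the per-disk figure is data stored per disk, not disk capacity.

```python
# December 2021 figures quoted in the text (petabytes, decimal units).
WAYBACK_PB = 57              # Wayback Machine (the text says "over 57")
OTHER_COLLECTIONS_PB = 42    # all other collections
TOTAL_STORED_PB = 212        # total including backup copies
NODES = 745
DISKS = 28_000               # approximate count of spinning disks


def unique_data_pb(wayback_pb: float, other_pb: float) -> float:
    """Unique data is the sum of the Wayback Machine and the other collections."""
    return wayback_pb + other_pb


def average_copies(total_pb: float, unique_pb: float) -> float:
    """Average number of stored copies per unique byte (a derived ratio)."""
    return total_pb / unique_pb


def stored_tb_per_disk(total_pb: float, disks: int) -> float:
    """Average data stored per disk in terabytes (not the disk's capacity)."""
    return total_pb * 1_000 / disks


if __name__ == "__main__":
    unique = unique_data_pb(WAYBACK_PB, OTHER_COLLECTIONS_PB)
    print("unique PB:", unique)
    print("average copies:", round(average_copies(TOTAL_STORED_PB, unique), 2))
    print("disks per node:", round(DISKS / NODES, 1))
    print("stored TB per disk:", round(stored_tb_per_disk(TOTAL_STORED_PB, DISKS), 2))
```

The sum confirms the 99 petabytes of unique data, and the ratio suggests each item is held a little more than twice on average.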
The data is stored on a large cluster of Linux nodes, and the collection is regularly extended through web crawling so that internet content is preserved over time. The Internet Archive's mission of 'universal access to all knowledge' drives its efforts to archive historical web pages and other digital content[4].