Samuel Goebert


The Internet is changing from a web of documents to a web of data. Open-data collections like Wikipedia, Internet Archive, Stack Exchange and OpenStreetMap have become important sources of global knowledge. The data are freely available and everybody is invited to contribute. Digital collections are also preserved by entities not affiliated with the original initiatives, which involves copying a collection's content and metadata to new storage locations. When a copied collection is retrieved from an untrusted location, the authenticity of its data has to be revalidated. Since budgets are limited, novel ways of acquiring storage space have to be found. Storing and validating data safely, without having to own and control the storage location, makes this possible. This thesis develops a protocol for decentralised hosting of digital collections. The result is a formalised, decentralised mode of discovering, curating and hosting datasets that retains authenticity even at untrusted storage locations. Entities not affiliated with the original initiative can donate storage space and bandwidth, while the public is ensured long-term access to the authentic collection. The protocol leverages the BitTorrent protocol and a variation of the blockchain protocol, and is backwards compatible with existing web application architecture. This novel approach is validated through a proof-of-concept prototype. A series of test scenarios illustrates how a decentralised collection behaves when multiple participants are involved. The results support a decentralised hosting approach for digital collections that leverages storage locations supervised by entities not affiliated with the original initiative. The thesis concludes with a detailed summary of its contributions to the field and suggests further areas of study in the context of distributed preservation.
