Where I discover the WARC format to archive websites.
Created: Nov 08, 2017 by Pradeep Gowda
On HN, I read about the WARC format.
WARC, standardized as ISO 28500:2009, Information and documentation –
WARC file format. Developed under the auspices of the International
Internet Preservation Consortium. WARC was developed as an extension to
ARC in part to provide better capabilities for managing Web archives for
the long term, allowing for capture of more metadata about the
circumstances of archiving. WARC files are often compressed using gzip,
resulting in a .warc.gz extension.
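To get a feel for what ends up inside a .warc.gz file, here is a minimal sketch in Python using only the standard library. It assumes the WARC/1.0 record layout (a version line, named header fields, a blank line, the payload, then two CRLFs) and the common convention of gzipping each record as its own member. The function names `build_warc_record` and `gzip_records` and the example.com URL are my own placeholders, not part of any existing tool.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(uri: str, payload: bytes, content_type: str = "text/html") -> bytes:
    """Serialize one WARC 'resource' record: header block, blank line, payload."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                    # version line
        "WARC-Type: resource",                         # a stored resource, not an HTTP exchange
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique ID for this record
        f"WARC-Date: {now}",                           # capture time in UTC
        f"WARC-Target-URI: {uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # length of the payload in bytes
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after the payload.
    return head + payload + b"\r\n\r\n"


def gzip_records(records) -> bytes:
    """Compress each record as a separate gzip member and concatenate them."""
    return b"".join(gzip.compress(record) for record in records)


if __name__ == "__main__":
    # Hypothetical page content; a real archive would use the actual site files.
    record = build_warc_record("https://example.com/", b"<h1>Hello</h1>")
    with open("site.warc.gz", "wb") as out:
        out.write(gzip_records([record]))
```

Compressing record by record (rather than the whole file at once) keeps the result a valid gzip stream while letting a reader seek to a single record, which is why I assumed that layout here.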
I have always thought of publishing my website as an archive for
posterity, but wondered what would be a good way to achieve that. The
questions I had were: where can I store this archive (can I
send it to archive.org?), and why would they consider storing it? Maybe I
should make a contribution to help them with costs.
Looking into the links on that Wikipedia page seems like a good
Both of the above tools (in Python) are good candidates to be rewritten
in something like D or Rust as a nice programming exercise!