Download websites with WebMirror
View the source code here: https://github.com/StephenHapp/WebMirror
This tool lets you download a copy of a website, or any portion
of one. You specify which web pages to start from and rules
for which links to follow, and WebMirror will download each
page it discovers, complete with images, videos, audio, and
other media. The downloaded pages are then rewritten to link to
each other locally, so the copy can be browsed offline and
shared easily. A downloaded site can also serve as an archive of
how the website appeared at the time it was mirrored.
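
The core crawl-and-rewrite idea can be sketched roughly as
follows. This is only an illustrative sketch, not WebMirror's
actual code: the function names, the same-host link rule, and
the flat file layout are assumptions chosen for brevity, and
media downloading is omitted.

```python
# Illustrative sketch only; not WebMirror's implementation.
from collections import deque
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def local_name(url: str) -> str:
    """Map a URL to a flat local filename (assumed layout)."""
    path = urlparse(url).path.strip("/") or "index"
    return path.replace("/", "_") + ".html"


def mirror(start_url: str, out_dir: str = "mirror", max_pages: int = 50) -> None:
    """Breadth-first crawl from start_url, saving each page and
    rewriting its links so the copy can be viewed offline."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    allowed_host = urlparse(start_url).netloc  # example rule: stay on one host
    queue, seen = deque([start_url]), {start_url}

    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        resp = requests.get(url, timeout=10)
        soup = BeautifulSoup(resp.text, "html.parser")

        for a in soup.find_all("a", href=True):
            target = urljoin(url, a["href"])
            if urlparse(target).netloc == allowed_host:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
                a["href"] = local_name(target)  # point link at the local copy

        (out / local_name(url)).write_text(str(soup), encoding="utf-8")


if __name__ == "__main__":
    mirror("https://example.com/")
```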
This project grew out of a narrower goal: I wanted to download
a portion of Wikipedia. Text-only archives of Wikipedia already
exist, but I think the images and other media are important
enough to be included in an archive too. I tried using HTTrack
for this, but ultimately decided to create my own tool that
could give me the precision I wanted.