I want to archive an external website in the Wayback Machine. You can do that by uploading a Google Sheet with links to all the pages you want to archive. How can I, either on a Mac or with some web service (I assume), crawl a site and extract its links to a text file?
I have googled and looked on GitHub for tools, and what I have found mostly falls into two categories:
- Extract all links from a single web page (roughly the kind of thing sketched below).
- Spider and then download a whole site, including all HTML, images, etc.
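The single-page variety is more or less this: a minimal sketch, assuming Python 3 with `requests` and `beautifulsoup4` installed, and with the URL as a placeholder:

```python
# Minimal sketch: extract all links from ONE page (the first category above).
# Assumes: pip install requests beautifulsoup4
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://example.com/"  # placeholder
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Resolve relative hrefs against the page URL and print one link per line.
for a in soup.find_all("a", href=True):
    print(urljoin(url, a["href"]))
```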
Of course, you could patch something together from these components, but I think there should be ready-made solutions that are much more robust than anything I could hack together in a reasonable time frame.
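For what it's worth, by "patch something together" I mean a small breadth-first crawler along these lines (again just a sketch, assuming `requests` and `beautifulsoup4`; the start URL and output file name are placeholders):

```python
# Sketch of a hacked-together crawler: stay on one domain, collect every
# page URL it can reach, and write the list to a text file.
# Assumes: pip install requests beautifulsoup4
from collections import deque
from urllib.parse import urljoin, urldefrag, urlparse

import requests
from bs4 import BeautifulSoup

start_url = "https://example.com/"   # placeholder
output_file = "links.txt"            # placeholder
domain = urlparse(start_url).netloc

seen = {start_url}
queue = deque([start_url])

while queue:
    url = queue.popleft()
    try:
        resp = requests.get(url, timeout=30)
    except requests.RequestException:
        continue
    # Only parse HTML responses; skip images, PDFs, etc.
    if "text/html" not in resp.headers.get("Content-Type", ""):
        continue
    soup = BeautifulSoup(resp.text, "html.parser")
    for a in soup.find_all("a", href=True):
        link, _ = urldefrag(urljoin(url, a["href"]))  # drop #fragments
        if urlparse(link).netloc == domain and link not in seen:
            seen.add(link)
            queue.append(link)

with open(output_file, "w") as f:
    f.write("\n".join(sorted(seen)))
```

Even a script like this misses JavaScript-rendered links, redirects, canonical URLs and so on, which is exactly why I would rather use a proper, ready-made tool.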