I have a Google Code project which has a lot of wiki'ed documentation. I would like to create a copy of this documentation for offline browsing. I would like to use wget or a similar utility.

I have tried the following:

$ wget --no-parent \
       --recursive \
       --page-requisites \
       --html-extension \
       --base="http://code.google.com/p/myProject/" \
       "http://code.google.com/p/myProject/"

The problem is that links within the mirrored copy point to URLs like:

file:///p/myProject/documentName

Rewriting the links this way causes 404 (not found) errors, since they point to nowhere valid on the filesystem.

What options should I use instead with wget, so that I can make a local copy of the site's documentation and other pages?

2 Answers

If the URL looks like:

https://code.google.com/p/projectName/downloads/detail?name=yourFILE.tar.gz

Turn it into:

$ wget https://projectName.googlecode.com/files/yourFILE.tar.gz

This works fine for me.
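
If you have several download URLs to convert, the rewrite can be scripted. A minimal sketch, assuming the placeholder names projectName and yourFILE.tar.gz from above and standard POSIX shell tools:

$ url="https://code.google.com/p/projectName/downloads/detail?name=yourFILE.tar.gz"
$ # pull the project name out of the /p/<project>/ path segment
$ project=$(printf '%s\n' "$url" | sed 's|.*/p/\([^/]*\)/.*|\1|')
$ # the file name is whatever follows ?name=
$ file="${url##*name=}"
$ wget "https://${project}.googlecode.com/files/${file}"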

After lots of playing around, I managed to get the following to work for me:

$ wget --no-parent \
       --recursive \
       --page-requisites \
       --html-extension \
       --convert-links \
       -E -l 3 \
       http://code.google.com/p/myProject/

The result is now entirely self-contained.
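
For reference, here is the same command written out with only long options. This is just a sketch of an equivalent invocation: -E is the short form of --html-extension (spelled --adjust-extension in newer wget releases), so it can be dropped when the long option is already present, and -l 3 is the same as --level=3:

$ wget --no-parent \
       --recursive \
       --level=3 \
       --page-requisites \
       --html-extension \
       --convert-links \
       http://code.google.com/p/myProject/

The mirror lands under a code.google.com/p/myProject/ directory; open its index.html in a browser to confirm that the internal wiki links now resolve locally.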