82

I am trying to download a full website directory using CURL. The following command does not work:

curl -LO http://example.com/

It returns an error: curl: Remote file name has no length!.

But when I do this: curl -LO http://example.com/someFile.type it works. Any idea how to download all files in the specified directory? Thanks.

Foo

8 Answers

105

This always works for me; include --no-parent and -r (recursive) to get only the desired directory.

 wget --no-parent -r http://WEBSITE.com/DIRECTORY
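
If you also want to avoid the extra directory levels wget creates locally, something like the following should work; -nH drops the hostname directory and --cut-dirs=1 drops the leading DIRECTORY component, so the files land in the current directory (adjust the number to your path depth):

wget --no-parent -r -nH --cut-dirs=1 http://WEBSITE.com/DIRECTORY/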
StanleyZheng
36

HTTP doesn't really have a notion of directories. The slashes other than the first three (http://example.com/) do not have any special meaning except with respect to .. in relative URLs. So unless the server follows a particular format, there's no way to “download all files in the specified directory”.

If you want to download the whole site, your best bet is to traverse all the links in the main page recursively. Curl can't do it, but wget can. This will work if the website is not too dynamic (in particular, wget won't see links that are constructed by JavaScript code). Start with wget -r http://example.com/, and look under “Recursive Retrieval Options” and “Recursive Accept/Reject Options” in the wget manual for more relevant options (recursion depth, exclusion lists, etc.).

If the website tries to block automated downloads, you may need to change the user agent string (-U Mozilla), and to ignore robots.txt (create an empty file example.com/robots.txt and use the -nc option so that wget doesn't try to download it from the server).
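
Putting those options together, a sketch of the kind of command described above might look like this (example.com, the depth limit, and the user agent string are placeholders to adjust):

mkdir -p example.com && touch example.com/robots.txt   # empty local robots.txt, so -nc stops wget from fetching the real one
wget -r -l 3 -U Mozilla -nc http://example.com/        # -l 3 limits the recursion depth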

25

In this case, curl is NOT the best tool. You can use wget with the -r argument, like this:

wget -r http://example.com/ 

This is the most basic form, and you can use additional arguments as well. For more information, see the manpage (man wget).
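
For example, a few commonly combined options, all documented in that manpage, are shown below (example.com is a placeholder):

wget -r -np -k -p http://example.com/
# -np (--no-parent)       stay below the starting directory
# -k  (--convert-links)   rewrite links so the copy browses locally
# -p  (--page-requisites) also fetch the images/CSS needed to render pages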

moroccan
8

This isn't possible. There is no standard, generally implemented, way for a web server to return the contents of a directory to you. Most servers do generate an HTML index of a directory, if configured to do so, but this output isn't standard, nor guaranteed by any means. You could parse this HTML, but keep in mind that the format will change from server to server, and won't always be enabled.
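
If you do decide to parse such an index, a rough sketch along the following lines can work against a simple Apache-style listing; the URL is a placeholder, and the grep/sed patterns will almost certainly need adjusting for other servers:

# extract hrefs from the index page, skip sort links and subdirectories, then fetch each file
curl -s http://example.com/dir/ \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//;s/"$//' \
  | grep -v -e '^?' -e '/$' \
  | while read -r f; do curl -LO "http://example.com/dir/$f"; done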

Brad
5

lftp -c mirror <url>

Obviously, you need to install lftp first.
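
For instance, assuming a hypothetical http://example.com/DIRECTORY/ listing, something like this should mirror it into a local DIRECTORY folder (--parallel is optional and just fetches several files at once):

lftp -c "open http://example.com/; mirror --parallel=4 DIRECTORY ./DIRECTORY"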

HappyFace
4

When you're downloading from a directory listing, add one more argument to wget, --reject, so that the auto-generated index pages are skipped.

wget --no-parent -r --reject "index.html*" "http://url"
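
Conversely, if you only want particular file types, wget's -A/--accept option takes a comma-separated list of patterns, for example (the URL and patterns are placeholders):

wget --no-parent -r --reject "index.html*" -A "*.pdf,*.zip" "http://url"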
LAamanni
3

You can use the Firefox extension DownThemAll! It will let you download all the files in a directory in one click. It is also customizable and you can specify what file types to download. This is the easiest way I have found.

Asdf
1

You might find a website ripper useful here; it will download everything and modify the contents/internal links for local use. A good one can be found here: http://www.httrack.com
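
HTTrack also has a command-line interface; a minimal invocation looks roughly like this (the URL and output directory are placeholders):

httrack "http://example.com/DIRECTORY/" -O ./mirror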