
How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list? (248 votes)

Tags: html, http, get, download, wget

There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories it downloads the index.html file, which contains the list of files in that directory, without downloading the files themselves.

Is there a way to download the sub-directories and files without a depth limit (as if the directory I want to download were just a folder that I want to copy to my computer)?


asked May 3 '14 at 15:54 by Omar (4,259 rep); edited Oct 22 '18 at 3:46 by leiyc (871 rep)

This answer worked wonderfully for me: stackoverflow.com/a/61796867/316343 – Jahan May 8 at 18:40


8 Answers (ordered by votes)

Solution (458 votes):

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

Explanation:

It will download all files and subfolders in the ddd directory:

-r : recurse into subdirectories

-np : do not ascend to parent directories, like ccc/…

-nH : do not create a folder named after the hostname

--cut-dirs=3 : save everything directly under ddd by omitting the first 3 path components aaa, bbb, ccc

-R index.html : exclude the index.html files

Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
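
To make the path handling concrete, here is a hypothetical mapping (the file name is made up, but it follows the same URL pattern): with -nH stripping the hostname and --cut-dirs=3 stripping aaa/bbb/ccc, a file listed at

http://hostname/aaa/bbb/ccc/ddd/eee/file.txt

would be saved locally as

./ddd/eee/file.txt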


answered Oct 9 '14 at 3:17 by Mingjiang Shi (6,433 rep); edited Oct 4 '18 at 3:09 by gibbone (1,492 rep)

23 Thank you! Also, FYI according to this you can use -R like -R css to exclude all CSS files, or use -A like -A pdf to
only download PDF files. – John Apr 13 '15 at 20:52
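
For example, a variant of the accepted command that keeps only PDF files might look like this (the URL is the same placeholder as above):

wget -r -np -nH --cut-dirs=3 -A pdf http://hostname/aaa/bbb/ccc/ddd/

During a recursive run wget typically still fetches the HTML listings so it can follow links, then deletes them afterwards when they do not match the accept list.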

13 Thanks! Additional advice taken from the wget man page:

When downloading from Internet servers, consider using the ‘-w’ option to introduce a delay between accesses to the server. The download will take a while longer, but the server administrator will not be alarmed by your rudeness.

– jgrump2012 Jul 8 '16 at 16:26
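
Applied to the accepted answer, that advice might look like the following; the two-second delay is only an example value, and --random-wait additionally varies it so the requests look less mechanical:

wget -r -np -nH --cut-dirs=3 -R index.html -w 2 --random-wait http://hostname/aaa/bbb/ccc/ddd/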

4 I get this error 'wget' is not recognized as an internal or external command, operable program or batch file. – hamish Mar
5 '17 at 1:42

1 @hamish you may need to install wget first, or wget is not in your $PATH. – Mingjiang Shi Mar 7 '17 at 3:30

23 Great answer, but note that if there is a robots.txt file disallowing the downloading of files in the directory, this won't work. In that case you need to add -e robots=off. See unix.stackexchange.com/a/252564/10312 – Daniel Hershcovich Apr 16 '18 at 11:02
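
With that addition, the full command from the accepted answer would become, for example:

wget -r -np -nH --cut-dirs=3 -R index.html -e robots=off http://hostname/aaa/bbb/ccc/ddd/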


(59 votes) I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image).

Also found that the -no-parent flag is important, otherwise it will try to download everything.

answered Jun 16 '14 at 15:56 by mateuscb (8,540 rep); edited Oct 23 '15 at 15:16
3 Just found this - Dec 2017. It works fine. I got it at sourceforge.net/projects/visualwget – SDsolar Dec 9 '17 at 7:02

2 Worked fine on a Windows machine; don't forget to check the options mentioned in the answer, else it won't work – coder3521 Dec 28 '17 at 8:50

Doesn't work with certain HTTPS sites. @DaveLucre if you tried the wget solution in cmd you would be able to download as well, but some servers do not allow it, I guess – Yannis Dran May 4 '19 at 3:02

What does checking --no-parent do? – T.Todua Aug 8 '19 at 11:33

1 Working in March 2020! – Mr Programmer Mar 11 '20 at 18:00


(13 votes)

wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/

From man wget:

‘-r’ ‘--recursive’ Turn on recursive retrieving. See Recursive Download, for more details. The default
maximum depth is 5.

‘-np’ ‘--no-parent’ Do not ever ascend to the parent directory when retrieving recursively. This is a
useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
See Directory-Based Limits, for more details.

‘-nH’ ‘--no-host-directories’ Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

‘--cut-dirs=number’ Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.

Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how the ‘--cut-dirs’ option works.

No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...

If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
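
Since the question asks for a download with no depth limit and, as quoted above, -r stops at 5 levels by default, it may be worth adding -l inf (--level=inf) to lift that limit; for example, reusing the placeholder URL from the accepted answer:

wget -r -l inf -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/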

answered Jun 19 '17 at 20:36 by Natalie Ng (141 rep); edited Aug 31 '17 at 17:06 by Ryan R (8,012 rep)

3 Some explanations would be great. – Benoît Latinier Jun 19 '17 at 20:47

What about downloading a specific file type using VisualWget? Is it possible to download only mp3 files in a directory and
its sub-directories in VisualWget? – user5871859 May 30 '20 at 7:15


(13 votes) You can use lftp, the Swiss army knife of downloading. If you have bigger files you can add --use-pget-n=10 to the command:

lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
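
Combining the two suggestions above, a command for a directory with large files might look like this (the URL is the same example placeholder, and the parallel and segment counts are arbitrary):

lftp -c 'mirror --parallel=10 --use-pget-n=10 https://example.com/files/ ;exit'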


answered May 14 '20 at 12:02 by nwgat (533 rep)

1 Worked perfectly and really fast; this maxed out my internet line downloading thousands of small files. Very good. – n13 Jun 27 '20 at 19:47

Explain what these parameters do, please – leetbacoon Nov 26 '20 at 8:37

-c = continue, mirror = mirrors content locally, parallel=100 = downloads 100 files in parallel, ;exit = exits the program, use-pget = splits bigger files into segments and downloads them in parallel – nwgat Dec 17 '20 at 6:55

I had issues with this command. Some videos I was trying to download were broken. If I download them normally and
individually from the browser it works perfectly. – Hassen Ch. Dec 30 '20 at 13:12

The most voted solution has no problem with any file. All good! – Hassen Ch. Dec 30 '20 at 13:34


(5 votes) No software or plugin required!

(Only usable if you don't need recursive depth.)

Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste in this code:

(function(){ var arr=[], l=document.links; var ext=prompt("select extension for dow


Then go to the page (from where you want to download files) and click that bookmarklet.


answered Jan 20 '18 at 16:13 by T.Todua (46.4k rep); edited Aug 8 '19 at 11:35

Does this open the save as dialog for every file? – akkk3 Jun 9 at 22:59


(4 votes) wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.

There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:

"Download Master" is an extension for Google Chrome that works great for downloading from
directories. You can choose to filter which file-types to download, or download the entire directory.

https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce

For an up-to-date feature list and other information, visit the project page on the developer's blog:

http://monadownloadmaster.blogspot.com/


answered Feb 21 '16 at 0:04 by Moscarda (343 rep); edited May 25 '16 at 15:42 by Peter (2,257 rep)

(3 votes) You can use this Firefox addon to download all files in an HTTP directory:

https://addons.mozilla.org/en-US/firefox/addon/http-directory-downloader/


answered Mar 6 '19 at 7:09 by Rushikesh Tade (435 rep)
(1 vote) wget generally works in this way, but some sites may have problems and it may create too many unnecessary html files. In order to make this work easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder entered as a parameter.

When you try to download an open web folder with wget which contains more than one file, wget downloads a file named index.html. This file contains a file list of the web folder. My script converts the file names written in the index.html file to web addresses and downloads them with wget.

Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.

Usage:

Extract the getwebfolder file from the zip file provided below

chmod +x getwebfolder (only for the first time)

./getwebfolder webfolder_URL

such as ./getwebfolder http://example.com/example_folder/
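
For readers who just want the idea without downloading a zip, here is a rough shell sketch of the same approach. It is not the author's script; the parsing is simplified and assumes the server's index.html listing uses plain relative href links:

#!/bin/sh
# getwebfolder-style sketch (not the original script).
# Usage: ./getwebfolder.sh http://example.com/example_folder/
url="$1"

# Fetch the directory listing the server serves as index.html.
wget -q -O listing.html "$url"

# Extract href targets, drop parent-directory, sort, and absolute links,
# then download each remaining file into the current directory.
grep -o 'href="[^"]*"' listing.html \
  | sed 's/^href="//; s/"$//' \
  | grep -v -e '^\.\./' -e '^?' -e '^/' \
  | while read -r name; do
      wget "$url$name"
    done

rm -f listing.html

Entries that are themselves directories (names ending in /) would need a recursive call; the accepted wget answer handles that case more robustly.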

Download Link

Details on blog


answered Feb 10 '19 at 13:11 by Byte Bitter (21 rep); edited Feb 10 '19 at 14:43
