Download Data with wget Guide

This document provides instructions for downloading data files from an HTTP service using wget: 1. Install wget and set up authorization with Earthdata Login credentials and cookies files. 2. Get the HTTP URL for the desired data files, such as MERRA-2 monthly files from 1981. 3. Download single files or multiple files recursively with wget commands specifying authorization options and file type filters.

Uploaded by

Henry Ezeilo

How to Download Data Files from HTTP Service with wget

Overview:
Note: this is extracted from >> http://disc.sci.gsfc.nasa.gov/recipes/?q=recipes/How-toDownload-Data-Files-from-HTTP-Service-with-wget
This data recipe shows an example of downloading data files from an HTTP service at GES
DISC with the GNU wget command. GNU wget is free software for non-interactive
downloading of files from the Web. It is a command-line tool available for Linux,
Windows, Mac OS X, and other operating systems.
Best When:
You want to script the download of multiple data files.
Task:
Obtaining Data
Example:
Download MERRA-2 Monthly data files for 1981.
Time to complete the following procedures: 10 minutes

This data recipe has been tested on: Linux (wget version 1.12) and Mac OS X (wget
version 1.17.1).
Procedure:
1. Install wget
Skip this step if you already have wget installed.
Download wget: https://www.gnu.org/software/wget/

2. Authorize NASA GESDISC DATA ARCHIVE Data Access and set up cookies
Starting August 1, 2016, access to GES DISC data requires all users to be registered with
Earthdata Login and then to authorize NASA GESDISC DATA ARCHIVE Data Access by
following these instructions:
How to Register a New User in Earthdata Login
How to Authorize NASA GESDISC DATA ARCHIVE Data Access in Earthdata
Login

To run wget, you need to set up .netrc and create a cookie file:

Create a .netrc file in your home directory.


a. cd ~ or cd $HOME
b. touch .netrc
c. echo "machine urs.earthdata.nasa.gov login <uid> password <password>" >> .netrc
where <uid> is your Earthdata Login user name and <password> is your password
d. chmod 0600 .netrc (so that only you can read and write it)

Create a cookie file. This file will be used to persist sessions across calls to wget or curl.
For example:
a. cd ~ or cd $HOME
b. touch .urs_cookies
Please read more regarding user registration and data access at:
http://disc.sci.gsfc.nasa.gov/registration
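The setup steps above can be collected into a single shell sketch. This is a minimal sketch assuming a POSIX shell; YOUR_UID and YOUR_PASSWORD are placeholders for your own Earthdata Login credentials, and it uses a scratch HOME so it can be tried safely before you point it at your real home directory.

```shell
# Minimal sketch of step 2: create .netrc and the cookie file.
# YOUR_UID / YOUR_PASSWORD are placeholders -- substitute your own
# Earthdata Login credentials. A scratch HOME is used here so the
# sketch can be run safely; drop the first line to use your real home.
export HOME="$(mktemp -d)"
cd "$HOME"
echo "machine urs.earthdata.nasa.gov login YOUR_UID password YOUR_PASSWORD" >> .netrc
chmod 0600 .netrc            # readable and writable only by you
touch .urs_cookies           # empty cookie jar for wget/curl sessions
ls -l .netrc .urs_cookies    # verify both files exist
```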

3. Get the HTTP URL


For example, for MERRA-2 monthly product, MERRA-2 tavgM_2d_slv_Nx: 2d,Monthly
mean,Time-Averaged,Single-Level,Assimilation,Single-Level Diagnostics
(M2TMNXSLV.5.12.4), year 1981, the HTTP URL is:
http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981

Preview the list of data files by opening the URL with any Web browser.

4. List data files (Optional step)


The following is similar to the ftp ls function.
wget -q -nH -nd <URL> -O - | grep <filename_pattern> | cut -f4 -d\"
Where,

<URL>: URL of the directory containing data files of interest


<filename_pattern>: pattern of the filename. The pattern can be found by previewing the
data files with a Web browser.

In this example, we use filename_pattern=MERRA2_100:

wget -q -nH -nd http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/ -O - | grep MERRA2_100 | cut -f4 -d\"

Note: On Mac OS X (or any Unix system that has the curl command available), listing data files
can be done with curl by substituting 'curl -s' for 'wget -q -nH -nd' and omitting '-O -'. For
example:
curl -s http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/ | grep MERRA2_100 | cut -f4 -d\"
Since curl cannot download recursively, wget or a download manager may
work better for multi-file downloads.

5. Download Data Files

Download one data file:


wget <auth> <URL_file>
where,
<auth> : authorization options, e.g.:
--load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies
<URL_file>: URL of a data file
For example:

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/MERRA2_100.tavgM_2d_slv_Nx.198101.nc4

Download multiple files with recursive function:


The following is similar to the ftp mget function.
wget <auth> -r -c -nH -nd -np -A <acclist> <URL>
where,
<auth>: authorization options, e.g.:
--load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies
<acclist>: filename suffixes or patterns of the data files, e.g., nc4, nc, hdf5, xml
<URL>: URL of the directory containing data files of interest
The Recursive Accept/Reject Options enable specifying comma-separated lists of file
name suffixes or patterns to accept or reject. Read more in the Discussion section.

For example,
To download all data and metadata files in the directory:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -r -c -nH -nd -np -A nc4,xml "http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/"
To download only data files in the directory:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -r -c -nH -nd -np -A nc4 "http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/"
To download part of the data files in the directory (from Oct 1981 to Dec 1981):
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -r -c -nH -nd -np -A '*19811*nc4' "http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/"
or
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -r -c -nH -nd -np -A '*19811*nc4' "http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/"

Download multiple files using a text file containing a list of URLs:


Users who already have a list of URLs saved in a file on their workstation can simply issue the
following command, using wget 1.14 (or higher):
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on
--keep-session-cookies -i myfile.dat
where "myfile.dat" is the name of the file containing the list of URLs.
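As a concrete sketch of this approach, the snippet below writes two illustrative URLs from this recipe's MERRA-2 1981 directory into myfile.dat; the actual download command is shown commented out, since it needs the credentials from step 2. Adjust the list to the files you actually need.

```shell
# Sketch: build a URL list file and feed it to wget -i.
# The two URLs are illustrative entries from the MERRA-2 1981
# directory used in this recipe; edit the list as needed.
cat > myfile.dat <<'EOF'
http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/MERRA2_100.tavgM_2d_slv_Nx.198101.nc4
http://goldsmr4.sci.gsfc.nasa.gov/data/MERRA2_MONTHLY/M2TMNXSLV.5.12.4/1981/MERRA2_100.tavgM_2d_slv_Nx.198102.nc4
EOF
wc -l < myfile.dat   # 2 URLs queued
# Requires the .netrc/cookie setup from step 2:
# wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies \
#      --auth-no-challenge=on --keep-session-cookies -i myfile.dat
```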
Discussion:
The Recursive Accept/Reject Options:
-A acclist --accept acclist
-R rejlist --reject rejlist
These options enable specifying comma-separated lists of file name suffixes or patterns to accept
or reject. Note that if any of the wildcard characters, *, ?, [ or ], appear in an element
of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to
enclose the pattern into quotes to prevent your shell from expanding it, like in -A "*.mp3"
or -A '*.mp3'.
Read more options from the software manual.
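Because wget's -A patterns use shell-style wildcards, an accept pattern can be tried out locally with shell case matching before running a recursive download. A small sketch using the '*19811*nc4' pattern from the example above (the file names are illustrative):

```shell
# Sketch: check locally which filenames the accept pattern
# '*19811*nc4' would match -- no network access needed.
# Month codes 198110-198112 match; 198101-198109 do not.
for f in MERRA2_100.tavgM_2d_slv_Nx.198101.nc4 \
         MERRA2_100.tavgM_2d_slv_Nx.198110.nc4 \
         MERRA2_100.tavgM_2d_slv_Nx.198112.nc4; do
  case "$f" in
    *19811*nc4) echo "accept $f" ;;
    *)          echo "reject $f" ;;
  esac
done
```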
