This is wget.info, produced by makeinfo version 4.0 from wget.texi.

INFO-DIR-SECTION Net Utilities
INFO-DIR-SECTION World Wide Web
START-INFO-DIR-ENTRY
* Wget: (wget).         The non-interactive network downloader.
END-INFO-DIR-ENTRY

   This file documents the GNU Wget utility for downloading network
data.

   Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.

   Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

   Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.1 or any later version published by the Free Software
Foundation; with the Invariant Sections being "GNU General Public
License" and "GNU Free Documentation License", with no Front-Cover
Texts, and with no Back-Cover Texts.  A copy of the license is
included in the section entitled "GNU Free Documentation License".

File: wget.info, Node: Directory-Based Limits, Next: FTP Links, Prev: Types of Files, Up: Following Links

Directory-Based Limits
======================

   Regardless of other link-following facilities, it is often useful
to restrict which files are retrieved based on the directories those
files are placed in.  There can be many reasons for this--the home
pages may be organized in a reasonable directory structure, or some
directories may contain useless information, e.g. `/cgi-bin' or
`/dev' directories.

   Wget offers three different options to deal with this requirement.
Each option description lists a short name, a long name, and the
equivalent command in `.wgetrc'.

`-I LIST'
`--include LIST'
`include_directories = LIST'
     The `-I' option accepts a comma-separated list of directories to
     include in the retrieval.  Any other directories will simply be
     ignored.  The directories are absolute paths.

     So, if you wish to download from `http://host/people/bozo/',
     following only links to bozo's colleagues in the `/people'
     directory and the bogus scripts in `/cgi-bin', you can specify:

          wget -I /people,/cgi-bin http://host/people/bozo/

`-X LIST'
`--exclude LIST'
`exclude_directories = LIST'
     The `-X' option is exactly the reverse of `-I'--this is a list of
     directories _excluded_ from the download.  E.g. if you do not
     want Wget to download things from the `/cgi-bin' directory,
     specify `-X /cgi-bin' on the command line.

     As with `-A'/`-R', these two options can be combined for finer
     tuning of downloading subdirectories.  E.g. if you want to load
     all the files from the `/pub' hierarchy except for
     `/pub/worthless', specify `-I/pub -X/pub/worthless'.
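
     For instance, combining the two on one command line might look
     like this (a sketch; the host name is a placeholder, not taken
     from this manual):

          wget -r -I/pub -X/pub/worthless ftp://ftp.example.com/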

`-np'
`--no-parent'
`no_parent = on'
     The simplest, and often very useful, way of limiting directories
     is disallowing retrieval of links that refer to the hierarchy
     "above" the beginning directory, i.e. disallowing ascent to the
     parent directory/directories.

     The `--no-parent' option (short `-np') is useful in this case.
     Using it guarantees that you will never leave the existing
     hierarchy.  Supposing you issue Wget with:

          wget -r --no-parent http://somehost/~luzer/my-archive/

     You may rest assured that none of the references to
     `/~his-girls-homepage/' or `/~luzer/all-my-mpegs/' will be
     followed.  Only the archive you are interested in will be
     downloaded.  Essentially, `--no-parent' is similar to
     `-I/~luzer/my-archive', only it handles redirections in a more
     intelligent fashion.

File: wget.info, Node: FTP Links, Prev: Directory-Based Limits, Up: Following Links

Following FTP Links
===================

   The rules for FTP are somewhat specific, as it is necessary for
them to be.  FTP links in HTML documents are often included for
purposes of reference, and it is often inconvenient to download them
by default.

   To have FTP links followed from HTML documents, you need to specify
the `--follow-ftp' option.  Having done that, FTP links will span
hosts regardless of the `-H' setting.  This is logical, as FTP links
rarely point to the same host where the HTTP server resides.  For
similar reasons, the `-L' option has no effect on such downloads.  On
the other hand, domain acceptance (`-D') and suffix rules (`-A' and
`-R') apply normally.

   Also note that followed links to FTP directories will not
themselves be retrieved recursively.

File: wget.info, Node: Time-Stamping, Next: Startup File, Prev: Following Links, Up: Top

Time-Stamping
*************

   One of the most important aspects of mirroring information from the
Internet is updating your archives.

   Downloading the whole archive again and again, just to replace a
few changed files, is expensive, both in terms of wasted bandwidth and
money, and the time to do the update.  This is why all the mirroring
tools offer the option of incremental updating.

   Such an updating mechanism means that the remote server is scanned
in search of "new" files.  Only those new files will be downloaded in
the place of the old ones.

   A file is considered new if one of these two conditions is met:

  1. A file of that name does not already exist locally.

  2. A file of that name does exist, but the remote file was modified
     more recently than the local file.

   To implement this, the program needs to be aware of the time of
last modification of both remote and local files.  Such information is
called the "time-stamps".

   The time-stamping in GNU Wget is turned on using the
`--timestamping' (`-N') option, or through the `timestamping = on'
directive in `.wgetrc'.  With this option, for each file it intends to
download, Wget will check whether a local file of the same name
exists.  If it does, and the remote file is older, Wget will not
download it.

   If the local file does not exist, or the sizes of the files do not
match, Wget will download the remote file no matter what the
time-stamps say.

* Menu:

* Time-Stamping Usage::
* HTTP Time-Stamping Internals::
* FTP Time-Stamping Internals::

File: wget.info, Node: Time-Stamping Usage, Next: HTTP Time-Stamping Internals, Prev: Time-Stamping, Up: Time-Stamping

Time-Stamping Usage
===================

   The usage of time-stamping is simple.  Say you would like to
download a file so that it keeps its date of modification.

     wget -S http://www.gnu.ai.mit.edu/

   A simple `ls -l' shows that the time stamp on the local file equals
the state of the `Last-Modified' header, as returned by the server.
As you can see, the time-stamping info is preserved locally, even
without `-N'.

   Several days later, you would like Wget to check if the remote file
has changed, and download it if it has.

     wget -N http://www.gnu.ai.mit.edu/

   Wget will ask the server for the last-modified date.  If the local
file is newer, the remote file will not be re-fetched.  However, if
the remote file is more recent, Wget will proceed to fetch it normally.

   The same goes for FTP.  For example:

     wget ftp://ftp.ifi.uio.no/pub/emacs/gnus/*

   `ls' will show that the timestamps are set according to the state
on the remote server.  Reissuing the command with `-N' will make Wget
re-fetch _only_ the files that have been modified.

   In both HTTP and FTP retrieval Wget will time-stamp the local file
correctly (with or without `-N') if it gets the stamps, i.e. gets the
directory listing for FTP or the `Last-Modified' header for HTTP.

   If you wished to mirror the GNU archive every week, you would run
the following command each time:

     wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/

File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping

HTTP Time-Stamping Internals
============================

   Time-stamping in HTTP is implemented by checking the
`Last-Modified' header.  If you wish to retrieve the file `foo.html'
through HTTP, Wget will check whether `foo.html' exists locally.  If
it doesn't, `foo.html' will be retrieved unconditionally.

   If the file does exist locally, Wget will first check its local
time-stamp (similar to the way `ls -l' checks it), and then send a
`HEAD' request to the remote server, demanding the information on the
remote file.

   The `Last-Modified' header is examined to find which file was
modified more recently (which makes it "newer").  If the remote file
is newer, it will be downloaded; if it is older, Wget will give up.(1)

   When `--backup-converted' (`-K') is specified in conjunction with
`-N', server file `X' is compared to local file `X.orig', if extant,
rather than being compared to local file `X', which will always differ
if it's been converted by `--convert-links' (`-k').

   Arguably, HTTP time-stamping should be implemented using the
`If-Modified-Since' request.

   ---------- Footnotes ----------

   (1) As an additional check, Wget will look at the `Content-Length'
header, and compare the sizes; if they are not the same, the remote
file will be downloaded no matter what the time-stamp says.

File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping

FTP Time-Stamping Internals
===========================

   In theory, FTP time-stamping works much the same as HTTP, only FTP
has no headers--time-stamps must be received from the directory
listings.

   For each directory from which files must be retrieved, Wget will
use the `LIST' command to get the listing.  It will try to analyze the
listing, assuming that it is a Unix `ls -l' listing, and extract the
time-stamps.  The rest is exactly the same as for HTTP.

   The assumption that every directory listing is a Unix-style listing
may sound extremely constraining, but in practice it is not, as many
non-Unix FTP servers use the Unixoid listing format because most
(all?) of the clients understand it.  Bear in mind that RFC959 defines
no standard way to get a file list, let alone the time-stamps.  We can
only hope that a future standard will define this.

   Another non-standard solution includes the use of the `MDTM'
command, supported by some FTP servers (including the popular
`wu-ftpd'), which returns the exact time of the specified file.  Wget
may support this command in the future.

File: wget.info, Node: Startup File, Next: Examples, Prev: Time-Stamping, Up: Top

Startup File
************

   Once you know how to change default settings of Wget through
command line arguments, you may wish to make some of those settings
permanent.  You can do that in a convenient way by creating the Wget
startup file--`.wgetrc'.

   Although `.wgetrc' is the "main" initialization file, it is
convenient to have a special facility for storing passwords.  Thus
Wget reads and interprets the contents of `$HOME/.netrc', if it finds
it.  You can find the `.netrc' format in your system manuals.

   Wget reads `.wgetrc' upon startup, recognizing a limited set of
commands.

* Menu:

* Wgetrc Location::   Location of various wgetrc files.
* Wgetrc Syntax::     Syntax of wgetrc.
* Wgetrc Commands::   List of available commands.
* Sample Wgetrc::     A wgetrc example.

File: wget.info, Node: Wgetrc Location, Next: Wgetrc Syntax, Prev: Startup File, Up: Startup File

Wgetrc Location
===============

   When initializing, Wget will look for a "global" startup file,
`/usr/local/etc/wgetrc' by default (or some prefix other than
`/usr/local', if Wget was not installed there) and read commands from
there, if it exists.

   Then it will look for the user's file.  If the environment variable
`WGETRC' is set, Wget will try to load that file.  Failing that, no
further attempts will be made.

   If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
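
   For example, to run a single session with commands read from a
custom file, you can point `WGETRC' at it.  This is a minimal sketch;
the file path is a placeholder, not part of this manual:

     WGETRC=/home/me/mirror-wgetrc wget -N -r ftp://prep.ai.mit.edu/pub/gnu/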

   The fact that the user's settings are loaded after the system-wide
ones means that in case of collision the user's wgetrc _overrides_ the
system-wide wgetrc (in `/usr/local/etc/wgetrc' by default).  Fascist
admins, away!

File: wget.info, Node: Wgetrc Syntax, Next: Wgetrc Commands, Prev: Wgetrc Location, Up: Startup File

Wgetrc Syntax
=============

   The syntax of a wgetrc command is simple:

     variable = value

   The "variable" will also be called "command".  Valid "values" are
different for different commands.

   The commands are case-insensitive and underscore-insensitive.  Thus
`DIr__PrefiX' is the same as `dirprefix'.  Empty lines, lines
beginning with `#' and lines containing white-space only are discarded.

   Commands that expect a comma-separated list will clear the list on
an empty command.  So, if you wish to reset the rejection list
specified in the global `wgetrc', you can do it with:

     reject =

File: wget.info, Node: Wgetrc Commands, Next: Sample Wgetrc, Prev: Wgetrc Syntax, Up: Startup File

Wgetrc Commands
===============

   The complete set of commands is listed below.  Legal values are
listed after the `='.  Simple Boolean values can be set or unset using
`on' and `off' or `1' and `0'.  A fancier kind of Boolean allowed in
some cases is the "lockable" Boolean, which may be set to `on', `off',
`always', or `never'.  If an option is set to `always' or `never',
that value will be locked in for the duration of the Wget
invocation--command-line options will not override it.
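
   For instance, a single line like the following in the global
`wgetrc' (a sketch of the lockable form described above) would keep
passive FTP off even if a script later runs `wget --passive-ftp':

     passive_ftp = never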

   Some commands take pseudo-arbitrary values.  ADDRESS values can be
hostnames or dotted-quad IP addresses.  N can be any positive integer,
or `inf' for infinity, where appropriate.  STRING values can be any
non-empty string.

   Most of these commands have command-line equivalents (*note
Invoking::), though some of the more obscure or rarely used ones do
not.

accept/reject = STRING
     Same as `-A'/`-R' (*note Types of Files::).

add_hostdir = on/off
     Enable/disable host-prefixed file names.  `-nH' disables it.

continue = on/off
     Enable/disable continuation of the retrieval--the same as `-c'
     (which enables it).

background = on/off
     Enable/disable going to background--the same as `-b' (which
     enables it).

backup_converted = on/off
     Enable/disable saving pre-converted files with the suffix
     `.orig'--the same as `-K' (which enables it).

base = STRING
     Consider relative URLs in URL input files forced to be
     interpreted as HTML as being relative to STRING--the same as `-B'.

bind_address = ADDRESS
     Bind to ADDRESS, like the `--bind-address' option.

cache = on/off
     When set to off, disallow server-caching.  See the `-C' option.

convert_links = on/off
     Convert non-relative links locally.  The same as `-k'.

cut_dirs = N
     Ignore N remote directory components.

debug = on/off
     Debug mode, same as `-d'.

delete_after = on/off
     Delete after download--the same as `--delete-after'.

dir_prefix = STRING
     Top of directory tree--the same as `-P'.

dirstruct = on/off
     Turning dirstruct on or off--the same as `-x' or `-nd',
     respectively.

domains = STRING
     Same as `-D' (*note Domain Acceptance::).

dot_bytes = N
     Specify the number of bytes "contained" in a dot, as seen
     throughout the retrieval (1024 by default).  You can postfix the
     value with `k' or `m', representing kilobytes and megabytes,
     respectively.  With dot settings you can tailor the dot retrieval
     to suit your needs, or you can use the predefined "styles" (*note
     Download Options::).

dots_in_line = N
     Specify the number of dots that will be printed in each line
     throughout the retrieval (50 by default).

dot_spacing = N
     Specify the number of dots in a single cluster (10 by default).

dot_style = STRING
     Specify the dot retrieval "style", as with `--dot-style'.

exclude_directories = STRING
     Specify a comma-separated list of directories you wish to exclude
     from download--the same as `-X' (*note Directory-Based Limits::).

exclude_domains = STRING
     Same as `--exclude-domains' (*note Domain Acceptance::).

follow_ftp = on/off
     Follow FTP links from HTML documents--the same as `-f'.

follow_tags = STRING
     Only follow certain HTML tags when doing a recursive retrieval,
     just like `--follow-tags'.

force_html = on/off
     If set to on, force the input filename to be regarded as an HTML
     document--the same as `-F'.

ftp_proxy = STRING
     Use STRING as the FTP proxy, instead of the one specified in the
     environment.

glob = on/off
     Turn globbing on/off--the same as `-g'.

header = STRING
     Define an additional header, like `--header'.

html_extension = on/off
     Add a `.html' extension to `text/html' files without it, like
     `-E'.

http_passwd = STRING
     Set HTTP password.

http_proxy = STRING
     Use STRING as the HTTP proxy, instead of the one specified in the
     environment.

http_user = STRING
     Set HTTP user to STRING.

ignore_length = on/off
     When set to on, ignore the `Content-Length' header; the same as
     `--ignore-length'.

ignore_tags = STRING
     Ignore certain HTML tags when doing a recursive retrieval, just
     like `-G' / `--ignore-tags'.

include_directories = STRING
     Specify a comma-separated list of directories you wish to follow
     when downloading--the same as `-I'.

input = STRING
     Read the URLs from STRING, like `-i'.

kill_longer = on/off
     Consider data longer than specified in the `Content-Length'
     header as invalid (and retry getting it).  The default behaviour
     is to save as much data as there is, provided there is more than
     or equal to the value in `Content-Length'.

logfile = STRING
     Set the logfile--the same as `-o'.

login = STRING
     Your user name on the remote machine, for FTP.  Defaults to
     `anonymous'.

mirror = on/off
     Turn mirroring on/off.  The same as `-m'.

netrc = on/off
     Turn reading netrc on or off.

noclobber = on/off
     Same as `-nc'.

no_parent = on/off
     Disallow retrieving outside the directory hierarchy, like
     `--no-parent' (*note Directory-Based Limits::).

no_proxy = STRING
     Use STRING as the comma-separated list of domains to avoid in
     proxy loading, instead of the one specified in the environment.

output_document = STRING
     Set the output filename--the same as `-O'.

page_requisites = on/off
     Download all ancillary documents necessary for a single HTML page
     to display properly--the same as `-p'.

passive_ftp = on/off/always/never
     Set passive FTP--the same as `--passive-ftp'.  Some scripts and
     `.pm' (Perl module) files download files using `wget
     --passive-ftp'.  If your firewall does not allow this, you can
     set `passive_ftp = never' to override the command line.

passwd = STRING
     Set your FTP password to STRING.  Without this setting, the
     password defaults to `username@hostname.domainname'.

proxy_user = STRING
     Set proxy authentication user name to STRING, like `--proxy-user'.

proxy_passwd = STRING
     Set proxy authentication password to STRING, like
     `--proxy-passwd'.

referer = STRING
     Set the HTTP `Referer:' header just like `--referer'.  (Note that
     it was the folks who wrote the HTTP spec who got the spelling of
     "referrer" wrong.)

quiet = on/off
     Quiet mode--the same as `-q'.

quota = QUOTA
     Specify the download quota, which is useful to put in the global
     `wgetrc'.  When download quota is specified, Wget will stop
     retrieving after the download sum has become greater than the
     quota.  The quota can be specified in bytes (default), kbytes
     (`k' appended) or mbytes (`m' appended).  Thus `quota = 5m' will
     set the quota to 5 mbytes.  Note that the user's startup file
     overrides system settings.

reclevel = N
     Recursion level--the same as `-l'.

recursive = on/off
     Recursive on/off--the same as `-r'.

relative_only = on/off
     Follow only relative links--the same as `-L' (*note Relative
     Links::).

remove_listing = on/off
     If set to on, remove FTP listings downloaded by Wget.  Setting it
     to off is the same as `-nr'.

retr_symlinks = on/off
     When set to on, retrieve symbolic links as if they were plain
     files; the same as `--retr-symlinks'.

robots = on/off
     Use (or not) the `/robots.txt' file (*note Robots::).  Be sure to
     know what you are doing before changing the default (which is
     `on').

server_response = on/off
     Choose whether or not to print the HTTP and FTP server
     responses--the same as `-S'.

simple_host_check = on/off
     Same as `-nh' (*note Host Checking::).

span_hosts = on/off
     Same as `-H'.

timeout = N
     Set the timeout value--the same as `-T'.

timestamping = on/off
     Turn timestamping on/off.  The same as `-N' (*note
     Time-Stamping::).

tries = N
     Set the number of retries per URL--the same as `-t'.

use_proxy = on/off
     Turn proxy support on/off.  The same as `-Y'.

verbose = on/off
     Turn verbose on/off--the same as `-v'/`-nv'.

wait = N
     Wait N seconds between retrievals--the same as `-w'.

waitretry = N
     Wait up to N seconds between retries of failed retrievals
     only--the same as `--waitretry'.  Note that this is turned on by
     default in the global `wgetrc'.

File: wget.info, Node: Sample Wgetrc, Prev: Wgetrc Commands, Up: Startup File

Sample Wgetrc
=============

   This is the sample initialization file, as given in the
distribution.  It is divided in two sections--one for global usage
(suitable for the global startup file), and one for local usage
(suitable for `$HOME/.wgetrc').  Be careful about the things you
change.

   Note that almost all the lines are commented out.  For a command to
have any effect, you must remove the `#' character at the beginning of
its line.
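
   A brief sketch in the same spirit (these particular lines are
illustrative, not the distributed sample itself):

     # Global settings (suitable for /usr/local/etc/wgetrc):
     #quota = 5m
     #passive_ftp = on

     # Local settings (suitable for $HOME/.wgetrc):
     #tries = 45
     #wait = 1
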
File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top

Examples
********

   The examples are classified into three sections for the sake of
clarity.  The first section is a tutorial for beginners.  The second
section explains some of the more complex program features.  The third
section contains advice for mirror administrators, as well as even
more complex features (that some would call perverted).

* Menu:

* Simple Usage::        Simple, basic usage of the program.
* Advanced Usage::      Advanced techniques of usage.
* Guru Usage::          Mirroring and the hairy stuff.

File: wget.info, Node: Simple Usage, Next: Advanced Usage, Prev: Examples, Up: Examples

Simple Usage
============

   * Say you want to download a URL.  Just type:

          wget http://fly.srk.fer.hr/

     The response will be something like:

          --13:30:45--  http://fly.srk.fer.hr:80/en/
                     => `index.html'
          Connecting to fly.srk.fer.hr:80... connected!
          HTTP request sent, awaiting response... 200 OK
          Length: 4,694 [text/html]

              0K -> ....                                         [100%]

          13:30:46 (23.75 KB/s) - `index.html' saved [4694/4694]

   * But what will happen if the connection is slow, and the file is
     lengthy?  The connection will probably fail before the whole file
     is retrieved, more than once.  In this case, Wget will try
     getting the file until it either gets the whole of it, or exceeds
     the default number of retries (this being 20).  It is easy to
     change the number of tries to 45, to ensure that the whole file
     will arrive safely:

          wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg

   * Now let's leave Wget to work in the background, and write its
     progress to the log file `log'.  It is tiring to type `--tries',
     so we shall use `-t'.

          wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &

     The ampersand at the end of the line makes sure that Wget works
     in the background.  To unlimit the number of retries, use `-t
     inf'.

   * The usage of FTP is just as simple.  Wget will take care of the
     login and password.

          $ wget ftp://gnjilux.srk.fer.hr/welcome.msg
          --10:08:47--  ftp://gnjilux.srk.fer.hr:21/welcome.msg
                     => `welcome.msg'
          Connecting to gnjilux.srk.fer.hr:21... connected!
          Logging in as anonymous ... Logged in!
          ==> TYPE I ... done.  ==> CWD not needed.
          ==> PORT ... done.    ==> RETR welcome.msg ... done.
          Length: 1,340 (unauthoritative)

              0K -> .                                            [100%]

          10:08:48 (1.28 MB/s) - `welcome.msg' saved [1340]

   * If you specify a directory, Wget will retrieve the directory
     listing, parse it and convert it to HTML.  Try:

          wget ftp://prep.ai.mit.edu/pub/gnu/
          lynx index.html

File: wget.info, Node: Advanced Usage, Next: Guru Usage, Prev: Simple Usage, Up: Examples

Advanced Usage
==============

   * You would like to read the list of URLs from a file?  Not a
     problem with that:

          wget -i file

     If you specify `-' as the file name, the URLs will be read from
     standard input.

   * Create a mirror image of the GNU WWW site (with the same
     directory structure the original has) with only one try per
     document, saving the log of the activities to `gnulog':

          wget -r -t1 http://www.gnu.ai.mit.edu/ -o gnulog

   * Retrieve the first layer of yahoo links:

          wget -r -l1 http://www.yahoo.com/

   * Retrieve the index.html of `www.lycos.com', showing the original
     server headers:

          wget -S http://www.lycos.com/

   * Save the server headers with the file:

          wget -s http://www.lycos.com/
          more index.html

   * Retrieve the first two levels of `wuarchive.wustl.edu', saving
     them to `/tmp':

          wget -r -P/tmp -l2 ftp://wuarchive.wustl.edu/

   * You want to download all the GIFs from an HTTP directory.  `wget
     http://host/dir/*.gif' doesn't work, since HTTP retrieval does
     not support globbing.  In that case, use:

          wget -r -l1 --no-parent -A.gif http://host/dir/

     It is a bit of a kludge, but it works.  `-r -l1' means to
     retrieve recursively (*note Recursive Retrieval::), with maximum
     depth of 1.  `--no-parent' means that references to the parent
     directory are ignored (*note Directory-Based Limits::), and
     `-A.gif' means to download only the GIF files.  `-A "*.gif"'
     would have worked too.

   * Suppose you were in the middle of downloading when Wget was
     interrupted.  Now you do not want to clobber the files already
     present.  It would be:

          wget -nc -r http://www.gnu.ai.mit.edu/

   * If you want to encode your own username and password to HTTP or
     FTP, use the appropriate URL syntax (*note URL Format::).

          wget ftp://hniksic:mypassword@jagor.srce.hr/.emacs

   * If you do not like the default retrieval visualization (1K dots
     with 10 dots per cluster and 50 dots per line), you can customize
     it through dot settings (*note Wgetrc Commands::).  For example,
     many people like the "binary" style of retrieval, with 8K dots
     and 512K lines:

          wget --dot-style=binary ftp://prep.ai.mit.edu/pub/gnu/README

     You can experiment with other styles, like:

          wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
          wget --dot-style=micro http://fly.srk.fer.hr/

     To make these settings permanent, put them in your `.wgetrc', as
     described before (*note Sample Wgetrc::).

File: wget.info, Node: Guru Usage, Prev: Advanced Usage, Up: Examples

Guru Usage
==========

   * If you wish Wget to keep a mirror of a page (or FTP
     subdirectories), use `--mirror' (`-m'), which is the shorthand
     for `-r -N'.  You can put Wget in the crontab file asking it to
     recheck a site each Sunday:

          crontab
          0 0 * * 0 wget --mirror ftp://ftp.xemacs.org/pub/xemacs/ -o /home/me/weeklog

   * You may wish to do the same with someone's home page.  But you do
     not want to download all those images--you're only interested in
     HTML.

          wget --mirror -A.html http://www.w3.org/

   * But what about mirroring the hosts networkologically close to
     you?  It seems so awfully slow because of all that DNS resolving.
     Just use `-D' (*note Domain Acceptance::).

          wget -rN -Dsrce.hr http://www.srce.hr/

     Now Wget will correctly find out that `regoc.srce.hr' is the same
     as `www.srce.hr', but will not even take into consideration the
     link to `www.mit.edu'.

   * You have a presentation and would like the dumb absolute links to
     be converted to relative?  Use `-k':

          wget -k -r URL

   * You would like the output documents to go to standard output
     instead of to files?  OK, but Wget will automatically shut up
     (turn on `--quiet') to prevent mixing of Wget output and the
     retrieved documents.

          wget -O - http://jagor.srce.hr/ http://www.srce.hr/

     You can also combine the two options and make weird pipelines to
     retrieve the documents from remote hotlists:

          wget -O - http://cool.list.com/ | wget --force-html -i -

File: wget.info, Node: Various, Next: Appendices, Prev: Examples, Up: Top

Various
*******

   This chapter contains all the stuff that could not fit anywhere
else.

* Menu:

* Proxies::             Support for proxy servers.
* Distribution::        Getting the latest version.
* Mailing List::        Wget mailing list for announcements and discussion.
* Reporting Bugs::      How and where to report bugs.
* Portability::         The systems Wget works on.
* Signals::             Signal-handling performed by Wget.

File: wget.info, Node: Proxies, Next: Distribution, Prev: Various, Up: Various

Proxies
=======

   "Proxies" are special-purpose HTTP servers designed to transfer
data from remote servers to local clients.  One typical use of proxies
is lightening network load for users behind a slow connection.  This
is achieved by channeling all HTTP and FTP requests through the proxy
which caches the transferred data.  When a cached resource is
requested again, the proxy will return the data from cache.  Another
use for proxies is for companies that separate (for security reasons)
their internal networks from the rest of the Internet.  In order to
obtain information from the Web, their users connect and retrieve
remote data using an authorized proxy.

   Wget supports proxies for both HTTP and FTP retrievals.  The
standard way to specify proxy location, which Wget recognizes, is
using the following environment variables:

`http_proxy'
     This variable should contain the URL of the proxy for HTTP
     connections.

`ftp_proxy'
     This variable should contain the URL of the proxy for FTP
     connections.  It is quite common that HTTP_PROXY and FTP_PROXY
     are set to the same URL.

`no_proxy'
     This variable should contain a comma-separated list of domain
     extensions the proxy should _not_ be used for.  For instance, if
     the value of `no_proxy' is `.mit.edu', the proxy will not be used
     to retrieve documents from MIT.
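
   For example, in a Bourne-style shell you might set all three like
this (a sketch; the proxy host and port are placeholders):

     http_proxy=http://proxy.company.com:8001/
     ftp_proxy=$http_proxy
     no_proxy=.mit.edu
     export http_proxy ftp_proxy no_proxy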

   In addition to the environment variables, proxy location and
settings may be specified from within Wget itself.

`-Y on/off'
`--proxy=on/off'
`proxy = on/off'
     This option may be used to turn the proxy support on or off.
     Proxy support is on by default, provided that the appropriate
     environment variables are set.

`http_proxy = URL'
`ftp_proxy = URL'
`no_proxy = STRING'
     These startup file variables allow you to override the proxy
     settings specified by the environment.

   Some proxy servers require authorization to enable you to use them.
The authorization consists of "username" and "password", which must be
sent by Wget.  As with HTTP authorization, several authentication
schemes exist.  For proxy authorization only the `Basic'
authentication scheme is currently implemented.

   You may specify your username and password either through the proxy
URL or through the command-line options.  Assuming that the company's
proxy is located at `proxy.company.com' at port 8001, a proxy URL
location containing authorization data might look like this:

     http://hniksic:mypassword@proxy.company.com:8001/

   Alternatively, you may use the `--proxy-user' and `--proxy-passwd'
options, and the equivalent `.wgetrc' settings `proxy_user' and
`proxy_passwd' to set the proxy username and password.
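
   For instance, a command line using these options might look like
this (a sketch; the credentials and URL are placeholders):

     wget --proxy-user=hniksic --proxy-passwd=mypassword http://host/
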
File: wget.info, Node: Distribution, Next: Mailing List, Prev: Proxies, Up: Various

Distribution
============

   Like all GNU utilities, the latest version of Wget can be found at
the master GNU archive site prep.ai.mit.edu, and its mirrors.  For
example, Wget 1.5.3+dev can be found at
<ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz>.

File: wget.info, Node: Mailing List, Next: Reporting Bugs, Prev: Distribution, Up: Various

Mailing List
============

   Wget has its own mailing list at <wget@sunsite.auc.dk>, thanks to
Karsten Thygesen.  The mailing list is for discussion of Wget features
and the web, reporting Wget bugs (those that you think may be of
interest to the public), and mailing announcements.  You are welcome
to subscribe.  The more people on the list, the better!

   To subscribe, send mail to <wget-subscribe@sunsite.auc.dk> with the
magic word `subscribe' in the subject line.  Unsubscribe by mailing
<wget-unsubscribe@sunsite.auc.dk>.

   The mailing list is archived at <http://fly.srk.fer.hr/archive/wget>.

File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Mailing List, Up: Various

Reporting Bugs
==============

   You are welcome to send bug reports about GNU Wget to
<bug-wget@gnu.org>.  The bugs that you think are of interest to the
public (i.e. more people should be informed about them) can be Cc-ed
to the mailing list at <wget@sunsite.auc.dk>.

   Before actually submitting a bug report, please try to follow a few
simple guidelines.

  1. Please try to ascertain that the behaviour you see really is a
     bug.  If Wget crashes, it's a bug.  If Wget does not behave as
     documented, it's a bug.  If things work strangely, but you are
     not sure about the way they are supposed to work, it might well
     be a bug.

  2. Try to repeat the bug in as simple circumstances as possible.
     E.g. if Wget crashes on `wget -rLl0 -t5 -Y0 http://yoyodyne.com
     -o /tmp/log', you should try to see if it will crash with a
     simpler set of options.

     Also, while I will probably be interested to know the contents of
     your `.wgetrc' file, just dumping it into the debug message is
     probably a bad idea.  Instead, you should first try to see if the
     bug repeats with `.wgetrc' moved out of the way.  Only if it
     turns out that `.wgetrc' settings affect the bug, should you mail
     me the relevant parts of the file.

  3. Please start Wget with the `-d' option and send the log (or the
     relevant parts of it).  If Wget was compiled without debug
     support, recompile it.  It is _much_ easier to trace bugs with
     debug support on.

  4. If Wget has crashed, try to run it in a debugger, e.g. `gdb `which
     wget` core' and type `where' to get the backtrace.

  5. Find where the bug is, fix it and send me the patches.  :-)

File: wget.info, Node: Portability, Next: Signals, Prev: Reporting Bugs, Up: Various

Portability
===========

   Since Wget uses GNU Autoconf for building and configuring, and
avoids using "special" ultra-mega-cool features of any particular
Unix, it should compile (and work) on all common Unix flavors.

   Various Wget versions have been compiled and tested under many
kinds of Unix systems, including Solaris, Linux, SunOS, OSF (aka
Digital Unix), Ultrix, *BSD, IRIX, and others; refer to the file
`MACHINES' in the distribution directory for a comprehensive list.  If
you compile it on an architecture not listed there, please let me know
so I can update it.

   Wget should also compile on other Unix systems not listed in
`MACHINES'.  If it doesn't, please let me know.

   Thanks to kind contributors, this version of Wget compiles and
works on Microsoft Windows 95 and Windows NT platforms.  It has been
compiled successfully using MS Visual C++ 4.0, Watcom, and Borland C
compilers, with Winsock as networking software.  Naturally, it lacks
some of the features available on Unix, but it should work as a
substitute for people stuck with Windows.  Note that the Windows port
is *neither tested nor maintained* by me--all questions and problems
should be reported to the Wget mailing list at <wget@sunsite.auc.dk>
where the maintainers will look at them.

File: wget.info, Node: Signals, Prev: Portability, Up: Various

Signals
=======

   Since the purpose of Wget is background work, it catches the hangup
signal (`SIGHUP') and ignores it.  If the output was on standard
output, it will be redirected to a file named `wget-log'.  Otherwise,
`SIGHUP' is ignored.  This is convenient when you wish to redirect the
output of Wget after having started it.

     $ wget http://www.ifi.uio.no/~larsi/gnus.tar.gz &
     $ kill -HUP %%     # Redirect the output to wget-log

   Other than that, Wget will not try to interfere with signals in any
way.  `C-c', `kill -TERM' and `kill -KILL' should kill it alike.

File: wget.info, Node: Appendices, Next: Copying, Prev: Various, Up: Top

Appendices
**********

   This chapter contains some references I consider useful.

* Menu:

* Robots::                  Wget as a WWW robot.
* Security Considerations:: Security with Wget.
* Contributors::            People who helped.

File: wget.info, Node: Robots, Next: Security Considerations, Prev: Appendices, Up: Appendices

Robots
======

   It is extremely easy to make Wget wander aimlessly around a web
site, sucking all the available data in the process.  `wget -r SITE',
and you're set.  Great?  Not for the server admin.

   While Wget is retrieving static pages, there's not much of a
problem.  But for Wget, there is no real difference between the
smallest static page and the hardest, most demanding CGI or dynamic
page.  For instance, a site I know has a section handled by an, uh,
bitchin' CGI script that converts all the Info files to HTML.  The
script can and does bring the machine to its knees without providing
anything useful to the downloader.

   For such and similar cases various robot exclusion schemes have
been devised as a means for the server administrators and document
authors to protect chosen portions of their sites from the wandering
of robots.

   The more popular mechanism is the "Robots Exclusion Standard"
written by Martijn Koster et al. in 1994.  It is specified by placing
a file named `/robots.txt' in the server root, which the robots are
supposed to download and parse.  Wget supports this specification.
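
   A minimal `/robots.txt' in that format might look like this (a
sketch, not taken from this manual; it asks all robots to stay out of
`/cgi-bin/'):

     User-agent: *
     Disallow: /cgi-bin/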

   Norobots support is turned on only when retrieving recursively, and
_never_ for the first page.  Thus, you may issue:

     wget -r http://fly.srk.fer.hr/

   First the index of fly.srk.fer.hr will be downloaded.  If Wget
finds anything worth downloading on the same host, only _then_ will it
load the robots, and decide whether or not to load the links after
all.  `/robots.txt' is loaded only once per host.

   Note that the exclusion standard discussed here has undergone some
revisions.  However, Wget supports only the first version of RES, the
one written by Martijn Koster in 1994, available at
<http://info.webcrawler.com/mak/projects/robots/norobots.html>.  A
later version exists in the form of an internet draft
<draft-koster-robots-00.txt> titled "A Method for Web Robots Control",
which expired on June 4, 1997.  I am not aware if it ever made it to
an RFC.  The text of the draft is available at
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
Wget does not yet support the new directives specified by this draft,
but we plan to add them.

   This manual no longer includes the text of the old standard.

   The second, less known mechanism enables the author of an
individual document to specify whether they want the links from the
file to be followed by a robot.  This is achieved using the `META'
tag, like this:

     <meta name="robots" content="nofollow">

   This is explained in some detail at
<http://info.webcrawler.com/mak/projects/robots/meta-user.html>.
Unfortunately, Wget does not support this method of robot exclusion
yet, but it will be implemented in the next release.

File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robots, Up: Appendices

Security Considerations
=======================

   When using Wget, you must be aware that it sends unencrypted
passwords through the network, which may present a security problem.
Here are the main issues, and some solutions.

  1. The passwords on the command line are visible using `ps'.  If
     this is a problem, avoid passing passwords on the command
     line--e.g. you can use `.netrc' for this.

  2. Using the insecure "basic" authentication scheme, unencrypted
     passwords are transmitted through the network routers and
     gateways.

  3. The FTP passwords are also in no way encrypted.  There is no good
     solution for this at the moment.

  4. Although the "normal" output of Wget tries to hide the passwords,
     debugging logs show them, in all forms.  This problem is avoided
     by being careful when you send debug logs (yes, even when you
     send them to me).

File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices

Contributors
============

   GNU Wget was written by Hrvoje Niksic <hniksic@arsdigita.com>.
However, its development could never have gone as far as it has, were
it not for the help of many people, either with bug reports, feature
proposals, patches, or letters saying "Thanks!".

   Special thanks goes to the following people (no particular order):

   * Karsten Thygesen--donated system resources such as the mailing
     list, web space, and FTP space, along with a lot of time to make
     these actually work.

   * Shawn McHorse--bug reports and patches.

   * Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization.  Lots of
     portability fixes.

   * Gordon Matzigkeit--`.netrc' support.

   * Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
     suggestions and "philosophical" discussions.

   * Darko Budor--initial port to Windows.

   * Antonio Rosella--help and suggestions, plus the Italian
     translation.

   * Tomislav Petrovic, Mario Mikocevic--many bug reports and
     suggestions.

   * Francois Pinard--many thorough bug reports and discussions.

   * Karl Eichwalder--lots of help with internationalization and other
     things.

   * Junio Hamano--donated support for Opie and HTTP `Digest'
     authentication.

   * Brian Gough--a generous donation.

   The following people have provided patches, bug/build reports,
useful suggestions, beta testing services, fan mail and all the other
things that make maintenance so much fun:

   Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman
and the Gurus at Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei
Cavassin, Gilles Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas,
John Daily, Andrew Davison, Andrew Deryabin, Ulrich Drepper, Marc
Duponcheel, Damir Dzeko, Aleksandar Erkalovic, Andy Eskilsson, Masashi
Fujita, Howard Gayle, Marcel Gerrits, Hans Grobler, Mathieu Guillaume,
Dan Harkless, Heiko Herold, Karl Heuer, HIROSE Masaaki, Gregor
Hoffleit, Erik Magnus Hulthen, Richard Huveneers, Simon Josefsson,
Mario Juric, Const Kaplinsky, Goran Kezunovic, Robert Kleine, Fila
Kolodny, Alexander Kourakos, Martin Kraemer, Simos KSenitellis, Hrvoje
Lacko, Daniel S. Lewart, Dave Love, Alexander V. Lukyanov, Jordan
Mendelson, Lin Zhe Min, Simon Munton, Charlie Negyesi, R. K. Owen,
Andrew Pollock, Steve Pothier, Jan Prikryl, Marin Purgar, Keith
Refson, Tyler Riddle, Tobias Ringstrom, Juan Jose Rodrigues, Edward J.
Sabol, Heinz Salzmann, Robert Schmidt, Andreas Schwab, Toomas Soome,
Tage Stabell-Kulo, Sven Sternberger, Markus Strasser, Szakacsits
Szabolcs, Mike Thomas, Russell Vincent, Charles G Waldman, Douglas E.
Wegscheid, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.

   Apologies to all who I accidentally left out, and many thanks to
all the subscribers of the Wget mailing list.

File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top

Copying
*******

   Wget is "free software", where "free" refers to liberty, not price.
The exact legal distribution terms follow below, but in short, it
means that you have the right (freedom) to run and change and copy
Wget, and even--if you want--charge money for any of those things.
The sole restriction is that you have to grant your recipients the
same rights.

   This method of licensing software is also known as "open-source",
because it requires that the recipients always receive a program's
source code along with the program.

   More specifically:

     This program is free software; you can redistribute it and/or
     modify it under the terms of the GNU General Public License as
     published by the Free Software Foundation; either version 2 of
     the License, or (at your option) any later version.

     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
     General Public License for more details.

     You should have received a copy of the GNU General Public License
     along with this program; if not, write to the Free Software
     Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

   In addition to this, this manual is free in the same sense:

     Permission is granted to copy, distribute and/or modify this
     document under the terms of the GNU Free Documentation License,
     Version 1.1 or any later version published by the Free Software
     Foundation; with the Invariant Sections being "GNU General Public
     License" and "GNU Free Documentation License", with no
     Front-Cover Texts, and with no Back-Cover Texts.  A copy of the
     license is included in the section entitled "GNU Free
     Documentation License".

   The full texts of the GNU General Public License and of the GNU
Free Documentation License are available below.

* Menu:

* GNU General Public License::
* GNU Free Documentation License::