mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00

[svn] Robots doc changes.

Published at <sxsn1f1o6s2.fsf@florida.arsdigita.de>.
hniksic 2000-11-15 02:44:18 -08:00
parent d889ef73f4
commit 1a5c5a006a
6 changed files with 797 additions and 544 deletions

@ -1,3 +1,7 @@
2000-11-15 Hrvoje Niksic <hniksic@arsdigita.com>
* wget.texi (Robots): Rearrange text. Mention the meta tag.
2000-11-14 Hrvoje Niksic <hniksic@arsdigita.com>
* wget.texi: Add GFDL; remove norobots specification.
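The `norobots' convention these documentation changes refer to is the robots.txt exclusion convention. As a hedged illustration only (this is not Wget's own C implementation), the sketch below checks a few URLs against a made-up `/robots.txt` using Python's standard `urllib.robotparser`. The rules, the host `fly.srk.fer.hr` used as an example, and the `Wget` user-agent string are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical /robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def allowed(url: str, agent: str = "Wget") -> bool:
    """Return True if `agent` may fetch `url` under the rules in ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/search", "/tmp/x"):
        url = "http://fly.srk.fer.hr" + path
        print(url, allowed(url))
```

With these hypothetical rules, `/index.html` is allowed, while anything under `/cgi-bin/` or `/tmp/` is disallowed for every user agent.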

@ -1,5 +1,4 @@
This is Info file wget.info, produced by Makeinfo version 1.68 from the
input file ./wget.texi.
This is wget.info, produced by makeinfo version 4.0 from wget.texi.
INFO-DIR-SECTION Net Utilities
INFO-DIR-SECTION World Wide Web
@ -16,73 +15,73 @@ data.
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the sections entitled "Copying" and "GNU General Public License"
are included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled "GNU Free Documentation License".

Indirect:
wget.info-1: 961
wget.info-2: 48745
wget.info-3: 97411
wget.info-1: 1010
wget.info-2: 48842
wget.info-3: 94301

Tag Table:
(Indirect)
Node: Top961
Node: Overview1850
Node: Invoking5024
Node: URL Format5833
Node: Option Syntax8163
Node: Basic Startup Options9587
Node: Logging and Input File Options10287
Node: Download Options12812
Node: Directory Options20910
Node: HTTP Options23388
Node: FTP Options28104
Node: Recursive Retrieval Options30086
Node: Recursive Accept/Reject Options35107
Node: Recursive Retrieval38333
Node: Following Links40631
Node: Relative Links41659
Node: Host Checking42173
Node: Domain Acceptance44198
Node: All Hosts45868
Node: Types of Files46295
Node: Directory-Based Limits48745
Node: FTP Links51385
Node: Time-Stamping52255
Node: Time-Stamping Usage53892
Node: HTTP Time-Stamping Internals55461
Node: FTP Time-Stamping Internals56931
Node: Startup File58139
Node: Wgetrc Location59012
Node: Wgetrc Syntax59827
Node: Wgetrc Commands60542
Node: Sample Wgetrc68941
Node: Examples73960
Node: Simple Usage74567
Node: Advanced Usage76961
Node: Guru Usage79712
Node: Various81374
Node: Proxies81898
Node: Distribution84663
Node: Mailing List85014
Node: Reporting Bugs85713
Node: Portability87498
Node: Signals88873
Node: Appendices89527
Node: Robots89942
Node: Introduction to RES91089
Node: RES Format92982
Node: User-Agent Field94086
Node: Disallow Field94850
Node: Norobots Examples95461
Node: Security Considerations96415
Node: Contributors97411
Node: Copying100054
Node: Concept Index119217
Node: Top1010
Node: Overview1924
Node: Invoking5106
Node: URL Format5915
Ref: URL Format-Footnote-18143
Node: Option Syntax8245
Node: Basic Startup Options9670
Node: Logging and Input File Options10370
Node: Download Options12896
Node: Directory Options20995
Node: HTTP Options23477
Node: FTP Options28194
Node: Recursive Retrieval Options30177
Node: Recursive Accept/Reject Options35199
Node: Recursive Retrieval38426
Node: Following Links40724
Node: Relative Links41753
Node: Host Checking42267
Node: Domain Acceptance44293
Node: All Hosts45965
Node: Types of Files46392
Node: Directory-Based Limits48842
Node: FTP Links51482
Node: Time-Stamping52352
Node: Time-Stamping Usage53989
Node: HTTP Time-Stamping Internals55558
Ref: HTTP Time-Stamping Internals-Footnote-156829
Node: FTP Time-Stamping Internals57028
Node: Startup File58236
Node: Wgetrc Location59109
Node: Wgetrc Syntax59924
Node: Wgetrc Commands60639
Node: Sample Wgetrc69038
Node: Examples69562
Node: Simple Usage70169
Node: Advanced Usage72571
Node: Guru Usage75323
Node: Various76985
Node: Proxies77509
Node: Distribution80274
Node: Mailing List80625
Node: Reporting Bugs81325
Node: Portability83110
Node: Signals84485
Node: Appendices85139
Node: Robots85457
Node: Security Considerations88309
Node: Contributors89305
Node: Copying92189
Node: GNU General Public License94301
Node: GNU Free Documentation License113501
Node: Concept Index133231

End Tag Table
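The `--waitretry=SECONDS' option documented in this manual (and the `waitretry' setting in the sample `.wgetrc') is described as "linear backoff": wait 1 second after the first failure on a file, 2 seconds after the second, and so on up to the given maximum. A minimal Python sketch of that schedule follows, assuming the wait after the N-th failure is simply min(N, maximum). The function name `waitretry_delays` is hypothetical and this is not taken from Wget's source.

```python
from typing import List


def waitretry_delays(failures: int, max_wait: int) -> List[int]:
    """Seconds to wait after each of the first `failures` failed attempts.

    Hypothetical helper: assumes linear backoff capped at `max_wait`,
    i.e. 1 s after the 1st failure, 2 s after the 2nd, ... up to max_wait.
    """
    return [min(n, max_wait) for n in range(1, failures + 1)]


if __name__ == "__main__":
    # Five consecutive failures on one file with --waitretry=3.
    print(waitretry_delays(5, 3))
```

Under this assumption, `waitretry = 10` as in the sample `.wgetrc` would make the wait grow by one second per failure until it reaches 10 seconds, and stay there for later retries.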

@ -1,5 +1,4 @@
This is Info file wget.info, produced by Makeinfo version 1.68 from the
input file ./wget.texi.
This is wget.info, produced by makeinfo version 4.0 from wget.texi.
INFO-DIR-SECTION Net Utilities
INFO-DIR-SECTION World Wide Web
@ -16,12 +15,13 @@ data.
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the sections entitled "Copying" and "GNU General Public License"
are included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled "GNU Free Documentation License".

File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
@ -32,7 +32,7 @@ Wget 1.5.3+dev
This manual documents version 1.5.3+dev of GNU Wget, the freely
available utility for network download.
Copyright (C) 1996, 1997, 1998 Free Software Foundation, Inc.
Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
* Menu:
@ -45,7 +45,7 @@ available utility for network download.
* Examples:: Examples of usage.
* Various:: The stuff that doesn't fit anywhere else.
* Appendices:: Some useful references.
* Copying:: You may give out copies of Wget.
* Copying:: You may give out copies of Wget and of this manual.
* Concept Index:: Topics covered by this manual.

@ -67,13 +67,15 @@ being:
constant user's presence, which can be a great hindrance when
transferring a lot of data.
* Wget is capable of descending recursively through the structure of
HTML documents and FTP directory trees, making a local copy of the
directory hierarchy similar to the one on the remote server. This
feature can be used to mirror archives and home pages, or traverse
the web in search of data, like a WWW robot (*Note Robots::). In
the web in search of data, like a WWW robot (*note Robots::). In
that spirit, Wget understands the `norobots' convention.
* File name wildcard matching and recursive mirroring of directories
are available when retrieving via FTP. Wget can read the
time-stamp information given by both HTTP and FTP servers, and
@ -82,12 +84,14 @@ being:
version if it has. This makes Wget suitable for mirroring of FTP
sites, as well as home pages.
* Wget works exceedingly well on slow or unstable connections,
retrying the document until it is fully retrieved, or until a
user-specified retry count is surpassed. It will try to resume the
download from the point of interruption, using `REST' with FTP and
`Range' with HTTP servers that support them.
* By default, Wget supports proxy servers, which can lighten the
network load, speed up retrieval and provide access behind
firewalls. However, if you are behind a firewall that requires
@ -95,23 +99,27 @@ being:
and build wget with support for socks. Wget also supports the
passive FTP downloading as an option.
* Builtin features offer mechanisms to tune which links you wish to
follow (*Note Following Links::).
follow (*note Following Links::).
* The retrieval is conveniently traced with printing dots, each dot
representing a fixed amount of data received (1KB by default).
These representations can be customized to your preferences.
* Most of the features are fully configurable, either through
command line options, or via the initialization file `.wgetrc'
(*Note Startup File::). Wget allows you to define "global"
(*note Startup File::). Wget allows you to define "global"
startup files (`/usr/local/etc/wgetrc' by default) for site
settings.
* Finally, GNU Wget is free software. This means that everyone may
use it, redistribute it and/or modify it under the terms of the
GNU General Public License, as published by the Free Software
Foundation (*Note Copying::).
Foundation (*note Copying::).

File: wget.info, Node: Invoking, Next: Recursive Retrieval, Prev: Overview, Up: Top
@ -128,7 +136,7 @@ line. URL is a "Uniform Resource Locator", as defined below.
However, you may wish to change some of the default parameters of
Wget. You can do it two ways: permanently, adding the appropriate
command to `.wgetrc' (*Note Startup File::), or specifying it on the
command to `.wgetrc' (*note Startup File::), or specifying it on the
command line.
* Menu:
@ -218,7 +226,7 @@ remember, but take time to type. You may freely mix different option
styles, or specify options after the command-line arguments. Thus you
may write:
wget -r --tries=10 http://fly.cc.fer.hr/ -o log
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
The space between the option accepting an argument and the argument
may be omitted. Instead `-o log' you can write `-olog'.
@ -243,7 +251,7 @@ convention that specifying an empty list clears its value. This can be
useful to clear the `.wgetrc' settings. For instance, if your `.wgetrc'
sets `exclude_directories' to `/cgi-bin', the following example will
first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
You can also clear the lists in `.wgetrc' (*Note Wgetrc Syntax::).
You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).
wget -X '' -X /~nobody,/~somebody
@ -268,8 +276,8 @@ Basic Startup Options
`-e COMMAND'
`--execute COMMAND'
Execute COMMAND as if it were a part of `.wgetrc' (*Note Startup
File::). A command thus invoked will be executed *after* the
Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
File::). A command thus invoked will be executed _after_ the
commands in `.wgetrc', thus taking precedence over them.

@ -296,8 +304,8 @@ Logging and Input File Options
administrator may have chosen to compile Wget without debug
support, in which case `-d' will not work. Please note that
compiling with debug support is always safe--Wget compiled with
the debug support will *not* print any debug info unless requested
with `-d'. *Note Reporting Bugs:: for more information on how to
the debug support will _not_ print any debug info unless requested
with `-d'. *Note Reporting Bugs::, for more information on how to
use `-d' for sending bug reports.
`-q'
@ -392,7 +400,7 @@ Download Options
When running wget with `-N', with or without `-r', the decision as
to whether or not to download a newer copy of a file depends on
the local and remote timestamp and size of the file (*Note
the local and remote timestamp and size of the file (*note
Time-Stamping::). `-nc' may not be specified at the same time as
`-N'.
@ -449,7 +457,7 @@ Download Options
`-N'
`--timestamping'
Turn on time-stamping. *Note Time-Stamping:: for details.
Turn on time-stamping. *Note Time-Stamping::, for details.
`-S'
`--server-response'
@ -491,7 +499,7 @@ Download Options
retry.
`--waitretry=SECONDS'
If you don't want Wget to wait between *every* retrieval, but only
If you don't want Wget to wait between _every_ retrieval, but only
between retries of failed downloads, you can use this option.
Wget will use "linear backoff", waiting 1 second after the first
failure on a given file, then waiting 2 seconds after the second
@ -540,14 +548,14 @@ Directory Options
`--force-directories'
The opposite of `-nd'--create a hierarchy of directories, even if
one would not have been created otherwise. E.g. `wget -x
http://fly.cc.fer.hr/robots.txt' will save the downloaded file to
`fly.cc.fer.hr/robots.txt'.
http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
`fly.srk.fer.hr/robots.txt'.
`-nH'
`--no-host-directories'
Disable generation of host-prefixed directories. By default,
invoking Wget with `-r http://fly.cc.fer.hr/' will create a
structure of directories beginning with `fly.cc.fer.hr/'. This
invoking Wget with `-r http://fly.srk.fer.hr/' will create a
structure of directories beginning with `fly.srk.fer.hr/'. This
option disables such behavior.
`--cut-dirs=NUMBER'
@ -609,7 +617,7 @@ HTTP Options
doesn't yet know that the URL produces output of type `text/html'.
To prevent this re-downloading, you must use `-k' and `-K' so
that the original version of the file will be saved as `X.orig'
(*Note Recursive Retrieval Options::).
(*note Recursive Retrieval Options::).
`--http-user=USER'
`--http-passwd=PASSWORD'
@ -619,7 +627,7 @@ HTTP Options
scheme.
Another way to specify username and password is in the URL itself
(*Note URL Format::). For more information about security issues
(*note URL Format::). For more information about security issues
with Wget, *Note Security Considerations::.
`-C on/off'
@ -653,7 +661,7 @@ HTTP Options
wget --header='Accept-Charset: iso-8859-2' \
--header='Accept-Language: hr' \
http://fly.cc.fer.hr/
http://fly.srk.fer.hr/
Specification of an empty string as the header value will clear all
previous user-defined headers.
@ -727,7 +735,7 @@ FTP Options
and `]' to retrieve more than one file from the same directory at
once, like:
wget ftp://gnjilux.cc.fer.hr/*.msg
wget ftp://gnjilux.srk.fer.hr/*.msg
By default, globbing will be turned on if the URL contains a
globbing character. This option may be used to turn globbing on
@ -751,17 +759,17 @@ Recursive Retrieval Options
`-r'
`--recursive'
Turn on recursive retrieving. *Note Recursive Retrieval:: for more
details.
Turn on recursive retrieving. *Note Recursive Retrieval::, for
more details.
`-l DEPTH'
`--level=DEPTH'
Specify recursion maximum depth level DEPTH (*Note Recursive
Specify recursion maximum depth level DEPTH (*note Recursive
Retrieval::). The default maximum depth is 5.
`--delete-after'
This option tells Wget to delete every single file it downloads,
*after* having done so. It is useful for pre-fetching popular
_after_ having done so. It is useful for pre-fetching popular
pages through a proxy, e.g.:
wget -r -nd --delete-after http://whatever.com/~popular/page/
@ -788,7 +796,7 @@ Recursive Retrieval Options
`-K'
`--backup-converted'
When converting a file, back up the original version with a `.orig'
suffix. Affects the behavior of `-N' (*Note HTTP Time-Stamping
suffix. Affects the behavior of `-N' (*note HTTP Time-Stamping
Internals::).
`-m'
@ -837,7 +845,7 @@ Recursive Retrieval Options
wget -r -l 2 -p http://SITE/1.html
all the above files *and* `3.html''s requisite `3.gif' will be
all the above files _and_ `3.html''s requisite `3.gif' will be
downloaded. Similarly,
wget -r -l 1 -p http://SITE/1.html
@ -879,18 +887,18 @@ Recursive Accept/Reject Options
`-A ACCLIST --accept ACCLIST'
`-R REJLIST --reject REJLIST'
Specify comma-separated lists of file name suffixes or patterns to
accept or reject (*Note Types of Files:: for more details).
accept or reject (*note Types of Files:: for more details).
`-D DOMAIN-LIST'
`--domains=DOMAIN-LIST'
Set domains to be accepted and DNS looked-up, where DOMAIN-LIST is
a comma-separated list. Note that it does *not* turn on `-H'.
a comma-separated list. Note that it does _not_ turn on `-H'.
This option speeds things up, even if only one host is spanned
(*Note Domain Acceptance::).
(*note Domain Acceptance::).
`--exclude-domains DOMAIN-LIST'
Exclude the domains given in a comma-separated DOMAIN-LIST from
DNS-lookup (*Note Domain Acceptance::).
DNS-lookup (*note Domain Acceptance::).
`--follow-ftp'
Follow FTP links from HTML documents. Without this option, Wget
@ -924,29 +932,29 @@ Recursive Accept/Reject Options
`-H'
`--span-hosts'
Enable spanning across hosts when doing recursive retrieving
(*Note All Hosts::).
(*note All Hosts::).
`-L'
`--relative'
Follow relative links only. Useful for retrieving a specific home
page without any distractions, not even those from the same hosts
(*Note Relative Links::).
(*note Relative Links::).
`-I LIST'
`--include-directories=LIST'
Specify a comma-separated list of directories you wish to follow
when downloading (*Note Directory-Based Limits:: for more
when downloading (*note Directory-Based Limits:: for more
details.) Elements of LIST may contain wildcards.
`-X LIST'
`--exclude-directories=LIST'
Specify a comma-separated list of directories you wish to exclude
from download (*Note Directory-Based Limits:: for more details.)
from download (*note Directory-Based Limits:: for more details.)
Elements of LIST may contain wildcards.
`-nh'
`--no-host-lookup'
Disable the time-consuming DNS lookup of almost all hosts (*Note
Disable the time-consuming DNS lookup of almost all hosts (*note
Host Checking::).
`-np'
@ -954,8 +962,8 @@ Recursive Accept/Reject Options
`--no-parent'
Do not ever ascend to the parent directory when retrieving
recursively. This is a useful option, since it guarantees that
only the files *below* a certain hierarchy will be downloaded.
*Note Directory-Based Limits:: for more details.
only the files _below_ a certain hierarchy will be downloaded.
*Note Directory-Based Limits::, for more details.

File: wget.info, Node: Recursive Retrieval, Next: Following Links, Prev: Invoking, Up: Top
@ -1003,7 +1011,7 @@ which can grind the machine to a halt.
(`-l') and/or by lowering the number of retries (`-t'). You may also
consider using the `-w' option to slow down your requests to the remote
servers, as well as the numerous options to narrow the number of
followed links (*Note Following Links::).
followed links (*note Following Links::).
Recursive retrieval is a good thing when used properly. Please take
all precautions not to wreak havoc through carelessness.
@ -1019,7 +1027,7 @@ unnecessary data. Most of the time the users bear in mind exactly what
they want to download, and want Wget to follow only specific links.
For example, if you wish to download the music archive from
`fly.cc.fer.hr', you will not want to download all the home pages that
`fly.srk.fer.hr', you will not want to download all the home pages that
happen to be referenced by an obscure part of the archive.
Wget possesses several mechanisms that allows you to fine-tune which
@ -1061,13 +1069,13 @@ following links) all URLs that refer to the same host will be retrieved.
The problem with this option are the aliases of the hosts and
domains. Thus there is no way for Wget to know that `regoc.srce.hr' and
`www.srce.hr' are the same host, or that `fly.cc.fer.hr' is the same as
`fly.cc.etf.hr'. Whenever an absolute link is encountered, the host is
DNS-looked-up with `gethostbyname' to check whether we are maybe
`www.srce.hr' are the same host, or that `fly.srk.fer.hr' is the same
as `fly.cc.fer.hr'. Whenever an absolute link is encountered, the host
is DNS-looked-up with `gethostbyname' to check whether we are maybe
dealing with the same hosts. Although the results of `gethostbyname'
are cached, it is still a great slowdown, e.g. when dealing with large
indices of home pages on different hosts (because each of the hosts
must be DNS-resolved to see whether it just *might* be an alias of the
must be DNS-resolved to see whether it just _might_ be an alias of the
starting host).
To avoid the overhead you may use `-nh', which will turn off
@ -1079,7 +1087,7 @@ and `regoc.srce.hr' will be flagged as different hosts).
"virtual servers", each having its own directory hierarchy. Such
"servers" are distinguished by their hostnames (all of which point to
the same IP address); for this to work, a client must send a `Host'
header, which is what Wget does. However, in that case Wget *must not*
header, which is what Wget does. However, in that case Wget _must not_
try to divine a host's "real" address, nor try to use the same hostname
for each access, i.e. `-nh' must be turned on.
@ -1098,17 +1106,17 @@ Domain Acceptance
followed. The hosts the domain of which is not in this list will not be
DNS-resolved. Thus you can specify `-Dmit.edu' just to make sure that
*nothing outside of MIT gets looked up*. This is very important and
useful. It also means that `-D' does *not* imply `-H' (span all
useful. It also means that `-D' does _not_ imply `-H' (span all
hosts), which must be specified explicitly. Feel free to use this
options since it will speed things up, with almost all the reliability
of checking for all hosts. Thus you could invoke
wget -r -D.hr http://fly.cc.fer.hr/
wget -r -D.hr http://fly.srk.fer.hr/
to make sure that only the hosts in `.hr' domain get DNS-looked-up
for being equal to `fly.cc.fer.hr'. So `fly.cc.etf.hr' will be checked
(only once!) and found equal, but `www.gnu.ai.mit.edu' will not even be
checked.
for being equal to `fly.srk.fer.hr'. So `fly.cc.fer.hr' will be
checked (only once!) and found equal, but `www.gnu.ai.mit.edu' will not
even be checked.
Of course, domain acceptance can be used to limit the retrieval to
particular domains with spanning of hosts in them, but then you must
@ -1121,7 +1129,7 @@ and Stanford.
If there are domains you want to exclude specifically, you can do it
with `--exclude-domains', which accepts the same type of arguments of
`-D', but will *exclude* all the listed domains. For example, if you
`-D', but will _exclude_ all the listed domains. For example, if you
want to download all the hosts from `foo.edu' domain, with the
exception of `sunsite.foo.edu', you can do it like this:
@ -1177,7 +1185,7 @@ in `.wgetrc'.
`--reject REJLIST'
`reject = REJLIST'
The `--reject' option works the same way as `--accept', only its
logic is the reverse; Wget will download all files *except* the
logic is the reverse; Wget will download all files _except_ the
ones matching the suffixes (or patterns) in the list.
So, if you want to download a whole page except for the cumbersome
@ -1189,7 +1197,7 @@ in `.wgetrc'.
The `-A' and `-R' options may be combined to achieve even better
fine-tuning of which files to retrieve. E.g. `wget -A "*zelazny*" -R
.ps' will download all the files having `zelazny' as a part of their
name, but *not* the PostScript files.
name, but _not_ the PostScript files.
Note that these two options do not affect the downloading of HTML
files; Wget must load all the HTMLs to know where to go at

View File

@ -1,5 +1,4 @@
This is Info file wget.info, produced by Makeinfo version 1.68 from the
input file ./wget.texi.
This is wget.info, produced by makeinfo version 4.0 from wget.texi.
INFO-DIR-SECTION Net Utilities
INFO-DIR-SECTION World Wide Web
@ -16,12 +15,13 @@ data.
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the sections entitled "Copying" and "GNU General Public License"
are included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled "GNU Free Documentation License".

File: wget.info, Node: Directory-Based Limits, Next: FTP Links, Prev: Types of Files, Up: Following Links
@ -57,7 +57,7 @@ equivalent command in `.wgetrc'.
`--exclude LIST'
`exclude_directories = LIST'
`-X' option is exactly the reverse of `-I'--this is a list of
directories *excluded* from the download. E.g. if you do not want
directories _excluded_ from the download. E.g. if you do not want
Wget to download things from `/cgi-bin' directory, specify `-X
/cgi-bin' on the command line.
@ -184,7 +184,7 @@ remote file is more recent, Wget will proceed fetching it normally.
`ls' will show that the timestamps are set according to the state on
the remote server. Reissuing the command with `-N' will make Wget
re-fetch *only* the files that have been modified.
re-fetch _only_ the files that have been modified.
In both HTTP and FTP retrieval Wget will time-stamp the local file
correctly (with or without `-N') if it gets the stamps, i.e. gets the
@ -300,7 +300,7 @@ further attempts will be made.
If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
The fact that user's settings are loaded after the system-wide ones
means that in case of collision user's wgetrc *overrides* the
means that in case of collision user's wgetrc _overrides_ the
system-wide wgetrc (in `/usr/local/etc/wgetrc' by default). Fascist
admins, away!
@ -346,11 +346,11 @@ hostnames or dotted-quad IP addresses. N can be any positive integer,
or `inf' for infinity, where appropriate. STRING values can be any
non-empty string.
Most of these commands have commandline equivalents (*Note
Most of these commands have commandline equivalents (*note
Invoking::), though some of the more obscure or rarely used ones do not.
accept/reject = STRING
Same as `-A'/`-R' (*Note Types of Files::).
Same as `-A'/`-R' (*note Types of Files::).
add_hostdir = on/off
Enable/disable host-prefixed file names. `-nH' disables it.
@ -397,14 +397,14 @@ dirstruct = on/off
respectively.
domains = STRING
Same as `-D' (*Note Domain Acceptance::).
Same as `-D' (*note Domain Acceptance::).
dot_bytes = N
Specify the number of bytes "contained" in a dot, as seen
throughout the retrieval (1024 by default). You can postfix the
value with `k' or `m', representing kilobytes and megabytes,
respectively. With dot settings you can tailor the dot retrieval
to suit your needs, or you can use the predefined "styles" (*Note
to suit your needs, or you can use the predefined "styles" (*note
Download Options::).
dots_in_line = N
@ -419,10 +419,10 @@ dot_style = STRING
exclude_directories = STRING
Specify a comma-separated list of directories you wish to exclude
from download - the same as `-X' (*Note Directory-Based Limits::).
from download - the same as `-X' (*note Directory-Based Limits::).
exclude_domains = STRING
Same as `--exclude-domains' (*Note Domain Acceptance::).
Same as `--exclude-domains' (*note Domain Acceptance::).
follow_ftp = on/off
Follow FTP links from HTML documents - the same as `-f'.
@ -497,7 +497,7 @@ noclobber = on/off
no_parent = on/off
Disallow retrieving outside the directory hierarchy, like
`--no-parent' (*Note Directory-Based Limits::).
`--no-parent' (*note Directory-Based Limits::).
no_proxy = STRING
Use STRING as the comma-separated list of domains to avoid in
@ -550,7 +550,7 @@ recursive = on/off
Recursive on/off - the same as `-r'.
relative_only = on/off
Follow only relative links - the same as `-L' (*Note Relative
Follow only relative links - the same as `-L' (*note Relative
Links::).
remove_listing = on/off
@ -562,7 +562,7 @@ retr_symlinks = on/off
files; the same as `--retr-symlinks'.
robots = on/off
Use (or not) `/robots.txt' file (*Note Robots::). Be sure to know
Use (or not) `/robots.txt' file (*note Robots::). Be sure to know
what you are doing before changing the default (which is `on').
server_response = on/off
@ -570,7 +570,7 @@ server_response = on/off
the same as `-S'.
simple_host_check = on/off
Same as `-nh' (*Note Host Checking::).
Same as `-nh' (*note Host Checking::).
span_hosts = on/off
Same as `-H'.
@ -579,7 +579,7 @@ timeout = N
Set timeout value - the same as `-T'.
timestamping = on/off
Turn timestamping on/off. The same as `-N' (*Note Time-Stamping::).
Turn timestamping on/off. The same as `-N' (*note Time-Stamping::).
tries = N
Set number of retries per URL - the same as `-t'.
@ -613,114 +613,6 @@ Be careful about the things you change.
have any effect, you must remove the `#' character at the beginning of
its line.
###
### Sample Wget initialization file .wgetrc
###
## You can use this file to change the default behaviour of wget or to
## avoid having to type many many command-line options. This file does
## not contain a comprehensive list of commands -- look at the manual
## to find out what you can put into this file.
##
## Wget initialization file can reside in /usr/local/etc/wgetrc
## (global, for all users) or $HOME/.wgetrc (for a single user).
##
## To use the settings in this file, you will have to uncomment them,
## as well as change them, in most cases, as the values on the
## commented-out lines are the default values (e.g. "off").
##
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
## Think well before you change them, since they may reduce wget's
## functionality, and make it behave contrary to the documentation:
##
# You can set retrieve quota for beginners by specifying a value
# optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
# default quota is unlimited.
#quota = inf
# You can lower (or raise) the default number of retries when
# downloading a file (default is 20).
#tries = 20
# Lowering the maximum depth of the recursive retrieval is handy to
# prevent newbies from going too "deep" when they unwittingly start
# the recursive retrieval. The default is 5.
#reclevel = 5
# Many sites are behind firewalls that do not allow initiation of
# connections from the outside. On these sites you have to use the
# `passive' feature of FTP. If you are behind such a firewall, you
# can turn this on to make Wget use passive FTP by default.
#passive_ftp = off
# The "wait" command below makes Wget wait between every connection.
# If, instead, you want Wget to wait only between retries of failed
# downloads, set waitretry to maximum number of seconds to wait (Wget
# will use "linear backoff", waiting 1 second after the first failure
# on a file, 2 seconds after the second failure, etc. up to this max).
waitretry = 10
##
## Local settings (for a user to set in his $HOME/.wgetrc). It is
## *highly* undesirable to put these settings in the global file, since
## they are potentially dangerous to "normal" users.
##
## Even when setting up your own ~/.wgetrc, you should know what you
## are doing before doing so.
##
# Set this to on to use timestamping by default:
#timestamping = off
# It is a good idea to make Wget send your email address in a `From:'
# header with your request (so that server administrators can contact
# you in case of errors). Wget does *not* send `From:' by default.
#header = From: Your Name <username@site.domain>
# You can set up other headers, like Accept-Language. Accept-Language
# is *not* sent by default.
#header = Accept-Language: en
# You can set the default proxy for Wget to use. It will override the
# value in the environment.
#http_proxy = http://proxy.yoyodyne.com:18023/
# If you do not want to use proxy at all, set this to off.
#use_proxy = on
# You can customize the retrieval outlook. Valid options are default,
# binary, mega and micro.
#dot_style = default
# Setting this to off makes Wget not download /robots.txt. Be sure to
# know *exactly* what /robots.txt is and how it is used before changing
# the default!
#robots = on
# It can be useful to make Wget wait between connections. Set this to
# the number of seconds you want Wget to wait.
#wait = 0
# You can force creating directory structure, even if a single is being
# retrieved, by setting this to on.
#dirstruct = off
# You can turn on recursive retrieving by default (don't do this if
# you are not sure you know what it means) by setting this to on.
#recursive = off
# To always back up file X as X.orig before converting its links (due
# to -k / --convert-links / convert_links = on having been specified),
# set this variable to on:
#backup_converted = off
# To have Wget follow FTP links from HTML files by default, set this
# to on:
#follow_ftp = off
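The `waitretry' comment in the sample above describes a "linear backoff" schedule. The following is a rough Python model of the delays it describes for consecutive failures on one file. The helper name `waitretry_delays' is hypothetical, and this is not Wget's own code.

```python
from typing import List


def waitretry_delays(failures: int, max_wait: int = 10) -> List[int]:
    """Seconds to wait after each consecutive failure on a single file.

    Models the "linear backoff" described for waitretry: 1 second after
    the first failure, 2 after the second, and so on, capped at max_wait.
    (Hypothetical helper; not taken from Wget's source.)
    """
    return [min(n, max_wait) for n in range(1, failures + 1)]


# With waitretry = 10, twelve failures in a row:
print(waitretry_delays(12, max_wait=10))
```

Once the cap is reached, every further retry waits the full `waitretry' value.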

File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top
@ -748,13 +640,13 @@ Simple Usage
* Say you want to download a URL. Just type:
wget http://fly.cc.fer.hr/
wget http://fly.srk.fer.hr/
The response will be something like:
--13:30:45-- http://fly.cc.fer.hr:80/en/
--13:30:45-- http://fly.srk.fer.hr:80/en/
=> `index.html'
Connecting to fly.cc.fer.hr:80... connected!
Connecting to fly.srk.fer.hr:80... connected!
HTTP request sent, awaiting response... 200 OK
Length: 4,694 [text/html]
@ -770,13 +662,13 @@ Simple Usage
     the number of tries to 45, to ensure that the whole file will
arrive safely:
wget --tries=45 http://fly.cc.fer.hr/jpg/flyweb.jpg
wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
* Now let's leave Wget to work in the background, and write its
progress to log file `log'. It is tiring to type `--tries', so we
shall use `-t'.
wget -t 45 -o log http://fly.cc.fer.hr/jpg/flyweb.jpg &
wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
The ampersand at the end of the line makes sure that Wget works in
     the background. To allow an unlimited number of retries, use `-t inf'.
@ -784,10 +676,10 @@ Simple Usage
* The usage of FTP is as simple. Wget will take care of login and
password.
$ wget ftp://gnjilux.cc.fer.hr/welcome.msg
--10:08:47-- ftp://gnjilux.cc.fer.hr:21/welcome.msg
$ wget ftp://gnjilux.srk.fer.hr/welcome.msg
--10:08:47-- ftp://gnjilux.srk.fer.hr:21/welcome.msg
=> `welcome.msg'
Connecting to gnjilux.cc.fer.hr:21... connected!
Connecting to gnjilux.srk.fer.hr:21... connected!
Logging in as anonymous ... Logged in!
==> TYPE I ... done. ==> CWD not needed.
==> PORT ... done. ==> RETR welcome.msg ... done.
@ -848,9 +740,9 @@ Advanced Usage
wget -r -l1 --no-parent -A.gif http://host/dir/
It is a bit of a kludge, but it works. `-r -l1' means to retrieve
recursively (*Note Recursive Retrieval::), with maximum depth of 1.
recursively (*note Recursive Retrieval::), with maximum depth of 1.
`--no-parent' means that references to the parent directory are
ignored (*Note Directory-Based Limits::), and `-A.gif' means to
ignored (*note Directory-Based Limits::), and `-A.gif' means to
download only the GIF files. `-A "*.gif"' would have worked too.
* Suppose you were in the middle of downloading, when Wget was
@ -860,13 +752,13 @@ Advanced Usage
wget -nc -r http://www.gnu.ai.mit.edu/
* If you want to encode your own username and password to HTTP or
FTP, use the appropriate URL syntax (*Note URL Format::).
FTP, use the appropriate URL syntax (*note URL Format::).
wget ftp://hniksic:mypassword@jagor.srce.hr/.emacs
* If you do not like the default retrieval visualization (1K dots
with 10 dots per cluster and 50 dots per line), you can customize
it through dot settings (*Note Wgetrc Commands::). For example,
it through dot settings (*note Wgetrc Commands::). For example,
many people like the "binary" style of retrieval, with 8K dots and
512K lines:
@ -875,10 +767,10 @@ Advanced Usage
You can experiment with other styles, like:
wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
wget --dot-style=micro http://fly.cc.fer.hr/
wget --dot-style=micro http://fly.srk.fer.hr/
To make these settings permanent, put them in your `.wgetrc', as
described before (*Note Sample Wgetrc::).
described before (*note Sample Wgetrc::).

File: wget.info, Node: Guru Usage, Prev: Advanced Usage, Up: Examples
@ -902,7 +794,7 @@ Guru Usage
* But what about mirroring the hosts networkologically close to you?
It seems so awfully slow because of all that DNS resolving. Just
use `-D' (*Note Domain Acceptance::).
use `-D' (*note Domain Acceptance::).
wget -rN -Dsrce.hr http://www.srce.hr/
@ -976,7 +868,7 @@ the following environment variables:
`no_proxy'
This variable should contain a comma-separated list of domain
extensions proxy should *not* be used for. For instance, if the
extensions proxy should _not_ be used for. For instance, if the
value of `no_proxy' is `.mit.edu', proxy will not be used to
retrieve documents from MIT.
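The `no_proxy' check amounts to comparing the end of the host name against each listed domain extension. Below is a minimal Python sketch. It assumes a plain, case-insensitive suffix comparison; Wget's actual matching may differ in details such as port handling. The name `proxy_bypassed' is hypothetical.

```python
def proxy_bypassed(host: str, no_proxy: str) -> bool:
    """Return True if HOST ends with one of the comma-separated
    domain extensions in NO_PROXY (e.g. ".mit.edu").

    Assumes a simple case-insensitive suffix match (illustrative only).
    """
    host = host.lower()
    for ext in no_proxy.split(","):
        ext = ext.strip().lower()
        if ext and host.endswith(ext):
            return True
    return False


print(proxy_bypassed("web.mit.edu", ".mit.edu"))      # True: fetch directly
print(proxy_bypassed("prep.ai.mit.edu", ".gnu.org"))  # False: use the proxy
```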
@ -1022,7 +914,7 @@ Distribution
Like all GNU utilities, the latest version of Wget can be found at
the master GNU archive site prep.ai.mit.edu, and its mirrors. For
example, Wget 1.5.3+dev can be found at
`ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz'
<ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz>

File: wget.info, Node: Mailing List, Next: Reporting Bugs, Prev: Distribution, Up: Various
@ -1040,7 +932,7 @@ subscribe. The more people on the list, the better!
magic word `subscribe' in the subject line. Unsubscribe by mailing to
<wget-unsubscribe@sunsite.auc.dk>.
The mailing list is archived at `http://fly.cc.fer.hr/archive/wget'.
The mailing list is archived at <http://fly.srk.fer.hr/archive/wget>.

File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Mailing List, Up: Various
@ -1076,7 +968,7 @@ simple guidelines.
3. Please start Wget with `-d' option and send the log (or the
relevant parts of it). If Wget was compiled without debug support,
recompile it. It is *much* easier to trace bugs with debug support
recompile it. It is _much_ easier to trace bugs with debug support
on.
4. If Wget has crashed, try to run it in a debugger, e.g. `gdb `which
@ -1138,9 +1030,7 @@ File: wget.info, Node: Appendices, Next: Copying, Prev: Various, Up: Top
Appendices
**********
This chapter contains some references I consider useful, like the
Robots Exclusion Standard specification, as well as a list of
contributors to GNU Wget.
This chapter contains some references I consider useful.
* Menu:
@ -1154,176 +1044,61 @@ File: wget.info, Node: Robots, Next: Security Considerations, Prev: Appendice
Robots
======
Since Wget is able to traverse the web, it counts as one of the Web
"robots". Thus Wget understands "Robots Exclusion Standard"
(RES)--contents of `/robots.txt', used by server administrators to
shield parts of their systems from wanderings of Wget.
It is extremely easy to make Wget wander aimlessly around a web site,
sucking all the available data in the process. `wget -r SITE', and you're
set. Great? Not for the server admin.
While Wget is retrieving static pages, there's not much of a problem.
But for Wget, there is no real difference between the smallest static
page and the hardest, most demanding CGI or dynamic page. For instance,
a site I know has a section handled by an, uh, bitchin' CGI script that
converts all the Info files to HTML. The script can and does bring the
machine to its knees without providing anything useful to the
downloader.
For such and similar cases various robot exclusion schemes have been
devised as a means for the server administrators and document authors to
protect chosen portions of their sites from the wandering of robots.
The more popular mechanism is the "Robots Exclusion Standard"
written by Martijn Koster et al. in 1994. It is specified by placing a
file named `/robots.txt' in the server root, which the robots are
supposed to download and parse. Wget supports this specification.
Norobots support is turned on only when retrieving recursively, and
*never* for the first page. Thus, you may issue:
_never_ for the first page. Thus, you may issue:
wget -r http://fly.cc.fer.hr/
wget -r http://fly.srk.fer.hr/
First the index of fly.cc.fer.hr will be downloaded. If Wget finds
anything worth downloading on the same host, only *then* will it load
First the index of fly.srk.fer.hr will be downloaded. If Wget finds
anything worth downloading on the same host, only _then_ will it load
the robots, and decide whether or not to load the links after all.
`/robots.txt' is loaded only once per host. Wget does not support the
robots `META' tag.
`/robots.txt' is loaded only once per host.
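The loading policy just described can be modelled as a small per-host cache. Robots rules are consulted only for links found during recursion, never for the starting URL, and `/robots.txt' is fetched at most once per host. This is an illustrative Python model, not Wget's implementation. `RobotsCache', `recursive_fetch_order' and the injected `fetch_disallows' callback are hypothetical, and the rules are simplified to a list of Disallow prefixes.

```python
from typing import Callable, Dict, List
from urllib.parse import urlsplit


class RobotsCache:
    """Fetch each host's /robots.txt rules lazily, at most once per host."""

    def __init__(self, fetch_disallows: Callable[[str], List[str]]):
        # fetch_disallows(host) stands in for downloading and parsing
        # http://HOST/robots.txt; it returns the applicable Disallow values.
        self.fetch_disallows = fetch_disallows
        self.rules: Dict[str, List[str]] = {}

    def allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host not in self.rules:          # first link seen on this host
            self.rules[host] = self.fetch_disallows(host)
        path = parts.path or "/"
        return not any(p and path.startswith(p) for p in self.rules[host])


def recursive_fetch_order(start_url: str, links: List[str],
                          cache: RobotsCache) -> List[str]:
    """The start URL is always fetched; only links found in it are checked."""
    return [start_url] + [u for u in links if cache.allowed(u)]


requested: List[str] = []


def fake_fetch(host: str) -> List[str]:
    requested.append(host)                  # record each robots.txt request
    return ["/tmp/"]


cache = RobotsCache(fake_fetch)
order = recursive_fetch_order(
    "http://fly.srk.fer.hr/tmp/index.html",
    ["http://fly.srk.fer.hr/en/", "http://fly.srk.fer.hr/tmp/old.html"],
    cache,
)
print(order)      # start URL kept even though it lies under /tmp/
print(requested)  # robots.txt requested only once for the host
```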
The description of the norobots standard was written, and is
maintained by Martijn Koster <m.koster@webcrawler.com>. With his
permission, I contribute a (slightly modified) TeXified version of the
RES.
   Note that the exclusion standard discussed here has undergone some
revisions. However, Wget supports only the first version of RES,
the one written by Martijn Koster in 1994, available at
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. A
later version exists in the form of an internet draft
<draft-koster-robots-00.txt> titled "A Method for Web Robots Control",
which expired on June 4, 1997. I am not aware whether it ever made it
to an RFC. The text of the draft is available at
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
Wget does not yet support the new directives specified by this draft,
but we plan to add them.
* Menu:
This manual no longer includes the text of the old standard.
* Introduction to RES::
* RES Format::
* User-Agent Field::
* Disallow Field::
* Norobots Examples::
   The second, less known mechanism enables the author of an individual
document to specify whether they want the links from the file to be
followed by a robot. This is achieved using the `META' tag, like this:

File: wget.info, Node: Introduction to RES, Next: RES Format, Prev: Robots, Up: Robots
<meta name="robots" content="nofollow">
Introduction to RES
-------------------
"WWW Robots" (also called "wanderers" or "spiders") are programs
that traverse many pages in the World Wide Web by recursively
retrieving linked pages. For more information see the robots page.
In 1993 and 1994 there have been occasions where robots have visited
WWW servers where they weren't welcome for various reasons. Sometimes
these reasons were robot specific, e.g. certain robots swamped servers
with rapid-fire requests, or retrieved the same files repeatedly. In
other situations robots traversed parts of WWW servers that weren't
suitable, e.g. very deep virtual trees, duplicated information,
temporary information, or cgi-scripts with side-effects (such as
voting).
These incidents indicated the need for established mechanisms for
WWW servers to indicate to robots which parts of their server should
not be accessed. This standard addresses this need with an operational
solution.
This document represents a consensus on 30 June 1994 on the robots
mailing list (`robots@webcrawler.com'), between the majority of robot
authors and other people with an interest in robots. It has also been
open for discussion on the Technical World Wide Web mailing list
(`www-talk@info.cern.ch'). This document is based on a previous working
draft under the same title.
It is not an official standard backed by a standards body, or owned
by any commercial organization. It is not enforced by anybody, and there
is no guarantee that all current and future robots will use it. Consider
it a common facility the majority of robot authors offer the WWW
community to protect WWW servers against unwanted accesses by their
robots.
The latest version of this document can be found at
`http://info.webcrawler.com/mak/projects/robots/norobots.html'.

File: wget.info, Node: RES Format, Next: User-Agent Field, Prev: Introduction to RES, Up: Robots
RES Format
----------
The format and semantics of the `/robots.txt' file are as follows:
The file consists of one or more records separated by one or more
blank lines (terminated by `CR', `CR/NL', or `NL'). Each record
contains lines of the form:
<field>:<optionalspace><value><optionalspace>
The field name is case insensitive.
Comments can be included in file using UNIX Bourne shell conventions:
the `#' character is used to indicate that preceding space (if any) and
the remainder of the line up to the line termination is discarded.
Lines containing only a comment are discarded completely, and therefore
do not indicate a record boundary.
The record starts with one or more User-agent lines, followed by one
or more Disallow lines, as detailed below. Unrecognized headers are
ignored.
The presence of an empty `/robots.txt' file has no explicit
associated semantics, it will be treated as if it was not present, i.e.
all robots will consider themselves welcome.
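The following minimal Python parser makes the record structure concrete. It strips comments and ignores comment-only lines, treats blank lines as record boundaries, compares field names case-insensitively, and skips unknown fields. It is a sketch only: `parse_robots' is a hypothetical name, and real robots are expected to be more forgiving of malformed input.

```python
from typing import Dict, List

Record = Dict[str, List[str]]


def parse_robots(text: str) -> List[Record]:
    """Split robots.txt text into records of User-agent/Disallow values."""
    records: List[Record] = []
    current: Record = {"user-agent": [], "disallow": []}

    def finish_record() -> None:
        nonlocal current
        if current["user-agent"] or current["disallow"]:
            records.append(current)
        current = {"user-agent": [], "disallow": []}

    # splitlines() accepts CR, CR/NL and NL line terminators.
    for raw in text.splitlines():
        if not raw.strip():
            finish_record()               # blank line: record boundary
            continue
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue                      # comment-only line: not a boundary
        field, colon, value = line.partition(":")
        if not colon:
            continue                      # not a <field>:<value> line
        field = field.strip().lower()     # field names are case insensitive
        if field in current:              # unrecognized fields are ignored
            current[field].append(value.strip())
    finish_record()
    return records


example = (
    "# robots.txt for http://www.site.com/\n"
    "User-agent: *\n"
    "Disallow: /cyberworld/map/ # This is an infinite virtual URL space\n"
    "\n"
    "# Cybermapper knows where to go.\n"
    "User-agent: cybermapper\n"
    "Disallow:\n"
)
print(parse_robots(example))
```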

File: wget.info, Node: User-Agent Field, Next: Disallow Field, Prev: RES Format, Up: Robots
User-Agent Field
----------------
The value of this field is the name of the robot the record is
describing access policy for.
If more than one User-agent field is present the record describes an
identical access policy for more than one robot. At least one field
needs to be present per record.
The robot should be liberal in interpreting this field. A case
insensitive substring match of the name without version information is
recommended.
If the value is `*', the record describes the default access policy
for any robot that has not matched any of the other records. It is not
allowed to have multiple such records in the `/robots.txt' file.
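The matching rules above (a liberal, case-insensitive substring match on the robot name without its version, with `*' as the fallback) could be modelled in Python as follows. The direction of the substring test used here (record value contained in the robot's name) is one reasonable reading, not something the text pins down. `select_record' is a hypothetical helper working on records shaped as {"user-agent": [...], "disallow": [...]}.

```python
from typing import Dict, List, Optional

Record = Dict[str, List[str]]


def select_record(records: List[Record], robot: str) -> Optional[Record]:
    """Return the record whose User-agent applies to ROBOT (e.g. "Wget/1.5.3")."""
    name = robot.split("/", 1)[0].lower()      # drop version information
    default: Optional[Record] = None
    for record in records:
        for agent in record["user-agent"]:
            if agent == "*":
                if default is None:
                    default = record           # remember the catch-all record
            elif agent and agent.lower() in name:  # case-insensitive substring
                return record
    return default                             # None: no record applies


records = [
    {"user-agent": ["*"], "disallow": ["/cyberworld/map/"]},
    {"user-agent": ["cybermapper"], "disallow": [""]},
]
print(select_record(records, "CyberMapper/2.0")["disallow"])
print(select_record(records, "Wget/1.5.3")["disallow"])
```

A specific match wins even when the `*' record comes first in the file, because the catch-all is only returned after all records have been examined.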

File: wget.info, Node: Disallow Field, Next: Norobots Examples, Prev: User-Agent Field, Up: Robots
Disallow Field
--------------
The value of this field specifies a partial URL that is not to be
visited. This can be a full path, or a partial path; any URL that
starts with this value will not be retrieved. For example,
`Disallow: /help' disallows both `/help.html' and `/help/index.html',
whereas `Disallow: /help/' would disallow `/help/index.html' but allow
`/help.html'.
   An empty value indicates that all URLs can be retrieved. At least
one Disallow field needs to be present in a record.
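The prefix rule fits in a few lines of Python. This sketch reproduces the `/help' examples above. The `is_allowed' helper is hypothetical, and it compares raw paths only, ignoring details such as %-escaping.

```python
from typing import List


def is_allowed(path: str, disallow: List[str]) -> bool:
    """Apply a record's Disallow values to a URL path.

    Each value is a prefix: any path starting with it is excluded.
    An empty value excludes nothing.
    """
    return not any(prefix and path.startswith(prefix) for prefix in disallow)


print(is_allowed("/help.html", ["/help"]))         # False
print(is_allowed("/help/index.html", ["/help"]))   # False
print(is_allowed("/help.html", ["/help/"]))        # True
print(is_allowed("/help/index.html", ["/help/"]))  # False
print(is_allowed("/anything", [""]))               # True
```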

File: wget.info, Node: Norobots Examples, Prev: Disallow Field, Up: Robots
Norobots Examples
-----------------
The following example `/robots.txt' file specifies that no robots
should visit any URL starting with `/cyberworld/map/' or `/tmp/':
# robots.txt for http://www.site.com/
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
This example `/robots.txt' file specifies that no robots should
visit any URL starting with `/cyberworld/map/', except the robot called
`cybermapper':
# robots.txt for http://www.site.com/
User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
This example indicates that no robots should visit this site further:
# go away
User-agent: *
Disallow: /
This is explained in some detail at
<http://info.webcrawler.com/mak/projects/robots/meta-user.html>.
Unfortunately, Wget does not support this method of robot exclusion yet,
but it will be implemented in the next release.
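This version of Wget does not honor the `nofollow' directive. For illustration only, a robot could detect it with Python's standard `html.parser' module roughly as follows. The class and function names are hypothetical. The sketch recognizes only the `nofollow' value shown above, within a possibly comma-separated `content' list.

```python
from html.parser import HTMLParser
from typing import Set


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() == "robots":
            for directive in values.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def links_may_be_followed(html: str) -> bool:
    """False if the document asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "nofollow" not in parser.directives


print(links_may_be_followed('<meta name="robots" content="nofollow">'))
print(links_may_be_followed('<meta name="ROBOTS" content="index, follow">'))
```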

File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robots, Up: Appendices
@ -1350,3 +1125,124 @@ Here are the main issues, and some solutions.
being careful when you send debug logs (yes, even when you send
them to me).

File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
Contributors
============
GNU Wget was written by Hrvoje Niksic <hniksic@arsdigita.com>.
However, its development could never have gone as far as it has, were it
not for the help of many people, either with bug reports, feature
proposals, patches, or letters saying "Thanks!".
Special thanks goes to the following people (no particular order):
* Karsten Thygesen--donated system resources such as the mailing
list, web space, and FTP space, along with a lot of time to make
these actually work.
* Shawn McHorse--bug reports and patches.
* Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization. Lots of
portability fixes.
* Gordon Matzigkeit--`.netrc' support.
* Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
suggestions and "philosophical" discussions.
* Darko Budor--initial port to Windows.
* Antonio Rosella--help and suggestions, plus the Italian
translation.
* Tomislav Petrovic, Mario Mikocevic--many bug reports and
suggestions.
* Francois Pinard--many thorough bug reports and discussions.
* Karl Eichwalder--lots of help with internationalization and other
things.
* Junio Hamano--donated support for Opie and HTTP `Digest'
authentication.
* Brian Gough--a generous donation.
The following people have provided patches, bug/build reports, useful
suggestions, beta testing services, fan mail and all the other things
that make maintenance so much fun:
Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman
and the Gurus at Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei
Cavassin, Gilles Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, John
Daily, Andrew Davison, Andrew Deryabin, Ulrich Drepper, Marc Duponcheel,
Damir Dzeko, Aleksandar Erkalovic, Andy Eskilsson, Masashi Fujita,
Howard Gayle, Marcel Gerrits, Hans Grobler, Mathieu Guillaume, Dan
Harkless, Heiko Herold, Karl Heuer, HIROSE Masaaki, Gregor Hoffleit,
Erik Magnus Hulthen, Richard Huveneers, Simon Josefsson, Mario Juric,
Const Kaplinsky, Goran Kezunovic, Robert Kleine, Fila Kolodny,
Alexander Kourakos, Martin Kraemer, Simos KSenitellis, Hrvoje Lacko,
Daniel S. Lewart, Dave Love, Alexander V. Lukyanov, Jordan Mendelson,
Lin Zhe Min, Simon Munton, Charlie Negyesi, R. K. Owen, Andrew Pollock,
Steve Pothier, Jan Prikryl, Marin Purgar, Keith Refson, Tyler Riddle,
Tobias Ringstrom, Juan Jose Rodrigues, Edward J. Sabol, Heinz Salzmann,
Robert Schmidt, Andreas Schwab, Toomas Soome, Tage Stabell-Kulo, Sven
Sternberger, Markus Strasser, Szakacsits Szabolcs, Mike Thomas, Russell
Vincent, Charles G Waldman, Douglas E. Wegscheid, Jasmin Zainul, Bojan
Zdrnja, Kristijan Zimmer.
Apologies to all who I accidentally left out, and many thanks to all
the subscribers of the Wget mailing list.

File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top
Copying
*******
Wget is "free software", where "free" refers to liberty, not price.
The exact legal distribution terms follow below, but in short, it means
that you have the right (freedom) to run and change and copy Wget, and
even--if you want--charge money for any of those things. The sole
restriction is that you have to grant your recipients the same rights.
This method of licensing software is also known as "open-source",
because it requires that the recipients always receive a program's
source code along with the program.
More specifically:
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
In addition to this, this manual is free in the same sense:
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation License,
Version 1.1 or any later version published by the Free Software
Foundation; with the Invariant Sections being "GNU General Public
License" and "GNU Free Documentation License", with no Front-Cover
Texts, and with no Back-Cover Texts. A copy of the license is
included in the section entitled "GNU Free Documentation License".
The full texts of the GNU General Public License and of the GNU Free
Documentation License are available below.
* Menu:
* GNU General Public License::
* GNU Free Documentation License::


@ -1,5 +1,4 @@
This is Info file wget.info, produced by Makeinfo version 1.68 from the
input file ./wget.texi.
This is wget.info, produced by makeinfo version 4.0 from wget.texi.
INFO-DIR-SECTION Net Utilities
INFO-DIR-SECTION World Wide Web
@ -16,85 +15,19 @@ data.
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided also
that the sections entitled "Copying" and "GNU General Public License"
are included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "GNU General Public License" and "GNU Free
Documentation License", with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled "GNU Free Documentation License".

File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
File: wget.info, Node: GNU General Public License, Next: GNU Free Documentation License, Prev: Copying, Up: Copying
Contributors
============
GNU Wget was written by Hrvoje Niksic <hniksic@arsdigita.com>.
However, its development could never have gone as far as it has, were it
not for the help of many people, either with bug reports, feature
proposals, patches, or letters saying "Thanks!".
Special thanks goes to the following people (no particular order):
* Karsten Thygesen--donated the mailing list and the initial FTP
space.
* Shawn McHorse--bug reports and patches.
* Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization.
* Gordon Matzigkeit--`.netrc' support.
* Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
suggestions and "philosophical" discussions.
* Darko Budor--initial port to Windows.
* Antonio Rosella--help and suggestions, plus the Italian
translation.
* Tomislav Petrovic, Mario Mikocevic--many bug reports and
suggestions.
* Francois Pinard--many thorough bug reports and discussions.
* Karl Eichwalder--lots of help with internationalization and other
things.
* Junio Hamano--donated support for Opie and HTTP `Digest'
authentication.
* Brian Gough--a generous donation.
The following people have provided patches, bug/build reports, useful
suggestions, beta testing services, fan mail and all the other things
that make maintenance so much fun:
Tim Adam, Martin Baehr, Dieter Baron, Roger Beeman and the Gurus at
Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei Cavassin, Gilles
Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, Andrew Deryabin,
Damir Dzeko, Andrew Davison, Ulrich Drepper, Marc Duponcheel,
Aleksandar Erkalovic, Andy Eskilsson, Masashi Fujita, Howard Gayle,
Marcel Gerrits, Hans Grobler, Mathieu Guillaume, Dan Harkless, Heiko
Herold, Karl Heuer, HIROSE Masaaki, Gregor Hoffleit, Erik Magnus
Hulthen, Richard Huveneers, Simon Josefsson, Mario Juric, Goran
Kezunovic, Robert Kleine, Fila Kolodny, Alexander Kourakos, Martin
Kraemer, Simos KSenitellis, Hrvoje Lacko, Daniel S. Lewart, Dave Love,
Jordan Mendelson, Lin Zhe Min, Charlie Negyesi, Andrew Pollock, Steve
Pothier, Jan Prikryl, Marin Purgar, Keith Refson, Tobias Ringstrom,
Juan Jose Rodrigues, Edward J. Sabol, Heinz Salzmann, Robert Schmidt,
Toomas Soome, Tage Stabell-Kulo, Sven Sternberger, Markus Strasser,
Szakacsits Szabolcs, Mike Thomas, Russell Vincent, Charles G Waldman,
Douglas E. Wegscheid, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.
Apologies to all who I accidentally left out, and many thanks to all
the subscribers of the Wget mailing list.

File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top
GNU GENERAL PUBLIC LICENSE
**************************
GNU General Public License
==========================
Version 2, June 1991
@ -454,6 +387,391 @@ library, you may consider it more useful to permit linking proprietary
applications with the library. If this is what you want to do, use the
GNU Library General Public License instead of this License.

File: wget.info, Node: GNU Free Documentation License, Prev: GNU General Public License, Up: Copying
GNU Free Documentation License
==============================
Version 1.1, March 2000
Copyright (C) 2000 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other
written document "free" in the sense of freedom: to assure everyone
the effective freedom to copy and redistribute it, with or without
modifying it, either commercially or noncommercially. Secondarily,
this License preserves for the author and publisher a way to get
credit for their work, while not being considered responsible for
modifications made by others.
This License is a kind of "copyleft", which means that derivative
works of the document must themselves be free in the same sense.
It complements the GNU General Public License, which is a copyleft
license designed for free software.
We have designed this License in order to use it for manuals for
free software, because free software needs free documentation: a
free program should come with manuals providing the same freedoms
that the software does. But this License is not limited to
software manuals; it can be used for any textual work, regardless
of subject matter or whether it is published as a printed book.
We recommend this License principally for works whose purpose is
instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work that contains a
notice placed by the copyright holder saying it can be distributed
under the terms of this License. The "Document", below, refers to
any such manual or work. Any member of the public is a licensee,
and is addressed as "you".
A "Modified Version" of the Document means any work containing the
Document or a portion of it, either copied verbatim, or with
modifications and/or translated into another language.
A "Secondary Section" is a named appendix or a front-matter
section of the Document that deals exclusively with the
relationship of the publishers or authors of the Document to the
Document's overall subject (or to related matters) and contains
nothing that could fall directly within that overall subject.
(For example, if the Document is in part a textbook of
mathematics, a Secondary Section may not explain any mathematics.)
The relationship could be a matter of historical connection with
the subject or with related matters, or of legal, commercial,
philosophical, ethical or political position regarding them.
The "Invariant Sections" are certain Secondary Sections whose
titles are designated, as being those of Invariant Sections, in
the notice that says that the Document is released under this
License.
The "Cover Texts" are certain short passages of text that are
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
that says that the Document is released under this License.
A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the
general public, whose contents can be viewed and edited directly
and straightforwardly with generic text editors or (for images
composed of pixels) generic paint programs or (for drawings) some
widely available drawing editor, and that is suitable for input to
text formatters or for automatic translation to a variety of
formats suitable for input to text formatters. A copy made in an
otherwise Transparent file format whose markup has been designed
to thwart or discourage subsequent modification by readers is not
Transparent. A copy that is not "Transparent" is called "Opaque".
Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format,
SGML or XML using a publicly available DTD, and
standard-conforming simple HTML designed for human modification.
Opaque formats include PostScript, PDF, proprietary formats that
can be read and edited only by proprietary word processors, SGML
or XML for which the DTD and/or processing tools are not generally
available, and the machine-generated HTML produced by some word
processors for output purposes only.
The "Title Page" means, for a printed book, the title page itself,
plus such following pages as are needed to hold, legibly, the
material this License requires to appear in the title page. For
works in formats which do not have any title page as such, "Title
Page" means the text near the most prominent appearance of the
work's title, preceding the beginning of the body of the text.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either
commercially or noncommercially, provided that this License, the
copyright notices, and the license notice saying this License
applies to the Document are reproduced in all copies, and that you
add no other conditions whatsoever to those of this License. You
may not use technical measures to obstruct or control the reading
or further copying of the copies you make or distribute. However,
you may accept compensation in exchange for copies. If you
distribute a large enough number of copies you must also follow
the conditions in section 3.
You may also lend copies, under the same conditions stated above,
and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies of the Document numbering more than
100, and the Document's license notice requires Cover Texts, you
must enclose the copies in covers that carry, clearly and legibly,
all these Cover Texts: Front-Cover Texts on the front cover, and
Back-Cover Texts on the back cover. Both covers must also clearly
and legibly identify you as the publisher of these copies. The
front cover must present the full title with all words of the
title equally prominent and visible. You may add other material
on the covers in addition. Copying with changes limited to the
covers, as long as they preserve the title of the Document and
satisfy these conditions, can be treated as verbatim copying in
other respects.
If the required texts for either cover are too voluminous to fit
legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto
adjacent pages.
If you publish or distribute Opaque copies of the Document
numbering more than 100, you must either include a
machine-readable Transparent copy along with each Opaque copy, or
state in or with each Opaque copy a publicly-accessible
computer-network location containing a complete Transparent copy
of the Document, free of added material, which the general
network-using public has access to download anonymously at no
charge using public-standard network protocols. If you use the
latter option, you must take reasonably prudent steps, when you
begin distribution of Opaque copies in quantity, to ensure that
this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you
distribute an Opaque copy (directly or through your agents or
retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of
the Document well before redistributing any large number of
copies, to give them a chance to provide you with an updated
version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document
under the conditions of sections 2 and 3 above, provided that you
release the Modified Version under precisely this License, with
the Modified Version filling the role of the Document, thus
licensing distribution and modification of the Modified Version to
whoever possesses a copy of it. In addition, you must do these
things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title
distinct from that of the Document, and from those of previous
versions (which should, if there were any, be listed in the
History section of the Document). You may use the same title
as a previous version if the original publisher of that version
gives permission.
B. List on the Title Page, as authors, one or more persons or
entities responsible for authorship of the modifications in the
Modified Version, together with at least five of the principal
authors of the Document (all of its principal authors, if it
has less than five).
C. State on the Title page the name of the publisher of the
Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications
adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license
notice giving the public permission to use the Modified Version
under the terms of this License, in the form shown in the
Addendum below.
G. Preserve in that license notice the full lists of Invariant
Sections and required Cover Texts given in the Document's
license notice.
H. Include an unaltered copy of this License.
I. Preserve the section entitled "History", and its title, and add
to it an item stating at least the title, year, new authors, and
publisher of the Modified Version as given on the Title Page.
If there is no section entitled "History" in the Document,
create one stating the title, year, authors, and publisher of
the Document as given on its Title Page, then add an item
describing the Modified Version as stated in the previous
sentence.
J. Preserve the network location, if any, given in the Document for
public access to a Transparent copy of the Document, and
likewise the network locations given in the Document for
previous versions it was based on. These may be placed in the
"History" section. You may omit a network location for a work
that was published at least four years before the Document
itself, or if the original publisher of the version it refers
to gives permission.
K. In any section entitled "Acknowledgements" or "Dedications",
preserve the section's title, and preserve in the section all the
substance and tone of each of the contributor acknowledgements
and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document,
unaltered in their text and in their titles. Section numbers
or the equivalent are not considered part of the section titles.
M. Delete any section entitled "Endorsements". Such a section
may not be included in the Modified Version.
N. Do not retitle any existing section as "Endorsements" or to
conflict in title with any Invariant Section.
If the Modified Version includes new front-matter sections or
appendices that qualify as Secondary Sections and contain no
material copied from the Document, you may at your option
designate some or all of these sections as invariant. To do this,
add their titles to the list of Invariant Sections in the Modified
Version's license notice. These titles must be distinct from any
other section titles.
You may add a section entitled "Endorsements", provided it contains
nothing but endorsements of your Modified Version by various
parties-for example, statements of peer review or that the text has
been approved by an organization as the authoritative definition
of a standard.
You may add a passage of up to five words as a Front-Cover Text,
and a passage of up to 25 words as a Back-Cover Text, to the end
of the list of Cover Texts in the Modified Version. Only one
passage of Front-Cover Text and one of Back-Cover Text may be
added by (or through arrangements made by) any one entity. If the
Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity
you are acting on behalf of, you may not add another; but you may
replace the old one, on explicit permission from the previous
publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this
License give permission to use their names for publicity for or to
assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under
this License, under the terms defined in section 4 above for
modified versions, provided that you include in the combination
all of the Invariant Sections of all of the original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license notice.
The combined work need only contain one copy of this License, and
multiple identical Invariant Sections may be replaced with a single
copy. If there are multiple Invariant Sections with the same name
but different contents, make the title of each such section unique
by adding at the end of it, in parentheses, the name of the
original author or publisher of that section if known, or else a
unique number. Make the same adjustment to the section titles in
the list of Invariant Sections in the license notice of the
combined work.
In the combination, you must combine any sections entitled
"History" in the various original documents, forming one section
entitled "History"; likewise combine any sections entitled
"Acknowledgements", and any sections entitled "Dedications". You
must delete all sections entitled "Endorsements."
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other
documents released under this License, and replace the individual
copies of this License in the various documents with a single copy
that is included in the collection, provided that you follow the
rules of this License for verbatim copying of each of the
documents in all other respects.
You may extract a single document from such a collection, and
distribute it individually under this License, provided you insert
a copy of this License into the extracted document, and follow
this License in all other respects regarding verbatim copying of
that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other
separate and independent documents or works, in or on a volume of
a storage or distribution medium, does not as a whole count as a
Modified Version of the Document, provided no compilation
copyright is claimed for the compilation. Such a compilation is
called an "aggregate", and this License does not apply to the
other self-contained works thus compiled with the Document, on
account of their being thus compiled, if they are not themselves
derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these
copies of the Document, then if the Document is less than one
quarter of the entire aggregate, the Document's Cover Texts may be
placed on covers that surround only the Document within the
aggregate. Otherwise they must appear on covers around the whole
aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may
distribute translations of the Document under the terms of section
4. Replacing Invariant Sections with translations requires special
permission from their copyright holders, but you may include
translations of some or all Invariant Sections in addition to the
original versions of these Invariant Sections. You may include a
translation of this License provided that you also include the
original English version of this License. In case of a
disagreement between the translation and the original English
version of this License, the original English version will prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document
except as expressly provided for under this License. Any other
attempt to copy, modify, sublicense or distribute the Document is
void, and will automatically terminate your rights under this
License. However, parties who have received copies, or rights,
from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of
the GNU Free Documentation License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns. See
http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version
number. If the Document specifies that a particular numbered
version of this License "or any later version" applies to it, you
have the option of following the terms and conditions either of
that specified version or of any later version that has been
published (not as a draft) by the Free Software Foundation. If
the Document does not specify a version number of this License,
you may choose any version ever published (not as a draft) by the
Free Software Foundation.
ADDENDUM: How to use this License for your documents
====================================================
To use this License in a document you have written, include a copy of
the License in the document and put the following copyright and license
notices just after the title page:
Copyright (C) YEAR YOUR NAME.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
If you have no Invariant Sections, write "with no Invariant
Sections" instead of saying which ones are invariant. If you have no
Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover
Texts being LIST"; likewise for Back-Cover Texts.
If your document contains nontrivial examples of program code, we
recommend releasing these examples in parallel under your choice of
free software license, such as the GNU General Public License, to
permit their use in free software.

File: wget.info, Node: Concept Index, Prev: Copying, Up: Top
@ -507,6 +825,7 @@ Concept Index
* following links: Following Links.
* force html: Logging and Input File Options.
* ftp time-stamping: FTP Time-Stamping Internals.
* GFDL: Copying.
* globbing, toggle: FTP Options.
* GPL: Copying.
* hangup: Signals.
@ -532,14 +851,9 @@ Concept Index
* mailing list: Mailing List.
* mirroring: Guru Usage.
* no parent: Directory-Based Limits.
* no warranty: Copying.
* no warranty: GNU General Public License.
* no-clobber: Download Options.
* nohup: Invoking.
* norobots disallow: Disallow Field.
* norobots examples: Norobots Examples.
* norobots format: RES Format.
* norobots introduction: Introduction to RES.
* norobots user-agent: User-Agent Field.
* number of retries: Download Options.
* operating systems: Portability.
* option syntax: Option Syntax.
@ -550,8 +864,8 @@ Concept Index
* pause: Download Options.
* portability: Portability.
* proxies: Proxies.
* proxy <1>: Download Options.
* proxy: HTTP Options.
* proxy <1>: HTTP Options.
* proxy: Download Options.
* proxy authentication: HTTP Options.
* proxy filling: Recursive Retrieval Options.
* proxy password: HTTP Options.


@ -42,10 +42,11 @@ notice identical to this one except for the removal of this paragraph
@end ignore
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@end ifinfo
@titlepage
@ -60,10 +61,11 @@ Copyright @copyright{} 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@end titlepage
@ifinfo
@ -2485,10 +2487,26 @@ This chapter contains some references I consider useful.
@cindex robots.txt
@cindex server maintenance
Since Wget is able to traverse the web, it counts as one of the Web
@dfn{robots}. Thus Wget understands @dfn{Robots Exclusion Standard}
(@sc{res})---contents of @file{/robots.txt}, used by server
administrators to shield parts of their systems from wanderings of Wget.
It is extremely easy to make Wget wander aimlessly around a web site,
sucking up all the available data in the process. @samp{wget -r @var{site}},
and you're set. Great? Not for the server admin.
While Wget is retrieving static pages, there's not much of a problem.
But for Wget, there is no real difference between the smallest static
page and the hardest, most demanding CGI or dynamic page. For instance,
a site I know has a section handled by an, uh, bitchin' CGI script that
converts all the Info files to HTML. The script can and does bring the
machine to its knees without providing anything useful to the
downloader.
For such and similar cases various robot exclusion schemes have been
devised as a means for the server administrators and document authors to
protect chosen portions of their sites from the wandering of robots.
The more popular mechanism is the @dfn{Robots Exclusion Standard}
written by Martijn Koster et al. in 1994. It is specified by placing a
file named @file{/robots.txt} in the server root, which the robots are
supposed to download and parse. Wget supports this specification.
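For illustration only, a minimal @file{/robots.txt} asking all robots to
stay out of a hypothetical @file{/cgi-bin/} directory (the kind of
expensive dynamic section described above) might look like this:
@example
User-agent: *
Disallow: /cgi-bin/
@end example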
Norobots support is turned on only when retrieving recursively, and
@emph{never} for the first page. Thus, you may issue:
@ -2500,8 +2518,7 @@ wget -r http://fly.srk.fer.hr/
First the index of fly.srk.fer.hr will be downloaded. If Wget finds
anything worth downloading on the same host, only @emph{then} will it
load the robots, and decide whether or not to load the links after all.
@file{/robots.txt} is loaded only once per host. Wget does not support
the robots @code{META} tag.
@file{/robots.txt} is loaded only once per host.
Note that the exclusion standard discussed here has undergone some
revisions. However, Wget supports only the first version of
@ -2517,6 +2534,20 @@ but we plan to add them.
This manual no longer includes the text of the old standard.
The second, less well-known mechanism enables the author of an individual
document to specify whether they want the links from the file to be
followed by a robot. This is achieved using the @code{META} tag, like
this:
@example
<meta name="robots" content="nofollow">
@end example
This is explained in some detail at
@url{http://info.webcrawler.com/mak/projects/robots/meta-user.html}.
Unfortunately, Wget does not support this method of robot exclusion yet,
but it will be implemented in the next release.
@node Security Considerations, Contributors, Robots, Appendices
@section Security Considerations
@cindex security
@ -2789,10 +2820,11 @@ In addition to this, this manual is free in the same sense:
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled ``GNU
Free Documentation License''.
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``GNU Free
Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''.
@end quotation
@c #### Maybe we should wrap these licenses in ifinfo? Stallman says