mirror of
https://github.com/moparisthebest/wget
synced 2024-07-03 16:38:41 -04:00
[svn] Robots doc changes.
Published at <sxsn1f1o6s2.fsf@florida.arsdigita.de>.
This commit is contained in:
parent
d889ef73f4
commit
1a5c5a006a
@@ -1,3 +1,7 @@
+2000-11-15 Hrvoje Niksic <hniksic@arsdigita.com>
+
+* wget.texi (Robots): Rearrange text. Mention the meta tag.
+
 2000-11-14 Hrvoje Niksic <hniksic@arsdigita.com>
 
 * wget.texi: Add GFDL; remove norobots specification.
127
doc/wget.info
@@ -1,5 +1,4 @@
-This is Info file wget.info, produced by Makeinfo version 1.68 from the
-input file ./wget.texi.
+This is wget.info, produced by makeinfo version 4.0 from wget.texi.
 
 INFO-DIR-SECTION Net Utilities
 INFO-DIR-SECTION World Wide Web
@@ -16,73 +15,73 @@ data.
 manual provided the copyright notice and this permission notice are
 preserved on all copies.
 
-Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided also
-that the sections entitled "Copying" and "GNU General Public License"
-are included exactly as in the original, and provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.1 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "GNU General Public License" and "GNU Free
+Documentation License", with no Front-Cover Texts, and with no
+Back-Cover Texts. A copy of the license is included in the section
+entitled "GNU Free Documentation License".
 
 
 Indirect:
-wget.info-1: 961
-wget.info-2: 48745
-wget.info-3: 97411
+wget.info-1: 1010
+wget.info-2: 48842
+wget.info-3: 94301
 
 Tag Table:
 (Indirect)
-Node: Top961
-Node: Overview1850
-Node: Invoking5024
-Node: URL Format5833
-Node: Option Syntax8163
-Node: Basic Startup Options9587
-Node: Logging and Input File Options10287
-Node: Download Options12812
-Node: Directory Options20910
-Node: HTTP Options23388
-Node: FTP Options28104
-Node: Recursive Retrieval Options30086
-Node: Recursive Accept/Reject Options35107
-Node: Recursive Retrieval38333
-Node: Following Links40631
-Node: Relative Links41659
-Node: Host Checking42173
-Node: Domain Acceptance44198
-Node: All Hosts45868
-Node: Types of Files46295
-Node: Directory-Based Limits48745
-Node: FTP Links51385
-Node: Time-Stamping52255
-Node: Time-Stamping Usage53892
-Node: HTTP Time-Stamping Internals55461
-Node: FTP Time-Stamping Internals56931
-Node: Startup File58139
-Node: Wgetrc Location59012
-Node: Wgetrc Syntax59827
-Node: Wgetrc Commands60542
-Node: Sample Wgetrc68941
-Node: Examples73960
-Node: Simple Usage74567
-Node: Advanced Usage76961
-Node: Guru Usage79712
-Node: Various81374
-Node: Proxies81898
-Node: Distribution84663
-Node: Mailing List85014
-Node: Reporting Bugs85713
-Node: Portability87498
-Node: Signals88873
-Node: Appendices89527
-Node: Robots89942
-Node: Introduction to RES91089
-Node: RES Format92982
-Node: User-Agent Field94086
-Node: Disallow Field94850
-Node: Norobots Examples95461
-Node: Security Considerations96415
-Node: Contributors97411
-Node: Copying100054
-Node: Concept Index119217
+Node: Top1010
+Node: Overview1924
+Node: Invoking5106
+Node: URL Format5915
+Ref: URL Format-Footnote-18143
+Node: Option Syntax8245
+Node: Basic Startup Options9670
+Node: Logging and Input File Options10370
+Node: Download Options12896
+Node: Directory Options20995
+Node: HTTP Options23477
+Node: FTP Options28194
+Node: Recursive Retrieval Options30177
+Node: Recursive Accept/Reject Options35199
+Node: Recursive Retrieval38426
+Node: Following Links40724
+Node: Relative Links41753
+Node: Host Checking42267
+Node: Domain Acceptance44293
+Node: All Hosts45965
+Node: Types of Files46392
+Node: Directory-Based Limits48842
+Node: FTP Links51482
+Node: Time-Stamping52352
+Node: Time-Stamping Usage53989
+Node: HTTP Time-Stamping Internals55558
+Ref: HTTP Time-Stamping Internals-Footnote-156829
+Node: FTP Time-Stamping Internals57028
+Node: Startup File58236
+Node: Wgetrc Location59109
+Node: Wgetrc Syntax59924
+Node: Wgetrc Commands60639
+Node: Sample Wgetrc69038
+Node: Examples69562
+Node: Simple Usage70169
+Node: Advanced Usage72571
+Node: Guru Usage75323
+Node: Various76985
+Node: Proxies77509
+Node: Distribution80274
+Node: Mailing List80625
+Node: Reporting Bugs81325
+Node: Portability83110
+Node: Signals84485
+Node: Appendices85139
+Node: Robots85457
+Node: Security Considerations88309
+Node: Contributors89305
+Node: Copying92189
+Node: GNU General Public License94301
+Node: GNU Free Documentation License113501
+Node: Concept Index133231
 
 End Tag Table
136
doc/wget.info-1
@@ -1,5 +1,4 @@
-This is Info file wget.info, produced by Makeinfo version 1.68 from the
-input file ./wget.texi.
+This is wget.info, produced by makeinfo version 4.0 from wget.texi.
 
 INFO-DIR-SECTION Net Utilities
 INFO-DIR-SECTION World Wide Web
@@ -16,12 +15,13 @@ data.
 manual provided the copyright notice and this permission notice are
 preserved on all copies.
 
-Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided also
-that the sections entitled "Copying" and "GNU General Public License"
-are included exactly as in the original, and provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.1 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "GNU General Public License" and "GNU Free
+Documentation License", with no Front-Cover Texts, and with no
+Back-Cover Texts. A copy of the license is included in the section
+entitled "GNU Free Documentation License".
 
 
 File: wget.info, Node: Top, Next: Overview, Prev: (dir), Up: (dir)
@@ -32,7 +32,7 @@ Wget 1.5.3+dev
 This manual documents version 1.5.3+dev of GNU Wget, the freely
 available utility for network download.
 
-Copyright (C) 1996, 1997, 1998 Free Software Foundation, Inc.
+Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
 
 * Menu:
 
@@ -45,7 +45,7 @@ available utility for network download.
 * Examples:: Examples of usage.
 * Various:: The stuff that doesn't fit anywhere else.
 * Appendices:: Some useful references.
-* Copying:: You may give out copies of Wget.
+* Copying:: You may give out copies of Wget and of this manual.
 * Concept Index:: Topics covered by this manual.
 
 
@@ -67,13 +67,15 @@ being:
 constant user's presence, which can be a great hindrance when
 transferring a lot of data.
 
+
 * Wget is capable of descending recursively through the structure of
 HTML documents and FTP directory trees, making a local copy of the
 directory hierarchy similar to the one on the remote server. This
 feature can be used to mirror archives and home pages, or traverse
-the web in search of data, like a WWW robot (*Note Robots::). In
+the web in search of data, like a WWW robot (*note Robots::). In
 that spirit, Wget understands the `norobots' convention.
 
+
 * File name wildcard matching and recursive mirroring of directories
 are available when retrieving via FTP. Wget can read the
 time-stamp information given by both HTTP and FTP servers, and
@@ -82,12 +84,14 @@ being:
 version if it has. This makes Wget suitable for mirroring of FTP
 sites, as well as home pages.
 
+
 * Wget works exceedingly well on slow or unstable connections,
 retrying the document until it is fully retrieved, or until a
 user-specified retry count is surpassed. It will try to resume the
 download from the point of interruption, using `REST' with FTP and
 `Range' with HTTP servers that support them.
 
+
 * By default, Wget supports proxy servers, which can lighten the
 network load, speed up retrieval and provide access behind
 firewalls. However, if you are behind a firewall that requires
@@ -95,23 +99,27 @@ being:
 and build wget with support for socks. Wget also supports the
 passive FTP downloading as an option.
 
+
 * Builtin features offer mechanisms to tune which links you wish to
-follow (*Note Following Links::).
+follow (*note Following Links::).
 
+
 * The retrieval is conveniently traced with printing dots, each dot
 representing a fixed amount of data received (1KB by default).
 These representations can be customized to your preferences.
 
+
 * Most of the features are fully configurable, either through
 command line options, or via the initialization file `.wgetrc'
-(*Note Startup File::). Wget allows you to define "global"
+(*note Startup File::). Wget allows you to define "global"
 startup files (`/usr/local/etc/wgetrc' by default) for site
 settings.
 
+
 * Finally, GNU Wget is free software. This means that everyone may
 use it, redistribute it and/or modify it under the terms of the
 GNU General Public License, as published by the Free Software
-Foundation (*Note Copying::).
+Foundation (*note Copying::).
 
 
 File: wget.info, Node: Invoking, Next: Recursive Retrieval, Prev: Overview, Up: Top
@@ -128,7 +136,7 @@ line. URL is a "Uniform Resource Locator", as defined below.
 
 However, you may wish to change some of the default parameters of
 Wget. You can do it two ways: permanently, adding the appropriate
-command to `.wgetrc' (*Note Startup File::), or specifying it on the
+command to `.wgetrc' (*note Startup File::), or specifying it on the
 command line.
 
 * Menu:
@@ -218,7 +226,7 @@ remember, but take time to type. You may freely mix different option
 styles, or specify options after the command-line arguments. Thus you
 may write:
 
-wget -r --tries=10 http://fly.cc.fer.hr/ -o log
+wget -r --tries=10 http://fly.srk.fer.hr/ -o log
 
 The space between the option accepting an argument and the argument
 may be omitted. Instead `-o log' you can write `-olog'.
@@ -243,7 +251,7 @@ convention that specifying an empty list clears its value. This can be
 useful to clear the `.wgetrc' settings. For instance, if your `.wgetrc'
 sets `exclude_directories' to `/cgi-bin', the following example will
 first reset it, and then set it to exclude `/~nobody' and `/~somebody'.
-You can also clear the lists in `.wgetrc' (*Note Wgetrc Syntax::).
+You can also clear the lists in `.wgetrc' (*note Wgetrc Syntax::).
 
 wget -X '' -X /~nobody,/~somebody
 
@@ -268,8 +276,8 @@ Basic Startup Options
 
 `-e COMMAND'
 `--execute COMMAND'
-Execute COMMAND as if it were a part of `.wgetrc' (*Note Startup
-File::). A command thus invoked will be executed *after* the
+Execute COMMAND as if it were a part of `.wgetrc' (*note Startup
+File::). A command thus invoked will be executed _after_ the
 commands in `.wgetrc', thus taking precedence over them.
 
 
@@ -296,8 +304,8 @@ Logging and Input File Options
 administrator may have chosen to compile Wget without debug
 support, in which case `-d' will not work. Please note that
 compiling with debug support is always safe--Wget compiled with
-the debug support will *not* print any debug info unless requested
-with `-d'. *Note Reporting Bugs:: for more information on how to
+the debug support will _not_ print any debug info unless requested
+with `-d'. *Note Reporting Bugs::, for more information on how to
 use `-d' for sending bug reports.
 
 `-q'
@@ -392,7 +400,7 @@ Download Options
 
 When running wget with `-N', with or without `-r', the decision as
 to whether or not to download a newer copy of a file depends on
-the local and remote timestamp and size of the file (*Note
+the local and remote timestamp and size of the file (*note
 Time-Stamping::). `-nc' may not be specified at the same time as
 `-N'.
 
@@ -449,7 +457,7 @@ Download Options
 
 `-N'
 `--timestamping'
-Turn on time-stamping. *Note Time-Stamping:: for details.
+Turn on time-stamping. *Note Time-Stamping::, for details.
 
 `-S'
 `--server-response'
@@ -491,7 +499,7 @@ Download Options
 retry.
 
 `--waitretry=SECONDS'
-If you don't want Wget to wait between *every* retrieval, but only
+If you don't want Wget to wait between _every_ retrieval, but only
 between retries of failed downloads, you can use this option.
 Wget will use "linear backoff", waiting 1 second after the first
 failure on a given file, then waiting 2 seconds after the second
@@ -540,14 +548,14 @@ Directory Options
 `--force-directories'
 The opposite of `-nd'--create a hierarchy of directories, even if
 one would not have been created otherwise. E.g. `wget -x
-http://fly.cc.fer.hr/robots.txt' will save the downloaded file to
-`fly.cc.fer.hr/robots.txt'.
+http://fly.srk.fer.hr/robots.txt' will save the downloaded file to
+`fly.srk.fer.hr/robots.txt'.
 
 `-nH'
 `--no-host-directories'
 Disable generation of host-prefixed directories. By default,
-invoking Wget with `-r http://fly.cc.fer.hr/' will create a
-structure of directories beginning with `fly.cc.fer.hr/'. This
+invoking Wget with `-r http://fly.srk.fer.hr/' will create a
+structure of directories beginning with `fly.srk.fer.hr/'. This
 option disables such behavior.
 
 `--cut-dirs=NUMBER'
@@ -609,7 +617,7 @@ HTTP Options
 doesn't yet know that the URL produces output of type `text/html'.
 To prevent this re-downloading, you must use `-k' and `-K' so
 that the original version of the file will be saved as `X.orig'
-(*Note Recursive Retrieval Options::).
+(*note Recursive Retrieval Options::).
 
 `--http-user=USER'
 `--http-passwd=PASSWORD'
@@ -619,7 +627,7 @@ HTTP Options
 scheme.
 
 Another way to specify username and password is in the URL itself
-(*Note URL Format::). For more information about security issues
+(*note URL Format::). For more information about security issues
 with Wget, *Note Security Considerations::.
 
 `-C on/off'
@@ -653,7 +661,7 @@ HTTP Options
 
 wget --header='Accept-Charset: iso-8859-2' \
 --header='Accept-Language: hr' \
-http://fly.cc.fer.hr/
+http://fly.srk.fer.hr/
 
 Specification of an empty string as the header value will clear all
 previous user-defined headers.
@@ -727,7 +735,7 @@ FTP Options
 and `]' to retrieve more than one file from the same directory at
 once, like:
 
-wget ftp://gnjilux.cc.fer.hr/*.msg
+wget ftp://gnjilux.srk.fer.hr/*.msg
 
 By default, globbing will be turned on if the URL contains a
 globbing character. This option may be used to turn globbing on
@@ -751,17 +759,17 @@ Recursive Retrieval Options
 
 `-r'
 `--recursive'
-Turn on recursive retrieving. *Note Recursive Retrieval:: for more
-details.
+Turn on recursive retrieving. *Note Recursive Retrieval::, for
+more details.
 
 `-l DEPTH'
 `--level=DEPTH'
-Specify recursion maximum depth level DEPTH (*Note Recursive
+Specify recursion maximum depth level DEPTH (*note Recursive
 Retrieval::). The default maximum depth is 5.
 
 `--delete-after'
 This option tells Wget to delete every single file it downloads,
-*after* having done so. It is useful for pre-fetching popular
+_after_ having done so. It is useful for pre-fetching popular
 pages through a proxy, e.g.:
 
 wget -r -nd --delete-after http://whatever.com/~popular/page/
@@ -788,7 +796,7 @@ Recursive Retrieval Options
 `-K'
 `--backup-converted'
 When converting a file, back up the original version with a `.orig'
-suffix. Affects the behavior of `-N' (*Note HTTP Time-Stamping
+suffix. Affects the behavior of `-N' (*note HTTP Time-Stamping
 Internals::).
 
 `-m'
@@ -837,7 +845,7 @@ Recursive Retrieval Options
 
 wget -r -l 2 -p http://SITE/1.html
 
-all the above files *and* `3.html''s requisite `3.gif' will be
+all the above files _and_ `3.html''s requisite `3.gif' will be
 downloaded. Similarly,
 
 wget -r -l 1 -p http://SITE/1.html
@@ -879,18 +887,18 @@ Recursive Accept/Reject Options
 `-A ACCLIST --accept ACCLIST'
 `-R REJLIST --reject REJLIST'
 Specify comma-separated lists of file name suffixes or patterns to
-accept or reject (*Note Types of Files:: for more details).
+accept or reject (*note Types of Files:: for more details).
 
 `-D DOMAIN-LIST'
 `--domains=DOMAIN-LIST'
 Set domains to be accepted and DNS looked-up, where DOMAIN-LIST is
-a comma-separated list. Note that it does *not* turn on `-H'.
+a comma-separated list. Note that it does _not_ turn on `-H'.
 This option speeds things up, even if only one host is spanned
-(*Note Domain Acceptance::).
+(*note Domain Acceptance::).
 
 `--exclude-domains DOMAIN-LIST'
 Exclude the domains given in a comma-separated DOMAIN-LIST from
-DNS-lookup (*Note Domain Acceptance::).
+DNS-lookup (*note Domain Acceptance::).
 
 `--follow-ftp'
 Follow FTP links from HTML documents. Without this option, Wget
@@ -924,29 +932,29 @@ Recursive Accept/Reject Options
 `-H'
 `--span-hosts'
 Enable spanning across hosts when doing recursive retrieving
-(*Note All Hosts::).
+(*note All Hosts::).
 
 `-L'
 `--relative'
 Follow relative links only. Useful for retrieving a specific home
 page without any distractions, not even those from the same hosts
-(*Note Relative Links::).
+(*note Relative Links::).
 
 `-I LIST'
 `--include-directories=LIST'
 Specify a comma-separated list of directories you wish to follow
-when downloading (*Note Directory-Based Limits:: for more
+when downloading (*note Directory-Based Limits:: for more
 details.) Elements of LIST may contain wildcards.
 
 `-X LIST'
 `--exclude-directories=LIST'
 Specify a comma-separated list of directories you wish to exclude
-from download (*Note Directory-Based Limits:: for more details.)
+from download (*note Directory-Based Limits:: for more details.)
 Elements of LIST may contain wildcards.
 
 `-nh'
 `--no-host-lookup'
-Disable the time-consuming DNS lookup of almost all hosts (*Note
+Disable the time-consuming DNS lookup of almost all hosts (*note
 Host Checking::).
 
 `-np'
@@ -954,8 +962,8 @@ Recursive Accept/Reject Options
 `--no-parent'
 Do not ever ascend to the parent directory when retrieving
 recursively. This is a useful option, since it guarantees that
-only the files *below* a certain hierarchy will be downloaded.
-*Note Directory-Based Limits:: for more details.
+only the files _below_ a certain hierarchy will be downloaded.
+*Note Directory-Based Limits::, for more details.
 
 
 File: wget.info, Node: Recursive Retrieval, Next: Following Links, Prev: Invoking, Up: Top
@@ -1003,7 +1011,7 @@ which can grind the machine to a halt.
 (`-l') and/or by lowering the number of retries (`-t'). You may also
 consider using the `-w' option to slow down your requests to the remote
 servers, as well as the numerous options to narrow the number of
-followed links (*Note Following Links::).
+followed links (*note Following Links::).
 
 Recursive retrieval is a good thing when used properly. Please take
 all precautions not to wreak havoc through carelessness.
@@ -1019,7 +1027,7 @@ unnecessary data. Most of the time the users bear in mind exactly what
 they want to download, and want Wget to follow only specific links.
 
 For example, if you wish to download the music archive from
-`fly.cc.fer.hr', you will not want to download all the home pages that
+`fly.srk.fer.hr', you will not want to download all the home pages that
 happen to be referenced by an obscure part of the archive.
 
 Wget possesses several mechanisms that allows you to fine-tune which
@@ -1061,13 +1069,13 @@ following links) all URLs that refer to the same host will be retrieved.
 
 The problem with this option are the aliases of the hosts and
 domains. Thus there is no way for Wget to know that `regoc.srce.hr' and
-`www.srce.hr' are the same host, or that `fly.cc.fer.hr' is the same as
-`fly.cc.etf.hr'. Whenever an absolute link is encountered, the host is
-DNS-looked-up with `gethostbyname' to check whether we are maybe
+`www.srce.hr' are the same host, or that `fly.srk.fer.hr' is the same
+as `fly.cc.fer.hr'. Whenever an absolute link is encountered, the host
+is DNS-looked-up with `gethostbyname' to check whether we are maybe
 dealing with the same hosts. Although the results of `gethostbyname'
 are cached, it is still a great slowdown, e.g. when dealing with large
 indices of home pages on different hosts (because each of the hosts
-must be DNS-resolved to see whether it just *might* be an alias of the
+must be DNS-resolved to see whether it just _might_ be an alias of the
 starting host).
 
 To avoid the overhead you may use `-nh', which will turn off
@@ -1079,7 +1087,7 @@ and `regoc.srce.hr' will be flagged as different hosts).
 "virtual servers", each having its own directory hierarchy. Such
 "servers" are distinguished by their hostnames (all of which point to
 the same IP address); for this to work, a client must send a `Host'
-header, which is what Wget does. However, in that case Wget *must not*
+header, which is what Wget does. However, in that case Wget _must not_
 try to divine a host's "real" address, nor try to use the same hostname
 for each access, i.e. `-nh' must be turned on.
 
@@ -1098,17 +1106,17 @@ Domain Acceptance
 followed. The hosts the domain of which is not in this list will not be
 DNS-resolved. Thus you can specify `-Dmit.edu' just to make sure that
 *nothing outside of MIT gets looked up*. This is very important and
-useful. It also means that `-D' does *not* imply `-H' (span all
+useful. It also means that `-D' does _not_ imply `-H' (span all
 hosts), which must be specified explicitly. Feel free to use this
 options since it will speed things up, with almost all the reliability
 of checking for all hosts. Thus you could invoke
 
-wget -r -D.hr http://fly.cc.fer.hr/
+wget -r -D.hr http://fly.srk.fer.hr/
 
 to make sure that only the hosts in `.hr' domain get DNS-looked-up
-for being equal to `fly.cc.fer.hr'. So `fly.cc.etf.hr' will be checked
-(only once!) and found equal, but `www.gnu.ai.mit.edu' will not even be
-checked.
+for being equal to `fly.srk.fer.hr'. So `fly.cc.fer.hr' will be
+checked (only once!) and found equal, but `www.gnu.ai.mit.edu' will not
+even be checked.
 
 Of course, domain acceptance can be used to limit the retrieval to
 particular domains with spanning of hosts in them, but then you must
@@ -1121,7 +1129,7 @@ and Stanford.
 
 If there are domains you want to exclude specifically, you can do it
 with `--exclude-domains', which accepts the same type of arguments of
-`-D', but will *exclude* all the listed domains. For example, if you
+`-D', but will _exclude_ all the listed domains. For example, if you
 want to download all the hosts from `foo.edu' domain, with the
 exception of `sunsite.foo.edu', you can do it like this:
 
@@ -1177,7 +1185,7 @@ in `.wgetrc'.
 `--reject REJLIST'
 `reject = REJLIST'
 The `--reject' option works the same way as `--accept', only its
-logic is the reverse; Wget will download all files *except* the
+logic is the reverse; Wget will download all files _except_ the
 ones matching the suffixes (or patterns) in the list.
 
 So, if you want to download a whole page except for the cumbersome
@@ -1189,7 +1197,7 @@ in `.wgetrc'.
 The `-A' and `-R' options may be combined to achieve even better
 fine-tuning of which files to retrieve. E.g. `wget -A "*zelazny*" -R
 .ps' will download all the files having `zelazny' as a part of their
-name, but *not* the PostScript files.
+name, but _not_ the PostScript files.
 
 Note that these two options do not affect the downloading of HTML
 files; Wget must load all the HTMLs to know where to go at
520
doc/wget.info-2
@@ -1,5 +1,4 @@
-This is Info file wget.info, produced by Makeinfo version 1.68 from the
-input file ./wget.texi.
+This is wget.info, produced by makeinfo version 4.0 from wget.texi.
 
 INFO-DIR-SECTION Net Utilities
 INFO-DIR-SECTION World Wide Web
@@ -16,12 +15,13 @@ data.
 manual provided the copyright notice and this permission notice are
 preserved on all copies.
 
-Permission is granted to copy and distribute modified versions of
-this manual under the conditions for verbatim copying, provided also
-that the sections entitled "Copying" and "GNU General Public License"
-are included exactly as in the original, and provided that the entire
-resulting derived work is distributed under the terms of a permission
-notice identical to this one.
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.1 or
+any later version published by the Free Software Foundation; with the
+Invariant Sections being "GNU General Public License" and "GNU Free
+Documentation License", with no Front-Cover Texts, and with no
+Back-Cover Texts. A copy of the license is included in the section
+entitled "GNU Free Documentation License".
 
 
 File: wget.info, Node: Directory-Based Limits, Next: FTP Links, Prev: Types of Files, Up: Following Links
@@ -57,7 +57,7 @@ equivalent command in `.wgetrc'.
 `--exclude LIST'
 `exclude_directories = LIST'
 `-X' option is exactly the reverse of `-I'--this is a list of
-directories *excluded* from the download. E.g. if you do not want
+directories _excluded_ from the download. E.g. if you do not want
 Wget to download things from `/cgi-bin' directory, specify `-X
 /cgi-bin' on the command line.
 
@@ -184,7 +184,7 @@ remote file is more recent, Wget will proceed fetching it normally.
 
 `ls' will show that the timestamps are set according to the state on
 the remote server. Reissuing the command with `-N' will make Wget
-re-fetch *only* the files that have been modified.
+re-fetch _only_ the files that have been modified.
 
 In both HTTP and FTP retrieval Wget will time-stamp the local file
 correctly (with or without `-N') if it gets the stamps, i.e. gets the
@@ -300,7 +300,7 @@ further attempts will be made.
 If `WGETRC' is not set, Wget will try to load `$HOME/.wgetrc'.
 
 The fact that user's settings are loaded after the system-wide ones
-means that in case of collision user's wgetrc *overrides* the
+means that in case of collision user's wgetrc _overrides_ the
 system-wide wgetrc (in `/usr/local/etc/wgetrc' by default). Fascist
 admins, away!
 
@@ -346,11 +346,11 @@ hostnames or dotted-quad IP addresses. N can be any positive integer,
 or `inf' for infinity, where appropriate. STRING values can be any
 non-empty string.
 
-Most of these commands have commandline equivalents (*Note
+Most of these commands have commandline equivalents (*note
 Invoking::), though some of the more obscure or rarely used ones do not.
 
 accept/reject = STRING
-Same as `-A'/`-R' (*Note Types of Files::).
+Same as `-A'/`-R' (*note Types of Files::).
 
 add_hostdir = on/off
 Enable/disable host-prefixed file names. `-nH' disables it.
@@ -397,14 +397,14 @@ dirstruct = on/off
 respectively.
 
 domains = STRING
-Same as `-D' (*Note Domain Acceptance::).
+Same as `-D' (*note Domain Acceptance::).
 
 dot_bytes = N
 Specify the number of bytes "contained" in a dot, as seen
 throughout the retrieval (1024 by default). You can postfix the
 value with `k' or `m', representing kilobytes and megabytes,
 respectively. With dot settings you can tailor the dot retrieval
-to suit your needs, or you can use the predefined "styles" (*Note
+to suit your needs, or you can use the predefined "styles" (*note
 Download Options::).
 
 dots_in_line = N
@@ -419,10 +419,10 @@ dot_style = STRING
 
 exclude_directories = STRING
 Specify a comma-separated list of directories you wish to exclude
-from download - the same as `-X' (*Note Directory-Based Limits::).
+from download - the same as `-X' (*note Directory-Based Limits::).
 
 exclude_domains = STRING
-Same as `--exclude-domains' (*Note Domain Acceptance::).
+Same as `--exclude-domains' (*note Domain Acceptance::).
 
 follow_ftp = on/off
 Follow FTP links from HTML documents - the same as `-f'.
@@ -497,7 +497,7 @@ noclobber = on/off
 
 no_parent = on/off
 Disallow retrieving outside the directory hierarchy, like
-`--no-parent' (*Note Directory-Based Limits::).
+`--no-parent' (*note Directory-Based Limits::).
 
 no_proxy = STRING
 Use STRING as the comma-separated list of domains to avoid in
@ -550,7 +550,7 @@ recursive = on/off
|
||||
Recursive on/off - the same as `-r'.
|
||||
|
||||
relative_only = on/off
|
||||
Follow only relative links - the same as `-L' (*Note Relative
|
||||
Follow only relative links - the same as `-L' (*note Relative
|
||||
Links::).
|
||||
|
||||
remove_listing = on/off
|
||||
@ -562,7 +562,7 @@ retr_symlinks = on/off
|
||||
files; the same as `--retr-symlinks'.
|
||||
|
||||
robots = on/off
|
||||
Use (or not) `/robots.txt' file (*Note Robots::). Be sure to know
|
||||
Use (or not) `/robots.txt' file (*note Robots::). Be sure to know
|
||||
what you are doing before changing the default (which is `on').
|
||||
|
||||
server_response = on/off
|
||||
@ -570,7 +570,7 @@ server_response = on/off
|
||||
the same as `-S'.
|
||||
|
||||
simple_host_check = on/off
|
||||
Same as `-nh' (*Note Host Checking::).
|
||||
Same as `-nh' (*note Host Checking::).
|
||||
|
||||
span_hosts = on/off
|
||||
Same as `-H'.
|
||||
@ -579,7 +579,7 @@ timeout = N
|
||||
Set timeout value - the same as `-T'.
|
||||
|
||||
timestamping = on/off
|
||||
Turn timestamping on/off. The same as `-N' (*Note Time-Stamping::).
|
||||
Turn timestamping on/off. The same as `-N' (*note Time-Stamping::).
|
||||
|
||||
tries = N
|
||||
Set number of retries per URL - the same as `-t'.
|
||||
@ -613,114 +613,6 @@ Be careful about the things you change.
|
||||
have any effect, you must remove the `#' character at the beginning of
|
||||
its line.
|
||||
|
||||
###
|
||||
### Sample Wget initialization file .wgetrc
|
||||
###
|
||||
|
||||
## You can use this file to change the default behaviour of wget or to
|
||||
## avoid having to type many many command-line options. This file does
|
||||
## not contain a comprehensive list of commands -- look at the manual
|
||||
## to find out what you can put into this file.
|
||||
##
|
||||
## Wget initialization file can reside in /usr/local/etc/wgetrc
|
||||
## (global, for all users) or $HOME/.wgetrc (for a single user).
|
||||
##
|
||||
## To use the settings in this file, you will have to uncomment them,
|
||||
## as well as change them, in most cases, as the values on the
|
||||
## commented-out lines are the default values (e.g. "off").
|
||||
|
||||
|
||||
##
|
||||
## Global settings (useful for setting up in /usr/local/etc/wgetrc).
|
||||
## Think well before you change them, since they may reduce wget's
|
||||
## functionality, and make it behave contrary to the documentation:
|
||||
##
|
||||
|
||||
# You can set retrieve quota for beginners by specifying a value
|
||||
# optionally followed by 'K' (kilobytes) or 'M' (megabytes). The
|
||||
# default quota is unlimited.
|
||||
#quota = inf
|
||||
|
||||
# You can lower (or raise) the default number of retries when
|
||||
# downloading a file (default is 20).
|
||||
#tries = 20
|
||||
|
||||
# Lowering the maximum depth of the recursive retrieval is handy to
|
||||
# prevent newbies from going too "deep" when they unwittingly start
|
||||
# the recursive retrieval. The default is 5.
|
||||
#reclevel = 5
|
||||
|
||||
# Many sites are behind firewalls that do not allow initiation of
|
||||
# connections from the outside. On these sites you have to use the
|
||||
# `passive' feature of FTP. If you are behind such a firewall, you
|
||||
# can turn this on to make Wget use passive FTP by default.
|
||||
#passive_ftp = off
|
||||
|
||||
# The "wait" command below makes Wget wait between every connection.
|
||||
# If, instead, you want Wget to wait only between retries of failed
|
||||
# downloads, set waitretry to maximum number of seconds to wait (Wget
|
||||
# will use "linear backoff", waiting 1 second after the first failure
|
||||
# on a file, 2 seconds after the second failure, etc. up to this max).
|
||||
waitretry = 10
|
||||
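The "linear backoff" described in the waitretry comment above can be sketched in a few lines of Python. This is only an illustration of the schedule as the comment states it (1 second after the first failure, 2 after the second, capped at the waitretry value); the function name retry_wait_seconds is hypothetical and this is not Wget's actual code.

```python
def retry_wait_seconds(failure_count, waitretry):
    """Seconds to wait after the Nth consecutive failure on one file.

    Hypothetical helper: grows by one second per failure and is capped
    at `waitretry`, following the comment in the sample wgetrc.
    """
    if failure_count < 1:
        return 0  # no failure yet, so nothing to wait for
    return min(failure_count, waitretry)


if __name__ == "__main__":
    # Wait times for the first 12 failures with waitretry = 10.
    print([retry_wait_seconds(n, 10) for n in range(1, 13)])
```

With waitretry = 10 the wait grows by one second per failure and stays at 10 seconds from the tenth failure onwards.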
|
||||
|
||||
##
|
||||
## Local settings (for a user to set in his $HOME/.wgetrc). It is
|
||||
## *highly* undesirable to put these settings in the global file, since
|
||||
## they are potentially dangerous to "normal" users.
|
||||
##
|
||||
## Even when setting up your own ~/.wgetrc, you should know what you
|
||||
## are doing before doing so.
|
||||
##
|
||||
|
||||
# Set this to on to use timestamping by default:
|
||||
#timestamping = off
|
||||
|
||||
# It is a good idea to make Wget send your email address in a `From:'
|
||||
# header with your request (so that server administrators can contact
|
||||
# you in case of errors). Wget does *not* send `From:' by default.
|
||||
#header = From: Your Name <username@site.domain>
|
||||
|
||||
# You can set up other headers, like Accept-Language. Accept-Language
|
||||
# is *not* sent by default.
|
||||
#header = Accept-Language: en
|
||||
|
||||
# You can set the default proxy for Wget to use. It will override the
|
||||
# value in the environment.
|
||||
#http_proxy = http://proxy.yoyodyne.com:18023/
|
||||
|
||||
# If you do not want to use proxy at all, set this to off.
|
||||
#use_proxy = on
|
||||
|
||||
# You can customize the retrieval outlook. Valid options are default,
|
||||
# binary, mega and micro.
|
||||
#dot_style = default
|
||||
|
||||
# Setting this to off makes Wget not download /robots.txt. Be sure to
|
||||
# know *exactly* what /robots.txt is and how it is used before changing
|
||||
# the default!
|
||||
#robots = on
|
||||
|
||||
# It can be useful to make Wget wait between connections. Set this to
|
||||
# the number of seconds you want Wget to wait.
|
||||
#wait = 0
|
||||
|
||||
# You can force creating directory structure, even if a single file is being
|
||||
# retrieved, by setting this to on.
|
||||
#dirstruct = off
|
||||
|
||||
# You can turn on recursive retrieving by default (don't do this if
|
||||
# you are not sure you know what it means) by setting this to on.
|
||||
#recursive = off
|
||||
|
||||
# To always back up file X as X.orig before converting its links (due
|
||||
# to -k / --convert-links / convert_links = on having been specified),
|
||||
# set this variable to on:
|
||||
#backup_converted = off
|
||||
|
||||
# To have Wget follow FTP links from HTML files by default, set this
|
||||
# to on:
|
||||
#follow_ftp = off
|
||||
|
||||
|
||||
File: wget.info, Node: Examples, Next: Various, Prev: Startup File, Up: Top
|
||||
@ -748,13 +640,13 @@ Simple Usage
|
||||
|
||||
* Say you want to download a URL. Just type:
|
||||
|
||||
wget http://fly.cc.fer.hr/
|
||||
wget http://fly.srk.fer.hr/
|
||||
|
||||
The response will be something like:
|
||||
|
||||
--13:30:45-- http://fly.cc.fer.hr:80/en/
|
||||
--13:30:45-- http://fly.srk.fer.hr:80/en/
|
||||
=> `index.html'
|
||||
Connecting to fly.cc.fer.hr:80... connected!
|
||||
Connecting to fly.srk.fer.hr:80... connected!
|
||||
HTTP request sent, awaiting response... 200 OK
|
||||
Length: 4,694 [text/html]
|
||||
|
||||
@ -770,13 +662,13 @@ Simple Usage
|
||||
the number of tries to 45, to insure that the whole file will
|
||||
arrive safely:
|
||||
|
||||
wget --tries=45 http://fly.cc.fer.hr/jpg/flyweb.jpg
|
||||
wget --tries=45 http://fly.srk.fer.hr/jpg/flyweb.jpg
|
||||
|
||||
* Now let's leave Wget to work in the background, and write its
|
||||
progress to log file `log'. It is tiring to type `--tries', so we
|
||||
shall use `-t'.
|
||||
|
||||
wget -t 45 -o log http://fly.cc.fer.hr/jpg/flyweb.jpg &
|
||||
wget -t 45 -o log http://fly.srk.fer.hr/jpg/flyweb.jpg &
|
||||
|
||||
The ampersand at the end of the line makes sure that Wget works in
|
||||
the background. To unlimit the number of retries, use `-t inf'.
|
||||
@ -784,10 +676,10 @@ Simple Usage
|
||||
* The usage of FTP is as simple. Wget will take care of login and
|
||||
password.
|
||||
|
||||
$ wget ftp://gnjilux.cc.fer.hr/welcome.msg
|
||||
--10:08:47-- ftp://gnjilux.cc.fer.hr:21/welcome.msg
|
||||
$ wget ftp://gnjilux.srk.fer.hr/welcome.msg
|
||||
--10:08:47-- ftp://gnjilux.srk.fer.hr:21/welcome.msg
|
||||
=> `welcome.msg'
|
||||
Connecting to gnjilux.cc.fer.hr:21... connected!
|
||||
Connecting to gnjilux.srk.fer.hr:21... connected!
|
||||
Logging in as anonymous ... Logged in!
|
||||
==> TYPE I ... done. ==> CWD not needed.
|
||||
==> PORT ... done. ==> RETR welcome.msg ... done.
|
||||
@ -848,9 +740,9 @@ Advanced Usage
|
||||
wget -r -l1 --no-parent -A.gif http://host/dir/
|
||||
|
||||
It is a bit of a kludge, but it works. `-r -l1' means to retrieve
|
||||
recursively (*Note Recursive Retrieval::), with maximum depth of 1.
|
||||
recursively (*note Recursive Retrieval::), with maximum depth of 1.
|
||||
`--no-parent' means that references to the parent directory are
|
||||
ignored (*Note Directory-Based Limits::), and `-A.gif' means to
|
||||
ignored (*note Directory-Based Limits::), and `-A.gif' means to
|
||||
download only the GIF files. `-A "*.gif"' would have worked too.
|
||||
|
||||
* Suppose you were in the middle of downloading, when Wget was
|
||||
@ -860,13 +752,13 @@ Advanced Usage
|
||||
wget -nc -r http://www.gnu.ai.mit.edu/
|
||||
|
||||
* If you want to encode your own username and password to HTTP or
|
||||
FTP, use the appropriate URL syntax (*Note URL Format::).
|
||||
FTP, use the appropriate URL syntax (*note URL Format::).
|
||||
|
||||
wget ftp://hniksic:mypassword@jagor.srce.hr/.emacs
|
||||
|
||||
* If you do not like the default retrieval visualization (1K dots
|
||||
with 10 dots per cluster and 50 dots per line), you can customize
|
||||
it through dot settings (*Note Wgetrc Commands::). For example,
|
||||
it through dot settings (*note Wgetrc Commands::). For example,
|
||||
many people like the "binary" style of retrieval, with 8K dots and
|
||||
512K lines:
|
||||
|
||||
@ -875,10 +767,10 @@ Advanced Usage
|
||||
You can experiment with other styles, like:
|
||||
|
||||
wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
|
||||
wget --dot-style=micro http://fly.cc.fer.hr/
|
||||
wget --dot-style=micro http://fly.srk.fer.hr/
|
||||
|
||||
To make these settings permanent, put them in your `.wgetrc', as
|
||||
described before (*Note Sample Wgetrc::).
|
||||
described before (*note Sample Wgetrc::).
|
||||
|
||||
|
||||
File: wget.info, Node: Guru Usage, Prev: Advanced Usage, Up: Examples
|
||||
@ -902,7 +794,7 @@ Guru Usage
|
||||
|
||||
* But what about mirroring the hosts networkologically close to you?
|
||||
It seems so awfully slow because of all that DNS resolving. Just
|
||||
use `-D' (*Note Domain Acceptance::).
|
||||
use `-D' (*note Domain Acceptance::).
|
||||
|
||||
wget -rN -Dsrce.hr http://www.srce.hr/
|
||||
|
||||
@ -976,7 +868,7 @@ the following environment variables:
|
||||
|
||||
`no_proxy'
|
||||
This variable should contain a comma-separated list of domain
|
||||
extensions proxy should *not* be used for. For instance, if the
|
||||
extensions proxy should _not_ be used for. For instance, if the
|
||||
value of `no_proxy' is `.mit.edu', proxy will not be used to
|
||||
retrieve documents from MIT.
|
||||
|
||||
@ -1022,7 +914,7 @@ Distribution
|
||||
Like all GNU utilities, the latest version of Wget can be found at
|
||||
the master GNU archive site prep.ai.mit.edu, and its mirrors. For
|
||||
example, Wget 1.5.3+dev can be found at
|
||||
`ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz'
|
||||
<ftp://prep.ai.mit.edu/gnu/wget/wget-1.5.3+dev.tar.gz>
|
||||
|
||||
|
||||
File: wget.info, Node: Mailing List, Next: Reporting Bugs, Prev: Distribution, Up: Various
|
||||
@ -1040,7 +932,7 @@ subscribe. The more people on the list, the better!
|
||||
magic word `subscribe' in the subject line. Unsubscribe by mailing to
|
||||
<wget-unsubscribe@sunsite.auc.dk>.
|
||||
|
||||
The mailing list is archived at `http://fly.cc.fer.hr/archive/wget'.
|
||||
The mailing list is archived at <http://fly.srk.fer.hr/archive/wget>.
|
||||
|
||||
|
||||
File: wget.info, Node: Reporting Bugs, Next: Portability, Prev: Mailing List, Up: Various
|
||||
@ -1076,7 +968,7 @@ simple guidelines.
|
||||
|
||||
3. Please start Wget with `-d' option and send the log (or the
|
||||
relevant parts of it). If Wget was compiled without debug support,
|
||||
recompile it. It is *much* easier to trace bugs with debug support
|
||||
recompile it. It is _much_ easier to trace bugs with debug support
|
||||
on.
|
||||
|
||||
4. If Wget has crashed, try to run it in a debugger, e.g. `gdb `which
|
||||
@ -1138,9 +1030,7 @@ File: wget.info, Node: Appendices, Next: Copying, Prev: Various, Up: Top
|
||||
Appendices
|
||||
**********
|
||||
|
||||
This chapter contains some references I consider useful, like the
|
||||
Robots Exclusion Standard specification, as well as a list of
|
||||
contributors to GNU Wget.
|
||||
This chapter contains some references I consider useful.
|
||||
|
||||
* Menu:
|
||||
|
||||
@ -1154,176 +1044,61 @@ File: wget.info, Node: Robots, Next: Security Considerations, Prev: Appendice
|
||||
Robots
|
||||
======
|
||||
|
||||
Since Wget is able to traverse the web, it counts as one of the Web
|
||||
"robots". Thus Wget understands "Robots Exclusion Standard"
|
||||
(RES)--contents of `/robots.txt', used by server administrators to
|
||||
shield parts of their systems from wanderings of Wget.
|
||||
It is extremely easy to make Wget wander aimlessly around a web site,
|
||||
sucking all the available data in progress. `wget -r SITE', and you're
|
||||
set. Great? Not for the server admin.
|
||||
|
||||
While Wget is retrieving static pages, there's not much of a problem.
|
||||
But for Wget, there is no real difference between the smallest static
|
||||
page and the hardest, most demanding CGI or dynamic page. For instance,
|
||||
a site I know has a section handled by an, uh, bitchin' CGI script that
|
||||
converts all the Info files to HTML. The script can and does bring the
|
||||
machine to its knees without providing anything useful to the
|
||||
downloader.
|
||||
|
||||
For such and similar cases various robot exclusion schemes have been
|
||||
devised as a means for the server administrators and document authors to
|
||||
protect chosen portions of their sites from the wandering of robots.
|
||||
|
||||
The more popular mechanism is the "Robots Exclusion Standard"
|
||||
written by Martijn Koster et al. in 1994. It is specified by placing a
|
||||
file named `/robots.txt' in the server root, which the robots are
|
||||
supposed to download and parse. Wget supports this specification.
|
||||
|
||||
Norobots support is turned on only when retrieving recursively, and
|
||||
*never* for the first page. Thus, you may issue:
|
||||
_never_ for the first page. Thus, you may issue:
|
||||
|
||||
wget -r http://fly.cc.fer.hr/
|
||||
wget -r http://fly.srk.fer.hr/
|
||||
|
||||
First the index of fly.cc.fer.hr will be downloaded. If Wget finds
|
||||
anything worth downloading on the same host, only *then* will it load
|
||||
First the index of fly.srk.fer.hr will be downloaded. If Wget finds
|
||||
anything worth downloading on the same host, only _then_ will it load
|
||||
the robots, and decide whether or not to load the links after all.
|
||||
`/robots.txt' is loaded only once per host. Wget does not support the
|
||||
robots `META' tag.
|
||||
`/robots.txt' is loaded only once per host.
|
||||
|
||||
The description of the norobots standard was written, and is
|
||||
maintained by Martijn Koster <m.koster@webcrawler.com>. With his
|
||||
permission, I contribute a (slightly modified) TeXified version of the
|
||||
RES.
|
||||
Note that the exclusion standard discussed here has undergone some
|
||||
revisions. However, Wget supports only the first version of RES,
|
||||
the one written by Martijn Koster in 1994, available at
|
||||
<http://info.webcrawler.com/mak/projects/robots/norobots.html>. A
|
||||
later version exists in the form of an internet draft
|
||||
<draft-koster-robots-00.txt> titled "A Method for Web Robots Control",
|
||||
which expired on June 4, 1997. I am not aware if it ever made it to an
|
||||
RFC. The text of the draft is available at
|
||||
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
|
||||
Wget does not yet support the new directives specified by this draft,
|
||||
but we plan to add them.
|
||||
|
||||
* Menu:
|
||||
This manual no longer includes the text of the old standard.
|
||||
|
||||
* Introduction to RES::
|
||||
* RES Format::
|
||||
* User-Agent Field::
|
||||
* Disallow Field::
|
||||
* Norobots Examples::
|
||||
The second, less known mechanism, enables the author of an individual
|
||||
document to specify whether they want the links from the file to be
|
||||
followed by a robot. This is achieved using the `META' tag, like this:
|
||||
|
||||
|
||||
File: wget.info, Node: Introduction to RES, Next: RES Format, Prev: Robots, Up: Robots
|
||||
<meta name="robots" content="nofollow">
|
||||
|
||||
Introduction to RES
|
||||
-------------------
|
||||
|
||||
"WWW Robots" (also called "wanderers" or "spiders") are programs
|
||||
that traverse many pages in the World Wide Web by recursively
|
||||
retrieving linked pages. For more information see the robots page.
|
||||
|
||||
In 1993 and 1994 there have been occasions where robots have visited
|
||||
WWW servers where they weren't welcome for various reasons. Sometimes
|
||||
these reasons were robot specific, e.g. certain robots swamped servers
|
||||
with rapid-fire requests, or retrieved the same files repeatedly. In
|
||||
other situations robots traversed parts of WWW servers that weren't
|
||||
suitable, e.g. very deep virtual trees, duplicated information,
|
||||
temporary information, or cgi-scripts with side-effects (such as
|
||||
voting).
|
||||
|
||||
These incidents indicated the need for established mechanisms for
|
||||
WWW servers to indicate to robots which parts of their server should
|
||||
not be accessed. This standard addresses this need with an operational
|
||||
solution.
|
||||
|
||||
This document represents a consensus on 30 June 1994 on the robots
|
||||
mailing list (`robots@webcrawler.com'), between the majority of robot
|
||||
authors and other people with an interest in robots. It has also been
|
||||
open for discussion on the Technical World Wide Web mailing list
|
||||
(`www-talk@info.cern.ch'). This document is based on a previous working
|
||||
draft under the same title.
|
||||
|
||||
It is not an official standard backed by a standards body, or owned
|
||||
by any commercial organization. It is not enforced by anybody, and there is
|
||||
no guarantee that all current and future robots will use it. Consider
|
||||
it a common facility the majority of robot authors offer the WWW
|
||||
community to protect WWW server against unwanted accesses by their
|
||||
robots.
|
||||
|
||||
The latest version of this document can be found at
|
||||
`http://info.webcrawler.com/mak/projects/robots/norobots.html'.
|
||||
|
||||
|
||||
File: wget.info, Node: RES Format, Next: User-Agent Field, Prev: Introduction to RES, Up: Robots
|
||||
|
||||
RES Format
|
||||
----------
|
||||
|
||||
The format and semantics of the `/robots.txt' file are as follows:
|
||||
|
||||
The file consists of one or more records separated by one or more
|
||||
blank lines (terminated by `CR', `CR/NL', or `NL'). Each record
|
||||
contains lines of the form:
|
||||
|
||||
<field>:<optionalspace><value><optionalspace>
|
||||
|
||||
The field name is case insensitive.
|
||||
|
||||
Comments can be included in file using UNIX Bourne shell conventions:
|
||||
the `#' character is used to indicate that preceding space (if any) and
|
||||
the remainder of the line up to the line termination is discarded.
|
||||
Lines containing only a comment are discarded completely, and therefore
|
||||
do not indicate a record boundary.
|
||||
|
||||
The record starts with one or more User-agent lines, followed by one
|
||||
or more Disallow lines, as detailed below. Unrecognized headers are
|
||||
ignored.
|
||||
|
||||
The presence of an empty `/robots.txt' file has no explicit
|
||||
associated semantics, it will be treated as if it was not present, i.e.
|
||||
all robots will consider themselves welcome.
|
||||
|
||||
|
||||
File: wget.info, Node: User-Agent Field, Next: Disallow Field, Prev: RES Format, Up: Robots
|
||||
|
||||
User-Agent Field
|
||||
----------------
|
||||
|
||||
The value of this field is the name of the robot the record is
|
||||
describing access policy for.
|
||||
|
||||
If more than one User-agent field is present the record describes an
|
||||
identical access policy for more than one robot. At least one field
|
||||
needs to be present per record.
|
||||
|
||||
The robot should be liberal in interpreting this field. A case
|
||||
insensitive substring match of the name without version information is
|
||||
recommended.
|
||||
|
||||
If the value is `*', the record describes the default access policy
|
||||
for any robot that has not matched any of the other records. It is not
|
||||
allowed to have multiple such records in the `/robots.txt' file.
|
||||
|
||||
|
||||
File: wget.info, Node: Disallow Field, Next: Norobots Examples, Prev: User-Agent Field, Up: Robots
|
||||
|
||||
Disallow Field
|
||||
--------------
|
||||
|
||||
The value of this field specifies a partial URL that is not to be
|
||||
visited. This can be a full path, or a partial path; any URL that
|
||||
starts with this value will not be retrieved. For example,
|
||||
`Disallow: /help' disallows both `/help.html' and `/help/index.html',
|
||||
whereas `Disallow: /help/' would disallow `/help/index.html' but allow
|
||||
`/help.html'.
|
||||
|
||||
An empty value indicates that all URLs can be retrieved. At least
|
||||
one Disallow field needs to be present in a record.
|
||||
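The prefix rule for the Disallow field can be illustrated with a short Python sketch. It assumes the Disallow values for the matching record have already been collected into a list; the function name is_disallowed is hypothetical, and the code only mirrors the wording above (a URL path is excluded when it starts with a non-empty Disallow value). It is not a description of how Wget itself parses `/robots.txt'.

```python
def is_disallowed(path, disallow_values):
    """Return True if `path` starts with any non-empty Disallow value.

    Hypothetical helper: `disallow_values` holds the values of the
    Disallow lines from the record that applies to this robot.
    """
    for value in disallow_values:
        if value == "":
            # An empty value excludes nothing.
            continue
        if path.startswith(value):
            return True
    return False


if __name__ == "__main__":
    # The /help examples from the text above.
    for rule in ("/help", "/help/"):
        for path in ("/help.html", "/help/index.html"):
            print(rule, path, is_disallowed(path, [rule]))
```

Running it shows that `/help' excludes both example paths, while `/help/' excludes only `/help/index.html'.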
|
||||
|
||||
File: wget.info, Node: Norobots Examples, Prev: Disallow Field, Up: Robots
|
||||
|
||||
Norobots Examples
|
||||
-----------------
|
||||
|
||||
The following example `/robots.txt' file specifies that no robots
|
||||
should visit any URL starting with `/cyberworld/map/' or `/tmp/':
|
||||
|
||||
# robots.txt for http://www.site.com/
|
||||
|
||||
User-agent: *
|
||||
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
|
||||
Disallow: /tmp/ # these will soon disappear
|
||||
|
||||
This example `/robots.txt' file specifies that no robots should
|
||||
visit any URL starting with `/cyberworld/map/', except the robot called
|
||||
`cybermapper':
|
||||
|
||||
# robots.txt for http://www.site.com/
|
||||
|
||||
User-agent: *
|
||||
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
|
||||
|
||||
# Cybermapper knows where to go.
|
||||
User-agent: cybermapper
|
||||
Disallow:
|
||||
|
||||
This example indicates that no robots should visit this site further:
|
||||
|
||||
# go away
|
||||
User-agent: *
|
||||
Disallow: /
|
||||
This is explained in some detail at
|
||||
<http://info.webcrawler.com/mak/projects/robots/meta-user.html>.
|
||||
Unfortunately, Wget does not support this method of robot exclusion yet,
|
||||
but it will be implemented in the next release.
|
||||
|
||||
|
||||
File: wget.info, Node: Security Considerations, Next: Contributors, Prev: Robots, Up: Appendices
|
||||
@ -1350,3 +1125,124 @@ Here are the main issues, and some solutions.
|
||||
being careful when you send debug logs (yes, even when you send
|
||||
them to me).
|
||||
|
||||
|
||||
File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
|
||||
|
||||
Contributors
|
||||
============
|
||||
|
||||
GNU Wget was written by Hrvoje Niksic <hniksic@arsdigita.com>.
|
||||
However, its development could never have gone as far as it has, were it
|
||||
not for the help of many people, either with bug reports, feature
|
||||
proposals, patches, or letters saying "Thanks!".
|
||||
|
||||
Special thanks goes to the following people (no particular order):
|
||||
|
||||
* Karsten Thygesen--donated system resources such as the mailing
|
||||
list, web space, and FTP space, along with a lot of time to make
|
||||
these actually work.
|
||||
|
||||
* Shawn McHorse--bug reports and patches.
|
||||
|
||||
* Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization. Lots of
|
||||
portability fixes.
|
||||
|
||||
* Gordon Matzigkeit--`.netrc' support.
|
||||
|
||||
* Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
|
||||
suggestions and "philosophical" discussions.
|
||||
|
||||
* Darko Budor--initial port to Windows.
|
||||
|
||||
* Antonio Rosella--help and suggestions, plus the Italian
|
||||
translation.
|
||||
|
||||
* Tomislav Petrovic, Mario Mikocevic--many bug reports and
|
||||
suggestions.
|
||||
|
||||
* Francois Pinard--many thorough bug reports and discussions.
|
||||
|
||||
* Karl Eichwalder--lots of help with internationalization and other
|
||||
things.
|
||||
|
||||
* Junio Hamano--donated support for Opie and HTTP `Digest'
|
||||
authentication.
|
||||
|
||||
* Brian Gough--a generous donation.
|
||||
|
||||
The following people have provided patches, bug/build reports, useful
|
||||
suggestions, beta testing services, fan mail and all the other things
|
||||
that make maintenance so much fun:
|
||||
|
||||
Tim Adam, Adrian Aichner, Martin Baehr, Dieter Baron, Roger Beeman
|
||||
and the Gurus at Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei
|
||||
Cavassin, Gilles Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, John
|
||||
Daily, Andrew Davison, Andrew Deryabin, Ulrich Drepper, Marc Duponcheel,
|
||||
Damir Dzeko, Aleksandar Erkalovic, Andy Eskilsson, Masashi Fujita,
|
||||
Howard Gayle, Marcel Gerrits, Hans Grobler, Mathieu Guillaume, Dan
|
||||
Harkless, Heiko Herold, Karl Heuer, HIROSE Masaaki, Gregor Hoffleit,
|
||||
Erik Magnus Hulthen, Richard Huveneers, Simon Josefsson, Mario Juric,
|
||||
Const Kaplinsky, Goran Kezunovic, Robert Kleine, Fila Kolodny,
|
||||
Alexander Kourakos, Martin Kraemer, Simos KSenitellis, Hrvoje Lacko,
|
||||
Daniel S. Lewart, Dave Love, Alexander V. Lukyanov, Jordan Mendelson,
|
||||
Lin Zhe Min, Simon Munton, Charlie Negyesi, R. K. Owen, Andrew Pollock,
|
||||
Steve Pothier, Jan Prikryl, Marin Purgar, Keith Refson, Tyler Riddle,
|
||||
Tobias Ringstrom, Juan Jose Rodrigues, Edward J. Sabol, Heinz Salzmann,
|
||||
Robert Schmidt, Andreas Schwab, Toomas Soome, Tage Stabell-Kulo, Sven
|
||||
Sternberger, Markus Strasser, Szakacsits Szabolcs, Mike Thomas, Russell
|
||||
Vincent, Charles G Waldman, Douglas E. Wegscheid, Jasmin Zainul, Bojan
|
||||
Zdrnja, Kristijan Zimmer.
|
||||
|
||||
Apologies to all who I accidentally left out, and many thanks to all
|
||||
the subscribers of the Wget mailing list.
|
||||
|
||||
|
||||
File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top
|
||||
|
||||
Copying
|
||||
*******
|
||||
|
||||
Wget is "free software", where "free" refers to liberty, not price.
|
||||
The exact legal distribution terms follow below, but in short, it means
|
||||
that you have the right (freedom) to run and change and copy Wget, and
|
||||
even--if you want--charge money for any of those things. The sole
|
||||
restriction is that you have to grant your recipients the same rights.
|
||||
|
||||
This method of licensing software is also known as "open-source",
|
||||
because it requires that the recipients always receive a program's
|
||||
source code along with the program.
|
||||
|
||||
More specifically:
|
||||
|
||||
This program is free software; you can redistribute it and/or
|
||||
modify it under the terms of the GNU General Public License as
|
||||
published by the Free Software Foundation; either version 2 of the
|
||||
License, or (at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful, but
|
||||
WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program; if not, write to the Free Software
|
||||
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
|
||||
In addition to this, this manual is free in the same sense:
|
||||
|
||||
Permission is granted to copy, distribute and/or modify this
|
||||
document under the terms of the GNU Free Documentation License,
|
||||
Version 1.1 or any later version published by the Free Software
|
||||
Foundation; with the Invariant Sections being "GNU General Public
|
||||
License" and "GNU Free Documentation License", with no Front-Cover
|
||||
Texts, and with no Back-Cover Texts. A copy of the license is
|
||||
included in the section entitled "GNU Free Documentation License".
|
||||
|
||||
The full texts of the GNU General Public License and of the GNU Free
|
||||
Documentation License are available below.
|
||||
|
||||
* Menu:
|
||||
|
||||
* GNU General Public License::
|
||||
* GNU Free Documentation License::
|
||||
|
||||
|
486
doc/wget.info-3
@ -1,5 +1,4 @@
|
||||
This is Info file wget.info, produced by Makeinfo version 1.68 from the
|
||||
input file ./wget.texi.
|
||||
This is wget.info, produced by makeinfo version 4.0 from wget.texi.
|
||||
|
||||
INFO-DIR-SECTION Net Utilities
|
||||
INFO-DIR-SECTION World Wide Web
|
||||
@ -16,85 +15,19 @@ data.
|
||||
manual provided the copyright notice and this permission notice are
|
||||
preserved on all copies.
|
||||
|
||||
Permission is granted to copy and distribute modified versions of
|
||||
this manual under the conditions for verbatim copying, provided also
|
||||
that the sections entitled "Copying" and "GNU General Public License"
|
||||
are included exactly as in the original, and provided that the entire
|
||||
resulting derived work is distributed under the terms of a permission
|
||||
notice identical to this one.
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1 or
|
||||
any later version published by the Free Software Foundation; with the
|
||||
Invariant Sections being "GNU General Public License" and "GNU Free
|
||||
Documentation License", with no Front-Cover Texts, and with no
|
||||
Back-Cover Texts. A copy of the license is included in the section
|
||||
entitled "GNU Free Documentation License".
|
||||
|
||||
|
||||
File: wget.info, Node: Contributors, Prev: Security Considerations, Up: Appendices
|
||||
File: wget.info, Node: GNU General Public License, Next: GNU Free Documentation License, Prev: Copying, Up: Copying
|
||||
|
||||
Contributors
|
||||
============
|
||||
|
||||
GNU Wget was written by Hrvoje Niksic <hniksic@arsdigita.com>.
|
||||
However, its development could never have gone as far as it has, were it
|
||||
not for the help of many people, either with bug reports, feature
|
||||
proposals, patches, or letters saying "Thanks!".
|
||||
|
||||
Special thanks goes to the following people (no particular order):
|
||||
|
||||
* Karsten Thygesen--donated the mailing list and the initial FTP
|
||||
space.
|
||||
|
||||
* Shawn McHorse--bug reports and patches.
|
||||
|
||||
* Kaveh R. Ghazi--on-the-fly `ansi2knr'-ization.
|
||||
|
||||
* Gordon Matzigkeit--`.netrc' support.
|
||||
|
||||
* Zlatko Calusic, Tomislav Vujec and Drazen Kacar--feature
|
||||
suggestions and "philosophical" discussions.
|
||||
|
||||
* Darko Budor--initial port to Windows.
|
||||
|
||||
* Antonio Rosella--help and suggestions, plus the Italian
|
||||
translation.
|
||||
|
||||
* Tomislav Petrovic, Mario Mikocevic--many bug reports and
|
||||
suggestions.
|
||||
|
||||
* Francois Pinard--many thorough bug reports and discussions.
|
||||
|
||||
* Karl Eichwalder--lots of help with internationalization and other
|
||||
things.
|
||||
|
||||
* Junio Hamano--donated support for Opie and HTTP `Digest'
|
||||
authentication.
|
||||
|
||||
* Brian Gough--a generous donation.
|
||||
|
||||
The following people have provided patches, bug/build reports, useful
|
||||
suggestions, beta testing services, fan mail and all the other things
|
||||
that make maintenance so much fun:
|
||||
|
||||
Tim Adam, Martin Baehr, Dieter Baron, Roger Beeman and the Gurus at
|
||||
Cisco, Dan Berger, Mark Boyns, John Burden, Wanderlei Cavassin, Gilles
|
||||
Cedoc, Tim Charron, Noel Cragg, Kristijan Conkas, Andrew Deryabin,
|
||||
Damir Dzeko, Andrew Davison, Ulrich Drepper, Marc Duponcheel,
|
||||
Aleksandar Erkalovic, Andy Eskilsson, Masashi Fujita, Howard Gayle,
|
||||
Marcel Gerrits, Hans Grobler, Mathieu Guillaume, Dan Harkless, Heiko
|
||||
Herold, Karl Heuer, HIROSE Masaaki, Gregor Hoffleit, Erik Magnus
|
||||
Hulthen, Richard Huveneers, Simon Josefsson, Mario Juric, Goran
|
||||
Kezunovic, Robert Kleine, Fila Kolodny, Alexander Kourakos, Martin
|
||||
Kraemer, Simos KSenitellis, Hrvoje Lacko, Daniel S. Lewart, Dave Love,
|
||||
Jordan Mendelson, Lin Zhe Min, Charlie Negyesi, Andrew Pollock, Steve
|
||||
Pothier, Jan Prikryl, Marin Purgar, Keith Refson, Tobias Ringstrom,
|
||||
Juan Jose Rodrigues, Edward J. Sabol, Heinz Salzmann, Robert Schmidt,
|
||||
Toomas Soome, Tage Stabell-Kulo, Sven Sternberger, Markus Strasser,
|
||||
Szakacsits Szabolcs, Mike Thomas, Russell Vincent, Charles G Waldman,
|
||||
Douglas E. Wegscheid, Jasmin Zainul, Bojan Zdrnja, Kristijan Zimmer.
|
||||
|
||||
Apologies to all who I accidentally left out, and many thanks to all
|
||||
the subscribers of the Wget mailing list.
|
||||
|
||||
|
||||
File: wget.info, Node: Copying, Next: Concept Index, Prev: Appendices, Up: Top
|
||||
|
||||
GNU GENERAL PUBLIC LICENSE
|
||||
**************************
|
||||
GNU General Public License
|
||||
==========================
|
||||
|
||||
Version 2, June 1991
|
||||
|
||||
@ -454,6 +387,391 @@ library, you may consider it more useful to permit linking proprietary
|
||||
applications with the library. If this is what you want to do, use the
|
||||
GNU Library General Public License instead of this License.
|
||||
|
||||
|
||||
File: wget.info, Node: GNU Free Documentation License, Prev: GNU General Public License, Up: Copying
|
||||
|
||||
GNU Free Documentation License
|
||||
==============================
|
||||
|
||||
Version 1.1, March 2000
|
||||
|
||||
Copyright (C) 2000 Free Software Foundation, Inc.
|
||||
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
|
||||
|
||||
Everyone is permitted to copy and distribute verbatim copies
|
||||
of this license document, but changing it is not allowed.
|
||||
|
||||
|
||||
|
||||
0. PREAMBLE
|
||||
|
||||
The purpose of this License is to make a manual, textbook, or other
|
||||
written document "free" in the sense of freedom: to assure everyone
|
||||
the effective freedom to copy and redistribute it, with or without
|
||||
modifying it, either commercially or noncommercially. Secondarily,
|
||||
this License preserves for the author and publisher a way to get
|
||||
credit for their work, while not being considered responsible for
|
||||
modifications made by others.
|
||||
|
||||
This License is a kind of "copyleft", which means that derivative
|
||||
works of the document must themselves be free in the same sense.
|
||||
It complements the GNU General Public License, which is a copyleft
|
||||
license designed for free software.
|
||||
|
||||
We have designed this License in order to use it for manuals for
|
||||
free software, because free software needs free documentation: a
|
||||
free program should come with manuals providing the same freedoms
|
||||
that the software does. But this License is not limited to
|
||||
software manuals; it can be used for any textual work, regardless
|
||||
of subject matter or whether it is published as a printed book.
|
||||
We recommend this License principally for works whose purpose is
|
||||
instruction or reference.
|
||||
|
||||
|
||||
1. APPLICABILITY AND DEFINITIONS
|
||||
|
||||
This License applies to any manual or other work that contains a
|
||||
notice placed by the copyright holder saying it can be distributed
|
||||
under the terms of this License. The "Document", below, refers to
|
||||
any such manual or work. Any member of the public is a licensee,
|
||||
and is addressed as "you".
|
||||
|
||||
A "Modified Version" of the Document means any work containing the
|
||||
Document or a portion of it, either copied verbatim, or with
|
||||
modifications and/or translated into another language.
|
||||
|
||||
A "Secondary Section" is a named appendix or a front-matter
|
||||
section of the Document that deals exclusively with the
|
||||
relationship of the publishers or authors of the Document to the
|
||||
Document's overall subject (or to related matters) and contains
|
||||
nothing that could fall directly within that overall subject.
|
||||
(For example, if the Document is in part a textbook of
|
||||
mathematics, a Secondary Section may not explain any mathematics.)
|
||||
The relationship could be a matter of historical connection with
|
||||
the subject or with related matters, or of legal, commercial,
|
||||
philosophical, ethical or political position regarding them.
|
||||
|
||||
The "Invariant Sections" are certain Secondary Sections whose
|
||||
titles are designated, as being those of Invariant Sections, in
|
||||
the notice that says that the Document is released under this
|
||||
License.
|
||||
|
||||
The "Cover Texts" are certain short passages of text that are
|
||||
listed, as Front-Cover Texts or Back-Cover Texts, in the notice
|
||||
that says that the Document is released under this License.
|
||||
|
||||
A "Transparent" copy of the Document means a machine-readable copy,
|
||||
represented in a format whose specification is available to the
|
||||
general public, whose contents can be viewed and edited directly
|
||||
and straightforwardly with generic text editors or (for images
|
||||
composed of pixels) generic paint programs or (for drawings) some
|
||||
widely available drawing editor, and that is suitable for input to
|
||||
text formatters or for automatic translation to a variety of
|
||||
formats suitable for input to text formatters. A copy made in an
|
||||
otherwise Transparent file format whose markup has been designed
|
||||
to thwart or discourage subsequent modification by readers is not
|
||||
Transparent. A copy that is not "Transparent" is called "Opaque".
|
||||
|
||||
Examples of suitable formats for Transparent copies include plain
|
||||
ASCII without markup, Texinfo input format, LaTeX input format,
|
||||
SGML or XML using a publicly available DTD, and
|
||||
standard-conforming simple HTML designed for human modification.
|
||||
Opaque formats include PostScript, PDF, proprietary formats that
|
||||
can be read and edited only by proprietary word processors, SGML
|
||||
or XML for which the DTD and/or processing tools are not generally
|
||||
available, and the machine-generated HTML produced by some word
|
||||
processors for output purposes only.
|
||||
|
||||
The "Title Page" means, for a printed book, the title page itself,
|
||||
plus such following pages as are needed to hold, legibly, the
|
||||
material this License requires to appear in the title page. For
|
||||
works in formats which do not have any title page as such, "Title
|
||||
Page" means the text near the most prominent appearance of the
|
||||
work's title, preceding the beginning of the body of the text.
|
||||
|
||||
|
||||
2. VERBATIM COPYING
|
||||
|
||||
You may copy and distribute the Document in any medium, either
|
||||
commercially or noncommercially, provided that this License, the
|
||||
copyright notices, and the license notice saying this License
|
||||
applies to the Document are reproduced in all copies, and that you
|
||||
add no other conditions whatsoever to those of this License. You
|
||||
may not use technical measures to obstruct or control the reading
|
||||
or further copying of the copies you make or distribute. However,
|
||||
you may accept compensation in exchange for copies. If you
|
||||
distribute a large enough number of copies you must also follow
|
||||
the conditions in section 3.
|
||||
|
||||
You may also lend copies, under the same conditions stated above,
|
||||
and you may publicly display copies.
|
||||
|
||||
|
||||
3. COPYING IN QUANTITY
|
||||
|
||||
If you publish printed copies of the Document numbering more than
|
||||
100, and the Document's license notice requires Cover Texts, you
|
||||
must enclose the copies in covers that carry, clearly and legibly,
|
||||
all these Cover Texts: Front-Cover Texts on the front cover, and
|
||||
Back-Cover Texts on the back cover. Both covers must also clearly
|
||||
and legibly identify you as the publisher of these copies. The
|
||||
front cover must present the full title with all words of the
|
||||
title equally prominent and visible. You may add other material
|
||||
on the covers in addition. Copying with changes limited to the
|
||||
covers, as long as they preserve the title of the Document and
|
||||
satisfy these conditions, can be treated as verbatim copying in
|
||||
other respects.
|
||||
|
||||
If the required texts for either cover are too voluminous to fit
|
||||
legibly, you should put the first ones listed (as many as fit
|
||||
reasonably) on the actual cover, and continue the rest onto
|
||||
adjacent pages.
|
||||
|
||||
If you publish or distribute Opaque copies of the Document
|
||||
numbering more than 100, you must either include a
|
||||
machine-readable Transparent copy along with each Opaque copy, or
|
||||
state in or with each Opaque copy a publicly-accessible
|
||||
computer-network location containing a complete Transparent copy
|
||||
of the Document, free of added material, which the general
|
||||
network-using public has access to download anonymously at no
|
||||
charge using public-standard network protocols. If you use the
|
||||
latter option, you must take reasonably prudent steps, when you
|
||||
begin distribution of Opaque copies in quantity, to ensure that
|
||||
this Transparent copy will remain thus accessible at the stated
|
||||
location until at least one year after the last time you
|
||||
distribute an Opaque copy (directly or through your agents or
|
||||
retailers) of that edition to the public.
|
||||
|
||||
It is requested, but not required, that you contact the authors of
|
||||
the Document well before redistributing any large number of
|
||||
copies, to give them a chance to provide you with an updated
|
||||
version of the Document.
|
||||
|
||||
|
||||
4. MODIFICATIONS
|
||||
|
||||
You may copy and distribute a Modified Version of the Document
|
||||
under the conditions of sections 2 and 3 above, provided that you
|
||||
release the Modified Version under precisely this License, with
|
||||
the Modified Version filling the role of the Document, thus
|
||||
licensing distribution and modification of the Modified Version to
|
||||
whoever possesses a copy of it. In addition, you must do these
|
||||
things in the Modified Version:
|
||||
|
||||
A. Use in the Title Page (and on the covers, if any) a title
|
||||
distinct from that of the Document, and from those of previous
|
||||
versions (which should, if there were any, be listed in the
|
||||
History section of the Document). You may use the same title
|
||||
as a previous version if the original publisher of that version
|
||||
gives permission.
|
||||
B. List on the Title Page, as authors, one or more persons or
|
||||
entities responsible for authorship of the modifications in the
|
||||
Modified Version, together with at least five of the principal
|
||||
authors of the Document (all of its principal authors, if it
|
||||
has less than five).
|
||||
C. State on the Title page the name of the publisher of the
|
||||
Modified Version, as the publisher.
|
||||
D. Preserve all the copyright notices of the Document.
|
||||
E. Add an appropriate copyright notice for your modifications
|
||||
adjacent to the other copyright notices.
|
||||
F. Include, immediately after the copyright notices, a license
|
||||
notice giving the public permission to use the Modified Version
|
||||
under the terms of this License, in the form shown in the
|
||||
Addendum below.
|
||||
G. Preserve in that license notice the full lists of Invariant
|
||||
Sections and required Cover Texts given in the Document's
|
||||
license notice.
|
||||
H. Include an unaltered copy of this License.
|
||||
I. Preserve the section entitled "History", and its title, and add
|
||||
to it an item stating at least the title, year, new authors, and
|
||||
publisher of the Modified Version as given on the Title Page.
|
||||
If there is no section entitled "History" in the Document,
|
||||
create one stating the title, year, authors, and publisher of
|
||||
the Document as given on its Title Page, then add an item
|
||||
describing the Modified Version as stated in the previous
|
||||
sentence.
|
||||
J. Preserve the network location, if any, given in the Document for
|
||||
public access to a Transparent copy of the Document, and
|
||||
likewise the network locations given in the Document for
|
||||
previous versions it was based on. These may be placed in the
|
||||
"History" section. You may omit a network location for a work
|
||||
that was published at least four years before the Document
|
||||
itself, or if the original publisher of the version it refers
|
||||
to gives permission.
|
||||
K. In any section entitled "Acknowledgements" or "Dedications",
|
||||
preserve the section's title, and preserve in the section all the
|
||||
substance and tone of each of the contributor acknowledgements
|
||||
and/or dedications given therein.
|
||||
L. Preserve all the Invariant Sections of the Document,
|
||||
unaltered in their text and in their titles. Section numbers
|
||||
or the equivalent are not considered part of the section titles.
|
||||
M. Delete any section entitled "Endorsements". Such a section
|
||||
may not be included in the Modified Version.
|
||||
N. Do not retitle any existing section as "Endorsements" or to
|
||||
conflict in title with any Invariant Section.
|
||||
|
||||
If the Modified Version includes new front-matter sections or
|
||||
appendices that qualify as Secondary Sections and contain no
|
||||
material copied from the Document, you may at your option
|
||||
designate some or all of these sections as invariant. To do this,
|
||||
add their titles to the list of Invariant Sections in the Modified
|
||||
Version's license notice. These titles must be distinct from any
|
||||
other section titles.
|
||||
|
||||
You may add a section entitled "Endorsements", provided it contains
|
||||
nothing but endorsements of your Modified Version by various
|
||||
parties-for example, statements of peer review or that the text has
|
||||
been approved by an organization as the authoritative definition
|
||||
of a standard.
|
||||
|
||||
You may add a passage of up to five words as a Front-Cover Text,
|
||||
and a passage of up to 25 words as a Back-Cover Text, to the end
|
||||
of the list of Cover Texts in the Modified Version. Only one
|
||||
passage of Front-Cover Text and one of Back-Cover Text may be
|
||||
added by (or through arrangements made by) any one entity. If the
|
||||
Document already includes a cover text for the same cover,
|
||||
previously added by you or by arrangement made by the same entity
|
||||
you are acting on behalf of, you may not add another; but you may
|
||||
replace the old one, on explicit permission from the previous
|
||||
publisher that added the old one.
|
||||
|
||||
The author(s) and publisher(s) of the Document do not by this
|
||||
License give permission to use their names for publicity for or to
|
||||
assert or imply endorsement of any Modified Version.
|
||||
|
||||
|
||||
5. COMBINING DOCUMENTS
|
||||
|
||||
You may combine the Document with other documents released under
|
||||
this License, under the terms defined in section 4 above for
|
||||
modified versions, provided that you include in the combination
|
||||
all of the Invariant Sections of all of the original documents,
|
||||
unmodified, and list them all as Invariant Sections of your
|
||||
combined work in its license notice.
|
||||
|
||||
The combined work need only contain one copy of this License, and
|
||||
multiple identical Invariant Sections may be replaced with a single
|
||||
copy. If there are multiple Invariant Sections with the same name
|
||||
but different contents, make the title of each such section unique
|
||||
by adding at the end of it, in parentheses, the name of the
|
||||
original author or publisher of that section if known, or else a
|
||||
unique number. Make the same adjustment to the section titles in
|
||||
the list of Invariant Sections in the license notice of the
|
||||
combined work.
|
||||
|
||||
In the combination, you must combine any sections entitled
|
||||
"History" in the various original documents, forming one section
|
||||
entitled "History"; likewise combine any sections entitled
|
||||
"Acknowledgements", and any sections entitled "Dedications". You
|
||||
must delete all sections entitled "Endorsements."
|
||||
|
||||
|
||||
6. COLLECTIONS OF DOCUMENTS
|
||||
|
||||
You may make a collection consisting of the Document and other
|
||||
documents released under this License, and replace the individual
|
||||
copies of this License in the various documents with a single copy
|
||||
that is included in the collection, provided that you follow the
|
||||
rules of this License for verbatim copying of each of the
|
||||
documents in all other respects.
|
||||
|
||||
You may extract a single document from such a collection, and
|
||||
distribute it individually under this License, provided you insert
|
||||
a copy of this License into the extracted document, and follow
|
||||
this License in all other respects regarding verbatim copying of
|
||||
that document.
|
||||
|
||||
|
||||
7. AGGREGATION WITH INDEPENDENT WORKS
|
||||
|
||||
A compilation of the Document or its derivatives with other
|
||||
separate and independent documents or works, in or on a volume of
|
||||
a storage or distribution medium, does not as a whole count as a
|
||||
Modified Version of the Document, provided no compilation
|
||||
copyright is claimed for the compilation. Such a compilation is
|
||||
called an "aggregate", and this License does not apply to the
|
||||
other self-contained works thus compiled with the Document, on
|
||||
account of their being thus compiled, if they are not themselves
|
||||
derivative works of the Document.
|
||||
|
||||
If the Cover Text requirement of section 3 is applicable to these
|
||||
copies of the Document, then if the Document is less than one
|
||||
quarter of the entire aggregate, the Document's Cover Texts may be
|
||||
placed on covers that surround only the Document within the
|
||||
aggregate. Otherwise they must appear on covers around the whole
|
||||
aggregate.
|
||||
|
||||
|
||||
8. TRANSLATION
|
||||
|
||||
Translation is considered a kind of modification, so you may
|
||||
distribute translations of the Document under the terms of section
|
||||
4. Replacing Invariant Sections with translations requires special
|
||||
permission from their copyright holders, but you may include
|
||||
translations of some or all Invariant Sections in addition to the
|
||||
original versions of these Invariant Sections. You may include a
|
||||
translation of this License provided that you also include the
|
||||
original English version of this License. In case of a
|
||||
disagreement between the translation and the original English
|
||||
version of this License, the original English version will prevail.
|
||||
|
||||
|
||||
9. TERMINATION
|
||||
|
||||
You may not copy, modify, sublicense, or distribute the Document
|
||||
except as expressly provided for under this License. Any other
|
||||
attempt to copy, modify, sublicense or distribute the Document is
|
||||
void, and will automatically terminate your rights under this
|
||||
License. However, parties who have received copies, or rights,
|
||||
from you under this License will not have their licenses
|
||||
terminated so long as such parties remain in full compliance.
|
||||
|
||||
|
||||
10. FUTURE REVISIONS OF THIS LICENSE
|
||||
|
||||
The Free Software Foundation may publish new, revised versions of
|
||||
the GNU Free Documentation License from time to time. Such new
|
||||
versions will be similar in spirit to the present version, but may
|
||||
differ in detail to address new problems or concerns. See
|
||||
http://www.gnu.org/copyleft/.
|
||||
|
||||
Each version of the License is given a distinguishing version
|
||||
number. If the Document specifies that a particular numbered
|
||||
version of this License "or any later version" applies to it, you
|
||||
have the option of following the terms and conditions either of
|
||||
that specified version or of any later version that has been
|
||||
published (not as a draft) by the Free Software Foundation. If
|
||||
the Document does not specify a version number of this License,
|
||||
you may choose any version ever published (not as a draft) by the
|
||||
Free Software Foundation.
|
||||
|
||||
|
||||
ADDENDUM: How to use this License for your documents
|
||||
====================================================
|
||||
|
||||
To use this License in a document you have written, include a copy of
|
||||
the License in the document and put the following copyright and license
|
||||
notices just after the title page:
|
||||
|
||||
|
||||
Copyright (C) YEAR YOUR NAME.
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1
|
||||
or any later version published by the Free Software Foundation;
|
||||
with the Invariant Sections being LIST THEIR TITLES, with the
|
||||
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
|
||||
A copy of the license is included in the section entitled ``GNU
|
||||
Free Documentation License''.
|
||||
If you have no Invariant Sections, write "with no Invariant
|
||||
Sections" instead of saying which ones are invariant. If you have no
|
||||
Front-Cover Texts, write "no Front-Cover Texts" instead of "Front-Cover
|
||||
Texts being LIST"; likewise for Back-Cover Texts.
|
||||
|
||||
If your document contains nontrivial examples of program code, we
|
||||
recommend releasing these examples in parallel under your choice of
|
||||
free software license, such as the GNU General Public License, to
|
||||
permit their use in free software.
|
||||
|
||||
|
||||
File: wget.info, Node: Concept Index, Prev: Copying, Up: Top
|
||||
|
||||
@ -507,6 +825,7 @@ Concept Index
|
||||
* following links: Following Links.
|
||||
* force html: Logging and Input File Options.
|
||||
* ftp time-stamping: FTP Time-Stamping Internals.
|
||||
* GFDL: Copying.
|
||||
* globbing, toggle: FTP Options.
|
||||
* GPL: Copying.
|
||||
* hangup: Signals.
|
||||
@ -532,14 +851,9 @@ Concept Index
|
||||
* mailing list: Mailing List.
|
||||
* mirroring: Guru Usage.
|
||||
* no parent: Directory-Based Limits.
|
||||
* no warranty: Copying.
|
||||
* no warranty: GNU General Public License.
|
||||
* no-clobber: Download Options.
|
||||
* nohup: Invoking.
|
||||
* norobots disallow: Disallow Field.
|
||||
* norobots examples: Norobots Examples.
|
||||
* norobots format: RES Format.
|
||||
* norobots introduction: Introduction to RES.
|
||||
* norobots user-agent: User-Agent Field.
|
||||
* number of retries: Download Options.
|
||||
* operating systems: Portability.
|
||||
* option syntax: Option Syntax.
|
||||
@ -550,8 +864,8 @@ Concept Index
|
||||
* pause: Download Options.
|
||||
* portability: Portability.
|
||||
* proxies: Proxies.
|
||||
* proxy <1>: Download Options.
|
||||
* proxy: HTTP Options.
|
||||
* proxy <1>: HTTP Options.
|
||||
* proxy: Download Options.
|
||||
* proxy authentication: HTTP Options.
|
||||
* proxy filling: Recursive Retrieval Options.
|
||||
* proxy password: HTTP Options.
|
||||
|
@ -42,10 +42,11 @@ notice identical to this one except for the removal of this paragraph
|
||||
@end ignore
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1 or
|
||||
any later version published by the Free Software Foundation; with no
|
||||
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
||||
Texts. A copy of the license is included in the section entitled ``GNU
|
||||
Free Documentation License''.
|
||||
any later version published by the Free Software Foundation; with the
|
||||
Invariant Sections being ``GNU General Public License'' and ``GNU Free
|
||||
Documentation License'', with no Front-Cover Texts, and with no
|
||||
Back-Cover Texts. A copy of the license is included in the section
|
||||
entitled ``GNU Free Documentation License''.
|
||||
@end ifinfo
|
||||
|
||||
@titlepage
|
||||
@ -60,10 +61,11 @@ Copyright @copyright{} 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
|
||||
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1 or
|
||||
any later version published by the Free Software Foundation; with no
|
||||
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
||||
Texts. A copy of the license is included in the section entitled ``GNU
|
||||
Free Documentation License''.
|
||||
any later version published by the Free Software Foundation; with the
|
||||
Invariant Sections being ``GNU General Public License'' and ``GNU Free
|
||||
Documentation License'', with no Front-Cover Texts, and with no
|
||||
Back-Cover Texts. A copy of the license is included in the section
|
||||
entitled ``GNU Free Documentation License''.
|
||||
@end titlepage
|
||||
|
||||
@ifinfo
|
||||
@ -2485,10 +2487,26 @@ This chapter contains some references I consider useful.
|
||||
@cindex robots.txt
|
||||
@cindex server maintenance
|
||||
|
||||
Since Wget is able to traverse the web, it counts as one of the Web
|
||||
@dfn{robots}. Thus Wget understands @dfn{Robots Exclusion Standard}
|
||||
(@sc{res})---contents of @file{/robots.txt}, used by server
|
||||
administrators to shield parts of their systems from wanderings of Wget.
|
||||
It is extremely easy to make Wget wander aimlessly around a web site,
|
||||
sucking all the available data in the process.  @samp{wget -r @var{site}},
|
||||
and you're set. Great? Not for the server admin.
|
||||
|
||||
While Wget is retrieving static pages, there's not much of a problem.
|
||||
But for Wget, there is no real difference between the smallest static
|
||||
page and the hardest, most demanding CGI or dynamic page. For instance,
|
||||
a site I know has a section handled by an, uh, bitchin' CGI script that
|
||||
converts all the Info files to HTML. The script can and does bring the
|
||||
machine to its knees without providing anything useful to the
|
||||
downloader.
|
||||
|
||||
For such and similar cases various robot exclusion schemes have been
|
||||
devised as a means for the server administrators and document authors to
|
||||
protect chosen portions of their sites from the wandering of robots.
|
||||
|
||||
The more popular mechanism is the @dfn{Robots Exclusion Standard}
|
||||
written by Martijn Koster et al. in 1994. It is specified by placing a
|
||||
file named @file{/robots.txt} in the server root, which the robots are
|
||||
supposed to download and parse. Wget supports this specification.
|
||||
|
||||
Norobots support is turned on only when retrieving recursively, and
|
||||
@emph{never} for the first page. Thus, you may issue:
|
||||
@ -2500,8 +2518,7 @@ wget -r http://fly.srk.fer.hr/
|
||||
First the index of fly.srk.fer.hr will be downloaded. If Wget finds
|
||||
anything worth downloading on the same host, only @emph{then} will it
|
||||
load the robots, and decide whether or not to load the links after all.
|
||||
@file{/robots.txt} is loaded only once per host. Wget does not support
|
||||
the robots @code{META} tag.
|
||||
@file{/robots.txt} is loaded only once per host.
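
The once-per-host behaviour can be illustrated with a small sketch.  This
is not Wget's own code (Wget is written in C); it is a hypothetical
Python illustration built on the standard @code{urllib.robotparser}
module.  The names @code{allowed}, @code{fetch_robots} and
@code{_robots_cache} are assumptions made for the example only.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical cache: one parsed robots.txt per host, mirroring the
# "loaded only once per host" behaviour described in the text.
_robots_cache = {}


def allowed(url, user_agent, fetch_robots):
    """Return True if user_agent may retrieve url.

    fetch_robots(host) is an assumed callback that returns the text of
    that host's /robots.txt (an empty string if the host has none).
    """
    host = urlsplit(url).netloc
    parser = _robots_cache.get(host)
    if parser is None:
        # First link seen on this host: load and parse the rules once.
        parser = RobotFileParser()
        parser.parse(fetch_robots(host).splitlines())
        _robots_cache[host] = parser
    return parser.can_fetch(user_agent, url)
```

A recursive retriever would fetch the first page unconditionally and
then call something like @code{allowed} for each link it considers
following, so the rules are requested only when there is something
worth downloading on that host.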
|
||||
|
||||
Note that the exclusion standard discussed here has undergone some
|
||||
revisions.  However, Wget supports only the first version of
|
||||
@ -2517,6 +2534,20 @@ but we plan to add them.
|
||||
|
||||
This manual no longer includes the text of the old standard.
|
||||
|
||||
The second, lesser-known mechanism enables the author of an individual
|
||||
document to specify whether they want the links from the file to be
|
||||
followed by a robot. This is achieved using the @code{META} tag, like
|
||||
this:
|
||||
|
||||
@example
|
||||
<meta name="robots" content="nofollow">
|
||||
@end example
|
||||
|
||||
This is explained in some detail at
|
||||
@url{http://info.webcrawler.com/mak/projects/robots/meta-user.html}.
|
||||
Unfortunately, Wget does not support this method of robot exclusion yet,
|
||||
but it will be implemented in the next release.
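
For readers curious how a robot might honor the tag, here is a minimal
sketch in Python using the standard @code{html.parser} module.  It does
not describe Wget, which does not implement this mechanism; the class
@code{RobotsMetaParser} and the function @code{may_follow_links} are
hypothetical names chosen for the example, and only the
@code{nofollow} directive is checked.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Record whether a document carries <meta name="robots" ... nofollow>."""

    def __init__(self):
        super().__init__()
        self.nofollow = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
        if "nofollow" in directives:
            self.nofollow = True


def may_follow_links(html_text):
    """Return False if the document asks robots not to follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return not parser.nofollow
```

A robot built this way would parse each retrieved document first and
skip link extraction whenever @code{may_follow_links} returns false.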
|
||||
|
||||
@node Security Considerations, Contributors, Robots, Appendices
|
||||
@section Security Considerations
|
||||
@cindex security
|
||||
@ -2789,10 +2820,11 @@ In addition to this, this manual is free in the same sense:
|
||||
@quotation
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1 or
|
||||
any later version published by the Free Software Foundation; with no
|
||||
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
|
||||
Texts. A copy of the license is included in the section entitled ``GNU
|
||||
Free Documentation License''.
|
||||
any later version published by the Free Software Foundation; with the
|
||||
Invariant Sections being ``GNU General Public License'' and ``GNU Free
|
||||
Documentation License'', with no Front-Cover Texts, and with no
|
||||
Back-Cover Texts. A copy of the license is included in the section
|
||||
entitled ``GNU Free Documentation License''.
|
||||
@end quotation
|
||||
|
||||
@c #### Maybe we should wrap these licenses in ifinfo? Stallman says
|
||||
|