\input texinfo @c -*-texinfo-*-

@c %**start of header
@setfilename wget.info
@settitle GNU Wget Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@c Use `odd' to print double-sided.
@setchapternewpage on
@c %**end of header

@iftex
@c Remove this if you don't use A4 paper.
@afourpaper
@end iftex

@c This should really be auto-generated!
@set VERSION 1.5.3+dev
@set UPDATED Feb 2000

@dircategory Net Utilities
@dircategory World Wide Web
@direntry
* Wget: (wget).         The non-interactive network downloader.
@end direntry

@ifinfo
This file documents the GNU Wget utility for downloading network
data.

Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore
Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
sections entitled ``Copying'' and ``GNU General Public License'' are
included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.
@end ifinfo

@titlepage
@title GNU Wget
@subtitle The noninteractive downloading utility
@subtitle Updated for Wget @value{VERSION}, @value{UPDATED}
@author by Hrvoje Nik@v{s}i@'{c} and the developers

@page
@vskip 0pt plus 1filll
Copyright @copyright{} 1996, 1997, 1998 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
sections entitled ``Copying'' and ``GNU General Public License'' are
included exactly as in the original, and provided that the entire
resulting derived work is distributed under the terms of a permission
notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that this permission notice may be stated in a translation
approved by the Free Software Foundation.
@end titlepage

@ifinfo
@node Top, Overview, (dir), (dir)
@top Wget @value{VERSION}

This manual documents version @value{VERSION} of GNU Wget, the freely
available utility for network download.

Copyright @copyright{} 1996, 1997, 1998 Free Software Foundation, Inc.

@menu
* Overview::                    Features of Wget.
* Invoking::                    Wget command-line arguments.
* Recursive Retrieval::         Description of recursive retrieval.
* Following Links::             The available methods of chasing links.
* Time-Stamping::               Mirroring according to time-stamps.
* Startup File::                Wget's initialization file.
* Examples::                    Examples of usage.
* Various::                     The stuff that doesn't fit anywhere else.
* Appendices::                  Some useful references.
* Copying::                     You may give out copies of Wget.
* Concept Index::               Topics covered by this manual.
@end menu
@end ifinfo

@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features

GNU Wget is a freely available network utility to retrieve files from
the World Wide Web, using @sc{http} (Hyper Text Transfer Protocol) and
@sc{ftp} (File Transfer Protocol), the two most widely used Internet
protocols. It has many useful features to make downloading easier, some
of them being:

@itemize @bullet
@item
Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.

@sp 1
@item
Wget is capable of descending recursively through the structure of
@sc{html} documents and @sc{ftp} directory trees, making a local copy of
the directory hierarchy similar to the one on the remote server. This
feature can be used to mirror archives and home pages, or traverse the
web in search of data, like a @sc{www} robot (@pxref{Robots}). In that
spirit, Wget understands the @code{norobots} convention.

@sp 1
@item
File name wildcard matching and recursive mirroring of directories are
available when retrieving via @sc{ftp}. Wget can read the time-stamp
information given by both @sc{http} and @sc{ftp} servers, and store it
locally. Thus Wget can see if the remote file has changed since the last
retrieval, and automatically retrieve the new version if it has. This
makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
pages.

@sp 1
@item
Wget works exceedingly well on slow or unstable connections,
retrying the document until it is fully retrieved, or until a
user-specified retry count is surpassed. It will try to resume the
download from the point of interruption, using @code{REST} with @sc{ftp}
and @code{Range} with @sc{http} servers that support them.

@sp 1
@item
By default, Wget supports proxy servers, which can lighten the network
load, speed up retrieval and provide access behind firewalls. However,
if you are behind a firewall that requires that you use a socks style
gateway, you can get the socks library and build Wget with support for
socks. Wget also supports passive @sc{ftp} downloading as an
option.

@sp 1
@item
Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}).

@sp 1
@item
The retrieval is conveniently traced by printing dots, each dot
representing a fixed amount of data received (1KB by default). These
representations can be customized to your preferences.

@sp 1
@item
Most of the features are fully configurable, either through command line
options, or via the initialization file @file{.wgetrc} (@pxref{Startup
File}). Wget allows you to define @dfn{global} startup files
(@file{/usr/local/etc/wgetrc} by default) for site settings.

@sp 1
@item
Finally, GNU Wget is free software. This means that everyone may use
it, redistribute it and/or modify it under the terms of the GNU General
Public License, as published by the Free Software Foundation
(@pxref{Copying}).
@end itemize

@node Invoking, Recursive Retrieval, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
@cindex arguments
@cindex nohup

By default, Wget is very simple to invoke. The basic syntax is:

@example
wget [@var{option}]@dots{} [@var{URL}]@dots{}
@end example

Wget will simply download all the @sc{url}s specified on the command
line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below.

However, you may wish to change some of the default parameters of
Wget. You can do so in two ways: permanently, by adding the appropriate
command to @file{.wgetrc} (@pxref{Startup File}), or by specifying it on
the command line.

@menu
* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::
@end menu

@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax

@dfn{URL} is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the @sc{url} syntax as per
@sc{rfc1738}. This is the most widely used form (square brackets denote
optional parts):

@example
http://host[:port]/directory/file
ftp://host[:port]/directory/file
@end example

You can also encode your username and password within a @sc{url}:

@example
ftp://user:password@@host/path
http://user:password@@host/path
@end example

Either @var{user} or @var{password}, or both, may be left out. If you
leave out either the @sc{http} username or password, no authentication
will be sent. If you leave out the @sc{ftp} username, @samp{anonymous}
will be used. If you leave out the @sc{ftp} password, your email
address will be supplied as a default password.@footnote{If you have a
@file{.netrc} file in your home directory, the password will also be
searched for there.}
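
The @sc{ftp} username default described above can be sketched in a few
lines of shell. This is only an illustration: the helper name
@code{ftp_user} is made up for the example and is not part of Wget, and
the sketch ignores the @file{.netrc} lookup and the password default.

```shell
# Hypothetical helper (not part of Wget): print the user name that an
# ftp:// URL would log in with, falling back to "anonymous".
ftp_user() (
  rest=${1#ftp://}                   # strip the scheme
  authority=${rest%%/*}              # user:password@host[:port]
  case $authority in
    *@*) userinfo=${authority%@*}    # everything before the last '@'
         user=${userinfo%%:*} ;;     # drop an optional :password
    *)   user= ;;                    # no user part in the URL
  esac
  echo "${user:-anonymous}"
)

ftp_user 'ftp://joe:secret@host/path'   # prints: joe
ftp_user 'ftp://host/path'              # prints: anonymous
```

The function body runs in a subshell so that its working variables do
not leak into the calling shell.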

You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy}
being the hexadecimal representation of the character's @sc{ascii}
value. Some common unsafe characters include @samp{%} (quoted as
@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as
@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe
characters.
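
As a rough illustration of the @samp{%xy} quoting, the following shell
sketch encodes a string one character at a time. The function name
@code{urlencode} and the set of characters it leaves untouched are
assumptions made for this example; they are not taken from Wget or from
@sc{rfc1738}, and the sketch handles @sc{ascii} input only.

```shell
# Hypothetical helper (not part of Wget): percent-encode an ASCII string.
# Letters, digits and . _ ~ / - pass through unchanged; every other
# character becomes %XY, XY being its hexadecimal ASCII value.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}              # the string minus its first character
    c=${s%"$rest"}           # the first character itself
    s=$rest
    case $c in
      [A-Za-z0-9._~/-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # 'c yields the character code
    esac
  done
  printf '%s\n' "$out"
)

urlencode 'user@host:a%b'    # prints: user%40host%3Aa%25b
```

A user name or password containing such characters would be encoded
this way before being placed in a @sc{url}.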

Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By
default, @sc{ftp} documents are retrieved in the binary mode (type
@samp{i}), which means that they are downloaded unchanged. Another
useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:

@example
ftp://host/directory/file;type=a
@end example

Two alternative variants of @sc{url} specification are also supported,
for historical (hysterical?) reasons and because of their widespread use.

@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
host:/dir/file
@end example

@sc{http}-only syntax (introduced by @code{Netscape}):
@example
host[:port]/dir/file
@end example

These two alternative forms are deprecated, and may cease being
supported in the future.

If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like @code{Lynx} or @code{Netscape}.

@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options

Since Wget uses GNU getopt to process its arguments, every option has a
short form and a long form. Long options are easier to remember, but
take longer to type. You may freely mix different option
styles, or specify options after the command-line arguments. Thus you
may write:

@example
wget -r --tries=10 http://fly.cc.fer.hr/ -o log
@end example

The space between the option accepting an argument and the argument may
be omitted. Instead of @samp{-o log} you can write @samp{-olog}.

You may put several options that do not require arguments together,
like:

@example
wget -drc @var{URL}
@end example

This is completely equivalent to:

@example
wget -d -r -c @var{URL}
@end example

Since the options can be specified after the arguments, you may
terminate them with @samp{--}. So the following will try to download
@sc{url} @samp{-x}, reporting failure to @file{log}:

@example
wget -o log -- -x
@end example

The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc}
sets @code{exclude_directories} to @file{/cgi-bin}, the following
example will first reset it, and then set it to exclude @file{/~nobody}
and @file{/~somebody}. You can also clear the lists in @file{.wgetrc}
(@pxref{Wgetrc Syntax}).

@example
wget -X '' -X /~nobody,/~somebody
@end example

@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options

@table @samp
@item -V
@itemx --version
Display the version of Wget.

@item -h
@itemx --help
Print a help message describing all of Wget's command-line options.

@item -b
@itemx --background
Go to background immediately after startup. If no output file is
specified via @samp{-o}, output is redirected to @file{wget-log}.

@cindex execute wgetrc command
@item -e @var{command}
@itemx --execute @var{command}
Execute @var{command} as if it were a part of @file{.wgetrc}
(@pxref{Startup File}). A command thus invoked will be executed
@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
them.
@end table

@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options

@table @samp
@cindex output file
@cindex log file
@item -o @var{logfile}
@itemx --output-file=@var{logfile}
Log all messages to @var{logfile}. The messages are normally reported
to standard error.

@cindex append to log
@item -a @var{logfile}
@itemx --append-output=@var{logfile}
Append to @var{logfile}. This is the same as @samp{-o}, only it appends
to @var{logfile} instead of overwriting the old log file. If
@var{logfile} does not exist, a new file is created.

@cindex debug
@item -d
@itemx --debug
Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case @samp{-d} will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
@emph{not} print any debug info unless requested with @samp{-d}.
@xref{Reporting Bugs}, for more information on how to use @samp{-d} for
sending bug reports.

@cindex quiet
@item -q
@itemx --quiet
Turn off Wget's output.

@cindex verbose
@item -v
@itemx --verbose
Turn on verbose output, with all the available data. The default output
is verbose.

@item -nv
@itemx --non-verbose
Non-verbose output---turn off verbose without being completely quiet
(use @samp{-q} for that), which means that error messages and basic
information still get printed.

@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
Read @sc{url}s from @var{file}, in which case no @sc{url}s need to be on
the command line. If there are @sc{url}s both on the command line and
in an input file, those on the command line will be the first ones to
be retrieved. The @var{file} need not be an @sc{html} document (but no
harm if it is)---it is enough if the @sc{url}s are just listed
sequentially.

However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
relative links, which you can solve either by adding @code{<base
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.

@cindex force html
@item -F
@itemx --force-html
When input is read from a file, force it to be treated as an @sc{html}
file. This enables you to retrieve relative links from existing
@sc{html} files on your local disk, by adding @code{<base
href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line
option.
@end table

@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options

@table @samp
@cindex retries
@cindex tries
@cindex number of retries
@item -t @var{number}
@itemx --tries=@var{number}
Set number of retries to @var{number}. Specify 0 or @samp{inf} for
infinite retrying.

@item -O @var{file}
@itemx --output-document=@var{file}
The documents will not be written to the appropriate files, but all will
be concatenated together and written to @var{file}. If @var{file}
already exists, it will be overwritten. If the @var{file} is @samp{-},
the documents will be written to standard output. Including this option
automatically sets the number of tries to 1.

@cindex no-clobber
@item -nc
@itemx --no-clobber
Do not clobber existing files when saving to a directory hierarchy during
recursive retrieval of several files. This option is @emph{extremely}
useful when you wish to continue where you left off with retrieval of
many files. If the files have the @samp{.html} or (yuck) @samp{.htm}
suffix, they will be loaded from the local disk, and parsed as if they
had been retrieved from the Web.

@cindex continue retrieval
@item -c
@itemx --continue
Continue getting an existing file. This is useful when you want to
finish up the download started by another program, or a previous
instance of Wget. Thus you can write:

@example
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
@end example

If there is a file named @file{ls-lR.Z} in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.

Note that you need not specify this option if all you want is for Wget to
continue retrieving where it left off when the connection is lost---Wget
does this by default. You need this option only when you want to
continue retrieval of a file already halfway retrieved, saved by another
@sc{ftp} client, or left by Wget being killed.

Without @samp{-c}, the previous example would just begin to download the
remote file to @file{ls-lR.Z.1}. The @samp{-c} option is also
applicable for @sc{http} servers that support the @code{Range} header.
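
To make the offset rule for @samp{-c} concrete, the sketch below
computes the offset a resumed download would start from, namely the
size of the local file in bytes. The helper name @code{resume_offset}
is hypothetical, and the @code{Range: bytes=N-} line is the standard
@sc{http} byte-range form, shown here for illustration rather than
copied from Wget's output.

```shell
# Hypothetical helper (not part of Wget): print the byte offset at which
# a resumed download would start, i.e. the length of the local file.
resume_offset() {
  if [ -f "$1" ]; then
    wc -c < "$1" | tr -d ' '    # some wc implementations pad with spaces
  else
    echo 0                      # nothing on disk yet: start at the beginning
  fi
}

# Demonstration on a throwaway 5-byte file.
tmp=$(mktemp)
printf 'hello' > "$tmp"
echo "Range: bytes=$(resume_offset "$tmp")-"    # prints: Range: bytes=5-
rm -f "$tmp"
```

With @sc{ftp} the same number would be passed to the @code{REST}
command instead.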

@cindex dot style
@cindex retrieval tracing style
@item --dot-style=@var{style}
Set the retrieval style to @var{style}. Wget traces the retrieval of
each document by printing dots on the screen, each dot representing a
fixed amount of retrieved data. Any number of dots may be separated in
a @dfn{cluster}, to make counting easier. This option allows you to
choose one of the pre-defined styles, determining the number of bytes
represented by a dot, the number of dots in a cluster, and the number of
dots on the line.

With the @code{default} style each dot represents 1K, there are ten dots
in a cluster and 50 dots in a line. The @code{binary} style has a more
``computer''-like orientation---8K dots, 16-dot clusters and 48 dots
per line (which makes for 384K lines). The @code{mega} style is
suitable for downloading very large files---each dot represents 64K
retrieved, there are eight dots in a cluster, and 48 dots on each line
(so each line contains 3M). The @code{micro} style is exactly the
reverse; it is suitable for downloading small files, with 128-byte dots,
8 dots per cluster, and 48 dots (6K) per line.
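
The per-line figures quoted for each style follow from multiplying the
bytes per dot by the dots per line. The short sketch below redoes that
arithmetic; the function name @code{line_bytes} is made up for the
example.

```shell
# Hypothetical helper: number of bytes represented by one full line of dots.
# $1 = bytes per dot, $2 = dots per line
line_bytes() {
  echo $(( $1 * $2 ))
}

line_bytes 1024 50     # default: 51200   (50K)
line_bytes 8192 48     # binary:  393216  (384K)
line_bytes 65536 48    # mega:    3145728 (3M)
line_bytes 128 48      # micro:   6144    (6K)
```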

@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.

@cindex server response, print
@item -S
@itemx --server-response
Print the headers sent by @sc{http} servers and responses sent by
@sc{ftp} servers.

@cindex Wget as spider
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there. You can use it to check your bookmarks, e.g. with:

@example
wget --spider --force-html -i bookmarks.html
@end example

This feature needs much more work for Wget to get close to the
functionality of real @sc{www} spiders.

@cindex timeout
@item -T @var{seconds}
@itemx --timeout=@var{seconds}
Set the read timeout to @var{seconds} seconds. Whenever a network read
is issued, the file descriptor is checked for a timeout, which could
otherwise leave a pending connection (uninterrupted read). The default
timeout is 900 seconds (fifteen minutes). Setting timeout to 0 will
disable checking for timeouts.

Please do not lower the default timeout value with this option unless
you know what you are doing.

@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of in seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using the @code{h}
suffix, or in days using the @code{d} suffix.

Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry.
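
The suffixes accepted by @samp{-w} amount to simple multipliers. The
sketch below shows the conversion to seconds; the helper name
@code{wait_seconds} is an assumption for the example and does not
reflect how Wget parses the value internally.

```shell
# Hypothetical helper (not part of Wget): convert a -w style value
# to a number of seconds.
wait_seconds() {
  case $1 in
    *m) echo $(( ${1%m} * 60 )) ;;       # minutes
    *h) echo $(( ${1%h} * 3600 )) ;;     # hours
    *d) echo $(( ${1%d} * 86400 )) ;;    # days
    *)  echo "$1" ;;                     # plain seconds
  esac
}

wait_seconds 30    # prints: 30
wait_seconds 5m    # prints: 300
wait_seconds 2h    # prints: 7200
```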

@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use ``linear backoff'', waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify. Therefore,
a value of 10 will actually make Wget wait up to (1 + 2 + ... + 10) = 55
seconds per file.

Note that this option is turned on by default in the global
@file{wgetrc} file.
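
The linear backoff used by @samp{--waitretry} can be checked with a
small loop. In this sketch the helper name @code{backoff_total} is
hypothetical, and capping each wait at the given maximum once the
failure count exceeds it is an assumption about how the limit is
applied.

```shell
# Hypothetical helper (not part of Wget): total seconds slept for
# --waitretry=$1 when one file fails $2 times in a row.  The wait after
# failure N is N seconds, capped at $1 (the cap is an assumption).
backoff_total() (
  max=$1 failures=$2 n=1 total=0
  while [ "$n" -le "$failures" ]; do
    if [ "$n" -lt "$max" ]; then wait_s=$n; else wait_s=$max; fi
    total=$(( total + wait_s ))
    n=$(( n + 1 ))
  done
  echo "$total"
)

backoff_total 10 3     # 1 + 2 + 3 = 6
backoff_total 10 10    # 1 + 2 + ... + 10 = 55
```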

@cindex proxy
@item -Y on/off
@itemx --proxy=on/off
Turn proxy support on or off. The proxy is on by default if the
appropriate environment variable is defined.

@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).

Note that quota will never affect downloading a single file. So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of the
@file{ls-lR.gz} will be downloaded. The same goes even when several
@sc{url}s are specified on the command-line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.

Setting quota to 0 or to @samp{inf} unlimits the download quota.
@end table
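
The quota values accepted by @samp{-Q} can be illustrated with a short
conversion to bytes. The helper name @code{quota_bytes} is made up for
this sketch, and it assumes that a kilobyte is 1024 bytes and a
megabyte is 1024 kilobytes, which the text above does not state.

```shell
# Hypothetical helper (not part of Wget): convert a -Q style value to bytes.
# Assumes k = 1024 bytes and m = 1024 * 1024 bytes; 0 and inf mean no limit.
quota_bytes() {
  case $1 in
    0|inf) echo unlimited ;;
    *k)    echo $(( ${1%k} * 1024 )) ;;
    *m)    echo $(( ${1%m} * 1024 * 1024 )) ;;
    *)     echo "$1" ;;                  # plain bytes
  esac
}

quota_bytes 10k    # prints: 10240
quota_bytes 2m     # prints: 2097152
quota_bytes inf    # prints: unlimited
```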

@node Directory Options, HTTP Options, Download Options, Invoking
@section Directory Options

@table @samp
@item -nd
@itemx --no-directories
Do not create a hierarchy of directories when retrieving
recursively. With this option turned on, all files will get saved to the
current directory, without clobbering (if a name shows up more than
once, the filenames will get extensions @samp{.n}).

@item -x
@itemx --force-directories
The opposite of @samp{-nd}---create a hierarchy of directories, even if
one would not have been created otherwise. E.g. @samp{wget -x
http://fly.cc.fer.hr/robots.txt} will save the downloaded file to
@file{fly.cc.fer.hr/robots.txt}.

@item -nH
@itemx --no-host-directories
Disable generation of host-prefixed directories. By default, invoking
Wget with @samp{-r http://fly.cc.fer.hr/} will create a structure of
directories beginning with @file{fly.cc.fer.hr/}. This option disables
such behavior.

@cindex cut directories
@item --cut-dirs=@var{number}
Ignore @var{number} directory components. This is useful for getting a
fine-grained control over the directory where recursive retrieval will
be saved.

Take, for example, the directory at
@samp{ftp://ftp.xemacs.org/pub/xemacs/}. If you retrieve it with
@samp{-r}, it will be saved locally under
@file{ftp.xemacs.org/pub/xemacs/}. While the @samp{-nH} option can
remove the @file{ftp.xemacs.org/} part, you are still stuck with
@file{pub/xemacs}. This is where @samp{--cut-dirs} comes in handy; it
makes Wget not ``see'' @var{number} remote directory components. Here
are several examples of how the @samp{--cut-dirs} option works.

@example
@group
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .

--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
@end group
@end example

If you just want to get rid of the directory structure, this option is
similar to a combination of @samp{-nd} and @samp{-P}. However, unlike
@samp{-nd}, @samp{--cut-dirs} does not lose subdirectories---for
instance, with @samp{-nH --cut-dirs=1}, a @file{beta/} subdirectory will
be placed in @file{xemacs/beta}, as one would expect.
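
The mapping shown in the @samp{--cut-dirs} table can be reproduced with
a few lines of shell. This is a sketch of the rule as described here,
not Wget's own code; the helper name @code{local_dir} and its argument
order are invented for the example.

```shell
# Hypothetical helper (not part of Wget): local directory for a
# recursive retrieval.
# $1 = host, $2 = remote directory without leading or trailing slash,
# $3 = number of components to cut, $4 = "nH" to drop the host directory.
local_dir() (
  host=$1 path=$2 cut=$3 mode=${4:-}
  while [ "$cut" -gt 0 ] && [ -n "$path" ]; do
    case $path in
      */*) path=${path#*/} ;;    # drop the leading component
      *)   path= ;;              # that was the last component
    esac
    cut=$(( cut - 1 ))
  done
  # Unless -nH was given, the host name stays in front.
  [ "$mode" = nH ] || path=$host${path:+/$path}
  if [ -n "$path" ]; then echo "$path/"; else echo .; fi
)

local_dir ftp.xemacs.org pub/xemacs 0       # ftp.xemacs.org/pub/xemacs/
local_dir ftp.xemacs.org pub/xemacs 1 nH    # xemacs/
local_dir ftp.xemacs.org pub/xemacs 2 nH    # .
```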

@cindex directory prefix
@item -P @var{prefix}
@itemx --directory-prefix=@var{prefix}
Set directory prefix to @var{prefix}. The @dfn{directory prefix} is the
directory where all other files and subdirectories will be saved to,
i.e. the top of the retrieval tree. The default is @samp{.} (the
current directory).
@end table

@node HTTP Options, FTP Options, Directory Options, Invoking
@section HTTP Options

@table @samp
@cindex http user
@cindex http password
@cindex authentication
@item --http-user=@var{user}
@itemx --http-passwd=@var{password}
Specify the username @var{user} and password @var{password} on an
@sc{http} server. According to the type of the challenge, Wget will
encode them using either the @code{basic} (insecure) or the
@code{digest} authentication scheme.

Another way to specify username and password is in the @sc{url} itself
(@pxref{URL Format}). For more information about security issues with
Wget, see @ref{Security Considerations}.

@cindex proxy
@cindex cache
@item -C on/off
@itemx --cache=on/off
When set to off, disable server-side cache. In this case, Wget will
send the remote server an appropriate directive (@samp{Pragma:
no-cache}) to get the file from the remote service, rather than
returning the cached version. This is especially useful for retrieving
and flushing out-of-date documents on proxy servers.

Caching is allowed by default.

@cindex Content-Length, ignore
@cindex ignore length
@item --ignore-length
Unfortunately, some @sc{http} servers (@sc{cgi} programs, to be more
precise) send out bogus @code{Content-Length} headers, which makes Wget
go wild, as it thinks not all the document was retrieved. You can spot
this syndrome if Wget retries getting the same document again and again,
each time claiming that the (otherwise normal) connection has closed on
the very same byte.

With this option, Wget will ignore the @code{Content-Length} header---as
if it never existed.

@cindex header, add
@item --header=@var{additional-header}
Define an @var{additional-header} to be passed to the @sc{http} servers.
Headers must contain a @samp{:} preceded by one or more non-blank
characters, and must not contain newlines.

You may define more than one additional header by specifying
@samp{--header} more than once.

@example
@group
wget --header='Accept-Charset: iso-8859-2' \
     --header='Accept-Language: hr'        \
       http://fly.cc.fer.hr/
@end group
@end example

Specification of an empty string as the header value will clear all
previous user-defined headers.
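
The rule for acceptable @samp{--header} values can be written down as a
small check. The sketch below is one reading of that rule, with a
``blank'' taken to mean a space or a tab; the function name
@code{valid_header} is hypothetical and the code is not taken from
Wget. The empty string, which clears the header list, is a separate
case that this check does not model.

```shell
# Hypothetical check (not part of Wget): succeed if $1 has a ':' preceded
# by one or more non-blank characters and contains no newline.
valid_header() (
  nl='
'
  tab=$(printf '\t')
  case $1 in
    *"$nl"*) exit 1 ;;                    # embedded newline
    *:*)     ;;                           # has a colon, keep checking
    *)       exit 1 ;;                    # no colon at all
  esac
  name=${1%%:*}                           # text before the first ':'
  case $name in
    ''|*" "*|*"$tab"*) exit 1 ;;          # empty name, or blanks in it
  esac
  exit 0
)

if valid_header 'Accept-Language: hr'; then echo accepted; fi
if ! valid_header 'no colon here'; then echo rejected; fi
```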

@cindex proxy user
@cindex proxy password
@cindex proxy authentication
@item --proxy-user=@var{user}
@itemx --proxy-passwd=@var{password}
Specify the username @var{user} and password @var{password} for
authentication on a proxy server. Wget will encode them using the
@code{basic} authentication scheme.

@cindex server response, save
@item -s
@itemx --save-headers
Save the headers sent by the @sc{http} server to the file, preceding the
actual contents, with an empty line as the separator.

@cindex user-agent
@item -U @var{agent-string}
@itemx --user-agent=@var{agent-string}
Identify as @var{agent-string} to the @sc{http} server.

The @sc{http} protocol allows the clients to identify themselves using a
@code{User-Agent} header field. This enables distinguishing the
@sc{www} software, usually for statistical purposes or for tracing of
protocol violations. Wget normally identifies as
@samp{Wget/@var{version}}, @var{version} being the current version
number of Wget.

However, some sites have been known to impose the policy of tailoring
the output according to the @code{User-Agent}-supplied information.
While conceptually this is not such a bad idea, it has been abused by
servers denying information to clients other than @code{Mozilla} or
Microsoft @code{Internet Explorer}. This option allows you to change
the @code{User-Agent} line issued by Wget. Use of this option is
discouraged, unless you really know what you are doing.

@strong{NOTE} that Netscape Communications Corp. has claimed that false
transmissions of @samp{Mozilla} as the @code{User-Agent} are a copyright
infringement, which will be prosecuted. @strong{DO NOT} misrepresent
Wget as Mozilla.
@end table

@node FTP Options, Recursive Retrieval Options, HTTP Options, Invoking
@section FTP Options

@table @samp
@cindex retrieve symbolic links
@item --retr-symlinks
Retrieve symbolic links on @sc{ftp} sites as if they were plain files,
i.e. don't just create links locally.

@cindex globbing, toggle
@item -g on/off
@itemx --glob=on/off
Turn @sc{ftp} globbing on or off. Globbing means you may use the
shell-like special characters (@dfn{wildcards}), like @samp{*},
@samp{?}, @samp{[} and @samp{]} to retrieve more than one file from the
same directory at once, like:

@example
wget ftp://gnjilux.cc.fer.hr/*.msg
@end example

By default, globbing will be turned on if the @sc{url} contains a
globbing character. This option may be used to turn globbing on or off
permanently.

You may have to quote the @sc{url} to protect it from being expanded by
your shell. Globbing makes Wget look for a directory listing, which is
system-specific. This is why it currently works only with Unix @sc{ftp}
servers (and the ones emulating Unix @code{ls} output).

@cindex passive ftp
@item --passive-ftp
Use the @dfn{passive} @sc{ftp} retrieval scheme, in which the client
initiates the data connection. This is sometimes required for @sc{ftp}
to work behind firewalls.
@end table
|
|
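The advice given under @samp{--glob} about quoting can be checked
without touching the network.  The following is only a sketch:
@code{count_args} is a hypothetical stand-in for Wget that merely
reports how many arguments it received, and local files matching
@file{*.msg} stand in for whatever your shell might match.

@example
# count_args is a made-up stand-in for wget; it only counts arguments.
count_args() ( printf '%s\n' "$#" )

cd "$(mktemp -d)" || exit 1
touch a.msg b.msg                  # two local files match *.msg

unquoted=$(count_args *.msg)       # the shell expands the pattern first
quoted=$(count_args '*.msg')       # the pattern arrives untouched
echo "unquoted: $unquoted, quoted: $quoted"
@end example

The quoted form hands over a single argument, which is what Wget needs
in order to do the globbing itself on the @sc{ftp} server.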
|
|
@node Recursive Retrieval Options, Recursive Accept/Reject Options, FTP Options, Invoking
|
|
@section Recursive Retrieval Options
|
|
|
|
@table @samp
|
|
@item -r
|
|
@itemx --recursive
|
|
Turn on recursive retrieving. @xref{Recursive Retrieval} for more
|
|
details.
|
|
|
|
@item -l @var{depth}
|
|
@itemx --level=@var{depth}
|
|
Specify recursion maximum depth level @var{depth} (@xref{Recursive
|
|
Retrieval}). The default maximum depth is 5.
|
|
|
|
@cindex proxy filling
|
|
@cindex delete after retrieval
|
|
@cindex filling proxy cache
|
|
@item --delete-after
|
|
This option tells Wget to delete every single file it downloads,
|
|
@emph{after} having done so. It is useful for pre-fetching popular
|
|
pages through a proxy, e.g.:
|
|
|
|
@example
|
|
wget -r -nd --delete-after http://whatever.com/~popular/page/
|
|
@end example
|
|
|
|
The @samp{-r} option is to retrieve recursively, and @samp{-nd} not to
|
|
create directories.
|
|
|
|
@cindex conversion of links
|
|
@cindex link conversion
|
|
@item -k
|
|
@itemx --convert-links
|
|
Convert the non-relative links to relative ones locally. Only the
|
|
references to the documents actually downloaded will be converted; the
|
|
rest will be left unchanged.
|
|
|
|
Note that only at the end of the download can Wget know which links have
|
|
been downloaded. Because of that, much of the work done by @samp{-k}
|
|
will be performed at the end of the downloads.
|
|
|
|
@cindex backing up converted files
|
|
@item -K
|
|
@itemx --backup-converted
|
|
When converting a file, back up the original version with a @samp{.orig}
|
|
suffix. Affects the behavior of @samp{-N} (@xref{HTTP Time-Stamping
|
|
Internals}).
|
|
|
|
@item -m
|
|
@itemx --mirror
|
|
Turn on options suitable for mirroring. This option turns on recursion
|
|
and time-stamping, sets infinite recursion depth and keeps @sc{ftp}
|
|
directory listings. It is currently equivalent to
|
|
@samp{-r -N -l inf -nr}.
|
|
|
|
@item -nr
|
|
@itemx --dont-remove-listing
|
|
Don't remove the temporary @file{.listing} files generated by @sc{ftp}
|
|
retrievals. Normally, these files contain the raw directory listings
|
|
received from @sc{ftp} servers. Not removing them can be useful to
|
|
access the full remote file list when running a mirror, or for debugging
|
|
purposes.
|
|
@end table
|
|
|
|
@node Recursive Accept/Reject Options, , Recursive Retrieval Options, Invoking
|
|
@section Recursive Accept/Reject Options
|
|
|
|
@table @samp
|
|
@item -A @var{acclist} --accept @var{acclist}
|
|
@itemx -R @var{rejlist} --reject @var{rejlist}
|
|
Specify comma-separated lists of file name suffixes or patterns to
|
|
accept or reject (@xref{Types of Files} for more details).
|
|
|
|
@item -D @var{domain-list}
|
|
@itemx --domains=@var{domain-list}
|
|
Set domains to be accepted and @sc{dns} looked-up, where
|
|
@var{domain-list} is a comma-separated list. Note that it does
|
|
@emph{not} turn on @samp{-H}. This option speeds things up, even if
|
|
only one host is spanned (@xref{Domain Acceptance}).
|
|
|
|
@item --exclude-domains @var{domain-list}
|
|
Exclude the domains given in a comma-separated @var{domain-list} from
|
|
@sc{dns}-lookup (@xref{Domain Acceptance}).
|
|
|
|
@cindex follow FTP links
|
|
@item --follow-ftp
|
|
Follow @sc{ftp} links from @sc{html} documents. Without this option,
|
|
Wget will ignore all the @sc{ftp} links.
|
|
|
|
@cindex tag-based recursive pruning
|
|
@item --follow-tags=@var{list}
|
|
Wget has an internal table of HTML tag / attribute pairs that it
|
|
considers when looking for linked documents during a recursive
|
|
retrieval. If a user wants only a subset of those tags to be
|
|
considered, however, he or she should specify such tags in a
|
|
comma-separated @var{list} with this option.
|
|
|
|
@item -G @var{list}
|
|
@itemx --ignore-tags=@var{list}
|
|
This is the opposite of the @samp{--follow-tags} option. To skip
|
|
certain HTML tags when recursively looking for documents to download,
|
|
specify them in a comma-separated @var{list}. The author of this option
|
|
likes to use the following command to download a single HTML page and
|
|
all documents necessary to display it properly:
|
|
|
|
@example
|
|
wget -Ga,area -H -k -K -nh -r http://@var{site}/@var{document}
|
|
@end example
|
|
|
|
@item -H
|
|
@itemx --span-hosts
|
|
Enable spanning across hosts when doing recursive retrieving (@xref{All
|
|
Hosts}).
|
|
|
|
@item -L
|
|
@itemx --relative
|
|
Follow relative links only. Useful for retrieving a specific home page
|
|
without any distractions, not even those from the same hosts
|
|
(@xref{Relative Links}).
|
|
|
|
@item -I @var{list}
|
|
@itemx --include-directories=@var{list}
|
|
Specify a comma-separated list of directories you wish to follow when
|
|
downloading (@xref{Directory-Based Limits} for more details.) Elements
|
|
of @var{list} may contain wildcards.
|
|
|
|
@item -X @var{list}
|
|
@itemx --exclude-directories=@var{list}
|
|
Specify a comma-separated list of directories you wish to exclude from
|
|
download (@xref{Directory-Based Limits} for more details.) Elements of
|
|
@var{list} may contain wildcards.
|
|
|
|
@item -nh
|
|
@itemx --no-host-lookup
|
|
Disable the time-consuming @sc{dns} lookup of almost all hosts
|
|
(@xref{Host Checking}).
|
|
|
|
@item -np
|
|
@itemx --no-parent
|
|
Do not ever ascend to the parent directory when retrieving recursively.
|
|
This is a useful option, since it guarantees that only the files
|
|
@emph{below} a certain hierarchy will be downloaded.
|
|
@xref{Directory-Based Limits} for more details.
|
|
@end table
|
|
|
|
@node Recursive Retrieval, Following Links, Invoking, Top
|
|
@chapter Recursive Retrieval
|
|
@cindex recursion
|
|
@cindex retrieving
|
|
@cindex recursive retrieval
|
|
|
|
GNU Wget is capable of traversing parts of the Web (or a single
|
|
@sc{http} or @sc{ftp} server), depth-first following links and directory
|
|
structure. This is called @dfn{recursive} retrieving, or
|
|
@dfn{recursion}.
|
|
|
|
With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from
|
|
the given @sc{url}, retrieving the files the @sc{html}
|
|
document was referring to, through markups like @code{href}, or
|
|
@code{src}. If the freshly downloaded file is also of type
|
|
@code{text/html}, it will be parsed and followed further.
|
|
|
|
The maximum @dfn{depth} to which the retrieval may descend is specified
|
|
with the @samp{-l} option (the default maximum depth is five layers).
|
|
@xref{Recursive Retrieval Options}.
|
|
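The effect of the depth limit can be sketched with a toy link table in
place of real pages.  Everything here is hypothetical: the page names,
the @code{links} table and the @code{visit} function are made up for
illustration, and the sketch makes no attempt to mirror Wget's
internals (it does not even remember which pages it has already seen).

@example
# Each line of the toy table reads "page link-found-on-that-page".
links='a b
a c
b d
d e'

# visit PAGE DEPTH MAX: print PAGE, then follow its links depth-first
# for as long as DEPTH is still below MAX.
visit() (
  echo "$1"
  [ "$2" -lt "$3" ] || exit 0
  printf '%s\n' "$links" | while read -r from to; do
    if [ "$from" = "$1" ]; then visit "$to" $(($2 + 1)) "$3"; fi
  done
  exit 0
)

visit a 0 2
@end example

With a maximum depth of 2 the walk lists @samp{a}, @samp{b}, @samp{d}
and @samp{c}; page @samp{e}, three links away from the start, is never
reached.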
|
|
When retrieving an @sc{ftp} @sc{url} recursively, Wget will retrieve all
|
|
the data from the given directory tree (including the subdirectories up
|
|
to the specified depth) on the remote server, creating its mirror image
|
|
locally. @sc{ftp} retrieval is also limited by the @code{depth}
|
|
parameter.
|
|
|
|
By default, Wget will create a local directory tree, corresponding to
|
|
the one found on the remote server.
|
|
|
|
Recursive retrieving can find a number of applications, the most
|
|
important of which is mirroring. It is also useful for @sc{www}
|
|
presentations, and any other opportunities where slow network
|
|
connections should be bypassed by storing the files locally.
|
|
|
|
You should be warned that invoking recursion may cause grave overloading
|
|
on your system, because of the fast exchange of data through the
|
|
network; all of this may hamper other users' work. The same stands for
|
|
the foreign server you are mirroring---the more requests it gets in a
|
|
row, the greater its load.
|
|
|
|
Careless retrieving can also fill your file system uncontrollably, which
|
|
can grind the machine to a halt.
|
|
|
|
The load can be minimized by lowering the maximum recursion level
|
|
(@samp{-l}) and/or by lowering the number of retries (@samp{-t}). You
|
|
may also consider using the @samp{-w} option to slow down your requests
|
|
to the remote servers, as well as the numerous options to narrow the
|
|
number of followed links (@xref{Following Links}).
|
|
|
|
Recursive retrieval is a good thing when used properly. Please take all
|
|
precautions not to wreak havoc through carelessness.
|
|
|
|
@node Following Links, Time-Stamping, Recursive Retrieval, Top
|
|
@chapter Following Links
|
|
@cindex links
|
|
@cindex following links
|
|
|
|
When retrieving recursively, one does not wish to retrieve loads of
|
|
unnecessary data. Most of the time the users bear in mind exactly what
|
|
they want to download, and want Wget to follow only specific links.
|
|
|
|
For example, if you wish to download the music archive from
|
|
@samp{fly.cc.fer.hr}, you will not want to download all the home pages
|
|
that happen to be referenced by an obscure part of the archive.
|
|
|
|
Wget possesses several mechanisms that allow you to fine-tune which
|
|
links it will follow.
|
|
|
|
@menu
|
|
* Relative Links:: Follow relative links only.
|
|
* Host Checking:: Follow links on the same host.
|
|
* Domain Acceptance:: Check on a list of domains.
|
|
* All Hosts:: No host restrictions.
|
|
* Types of Files:: Getting only certain files.
|
|
* Directory-Based Limits:: Getting only certain directories.
|
|
* FTP Links:: Following FTP links.
|
|
@end menu
|
|
|
|
@node Relative Links, Host Checking, Following Links, Following Links
|
|
@section Relative Links
|
|
@cindex relative links
|
|
|
|
When only relative links are followed (option @samp{-L}), recursive
|
|
retrieving will never span hosts. No time-expensive @sc{dns}-lookups
|
|
will be performed, and the process will be very fast, with the minimum
|
|
strain on the network.  This will often suit your needs, especially when
|
|
mirroring the output of various @code{x2html} converters, since they
|
|
generally output relative links.
|
|
|
|
@node Host Checking, Domain Acceptance, Relative Links, Following Links
|
|
@section Host Checking
|
|
@cindex DNS lookup
|
|
@cindex host lookup
|
|
@cindex host checking
|
|
|
|
The drawback of following the relative links solely is that humans often
|
|
tend to mix them with absolute links to the very same host, and the very
|
|
same page. In this mode (which is the default mode for following links)
|
|
all @sc{url}s that refer to the same host will be retrieved.
|
|
|
|
The problem with this mode is the aliasing of hosts and domains.
|
|
Thus there is no way for Wget to know that @samp{regoc.srce.hr} and
|
|
@samp{www.srce.hr} are the same host, or that @samp{fly.cc.fer.hr} is
|
|
the same as @samp{fly.cc.etf.hr}. Whenever an absolute link is
|
|
encountered, the host is @sc{dns}-looked-up with @code{gethostbyname} to
|
|
check whether it might be the same host.  Although the
|
|
results of @code{gethostbyname} are cached, it is still a great
|
|
slowdown, e.g. when dealing with large indices of home pages on different
|
|
hosts (because each of the hosts must be @sc{dns}-resolved to see
|
|
whether it just @emph{might} be an alias of the starting host).
|
|
|
|
To avoid the overhead you may use @samp{-nh}, which will turn off
|
|
@sc{dns}-resolving and make Wget compare hosts literally. This will
|
|
make things run much faster, but also much less reliably
|
|
(e.g. @samp{www.srce.hr} and @samp{regoc.srce.hr} will be flagged as
|
|
different hosts).
|
|
|
|
Note that modern @sc{http} servers allow one IP address to host several
|
|
@dfn{virtual servers}, each having its own directory hierarchy. Such
|
|
``servers'' are distinguished by their hostnames (all of which point to
|
|
the same IP address); for this to work, a client must send a @code{Host}
|
|
header, which is what Wget does. However, in that case Wget @emph{must
|
|
not} try to divine a host's ``real'' address, nor try to use the same
|
|
hostname for each access, i.e. @samp{-nh} must be turned on.
|
|
|
|
In other words, the @samp{-nh} option must be used to enable the
|
|
retrieval from virtual servers distinguished by their hostnames. As the
|
|
number of such server setups grows, the behavior of @samp{-nh} may become
|
|
the default in the future.
|
|
|
|
@node Domain Acceptance, All Hosts, Host Checking, Following Links
|
|
@section Domain Acceptance
|
|
|
|
With the @samp{-D} option you may specify the domains that will be
|
|
followed. The hosts the domain of which is not in this list will not be
|
|
@sc{dns}-resolved. Thus you can specify @samp{-Dmit.edu} just to make
|
|
sure that @strong{nothing outside of @sc{mit} gets looked up}. This is
|
|
very important and useful. It also means that @samp{-D} does @emph{not}
|
|
imply @samp{-H} (span all hosts), which must be specified explicitly.
|
|
Feel free to use this option since it will speed things up, with almost
|
|
all the reliability of checking for all hosts. Thus you could invoke
|
|
|
|
@example
|
|
wget -r -D.hr http://fly.cc.fer.hr/
|
|
@end example
|
|
|
|
to make sure that only the hosts in the @samp{.hr} domain get
|
|
@sc{dns}-looked-up for being equal to @samp{fly.cc.fer.hr}. So
|
|
@samp{fly.cc.etf.hr} will be checked (only once!) and found equal, but
|
|
@samp{www.gnu.ai.mit.edu} will not even be checked.
|
|
|
|
Of course, domain acceptance can be used to limit the retrieval to
|
|
particular domains with spanning of hosts in them, but then you must
|
|
specify @samp{-H} explicitly. E.g.:
|
|
|
|
@example
|
|
wget -r -H -Dmit.edu,stanford.edu http://www.mit.edu/
|
|
@end example
|
|
|
|
will start with @samp{http://www.mit.edu/}, following links across
|
|
@sc{mit} and Stanford.
|
|
|
|
If there are domains you want to exclude specifically, you can do it
|
|
with @samp{--exclude-domains}, which accepts the same type of arguments
|
|
as @samp{-D}, but will @emph{exclude} all the listed domains.  For
|
|
example, if you want to download all the hosts from the @samp{foo.edu}
|
|
domain, with the exception of @samp{sunsite.foo.edu}, you can do it like
|
|
this:
|
|
|
|
@example
|
|
wget -rH -Dfoo.edu --exclude-domains sunsite.foo.edu http://www.foo.edu/
|
|
@end example
|
|
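The interplay of @samp{-D} and @samp{--exclude-domains} can be pictured
as two list checks.  The sketch below assumes that a host belongs to a
domain when its name simply ends with that domain; that assumption, and
the function names @code{in_domains} and @code{followed}, are
illustrative and not taken from Wget itself.

@example
# in_domains HOST LIST: succeed if HOST ends with an element of the
# comma-separated LIST (a plain suffix comparison, by assumption).
in_domains() (
  IFS=,
  set -f
  for dom in $2; do
    case $1 in *"$dom") exit 0 ;; esac
  done
  exit 1
)

# followed HOST ACCEPT EXCLUDE: the accept list admits a host, the
# exclude list vetoes it.
followed() (
  in_domains "$1" "$2" && ! in_domains "$1" "$3"
)

followed www.foo.edu     foo.edu sunsite.foo.edu && echo "www.foo.edu: follow"
followed sunsite.foo.edu foo.edu sunsite.foo.edu || echo "sunsite.foo.edu: skip"
@end example

An empty exclude list vetoes nothing, so the same function covers the
plain @samp{-D} case as well.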
|
|
@node All Hosts, Types of Files, Domain Acceptance, Following Links
|
|
@section All Hosts
|
|
@cindex all hosts
|
|
@cindex span hosts
|
|
|
|
When @samp{-H} is specified without @samp{-D}, all hosts are freely
|
|
spanned. There are no restrictions whatsoever as to what part of the
|
|
net Wget will go to fetch documents, other than maximum retrieval depth.
|
|
If a page references @samp{www.yahoo.com}, so be it. Such an option is
|
|
rarely useful by itself.
|
|
|
|
@node Types of Files, Directory-Based Limits, All Hosts, Following Links
|
|
@section Types of Files
|
|
@cindex types of files
|
|
|
|
When downloading material from the web, you will often want to restrict
|
|
the retrieval to only certain file types. For example, if you are
|
|
interested in downloading @sc{gif}s, you will not be overjoyed to get
|
|
loads of PostScript documents, and vice versa.
|
|
|
|
Wget offers two options to deal with this problem. Each option
|
|
description lists a short name, a long name, and the equivalent command
|
|
in @file{.wgetrc}.
|
|
|
|
@cindex accept wildcards
|
|
@cindex accept suffixes
|
|
@cindex wildcards, accept
|
|
@cindex suffixes, accept
|
|
@table @samp
|
|
@item -A @var{acclist}
|
|
@itemx --accept @var{acclist}
|
|
@itemx accept = @var{acclist}
|
|
The argument to @samp{--accept} option is a list of file suffixes or
|
|
patterns that Wget will download during recursive retrieval. A suffix
|
|
is the ending part of a file name, and consists of ``normal'' letters,
|
|
e.g. @samp{gif} or @samp{.jpg}. A matching pattern contains shell-like
|
|
wildcards, e.g. @samp{books*} or @samp{zelazny*196[0-9]*}.
|
|
|
|
So, specifying @samp{wget -A gif,jpg} will make Wget download only the
|
|
files ending with @samp{gif} or @samp{jpg}, i.e. @sc{gif}s and
|
|
@sc{jpeg}s. On the other hand, @samp{wget -A "zelazny*196[0-9]*"} will
|
|
download only files beginning with @samp{zelazny} and containing numbers
|
|
from 1960 to 1969 anywhere within. Look up the manual of your shell for
|
|
a description of how pattern matching works.
|
|
|
|
Of course, any number of suffixes and patterns can be combined into a
|
|
comma-separated list, and given as an argument to @samp{-A}.
|
|
|
|
@cindex reject wildcards
|
|
@cindex reject suffixes
|
|
@cindex wildcards, reject
|
|
@cindex suffixes, reject
|
|
@item -R @var{rejlist}
|
|
@itemx --reject @var{rejlist}
|
|
@itemx reject = @var{rejlist}
|
|
The @samp{--reject} option works the same way as @samp{--accept}, only
|
|
its logic is the reverse; Wget will download all files @emph{except} the
|
|
ones matching the suffixes (or patterns) in the list.
|
|
|
|
So, if you want to download a whole page except for the cumbersome
|
|
@sc{mpeg}s and @sc{.au} files, you can use @samp{wget -R mpg,mpeg,au}.
|
|
Analogously, to download all files except the ones beginning with
|
|
@samp{bjork}, use @samp{wget -R "bjork*"}. The quotes are to prevent
|
|
expansion by the shell.
|
|
@end table
|
|
|
|
The @samp{-A} and @samp{-R} options may be combined to achieve even
|
|
better fine-tuning of which files to retrieve. E.g. @samp{wget -A
|
|
"*zelazny*" -R .ps} will download all the files having @samp{zelazny} as
|
|
a part of their name, but @emph{not} the PostScript files.
|
|
|
|
Note that these two options do not affect the downloading of @sc{html}
|
|
files; Wget must load all the @sc{html}s to know where to go at
|
|
all---recursive retrieval would make no sense otherwise.
|
|
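The suffix and pattern rules described in this section can be tried out
with the shell's own pattern matching.  This is only a sketch of the
rules as stated here: the functions @code{listed} and @code{wanted} are
hypothetical names, and the special treatment of @sc{html} files
mentioned above is left out.

@example
# listed NAME LIST: succeed if NAME matches an element of the
# comma-separated LIST.  An element with a wildcard is used as a
# pattern; any other element is compared against the end of NAME.
listed() (
  IFS=,
  set -f                      # do not expand patterns against local files
  for elem in $2; do
    case $elem in
      *\**|*\?*|*\[*) case $1 in $elem) exit 0 ;; esac ;;
      *)              case $1 in *"$elem") exit 0 ;; esac ;;
    esac
  done
  exit 1
)

# wanted NAME ACCLIST REJLIST: combine the two lists; an empty list
# imposes no restriction.
wanted() (
  if [ -n "$2" ] && ! listed "$1" "$2"; then exit 1; fi
  if [ -n "$3" ] && listed "$1" "$3"; then exit 1; fi
  exit 0
)

wanted zelazny-lord.txt '*zelazny*' .ps && echo "txt: download"
wanted zelazny-lord.ps  '*zelazny*' .ps || echo "ps: skip"
@end example

The patterns are quoted in the calls for the same reason they are
quoted on the Wget command line: to keep the invoking shell from
expanding them.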
|
|
@node Directory-Based Limits, FTP Links, Types of Files, Following Links
|
|
@section Directory-Based Limits
|
|
@cindex directories
|
|
@cindex directory limits
|
|
|
|
Regardless of other link-following facilities, it is often useful to
|
|
place the restriction of what files to retrieve based on the directories
|
|
those files are placed in. There can be many reasons for this---the
|
|
home pages may be organized in a reasonable directory structure; or some
|
|
directories may contain useless information, e.g. @file{/cgi-bin} or
|
|
@file{/dev} directories.
|
|
|
|
Wget offers three different options to deal with this requirement. Each
|
|
option description lists a short name, a long name, and the equivalent
|
|
command in @file{.wgetrc}.
|
|
|
|
@cindex directories, include
|
|
@cindex include directories
|
|
@cindex accept directories
|
|
@table @samp
|
|
@item -I @var{list}
|
|
@itemx --include @var{list}
|
|
@itemx include_directories = @var{list}
|
|
The @samp{-I} option accepts a comma-separated list of directories included
|
|
in the retrieval. Any other directories will simply be ignored. The
|
|
directories are absolute paths.
|
|
|
|
So, if you wish to download from @samp{http://host/people/bozo/}
|
|
following only links to bozo's colleagues in the @file{/people}
|
|
directory and the bogus scripts in @file{/cgi-bin}, you can specify:
|
|
|
|
@example
|
|
wget -I /people,/cgi-bin http://host/people/bozo/
|
|
@end example
|
|
|
|
@cindex directories, exclude
|
|
@cindex exclude directories
|
|
@cindex reject directories
|
|
@item -X @var{list}
|
|
@itemx --exclude @var{list}
|
|
@itemx exclude_directories = @var{list}
|
|
@samp{-X} option is exactly the reverse of @samp{-I}---this is a list of
|
|
directories @emph{excluded} from the download. E.g. if you do not want
|
|
Wget to download things from the @file{/cgi-bin} directory, specify @samp{-X
|
|
/cgi-bin} on the command line.
|
|
|
|
The same as with @samp{-A}/@samp{-R}, these two options can be combined
|
|
to get a better fine-tuning of downloading subdirectories. E.g. if you
|
|
want to load all the files from the @file{/pub} hierarchy except for
|
|
@file{/pub/worthless}, specify @samp{-I/pub -X/pub/worthless}.
|
|
|
|
@cindex no parent
|
|
@item -np
|
|
@itemx --no-parent
|
|
@itemx no_parent = on
|
|
The simplest, and often very useful way of limiting directories is
|
|
disallowing retrieval of the links that refer to the hierarchy
|
|
@dfn{above} the beginning directory, i.e. disallowing ascent to the
|
|
parent directory/directories.
|
|
|
|
The @samp{--no-parent} option (short @samp{-np}) is useful in this case.
|
|
Using it guarantees that you will never leave the existing hierarchy.
|
|
Supposing you issue Wget with:
|
|
|
|
@example
|
|
wget -r --no-parent http://somehost/~luzer/my-archive/
|
|
@end example
|
|
|
|
You may rest assured that none of the references to
|
|
@file{/~his-girls-homepage/} or @file{/~luzer/all-my-mpegs/} will be
|
|
followed. Only the archive you are interested in will be downloaded.
|
|
Essentially, @samp{--no-parent} is similar to
|
|
@samp{-I/~luzer/my-archive}, only it handles redirections in a more
|
|
intelligent fashion.
|
|
@end table
|
|
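The way the include and exclude lists combine can be sketched as a
pair of prefix checks on absolute paths.  The names @code{under},
@code{in_list} and @code{allowed} are hypothetical, list elements are
compared literally (wildcards are not handled), and redirections are
ignored, so this is an illustration of the rules above rather than a
description of Wget's code.

@example
# under DIR PATH: succeed if PATH is DIR itself or lies below it.
under() (
  case $2/ in "$1"/*) exit 0 ;; esac
  exit 1
)

# in_list PATH LIST: succeed if PATH is under any directory in the
# comma-separated LIST.
in_list() (
  IFS=,
  set -f
  for dir in $2; do
    under "$dir" "$1" && exit 0
  done
  exit 1
)

# allowed PATH INCLUDE EXCLUDE: an empty list imposes no restriction.
allowed() (
  if [ -n "$2" ] && ! in_list "$1" "$2"; then exit 1; fi
  if [ -n "$3" ] && in_list "$1" "$3"; then exit 1; fi
  exit 0
)

allowed /pub/gnu/wget.tar.gz /pub /pub/worthless && echo "gnu: download"
allowed /pub/worthless/junk  /pub /pub/worthless || echo "worthless: skip"
# Limiting the include list to the starting directory resembles -np:
allowed /~luzer/all-my-mpegs/a.mpg /~luzer/my-archive '' || echo "mpegs: skip"
@end example

Note that @code{under} compares whole path components, so
@file{/public} is not treated as lying below @file{/pub}.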
|
|
@node FTP Links, , Directory-Based Limits, Following Links
|
|
@section Following FTP Links
|
|
@cindex following ftp links
|
|
|
|
The rules for @sc{ftp} are somewhat specific, as it is necessary for
|
|
them to be. @sc{ftp} links in @sc{html} documents are often included
|
|
for purposes of reference, and it is often inconvenient to download them
|
|
by default.
|
|
|
|
To have @sc{ftp} links followed from @sc{html} documents, you need to
|
|
specify the @samp{--follow-ftp} option. Having done that, @sc{ftp}
|
|
links will span hosts regardless of @samp{-H} setting. This is logical,
|
|
as @sc{ftp} links rarely point to the same host where the @sc{http}
|
|
server resides.  For similar reasons, the @samp{-L} option has no
|
|
effect on such downloads. On the other hand, domain acceptance
|
|
(@samp{-D}) and suffix rules (@samp{-A} and @samp{-R}) apply normally.
|
|
|
|
Also note that followed links to @sc{ftp} directories will not be
|
|
retrieved recursively further.
|
|
|
|
@node Time-Stamping, Startup File, Following Links, Top
|
|
@chapter Time-Stamping
|
|
@cindex time-stamping
|
|
@cindex timestamping
|
|
@cindex updating the archives
|
|
@cindex incremental updating
|
|
|
|
One of the most important aspects of mirroring information from the
|
|
Internet is updating your archives.
|
|
|
|
Downloading the whole archive again and again, just to replace a few
|
|
changed files is expensive, both in terms of wasted bandwidth and money,
|
|
and the time to do the update. This is why all the mirroring tools
|
|
offer the option of incremental updating.
|
|
|
|
Such an updating mechanism means that the remote server is scanned in
|
|
search of @dfn{new} files. Only those new files will be downloaded in
|
|
the place of the old ones.
|
|
|
|
A file is considered new if one of these two conditions is met:
|
|
|
|
@enumerate
|
|
@item
|
|
A file of that name does not already exist locally.
|
|
|
|
@item
|
|
A file of that name does exist, but the remote file was modified more
|
|
recently than the local file.
|
|
@end enumerate
|
|
|
|
To implement this, the program needs to be aware of the time of last
|
|
modification of both remote and local files.  These values are
|
|
called the @dfn{time-stamps}.
|
|
|
|
Time-stamping in GNU Wget is turned on using the @samp{--timestamping}
|
|
(@samp{-N}) option, or through the @code{timestamping = on} directive in
|
|
@file{.wgetrc}. With this option, for each file it intends to download,
|
|
Wget will check whether a local file of the same name exists. If it
|
|
does, and the remote file is older, Wget will not download it.
|
|
|
|
If the local file does not exist, or the sizes of the files do not
|
|
match, Wget will download the remote file no matter what the time-stamps
|
|
say.
|
|
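The decision described above fits in a few lines.  The function name
@code{needs_download} and its argument layout are hypothetical; the
times are taken to be seconds since the epoch and the sizes to be byte
counts, which is an assumption made only to keep the comparison simple.

@example
# needs_download EXISTS LOCAL_MTIME LOCAL_SIZE REMOTE_MTIME REMOTE_SIZE
needs_download() (
  [ "$1" = yes ] || exit 0       # no local copy: fetch
  [ "$3" -eq "$5" ] || exit 0    # sizes differ: fetch whatever the stamps say
  [ "$4" -gt "$2" ]              # otherwise fetch only a newer remote file
)

needs_download yes 950000000 4096 940000000 4096 || echo "up to date"
needs_download yes 950000000 4096 940000000 5000 && echo "size changed"
@end example

In the first call the remote file is older and of equal size, so
nothing is fetched; in the second the size mismatch alone forces a
download.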
|
|
@menu
|
|
* Time-Stamping Usage::
|
|
* HTTP Time-Stamping Internals::
|
|
* FTP Time-Stamping Internals::
|
|
@end menu
|
|
|
|
@node Time-Stamping Usage, HTTP Time-Stamping Internals, Time-Stamping, Time-Stamping
|
|
@section Time-Stamping Usage
|
|
@cindex time-stamping usage
|
|
@cindex usage, time-stamping
|
|
|
|
The usage of time-stamping is simple. Say you would like to download a
|
|
file so that it keeps its date of modification.
|
|
|
|
@example
|
|
wget -S http://www.gnu.ai.mit.edu/
|
|
@end example
|
|
|
|
A simple @code{ls -l} shows that the time stamp on the local file equals
|
|
the state of the @code{Last-Modified} header, as returned by the server.
|
|
As you can see, the time-stamping info is preserved locally, even
|
|
without @samp{-N}.
|
|
|
|
Several days later, you would like Wget to check if the remote file has
|
|
changed, and download it if it has.
|
|
|
|
@example
|
|
wget -N http://www.gnu.ai.mit.edu/
|
|
@end example
|
|
|
|
Wget will ask the server for the last-modified date. If the local file
|
|
is newer, the remote file will not be re-fetched. However, if the remote
|
|
file is more recent, Wget will proceed fetching it normally.
|
|
|
|
The same goes for @sc{ftp}. For example:
|
|
|
|
@example
|
|
wget ftp://ftp.ifi.uio.no/pub/emacs/gnus/*
|
|
@end example
|
|
|
|
@code{ls} will show that the timestamps are set according to the state
|
|
on the remote server. Reissuing the command with @samp{-N} will make
|
|
Wget re-fetch @emph{only} the files that have been modified.
|
|
|
|
In both @sc{http} and @sc{ftp} retrieval Wget will time-stamp the local
|
|
file correctly (with or without @samp{-N}) if it gets the stamps,
|
|
i.e. gets the directory listing for @sc{ftp} or the @code{Last-Modified}
|
|
header for @sc{http}.
|
|
|
|
If you wished to mirror the GNU archive every week, you would use the
|
|
following command every week:
|
|
|
|
@example
|
|
wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/
|
|
@end example
|
|
|
|
@node HTTP Time-Stamping Internals, FTP Time-Stamping Internals, Time-Stamping Usage, Time-Stamping
|
|
@section HTTP Time-Stamping Internals
|
|
@cindex http time-stamping
|
|
|
|
Time-stamping in @sc{http} is implemented by checking the
|
|
@code{Last-Modified} header. If you wish to retrieve the file
|
|
@file{foo.html} through @sc{http}, Wget will check whether
|
|
@file{foo.html} exists locally. If it doesn't, @file{foo.html} will be
|
|
retrieved unconditionally.
|
|
|
|
If the file does exist locally, Wget will first check its local
|
|
time-stamp (similar to the way @code{ls -l} checks it), and then send a
|
|
@code{HEAD} request to the remote server, demanding the information on
|
|
the remote file.
|
|
|
|
The @code{Last-Modified} header is examined to find which file was
|
|
modified more recently (which makes it ``newer''). If the remote file
|
|
is newer, it will be downloaded; if it is older, Wget will give
|
|
up.@footnote{As an additional check, Wget will look at the
|
|
@code{Content-Length} header, and compare the sizes; if they are not the
|
|
same, the remote file will be downloaded no matter what the time-stamp
|
|
says.}
|
|
|
|
When @samp{--backup-converted} (@samp{-K}) is specified in conjunction
|
|
with @samp{-N}, server file @samp{@var{X}} is compared to local file
|
|
@samp{@var{X}.orig}, if extant, rather than being compared to local file
|
|
@samp{@var{X}}, which will always differ if it's been converted by
|
|
@samp{--convert-links} (@samp{-k}).
|
|
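The choice of which local file to compare against can be sketched as
follows.  The function @code{comparison_file} is a hypothetical helper
written for this illustration, and the file names are made up.

@example
# comparison_file NAME: print the local file that would be compared
# with the server's copy of NAME when -K and -N are used together.
comparison_file() (
  if [ -f "$1.orig" ]; then printf '%s\n' "$1.orig"
  else printf '%s\n' "$1"; fi
)

dir=$(mktemp -d)
touch "$dir/index.html" "$dir/index.html.orig" "$dir/plain.html"
comparison_file "$dir/index.html"    # the .orig backup is preferred
comparison_file "$dir/plain.html"    # no backup, so the file itself
@end example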
|
|
Arguably, @sc{http} time-stamping should be implemented using the
|
|
@code{If-Modified-Since} request.
|
|
|
|
@node FTP Time-Stamping Internals, , HTTP Time-Stamping Internals, Time-Stamping
|
|
@section FTP Time-Stamping Internals
|
|
@cindex ftp time-stamping
|
|
|
|
In theory, @sc{ftp} time-stamping works much the same as @sc{http}, only
|
|
@sc{ftp} has no headers---time-stamps must be received from the
|
|
directory listings.
|
|
|
|
For each directory files must be retrieved from, Wget will use the
|
|
@code{LIST} command to get the listing. It will try to analyze the
|
|
listing, assuming that it is a Unix @code{ls -l} listing, and extract
|
|
the time-stamps. The rest is exactly the same as for @sc{http}.
|
|
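To give an idea of what extracting a time-stamp from a listing
involves, here is a sketch that splits one line on white space.  The
sample line is made up, and the field positions are those of a typical
Unix @code{ls -l} line; real listings vary, so treat the positions as
an assumption.

@example
# A made-up line in the style of Unix ls -l output.
line='-rw-r--r--   1 ftp  ftp   1234 Feb 14  2000 wget.tar.gz'

set -f           # no file name expansion while splitting the line
set -- $line     # split on white space into $1, $2, ...
size=$5
stamp="$6 $7 $8"
name=$9
set +f
echo "$name: $size bytes, last modified $stamp"
@end example

A file name containing spaces would need more careful handling, and in
many listings the eighth field holds a time of day instead of a year
for recently modified files.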
|
|
The assumption that every directory listing is a Unix-style listing may
|
|
sound extremely constraining, but in practice it is not, as many
|
|
non-Unix @sc{ftp} servers use the Unixoid listing format because most
|
|
(all?) of the clients understand it. Bear in mind that @sc{rfc959}
|
|
defines no standard way to get a file list, let alone the time-stamps.
|
|
We can only hope that a future standard will define this.
|
|
|
|
Another non-standard solution is the use of the @code{MDTM} command
|
|
that is supported by some @sc{ftp} servers (including the popular
|
|
@code{wu-ftpd}), which returns the exact time of the specified file.
|
|
Wget may support this command in the future.
|
|
|
|
@node Startup File, Examples, Time-Stamping, Top
|
|
@chapter Startup File
|
|
@cindex startup file
|
|
@cindex wgetrc
|
|
@cindex .wgetrc
|
|
@cindex startup
|
|
@cindex .netrc
|
|
|
|
Once you know how to change default settings of Wget through command
|
|
line arguments, you may wish to make some of those settings permanent.
|
|
You can do that in a convenient way by creating the Wget startup
|
|
file---@file{.wgetrc}.
|
|
|
|
Although @file{.wgetrc} is the ``main'' initialization file, it is
|
|
convenient to have a special facility for storing passwords. Thus Wget
|
|
reads and interprets the contents of @file{$HOME/.netrc}, if it finds
|
|
it. You can find @file{.netrc} format in your system manuals.
|
|
|
|
Wget reads @file{.wgetrc} upon startup, recognizing a limited set of
|
|
commands.
|
|
|
|
@menu
|
|
* Wgetrc Location:: Location of various wgetrc files.
|
|
* Wgetrc Syntax:: Syntax of wgetrc.
|
|
* Wgetrc Commands:: List of available commands.
|
|
* Sample Wgetrc:: A wgetrc example.
|
|
@end menu
|
|
|
|
@node Wgetrc Location, Wgetrc Syntax, Startup File, Startup File
|
|
@section Wgetrc Location
|
|
@cindex wgetrc location
|
|
@cindex location of wgetrc
|
|
|
|
When initializing, Wget will look for a @dfn{global} startup file,
|
|
@file{/usr/local/etc/wgetrc} by default (or some prefix other than
|
|
@file{/usr/local}, if Wget was not installed there) and read commands
|
|
from there, if it exists.
|
|
|
|
Then it will look for the user's file.  If the environment variable
|
|
@code{WGETRC} is set, Wget will try to load that file. Failing that, no
|
|
further attempts will be made.
|
|
|
|
If @code{WGETRC} is not set, Wget will try to load @file{$HOME/.wgetrc}.
|
|
|
|
The fact that the user's settings are loaded after the system-wide ones
|
|
means that in case of collision the user's wgetrc @emph{overrides} the
|
|
system-wide wgetrc (in @file{/usr/local/etc/wgetrc} by default).
|
|
Fascist admins, away!
|
|
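The lookup order can be summarized in a short sketch.  The function
@code{startup_files} is hypothetical; it only prints the names of the
files in the order they would be read and does not check whether they
exist.

@example
# startup_files GLOBAL: list the startup files in reading order.
# The text says "if WGETRC is set"; this sketch treats an empty value
# as unset, which is an assumption.
startup_files() (
  printf '%s\n' "$1"
  if [ -n "$WGETRC" ]; then printf '%s\n' "$WGETRC"
  else printf '%s\n' "$HOME/.wgetrc"; fi
)

startup_files /usr/local/etc/wgetrc
@end example

Because the user's file comes last in this order, its settings are the
ones that remain in effect when the same command appears in both
files.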
|
|
@node Wgetrc Syntax, Wgetrc Commands, Wgetrc Location, Startup File
|
|
@section Wgetrc Syntax
|
|
@cindex wgetrc syntax
|
|
@cindex syntax of wgetrc
|
|
|
|
The syntax of a wgetrc command is simple:
|
|
|
|
@example
|
|
variable = value
|
|
@end example
|
|
|
|
The @dfn{variable} will also be called @dfn{command}. Valid
|
|
@dfn{values} are different for different commands.
|
|
|
|
The commands are case-insensitive and underscore-insensitive. Thus
|
|
@samp{DIr__PrefiX} is the same as @samp{dirprefix}. Empty lines, lines
|
|
beginning with @samp{#} and lines containing white-space only are
|
|
discarded.
|
|
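The two rules above (names compared without regard to case or
underscores, and certain lines discarded) can be sketched with standard
tools.  The function names @code{canonical} and @code{ignored} are
hypothetical, and the sketch takes ``beginning with @samp{#}''
literally, without allowing for leading white space.

@example
# canonical NAME: fold case and drop underscores, so that every
# spelling of a command compares equal.
canonical() ( printf '%s\n' "$1" | tr -d '_' | tr '[:upper:]' '[:lower:]' )

# ignored LINE: succeed for comment, empty and white-space-only lines.
ignored() (
  case $1 in '#'*) exit 0 ;; esac
  [ -z "$(printf '%s' "$1" | tr -d ' \t')" ]
)

canonical DIr__PrefiX
canonical dir_prefix
ignored '# a comment' && echo "comment skipped"
@end example

Both calls to @code{canonical} print @samp{dirprefix}.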
|
|
Commands that expect a comma-separated list will clear the list on an
|
|
empty command. So, if you wish to reset the rejection list specified in
|
|
global @file{wgetrc}, you can do it with:
|
|
|
|
@example
|
|
reject =
|
|
@end example
|
|
|
|
@node Wgetrc Commands, Sample Wgetrc, Wgetrc Syntax, Startup File
|
|
@section Wgetrc Commands
|
|
@cindex wgetrc commands
|
|
|
|
The complete set of commands is listed below, the letter after @samp{=}
|
|
denoting the value the command takes. It is @samp{on/off} for @samp{on}
|
|
or @samp{off} (which can also be @samp{1} or @samp{0}), @var{string} for
|
|
any non-empty string or @var{n} for a positive integer. For example,
|
|
you may specify @samp{use_proxy = off} to disable use of proxy servers
|
|
by default. You may use @samp{inf} for infinite values, where
|
|
appropriate.
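
As an illustration of the value types, a fragment might look like the
sketch below. The particular values are assumptions chosen for the
example, not recommended settings.

@example
# on/off value; 1 and 0 are accepted as well.
use_proxy = off
verbose = 1
# Positive integer, or inf where an infinite value makes sense.
tries = inf
reclevel = 5
# String value.
logfile = wget.log
@end example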
|
|
|
|
Most of the commands have their equivalent command-line option
|
|
(@pxref{Invoking}), except some more obscure or rarely used ones.
|
|
|
|
@table @asis
|
|
@item accept/reject = @var{string}
|
|
Same as @samp{-A}/@samp{-R} (@pxref{Types of Files}).
|
|
|
|
@item add_hostdir = on/off
|
|
Enable/disable host-prefixed file names. @samp{-nH} disables it.
|
|
|
|
@item continue = on/off
|
|
Enable/disable continuation of the retrieval, the same as @samp{-c}
|
|
(which enables it).
|
|
|
|
@item background = on/off
|
|
Enable/disable going to background, the same as @samp{-b} (which enables
|
|
it).
|
|
|
|
@item backup_converted = on/off
|
|
Enable/disable saving pre-converted files with the suffix @samp{.orig}
|
|
-- the same as @samp{-K} (which enables it).
|
|
|
|
@c @item backups = @var{number}
|
|
@c #### Document me!
|
|
@item base = @var{string}
|
|
Set base for relative @sc{url}s, the same as @samp{-B}.
|
|
|
|
@item cache = on/off
|
|
When set to off, disallow server-caching. See the @samp{-C} option.
|
|
|
|
@item convert_links = on/off
|
|
Convert non-relative links locally. The same as @samp{-k}.
|
|
|
|
@item cut_dirs = @var{n}
|
|
Ignore @var{n} remote directory components.
|
|
|
|
@item debug = on/off
|
|
Debug mode, same as @samp{-d}.
|
|
|
|
@item delete_after = on/off
|
|
Delete after download, the same as @samp{--delete-after}.
|
|
|
|
@item dir_prefix = @var{string}
|
|
Top of directory tree, the same as @samp{-P}.
|
|
|
|
@item dirstruct = on/off
|
|
Turning dirstruct on or off, the same as @samp{-x} or @samp{-nd},
|
|
respectively.
|
|
|
|
@item domains = @var{string}
|
|
Same as @samp{-D} (@pxref{Domain Acceptance}).
|
|
|
|
@item dot_bytes = @var{n}
|
|
Specify the number of bytes ``contained'' in a dot, as seen throughout
|
|
the retrieval (1024 by default). You can postfix the value with
|
|
@samp{k} or @samp{m}, representing kilobytes and megabytes,
|
|
respectively. With dot settings you can tailor the dot retrieval to
|
|
suit your needs, or you can use the predefined @dfn{styles}
|
|
(@pxref{Download Options}).
|
|
|
|
@item dots_in_line = @var{n}
|
|
Specify the number of dots that will be printed in each line throughout
|
|
the retrieval (50 by default).
|
|
|
|
@item dot_spacing = @var{n}
|
|
Specify the number of dots in a single cluster (10 by default).
|
|
|
|
@item dot_style = @var{string}
|
|
Specify the dot retrieval @dfn{style}, as with @samp{--dot-style}.
|
|
|
|
@item exclude_directories = @var{string}
|
|
Specify a comma-separated list of directories you wish to exclude from
|
|
download, the same as @samp{-X} (@pxref{Directory-Based Limits}).
|
|
|
|
@item exclude_domains = @var{string}
|
|
Same as @samp{--exclude-domains} (@pxref{Domain Acceptance}).
|
|
|
|
@item follow_ftp = on/off
|
|
Follow @sc{ftp} links from @sc{html} documents, the same as @samp{-f}.
|
|
|
|
@item follow_tags = @var{string}
|
|
Only follow certain HTML tags when doing a recursive retrieval, just like
|
|
@samp{--follow-tags}.
|
|
|
|
@item force_html = on/off
|
|
If set to on, force the input filename to be regarded as an @sc{html}
|
|
document, the same as @samp{-F}.
|
|
|
|
@item ftp_proxy = @var{string}
|
|
Use @var{string} as @sc{ftp} proxy, instead of the one specified in
|
|
environment.
|
|
|
|
@item glob = on/off
|
|
Turn globbing on/off, the same as @samp{-g}.
|
|
|
|
@item header = @var{string}
|
|
Define an additional header, like @samp{--header}.
|
|
|
|
@item http_passwd = @var{string}
|
|
Set @sc{http} password.
|
|
|
|
@item http_proxy = @var{string}
|
|
Use @var{string} as @sc{http} proxy, instead of the one specified in
|
|
environment.
|
|
|
|
@item http_user = @var{string}
|
|
Set @sc{http} user to @var{string}.
|
|
|
|
@item ignore_length = on/off
|
|
When set to on, ignore @code{Content-Length} header; the same as
|
|
@samp{--ignore-length}.
|
|
|
|
@item ignore_tags = @var{string}
|
|
Ignore certain HTML tags when doing a recursive retrieval, just like
|
|
@samp{-G} / @samp{--ignore-tags}.
|
|
|
|
@item include_directories = @var{string}
|
|
Specify a comma-separated list of directories you wish to follow when
|
|
downloading, the same as @samp{-I}.
|
|
|
|
@item input = @var{string}
|
|
Read the @sc{url}s from @var{string}, like @samp{-i}.
|
|
|
|
@item kill_longer = on/off
|
|
Consider data longer than specified in content-length header
|
|
as invalid (and retry getting it). The default behaviour is to save
|
|
as much data as there is, provided there is more than or equal
|
|
to the value in @code{Content-Length}.
|
|
|
|
@item logfile = @var{string}
|
|
Set logfile, the same as @samp{-o}.
|
|
|
|
@item login = @var{string}
|
|
Your user name on the remote machine, for @sc{ftp}. Defaults to
|
|
@samp{anonymous}.
|
|
|
|
@item mirror = on/off
|
|
Turn mirroring on/off. The same as @samp{-m}.
|
|
|
|
@item netrc = on/off
|
|
Turn reading netrc on or off.
|
|
|
|
@item noclobber = on/off
|
|
Same as @samp{-nc}.
|
|
|
|
@item no_parent = on/off
|
|
Disallow retrieving outside the directory hierarchy, like
|
|
@samp{--no-parent} (@pxref{Directory-Based Limits}).
|
|
|
|
@item no_proxy = @var{string}
|
|
Use @var{string} as the comma-separated list of domains to avoid in
|
|
proxy loading, instead of the one specified in environment.
|
|
|
|
@item output_document = @var{string}
|
|
Set the output filename, the same as @samp{-O}.
|
|
|
|
@item passive_ftp = on/off
|
|
Set passive @sc{ftp}, the same as @samp{--passive-ftp}.
|
|
|
|
@item passwd = @var{string}
|
|
Set your @sc{ftp} password to @var{string}. Without this setting, the
|
|
password defaults to @samp{username@@hostname.domainname}.
|
|
|
|
@item proxy_user = @var{string}
|
|
Set proxy authentication user name to @var{string}, like
|
|
@samp{--proxy-user}.
|
|
|
|
@item proxy_passwd = @var{string}
|
|
Set proxy authentication password to @var{string}, like
|
|
@samp{--proxy-passwd}.
|
|
|
|
@item quiet = on/off
|
|
Quiet mode, the same as @samp{-q}.
|
|
|
|
@item quota = @var{quota}
|
|
Specify the download quota, which is useful to put in the global
|
|
@file{wgetrc}. When download quota is specified, Wget will stop retrieving
|
|
after the download sum has become greater than quota. The quota can be
|
|
specified in bytes (default), kbytes (@samp{k} appended) or mbytes
|
|
(@samp{m} appended). Thus @samp{quota = 5m} will set the quota to 5
|
|
mbytes. Note that the user's startup file overrides system settings.
|
|
|
|
@item reclevel = @var{n}
|
|
Recursion level, the same as @samp{-l}.
|
|
|
|
@item recursive = on/off
|
|
Recursive on/off, the same as @samp{-r}.
|
|
|
|
@item relative_only = on/off
|
|
Follow only relative links, the same as @samp{-L} (@pxref{Relative
|
|
Links}).
|
|
|
|
@item remove_listing = on/off
|
|
If set to on, remove @sc{ftp} listings downloaded by Wget. Setting it
|
|
to off is the same as @samp{-nr}.
|
|
|
|
@item retr_symlinks = on/off
|
|
When set to on, retrieve symbolic links as if they were plain files; the
|
|
same as @samp{--retr-symlinks}.
|
|
|
|
@item robots = on/off
|
|
Use (or not) @file{/robots.txt} file (@pxref{Robots}). Be sure to know
|
|
what you are doing before changing the default (which is @samp{on}).
|
|
|
|
@item server_response = on/off
|
|
Choose whether or not to print the @sc{http} and @sc{ftp} server
|
|
responses, the same as @samp{-S}.
|
|
|
|
@item simple_host_check = on/off
|
|
Same as @samp{-nh} (@pxref{Host Checking}).
|
|
|
|
@item span_hosts = on/off
|
|
Same as @samp{-H}.
|
|
|
|
@item timeout = @var{n}
|
|
Set timeout value, the same as @samp{-T}.
|
|
|
|
@item timestamping = on/off
|
|
Turn timestamping on/off. The same as @samp{-N} (@pxref{Time-Stamping}).
|
|
|
|
@item tries = @var{n}
|
|
Set number of retries per @sc{url}, the same as @samp{-t}.
|
|
|
|
@item use_proxy = on/off
|
|
Turn proxy support on/off. The same as @samp{-Y}.
|
|
|
|
@item verbose = on/off
|
|
Turn verbose on/off, the same as @samp{-v}/@samp{-nv}.
|
|
|
|
@item wait = @var{n}
|
|
Wait @var{n} seconds between retrievals, the same as @samp{-w}.
|
|
|
|
@item waitretry = @var{n}
|
|
Wait up to @var{n} seconds between retries of failed retrievals only --
|
|
the same as @samp{--waitretry}. Note that this is turned on by default
|
|
in the global @file{wgetrc}.
|
|
@end table
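The fragment below is a sketch that combines a few of the commands
listed above. The values are illustrative assumptions, except
@samp{quota = 5m}, which is the example used in the description of
@code{quota}.

@example
# Stop retrieving once more than 5 mbytes have been downloaded.
quota = 5m
# Each dot stands for 8 kbytes, with 10 dots per cluster and 50 dots
# per line, so one full line accounts for 400 kbytes.
dot_bytes = 8k
dot_spacing = 10
dots_in_line = 50
# Wait up to 10 seconds between retries of failed retrievals.
waitretry = 10
# Clear any rejection list inherited from the global wgetrc.
reject =
@end example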
|
|
|
|
@node Sample Wgetrc, , Wgetrc Commands, Startup File
|
|
@section Sample Wgetrc
|
|
@cindex sample wgetrc
|
|
|
|
This is the sample initialization file, as given in the distribution.
|
|
It is divided into two sections---one for global usage (suitable for global
|
|
startup file), and one for local usage (suitable for
|
|
@file{$HOME/.wgetrc}). Be careful about the things you change.
|
|
|
|
Note that almost all the lines are commented out. For a command to have
|
|
any effect, you must remove the @samp{#} character at the beginning of
|
|
its line.
|
|
|
|
@example
|
|
@include sample.wgetrc.munged_for_texi_inclusion
|
|
@end example
|
|
|
|
@node Examples, Various, Startup File, Top
|
|
@chapter Examples
|
|
@cindex examples
|
|
|
|
The examples are classified into three sections, for clarity.
|
|
The first section is a tutorial for beginners. The second section
|
|
explains some of the more complex program features. The third section
|
|
contains advice for mirror administrators, as well as even more complex
|
|
features (that some would call perverted).
|
|
|
|
@menu
|
|
* Simple Usage:: Simple, basic usage of the program.
|
|
* Advanced Usage:: Advanced techniques of usage.
|
|
* Guru Usage:: Mirroring and the hairy stuff.
|
|
@end menu
|
|
|
|
@node Simple Usage, Advanced Usage, Examples, Examples
|
|
@section Simple Usage
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Say you want to download a @sc{url}. Just type:
|
|
|
|
@example
|
|
wget http://fly.cc.fer.hr/
|
|
@end example
|
|
|
|
The response will be something like:
|
|
|
|
@example
|
|
@group
|
|
--13:30:45-- http://fly.cc.fer.hr:80/en/
|
|
=> `index.html'
|
|
Connecting to fly.cc.fer.hr:80... connected!
|
|
HTTP request sent, awaiting response... 200 OK
|
|
Length: 4,694 [text/html]
|
|
|
|
0K -> .... [100%]
|
|
|
|
13:30:46 (23.75 KB/s) - `index.html' saved [4694/4694]
|
|
@end group
|
|
@end example
|
|
|
|
@item
|
|
But what will happen if the connection is slow, and the file is lengthy?
|
|
The connection will probably fail before the whole file is retrieved,
|
|
more than once. In this case, Wget will try getting the file until it
|
|
either gets the whole of it, or exceeds the default number of retries
|
|
(this being 20). It is easy to change the number of tries to 45, to
|
|
ensure that the whole file will arrive safely:
|
|
|
|
@example
|
|
wget --tries=45 http://fly.cc.fer.hr/jpg/flyweb.jpg
|
|
@end example
|
|
|
|
@item
|
|
Now let's leave Wget to work in the background, and write its progress
|
|
to log file @file{log}. It is tiring to type @samp{--tries}, so we
|
|
shall use @samp{-t}.
|
|
|
|
@example
|
|
wget -t 45 -o log http://fly.cc.fer.hr/jpg/flyweb.jpg &
|
|
@end example
|
|
|
|
The ampersand at the end of the line makes sure that Wget works in the
|
|
background. To unlimit the number of retries, use @samp{-t inf}.
|
|
|
|
@item
|
|
The usage of @sc{ftp} is just as simple. Wget will take care of login and
|
|
password.
|
|
|
|
@example
|
|
@group
|
|
$ wget ftp://gnjilux.cc.fer.hr/welcome.msg
|
|
--10:08:47-- ftp://gnjilux.cc.fer.hr:21/welcome.msg
|
|
=> `welcome.msg'
|
|
Connecting to gnjilux.cc.fer.hr:21... connected!
|
|
Logging in as anonymous ... Logged in!
|
|
==> TYPE I ... done. ==> CWD not needed.
|
|
==> PORT ... done. ==> RETR welcome.msg ... done.
|
|
Length: 1,340 (unauthoritative)
|
|
|
|
0K -> . [100%]
|
|
|
|
10:08:48 (1.28 MB/s) - `welcome.msg' saved [1340]
|
|
@end group
|
|
@end example
|
|
|
|
@item
|
|
If you specify a directory, Wget will retrieve the directory listing,
|
|
parse it and convert it to @sc{html}. Try:
|
|
|
|
@example
|
|
wget ftp://prep.ai.mit.edu/pub/gnu/
|
|
lynx index.html
|
|
@end example
|
|
@end itemize
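The options used in this section have startup file equivalents
(@pxref{Wgetrc Commands}). As a sketch, the following lines in
@file{$HOME/.wgetrc} would make 45 tries and the log file @file{log}
the defaults, so that they need not be typed on every invocation.

@example
tries = 45
logfile = log
@end example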
|
|
|
|
@node Advanced Usage, Guru Usage, Simple Usage, Examples
|
|
@section Advanced Usage
|
|
|
|
@itemize @bullet
|
|
@item
|
|
You would like to read the list of @sc{url}s from a file? Not a problem
|
|
with that:
|
|
|
|
@example
|
|
wget -i file
|
|
@end example
|
|
|
|
If you specify @samp{-} as file name, the @sc{url}s will be read from
|
|
standard input.
|
|
|
|
@item
|
|
Create a mirror image of the GNU @sc{www} site (with the same directory structure
|
|
the original has) with only one try per document, saving the log of the
|
|
activities to @file{gnulog}:
|
|
|
|
@example
|
|
wget -r -t1 http://www.gnu.ai.mit.edu/ -o gnulog
|
|
@end example
|
|
|
|
@item
|
|
Retrieve the first layer of yahoo links:
|
|
|
|
@example
|
|
wget -r -l1 http://www.yahoo.com/
|
|
@end example
|
|
|
|
@item
|
|
Retrieve the index.html of @samp{www.lycos.com}, showing the original
|
|
server headers:
|
|
|
|
@example
|
|
wget -S http://www.lycos.com/
|
|
@end example
|
|
|
|
@item
|
|
Save the server headers with the file:
|
|
@example
|
|
wget -s http://www.lycos.com/
|
|
more index.html
|
|
@end example
|
|
|
|
@item
|
|
Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them
|
|
to /tmp.
|
|
|
|
@example
|
|
wget -r -P/tmp -l2 ftp://wuarchive.wustl.edu/
|
|
@end example
|
|
|
|
@item
|
|
You want to download all the @sc{gif}s from an @sc{http} directory.
|
|
@samp{wget http://host/dir/*.gif} doesn't work, since @sc{http}
|
|
retrieval does not support globbing. In that case, use:
|
|
|
|
@example
|
|
wget -r -l1 --no-parent -A.gif http://host/dir/
|
|
@end example
|
|
|
|
It is a bit of a kludge, but it works. @samp{-r -l1} means to retrieve
|
|
recursively (@pxref{Recursive Retrieval}), with maximum depth of 1.
|
|
@samp{--no-parent} means that references to the parent directory are
|
|
ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to
|
|
download only the @sc{gif} files. @samp{-A "*.gif"} would have worked
|
|
too.
|
|
|
|
@item
|
|
Suppose you were in the middle of downloading, when Wget was
|
|
interrupted. Now you do not want to clobber the files already present.
|
|
It would be:
|
|
|
|
@example
|
|
wget -nc -r http://www.gnu.ai.mit.edu/
|
|
@end example
|
|
|
|
@item
|
|
If you want to encode your own username and password to @sc{http} or
|
|
@sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}).
|
|
|
|
@example
|
|
wget ftp://hniksic:mypassword@@jagor.srce.hr/.emacs
|
|
@end example
|
|
|
|
@item
|
|
If you do not like the default retrieval visualization (1K dots with 10
|
|
dots per cluster and 50 dots per line), you can customize it through dot
|
|
settings (@pxref{Wgetrc Commands}). For example, many people like the
|
|
``binary'' style of retrieval, with 8K dots and 512K lines:
|
|
|
|
@example
|
|
wget --dot-style=binary ftp://prep.ai.mit.edu/pub/gnu/README
|
|
@end example
|
|
|
|
You can experiment with other styles, like:
|
|
|
|
@example
|
|
wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
|
|
wget --dot-style=micro http://fly.cc.fer.hr/
|
|
@end example
|
|
|
|
To make these settings permanent, put them in your @file{.wgetrc}, as
|
|
described before (@pxref{Sample Wgetrc}).
|
|
@end itemize
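For example, a @file{.wgetrc} fragment along these lines would keep
some of the choices above between runs. It is only a sketch; whether
you want these as permanent defaults is an assumption of the example.

@example
# Same as --dot-style=binary.
dot_style = binary
# Same as -nc: do not clobber files already present.
noclobber = on
# Same as -S: print the server responses.
server_response = on
@end example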
|
|
|
|
@node Guru Usage, , Advanced Usage, Examples
|
|
@section Guru Usage
|
|
|
|
@cindex mirroring
|
|
@itemize @bullet
|
|
@item
|
|
If you wish Wget to keep a mirror of a page (or @sc{ftp}
|
|
subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand
|
|
for @samp{-r -N}. You can put Wget in the crontab file asking it to
|
|
recheck a site each Sunday:
|
|
|
|
@example
|
|
crontab
|
|
0 0 * * 0 wget --mirror ftp://ftp.xemacs.org/pub/xemacs/ -o /home/me/weeklog
|
|
@end example
|
|
|
|
@item
|
|
You may wish to do the same with someone's home page. But you do not
|
|
want to download all those images---you're only interested in @sc{html}.
|
|
|
|
@example
|
|
wget --mirror -A.html http://www.w3.org/
|
|
@end example
|
|
|
|
@item
|
|
But what about mirroring the hosts networkologically close to you? It
|
|
seems so awfully slow because of all that @sc{dns} resolving. Just use
|
|
@samp{-D} (@pxref{Domain Acceptance}).
|
|
|
|
@example
|
|
wget -rN -Dsrce.hr http://www.srce.hr/
|
|
@end example
|
|
|
|
Now Wget will correctly find out that @samp{regoc.srce.hr} is the same
|
|
as @samp{www.srce.hr}, but will not even take into consideration the
|
|
link to @samp{www.mit.edu}.
|
|
|
|
@item
|
|
You have a presentation and would like the dumb absolute links to be
|
|
converted to relative? Use @samp{-k}:
|
|
|
|
@example
|
|
wget -k -r @var{URL}
|
|
@end example
|
|
|
|
@cindex redirecting output
|
|
@item
|
|
You would like the output documents to go to standard output instead of
|
|
to files? OK, but Wget will automatically shut up (turn on
|
|
@samp{--quiet}) to prevent mixing of Wget output and the retrieved
|
|
documents.
|
|
|
|
@example
|
|
wget -O - http://jagor.srce.hr/ http://www.srce.hr/
|
|
@end example
|
|
|
|
You can also combine the two options and make weird pipelines to
|
|
retrieve the documents from remote hotlists:
|
|
|
|
@example
|
|
wget -O - http://cool.list.com/ | wget --force-html -i -
|
|
@end example
|
|
@end itemize
|
|
|
|
@node Various, Appendices, Examples, Top
|
|
@chapter Various
|
|
@cindex various
|
|
|
|
This chapter contains all the stuff that could not fit anywhere else.
|
|
|
|
@menu
|
|
* Proxies:: Support for proxy servers
|
|
* Distribution:: Getting the latest version.
|
|
* Mailing List:: Wget mailing list for announcements and discussion.
|
|
* Reporting Bugs:: How and where to report bugs.
|
|
* Portability:: The systems Wget works on.
|
|
* Signals:: Signal-handling performed by Wget.
|
|
@end menu
|
|
|
|
@node Proxies, Distribution, Various, Various
|
|
@section Proxies
|
|
@cindex proxies
|
|
|
|
@dfn{Proxies} are special-purpose @sc{http} servers designed to transfer
|
|
data from remote servers to local clients. One typical use of proxies
|
|
is lightening network load for users behind a slow connection. This is
|
|
achieved by channeling all @sc{http} and @sc{ftp} requests through the
|
|
proxy which caches the transferred data. When a cached resource is
|
|
requested again, proxy will return the data from cache. Another use for
|
|
proxies is for companies that separate (for security reasons) their
|
|
internal networks from the rest of Internet. In order to obtain
|
|
information from the Web, their users connect and retrieve remote data
|
|
using an authorized proxy.
|
|
|
|
Wget supports proxies for both @sc{http} and @sc{ftp} retrievals. The
|
|
standard way to specify proxy location, which Wget recognizes, is using
|
|
the following environment variables:
|
|
|
|
@table @code
|
|
@item http_proxy
|
|
This variable should contain the @sc{url} of the proxy for @sc{http}
|
|
connections.
|
|
|
|
@item ftp_proxy
|
|
This variable should contain the @sc{url} of the proxy for @sc{ftp}
|
|
connections. It is quite common that @code{http_proxy} and @code{ftp_proxy}
|
|
are set to the same @sc{url}.
|
|
|
|
@item no_proxy
|
|
This variable should contain a comma-separated list of domain extensions
|
|
proxy should @emph{not} be used for. For instance, if the value of
|
|
@code{no_proxy} is @samp{.mit.edu}, proxy will not be used to retrieve
|
|
documents from MIT.
|
|
@end table
|
|
|
|
In addition to the environment variables, proxy location and settings
|
|
may be specified from within Wget itself.
|
|
|
|
@table @samp
|
|
@item -Y on/off
|
|
@itemx --proxy=on/off
|
|
@itemx use_proxy = on/off
|
|
This option may be used to turn the proxy support on or off. Proxy
|
|
support is on by default, provided that the appropriate environment
|
|
variables are set.
|
|
|
|
@item http_proxy = @var{URL}
|
|
@itemx ftp_proxy = @var{URL}
|
|
@itemx no_proxy = @var{string}
|
|
These startup file variables allow you to override the proxy settings
|
|
specified by the environment.
|
|
@end table
|
|
|
|
Some proxy servers require authorization to enable you to use them. The
|
|
authorization consists of @dfn{username} and @dfn{password}, which must
|
|
be sent by Wget. As with @sc{http} authorization, several
|
|
authentication schemes exist. For proxy authorization only the
|
|
@code{Basic} authentication scheme is currently implemented.
|
|
|
|
You may specify your username and password either through the proxy
|
|
@sc{url} or through the command-line options. Assuming that the
|
|
company's proxy is located at @samp{proxy.company.com} at port 8001, a proxy
|
|
@sc{url} location containing authorization data might look like this:
|
|
|
|
@example
|
|
http://hniksic:mypassword@@proxy.company.com:8001/
|
|
@end example
|
|
|
|
Alternatively, you may use the @samp{--proxy-user} and
@samp{--proxy-passwd} options, and the equivalent @file{.wgetrc}
|
|
settings @code{proxy_user} and @code{proxy_passwd} to set the proxy
|
|
username and password.
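
As a sketch, the proxy setup discussed in this section could be written
in a startup file as follows. The host, port, user name and password
are the example values used above, not real credentials.

@example
http_proxy = http://proxy.company.com:8001/
ftp_proxy = http://proxy.company.com:8001/
no_proxy = .mit.edu
proxy_user = hniksic
proxy_passwd = mypassword
@end example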
|
|
|
|
@node Distribution, Mailing List, Proxies, Various
|
|
@section Distribution
|
|
@cindex latest version
|
|
|
|
Like all GNU utilities, the latest version of Wget can be found at the
|
|
master GNU archive site prep.ai.mit.edu, and its mirrors. For example,
|
|
Wget @value{VERSION} can be found at
|
|
@url{ftp://prep.ai.mit.edu/gnu/wget/wget-@value{VERSION}.tar.gz}
|
|
|
|
@node Mailing List, Reporting Bugs, Distribution, Various
|
|
@section Mailing List
|
|
@cindex mailing list
|
|
@cindex list
|
|
|
|
Wget has its own mailing list at @email{wget@@sunsite.auc.dk}, thanks
|
|
to Karsten Thygesen. The mailing list is for discussion of Wget
|
|
features and web, reporting Wget bugs (those that you think may be of
|
|
interest to the public) and mailing announcements. You are welcome to
|
|
subscribe. The more people on the list, the better!
|
|
|
|
To subscribe, send mail to @email{wget-subscribe@@sunsite.auc.dk} with
the magic word @samp{subscribe} in the subject line. Unsubscribe by
|
|
mailing to @email{wget-unsubscribe@@sunsite.auc.dk}.
|
|
|
|
The mailing list is archived at @url{http://fly.cc.fer.hr/archive/wget}.
|
|
|
|
@node Reporting Bugs, Portability, Mailing List, Various
|
|
@section Reporting Bugs
|
|
@cindex bugs
|
|
@cindex reporting bugs
|
|
@cindex bug reports
|
|
|
|
You are welcome to send bug reports about GNU Wget to
|
|
@email{bug-wget@@gnu.org}. The bugs that you think are of
|
|
interest to the public (i.e. more people should be informed about them)
|
|
can be Cc-ed to the mailing list at @email{wget@@sunsite.auc.dk}.
|
|
|
|
Before actually submitting a bug report, please try to follow a few
|
|
simple guidelines.
|
|
|
|
@enumerate
|
|
@item
|
|
Please try to ascertain that the behaviour you see really is a bug. If
|
|
Wget crashes, it's a bug. If Wget does not behave as documented,
|
|
it's a bug. If things work strangely, but you are not sure about the way
|
|
they are supposed to work, it might well be a bug.
|
|
|
|
@item
|
|
Try to repeat the bug in as simple circumstances as possible. E.g. if
|
|
Wget crashes on @samp{wget -rLl0 -t5 -Y0 http://yoyodyne.com -o
|
|
/tmp/log}, you should try to see if it will crash with a simpler set of
|
|
options.
|
|
|
|
Also, while I will probably be interested to know the contents of your
|
|
@file{.wgetrc} file, just dumping it into the debug message is probably
|
|
a bad idea. Instead, you should first try to see if the bug repeats
|
|
with @file{.wgetrc} moved out of the way. Only if it turns out that
|
|
@file{.wgetrc} settings affect the bug, should you mail me the relevant
|
|
parts of the file.
|
|
|
|
@item
|
|
Please start Wget with @samp{-d} option and send the log (or the
|
|
relevant parts of it). If Wget was compiled without debug support,
|
|
recompile it. It is @emph{much} easier to trace bugs with debug support
|
|
on.
|
|
|
|
@item
|
|
If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which
|
|
wget` core} and type @code{where} to get the backtrace.
|
|
|
|
@item
|
|
Find where the bug is, fix it and send me the patches. :-)
|
|
@end enumerate
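The second and third guidelines can be sketched as the following Bourne
shell session. The names @file{.wgetrc.saved} and @file{debug.log} are
hypothetical and chosen only for this illustration, and the @sc{url}
stands for whatever retrieval triggers the problem.

@example
# Move the startup file out of the way for the test run.
mv $HOME/.wgetrc $HOME/.wgetrc.saved
# Repeat the retrieval with debug output written to a log file.
wget -d -o debug.log http://yoyodyne.com/
# Put the startup file back.
mv $HOME/.wgetrc.saved $HOME/.wgetrc
@end example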
|
|
|
|
@node Portability, Signals, Reporting Bugs, Various
|
|
@section Portability
|
|
@cindex portability
|
|
@cindex operating systems
|
|
|
|
Since Wget uses GNU Autoconf for building and configuring, and avoids
|
|
using ``special'' ultra--mega--cool features of any particular Unix, it
|
|
should compile (and work) on all common Unix flavors.
|
|
|
|
Various Wget versions have been compiled and tested under many kinds of
|
|
Unix systems, including Solaris, Linux, SunOS, OSF (aka Digital Unix),
|
|
Ultrix, *BSD, IRIX, and others; refer to the file @file{MACHINES} in the
|
|
distribution directory for a comprehensive list. If you compile it on
|
|
an architecture not listed there, please let me know so I can update it.
|
|
|
|
Wget should also compile on the other Unix systems, not listed in
|
|
@file{MACHINES}. If it doesn't, please let me know.
|
|
|
|
Thanks to kind contributors, this version of Wget compiles and works on
|
|
Microsoft Windows 95 and Windows NT platforms. It has been compiled
|
|
successfully using MS Visual C++ 4.0, Watcom, and Borland C compilers,
|
|
with Winsock as networking software. Naturally, it lacks some
|
|
features available on Unix, but it should work as a substitute for
|
|
people stuck with Windows. Note that the Windows port is
|
|
@strong{neither tested nor maintained} by me---all questions and
|
|
problems should be reported to Wget mailing list at
|
|
@email{wget@@sunsite.auc.dk} where the maintainers will look at them.
|
|
|
|
@node Signals, , Portability, Various
|
|
@section Signals
|
|
@cindex signal handling
|
|
@cindex hangup
|
|
|
|
Since the purpose of Wget is background work, it catches the hangup
|
|
signal (@code{SIGHUP}) and ignores it. If the output was on standard
|
|
output, it will be redirected to a file named @file{wget-log}.
|
|
Otherwise, @code{SIGHUP} is ignored. This is convenient when you wish
|
|
to redirect the output of Wget after having started it.
|
|
|
|
@example
|
|
$ wget http://www.ifi.uio.no/~larsi/gnus.tar.gz &
|
|
$ kill -HUP %% # Redirect the output to wget-log
|
|
@end example
|
|
|
|
Other than that, Wget will not try to interfere with signals in any
|
|
way. @kbd{C-c}, @code{kill -TERM} and @code{kill -KILL} should kill it
|
|
alike.
|
|
|
|
@node Appendices, Copying, Various, Top
|
|
@chapter Appendices
|
|
|
|
This chapter contains some references I consider useful, like the Robots
|
|
Exclusion Standard specification, as well as a list of contributors to
|
|
GNU Wget.
|
|
|
|
@menu
|
|
* Robots:: Wget as a WWW robot.
|
|
* Security Considerations:: Security with Wget.
|
|
* Contributors:: People who helped.
|
|
@end menu
|
|
|
|
@node Robots, Security Considerations, Appendices, Appendices
|
|
@section Robots
|
|
@cindex robots
|
|
@cindex robots.txt
|
|
@cindex server maintenance
|
|
|
|
Since Wget is able to traverse the web, it counts as one of the Web
|
|
@dfn{robots}. Thus Wget understands @dfn{Robots Exclusion Standard}
|
|
(@sc{res})---contents of @file{/robots.txt}, used by server
|
|
administrators to shield parts of their systems from wanderings of Wget.
|
|
|
|
Norobots support is turned on only when retrieving recursively, and
|
|
@emph{never} for the first page. Thus, you may issue:
|
|
|
|
@example
|
|
wget -r http://fly.cc.fer.hr/
|
|
@end example
|
|
|
|
First the index of fly.cc.fer.hr will be downloaded. If Wget finds
|
|
anything worth downloading on the same host, only @emph{then} will it
|
|
load the robots, and decide whether or not to load the links after all.
|
|
@file{/robots.txt} is loaded only once per host. Wget does not support
|
|
the robots @code{META} tag.
|
|
|
|
The description of the norobots standard was written, and is maintained
|
|
by Martijn Koster @email{m.koster@@webcrawler.com}. With his
|
|
permission, I contribute a (slightly modified) TeXified version of the
|
|
@sc{res}.
|
|
|
|
@menu
|
|
* Introduction to RES::
|
|
* RES Format::
|
|
* User-Agent Field::
|
|
* Disallow Field::
|
|
* Norobots Examples::
|
|
@end menu
|
|
|
|
@node Introduction to RES, RES Format, Robots, Robots
|
|
@subsection Introduction to RES
|
|
@cindex norobots introduction
|
|
|
|
@dfn{WWW Robots} (also called @dfn{wanderers} or @dfn{spiders}) are
|
|
programs that traverse many pages in the World Wide Web by recursively
|
|
retrieving linked pages. For more information see the robots page.
|
|
|
|
In 1993 and 1994 there have been occasions where robots have visited
|
|
@sc{www} servers where they weren't welcome for various
|
|
reasons. Sometimes these reasons were robot specific, e.g. certain
|
|
robots swamped servers with rapid-fire requests, or retrieved the same
|
|
files repeatedly. In other situations robots traversed parts of @sc{www}
|
|
servers that weren't suitable, e.g. very deep virtual trees, duplicated
|
|
information, temporary information, or cgi-scripts with side-effects
|
|
(such as voting).
|
|
|
|
These incidents indicated the need for established mechanisms for
|
|
@sc{www} servers to indicate to robots which parts of their server
|
|
should not be accessed. This standard addresses this need with an
|
|
operational solution.
|
|
|
|
This document represents a consensus on 30 June 1994 on the robots
|
|
mailing list (@code{robots@@webcrawler.com}), between the majority of
|
|
robot authors and other people with an interest in robots. It has also
|
|
been open for discussion on the Technical World Wide Web mailing list
|
|
(@code{www-talk@@info.cern.ch}). This document is based on a previous
|
|
working draft under the same title.
|
|
|
|
It is not an official standard backed by a standards body, or owned by
|
|
any commercial organization. It is not enforced by anybody, and there
|
|
is no guarantee that all current and future robots will use it. Consider
|
|
it a common facility the majority of robot authors offer the @sc{www}
|
|
community to protect @sc{www} servers against unwanted accesses by their
|
|
robots.
|
|
|
|
The latest version of this document can be found at
|
|
@url{http://info.webcrawler.com/mak/projects/robots/norobots.html}.
|
|
|
|
@node RES Format, User-Agent Field, Introduction to RES, Robots
|
|
@subsection RES Format
|
|
@cindex norobots format
|
|
|
|
The format and semantics of the @file{/robots.txt} file are as follows:
|
|
|
|
The file consists of one or more records separated by one or more blank
|
|
lines (terminated by @code{CR}, @code{CR/NL}, or @code{NL}). Each
|
|
record contains lines of the form:
|
|
|
|
@example
|
|
<field>:<optionalspace><value><optionalspace>
|
|
@end example
|
|
|
|
The field name is case insensitive.
|
|
|
|
Comments can be included in the file using UNIX Bourne shell conventions:
|
|
the @samp{#} character is used to indicate that preceding space (if any)
|
|
and the remainder of the line up to the line termination is discarded.
|
|
Lines containing only a comment are discarded completely, and therefore
|
|
do not indicate a record boundary.
|
|
|
|
The record starts with one or more User-agent lines, followed by one or
|
|
more Disallow lines, as detailed below. Unrecognized headers are
|
|
ignored.
|
|
|
|
The presence of an empty @file{/robots.txt} file has no explicit
|
|
associated semantics, it will be treated as if it was not present,
|
|
i.e. all robots will consider themselves welcome.
|
|
|
|
@node User-Agent Field, Disallow Field, RES Format, Robots
|
|
@subsection User-Agent Field
|
|
@cindex norobots user-agent
|
|
|
|
The value of this field is the name of the robot the record is
|
|
describing access policy for.
|
|
|
|
If more than one User-agent field is present, the record describes an
|
|
identical access policy for more than one robot. At least one field
|
|
needs to be present per record.
|
|
|
|
The robot should be liberal in interpreting this field. A case
|
|
insensitive substring match of the name without version information is
|
|
recommended.
|
|
|
|
If the value is @samp{*}, the record describes the default access policy
|
|
for any robot that has not matched any of the other records. It is not
|
|
allowed to have multiple such records in the @file{/robots.txt} file.
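
The matching rule described above can be sketched in Python as follows.
The name @code{select_record} and the representation of a record as an
@code{(agents, rules)} pair are hypothetical. The sketch also assumes
that version information follows a slash in the robot's name, and that
a field value matches when it occurs anywhere inside that name; the
text above leaves both details open.

@example
def select_record(records, robot_name):
    """Return the rules of the first record that names the robot,
    else the rules of the default record, else None."""
    # Assumption: the version follows a slash, as in "Name/1.0".
    name = robot_name.split("/", 1)[0].lower()
    default = None
    for agents, rules in records:
        for agent in agents:
            if agent == "*":
                # Remember the default; a specific match wins.
                default = rules
            elif agent and agent.lower() in name:
                # Case-insensitive substring match.
                return rules
    return default
@end example

A return value of @code{None} means that no record applies at all, in
which case the robot may consider itself welcome.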
|
|
|
|
@node Disallow Field, Norobots Examples, User-Agent Field, Robots
|
|
@subsection Disallow Field
|
|
@cindex norobots disallow
|
|
|
|
The value of this field specifies a partial @sc{url} that is not to be
|
|
visited. This can be a full path, or a partial path; any @sc{url} that
|
|
starts with this value will not be retrieved. For example,
|
|
@w{@samp{Disallow: /help}} disallows both @samp{/help.html} and
|
|
@samp{/help/index.html}, whereas @w{@samp{Disallow: /help/}} would
|
|
disallow @samp{/help/index.html} but allow @samp{/help.html}.
|
|
|
|
An empty value indicates that all @sc{url}s can be retrieved. At least
|
|
one Disallow field needs to be present in a record.
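
The prefix rule is simple enough to state in a few lines of Python. The
following is only a sketch; the function name @code{is_allowed} is an
assumption of this example, and the path is compared as given, without
any decoding of escaped characters.

@example
def is_allowed(path, disallow_values):
    """Return False if path starts with any non-empty Disallow value."""
    for value in disallow_values:
        # An empty value disallows nothing, so it is skipped.
        if value and path.startswith(value):
            return False
    return True
@end example

With this definition, @code{is_allowed("/help.html", ["/help/"])} is
true, while @code{is_allowed("/help.html", ["/help"])} is false, which
agrees with the @samp{/help} example given above.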
|
|
|
|
@node Norobots Examples, , Disallow Field, Robots
|
|
@subsection Norobots Examples
|
|
@cindex norobots examples
|
|
|
|
The following example @samp{/robots.txt} file specifies that no robots
|
|
should visit any @sc{url} starting with @samp{/cyberworld/map/} or
|
|
@samp{/tmp/}:
|
|
|
|
@example
# robots.txt for http://www.site.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space
Disallow: /tmp/ # these will soon disappear
@end example
|
|
|
|
This example @samp{/robots.txt} file specifies that no robots should
|
|
visit any @sc{url} starting with @samp{/cyberworld/map/}, except the
|
|
robot called @samp{cybermapper}:
|
|
|
|
@example
# robots.txt for http://www.site.com/

User-agent: *
Disallow: /cyberworld/map/ # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:
@end example
|
|
|
|
This example indicates that no robots should visit this site further:
|
|
|
|
@example
# go away
User-agent: *
Disallow: /
@end example
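
To experiment with files like these, one option is the
@code{urllib.robotparser} module from the Python standard library. The
sketch below feeds it the @samp{cybermapper} example from above. The
helper name @code{can_visit} and the robot name @samp{otherbot} are
made up for the illustration, and the module applies its own matching
rules, which need not agree with this specification or with Wget in
every detail.

@example
from urllib.robotparser import RobotFileParser

LINES = [
    "# robots.txt for http://www.site.com/",
    "",
    "User-agent: *",
    "Disallow: /cyberworld/map/ # This is an infinite virtual URL space",
    "",
    "# Cybermapper knows where to go.",
    "User-agent: cybermapper",
    "Disallow:",
]

def can_visit(robot, url):
    """Ask the standard-library parser whether robot may fetch url."""
    parser = RobotFileParser()
    parser.parse(LINES)
    return parser.can_fetch(robot, url)
@end example

Here @code{can_visit} should accept any @sc{url} for @samp{cybermapper}
and reject @sc{url}s under @samp{/cyberworld/map/} for other robots.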
|
|
|
|
@node Security Considerations, Contributors, Robots, Appendices
|
|
@section Security Considerations
|
|
@cindex security
|
|
|
|
When using Wget, you must be aware that it sends unencrypted passwords
|
|
through the network, which may present a security problem. Here are the
|
|
main issues, and some solutions.
|
|
|
|
@enumerate
|
|
@item
|
|
The passwords on the command line are visible using @code{ps}. If this
|
|
is a problem, avoid putting passwords on the command line---e.g. you
|
|
can use @file{.netrc} for this.
|
|
|
|
@item
|
|
Using the insecure @dfn{basic} authentication scheme, unencrypted
|
|
passwords are transmitted through the network routers and gateways.
|
|
|
|
@item
|
|
The @sc{ftp} passwords are also in no way encrypted. There is no good
|
|
solution for this at the moment.
|
|
|
|
@item
|
|
Although the ``normal'' output of Wget tries to hide the passwords,
|
|
debugging logs show them, in all forms. This problem is avoided by
|
|
being careful when you send debug logs (yes, even when you send them to
|
|
me).
|
|
@end enumerate
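
As an illustration of the first item, a @file{.netrc} entry in the
conventional format might look like the line below. The host name, user
name and password are placeholders, not real values.

@example
machine ftp.example.com login myname password mysecret
@end example

Since the file holds passwords in plain text, it is generally advisable
to make it readable only by its owner, for instance with
@code{chmod 600}.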
|
|
|
|
@node Contributors, , Security Considerations, Appendices
|
|
@section Contributors
|
|
@cindex contributors
|
|
|
|
@iftex
|
|
GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@iskon.hr}.
|
|
@end iftex
|
|
@ifinfo
|
|
GNU Wget was written by Hrvoje Niksic @email{hniksic@@iskon.hr}.
|
|
@end ifinfo
|
|
However, its development could never have gone as far as it has, were it
|
|
not for the help of many people, either with bug reports, feature
|
|
proposals, patches, or letters saying ``Thanks!''.
|
|
|
|
Special thanks go to the following people (in no particular order):
|
|
|
|
@itemize @bullet
|
|
@item
|
|
Karsten Thygesen---donated the mailing list and the initial @sc{ftp}
|
|
space.
|
|
|
|
@item
|
|
Shawn McHorse---bug reports and patches.
|
|
|
|
@item
|
|
Kaveh R. Ghazi---on-the-fly @code{ansi2knr}-ization.
|
|
|
|
@item
|
|
Gordon Matzigkeit---@file{.netrc} support.
|
|
|
|
@item
|
|
@iftex
|
|
Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en
|
|
Ka@v{c}ar---feature suggestions and ``philosophical'' discussions.
|
|
@end iftex
|
|
@ifinfo
|
|
Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions
|
|
and ``philosophical'' discussions.
|
|
@end ifinfo
|
|
|
|
@item
|
|
Darko Budor---initial port to Windows.
|
|
|
|
@item
|
|
Antonio Rosella---help and suggestions, plus the Italian translation.
|
|
|
|
@item
|
|
@iftex
|
|
Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and
|
|
suggestions.
|
|
@end iftex
|
|
@ifinfo
|
|
Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions.
|
|
@end ifinfo
|
|
|
|
@item
|
|
@iftex
|
|
Fran@,{c}ois Pinard---many thorough bug reports and discussions.
|
|
@end iftex
|
|
@ifinfo
|
|
Francois Pinard---many thorough bug reports and discussions.
|
|
@end ifinfo
|
|
|
|
@item
|
|
Karl Eichwalder---lots of help with internationalization and other
|
|
things.
|
|
|
|
@item
|
|
Junio Hamano---donated support for Opie and @sc{http} @code{Digest}
|
|
authentication.
|
|
|
|
@item
|
|
Brian Gough---a generous donation.
|
|
@end itemize
|
|
|
|
The following people have provided patches, bug/build reports, useful
|
|
suggestions, beta testing services, fan mail and all the other things
|
|
that make maintenance so much fun:
|
|
|
|
Tim Adam,
|
|
Martin Baehr,
|
|
Dieter Baron,
|
|
Roger Beeman and the Gurus at Cisco,
|
|
Dan Berger,
|
|
Mark Boyns,
|
|
John Burden,
|
|
Wanderlei Cavassin,
|
|
Gilles Cedoc,
|
|
Tim Charron,
|
|
Noel Cragg,
|
|
@iftex
|
|
Kristijan @v{C}onka@v{s},
|
|
@end iftex
|
|
@ifinfo
|
|
Kristijan Conkas,
|
|
@end ifinfo
|
|
Andrew Deryabin,
|
|
@iftex
|
|
Damir D@v{z}eko,
|
|
@end iftex
|
|
@ifinfo
|
|
Damir Dzeko,
|
|
@end ifinfo
|
|
Andrew Davison,
|
|
Ulrich Drepper,
|
|
Marc Duponcheel,
|
|
@iftex
|
|
Aleksandar Erkalovi@'{c},
|
|
@end iftex
|
|
@ifinfo
|
|
Aleksandar Erkalovic,
|
|
@end ifinfo
|
|
Andy Eskilsson,
|
|
Masashi Fujita,
|
|
Howard Gayle,
|
|
Marcel Gerrits,
|
|
Hans Grobler,
|
|
Mathieu Guillaume,
|
|
Dan Harkless,
|
|
Heiko Herold,
|
|
Karl Heuer,
|
|
HIROSE Masaaki,
|
|
Gregor Hoffleit,
|
|
Erik Magnus Hulthen,
|
|
Richard Huveneers,
|
|
Simon Josefsson,
|
|
@iftex
|
|
Mario Juri@'{c},
|
|
@end iftex
|
|
@ifinfo
|
|
Mario Juric,
|
|
@end ifinfo
|
|
@iftex
|
|
Goran Kezunovi@'{c},
|
|
@end iftex
|
|
@ifinfo
|
|
Goran Kezunovic,
|
|
@end ifinfo
|
|
Robert Kleine,
|
|
Fila Kolodny,
|
|
Alexander Kourakos,
|
|
Martin Kraemer,
|
|
@tex
|
|
$\Sigma\acute{\iota}\mu o\varsigma\;
|
|
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
|
|
(Simos KSenitellis),
|
|
@end tex
|
|
@ifinfo
|
|
Simos KSenitellis,
|
|
@end ifinfo
|
|
Hrvoje Lacko,
|
|
Daniel S. Lewart,
|
|
Dave Love,
|
|
Jordan Mendelson,
|
|
Lin Zhe Min,
|
|
Charlie Negyesi,
|
|
Andrew Pollock,
|
|
Steve Pothier,
|
|
Jan Prikryl,
|
|
Marin Purgar,
|
|
Keith Refson,
|
|
Tobias Ringstrom,
|
|
@c Texinfo doesn't grok @'{@i}, so we have to use TeX itself.
|
|
@tex
|
|
Juan Jos\'{e} Rodr\'{\i}gues,
|
|
@end tex
|
|
@ifinfo
|
|
Juan Jose Rodrigues,
|
|
@end ifinfo
|
|
Edward J. Sabol,
|
|
Heinz Salzmann,
|
|
Robert Schmidt,
|
|
Toomas Soome,
|
|
Tage Stabell-Kulo,
|
|
Sven Sternberger,
|
|
Markus Strasser,
|
|
Szakacsits Szabolcs,
|
|
Mike Thomas,
|
|
Russell Vincent,
|
|
Charles G Waldman,
|
|
Douglas E. Wegscheid,
|
|
Jasmin Zainul,
|
|
@iftex
|
|
Bojan @v{Z}drnja,
|
|
@end iftex
|
|
@ifinfo
|
|
Bojan Zdrnja,
|
|
@end ifinfo
|
|
Kristijan Zimmer.
|
|
|
|
Apologies to all whom I accidentally left out, and many thanks to all the
|
|
subscribers of the Wget mailing list.
|
|
|
|
@node Copying, Concept Index, Appendices, Top
|
|
@unnumbered GNU GENERAL PUBLIC LICENSE
|
|
@cindex copying
|
|
@cindex GPL
|
|
@center Version 2, June 1991
|
|
|
|
@display
|
|
Copyright @copyright{} 1989, 1991 Free Software Foundation, Inc.
|
|
675 Mass Ave, Cambridge, MA 02139, USA
|
|
|
|
Everyone is permitted to copy and distribute verbatim copies
|
|
of this license document, but changing it is not allowed.
|
|
@end display
|
|
|
|
@unnumberedsec Preamble
|
|
|
|
The licenses for most software are designed to take away your
|
|
freedom to share and change it. By contrast, the GNU General Public
|
|
License is intended to guarantee your freedom to share and change free
|
|
software---to make sure the software is free for all its users. This
|
|
General Public License applies to most of the Free Software
|
|
Foundation's software and to any other program whose authors commit to
|
|
using it. (Some other Free Software Foundation software is covered by
|
|
the GNU Library General Public License instead.) You can apply it to
|
|
your programs, too.
|
|
|
|
When we speak of free software, we are referring to freedom, not
|
|
price. Our General Public Licenses are designed to make sure that you
|
|
have the freedom to distribute copies of free software (and charge for
|
|
this service if you wish), that you receive source code or can get it
|
|
if you want it, that you can change the software or use pieces of it
|
|
in new free programs; and that you know you can do these things.
|
|
|
|
To protect your rights, we need to make restrictions that forbid
|
|
anyone to deny you these rights or to ask you to surrender the rights.
|
|
These restrictions translate to certain responsibilities for you if you
|
|
distribute copies of the software, or if you modify it.
|
|
|
|
For example, if you distribute copies of such a program, whether
|
|
gratis or for a fee, you must give the recipients all the rights that
|
|
you have. You must make sure that they, too, receive or can get the
|
|
source code. And you must show them these terms so they know their
|
|
rights.
|
|
|
|
We protect your rights with two steps: (1) copyright the software, and
|
|
(2) offer you this license which gives you legal permission to copy,
|
|
distribute and/or modify the software.
|
|
|
|
Also, for each author's protection and ours, we want to make certain
|
|
that everyone understands that there is no warranty for this free
|
|
software. If the software is modified by someone else and passed on, we
|
|
want its recipients to know that what they have is not the original, so
|
|
that any problems introduced by others will not reflect on the original
|
|
authors' reputations.
|
|
|
|
Finally, any free program is threatened constantly by software
|
|
patents. We wish to avoid the danger that redistributors of a free
|
|
program will individually obtain patent licenses, in effect making the
|
|
program proprietary. To prevent this, we have made it clear that any
|
|
patent must be licensed for everyone's free use or not licensed at all.
|
|
|
|
The precise terms and conditions for copying, distribution and
|
|
modification follow.
|
|
|
|
@iftex
|
|
@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
|
@end iftex
|
|
@ifinfo
|
|
@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
|
|
@end ifinfo
|
|
|
|
@enumerate
|
|
@item
|
|
This License applies to any program or other work which contains
|
|
a notice placed by the copyright holder saying it may be distributed
|
|
under the terms of this General Public License. The ``Program'', below,
|
|
refers to any such program or work, and a ``work based on the Program''
|
|
means either the Program or any derivative work under copyright law:
|
|
that is to say, a work containing the Program or a portion of it,
|
|
either verbatim or with modifications and/or translated into another
|
|
language. (Hereinafter, translation is included without limitation in
|
|
the term ``modification''.) Each licensee is addressed as ``you''.
|
|
|
|
Activities other than copying, distribution and modification are not
|
|
covered by this License; they are outside its scope. The act of
|
|
running the Program is not restricted, and the output from the Program
|
|
is covered only if its contents constitute a work based on the
|
|
Program (independent of having been made by running the Program).
|
|
Whether that is true depends on what the Program does.
|
|
|
|
@item
|
|
You may copy and distribute verbatim copies of the Program's
|
|
source code as you receive it, in any medium, provided that you
|
|
conspicuously and appropriately publish on each copy an appropriate
|
|
copyright notice and disclaimer of warranty; keep intact all the
|
|
notices that refer to this License and to the absence of any warranty;
|
|
and give any other recipients of the Program a copy of this License
|
|
along with the Program.
|
|
|
|
You may charge a fee for the physical act of transferring a copy, and
|
|
you may at your option offer warranty protection in exchange for a fee.
|
|
|
|
@item
|
|
You may modify your copy or copies of the Program or any portion
|
|
of it, thus forming a work based on the Program, and copy and
|
|
distribute such modifications or work under the terms of Section 1
|
|
above, provided that you also meet all of these conditions:
|
|
|
|
@enumerate a
|
|
@item
|
|
You must cause the modified files to carry prominent notices
|
|
stating that you changed the files and the date of any change.
|
|
|
|
@item
|
|
You must cause any work that you distribute or publish, that in
|
|
whole or in part contains or is derived from the Program or any
|
|
part thereof, to be licensed as a whole at no charge to all third
|
|
parties under the terms of this License.
|
|
|
|
@item
|
|
If the modified program normally reads commands interactively
|
|
when run, you must cause it, when started running for such
|
|
interactive use in the most ordinary way, to print or display an
|
|
announcement including an appropriate copyright notice and a
|
|
notice that there is no warranty (or else, saying that you provide
|
|
a warranty) and that users may redistribute the program under
|
|
these conditions, and telling the user how to view a copy of this
|
|
License. (Exception: if the Program itself is interactive but
|
|
does not normally print such an announcement, your work based on
|
|
the Program is not required to print an announcement.)
|
|
@end enumerate
|
|
|
|
These requirements apply to the modified work as a whole. If
|
|
identifiable sections of that work are not derived from the Program,
|
|
and can be reasonably considered independent and separate works in
|
|
themselves, then this License, and its terms, do not apply to those
|
|
sections when you distribute them as separate works. But when you
|
|
distribute the same sections as part of a whole which is a work based
|
|
on the Program, the distribution of the whole must be on the terms of
|
|
this License, whose permissions for other licensees extend to the
|
|
entire whole, and thus to each and every part regardless of who wrote it.
|
|
|
|
Thus, it is not the intent of this section to claim rights or contest
|
|
your rights to work written entirely by you; rather, the intent is to
|
|
exercise the right to control the distribution of derivative or
|
|
collective works based on the Program.
|
|
|
|
In addition, mere aggregation of another work not based on the Program
|
|
with the Program (or with a work based on the Program) on a volume of
|
|
a storage or distribution medium does not bring the other work under
|
|
the scope of this License.
|
|
|
|
@item
|
|
You may copy and distribute the Program (or a work based on it,
|
|
under Section 2) in object code or executable form under the terms of
|
|
Sections 1 and 2 above provided that you also do one of the following:
|
|
|
|
@enumerate a
|
|
@item
|
|
Accompany it with the complete corresponding machine-readable
|
|
source code, which must be distributed under the terms of Sections
|
|
1 and 2 above on a medium customarily used for software interchange; or,
|
|
|
|
@item
|
|
Accompany it with a written offer, valid for at least three
|
|
years, to give any third party, for a charge no more than your
|
|
cost of physically performing source distribution, a complete
|
|
machine-readable copy of the corresponding source code, to be
|
|
distributed under the terms of Sections 1 and 2 above on a medium
|
|
customarily used for software interchange; or,
|
|
|
|
@item
|
|
Accompany it with the information you received as to the offer
|
|
to distribute corresponding source code. (This alternative is
|
|
allowed only for noncommercial distribution and only if you
|
|
received the program in object code or executable form with such
|
|
an offer, in accord with Subsection b above.)
|
|
@end enumerate
|
|
|
|
The source code for a work means the preferred form of the work for
|
|
making modifications to it. For an executable work, complete source
|
|
code means all the source code for all modules it contains, plus any
|
|
associated interface definition files, plus the scripts used to
|
|
control compilation and installation of the executable. However, as a
|
|
special exception, the source code distributed need not include
|
|
anything that is normally distributed (in either source or binary
|
|
form) with the major components (compiler, kernel, and so on) of the
|
|
operating system on which the executable runs, unless that component
|
|
itself accompanies the executable.
|
|
|
|
If distribution of executable or object code is made by offering
|
|
access to copy from a designated place, then offering equivalent
|
|
access to copy the source code from the same place counts as
|
|
distribution of the source code, even though third parties are not
|
|
compelled to copy the source along with the object code.
|
|
|
|
@item
|
|
You may not copy, modify, sublicense, or distribute the Program
|
|
except as expressly provided under this License. Any attempt
|
|
otherwise to copy, modify, sublicense or distribute the Program is
|
|
void, and will automatically terminate your rights under this License.
|
|
However, parties who have received copies, or rights, from you under
|
|
this License will not have their licenses terminated so long as such
|
|
parties remain in full compliance.
|
|
|
|
@item
|
|
You are not required to accept this License, since you have not
|
|
signed it. However, nothing else grants you permission to modify or
|
|
distribute the Program or its derivative works. These actions are
|
|
prohibited by law if you do not accept this License. Therefore, by
|
|
modifying or distributing the Program (or any work based on the
|
|
Program), you indicate your acceptance of this License to do so, and
|
|
all its terms and conditions for copying, distributing or modifying
|
|
the Program or works based on it.
|
|
|
|
@item
|
|
Each time you redistribute the Program (or any work based on the
|
|
Program), the recipient automatically receives a license from the
|
|
original licensor to copy, distribute or modify the Program subject to
|
|
these terms and conditions. You may not impose any further
|
|
restrictions on the recipients' exercise of the rights granted herein.
|
|
You are not responsible for enforcing compliance by third parties to
|
|
this License.
|
|
|
|
@item
|
|
If, as a consequence of a court judgment or allegation of patent
|
|
infringement or for any other reason (not limited to patent issues),
|
|
conditions are imposed on you (whether by court order, agreement or
|
|
otherwise) that contradict the conditions of this License, they do not
|
|
excuse you from the conditions of this License. If you cannot
|
|
distribute so as to satisfy simultaneously your obligations under this
|
|
License and any other pertinent obligations, then as a consequence you
|
|
may not distribute the Program at all. For example, if a patent
|
|
license would not permit royalty-free redistribution of the Program by
|
|
all those who receive copies directly or indirectly through you, then
|
|
the only way you could satisfy both it and this License would be to
|
|
refrain entirely from distribution of the Program.
|
|
|
|
If any portion of this section is held invalid or unenforceable under
|
|
any particular circumstance, the balance of the section is intended to
|
|
apply and the section as a whole is intended to apply in other
|
|
circumstances.
|
|
|
|
It is not the purpose of this section to induce you to infringe any
|
|
patents or other property right claims or to contest validity of any
|
|
such claims; this section has the sole purpose of protecting the
|
|
integrity of the free software distribution system, which is
|
|
implemented by public license practices. Many people have made
|
|
generous contributions to the wide range of software distributed
|
|
through that system in reliance on consistent application of that
|
|
system; it is up to the author/donor to decide if he or she is willing
|
|
to distribute software through any other system and a licensee cannot
|
|
impose that choice.
|
|
|
|
This section is intended to make thoroughly clear what is believed to
|
|
be a consequence of the rest of this License.
|
|
|
|
@item
|
|
If the distribution and/or use of the Program is restricted in
|
|
certain countries either by patents or by copyrighted interfaces, the
|
|
original copyright holder who places the Program under this License
|
|
may add an explicit geographical distribution limitation excluding
|
|
those countries, so that distribution is permitted only in or among
|
|
countries not thus excluded. In such case, this License incorporates
|
|
the limitation as if written in the body of this License.
|
|
|
|
@item
|
|
The Free Software Foundation may publish revised and/or new versions
|
|
of the General Public License from time to time. Such new versions will
|
|
be similar in spirit to the present version, but may differ in detail to
|
|
address new problems or concerns.
|
|
|
|
Each version is given a distinguishing version number. If the Program
|
|
specifies a version number of this License which applies to it and ``any
|
|
later version'', you have the option of following the terms and conditions
|
|
either of that version or of any later version published by the Free
|
|
Software Foundation. If the Program does not specify a version number of
|
|
this License, you may choose any version ever published by the Free Software
|
|
Foundation.
|
|
|
|
@item
|
|
If you wish to incorporate parts of the Program into other free
|
|
programs whose distribution conditions are different, write to the author
|
|
to ask for permission. For software which is copyrighted by the Free
|
|
Software Foundation, write to the Free Software Foundation; we sometimes
|
|
make exceptions for this. Our decision will be guided by the two goals
|
|
of preserving the free status of all derivatives of our free software and
|
|
of promoting the sharing and reuse of software generally.
|
|
|
|
@iftex
|
|
@heading NO WARRANTY
|
|
@end iftex
|
|
@ifinfo
|
|
@center NO WARRANTY
|
|
@end ifinfo
|
|
@cindex no warranty
|
|
|
|
@item
|
|
BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
|
|
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
|
|
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
|
|
PROVIDE THE PROGRAM ``AS IS'' WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
|
|
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
|
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
|
|
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
|
|
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
|
|
REPAIR OR CORRECTION.
|
|
|
|
@item
|
|
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
|
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
|
|
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
|
|
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
|
|
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
|
|
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
|
|
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
|
|
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
|
|
POSSIBILITY OF SUCH DAMAGES.
|
|
@end enumerate
|
|
|
|
@iftex
|
|
@heading END OF TERMS AND CONDITIONS
|
|
@end iftex
|
|
@ifinfo
|
|
@center END OF TERMS AND CONDITIONS
|
|
@end ifinfo
|
|
|
|
@page
|
|
@unnumberedsec How to Apply These Terms to Your New Programs
|
|
|
|
If you develop a new program, and you want it to be of the greatest
|
|
possible use to the public, the best way to achieve this is to make it
|
|
free software which everyone can redistribute and change under these terms.
|
|
|
|
To do so, attach the following notices to the program. It is safest
|
|
to attach them to the start of each source file to most effectively
|
|
convey the exclusion of warranty; and each file should have at least
|
|
the ``copyright'' line and a pointer to where the full notice is found.
|
|
|
|
@smallexample
|
|
@var{one line to give the program's name and an idea of what it does.}
|
|
Copyright (C) 19@var{yy} @var{name of author}
|
|
|
|
This program is free software; you can redistribute it and/or
|
|
modify it under the terms of the GNU General Public License
|
|
as published by the Free Software Foundation; either version 2
|
|
of the License, or (at your option) any later version.
|
|
|
|
This program is distributed in the hope that it will be useful,
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
|
GNU General Public License for more details.
|
|
|
|
You should have received a copy of the GNU General Public License
|
|
along with this program; if not, write to the Free Software
|
|
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
|
|
@end smallexample
|
|
|
|
Also add information on how to contact you by electronic and paper mail.
|
|
|
|
If the program is interactive, make it output a short notice like this
|
|
when it starts in an interactive mode:
|
|
|
|
@smallexample
|
|
Gnomovision version 69, Copyright (C) 19@var{yy} @var{name of author}
|
|
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details
|
|
type `show w'. This is free software, and you are welcome
|
|
to redistribute it under certain conditions; type `show c'
|
|
for details.
|
|
@end smallexample
|
|
|
|
The hypothetical commands @samp{show w} and @samp{show c} should show
|
|
the appropriate parts of the General Public License. Of course, the
|
|
commands you use may be called something other than @samp{show w} and
|
|
@samp{show c}; they could even be mouse-clicks or menu items---whatever
|
|
suits your program.
|
|
|
|
You should also get your employer (if you work as a programmer) or your
|
|
school, if any, to sign a ``copyright disclaimer'' for the program, if
|
|
necessary. Here is a sample; alter the names:
|
|
|
|
@smallexample
|
|
@group
|
|
Yoyodyne, Inc., hereby disclaims all copyright
|
|
interest in the program `Gnomovision'
|
|
(which makes passes at compilers) written
|
|
by James Hacker.
|
|
|
|
@var{signature of Ty Coon}, 1 April 1989
|
|
Ty Coon, President of Vice
|
|
@end group
|
|
@end smallexample
|
|
|
|
This General Public License does not permit incorporating your program into
|
|
proprietary programs. If your program is a subroutine library, you may
|
|
consider it more useful to permit linking proprietary applications with the
|
|
library. If this is what you want to do, use the GNU Library General
|
|
Public License instead of this License.
|
|
|
|
@node Concept Index, , Copying, Top
|
|
@unnumbered Concept Index
|
|
@printindex cp
|
|
|
|
@contents
|
|
|
|
@bye
|