mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00

[svn] Tweaks and tag use improvements.

By Aaron S. Hawley.
hniksic 2003-09-30 14:09:06 -07:00
parent c1f92cae25
commit 1c01316428
3 changed files with 120 additions and 116 deletions

@ -1,3 +1,8 @@
2003-09-21 Aaron S. Hawley <Aaron.Hawley@uvm.edu>
* wget.texi: Split version to version.texi. Tweak documentation's
phrasing and markup.
2003-09-21 Hrvoje Niksic <hniksic@xemacs.org> 2003-09-21 Hrvoje Niksic <hniksic@xemacs.org>
* wget.texi: Documented the new timeout options. * wget.texi: Documented the new timeout options.

doc/version.texi Normal file

@ -0,0 +1 @@
@set VERSION 1.9-cvs

@ -2,7 +2,9 @@
@c %**start of header @c %**start of header
@setfilename wget.info @setfilename wget.info
@settitle GNU Wget Manual @include version.texi
@set UPDATED May 2003
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es. @c Disable the monstrous rectangles beside overfull hbox-es.
@finalout @finalout
@c Use `odd' to print double-sided. @c Use `odd' to print double-sided.
@ -19,18 +21,12 @@
@set Wget Wget @set Wget Wget
@c man title Wget The non-interactive network downloader. @c man title Wget The non-interactive network downloader.
@c This should really be generated automatically, possibly by including @dircategory Network Applications
@c an auto-generated file.
@set VERSION 1.9-cvs
@set UPDATED September 2003
@dircategory Net Utilities
@dircategory World Wide Web
@direntry @direntry
* Wget: (wget). The non-interactive network downloader. * Wget: (wget). The non-interactive network downloader.
@end direntry @end direntry
@ifinfo @ifnottex
This file documents the the GNU Wget utility for downloading network This file documents the the GNU Wget utility for downloading network
data. data.
@ -56,11 +52,11 @@ Documentation License'', with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the section Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''. entitled ``GNU Free Documentation License''.
@c man end @c man end
@end ifinfo @end ifnottex
@titlepage @titlepage
@title GNU Wget @title GNU Wget @value{VERSION}
@subtitle The noninteractive downloading utility @subtitle The non-interactive download utility
@subtitle Updated for Wget @value{VERSION}, @value{UPDATED} @subtitle Updated for Wget @value{VERSION}, @value{UPDATED}
@author by Hrvoje Nik@v{s}i@'{c} and the developers @author by Hrvoje Nik@v{s}i@'{c} and the developers
@ -75,7 +71,7 @@ GNU Info entry for @file{wget}.
@page @page
@vskip 0pt plus 1filll @vskip 0pt plus 1filll
Copyright @copyright{} 1996, 1997, 1998, 2000, 2001 Free Software Copyright @copyright{} 1996, 1997, 1998, 2000, 2001, 2003 Free Software
Foundation, Inc. Foundation, Inc.
Permission is granted to copy, distribute and/or modify this document Permission is granted to copy, distribute and/or modify this document
@ -87,14 +83,14 @@ Back-Cover Texts. A copy of the license is included in the section
entitled ``GNU Free Documentation License''. entitled ``GNU Free Documentation License''.
@end titlepage @end titlepage
@ifinfo @ifnottex
@node Top, Overview, (dir), (dir) @node Top, Overview, (dir), (dir)
@top Wget @value{VERSION} @top Wget @value{VERSION}
This manual documents version @value{VERSION} of GNU Wget, the freely This manual documents version @value{VERSION} of GNU Wget, the freely
available utility for network download. available utility for network downloads.
Copyright @copyright{} 1996, 1997, 1998, 2000, 2001 Free Software Copyright @copyright{} 1996, 1997, 1998, 2000, 2001, 2003 Free Software
Foundation, Inc. Foundation, Inc.
@menu @menu
@ -110,7 +106,7 @@ Foundation, Inc.
* Copying:: You may give out copies of Wget and of this manual. * Copying:: You may give out copies of Wget and of this manual.
* Concept Index:: Topics covered by this manual. * Concept Index:: Topics covered by this manual.
@end menu @end menu
@end ifinfo @end ifnottex
@node Overview, Invoking, Top, Top @node Overview, Invoking, Top, Top
@chapter Overview @chapter Overview
@ -187,7 +183,7 @@ also supports the passive @sc{ftp} downloading as an option.
@sp 1 @sp 1
@item @item
Builtin features offer mechanisms to tune which links you wish to follow Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}). (@pxref{Following Links}).
@sp 1 @sp 1
@ -632,7 +628,7 @@ servers that support the @code{Range} header.
Select the type of the progress indicator you wish to use. Legal Select the type of the progress indicator you wish to use. Legal
indicators are ``dot'' and ``bar''. indicators are ``dot'' and ``bar''.
The ``bar'' indicator is used by default. It draws an ASCII progress The ``bar'' indicator is used by default. It draws an @sc{ascii} progress
bar graphics (a.k.a ``thermometer'' display) indicating the status of bar graphics (a.k.a ``thermometer'' display) indicating the status of
retrieval. If the output is not a TTY, the ``dot'' bar will be used by retrieval. If the output is not a TTY, the ``dot'' bar will be used by
default. default.
@ -672,19 +668,19 @@ Print the headers sent by @sc{http} servers and responses sent by
@item --spider @item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider}, When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they which means that it will not download the pages, just check that they
are there. You can use it to check your bookmarks, e.g. with: are there. For example, you can use Wget to check your bookmarks:
@example @example
wget --spider --force-html -i bookmarks.html wget --spider --force-html -i bookmarks.html
@end example @end example
This feature needs much more work for Wget to get close to the This feature needs much more work for Wget to get close to the
functionality of real @sc{www} spiders. functionality of real web spiders.
@cindex timeout @cindex timeout
@item -T seconds @item -T seconds
@itemx --timeout=@var{seconds} @itemx --timeout=@var{seconds}
Set the network timeouts to @var{seconds} seconds. This is equivalent Set the network timeout to @var{seconds} seconds. This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time. @samp{--read-timeout}, all at the same time.
@ -950,7 +946,7 @@ downloaded and the URL does not end with the regexp
to be appended to the local filename. This is useful, for instance, when to be appended to the local filename. This is useful, for instance, when
you're mirroring a remote site that uses @samp{.asp} pages, but you want you're mirroring a remote site that uses @samp{.asp} pages, but you want
the mirrored pages to be viewable on your stock Apache server. Another the mirrored pages to be viewable on your stock Apache server. Another
good use for this is when you're downloading the output of CGIs. A URL good use for this is when you're downloading CGI-generated materials. A URL
like @samp{http://site.com/article.cgi?25} will be saved as like @samp{http://site.com/article.cgi?25} will be saved as
@file{article.cgi?25.html}. @file{article.cgi?25.html}.
@ -1217,7 +1213,7 @@ recurse through them, but in the future it should be enhanced to do
this. this.
Note that when retrieving a file (not a directory) because it was Note that when retrieving a file (not a directory) because it was
specified on the commandline, rather than because it was recursed to, specified on the command-line, rather than because it was recursed to,
this option has no effect. Symbolic links are always traversed in this this option has no effect. Symbolic links are always traversed in this
case. case.
@end table @end table
@ -1264,7 +1260,7 @@ created in the first place.
After the download is complete, convert the links in the document to After the download is complete, convert the links in the document to
make them suitable for local viewing. This affects not only the visible make them suitable for local viewing. This affects not only the visible
hyperlinks, but any part of the document that links to external content, hyperlinks, but any part of the document that links to external content,
such as embedded images, links to style sheets, hyperlinks to non-HTML such as embedded images, links to style sheets, hyperlinks to non-@sc{html}
content, etc. content, etc.
Each link will be changed in one of the two ways: Each link will be changed in one of the two ways:
@ -1319,10 +1315,10 @@ directory listings. It is currently equivalent to
@item -p @item -p
@itemx --page-requisites @itemx --page-requisites
This option causes Wget to download all the files that are necessary to This option causes Wget to download all the files that are necessary to
properly display a given HTML page. This includes such things as properly display a given @sc{html} page. This includes such things as
inlined images, sounds, and referenced stylesheets. inlined images, sounds, and referenced stylesheets.
Ordinarily, when downloading a single HTML page, any requisite documents Ordinarily, when downloading a single @sc{html} page, any requisite documents
that may be needed to display it properly are not downloaded. Using that may be needed to display it properly are not downloaded. Using
@samp{-r} together with @samp{-l} can help, but since Wget does not @samp{-r} together with @samp{-l} can help, but since Wget does not
ordinarily distinguish between external and inlined documents, one is ordinarily distinguish between external and inlined documents, one is
@ -1367,8 +1363,8 @@ wget -r -l 0 -p http://@var{site}/1.html
would download just @file{1.html} and @file{1.gif}, but unfortunately would download just @file{1.html} and @file{1.gif}, but unfortunately
this is not the case, because @samp{-l 0} is equivalent to this is not the case, because @samp{-l 0} is equivalent to
@samp{-l inf}---that is, infinite recursion. To download a single HTML @samp{-l inf}---that is, infinite recursion. To download a single @sc{html}
page (or a handful of them, all specified on the commandline or in a page (or a handful of them, all specified on the command-line or in a
@samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off @samp{-i} @sc{url} input file) and its (or their) requisites, simply leave off
@samp{-r} and @samp{-l}: @samp{-r} and @samp{-l}:
@ -1392,21 +1388,21 @@ external document link is any URL specified in an @code{<A>} tag, an
@code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK @code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK
REL="stylesheet">}. REL="stylesheet">}.
@cindex HTML comments @cindex @sc{html} comments
@cindex comments, HTML @cindex comments, @sc{html}
@item --strict-comments @item --strict-comments
Turn on strict parsing of HTML comments. The default is to terminate Turn on strict parsing of @sc{html} comments. The default is to terminate
comments at the first occurrence of @samp{-->}. comments at the first occurrence of @samp{-->}.
According to specifications, HTML comments are expressed as SGML According to specifications, @sc{html} comments are expressed as @sc{sgml}
@dfn{declarations}. Declaration is special markup that begins with @dfn{declarations}. Declaration is special markup that begins with
@samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that @samp{<!} and ends with @samp{>}, such as @samp{<!DOCTYPE ...>}, that
may contain comments between a pair of @samp{--} delimiters. HTML may contain comments between a pair of @samp{--} delimiters. @sc{html}
comments are ``empty declarations'', SGML declarations without any comments are ``empty declarations'', @sc{sgml} declarations without any
non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and non-comment text. Therefore, @samp{<!--foo-->} is a valid comment, and
so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not. so is @samp{<!--one-- --two-->}, but @samp{<!--1--2-->} is not.
On the other hand, most HTML writers don't perceive comments as anything On the other hand, most @sc{html} writers don't perceive comments as anything
other than text delimited with @samp{<!--} and @samp{-->}, which is not other than text delimited with @samp{<!--} and @samp{-->}, which is not
quite the same. For example, something like @samp{<!------------>} quite the same. For example, something like @samp{<!------------>}
works as a valid comment as long as the number of dashes is a multiple works as a valid comment as long as the number of dashes is a multiple
@ -1452,7 +1448,7 @@ Wget will ignore all the @sc{ftp} links.
@cindex tag-based recursive pruning @cindex tag-based recursive pruning
@item --follow-tags=@var{list} @item --follow-tags=@var{list}
Wget has an internal table of HTML tag / attribute pairs that it Wget has an internal table of @sc{html} tag / attribute pairs that it
considers when looking for linked documents during a recursive considers when looking for linked documents during a recursive
retrieval. If a user wants only a subset of those tags to be retrieval. If a user wants only a subset of those tags to be
considered, however, he or she should be specify such tags in a considered, however, he or she should be specify such tags in a
@ -1461,11 +1457,11 @@ comma-separated @var{list} with this option.
@item -G @var{list} @item -G @var{list}
@itemx --ignore-tags=@var{list} @itemx --ignore-tags=@var{list}
This is the opposite of the @samp{--follow-tags} option. To skip This is the opposite of the @samp{--follow-tags} option. To skip
certain HTML tags when recursively looking for documents to download, certain @sc{html} tags when recursively looking for documents to download,
specify them in a comma-separated @var{list}. specify them in a comma-separated @var{list}.
In the past, the @samp{-G} option was the best bet for downloading a In the past, the @samp{-G} option was the best bet for downloading a
single page and its requisites, using a commandline like: single page and its requisites, using a command-line like:
@example @example
wget -Ga,area -H -k -K -r http://@var{site}/@var{document} wget -Ga,area -H -k -K -r http://@var{site}/@var{document}
@ -1519,18 +1515,18 @@ This is a useful option, since it guarantees that only the files
GNU Wget is capable of traversing parts of the Web (or a single GNU Wget is capable of traversing parts of the Web (or a single
@sc{http} or @sc{ftp} server), following links and directory structure. @sc{http} or @sc{ftp} server), following links and directory structure.
We refer to this as to @dfn{recursive retrieving}, or @dfn{recursion}. We refer to this as to @dfn{recursive retrieval}, or @dfn{recursion}.
With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from
the given @sc{url}, documents, retrieving the files the @sc{html} the given @sc{url}, documents, retrieving the files the @sc{html}
document was referring to, through markups like @code{href}, or document was referring to, through markup like @code{href}, or
@code{src}. If the freshly downloaded file is also of type @code{src}. If the freshly downloaded file is also of type
@code{text/html} or @code{application/xhtml+xml}, it will be parsed and @code{text/html} or @code{application/xhtml+xml}, it will be parsed and
followed further. followed further.
Recursive retrieval of @sc{http} and @sc{html} content is Recursive retrieval of @sc{http} and @sc{html} content is
@dfn{breadth-first}. This means that Wget first downloads the requested @dfn{breadth-first}. This means that Wget first downloads the requested
HTML document, then the documents linked from that document, then the @sc{html} document, then the documents linked from that document, then the
documents linked by them, and so on. In other words, Wget first documents linked by them, and so on. In other words, Wget first
downloads the documents at depth 1, then those at depth 2, and so on downloads the documents at depth 1, then those at depth 2, and so on
until the specified maximum depth. until the specified maximum depth.
@ -1615,7 +1611,7 @@ your Wget into a small version of google.
However, visiting different hosts, or @dfn{host spanning,} is sometimes However, visiting different hosts, or @dfn{host spanning,} is sometimes
a useful option. Maybe the images are served from a different server. a useful option. Maybe the images are served from a different server.
Maybe you're mirroring a site that consists of pages interlinked between Maybe you're mirroring a site that consists of pages interlinked between
three servers. Maybe the server has two equivalent names, and the HTML three servers. Maybe the server has two equivalent names, and the @sc{html}
pages refer to both interchangeably. pages refer to both interchangeably.
@table @asis @table @asis
@ -2101,7 +2097,7 @@ after the @samp{=}. Simple Boolean values can be set or unset using
Boolean allowed in some cases is the @dfn{lockable Boolean}, which may Boolean allowed in some cases is the @dfn{lockable Boolean}, which may
be set to @samp{on}, @samp{off}, @samp{always}, or @samp{never}. If an be set to @samp{on}, @samp{off}, @samp{always}, or @samp{never}. If an
option is set to @samp{always} or @samp{never}, that value will be option is set to @samp{always} or @samp{never}, that value will be
locked in for the duration of the Wget invocation---commandline options locked in for the duration of the Wget invocation---command-line options
will not override. will not override.
Some commands take pseudo-arbitrary values. @var{address} values can be Some commands take pseudo-arbitrary values. @var{address} values can be
@ -2109,7 +2105,7 @@ hostnames or dotted-quad IP addresses. @var{n} can be any positive
integer, or @samp{inf} for infinity, where appropriate. @var{string} integer, or @samp{inf} for infinity, where appropriate. @var{string}
values can be any non-empty string. values can be any non-empty string.
Most of these commands have commandline equivalents (@pxref{Invoking}), Most of these commands have command-line equivalents (@pxref{Invoking}),
though some of the more obscure or rarely used ones do not. though some of the more obscure or rarely used ones do not.
@table @asis @table @asis
@ -2213,7 +2209,7 @@ Follow @sc{ftp} links from @sc{html} documents---the same as
@samp{--follow-ftp}. @samp{--follow-ftp}.
@item follow_tags = @var{string} @item follow_tags = @var{string}
Only follow certain HTML tags when doing a recursive retrieval, just like Only follow certain @sc{html} tags when doing a recursive retrieval, just like
@samp{--follow-tags}. @samp{--follow-tags}.
@item force_html = on/off @item force_html = on/off
@ -2250,7 +2246,7 @@ When set to on, ignore @code{Content-Length} header; the same as
@samp{--ignore-length}. @samp{--ignore-length}.
@item ignore_tags = @var{string} @item ignore_tags = @var{string}
Ignore certain HTML tags when doing a recursive retrieval, just like Ignore certain @sc{html} tags when doing a recursive retrieval, just like
@samp{-G} / @samp{--ignore-tags}. @samp{-G} / @samp{--ignore-tags}.
@item include_directories = @var{string} @item include_directories = @var{string}
@ -2262,7 +2258,7 @@ Read the @sc{url}s from @var{string}, like @samp{-i}.
@item kill_longer = on/off @item kill_longer = on/off
Consider data longer than specified in content-length header as invalid Consider data longer than specified in content-length header as invalid
(and retry getting it). The default behaviour is to save as much data (and retry getting it). The default behavior is to save as much data
as there is, provided there is more than or equal to the value in as there is, provided there is more than or equal to the value in
@code{Content-Length}. @code{Content-Length}.
@ -2298,14 +2294,14 @@ proxy loading, instead of the one specified in environment.
Set the output filename---the same as @samp{-O}. Set the output filename---the same as @samp{-O}.
@item page_requisites = on/off @item page_requisites = on/off
Download all ancillary documents necessary for a single HTML page to Download all ancillary documents necessary for a single @sc{html} page to
display properly---the same as @samp{-p}. display properly---the same as @samp{-p}.
@item passive_ftp = on/off/always/never @item passive_ftp = on/off/always/never
Set passive @sc{ftp}---the same as @samp{--passive-ftp}. Some scripts Set passive @sc{ftp}---the same as @samp{--passive-ftp}. Some scripts
and @samp{.pm} (Perl module) files download files using @samp{wget and @samp{.pm} (Perl module) files download files using @samp{wget
--passive-ftp}. If your firewall does not allow this, you can set --passive-ftp}. If your firewall does not allow this, you can set
@samp{passive_ftp = never} to override the commandline. @samp{passive_ftp = never} to override the command-line.
@item passwd = @var{string} @item passwd = @var{string}
Set your @sc{ftp} password to @var{password}. Without this setting, the Set your @sc{ftp} password to @var{password}. Without this setting, the
@ -2525,7 +2521,7 @@ wget --convert-links -r http://www.gnu.org/ -o gnulog
@end example @end example
@item @item
Retrieve only one HTML page, but make sure that all the elements needed Retrieve only one @sc{html} page, but make sure that all the elements needed
for the page to be displayed, such as inline images and external style for the page to be displayed, such as inline images and external style
sheets, are also downloaded. Also make sure the downloaded page sheets, are also downloaded. Also make sure the downloaded page
references the downloaded links. references the downloaded links.
@ -2534,7 +2530,7 @@ references the downloaded links.
wget -p --convert-links http://www.server.com/dir/page.html wget -p --convert-links http://www.server.com/dir/page.html
@end example @end example
The HTML page will be saved to @file{www.server.com/dir/page.html}, and The @sc{html} page will be saved to @file{www.server.com/dir/page.html}, and
the images, stylesheets, etc., somewhere under @file{www.server.com/}, the images, stylesheets, etc., somewhere under @file{www.server.com/},
depending on where they were on the remote server. depending on where they were on the remote server.
@ -2648,7 +2644,7 @@ crontab
In addition to the above, you want the links to be converted for local In addition to the above, you want the links to be converted for local
viewing. But, after having read this manual, you know that link viewing. But, after having read this manual, you know that link
conversion doesn't play well with timestamping, so you also want Wget to conversion doesn't play well with timestamping, so you also want Wget to
back up the original HTML files before the conversion. Wget invocation back up the original @sc{html} files before the conversion. Wget invocation
would look like this: would look like this:
@example @example
@ -2658,7 +2654,7 @@ wget --mirror --convert-links --backup-converted \
@item @item
But you've also noticed that local viewing doesn't work all that well But you've also noticed that local viewing doesn't work all that well
when HTML files are saved under extensions other than @samp{.html}, when @sc{html} files are saved under extensions other than @samp{.html},
perhaps because they were served as @file{index.cgi}. So you'd like perhaps because they were served as @file{index.cgi}. So you'd like
Wget to rename all the files served with content-type @samp{text/html} Wget to rename all the files served with content-type @samp{text/html}
or @samp{application/xhtml+xml} to @file{@var{name}.html}. or @samp{application/xhtml+xml} to @file{@var{name}.html}.
@ -2787,9 +2783,8 @@ features and web, reporting Wget bugs (those that you think may be of
interest to the public) and mailing announcements. You are welcome to interest to the public) and mailing announcements. You are welcome to
subscribe. The more people on the list, the better! subscribe. The more people on the list, the better!
To subscribe, send mail to @email{wget-subscribe@@sunsite.dk}. To subscribe, simply send mail to @email{wget-subscribe@@sunsite.dk}.
the magic word @samp{subscribe} in the subject line. Unsubscribe by Unsubscribe by mailing to @email{wget-unsubscribe@@sunsite.dk}.
mailing to @email{wget-unsubscribe@@sunsite.dk}.
The mailing list is archived at @url{http://fly.srk.fer.hr/archive/wget}. The mailing list is archived at @url{http://fly.srk.fer.hr/archive/wget}.
Alternative archive is available at Alternative archive is available at
@ -2810,7 +2805,7 @@ simple guidelines.
@enumerate @enumerate
@item @item
Please try to ascertain that the behaviour you see really is a bug. If Please try to ascertain that the behavior you see really is a bug. If
Wget crashes, it's a bug. If Wget does not behave as documented, Wget crashes, it's a bug. If Wget does not behave as documented,
it's a bug. If things work strange, but you are not sure about the way it's a bug. If things work strange, but you are not sure about the way
they are supposed to work, it might well be a bug. they are supposed to work, it might well be a bug.
@ -2914,25 +2909,28 @@ As long as Wget is only retrieving static pages, and doing it at a
reasonable rate (see the @samp{--wait} option), there's not much of a reasonable rate (see the @samp{--wait} option), there's not much of a
problem. The trouble is that Wget can't tell the difference between the problem. The trouble is that Wget can't tell the difference between the
smallest static page and the most demanding CGI. A site I know has a smallest static page and the most demanding CGI. A site I know has a
section handled by an, uh, @dfn{bitchin'} CGI Perl script that converts section handled by a CGI Perl script that converts Info files to @sc{html} on
Info files to HTML on the fly. The script is slow, but works well the fly. The script is slow, but works well enough for human users
enough for human users viewing an occasional Info file. However, when viewing an occasional Info file. However, when someone's recursive Wget
someone's recursive Wget download stumbles upon the index page that download stumbles upon the index page that links to all the Info files
links to all the Info files through the script, the system is brought to through the script, the system is brought to its knees without providing
its knees without providing anything useful to the downloader. anything useful to the user (This task of converting Info files could be
done locally and access to Info documentation for all installed GNU
software on a system is available from the @code{info} command).
To avoid this kind of accident, as well as to preserve privacy for To avoid this kind of accident, as well as to preserve privacy for
documents that need to be protected from well-behaved robots, the documents that need to be protected from well-behaved robots, the
concept of @dfn{robot exclusion} has been invented. The idea is that concept of @dfn{robot exclusion} was invented. The idea is that
the server administrators and document authors can specify which the server administrators and document authors can specify which
portions of the site they wish to protect from the robots. portions of the site they wish to protect from robots and those
they will permit access.
The most popular mechanism, and the de facto standard supported by all The most popular mechanism, and the @i{de facto} standard supported by
the major robots, is the ``Robots Exclusion Standard'' (RES) written by all the major robots, is the ``Robots Exclusion Standard'' (RES) written
Martijn Koster et al. in 1994. It specifies the format of a text file by Martijn Koster et al. in 1994. It specifies the format of a text
containing directives that instruct the robots which URL paths to avoid. file containing directives that instruct the robots which URL paths to
To be found by the robots, the specifications must be placed in avoid. To be found by the robots, the specifications must be placed in
@file{/robots.txt} in the server root, which the robots are supposed to @file{/robots.txt} in the server root, which the robots are expected to
download and parse. download and parse.
Although Wget is not a web robot in the strictest sense of the word, it Although Wget is not a web robot in the strictest sense of the word, it
@ -3018,9 +3016,9 @@ me).
@iftex @iftex
GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@arsdigita.com}. GNU Wget was written by Hrvoje Nik@v{s}i@'{c} @email{hniksic@@arsdigita.com}.
@end iftex @end iftex
@ifinfo @ifnottex
GNU Wget was written by Hrvoje Niksic @email{hniksic@@arsdigita.com}. GNU Wget was written by Hrvoje Niksic @email{hniksic@@arsdigita.com}.
@end ifinfo @end ifnottex
However, its development could never have gone as far as it has, were it However, its development could never have gone as far as it has, were it
not for the help of many people, either with bug reports, feature not for the help of many people, either with bug reports, feature
proposals, patches, or letters saying ``Thanks!''. proposals, patches, or letters saying ``Thanks!''.
@ -3048,10 +3046,10 @@ Gordon Matzigkeit---@file{.netrc} support.
Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en Zlatko @v{C}alu@v{s}i@'{c}, Tomislav Vujec and Dra@v{z}en
Ka@v{c}ar---feature suggestions and ``philosophical'' discussions. Ka@v{c}ar---feature suggestions and ``philosophical'' discussions.
@end iftex @end iftex
@ifinfo @ifnottex
Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions Zlatko Calusic, Tomislav Vujec and Drazen Kacar---feature suggestions
and ``philosophical'' discussions. and ``philosophical'' discussions.
@end ifinfo @end ifnottex
@item @item
Darko Budor---initial port to Windows. Darko Budor---initial port to Windows.
@ -3064,17 +3062,17 @@ Antonio Rosella---help and suggestions, plus the Italian translation.
Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and Tomislav Petrovi@'{c}, Mario Miko@v{c}evi@'{c}---many bug reports and
suggestions. suggestions.
@end iftex @end iftex
@ifinfo @ifnottex
Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions. Tomislav Petrovic, Mario Mikocevic---many bug reports and suggestions.
@end ifinfo @end ifnottex
@item @item
@iftex @iftex
Fran@,{c}ois Pinard---many thorough bug reports and discussions. Fran@,{c}ois Pinard---many thorough bug reports and discussions.
@end iftex @end iftex
@ifinfo @ifnottex
Francois Pinard---many thorough bug reports and discussions. Francois Pinard---many thorough bug reports and discussions.
@end ifinfo @end ifnottex
@item @item
Karl Eichwalder---lots of help with internationalization and other Karl Eichwalder---lots of help with internationalization and other
@ -3112,9 +3110,9 @@ Noel Cragg,
@iftex @iftex
Kristijan @v{C}onka@v{s}, Kristijan @v{C}onka@v{s},
@end iftex @end iftex
@ifinfo @ifnottex
Kristijan Conkas, Kristijan Conkas,
@end ifinfo @end ifnottex
John Daily, John Daily,
Andrew Davison, Andrew Davison,
Andrew Deryabin, Andrew Deryabin,
@ -3123,16 +3121,16 @@ Marc Duponcheel,
@iftex @iftex
Damir D@v{z}eko, Damir D@v{z}eko,
@end iftex @end iftex
@ifinfo @ifnottex
Damir Dzeko, Damir Dzeko,
@end ifinfo @end ifnottex
Alan Eldridge, Alan Eldridge,
@iftex @iftex
Aleksandar Erkalovi@'{c}, Aleksandar Erkalovi@'{c},
@end iftex @end iftex
@ifinfo @ifnottex
Aleksandar Erkalovic, Aleksandar Erkalovic,
@end ifinfo @end ifnottex
Andy Eskilsson, Andy Eskilsson,
Christian Fraenkel, Christian Fraenkel,
Masashi Fujita, Masashi Fujita,
@ -3154,22 +3152,22 @@ Simon Josefsson,
@iftex @iftex
Mario Juri@'{c}, Mario Juri@'{c},
@end iftex @end iftex
@ifinfo @ifnottex
Mario Juric, Mario Juric,
@end ifinfo @end ifnottex
@iftex @iftex
Hack Kampbj@o rn, Hack Kampbj@o rn,
@end iftex @end iftex
@ifinfo @ifnottex
Hack Kampbjorn, Hack Kampbjorn,
@end ifinfo @end ifnottex
Const Kaplinsky, Const Kaplinsky,
@iftex @iftex
Goran Kezunovi@'{c}, Goran Kezunovi@'{c},
@end iftex @end iftex
@ifinfo @ifnottex
Goran Kezunovic, Goran Kezunovic,
@end ifinfo @end ifnottex
Robert Kleine, Robert Kleine,
KOJIMA Haime, KOJIMA Haime,
Fila Kolodny, Fila Kolodny,
@ -3180,17 +3178,17 @@ $\Sigma\acute{\iota}\mu o\varsigma\;
\Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$ \Xi\varepsilon\nu\iota\tau\acute{\epsilon}\lambda\lambda\eta\varsigma$
(Simos KSenitellis), (Simos KSenitellis),
@end tex @end tex
@ifinfo @ifnottex
Simos KSenitellis, Simos KSenitellis,
@end ifinfo @end ifnottex
Hrvoje Lacko, Hrvoje Lacko,
Daniel S. Lewart, Daniel S. Lewart,
@iftex @iftex
Nicol@'{a}s Lichtmeier, Nicol@'{a}s Lichtmeier,
@end iftex @end iftex
@ifinfo @ifnottex
Nicolas Lichtmeier, Nicolas Lichtmeier,
@end ifinfo @end ifnottex
Dave Love, Dave Love,
Alexander V. Lukyanov, Alexander V. Lukyanov,
Jordan Mendelson, Jordan Mendelson,
@ -3204,16 +3202,16 @@ Steve Pothier,
@iftex @iftex
Jan P@v{r}ikryl, Jan P@v{r}ikryl,
@end iftex @end iftex
@ifinfo @ifnottex
Jan Prikryl, Jan Prikryl,
@end ifinfo @end ifnottex
Marin Purgar, Marin Purgar,
@iftex @iftex
Csaba R@'{a}duly, Csaba R@'{a}duly,
@end iftex @end iftex
@ifinfo @ifnottex
Csaba Raduly, Csaba Raduly,
@end ifinfo @end ifnottex
Keith Refson, Keith Refson,
Tyler Riddle, Tyler Riddle,
Tobias Ringstrom, Tobias Ringstrom,
@ -3221,9 +3219,9 @@ Tobias Ringstrom,
@tex @tex
Juan Jos\'{e} Rodr\'{\i}gues, Juan Jos\'{e} Rodr\'{\i}gues,
@end tex @end tex
@ifinfo @ifnottex
Juan Jose Rodrigues, Juan Jose Rodrigues,
@end ifinfo @end ifnottex
Edward J. Sabol, Edward J. Sabol,
Heinz Salzmann, Heinz Salzmann,
Robert Schmidt, Robert Schmidt,
@ -3245,9 +3243,9 @@ Jasmin Zainul,
@iftex @iftex
Bojan @v{Z}drnja, Bojan @v{Z}drnja,
@end iftex @end iftex
@ifinfo @ifnottex
Bojan Zdrnja, Bojan Zdrnja,
@end ifinfo @end ifnottex
Kristijan Zimmer. Kristijan Zimmer.
Apologies to all who I accidentally left out, and many thanks to all the Apologies to all who I accidentally left out, and many thanks to all the
@ -3388,9 +3386,9 @@ modification follow.
@iftex @iftex
@unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION @unnumberedsec TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end iftex @end iftex
@ifinfo @ifnottex
@center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION @center TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
@end ifinfo @end ifnottex
@enumerate @enumerate
@item @item
@ -3613,9 +3611,9 @@ of promoting the sharing and reuse of software generally.
@iftex @iftex
@heading NO WARRANTY @heading NO WARRANTY
@end iftex @end iftex
@ifinfo @ifnottex
@center NO WARRANTY @center NO WARRANTY
@end ifinfo @end ifnottex
@cindex no warranty @cindex no warranty
@item @item
@ -3644,9 +3642,9 @@ POSSIBILITY OF SUCH DAMAGES.
@iftex @iftex
@heading END OF TERMS AND CONDITIONS @heading END OF TERMS AND CONDITIONS
@end iftex @end iftex
@ifinfo @ifnottex
@center END OF TERMS AND CONDITIONS @center END OF TERMS AND CONDITIONS
@end ifinfo @end ifnottex
@page @page
@unnumberedsec How to Apply These Terms to Your New Programs @unnumberedsec How to Apply These Terms to Your New Programs
@ -3803,13 +3801,13 @@ subsequent modification by readers is not Transparent. A copy that is
not ``Transparent'' is called ``Opaque''. not ``Transparent'' is called ``Opaque''.
Examples of suitable formats for Transparent copies include plain Examples of suitable formats for Transparent copies include plain
ASCII without markup, Texinfo input format, LaTeX input format, SGML @sc{ascii} without markup, Texinfo input format, LaTeX input format, @sc{sgml}
or XML using a publicly available DTD, and standard-conforming simple or @sc{xml} using a publicly available @sc{dtd}, and standard-conforming simple
HTML designed for human modification. Opaque formats include @sc{html} designed for human modification. Opaque formats include
PostScript, PDF, proprietary formats that can be read and edited only PostScript, @sc{pdf}, proprietary formats that can be read and edited only
by proprietary word processors, SGML or XML for which the DTD and/or by proprietary word processors, @sc{sgml} or @sc{xml} for which the @sc{dtd} and/or
processing tools are not generally available, and the processing tools are not generally available, and the
machine-generated HTML produced by some word processors for output machine-generated @sc{html} produced by some word processors for output
purposes only. purposes only.
The ``Title Page'' means, for a printed book, the title page itself, The ``Title Page'' means, for a printed book, the title page itself,