[svn] Examples section of the documentation revamped.
Include EXAMPLES in the man page.
parent 171feaa3f2
commit 5379abeee0
@@ -1,3 +1,11 @@
+2001-12-08  Hrvoje Niksic  <hniksic@arsdigita.com>
+
+	* texi2pod.pl: Include the EXAMPLES section.
+
+	* wget.texi (Overview): Shorten the man page DESCRIPTION.
+	(Examples): Redo the Examples chapter.  Include it in the man
+	page.
+
 2001-12-01  Hrvoje Niksic  <hniksic@arsdigita.com>
 
 	* wget.texi: Update the manual with the new recursive retrieval
doc/wget.texi (284 changed lines)
@@ -112,14 +112,16 @@ Foundation, Inc.
 @cindex features
 
 @c man begin DESCRIPTION
-GNU Wget is a freely available network utility to retrieve files from
-the World Wide Web, using @sc{http} (Hyper Text Transfer Protocol) and
-@sc{ftp} (File Transfer Protocol), the two most widely used Internet
-protocols.  It has many useful features to make downloading easier, some
-of them being:
+GNU Wget is a free utility for non-interactive download of files from
+the Web.  It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as
+well as retrieval through @sc{http} proxies.
+@c man end
+
+This chapter is a partial overview of Wget's features.
 
 @itemize @bullet
 @item
+@c man begin DESCRIPTION
 Wget is non-interactive, meaning that it can work in the background,
 while the user is not logged on.  This allows you to start a retrieval
 and disconnect from the system, letting Wget finish the work.  By
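The "non-interactive" wording above pairs with a detach-and-log pattern at the shell level. A minimal sketch, assuming a POSIX shell; the URL and log file name are placeholders, not part of this commit:

    # start a retrieval, keep it running after logout, log progress to a file
    nohup wget -o download.log http://www.gnu.org/ &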
@@ -128,18 +130,23 @@ which can be a great hindrance when transferring a lot of data.
 @c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
-Wget is capable of descending recursively through the structure of
-@sc{html} documents and @sc{ftp} directory trees, making a local copy of
-the directory hierarchy similar to the one on the remote server.  This
-feature can be used to mirror archives and home pages, or traverse the
-web in search of data, like a @sc{www} robot (@pxref{Robots}).  In that
-spirit, Wget understands the @code{norobots} convention.
+@ignore
+@c man begin DESCRIPTION
+
+@c man end
+@end ignore
+@c man begin DESCRIPTION
+Wget can follow links in @sc{html} pages and create local versions of
+remote web sites, fully recreating the directory structure of the
+original site.  This is sometimes referred to as ``recursive
+downloading.''  While doing that, Wget respects the Robot Exclusion
+Standard (@file{/robots.txt}).  Wget can be instructed to convert the
+links in downloaded @sc{html} files to the local files for offline
+viewing.
 @c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
 File name wildcard matching and recursive mirroring of directories are
 available when retrieving via @sc{ftp}.  Wget can read the time-stamp
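The recursive-download text added above describes behavior driven by @samp{-r} and @samp{--convert-links}, both documented elsewhere in this manual. A sketch of the invocation it implies, with the depth limit spelled out and the host borrowed from the Examples chapter:

    # follow links up to five levels deep and rewrite them for offline viewing
    wget -r -l5 --convert-links http://www.gnu.org/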
@@ -148,52 +155,47 @@ locally.  Thus Wget can see if the remote file has changed since last
 retrieval, and automatically retrieve the new version if it has.  This
 makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
 pages.
-@c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
-Wget works exceedingly well on slow or unstable connections,
-retrying the document until it is fully retrieved, or until a
-user-specified retry count is surpassed.  It will try to resume the
-download from the point of interruption, using @code{REST} with @sc{ftp}
-and @code{Range} with @sc{http} servers that support them.
+@ignore
+@c man begin DESCRIPTION
+
+@c man end
+@end ignore
+@c man begin DESCRIPTION
+Wget has been designed for robustness over slow or unstable network
+connections; if a download fails due to a network problem, it will
+keep retrying until the whole file has been retrieved.  If the server
+supports regetting, it will instruct the server to continue the
+download from where it left off.
 @c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
-By default, Wget supports proxy servers, which can lighten the network
-load, speed up retrieval and provide access behind firewalls.  However,
-if you are behind a firewall that requires that you use a socks style
-gateway, you can get the socks library and build Wget with support for
-socks.  Wget also supports the passive @sc{ftp} downloading as an
-option.
-@c man end
+Wget supports proxy servers, which can lighten the network load, speed
+up retrieval and provide access behind firewalls.  However, if you are
+behind a firewall that requires that you use a socks style gateway, you
+can get the socks library and build Wget with support for socks.  Wget
+also supports the passive @sc{ftp} downloading as an option.
 
 @sp 1
-@c man begin DESCRIPTION
 @item
 Builtin features offer mechanisms to tune which links you wish to follow
 (@pxref{Following Links}).
-@c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
 The retrieval is conveniently traced with printing dots, each dot
 representing a fixed amount of data received (1KB by default).  These
 representations can be customized to your preferences.
-@c man end
 
 @sp 1
-@c man begin DESCRIPTION
 @item
 Most of the features are fully configurable, either through command line
 options, or via the initialization file @file{.wgetrc} (@pxref{Startup
 File}).  Wget allows you to define @dfn{global} startup files
 (@file{/usr/local/etc/wgetrc} by default) for site settings.
-@c man end
 
 @ignore
 @c man begin FILES
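The "regetting" behavior in the new robustness paragraph is exposed on the command line as @samp{-c} (continue), and the proxy support mentioned in the next item is conventionally driven through the environment. A hedged sketch; the URLs and the proxy host are illustrative, not taken from this commit:

    # resume a partial download, retrying without limit
    wget -c -t 0 http://www.gnu.org/
    # fetch through an HTTP proxy set in the environment
    http_proxy=http://proxy.example.com:8080/ wget http://www.gnu.org/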
@@ -208,14 +210,12 @@ User startup file.
 @end ignore
 
 @sp 1
-@c man begin DESCRIPTION
 @item
 Finally, GNU Wget is free software.  This means that everyone may use
 it, redistribute it and/or modify it under the terms of the GNU General
 Public License, as published by the Free Software Foundation
 (@pxref{Copying}).
 @end itemize
-@c man end
 
 @node Invoking, Recursive Retrieval, Overview, Top
 @chapter Invoking
@@ -1206,17 +1206,6 @@ likes to use a few options in addition to @samp{-p}:
 wget -E -H -k -K -p http://@var{site}/@var{document}
 @end example
 
-In one case you'll need to add a couple more options.  If @var{document}
-is a @code{<FRAMESET>} page, the "one more hop" that @samp{-p} gives you
-won't be enough---you'll get the @code{<FRAME>} pages that are
-referenced, but you won't get @emph{their} requisites.  Therefore, in
-this case you'll need to add @samp{-r -l1} to the commandline.  The
-@samp{-r -l1} will recurse from the @code{<FRAMESET>} page to to the
-@code{<FRAME>} pages, and the @samp{-p} will get their requisites.  If
-you're already using a recursion level of 1 or more, you'll need to up
-it by one.  In the future, @samp{-p} may be made smarter so that it'll
-do "two more hops" in the case of a @code{<FRAMESET>} page.
-
 To finish off this topic, it's worth knowing that Wget's idea of an
 external document link is any URL specified in an @code{<A>} tag, an
 @code{<AREA>} tag, or a @code{<LINK>} tag other than @code{<LINK
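For the @code{<FRAMESET>} case the removed paragraph covered, the workaround it described (adding @samp{-r -l1} on top of @samp{-p}) would have looked roughly like the sketch below; the URL is hypothetical:

    wget -r -l1 -p http://www.server.com/frames.html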
@@ -2199,16 +2188,14 @@ its line.
 @chapter Examples
 @cindex examples
 
-The examples are classified into three sections, because of clarity.
-The first section is a tutorial for beginners.  The second section
-explains some of the more complex program features.  The third section
-contains advice for mirror administrators, as well as even more complex
-features (that some would call perverted).
+@c man begin EXAMPLES
+The examples are divided into three sections loosely based on their
+complexity.
 
 @menu
 * Simple Usage::         Simple, basic usage of the program.
-* Advanced Usage::       Advanced techniques of usage.
-* Guru Usage::           Mirroring and the hairy stuff.
+* Advanced Usage::       Advanced tips.
+* Very Advanced Usage::  The hairy stuff.
 @end menu
 
 @node Simple Usage, Advanced Usage, Examples, Examples
@@ -2222,22 +2209,6 @@ Say you want to download a @sc{url}.  Just type:
 wget http://fly.srk.fer.hr/
 @end example
 
-The response will be something like:
-
-@example
-@group
---13:30:45--  http://fly.srk.fer.hr:80/en/
-           => `index.html'
-Connecting to fly.srk.fer.hr:80... connected!
-HTTP request sent, awaiting response... 200 OK
-Length: 4,694 [text/html]
-
-    0K -> ....                                                 [100%]
-
-13:30:46 (23.75 KB/s) - `index.html' saved [4694/4694]
-@end group
-@end example
-
 @item
 But what will happen if the connection is slow, and the file is lengthy?
 The connection will probably fail before the whole file is retrieved,
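The retained text goes on to discuss connections that fail mid-transfer; raising the retry count with @samp{--tries} is the usual answer. A sketch consistent with the surrounding section, reusing its host:

    # retry up to 45 times instead of giving up at the default limit
    wget --tries=45 http://fly.srk.fer.hr/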
@@ -2267,20 +2238,7 @@ The usage of @sc{ftp} is as simple.  Wget will take care of login and
 password.
 
 @example
-@group
-$ wget ftp://gnjilux.srk.fer.hr/welcome.msg
---10:08:47--  ftp://gnjilux.srk.fer.hr:21/welcome.msg
-           => `welcome.msg'
-Connecting to gnjilux.srk.fer.hr:21... connected!
-Logging in as anonymous ... Logged in!
-==> TYPE I ... done.  ==> CWD not needed.
-==> PORT ... done.    ==> RETR welcome.msg ... done.
-Length: 1,340 (unauthoritative)
-
-    0K -> .                                                    [100%]
-
-10:08:48 (1.28 MB/s) - `welcome.msg' saved [1340]
-@end group
+wget ftp://gnjilux.srk.fer.hr/welcome.msg
 @end example
 
 @item
@@ -2289,39 +2247,65 @@ parse it and convert it to @sc{html}.  Try:
 
 @example
 wget ftp://prep.ai.mit.edu/pub/gnu/
-lynx index.html
+links index.html
 @end example
 @end itemize
 
-@node Advanced Usage, Guru Usage, Simple Usage, Examples
+@node Advanced Usage, Very Advanced Usage, Simple Usage, Examples
 @section Advanced Usage
 
 @itemize @bullet
 @item
-You would like to read the list of @sc{url}s from a file?  Not a problem
-with that:
+You have a file that contains the URLs you want to download?  Use the
+@samp{-i} switch:
 
 @example
-wget -i file
+wget -i @var{file}
 @end example
 
 If you specify @samp{-} as file name, the @sc{url}s will be read from
 standard input.
 
 @item
-Create a mirror image of GNU @sc{www} site (with the same directory structure
-the original has) with only one try per document, saving the log of the
-activities to @file{gnulog}:
+Create a five levels deep mirror image of the GNU web site, with the
+same directory structure the original has, with only one try per
+document, saving the log of the activities to @file{gnulog}:
 
 @example
-wget -r -t1 http://www.gnu.ai.mit.edu/ -o gnulog
+wget -r http://www.gnu.org/ -o gnulog
 @end example
 
 @item
-Retrieve the first layer of yahoo links:
+The same as the above, but convert the links in the @sc{html} files to
+point to local files, so you can view the documents off-line:
 
 @example
-wget -r -l1 http://www.yahoo.com/
+wget --convert-links -r http://www.gnu.org/ -o gnulog
+@end example
+
+@item
+Retrieve only one HTML page, but make sure that all the elements needed
+for the page to be displayed, such as inline images and external style
+sheets, are also downloaded.  Also make sure the downloaded page
+references the downloaded links.
+
+@example
+wget -p --convert-links http://www.server.com/dir/page.html
+@end example
+
+The HTML page will be saved to @file{www.server.com/dir/page.html}, and
+the images, stylesheets, etc., somewhere under @file{www.server.com/},
+depending on where they were on the remote server.
+
+@item
+The same as the above, but without the @file{www.server.com/} directory.
+In fact, I don't want to have all those random server directories
+anyway---just save @emph{all} those files under a @file{download/}
+subdirectory of the current directory.
+
+@example
+wget -p --convert-links -nH -nd -Pdownload \
+     http://www.server.com/dir/page.html
 @end example
 
 @item
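Since the hunk keeps the note that @samp{-} makes @samp{-i} read URLs from standard input, a one-line usage sketch (the list file name is a placeholder):

    # feed a URL list to wget over a pipe
    cat urls.txt | wget -i -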
@@ -2333,7 +2317,8 @@ wget -S http://www.lycos.com/
 @end example
 
 @item
-Save the server headers with the file:
+Save the server headers with the file, perhaps for post-processing.
+
 @example
 wget -s http://www.lycos.com/
 more index.html
@@ -2341,25 +2326,26 @@ more index.html
 
 @item
 Retrieve the first two levels of @samp{wuarchive.wustl.edu}, saving them
-to /tmp.
+to @file{/tmp}.
 
 @example
-wget -P/tmp -l2 ftp://wuarchive.wustl.edu/
+wget -r -l2 -P/tmp ftp://wuarchive.wustl.edu/
 @end example
 
 @item
-You want to download all the @sc{gif}s from an @sc{http} directory.
-@samp{wget http://host/dir/*.gif} doesn't work, since @sc{http}
-retrieval does not support globbing.  In that case, use:
+You want to download all the @sc{gif}s from a directory on an @sc{http}
+server.  @samp{wget http://www.server.com/dir/*.gif} doesn't work
+because @sc{http} retrieval does not support globbing.  In that case,
+use:
 
 @example
-wget -r -l1 --no-parent -A.gif http://host/dir/
+wget -r -l1 --no-parent -A.gif http://www.server.com/dir/
 @end example
 
-It is a bit of a kludge, but it works.  @samp{-r -l1} means to retrieve
-recursively (@pxref{Recursive Retrieval}), with maximum depth of 1.
-@samp{--no-parent} means that references to the parent directory are
-ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to
+More verbose, but the effect is the same.  @samp{-r -l1} means to
+retrieve recursively (@pxref{Recursive Retrieval}), with maximum depth
+of 1.  @samp{--no-parent} means that references to the parent directory
+are ignored (@pxref{Directory-Based Limits}), and @samp{-A.gif} means to
 download only the @sc{gif} files.  @samp{-A "*.gif"} would have worked
 too.
 
@@ -2369,7 +2355,7 @@ interrupted.  Now you do not want to clobber the files already present.
 It would be:
 
 @example
-wget -nc -r http://www.gnu.ai.mit.edu/
+wget -nc -r http://www.gnu.org/
 @end example
 
 @item
@@ -2377,81 +2363,76 @@ If you want to encode your own username and password to @sc{http} or
 @sc{ftp}, use the appropriate @sc{url} syntax (@pxref{URL Format}).
 
 @example
-wget ftp://hniksic:mypassword@@jagor.srce.hr/.emacs
+wget ftp://hniksic:mypassword@@unix.server.com/.emacs
 @end example
 
+@cindex redirecting output
 @item
-If you do not like the default retrieval visualization (1K dots with 10
-dots per cluster and 50 dots per line), you can customize it through dot
-settings (@pxref{Wgetrc Commands}).  For example, many people like the
-``binary'' style of retrieval, with 8K dots and 512K lines:
+You would like the output documents to go to standard output instead of
+to files?
 
 @example
-wget --dot-style=binary ftp://prep.ai.mit.edu/pub/gnu/README
+wget -O - http://jagor.srce.hr/ http://www.srce.hr/
 @end example
 
-You can experiment with other styles, like:
+You can also combine the two options and make pipelines to retrieve the
+documents from remote hotlists:
 
 @example
-wget --dot-style=mega ftp://ftp.xemacs.org/pub/xemacs/xemacs-20.4/xemacs-20.4.tar.gz
-wget --dot-style=micro http://fly.srk.fer.hr/
+wget -O - http://cool.list.com/ | wget --force-html -i -
 @end example
 
-To make these settings permanent, put them in your @file{.wgetrc}, as
-described before (@pxref{Sample Wgetrc}).
 @end itemize
 
-@node Guru Usage, , Advanced Usage, Examples
-@section Guru Usage
+@node Very Advanced Usage, , Advanced Usage, Examples
+@section Very Advanced Usage
 
 @cindex mirroring
 @itemize @bullet
 @item
 If you wish Wget to keep a mirror of a page (or @sc{ftp}
 subdirectories), use @samp{--mirror} (@samp{-m}), which is the shorthand
-for @samp{-r -N}.  You can put Wget in the crontab file asking it to
-recheck a site each Sunday:
+for @samp{-r -l inf -N}.  You can put Wget in the crontab file asking it
+to recheck a site each Sunday:
 
 @example
 crontab
-0 0 * * 0 wget --mirror ftp://ftp.xemacs.org/pub/xemacs/ -o /home/me/weeklog
+0 0 * * 0 wget --mirror http://www.gnu.org/ -o /home/me/weeklog
 @end example
 
 @item
-You may wish to do the same with someone's home page.  But you do not
-want to download all those images---you're only interested in @sc{html}.
+In addition to the above, you want the links to be converted for local
+viewing.  But, after having read this manual, you know that link
+conversion doesn't play well with timestamping, so you also want Wget to
+back up the original HTML files before the conversion.  Wget invocation
+would look like this:
 
 @example
-wget --mirror -A.html http://www.w3.org/
+wget --mirror --convert-links --backup-converted \
+     http://www.gnu.org/ -o /home/me/weeklog
 @end example
 
 @item
-You have a presentation and would like the dumb absolute links to be
-converted to relative?  Use @samp{-k}:
+But you've also noticed that local viewing doesn't work all that well
+when HTML files are saved under extensions other than @samp{.html},
+perhaps because they were served as @file{index.cgi}.  So you'd like
+Wget to rename all the files served with content-type @samp{text/html}
+to @file{@var{name}.html}.
 
 @example
-wget -k -r @var{URL}
+wget --mirror --convert-links --backup-converted \
+     --html-extension -o /home/me/weeklog \
+     http://www.gnu.org/
 @end example
 
-@cindex redirecting output
-@item
-You would like the output documents to go to standard output instead of
-to files?  OK, but Wget will automatically shut up (turn on
-@samp{--quiet}) to prevent mixing of Wget output and the retrieved
-documents.
+Or, with less typing:
 
 @example
-wget -O - http://jagor.srce.hr/ http://www.srce.hr/
+wget -m -k -K -E http://www.gnu.org/ -o /home/me/weeklog
 @end example
-
-You can also combine the two options and make weird pipelines to
-retrieve the documents from remote hotlists:
-
-@example
-wget -O - http://cool.list.com/ | wget --force-html -i -
-@end example
 @end itemize
+@c man end
 
 @node Various, Appendices, Examples, Top
 @chapter Various
 @cindex various
@@ -2592,16 +2573,18 @@ they are supposed to work, it might well be a bug.
 
 @item
 Try to repeat the bug in as simple circumstances as possible.  E.g. if
-Wget crashes on @samp{wget -rLl0 -t5 -Y0 http://yoyodyne.com -o
-/tmp/log}, you should try to see if it will crash with a simpler set of
-options.
+Wget crashes while downloading @samp{wget -rl0 -kKE -t5 -Y0
+http://yoyodyne.com -o /tmp/log}, you should try to see if the crash is
+repeatable, and if will occur with a simpler set of options.  You might
+even try to start the download at the page where the crash occurred to
+see if that page somehow triggered the crash.
 
 Also, while I will probably be interested to know the contents of your
 @file{.wgetrc} file, just dumping it into the debug message is probably
 a bad idea.  Instead, you should first try to see if the bug repeats
 with @file{.wgetrc} moved out of the way.  Only if it turns out that
-@file{.wgetrc} settings affect the bug, should you mail me the relevant
-parts of the file.
+@file{.wgetrc} settings affect the bug, mail me the relevant parts of
+the file.
 
 @item
 Please start Wget with @samp{-d} option and send the log (or the
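Moving @file{.wgetrc} "out of the way", as the retained text advises, is just a rename around the reproduction run. A sketch, with the URL taken from the hunk above and the file names illustrative:

    mv $HOME/.wgetrc $HOME/.wgetrc.off    # disable the user startup file
    wget -d http://yoyodyne.com/ -o /tmp/log
    mv $HOME/.wgetrc.off $HOME/.wgetrc    # restore it afterwards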
@@ -2612,9 +2595,6 @@ on.
 @item
 If Wget has crashed, try to run it in a debugger, e.g. @code{gdb `which
 wget` core} and type @code{where} to get the backtrace.
-
-@item
-Find where the bug is, fix it and send me the patches. :-)
 @end enumerate
 @c man end
 