From 28668d2875beccfbdac60344f7f380e354511aa5 Mon Sep 17 00:00:00 2001
From: dan
Date: Tue, 22 Aug 2000 20:04:20 -0700
Subject: [PATCH] [svn] * wget.texi (Download Options): --no-clobber's documentation was severely lacking -- ameliorated the situation. Some of the previously-undocumented stuff (like the multiple-file-version numeric-suffixing) that's now mentioned for the first (and only) time in the -nc documentation should probably be mentioned elsewhere, but due to the way that wget.texi's hierarchy is laid out, I had a hard time finding anywhere else appropriate.

---
 doc/ChangeLog   | 10 ++++++
 doc/wget.info   | 94 ++++++++++++++++++++++++-------------------------
 doc/wget.info-1 | 71 ++++++++++++++++---------------------
 doc/wget.info-2 | 34 ++++++++++++++++++
 doc/wget.info-3 |  2 ++
 doc/wget.texi   | 39 ++++++++++++++++----
 6 files changed, 157 insertions(+), 93 deletions(-)

diff --git a/doc/ChangeLog b/doc/ChangeLog
index 64efe28f..773411c3 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,13 @@
+2000-08-22 Dan Harkless
+
+ * wget.texi (Download Options): --no-clobber's documentation was
+ severely lacking -- ameliorated the situation. Some of the
+ previously-undocumented stuff (like the multiple-file-version
+ numeric-suffixing) that's now mentioned for the first (and only)
+ time in the -nc documentation should probably be mentioned
+ elsewhere, but due to the way that wget.texi's hierarchy is laid
+ out, I had a hard time finding anywhere else appropriate.
+
 2000-07-17 Dan Harkless

 * wget.texi (HTTP Options): Minor clarification in "download a
diff --git a/doc/wget.info b/doc/wget.info
index 250d9bcc..e20be821 100644
--- a/doc/wget.info
+++ b/doc/wget.info
@@ -26,8 +26,8 @@ notice identical to this one.
Indirect: wget.info-1: 961 -wget.info-2: 50079 -wget.info-3: 92081 +wget.info-2: 49932 +wget.info-3: 93404  Tag Table: (Indirect) @@ -39,50 +39,50 @@ Node: Option Syntax8163 Node: Basic Startup Options9587 Node: Logging and Input File Options10287 Node: Download Options12681 -Node: Directory Options19043 -Node: HTTP Options21521 -Node: FTP Options25426 -Node: Recursive Retrieval Options26619 -Node: Recursive Accept/Reject Options28583 -Node: Recursive Retrieval31481 -Node: Following Links33779 -Node: Relative Links34807 -Node: Host Checking35321 -Node: Domain Acceptance37346 -Node: All Hosts39016 -Node: Types of Files39443 -Node: Directory-Based Limits41893 -Node: FTP Links44533 -Node: Time-Stamping45403 -Node: Time-Stamping Usage47040 -Node: HTTP Time-Stamping Internals48609 -Node: FTP Time-Stamping Internals50079 -Node: Startup File51287 -Node: Wgetrc Location52160 -Node: Wgetrc Syntax52975 -Node: Wgetrc Commands53690 -Node: Sample Wgetrc60972 -Node: Examples65991 -Node: Simple Usage66598 -Node: Advanced Usage68992 -Node: Guru Usage71743 -Node: Various73405 -Node: Proxies73929 -Node: Distribution76694 -Node: Mailing List77045 -Node: Reporting Bugs77744 -Node: Portability79529 -Node: Signals80904 -Node: Appendices81558 -Node: Robots81973 -Node: Introduction to RES83120 -Node: RES Format85013 -Node: User-Agent Field86117 -Node: Disallow Field86881 -Node: Norobots Examples87492 -Node: Security Considerations88446 -Node: Contributors89442 -Node: Copying92081 -Node: Concept Index111244 +Node: Directory Options20366 +Node: HTTP Options22844 +Node: FTP Options26749 +Node: Recursive Retrieval Options27942 +Node: Recursive Accept/Reject Options29906 +Node: Recursive Retrieval32804 +Node: Following Links35102 +Node: Relative Links36130 +Node: Host Checking36644 +Node: Domain Acceptance38669 +Node: All Hosts40339 +Node: Types of Files40766 +Node: Directory-Based Limits43216 +Node: FTP Links45856 +Node: Time-Stamping46726 +Node: Time-Stamping Usage48363 +Node: HTTP 
Time-Stamping Internals49932 +Node: FTP Time-Stamping Internals51402 +Node: Startup File52610 +Node: Wgetrc Location53483 +Node: Wgetrc Syntax54298 +Node: Wgetrc Commands55013 +Node: Sample Wgetrc62295 +Node: Examples67314 +Node: Simple Usage67921 +Node: Advanced Usage70315 +Node: Guru Usage73066 +Node: Various74728 +Node: Proxies75252 +Node: Distribution78017 +Node: Mailing List78368 +Node: Reporting Bugs79067 +Node: Portability80852 +Node: Signals82227 +Node: Appendices82881 +Node: Robots83296 +Node: Introduction to RES84443 +Node: RES Format86336 +Node: User-Agent Field87440 +Node: Disallow Field88204 +Node: Norobots Examples88815 +Node: Security Considerations89769 +Node: Contributors90765 +Node: Copying93404 +Node: Concept Index112567  End Tag Table diff --git a/doc/wget.info-1 b/doc/wget.info-1 index c967c794..73c8f084 100644 --- a/doc/wget.info-1 +++ b/doc/wget.info-1 @@ -357,12 +357,37 @@ Download Options `-nc' `--no-clobber' - Do not clobber existing files when saving to directory hierarchy - within recursive retrieval of several files. This option is - *extremely* useful when you wish to continue where you left off - with retrieval of many files. If the files have the `.html' or - (yuck) `.htm' suffix, they will be loaded from the local disk, and - parsed as if they have been retrieved from the Web. + If a file is downloaded more than once in the same directory, + wget's behavior depends on a few options, including `-nc'. In + certain cases, the local file will be "clobbered", or overwritten, + upon repeated download. In other cases it will be preserved. + + When running wget without `-N', `-nc', or `-r', downloading the + same file in the same directory will result in the original copy + of `FILE' being preserved and the second copy being named + `FILE.1'. If that file is downloaded yet again, the third copy + will be named `FILE.2', and so on. 
When `-nc' is specified, this + behavior is suppressed, and wget will refuse to download newer + copies of `FILE'. Therefore, "no-clobber" is actually a misnomer + in this mode - it's not clobbering that's prevented (as the + numeric suffixes were already preventing clobbering), but rather + the multiple version saving that's prevented. + + When running wget with `-r', but without `-N' or `-nc', + re-downloading a file will result in the new copy simply + overwriting the old. Adding `-nc' will prevent this behavior, + instead causing the original version to be preserved and any newer + copies on the server to be ignored. + + When running wget with `-N', with or without `-r', the decision as + to whether or not to download a newer copy of a file depends on + the local and remote timestamp and size of the file (*Note + Time-Stamping::). `-nc' may not be specified at the same time as + `-N'. + + Note that when `-nc' is specified, files with the suffixes `.html' + or (yuck) `.htm' will be loaded from the local disk and parsed as + if they had been retrieved from the Web. `-c' `--continue' @@ -1220,37 +1245,3 @@ following command every week: wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/ - -File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping - -HTTP Time-Stamping Internals -============================ - - Time-stamping in HTTP is implemented by checking of the -`Last-Modified' header. If you wish to retrieve the file `foo.html' -through HTTP, Wget will check whether `foo.html' exists locally. If it -doesn't, `foo.html' will be retrieved unconditionally. - - If the file does exist locally, Wget will first check its local -time-stamp (similar to the way `ls -l' checks it), and then send a -`HEAD' request to the remote server, demanding the information on the -remote file. 
- - The `Last-Modified' header is examined to find which file was -modified more recently (which makes it "newer"). If the remote file is -newer, it will be downloaded; if it is older, Wget will give up.(1) - - When `--backup-converted' (`-K') is specified in conjunction with -`-N', server file `X' is compared to local file `X.orig', if extant, -rather than being compared to local file `X', which will always differ -if it's been converted by `--convert-links' (`-k'). - - Arguably, HTTP time-stamping should be implemented using the -`If-Modified-Since' request. - - ---------- Footnotes ---------- - - (1) As an additional check, Wget will look at the `Content-Length' -header, and compare the sizes; if they are not the same, the remote -file will be downloaded no matter what the time-stamp says. - diff --git a/doc/wget.info-2 b/doc/wget.info-2 index 6adae5ab..19f7d344 100644 --- a/doc/wget.info-2 +++ b/doc/wget.info-2 @@ -23,6 +23,40 @@ are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. + +File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping + +HTTP Time-Stamping Internals +============================ + + Time-stamping in HTTP is implemented by checking of the +`Last-Modified' header. If you wish to retrieve the file `foo.html' +through HTTP, Wget will check whether `foo.html' exists locally. If it +doesn't, `foo.html' will be retrieved unconditionally. + + If the file does exist locally, Wget will first check its local +time-stamp (similar to the way `ls -l' checks it), and then send a +`HEAD' request to the remote server, demanding the information on the +remote file. + + The `Last-Modified' header is examined to find which file was +modified more recently (which makes it "newer"). 
If the remote file is +newer, it will be downloaded; if it is older, Wget will give up.(1) + + When `--backup-converted' (`-K') is specified in conjunction with +`-N', server file `X' is compared to local file `X.orig', if extant, +rather than being compared to local file `X', which will always differ +if it's been converted by `--convert-links' (`-k'). + + Arguably, HTTP time-stamping should be implemented using the +`If-Modified-Since' request. + + ---------- Footnotes ---------- + + (1) As an additional check, Wget will look at the `Content-Length' +header, and compare the sizes; if they are not the same, the remote +file will be downloaded no matter what the time-stamp says. +  File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping diff --git a/doc/wget.info-3 b/doc/wget.info-3 index cb93e3ca..83609260 100644 --- a/doc/wget.info-3 +++ b/doc/wget.info-3 @@ -408,6 +408,7 @@ Concept Index * bug reports: Reporting Bugs. * bugs: Reporting Bugs. * cache: HTTP Options. +* clobbering, file: Download Options. * command line: Invoking. * Content-Length, ignore: HTTP Options. * continue retrieval: Download Options. @@ -424,6 +425,7 @@ Concept Index * directory prefix: Directory Options. * DNS lookup: Host Checking. * dot style: Download Options. +* downloading multiple times: Download Options. * examples: Examples. * exclude directories: Directory-Based Limits. * execute wgetrc command: Basic Startup Options. diff --git a/doc/wget.texi b/doc/wget.texi index 6cff3b77..d96a02f4 100644 --- a/doc/wget.texi +++ b/doc/wget.texi @@ -453,15 +453,42 @@ already exists, it will be overwritten. If the @var{file} is @samp{-}, the documents will be written to standard output. Including this option automatically sets the number of tries to 1. 
+@cindex clobbering, file +@cindex downloading multiple times @cindex no-clobber @item -nc @itemx --no-clobber -Do not clobber existing files when saving to directory hierarchy within -recursive retrieval of several files. This option is @emph{extremely} -useful when you wish to continue where you left off with retrieval of -many files. If the files have the @samp{.html} or (yuck) @samp{.htm} -suffix, they will be loaded from the local disk, and parsed as if they -have been retrieved from the Web. +If a file is downloaded more than once in the same directory, wget's +behavior depends on a few options, including @samp{-nc}. In certain +cases, the local file will be "clobbered", or overwritten, upon repeated +download. In other cases it will be preserved. + +When running wget without @samp{-N}, @samp{-nc}, or @samp{-r}, +downloading the same file in the same directory will result in the +original copy of @samp{@var{file}} being preserved and the second copy +being named @samp{@var{file}.1}. If that file is downloaded yet again, +the third copy will be named @samp{@var{file}.2}, and so on. When +@samp{-nc} is specified, this behavior is suppressed, and wget will +refuse to download newer copies of @samp{@var{file}}. Therefore, +"no-clobber" is actually a misnomer in this mode -- it's not clobbering +that's prevented (as the numeric suffixes were already preventing +clobbering), but rather the multiple version saving that's prevented. + +When running wget with @samp{-r}, but without @samp{-N} or @samp{-nc}, +re-downloading a file will result in the new copy simply overwriting the +old. Adding @samp{-nc} will prevent this behavior, instead causing the +original version to be preserved and any newer copies on the server to +be ignored. + +When running wget with @samp{-N}, with or without @samp{-r}, the +decision as to whether or not to download a newer copy of a file depends +on the local and remote timestamp and size of the file +(@xref{Time-Stamping}). 
@samp{-nc} may not be specified at the same +time as @samp{-N}. + +Note that when @samp{-nc} is specified, files with the suffixes +@samp{.html} or (yuck) @samp{.htm} will be loaded from the local disk +and parsed as if they had been retrieved from the Web. @cindex continue retrieval @item -c
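Reviewer's note: the re-download rules that the new -nc text describes can be summarized as a small decision function. The following is only a sketch of the documented behavior, not wget's actual code. The function name `plan_save`, its parameters, and the action labels are all hypothetical, and the `-N` case is reduced to a placeholder because the timestamp and size comparison is described in the Time-Stamping node rather than here.

```python
def plan_save(name, existing, recursive=False, no_clobber=False,
              timestamping=False):
    """Model what the -nc documentation says happens when `name` is fetched.

    `existing` is the set of file names already in the target directory.
    Returns an (action, target_name) tuple. All names here are hypothetical
    and chosen for illustration only.
    """
    # The documentation states that -nc may not be combined with -N.
    if no_clobber and timestamping:
        raise ValueError("-nc may not be specified at the same time as -N")

    # Nothing to clobber: a first download is always saved under its own name.
    if name not in existing:
        return ("download", name)

    # With -N, the decision depends on local and remote timestamp and size,
    # which this sketch does not model.
    if timestamping:
        return ("compare-timestamps", name)

    # With -nc, the original is preserved and newer server copies are ignored,
    # both with and without -r.
    if no_clobber:
        return ("skip", name)

    # With -r but without -nc, the new copy simply overwrites the old one.
    if recursive:
        return ("overwrite", name)

    # Without -N, -nc, or -r: keep the original and save as FILE.1, FILE.2, ...
    n = 1
    while f"{name}.{n}" in existing:
        n += 1
    return ("download", f"{name}.{n}")


if __name__ == "__main__":
    print(plan_save("index.html", {"index.html", "index.html.1"}))
    print(plan_save("index.html", {"index.html"}, no_clobber=True))
    print(plan_save("index.html", {"index.html"}, recursive=True))
```

Read against the patch, the plain case shows why the text calls "no-clobber" a misnomer: the numeric suffixes already prevent overwriting, so `-nc` only changes the result from a new suffixed copy to a skip.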