mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00

[svn] * wget.texi (Download Options): --no-clobber's documentation was
severely lacking -- ameliorated the situation.  Some of the
previously-undocumented stuff (like the multiple-file-version numeric-suffixing)
that's now mentioned for the first (and only) time in the -nc documentation
should probably be mentioned elsewhere, but due to the way that wget.texi's
hierarchy is laid out, I had a hard time finding anywhere else appropriate.
dan 2000-08-22 20:04:20 -07:00
parent 51642074f4
commit 28668d2875
6 changed files with 157 additions and 93 deletions

@@ -1,3 +1,13 @@
+2000-08-22 Dan Harkless <dan-wget@dilvish.speed.net>
+
+* wget.texi (Download Options): --no-clobber's documentation was
+severely lacking -- ameliorated the situation. Some of the
+previously-undocumented stuff (like the multiple-file-version
+numeric-suffixing) that's now mentioned for the first (and only)
+time in the -nc documentation should probably be mentioned
+elsewhere, but due to the way that wget.texi's hierarchy is laid
+out, I had a hard time finding anywhere else appropriate.
+
 2000-07-17 Dan Harkless <dan-wget@dilvish.speed.net>
 
 * wget.texi (HTTP Options): Minor clarification in "download a

@@ -26,8 +26,8 @@ notice identical to this one.
 
 Indirect:
 wget.info-1: 961
-wget.info-2: 50079
-wget.info-3: 92081
+wget.info-2: 49932
+wget.info-3: 93404
 
 Tag Table:
 (Indirect)
@@ -39,50 +39,50 @@ Node: Option Syntax8163
 Node: Basic Startup Options9587
 Node: Logging and Input File Options10287
 Node: Download Options12681
-Node: Directory Options19043
-Node: HTTP Options21521
-Node: FTP Options25426
-Node: Recursive Retrieval Options26619
-Node: Recursive Accept/Reject Options28583
-Node: Recursive Retrieval31481
-Node: Following Links33779
-Node: Relative Links34807
-Node: Host Checking35321
-Node: Domain Acceptance37346
-Node: All Hosts39016
-Node: Types of Files39443
-Node: Directory-Based Limits41893
-Node: FTP Links44533
-Node: Time-Stamping45403
-Node: Time-Stamping Usage47040
-Node: HTTP Time-Stamping Internals48609
-Node: FTP Time-Stamping Internals50079
-Node: Startup File51287
-Node: Wgetrc Location52160
-Node: Wgetrc Syntax52975
-Node: Wgetrc Commands53690
-Node: Sample Wgetrc60972
-Node: Examples65991
-Node: Simple Usage66598
-Node: Advanced Usage68992
-Node: Guru Usage71743
-Node: Various73405
-Node: Proxies73929
-Node: Distribution76694
-Node: Mailing List77045
-Node: Reporting Bugs77744
-Node: Portability79529
-Node: Signals80904
-Node: Appendices81558
-Node: Robots81973
-Node: Introduction to RES83120
-Node: RES Format85013
-Node: User-Agent Field86117
-Node: Disallow Field86881
-Node: Norobots Examples87492
-Node: Security Considerations88446
-Node: Contributors89442
-Node: Copying92081
-Node: Concept Index111244
+Node: Directory Options20366
+Node: HTTP Options22844
+Node: FTP Options26749
+Node: Recursive Retrieval Options27942
+Node: Recursive Accept/Reject Options29906
+Node: Recursive Retrieval32804
+Node: Following Links35102
+Node: Relative Links36130
+Node: Host Checking36644
+Node: Domain Acceptance38669
+Node: All Hosts40339
+Node: Types of Files40766
+Node: Directory-Based Limits43216
+Node: FTP Links45856
+Node: Time-Stamping46726
+Node: Time-Stamping Usage48363
+Node: HTTP Time-Stamping Internals49932
+Node: FTP Time-Stamping Internals51402
+Node: Startup File52610
+Node: Wgetrc Location53483
+Node: Wgetrc Syntax54298
+Node: Wgetrc Commands55013
+Node: Sample Wgetrc62295
+Node: Examples67314
+Node: Simple Usage67921
+Node: Advanced Usage70315
+Node: Guru Usage73066
+Node: Various74728
+Node: Proxies75252
+Node: Distribution78017
+Node: Mailing List78368
+Node: Reporting Bugs79067
+Node: Portability80852
+Node: Signals82227
+Node: Appendices82881
+Node: Robots83296
+Node: Introduction to RES84443
+Node: RES Format86336
+Node: User-Agent Field87440
+Node: Disallow Field88204
+Node: Norobots Examples88815
+Node: Security Considerations89769
+Node: Contributors90765
+Node: Copying93404
+Node: Concept Index112567
 
 End Tag Table

@@ -357,12 +357,37 @@ Download Options
 
 `-nc'
 `--no-clobber'
-Do not clobber existing files when saving to directory hierarchy
-within recursive retrieval of several files. This option is
-*extremely* useful when you wish to continue where you left off
-with retrieval of many files. If the files have the `.html' or
-(yuck) `.htm' suffix, they will be loaded from the local disk, and
-parsed as if they have been retrieved from the Web.
+If a file is downloaded more than once in the same directory,
+wget's behavior depends on a few options, including `-nc'. In
+certain cases, the local file will be "clobbered", or overwritten,
+upon repeated download. In other cases it will be preserved.
+
+When running wget without `-N', `-nc', or `-r', downloading the
+same file in the same directory will result in the original copy
+of `FILE' being preserved and the second copy being named
+`FILE.1'. If that file is downloaded yet again, the third copy
+will be named `FILE.2', and so on. When `-nc' is specified, this
+behavior is suppressed, and wget will refuse to download newer
+copies of `FILE'. Therefore, "no-clobber" is actually a misnomer
+in this mode - it's not clobbering that's prevented (as the
+numeric suffixes were already preventing clobbering), but rather
+the multiple version saving that's prevented.
+
+When running wget with `-r', but without `-N' or `-nc',
+re-downloading a file will result in the new copy simply
+overwriting the old. Adding `-nc' will prevent this behavior,
+instead causing the original version to be preserved and any newer
+copies on the server to be ignored.
+
+When running wget with `-N', with or without `-r', the decision as
+to whether or not to download a newer copy of a file depends on
+the local and remote timestamp and size of the file (*Note
+Time-Stamping::). `-nc' may not be specified at the same time as
+`-N'.
+
+Note that when `-nc' is specified, files with the suffixes `.html'
+or (yuck) `.htm' will be loaded from the local disk and parsed as
+if they had been retrieved from the Web.
 
 `-c'
 `--continue'
@@ -1220,37 +1245,3 @@ following command every week:
 wget --timestamping -r ftp://prep.ai.mit.edu/pub/gnu/
 
-File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
-
-HTTP Time-Stamping Internals
-============================
-
-Time-stamping in HTTP is implemented by checking of the
-`Last-Modified' header. If you wish to retrieve the file `foo.html'
-through HTTP, Wget will check whether `foo.html' exists locally. If it
-doesn't, `foo.html' will be retrieved unconditionally.
-
-If the file does exist locally, Wget will first check its local
-time-stamp (similar to the way `ls -l' checks it), and then send a
-`HEAD' request to the remote server, demanding the information on the
-remote file.
-
-The `Last-Modified' header is examined to find which file was
-modified more recently (which makes it "newer"). If the remote file is
-newer, it will be downloaded; if it is older, Wget will give up.(1)
-
-When `--backup-converted' (`-K') is specified in conjunction with
-`-N', server file `X' is compared to local file `X.orig', if extant,
-rather than being compared to local file `X', which will always differ
-if it's been converted by `--convert-links' (`-k').
-
-Arguably, HTTP time-stamping should be implemented using the
-`If-Modified-Since' request.
-
----------- Footnotes ----------
-
-(1) As an additional check, Wget will look at the `Content-Length'
-header, and compare the sizes; if they are not the same, the remote
-file will be downloaded no matter what the time-stamp says.

@@ -23,6 +23,40 @@ are included exactly as in the original, and provided that the entire
 resulting derived work is distributed under the terms of a permission
 notice identical to this one.
 
+File: wget.info, Node: HTTP Time-Stamping Internals, Next: FTP Time-Stamping Internals, Prev: Time-Stamping Usage, Up: Time-Stamping
+
+HTTP Time-Stamping Internals
+============================
+
+Time-stamping in HTTP is implemented by checking of the
+`Last-Modified' header. If you wish to retrieve the file `foo.html'
+through HTTP, Wget will check whether `foo.html' exists locally. If it
+doesn't, `foo.html' will be retrieved unconditionally.
+
+If the file does exist locally, Wget will first check its local
+time-stamp (similar to the way `ls -l' checks it), and then send a
+`HEAD' request to the remote server, demanding the information on the
+remote file.
+
+The `Last-Modified' header is examined to find which file was
+modified more recently (which makes it "newer"). If the remote file is
+newer, it will be downloaded; if it is older, Wget will give up.(1)
+
+When `--backup-converted' (`-K') is specified in conjunction with
+`-N', server file `X' is compared to local file `X.orig', if extant,
+rather than being compared to local file `X', which will always differ
+if it's been converted by `--convert-links' (`-k').
+
+Arguably, HTTP time-stamping should be implemented using the
+`If-Modified-Since' request.
+
+---------- Footnotes ----------
+
+(1) As an additional check, Wget will look at the `Content-Length'
+header, and compare the sizes; if they are not the same, the remote
+file will be downloaded no matter what the time-stamp says.
 
 File: wget.info, Node: FTP Time-Stamping Internals, Prev: HTTP Time-Stamping Internals, Up: Time-Stamping
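
The HTTP time-stamping check described in the "HTTP Time-Stamping Internals" node can be sketched roughly as below. This is only an illustration in Python, not Wget's actual code: the function name `should_download` and its parameters are hypothetical, and the sketch assumes the server always sends a `Last-Modified` header in the usual HTTP date format.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def should_download(local_mtime, local_size, last_modified, content_length=None):
    """Decide whether to fetch the remote file (hypothetical helper).

    local_mtime    -- timezone-aware datetime of the local copy, or None if
                      the file does not exist locally
    local_size     -- size of the local copy in bytes
    last_modified  -- value of the Last-Modified header from the HEAD response
    content_length -- value of the Content-Length header, if any
    """
    # No local copy: retrieve unconditionally.
    if local_mtime is None:
        return True

    # Remote file is newer than the local one: download it.
    remote_mtime = parsedate_to_datetime(last_modified)
    if remote_mtime > local_mtime:
        return True

    # Footnote (1): a size mismatch forces a download regardless of
    # what the time-stamps say.
    if content_length is not None and int(content_length) != local_size:
        return True

    # Remote file is not newer and sizes agree: give up.
    return False


if __name__ == "__main__":
    local = datetime(2000, 8, 22, 12, 0, tzinfo=timezone.utc)
    print(should_download(local, 10, "Tue, 22 Aug 2000 20:04:20 GMT", "10"))
```

The sketch leaves out the `-K` case, in which the comparison would be made against `X.orig` rather than `X`.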

@@ -408,6 +408,7 @@ Concept Index
 * bug reports: Reporting Bugs.
 * bugs: Reporting Bugs.
 * cache: HTTP Options.
+* clobbering, file: Download Options.
 * command line: Invoking.
 * Content-Length, ignore: HTTP Options.
 * continue retrieval: Download Options.
@@ -424,6 +425,7 @@ Concept Index
 * directory prefix: Directory Options.
 * DNS lookup: Host Checking.
 * dot style: Download Options.
+* downloading multiple times: Download Options.
 * examples: Examples.
 * exclude directories: Directory-Based Limits.
 * execute wgetrc command: Basic Startup Options.

@@ -453,15 +453,42 @@ already exists, it will be overwritten. If the @var{file} is @samp{-},
 the documents will be written to standard output. Including this option
 automatically sets the number of tries to 1.
 
+@cindex clobbering, file
+@cindex downloading multiple times
 @cindex no-clobber
 @item -nc
 @itemx --no-clobber
-Do not clobber existing files when saving to directory hierarchy within
-recursive retrieval of several files. This option is @emph{extremely}
-useful when you wish to continue where you left off with retrieval of
-many files. If the files have the @samp{.html} or (yuck) @samp{.htm}
-suffix, they will be loaded from the local disk, and parsed as if they
-have been retrieved from the Web.
+If a file is downloaded more than once in the same directory, wget's
+behavior depends on a few options, including @samp{-nc}. In certain
+cases, the local file will be "clobbered", or overwritten, upon repeated
+download. In other cases it will be preserved.
+
+When running wget without @samp{-N}, @samp{-nc}, or @samp{-r},
+downloading the same file in the same directory will result in the
+original copy of @samp{@var{file}} being preserved and the second copy
+being named @samp{@var{file}.1}. If that file is downloaded yet again,
+the third copy will be named @samp{@var{file}.2}, and so on. When
+@samp{-nc} is specified, this behavior is suppressed, and wget will
+refuse to download newer copies of @samp{@var{file}}. Therefore,
+"no-clobber" is actually a misnomer in this mode -- it's not clobbering
+that's prevented (as the numeric suffixes were already preventing
+clobbering), but rather the multiple version saving that's prevented.
+
+When running wget with @samp{-r}, but without @samp{-N} or @samp{-nc},
+re-downloading a file will result in the new copy simply overwriting the
+old. Adding @samp{-nc} will prevent this behavior, instead causing the
+original version to be preserved and any newer copies on the server to
+be ignored.
+
+When running wget with @samp{-N}, with or without @samp{-r}, the
+decision as to whether or not to download a newer copy of a file depends
+on the local and remote timestamp and size of the file
+(@xref{Time-Stamping}). @samp{-nc} may not be specified at the same
+time as @samp{-N}.
+
+Note that when @samp{-nc} is specified, files with the suffixes
+@samp{.html} or (yuck) @samp{.htm} will be loaded from the local disk
+and parsed as if they had been retrieved from the Web.
 
 @cindex continue retrieval
 @item -c