wget.texi (Recursive Retrieval Options): Explained that you need
to use -r -l1 -p to get the two levels of requisites for a
<FRAMESET> page. Also made a few other wording improvements.
me to believe it wasn't was exposing a different bug -- URLs specified on the
command line, as opposed to being recursed to, don't always get re-converted at the
end of the Wget run.
get from a file over HTTP (FTP only supports ranges ending at the end of the
file, though forcibly disconnecting from the server at the desired endpoint
might be workable).
removed. Hopefully all the failures I was seeing were due to the fact that it
wasn't documented that non-globbing, non-recursive FTP downloads need -N to get
the remote timestamp to be preserved.
* Makefile.in (install): Do install.man if we have pod2man.
* Makefile.in: Make wget man page and install it if we have pod2man. Added some
missing '$(srcdir)/'s. Added missing dependencies on install targets
(allowing you to just do `make install' rather than forcing you to do `make &&
make install'). Also, Makefile rules should always use output file parameters
if available rather than redirecting stdout with '>', or you falsely satisfy
dependencies if the tool you're running is missing or fails -- fixed call of
texi2pod.pl that did this wrong.
* texi2pod.pl: Removed from CVS. Now automatically generated.
* texi2pod.pl.in: This new file is processed into texi2pod.pl, getting the
appropriate path to the Perl 5+ executable on this system and becoming
executable (CVS files, by contrast, don't arrive executable).
number of bytes at the end of a file before resuming download. Apparently, some
stupid proxies insert a "transfer interrupted" string we need to get rid of.
you in a directory other than "/"? I don't see a src/ChangeLog entry for
it. In any case, my testing shows that it's fixed in 1.7-dev, but TODO and
a comment in src/ftp.c were not changed to reflect this.
- use mmap() to read whole files in core instead of allocating memory
and read()ing into it.
- use a new, more general, HTML parser (html-parse.c) and interface to
it from Wget (html-url.c).
- respect <meta name=robots content=nofollow> (easy with the new HTML
parser).
- use hash tables instead of linked lists in places where the lists
were used to facilitate mappings.
- rewrite the code in host.c to be more readable and faster (hash
tables instead of home-grown lists).
- make convert_links properly convert partial URLs to complete ones
for those URLs that have *not* been downloaded.
- use HTTP persistent connections where available. Very
simple-minded: it caches the last connection to the server.
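The mmap() item above can be sketched as follows. This is a minimal
illustration assuming a POSIX system, not the code Wget actually uses; the
names map_whole_file and unmap_whole_file are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not Wget's actual code: map the whole file at
   PATH read-only.  On success return the mapping and store its size
   in *LEN; on failure, or for an empty file, return NULL.  */
const char *
map_whole_file (const char *path, size_t *len)
{
  struct stat st;
  void *p;
  int fd = open (path, O_RDONLY);

  if (fd < 0)
    return NULL;
  if (fstat (fd, &st) < 0 || st.st_size == 0)
    {
      close (fd);
      return NULL;
    }
  p = mmap (NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close (fd);                   /* The mapping outlives the descriptor.  */
  if (p == MAP_FAILED)
    return NULL;
  *len = (size_t) st.st_size;
  return p;
}

/* Release a mapping obtained from map_whole_file.  */
void
unmap_whole_file (const char *p, size_t len)
{
  munmap ((void *) p, len);
}
```

The caller scans the returned bytes in place and never copies them into a
separately allocated buffer. A fallback for systems without mmap() is left
out of this sketch.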
Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
* TODO: We need to check the HTTP spec w.r.t. simplification of absolute URLs.
* MAILING-LIST: I didn't realize <wget@sunsite.auc.dk> allowed posting by
non-subscribers. <bug-wget@gnu.org> soon to be an alias for it.
* NEWS: I always forget to update this file when making user-visible changes.
However, Brian McMahon <bm@iucr.org> wants the old incorrect behavior to still
be available as an option, as he depends on it to allow mirrors of his site to
send CGI queries to his original site, but still get graphics off of the mirror
site. Perhaps this would be better dealt with by adding an option to tell -k
not to convert certain URL patterns?
besides Hrvoje, and added the following three items I've been meaning to get to:
* Make -K compare X.orig to X and move the former on top of the latter if
they're the same, rather than leaving identical .orig files lying around.
* Add an option to save all text/html files with a .html extension so that when
grabbing the output of a dynamically-generated remote page, you'll end up with
a filename that will cause _your_ webserver to realize the saved static HTML
file isn't text/plain.
* Allow mirroring of FTP URLs where logging in puts you somewhere else besides
'/'.
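The first item above (the -K cleanup) could look roughly like the sketch
below. It is only an illustration of the idea, not an implementation taken
from Wget: files_identical and drop_redundant_orig are hypothetical names,
both files are assumed to be regular files, and rename() is assumed to
replace an existing target as POSIX specifies.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return 1 if regular files A and B have the same
   contents, 0 if they differ, -1 if either cannot be read.  */
int
files_identical (const char *a, const char *b)
{
  FILE *fa = fopen (a, "rb");
  FILE *fb = fopen (b, "rb");
  int result = 1;

  if (fa && fb)
    {
      for (;;)
        {
          char ba[4096], bb[4096];
          size_t na = fread (ba, 1, sizeof ba, fa);
          size_t nb = fread (bb, 1, sizeof bb, fb);

          if (na != nb || memcmp (ba, bb, na) != 0)
            {
              result = 0;
              break;
            }
          if (na == 0)          /* Both files ended together.  */
            break;
        }
      if (ferror (fa) || ferror (fb))
        result = -1;
    }
  else
    result = -1;

  if (fa)
    fclose (fa);
  if (fb)
    fclose (fb);
  return result;
}

/* Hypothetical helper: if FILE.orig is identical to FILE, move the
   former on top of the latter.  Return 1 if that happened, 0 if the
   files differ, -1 on error (including a missing FILE.orig).  */
int
drop_redundant_orig (const char *file)
{
  char orig[4096];
  int n = snprintf (orig, sizeof orig, "%s.orig", file);
  int same;

  if (n < 0 || (size_t) n >= sizeof orig)
    return -1;
  same = files_identical (orig, file);
  if (same != 1)
    return same;
  return rename (orig, file) == 0 ? 1 : -1;
}
```

Moving X.orig over X, rather than just deleting X.orig, follows the wording
of the item. A real implementation would also have to decide what to do on
systems where rename() refuses to overwrite an existing file.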
download a single HTML document and all its constituents.
* po/*.{gmo,po,pot}: Regenerated after adding new options.
* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.
* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.
* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.
* init.c (commands): Added comment reminding people who add new entries that
allocate memory to add the corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.
* main.c: Use of "comma-separated list" was random -- normalized it. Did some
alphabetization. Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[]. Added new options
--follow-tags and -G / --ignore-tags. Added comment that Damir's --referer is
currently undocumented. Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi). Moved improperly
sorted 24, 129, and 'G' cases.
* options.h (struct options): Added new fields follow_tags and ignore_tags.
* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
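As an illustration of how the EQ idiom reads alongside the new tag options,
here is a minimal sketch. It is not the code in html.c: tag_in_list and
should_follow_tag are hypothetical names, the lists are assumed to be
NULL-terminated arrays of already-lowercased tag names, and the rule that an
ignore entry overrides a follow entry is an assumption of this sketch.

```c
#include <string.h>

#define EQ 0                    /* So we can say strcmp (a, b) == EQ.  */

/* Hypothetical helper: return 1 if TAG appears in the NULL-terminated
   LIST, 0 otherwise.  A NULL LIST is treated as empty.  */
int
tag_in_list (const char *tag, const char *const *list)
{
  for (; list && *list; ++list)
    if (strcmp (tag, *list) == EQ)
      return 1;
  return 0;
}

/* Hypothetical policy: when FOLLOW_TAGS is given, only tags listed
   there are followed; tags listed in IGNORE_TAGS are never followed.
   A NULL list means the corresponding option was not used.  */
int
should_follow_tag (const char *tag,
                   const char *const *follow_tags,
                   const char *const *ignore_tags)
{
  if (follow_tags && !tag_in_list (tag, follow_tags))
    return 0;
  if (ignore_tags && tag_in_list (tag, ignore_tags))
    return 0;
  return 1;
}
```

With follow_tags set to { "a", "area", NULL } and ignore_tags set to
{ "area", NULL }, should_follow_tag returns 1 for "a" and 0 for "area".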