Published in <sxsy9slhu7g.fsf@florida.arsdigita.de>.
* http.c (gethttp): Return RETRUNNEEDED when the retrieval is
unneeded because the file is already there and fully downloaded,
and -c is specified.
(http_loop): Handle RETRUNNEEDED.
* wget.h (uerr_t): New value RETRUNNEEDED.
* http.c (http_loop): Set no_truncate for files that both exist
and are non-empty.
(gethttp): Consider the download finished when restval >= contlen,
not only when restval==contlen.
(gethttp): Handle redirection before giving up due to -c.
(gethttp): Clarify error message which explains that -c will not
truncate the file.
(gethttp): When returning CONTNOTSUPPORTED, don't forget to free
the stuff that needs freeing and release the socket.
* main.c (print_help): Wget booleans accept "off", not "no".
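The restval >= contlen change to gethttp noted above can be illustrated
with a minimal sketch. The helper name and its parameters are hypothetical
and this is not Wget's actual code; it assumes restval is the size of the
existing local file and contlen is the length reported by the server, with
-1 standing for "unknown".

```c
#include <stdbool.h>

/* Hypothetical helper, not Wget's actual code.
   restval: bytes already present in the local file.
   contlen: length reported by the server, or -1 if unknown.  */
static bool
download_already_complete (long restval, long contlen)
{
  if (contlen < 0)
    return false;   /* unknown length: nothing can be concluded */

  /* ">=" rather than "==": a local file that is longer than the
     remote one is also treated as finished, so -c does not try to
     continue it.  */
  return restval >= contlen;
}
```

Under these assumptions a 150-byte local file checked against a 100-byte
remote file counts as complete, which the old restval==contlen test would
have missed.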
Published in <sxsvgo8nmub.fsf@florida.arsdigita.de>.
* retr.c (retrieve_url): Call uri_merge, not url_concat.
* html-url.c (collect_tags_mapper): Call uri_merge, not
url_concat.
* url.c (mkstruct): Use encode_string instead of xstrdup followed
by URL_CLEANSE.
(path_simplify_with_kludge): Deleted.
(contains_unsafe): Deleted.
(construct): Renamed to uri_merge_1.
(url_concat): Renamed to uri_merge.
* url.c (str_url): Use encode_string instead of the unnecessary
CLEANDUP.
(encode_string_maybe): New function, returns input string if no
encoding is needed.
(encode_string): Call encode_string_maybe to do the dirty work,
xstrdup if no work needed.
* wget.h (XDIGIT_TO_xchar): Define here.
* url.c (decode_string): Use new name.
(encode_string): Ditto.
* http.c (XDIGIT_TO_xchar): Rename HEXD2asc to XDIGIT_TO_xchar.
(dump_hash): Use new name.
* wget.h: Rename ASC2HEXD and HEXD2ASC to XCHAR_TO_XDIGIT and
XDIGIT_TO_XCHAR respectively.
* hash.c (hash_table_map): Allow deletion and change of the
element processed by MAPFUN.
(string_hash): Use the function from glib.
* hash.c (hash_table_remove): Rewrite to actually clear deleted
entries instead of just marking them as deleted.
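The encode_string_maybe / encode_string split described above for url.c
can be sketched roughly as follows. This is an illustration only: the
_sketch names, the is_unsafe helper, and the set of characters it treats
as unsafe are assumptions, and plain malloc stands in for Wget's xmalloc
and xstrdup.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed unsafe set for this sketch only: control characters, space,
   bytes above '~', and a few punctuation characters.  */
static int
is_unsafe (unsigned char c)
{
  return c <= ' ' || c > '~' || c == '%' || c == '"' || c == '<' || c == '>';
}

/* Return S itself when nothing needs encoding, otherwise a freshly
   allocated %XX-encoded copy (NULL if allocation fails).  */
static char *
encode_string_maybe_sketch (const char *s)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t extra = 0;
  const char *p;
  char *out, *q;

  for (p = s; *p; p++)
    if (is_unsafe ((unsigned char) *p))
      extra += 2;   /* one input byte becomes three output bytes */
  if (extra == 0)
    return (char *) s;

  out = malloc (strlen (s) + extra + 1);
  if (!out)
    return NULL;
  for (p = s, q = out; *p; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (is_unsafe (c))
        {
          *q++ = '%';
          *q++ = hex[c >> 4];
          *q++ = hex[c & 0xf];
        }
      else
        *q++ = (char) c;
    }
  *q = '\0';
  return out;
}

/* Always return a string the caller owns and may free: duplicate the
   input when the "maybe" variant handed it back unchanged.  */
static char *
encode_string_sketch (const char *s)
{
  char *encoded = encode_string_maybe_sketch (s);
  if (encoded == s)
    {
      char *copy = malloc (strlen (s) + 1);
      if (copy)
        strcpy (copy, s);
      return copy;
    }
  return encoded;
}
```

The point of the split is that callers who only read the result can use
the "maybe" variant and skip an allocation in the common case, while
callers who need ownership use the wrapper.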
Published in <sxsu23tvdur.fsf@florida.arsdigita.de>.
Published in <sxsitkc709p.fsf@florida.arsdigita.de>.
* url.c (parseurl): Don't strip trailing slash when u->dir is "/"
because that strips the *leading* slash, thus forcing relative
FTP retrieval.
* ftp.c (getftp): Convert initial FTP directory from VMS to UNIX
notation for VMS servers.
(ftp_retrieve_dirs): Do not prepend '/' to f->name when
odir is an empty string.
configuration section" of top Makefile and analogous spot in others.
* po/Makefile.in.in: Previous addition of top_builddir to
po/Makefile.in was bogus -- it's generated from po/Makefile.in.in.
appropriate -I, -L, and -R/-rpath flags in environment variables,
manually. Automated everything, including bundling libtool so we can
successfully link with the OpenSSL shared libraries on just about any
platform.
* wget.texi: Moved -nr from "Recursive Retrieval Options" to "FTP Options" and
gave it a @cindex entry. Alphabetized FTP options by long option name.
* main.c (print_help): -nr belongs in "FTP options" section of --help output,
not "Recursive retrieval" section. Alphabetized FTP options by long option
name.
you in a directory other than "/"? I don't see a src/ChangeLog entry for
it. In any case, my testing shows that it's fixed in 1.7-dev, but TODO and
a comment in src/ftp.c were not changed to reflect this.
looking at the dates would make you think that things went into
1.6 that actually just went into the 1.7-dev branch. Added "[Not
in 1.6 branch.]" where appropriate to clarify.
* ftp.c, http.c: Applied Hack Kampbjørn <hack@hackdata.com>'s
patch to deal with h_errno not being defined in netdb.h under Cygwin.
2000-12-30 Dan Harkless <wget@harkless.org>
* version.c: Released Wget version 1.6. Note that on this branch we
never actually had the version set to 1.6, but we still need the
ChangeLog comment for posterity.
- use mmap() to read whole files in core instead of allocating memory
and read'ing it.
- use a new, more general, HTML parser (html-parse.c) and interface to
it from Wget (html-url.c).
- respect <meta name=robots content=nofollow> (easy with the new HTML
parser).
- use hash tables instead of linked lists in places where the lists
were used to facilitate mappings.
- rewrite the code in host.c to be more readable and faster (hash
tables instead of home-grown lists.)
- make convert_links properly convert partial URLs to complete ones
for those URLs that have *not* been downloaded.
- use HTTP persistent connections where available. Very
simple-minded: caches the last connection to the server.
Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
files specified on the commandline. Made --convert-links be ignored when
--delete-after is specified. Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
I renamed to "lockable_boolean") in the .wgetrc (currently just passive_ftp).
Wrote documentation for his changes and added the missing "referer" to the
.wgetrc section (making mention of the issue of "referrer" being the correct
spelling).
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.
* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines. When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.
* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.
* init.c: Added new -p / --page-requisites / page_requisites option.
* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion. Changed the unhelpful --mirror description
to simply give the options it's equivalent to. Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.
* options.h (struct options): Added new page_requisites field.
* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html(). Use new INFINITE_RECURSION #define.
* retr.c: Changed "URL-s" to "URLs". get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.
* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.
* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.
* wget.h: Added some comments and new INFINITE_RECURSION #define.
* wget.texi (Recursive Retrieval Options): Documented new -p option.
go through without doing an update first, and I forgot to make the change the
second time. Just changed an erroneous main.c (main) to main.c (print_help).
said that 0 seconds are waited after the first retry, which I believe is
incorrect and does not match what's written elsewhere (e.g. wget.texi). Changed
to 1.
>= width of type" warning on 32-bit architectures. Got rid of it by tricking
the compiler w/ a variable.
* url.c (UNSAFE_CHAR): The macro didn't include all the illegal characters per
RFC1738, namely everything above '~'. It also generated a warning on OSes
where plain char is unsigned. Fixed.
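The signedness problem behind the UNSAFE_CHAR fix above can be shown with
a small sketch. The macro below is hypothetical and is not the UNSAFE_CHAR
from url.c; the punctuation list is an assumption chosen for illustration
and is not claimed to match RFC 1738 exactly.

```c
#include <string.h>

/* Hypothetical macro for illustration; not the UNSAFE_CHAR from url.c.
   Casting to unsigned char first makes bytes 0x80-0xFF compare as
   "above '~'" whether plain char is signed or unsigned, and avoids
   comparing a possibly-unsigned char against a negative value.  */
#define UNSAFE_CHAR_SKETCH(c)                                   \
  ((unsigned char) (c) <= ' ' || (unsigned char) (c) > '~'      \
   || strchr ("<>\"#%{}|\\^[]`", (unsigned char) (c)) != NULL)

/* Function wrapper so the macro can be called with a plain char.  */
static int
is_unsafe_char (char c)
{
  return UNSAFE_CHAR_SKETCH (c);
}
```

Without the cast, a byte such as 0xE9 is negative where char is signed
and so never tests as greater than '~'.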
download a single HTML document and all its constituents.
* po/*.{gmo,po,pot}: Regenerated after adding new options.
* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.
* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.
* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.
* init.c (commands): Added comment reminding people adding new entries doing
allocation to add corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.
* main.c: Use of "comma-separated list" was random -- normalized it. Did some
alphabetization. Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[]. Added new options
--follow-tags and -G / --ignore-tags. Added comment that Damir's --referer is
currently undocumented. Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi). Moved improperly
sorted 24, 129, and 'G' cases.
* options.h (struct options): Added new fields follow_tags and ignore_tags.
* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
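A minimal sketch of the EQ idiom mentioned above. The helper
is_stylesheet_rel is hypothetical and only demonstrates the comparison
style; it is not taken from Wget.

```c
#include <string.h>

#define EQ 0   /* strcmp() returns 0 when the two strings are equal */

/* Hypothetical helper showing the idiom: reads as "a equals b"
   instead of the easily misread "!strcmp (a, b)".  */
static int
is_stylesheet_rel (const char *rel)
{
  return strcmp (rel, "stylesheet") == EQ;
}
```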
coded for (downloading StarOffice from Sun's website). He says he doesn't use
wget any more, so he won't be writing a patch that allows downloading that
without breaking anything (such a patch would apparently involve stopping
certain characters in the URL from being escaped).
URLs, get_page.cgi?page1 and get_page.cgi?page2, they'll both be saved as
get_page.cgi and the second will overwrite the first. Also, parameters to
implicit CGIs, like "http://www.host.com/db/?2000-03-02" cause the URLs to be
printed with trailing garbage characters, and could seg fault. I'm not sure
what Dan had in mind with this patch (no explanatory comments), but I'm removing
it for now. If he can rewrite it so it doesn't break stuff, okay.