mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00
Commit Graph

218 Commits

Author SHA1 Message Date
hniksic
b27144fcce [svn] My patch "persistent connection tweaks".
Published in <sxshf531qhj.fsf@florida.arsdigita.de>.

(Applied with the addition of correct calculation for the
length of the request.)
2000-11-19 15:42:13 -08:00
hniksic
b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  very
  simple-minded, caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
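The last item in b0b1c815c1, a persistent-connection cache that only remembers the most recent connection, can be sketched roughly as follows. This is a minimal illustration and not the code from that commit: `struct last_conn`, `remember_conn`, and `reuse_conn` are hypothetical names, and the descriptor is treated as an opaque integer owned by the caller.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single-slot cache: it holds only the most recently
   registered connection, so registering a new one replaces the old. */
struct last_conn {
    int active;        /* non-zero once the slot holds a connection */
    char host[256];    /* host name the connection was opened to */
    int port;
    int fd;            /* socket descriptor, owned by the caller */
};

/* Store host:port and its descriptor, overwriting any previous entry. */
void remember_conn(struct last_conn *c, const char *host, int port, int fd)
{
    assert(c != NULL && host != NULL);
    strncpy(c->host, host, sizeof c->host - 1);
    c->host[sizeof c->host - 1] = '\0';
    c->port = port;
    c->fd = fd;
    c->active = 1;
}

/* Return the cached descriptor for host:port, or -1 when the slot is
   empty or holds a connection to a different server. */
int reuse_conn(const struct last_conn *c, const char *host, int port)
{
    assert(c != NULL && host != NULL);
    if (c->active && c->port == port && strcmp(c->host, host) == 0)
        return c->fd;
    return -1;
}
```

With a single slot, a request to a second server evicts the first, which fits the commit's own description of the scheme as "very simple-minded".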
hniksic
ccf31643ab [svn] vsnprintf() fixup. 2000-11-16 08:37:49 -08:00
hniksic
cc3b6eb3e4 [svn] Do the _XOPEN_SOURCE/_SVID_SOURCE things only on Linux. 2000-11-15 10:10:01 -08:00
hniksic
6a70f04a5c [svn] Don't clutter the host list with duplicate hosts.
Published in <sxsitpt56eh.fsf@florida.arsdigita.de>.
2000-11-12 16:46:13 -08:00
hniksic
e1f1c1ff40 [svn] Better version of read_whole_line().
Published in <sxsr94jd7z4.fsf@florida.arsdigita.de>.
2000-11-10 10:01:35 -08:00
hniksic
e18ca280fb [svn] Fix off-by-one error in comind().
Published in <sxsvgtvdcki.fsf@florida.arsdigita.de>.
2000-11-10 08:20:55 -08:00
hniksic
f306ae9626 [svn] Changed last_slash[-1] to *(last_slash - 1). 2000-11-08 07:51:28 -08:00
hniksic
b72b6cf387 [svn] Correctly handle URLs where / does not follow the host name.
Published in <sxsn1fag6zu.fsf@florida.arsdigita.de>.
2000-11-08 01:15:40 -08:00
hniksic
34ea31bb01 [svn] Sort commands[]. 2000-11-07 03:43:36 -08:00
hniksic
0e2b74ce3b [svn] Commit "minor fixes". 2000-11-06 13:24:57 -08:00
hniksic
366ad1d6d9 [svn] Rewrote the logging code.
Published at <sxs1ywrf300.fsf@florida.arsdigita.de>.
2000-11-04 20:38:31 -08:00
hniksic
c2c821b3c9 [svn] snprintf.c addition. 2000-11-04 14:49:46 -08:00
hniksic
6e23da9254 [svn] Hide password from URL when non-verbose, too. 2000-11-01 17:41:20 -08:00
hniksic
6eb0870af0 [svn] Contributed fix. 2000-11-01 17:02:56 -08:00
hniksic
986c445029 [svn] Fixed minor memory leaks. 2000-11-01 16:18:27 -08:00
hniksic
b3758323ed [svn] Applied contributed fix. 2000-11-01 15:57:19 -08:00
hniksic
b7a8c6d3f5 [svn] Gracefully handle opt.downloaded overflowing.
Published in <sxsd7gfnv17.fsf@florida.arsdigita.de>.
2000-11-01 15:17:31 -08:00
hniksic
29cdc8da20 [svn] Updated long_to_string(); enhanced opt.downloaded to use
64-bit types where available.
Published in <sxswvenqsmn.fsf@florida.arsdigita.de> and
<sxssnpbqshp.fsf@florida.arsdigita.de>.
2000-11-01 13:51:25 -08:00
hniksic
b9eeb0c54c [svn] Fix "optimization" of query-strings in URLs.
Published in <sxs3dhbwnmw.fsf@florida.arsdigita.de>.
2000-11-01 10:31:53 -08:00
hniksic
6d13e17142 [svn] Detect redirection cycles.
Published in <sxsd7ggtjac.fsf@florida.arsdigita.de>.
2000-10-31 20:21:50 -08:00
hniksic
515d82fb95 [svn] Committed my patch from <sxsy9z4xz5m.fsf@florida.arsdigita.de>
(recognize HTML entities.)
2000-10-31 17:25:12 -08:00
hniksic
846b045a69 [svn] Applied my patch from <sxs3dhczfv5.fsf@florida.arsdigita.de>. 2000-10-31 16:38:57 -08:00
hniksic
f6715dd08d [svn] Committed my patch from <sxs7l6ozghz.fsf@florida.arsdigita.de>. 2000-10-31 16:26:33 -08:00
hniksic
0dd418242a [svn] Committed my patches from <sxsbsw16sbu.fsf@florida.arsdigita.de>
and <sxsvgu824xk.fsf@florida.arsdigita.de>.
2000-10-31 11:25:32 -08:00
hniksic
b095202cad [svn] Applied Adrian Aichner's patch from
<20001029223711.28688.qmail@web10601.mail.yahoo.com>.
2000-10-30 13:07:04 -08:00
dan
24c465b5ad [svn] retr.c (retrieve_url): Manually applied T. Bharath
<TBharath@responsenetworks.com>'s patch to get wget to grok illegal relative URL
redirects.  Reformatted and re-commented it.
2000-10-27 20:18:20 -07:00
dan
1396b30055 [svn] Manually applied Rob Mayoff <mayoff@dqd.com>'s patch (vs. 1.5.3, not 1.5.3+dev)
to add --bind-address, making many necessary alphabetization, coding style,
comment, documentation, and naming fixes and additions.
2000-10-23 23:19:17 -07:00
dan
2fbb4936a0 [svn] main.c (print_help): Clarified that --delete-after deletes local files. 2000-10-23 20:52:34 -07:00
dan
f4673bcdaf [svn] --delete-after wasn't implemented for files retrieved by FTP or corresponding to
files specified on the commandline.  Made --convert-links be ignored when
--delete-after is specified.  Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
2000-10-23 20:43:47 -07:00
dan
8a9be7627d [svn] ftp.c (getftp): Applied Piotr Sulecki <Piotr.Sulecki@ios.krakow.pl>'s
patch to work around FTP servers that incorrectly respond to the
"REST" command with the remaining size rather than the total
file size.
2000-10-20 00:28:57 -07:00
dan
8cf52e0dd3 [svn] Applied John Daily <jdaily@cyberdude.com>'s patch for his "quad" commands (which
I renamed to "lockable_boolean") in the .wgetrc (currently just passive_ftp).
Wrote documentation for his changes and added the missing "referer" to the
.wgetrc section (making mention of the issue of "referrer" being the correct
spelling).
2000-10-19 23:59:30 -07:00
dan
b3e2c0ff97 [svn] Implemented and documented new -E / --html-extension / html_extension option. 2000-10-19 22:55:46 -07:00
dan
cbf018d0c0 [svn] --retr-symlinks was not previously documented properly. Based on my newfound
understanding of what its limitations are, added a TODO item.  Also made a minor
tweak in html.c to silence a warning.
2000-10-09 15:43:11 -07:00
dan
7931200609 [svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.

* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines.  When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.

* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.

* init.c: Added new -p / --page-requisites / page_requisites option.

* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion.  Changed the unhelpful --mirror description
to simply give the options it's equivalent to.  Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.

* options.h (struct options): Added new page_requisites field.

* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html().  Use new INFINITE_RECURSION #define.

* retr.c: Changed "URL-s" to "URLs".  get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.

* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.

* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.

* wget.h: Added some comments and new INFINITE_RECURSION #define.

* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
dan
001392bf2b [svn] * main.c (print_help): -B / --base was not mentioned. 2000-08-23 15:40:20 -07:00
dan
1f0acebeb0 [svn] * main.c (print_help): Modified -nc description to mention that it also prevents
the creation of multiple versions of the same file with ".<number>" suffixes.
2000-08-22 20:11:55 -07:00
hniksic
7794db052c [svn] Committed Jan Prikryl's patch from
<20000709171425.A16267@launzatte.cg.tuwien.ac.at>.
2000-07-14 07:15:23 -07:00
dan
ae77e4f08e [svn] Oops. Meant to check this change in with my last one, but the commit wouldn't
go through without doing an update first, and I forgot to make the change the
second time.  Just changed an erroneous main.c (main) to main.c (print_help).
2000-06-09 14:40:26 -07:00
dan
eea2d24220 [svn] Heiko's --help output for --waitretry was over 80 columns. Shortened. It also
said that 0 seconds are waited after the first retry, which I believe is
incorrect and does not match what's written elsewhere (e.g. wget.texi).  Changed
to 1.
2000-06-09 13:59:56 -07:00
hniksic
1765080b2e [svn] Comment fix. 2000-06-09 01:03:19 -07:00
hniksic
2e806fb2f3 [svn] Don't try to chmod() symlinks. 2000-06-01 04:20:05 -07:00
hniksic
0eec6b9f30 [svn] Committed my patch <dpem6hln1k.fsf@mraz.iskon.hr>. 2000-06-01 03:47:03 -07:00
dan
b05feb3ae2 [svn] Damir Dzeko <ddzeko@zesoi.fer.hr> did not document his new --referer option.
Did so (--help output and wget.texi).  Also tweaked --help output for --execute.
2000-05-22 19:29:38 -07:00
hniksic
ee6065f581 [svn] Committed my patch from <dpd7mj3sap.fsf@mraz.iskon.hr>. 2000-05-19 00:37:22 -07:00
hniksic
094481c386 [svn] Committed host.c patch from <dpk8i3za97.fsf_-_@mraz.iskon.hr>. 2000-04-14 02:31:21 -07:00
hniksic
6b4a85888e [svn] Commit several fixes. 2000-04-12 06:23:35 -07:00
dan
1ecfed1e10 [svn] * host.c (store_hostaddress): R. K. Owen's patch introduces a "left shift count
>= width of type" warning on 32-bit architectures.  Got rid of it by tricking
  the compiler w/ a variable.

* url.c (UNSAFE_CHAR): The macro didn't include all the illegal characters per
  RFC1738, namely everything above '~'.  It also generated a warning on OSes
  where char =~ unsigned char.  Fixed.
2000-04-04 20:08:10 -07:00
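The url.c part of 1ecfed1e10 can be pictured with a small sketch. It is not wget's UNSAFE_CHAR macro: `is_unsafe_char` is a hypothetical helper, and the punctuation set is an assumption taken from the "unsafe" characters listed in RFC 1738. The point is that converting to unsigned char first makes "everything above '~'" compare the same way whether plain char is signed or unsigned.

```c
#include <string.h>

/* Hypothetical helper, not wget's macro: returns 1 when a byte should
   be escaped in a URL, 0 otherwise. */
int is_unsafe_char(char c)
{
    /* Go through unsigned char so that 8-bit bytes are seen as
       128..255 rather than as negative values on signed-char systems. */
    unsigned char uc = (unsigned char) c;

    /* Control characters and space, then DEL and all 8-bit bytes. */
    if (uc <= ' ' || uc > '~')
        return 1;

    /* Assumed set: the printable "unsafe" characters from RFC 1738.
       uc is never 0 here, so strchr cannot match the terminator. */
    return strchr("<>\"#%{}|\\^~[]`", uc) != NULL;
}
```

For example, `is_unsafe_char((char) 0xE9)` returns 1 on both signed-char and unsigned-char platforms, while `is_unsafe_char('/')` returns 0.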
hniksic
bc7060a81d [svn] More old fixes. 2000-03-31 06:14:58 -08:00
hniksic
aeabb42714 [svn] Commit another old fix. 2000-03-31 06:08:57 -08:00
hniksic
858adf01cd [svn] Fix store_hostaddress() on big-endian 64-bit machines. 2000-03-31 06:07:07 -08:00
hniksic
0d42b49e30 [svn] Commit really old change. 2000-03-31 06:04:54 -08:00
hniksic
6b0aaebf33 [svn] Committed patch from <dp8zyzz1se.fsf@mraz.iskon.hr>. 2000-03-31 05:51:53 -08:00
hniksic
c71f174ed6 [svn] Changes from <9t9pusol5a1.fsf@mraz.iskon.hr>. 2000-03-21 07:47:45 -08:00
dan
4454f6ce0a [svn] * TODO: Removed done item: we now have an option (-G) that makes it easy to
download a single HTML document and all its constituents.

* po/*.{gmo,po,pot}: Regenerated after adding new options.

* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.


* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.


* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.

* init.c (commands): Added comment reminding people adding new entries doing
allocation to add corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.

* main.c: Use of "comma-separated list" was random -- normalized it.  Did some
alphabetization.  Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[].  Added new options
--follow-tags and -G / --ignore-tags.  Added comment that Damir's --referer is
currently undocumented.  Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi).  Moved improperly
sorted 24, 129, and 'G' cases.

* options.h (struct options): Added new fields follow_tags and ignore_tags.

* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
2000-03-10 22:48:06 -08:00
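The EQ define mentioned for wget.h in 4454f6ce0a reads as follows in use. The define itself is quoted from the commit message; `is_anchor_tag` is a hypothetical helper added only to show a call site.

```c
#include <assert.h>
#include <string.h>

#define EQ 0   /* strcmp() returns 0 when the two strings are equal */

/* Hypothetical helper for illustration: 1 if tag is exactly "a". */
int is_anchor_tag(const char *tag)
{
    assert(tag != NULL);
    return strcmp(tag, "a") == EQ;
}
```

Writing `== EQ` instead of `== 0` or `!strcmp(...)` states the intent directly, at the cost of one extra macro.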
dan
66ced51104 [svn] Dan Berger responded to my email. Added his explanation of what his patch was
coded for (downloading StarOffice from Sun's website).  He says he doesn't use
wget any more, so he won't be writing a patch that allows downloading that
without breaking anything (such a patch would apparently involve stopping
certain characters in the URL from being escaped).
2000-03-02 15:49:37 -08:00
dan
3a8c75cac4 [svn] Dan Berger's query string patch is totally bogus. If you have two different
URLs, get_page.cgi?page1 and get_page.cgi?page2, they'll both be saved as
get_page.cgi and the second will overwrite the first.  Also, parameters to
implicit CGIs, like "http://www.host.com/db/?2000-03-02" cause the URLs to be
printed with trailing garbage characters, and could seg fault.  I'm not sure
what Dan had in mind with this patch (no explanatory comments), but I'm removing
it for now.  If he can rewrite it so it doesn't break stuff, okay.
2000-03-02 14:48:07 -08:00
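The file-name collision described in 3a8c75cac4 can be reproduced with a short sketch. This is not the removed patch, whose code is not shown here; `name_without_query` is a hypothetical function that simply derives a local name by dropping everything from the '?' onward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical illustration: copy the last path component of url into
   buf, discarding the query string.  Output is truncated to fit. */
void name_without_query(const char *url, char *buf, size_t bufsize)
{
    const char *query;
    size_t start, end, len;

    assert(url != NULL && buf != NULL && bufsize > 0);

    /* The name ends at the first '?', or at the end of the URL. */
    query = strchr(url, '?');
    end = query ? (size_t) (query - url) : strlen(url);

    /* Walk back to just after the preceding '/'. */
    start = end;
    while (start > 0 && url[start - 1] != '/')
        start--;

    len = end - start;
    if (len >= bufsize)
        len = bufsize - 1;
    memcpy(buf, url + start, len);
    buf[len] = '\0';
}
```

Both `http://www.host.com/get_page.cgi?page1` and `http://www.host.com/get_page.cgi?page2` come out as `get_page.cgi`, so the second download overwrites the first. An implicit CGI such as `http://www.host.com/db/?2000-03-02` yields an empty name under this scheme, which any real implementation would have to handle separately.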
dan
d2e1d7fe9d [svn] Hrvoje didn't regenerate the .info files after changing wget.texi.
Got rid of newly-introduced nested-if warnings in ftp.c and http.c.  Fixed
apparently completely untested code in main.c that was trying to provide --wait
/ --waitretry backwards compatibility, but had multiple fundamental bugs.
2000-03-02 13:17:47 -08:00
hniksic
5d8cfbd904 [svn] Applied contributed patches (see ChangeLog for details.) 2000-03-02 06:45:37 -08:00
hniksic
1dc66a6cf6 [svn] *** empty log message *** 2000-03-02 06:23:22 -08:00
hniksic
33dbab7add [svn] Added contributed patch. 2000-03-02 06:18:53 -08:00
hniksic
2b2fd2924a [svn] Added user-contributed patches. 2000-03-02 06:16:12 -08:00
hniksic
cf35e2009d [svn] Applied contributed patch. 2000-03-02 05:34:05 -08:00
hniksic
f4f8e83327 [svn] Applied Edward Sabol's patch. 2000-03-02 05:28:31 -08:00
dan
4331c39c9a [svn] Implemented the item I formerly had in the TODO: When -K and -N are used
together, we compare local file X.orig (if extant) against server file X.
Previously -k and -N were worthless in combination because the local converted
files always differed from the server versions.
2000-03-01 22:33:48 -08:00
dan
e0a58713f7 [svn] Upped version number from 1.5.3 to 1.5.3+dev. Because the development source
is available via anonymous CVS and desirable features are being added, it's
quite possible for end-users to be getting their hands on development versions.
They may report bugs, so if we don't change the version number, we'll have to
continually follow up the statement "I'm using version 1.5.3" with the question
"The FTP archive or the CVS source?"  Better to just make this development
version have a unique number.  Once we're ready to actually release the next
version, we can up the version from 1.5.3+dev to 1.5.4, or 1.6, or whatever it
turns out to be (depending on how much development gets done).

Also made minor updates (dates, email addresses) to wget.texi.
2000-02-29 16:50:52 -08:00
dan
e5408e7db8 [svn] Implemented new -K / --backup-converted / backup_converted = on option. 2000-02-29 16:17:23 -08:00
kwget
31d6616c48 [svn] Initial revision 1999-12-01 23:42:23 -08:00