Commit Graph

119 Commits

Author SHA1 Message Date
hniksic 46617228fa [svn] URL-decode user and password in URL.
Published in <sxsita52hg6.fsf@florida.arsdigita.de>.
2002-01-14 05:26:16 -08:00
hniksic 524a1f54dc [svn] Handle links to relative "net locations," e.g. <a href="//www.server.com/">. 2002-01-13 17:56:40 -08:00
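
A link such as "//www.server.com/" names a host but no scheme, so the scheme has to come from the document it appears in. The sketch below shows one way to do that. The function merge_net_location is a hypothetical name, and the code is an assumption about the general technique, not a copy of what url.c does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: resolve a "//host/path" link against BASE by
   borrowing BASE's scheme.  Returns a malloc'd string that the caller
   frees, or NULL if LINK is not of that form, BASE has no scheme, or
   memory runs out. */
char *merge_net_location(const char *base, const char *link)
{
    const char *colon;
    size_t scheme_len, link_len;
    char *result;

    assert(base != NULL && link != NULL);
    if (link[0] != '/' || link[1] != '/')
        return NULL;                          /* not a net-location link */
    colon = strchr(base, ':');
    if (colon == NULL)
        return NULL;                          /* base carries no scheme */

    scheme_len = (size_t)(colon - base) + 1;  /* keep the ':' */
    link_len = strlen(link);
    result = malloc(scheme_len + link_len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, base, scheme_len);
    memcpy(result + scheme_len, link, link_len + 1);  /* copies the NUL too */
    return result;
}
```

Given the base "http://example.com/dir/page.html", the link "//www.server.com/" resolves to "http://www.server.com/".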
hniksic a3500d32d7 [svn] Move path_simplify to url.c. 2001-12-14 07:46:00 -08:00
hniksic b9f370004d [svn] Cosmetic changes to get_urls_html. 2001-12-12 11:06:10 -08:00
hniksic 943f657aa7 [svn] Rename long_to_string to number_to_string, and make it return a useful
value.
2001-12-09 18:29:12 -08:00
hniksic dd84231c6a [svn] Minor fixes prompted by `lint'.
Published in <sxsadwt2nkg.fsf@florida.arsdigita.de>.
2001-12-08 17:24:41 -08:00
hniksic 0620ada923 [svn] Fix OpenSSL PRNG seeding.
Published in <sxs7ks1noc4.fsf@florida.arsdigita.de>.
2001-12-05 17:13:31 -08:00
hniksic 0fdc1bd8c0 [svn] Fix downloading of duplicate URLs.
Published in <sxsvgfmu2bj.fsf@florida.arsdigita.de>.
2001-12-04 13:03:35 -08:00
hniksic e986f7dad3 [svn] Quote '?' as '%3F' in local files when `--html-extension' is turned on.
Published in <sxszo4ztiwr.fsf@florida.arsdigita.de>.
2001-12-04 01:49:37 -08:00
hniksic 8b2a216c77 [svn] Make --base -i work.
Published in <sxsoflisqcf.fsf@florida.arsdigita.de>.
2001-12-01 11:17:19 -08:00
hniksic 569fd61c95 [svn] Use the full path when building the authorization line.
Published in <sxsitbqu9iw.fsf@florida.arsdigita.de>.
2001-12-01 09:39:07 -08:00
hniksic f4d019a423 [svn] Correctly convert links in <meta http-equiv=Refresh content="...">.
Published in <sxsadx3wp49.fsf@florida.arsdigita.de>.
2001-11-30 20:18:51 -08:00
hniksic cca7541b10 [svn] Don't translate %d-%d. 2001-11-27 04:58:09 -08:00
hniksic df05e7ff10 [svn] Handle <base href=...> when converting links.
Published in <sxsadxaae3t.fsf@florida.arsdigita.de>.
2001-11-25 10:40:55 -08:00
hniksic 2e6e3f21f8 [svn] Attempt to quote '?' as "%3F" when linking to local files.
Given up on the attempt, as it breaks local browsing.
2001-11-25 09:44:28 -08:00
hniksic 222e9465b7 [svn] Implemented breadth-first retrieval.
Published in <sxsherjczw2.fsf@florida.arsdigita.de>.
2001-11-24 19:10:34 -08:00
hniksic d5be8ecca4 [svn] Rewrite parsing and handling of URLs.
Published in <sxs4rnnlklo.fsf@florida.arsdigita.de>.
2001-11-21 16:24:28 -08:00
hniksic a24b3d50f0 [svn] Don't use the now-obsolete TYPE variable.
Published in <sxswv0ledyx.fsf@florida.arsdigita.de>.
2001-11-20 08:03:41 -08:00
hniksic 94c5b23136 [svn] Handle shorthands in proxy URLs.
Published in <sxs6686py1q.fsf@florida.arsdigita.de>.
2001-11-19 08:15:42 -08:00
hniksic e8e8797873 [svn] Rewrite shorthand URLs in a step separate from parsing.
Published in <sxspu6f7ecz.fsf@florida.arsdigita.de>.
2001-11-18 17:14:14 -08:00
hniksic f178e6c613 [svn] Clean up handling of schemes.
Published in <sxswv0n7h7s.fsf@florida.arsdigita.de>.
2001-11-18 16:12:05 -08:00
hniksic 303f406997 [svn] Don't list all the "known" (but unsupported) protocols. Instead, just
skip the characters until the first ':'.
Published in <sxsitc8a848.fsf@florida.arsdigita.de>.
2001-11-17 22:49:09 -08:00
hniksic 0c42479322 [svn] Applied Edward Sabol's patch from
<200106131813.f5DIDss1294858@alderaan.gsfc.nasa.gov>.
It fixes a memory leak in url_equal, and comments it out,
as it's unused.
2001-11-16 08:49:19 -08:00
hniksic e1f4cff68c [svn] Make sure that slashes don't sneak in as part of file name via
query string.
Published in <sxsu21eb3te.fsf@florida.arsdigita.de>.
2001-06-18 02:08:04 -07:00
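
A slash inside a query string is harmless in a URL, but in a local file name it would be read as a directory separator. The commit does not say what Wget substitutes, so the sketch below makes an assumption: it rewrites each '/' as "%2F". The name quote_slashes is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return a malloc'd copy of NAME in which every
   '/' is replaced with "%2F", so the result is safe to use as a single
   file name component.  Returns NULL if memory runs out. */
char *quote_slashes(const char *name)
{
    size_t extra = 0;
    const char *p;
    char *out, *q;

    assert(name != NULL);
    for (p = name; *p != '\0'; p++)
        if (*p == '/')
            extra += 2;          /* one byte grows to three */

    out = malloc(strlen(name) + extra + 1);
    if (out == NULL)
        return NULL;

    for (p = name, q = out; *p != '\0'; p++) {
        if (*p == '/') {
            memcpy(q, "%2F", 3);
            q += 3;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}
```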
hniksic 0b056d1720 [svn] Update copyright notices. 2001-05-27 12:35:15 -07:00
hniksic ae621c6770 [svn] Treat empty proxy environment vars as unset.
Published in <sxssniwq8d6.fsf@florida.arsdigita.de>.
2001-04-26 03:11:49 -07:00
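
The idea can be sketched in a few lines: a variable that is set to the empty string is reported the same way as one that is missing. Both function names below are hypothetical, and the variable name in the usage note is only an example.

```c
#include <stdlib.h>

/* Hypothetical sketch: map an empty string to NULL so that callers
   need only one "is it set?" test. */
const char *nonempty_or_null(const char *value)
{
    return (value != NULL && *value != '\0') ? value : NULL;
}

/* Look up NAME in the environment, treating an empty value as unset. */
const char *getenv_nonempty(const char *name)
{
    return nonempty_or_null(getenv(name));
}
```

A caller would then write something like getenv_nonempty("http_proxy") and use a proxy only when the result is not NULL.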
hniksic d80f6cbe8c [svn] Reimplemented UNSAFE_CHAR and RESERVED_CHAR.
Fixed snprintf.c to avoid ISDIGIT.
2001-04-24 17:20:30 -07:00
hniksic ac7c8c1390 [svn] Improve performance of grow_hash_table.
Published in <sxs66g8nd4c.fsf@florida.arsdigita.de>.
2001-04-14 00:41:29 -07:00
hniksic 61bb00adc0 [svn] Various url.c-related changes.
Published in <sxsvgo8nmub.fsf@florida.arsdigita.de>.

* retr.c (retrieve_url): Call uri_merge, not url_concat.
* html-url.c (collect_tags_mapper): Call uri_merge, not
url_concat.
* url.c (mkstruct): Use encode_string instead of xstrdup followed
by URL_CLEANSE.
(path_simplify_with_kludge): Deleted.
(contains_unsafe): Deleted.
(construct): Renamed to uri_merge_1.
(url_concat): Renamed to uri_merge.
* url.c (str_url): Use encode_string instead of the unnecessary
CLEANDUP.
(encode_string_maybe): New function, returns input string if no
encoding is needed.
(encode_string): Call encode_string_maybe to do the dirty work,
xstrdup if no work needed.
* wget.h (XDIGIT_TO_xchar): Define here.
* url.c (decode_string): Use new name.
(encode_string): Ditto.
* http.c (XDIGIT_TO_xchar): Rename HEXD2asc to XDIGIT_TO_xchar.
(dump_hash): Use new name.
* wget.h: Rename ASC2HEXD and HEXD2ASC to XCHAR_TO_XDIGIT and
XDIGIT_TO_XCHAR respectively.
2001-04-13 21:11:35 -07:00
hniksic 8a0e9e765e [svn] Minor -Wall-induced fixes. Also, skip_url is removed.
Published in <sxs8zl5v5cw.fsf@florida.arsdigita.de>.
2001-04-12 20:39:23 -07:00
hniksic 963863113f [svn] Fix retrieval of directories when initial CWD is not `/'.
Published in <sxsitkc709p.fsf@florida.arsdigita.de>.

* url.c (parseurl): Don't strip trailing slash when u->dir is "/"
because that strips the *leading* slash, thus forcing relative
FTP retrieval.
* ftp.c (getftp): Convert initial FTP directory from VMS to UNIX
notation for VMS servers.
(ftp_retrieve_dirs): Do not prepend '/' to f->name when
odir is an empty string.
2001-04-10 17:24:59 -07:00
hniksic c51015565a [svn] parse_uname() would run past the end of the string if the
username was present, but the URL did not contain a slash, e.g.
http://foo:bar@myhost.
Reported by Christian Fraenkel.
2001-04-04 07:00:34 -07:00
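
The bug class described here is a scan that stops only at '/' and never checks for the end of the string. A bounded scan could look like the sketch below. The function find_userinfo_end is hypothetical and is not the actual parse_uname code. It also assumes that the last '@' before the path ends the user information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: AUTH points just past "scheme://".  Return a
   pointer to the '@' that ends the user:password part, or NULL if there
   is none.  The loop stops at the first '/' or at the terminating NUL,
   whichever comes first, so it cannot run past the end of the string. */
const char *find_userinfo_end(const char *auth)
{
    const char *p;
    const char *at = NULL;

    assert(auth != NULL);
    for (p = auth; *p != '\0' && *p != '/'; p++)
        if (*p == '@')
            at = p;              /* remember the last '@' seen */
    return at;
}
```

For "foo:bar@myhost", which has no slash at all, the scan ends at the NUL and returns the position of the '@'.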
hniksic 1a6058b1ec [svn] Applied Philipp Thomas's safe-ctype patch. Published in
<20010330025159.U21662@jeffreys.suse.de>.
2001-03-30 14:36:59 -08:00
janp 5014d32c3a [svn] Skip `:port' in the host header if it is the DEFAULT_HTTPS_PORT when
using SSL. Patch submitted by Hack Kampbjorn <hack@hackdata.com>.
2001-03-08 15:11:03 -08:00
hniksic 54811e2832 [svn] Applied Jan's patch to allow non-quoted @ character in
passwords.  Published in <20010106173455.A9455@erwin.telekabel.at>.
2001-02-10 16:28:22 -08:00
hniksic b370dd1914 [svn] Applied Hack Kampbjorn's patch to print FTP type in debug output.
Published in <3A7D94B5.D9B932FB@hackdata.com>.
2001-02-10 16:06:59 -08:00
dan fa636eb71d [svn] url.c (str_url): Clarified this function's comment header after Hrvoje answered
my question on the list as to when hide != 1.  Also Hrvoje pointed out I need to
use xstrdup() on the string literal.
2001-01-10 22:16:46 -08:00
dan 48cf02169d [svn] Just clarified a comment in the fix I just committed. 2001-01-09 20:32:29 -08:00
dan 1993e140f2 [svn] url.c (str_url): Henrik van Ginhoven pointed out on the list that we shouldn't
give away the number of characters in the password by replacing each character
with a 'x'.  Use "<password>" instead.
2001-01-09 20:30:43 -08:00
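
The point of the change is that a fixed placeholder reveals nothing about the password, whereas one 'x' per character reveals its length. A minimal sketch, using the hypothetical function format_userinfo rather than the real str_url:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HIDDEN_PASSWORD "<password>"

/* Hypothetical sketch: build the "user:password@" part of a URL for
   display.  When HIDE is nonzero the password is replaced with a fixed
   placeholder, so the output length does not depend on the password.
   Returns a malloc'd string, or NULL if memory runs out. */
char *format_userinfo(const char *user, const char *passwd, int hide)
{
    const char *shown;
    size_t len;
    char *out;

    assert(user != NULL && passwd != NULL);
    shown = hide ? HIDDEN_PASSWORD : passwd;
    len = strlen(user) + 1 + strlen(shown) + 1;   /* "user" ':' "shown" '@' */
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    snprintf(out, len + 1, "%s:%s@", user, shown);
    return out;
}
```

Two different passwords produce identical output when hide is set, which is the property the commit is after.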
dan a77dc45c4d [svn] Hrvoje's response to my "wondering" comment in write_backup_file() read
extremely strangely without adding tags to show who was saying what.  Also, one
of his phrases was very misleading.
2001-01-09 18:10:16 -08:00
hniksic 35325bd092 [svn] Include fragment identifiers in converted URLs. Published in
<sxs8zorl90l.fsf@florida.arsdigita.de>.
2001-01-04 05:53:53 -08:00
hniksic 5099ec0306 [svn] Apply lint-inspired fixes from <sxsn1du7ufa.fsf@florida.arsdigita.de>. 2000-12-17 10:52:52 -08:00
hniksic 7828e81c79 [svn] Committed C. Frankel's SSL patch. 2000-12-05 15:09:41 -08:00
hniksic 7b5ad90acf [svn] Commit my url.c fix (space as unsafe character) and Jan's
winnt directory listing parsing.
2000-12-05 14:29:47 -08:00
hniksic 1cddc05edb [svn] Committed memory debugging stuff.
Published in <sxs1yw34pt4.fsf@florida.arsdigita.de>.
2000-11-22 14:15:45 -08:00
hniksic 2ffb47eabf [svn] Committed <sxsbsv854j9.fsf@florida.arsdigita.de>. 2000-11-22 08:58:28 -08:00
hniksic 6e598c81e3 [svn] Committed a bunch of different tweaks of mine.
Published in <sxsr9463wrx.fsf@florida.arsdigita.de>.
2000-11-20 18:06:36 -08:00
hniksic b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  very
  simple-minded, caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
hniksic f306ae9626 [svn] Changed last_slash[-1] to *(last_slash - 1). 2000-11-08 07:51:28 -08:00
hniksic b72b6cf387 [svn] Correctly handle URLs where / does not follow the host name.
Published in <sxsn1fag6zu.fsf@florida.arsdigita.de>.
2000-11-08 01:15:40 -08:00
hniksic 0e2b74ce3b [svn] Commit "minor fixes". 2000-11-06 13:24:57 -08:00
hniksic 366ad1d6d9 [svn] Rewrote the logging code.
Published at <sxs1ywrf300.fsf@florida.arsdigita.de>.
2000-11-04 20:38:31 -08:00
hniksic eef4a668b7 [svn] Update copyright blurbs with the year 2000. 2000-11-01 17:50:03 -08:00
hniksic b3758323ed [svn] Applied contributed fix. 2000-11-01 15:57:19 -08:00
hniksic b9eeb0c54c [svn] Fix "optimization" of query-strings in URLs.
Published in <sxs3dhbwnmw.fsf@florida.arsdigita.de>.
2000-11-01 10:31:53 -08:00
hniksic 515d82fb95 [svn] Committed my patch from <sxsy9z4xz5m.fsf@florida.arsdigita.de>
(recognize HTML entities.)
2000-10-31 17:25:12 -08:00
hniksic f6715dd08d [svn] Committed my patch from <sxs7l6ozghz.fsf@florida.arsdigita.de>. 2000-10-31 16:26:33 -08:00
hniksic 0dd418242a [svn] Committed my patches from <sxsbsw16sbu.fsf@florida.arsdigita.de>
and <sxsvgu824xk.fsf@florida.arsdigita.de>.
2000-10-31 11:25:32 -08:00
dan b3e2c0ff97 [svn] Implemented and documented new -E / --html-extension / html_extension option. 2000-10-19 22:55:46 -07:00
dan 7931200609 [svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.

* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines.  When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.

* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.

* init.c: Added new -p / --page-requisites / page_requisites option.

* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion.  Changed the unhelpful --mirror description
to simply give the options it's equivalent to.  Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.

* options.h (struct options): Added new page_requisites field.

* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html().  Use new INFINITE_RECURSION #define.

* retr.c: Changed "URL-s" to "URLs".  get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.

* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.

* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.

* wget.h: Added some comments and new INFINITE_RECURSION #define.

* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
hniksic 1765080b2e [svn] Comment fix. 2000-06-09 01:03:19 -07:00
hniksic 0eec6b9f30 [svn] Committed my patch <dpem6hln1k.fsf@mraz.iskon.hr>. 2000-06-01 03:47:03 -07:00
dan 1ecfed1e10 [svn] * host.c (store_hostaddress): R. K. Owen's patch introduces a "left shift count
>= width of type" warning on 32-bit architectures.  Got rid of it by tricking
  the compiler w/ a variable.

* url.c (UNSAFE_CHAR): The macro didn't include all the illegal characters per
  RFC1738, namely everything above '~'.  It also generated a warning on OSes
  where char is equivalent to unsigned char.  Fixed.
2000-04-04 20:08:10 -07:00
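
Both problems mentioned for UNSAFE_CHAR can be avoided by converting to unsigned char before comparing. The function below is a sketch of that approach, not the macro that was committed. The character set is the "unsafe" list from RFC 1738 section 2.2, and the name is_unsafe_char is hypothetical.

```c
#include <string.h>

/* Hypothetical sketch of an RFC 1738 style "unsafe" test.  Converting
   to unsigned char first gives the same answer whether plain char is
   signed or unsigned, and makes every byte above '~' (DEL and all 8-bit
   values) compare as greater than '~'. */
int is_unsafe_char(char ch)
{
    unsigned char c = (unsigned char)ch;

    if (c <= ' ' || c > '~')     /* controls, space, DEL, 8-bit bytes */
        return 1;
    return strchr("<>\"#%{}|\\^~[]`", c) != NULL;
}
```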
hniksic 0d42b49e30 [svn] Commit really old change. 2000-03-31 06:04:54 -08:00
dan 3a8c75cac4 [svn] Dan Berger's query string patch is totally bogus. If you have two different
URLs, get_page.cgi?page1 and get_page.cgi?page2, they'll both be saved as
get_page.cgi and the second will overwrite the first.  Also, parameters to
implicit CGIs, like "http://www.host.com/db/?2000-03-02" cause the URLs to be
printed with trailing garbage characters, and could seg fault.  I'm not sure
what Dan had in mind with this patch (no explanatory comments), but I'm removing
it for now.  If he can rewrite it so it doesn't break stuff, okay.
2000-03-02 14:48:07 -08:00
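
The collision described above arises when the query string is dropped from the local file name. The sketch below shows one way to avoid it, by keeping the query as part of the name. It illustrates the problem only and is not the code that was removed or the code that replaced it. The name local_name_with_query is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch: return the last path component of URL_PATH,
   including any "?query" suffix.  Only slashes before the '?' count as
   path separators.  The result points into URL_PATH. */
const char *local_name_with_query(const char *url_path)
{
    size_t path_len, i;
    const char *name;

    assert(url_path != NULL);
    path_len = strcspn(url_path, "?");   /* length of the part before '?' */
    name = url_path;
    for (i = 0; i < path_len; i++)
        if (url_path[i] == '/')
            name = url_path + i + 1;
    return name;
}
```

With the query kept, "/cgi/get_page.cgi?page1" and "/cgi/get_page.cgi?page2" map to different names, and an implicit CGI such as "/db/?2000-03-02" yields "?2000-03-02" instead of an empty name.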
hniksic 2b2fd2924a [svn] Added user-contributed patches. 2000-03-02 06:16:12 -08:00
dan 4331c39c9a [svn] Implemented the item I formerly had in the TODO: When -K and -N are used
together, we compare local file X.orig (if extant) against server file X.
Previously -k and -N were worthless in combination because the local converted
files always differed from the server versions.
2000-03-01 22:33:48 -08:00
dan e5408e7db8 [svn] Implemented new -K / --backup-converted / backup_converted = on option. 2000-02-29 16:17:23 -08:00
kwget 31d6616c48 [svn] Initial revision 1999-12-01 23:42:23 -08:00