mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00
Commit Graph

135 Commits

Author SHA1 Message Date
hniksic
0fdc1bd8c0 [svn] Fix downloading of duplicate URLs.
Published in <sxsvgfmu2bj.fsf@florida.arsdigita.de>.
2001-12-04 13:03:35 -08:00
hniksic
7ab7f93f8d [svn] Make -p work with framed pages.
Published in <sxsu1vby71t.fsf@florida.arsdigita.de>.
2001-11-30 19:06:41 -08:00
hniksic
a4db28e20f [svn] Ignore -np when in -p mode.
Published in <sxsg06w2c52.fsf@florida.arsdigita.de>.
2001-11-30 13:17:53 -08:00
hniksic
39482df431 [svn] descend_url_p: When resolving no_parent, compare with the start url,
not the parent url.
Published in <sxspu614ikm.fsf@florida.arsdigita.de>.
2001-11-29 09:04:28 -08:00
hniksic
024cb5ed3a [svn] A lot of host name changes.
Published in <sxs3d32856s.fsf@florida.arsdigita.de>.
2001-11-25 21:36:33 -08:00
hniksic
f6921edc73 [svn] Be careful whether we want to descend into results of redirection.
Published in <sxs7kse8hmq.fsf@florida.arsdigita.de>.
2001-11-25 17:11:48 -08:00
hniksic
3afb9c659a [svn] Recursion and progress bar tweaks.
Published in <sxsd727cvc0.fsf@florida.arsdigita.de>.
2001-11-25 13:03:30 -08:00
hniksic
df05e7ff10 [svn] Handle <base href=...> when converting links.
Published in <sxsadxaae3t.fsf@florida.arsdigita.de>.
2001-11-25 10:40:55 -08:00
hniksic
222e9465b7 [svn] Implemented breadth-first retrieval.
Published in <sxsherjczw2.fsf@florida.arsdigita.de>.
2001-11-24 19:10:34 -08:00
hniksic
1da2947d50 [svn] Fix typo that made us never use robots.txt.
2001-11-23 17:48:28 -08:00
hniksic
d5be8ecca4 [svn] Rewrite parsing and handling of URLs.
Published in <sxs4rnnlklo.fsf@florida.arsdigita.de>.
2001-11-21 16:24:28 -08:00
hniksic
f178e6c613 [svn] Clean up handling of schemes.
Published in <sxswv0n7h7s.fsf@florida.arsdigita.de>.
2001-11-18 16:12:05 -08:00
hniksic
05f90bb302 [svn] Plug in new implementation of RES.
Published in <sxselmwddt0.fsf@florida.arsdigita.de>.
2001-11-17 18:17:30 -08:00
hniksic
2255a89b24 [svn] After canonicalizing the URL, check for its existence among undesirable_urls.
Published in <sxs7kyeohte.fsf@florida.arsdigita.de>.
2001-06-14 14:48:00 -07:00
hniksic
0b056d1720 [svn] Update copyright notices.
2001-05-27 12:35:15 -07:00
hniksic
72eca0976b [svn] Commit several minor changes:
* main.c (print_help): Document `--no-http-keep-alive'.

* utils.c (numdigit): Handle negative numbers *correctly*.

* hash.c (make_nocase_string_hash_table): Use term "nocase" rather
than the confusing "unsigned".

* utils.c (string_set_contains): Renamed from string_set_exists.

* hash.c (hash_table_contains): Renamed from hash_table_exists.

* cookies.c: Move case-insensitive hash tables to hash.c.

Published in <sxsheyq9vvl.fsf@florida.arsdigita.de>.
2001-05-12 13:06:41 -07:00
hniksic
aa888ba8da [svn] Don't clear dl_file_url_map and dl_url_file_map in recursive_retrieve.
Published in <sxsk856le2y.fsf@florida.arsdigita.de> under the subject
"Link conversion fix".
2001-03-31 18:41:26 -08:00
hniksic
8c4cd805e2 [svn] Oops! Fix braino in recur.c -- clear the hash tables only when
they are defined.
2001-03-30 18:21:20 -08:00
hniksic
728584d072 [svn] Record downloaded files and downloaded HTML files in all cases.
Published under the subject "Link conversion fix" in
<sxsn1a2n2zd.fsf@florida.arsdigita.de>.
2001-03-30 18:05:54 -08:00
hniksic
1a6058b1ec [svn] Applied Philipp Thomas's safe-ctype patch. Published in
<20010330025159.U21662@jeffreys.suse.de>.
2001-03-30 14:36:59 -08:00
hniksic
5099ec0306 [svn] Apply lint-expired fixes from <sxsn1du7ufa.fsf@florida.arsdigita.de>.
2000-12-17 10:52:52 -08:00
hniksic
2e8fc46b7b [svn] Include <netdb.h> where h_errno is used. Likewise for <errno.h> and errno.
From <sxsvgsi7wcw.fsf@florida.arsdigita.de>.
2000-12-17 10:12:02 -08:00
hniksic
2ffb47eabf [svn] Committed <sxsbsv854j9.fsf@florida.arsdigita.de>.
2000-11-22 08:58:28 -08:00
hniksic
6e598c81e3 [svn] Committed a bunch of different tweaks of mine.
Published in <sxsr9463wrx.fsf@florida.arsdigita.de>.
2000-11-20 18:06:36 -08:00
hniksic
b27144fcce [svn] My patch "persistent connection tweaks".
Published in <sxshf531qhj.fsf@florida.arsdigita.de>.

(Applied with the addition of correct calculation for the
length of the request.)
2000-11-19 15:42:13 -08:00
hniksic
b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  Very
  simple-minded: caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
hniksic
e1f1c1ff40 [svn] Better version of read_whole_line().
Published in <sxsr94jd7z4.fsf@florida.arsdigita.de>.
2000-11-10 10:01:35 -08:00
hniksic
eef4a668b7 [svn] Update copyright blurbs with the year 2000.
2000-11-01 17:50:03 -08:00
hniksic
b7a8c6d3f5 [svn] Gracefully handle opt.downloaded overflowing.
Published in <sxsd7gfnv17.fsf@florida.arsdigita.de>.
2000-11-01 15:17:31 -08:00
dan
f4673bcdaf [svn] --delete-after wasn't implemented for files retrieved by FTP or corresponding to
files specified on the commandline.  Made --convert-links be ignored when
--delete-after is specified.  Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
2000-10-23 20:43:47 -07:00
dan
7931200609 [svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.

* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines.  When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.

* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.

* init.c: Added new -p / --page-requisites / page_requisites option.

* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion.  Changed the unhelpful --mirror description
to simply give the options it's equivalent to.  Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.

* options.h (struct options): Added new page_requisites field.

* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html().  Use new INFINITE_RECURSION #define.

* retr.c: Changed "URL-s" to "URLs".  get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.

* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.

* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.

* wget.h: Added some comments and new INFINITE_RECURSION #define.

* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
hniksic
6b4a85888e [svn] Commit several fixes.
2000-04-12 06:23:35 -07:00
dan
03e5e4fe4d [svn] recur.c (parse_robots): Applied Edward J. Sabol's patch for Guan Yang's reported
problem with "User-agent:<space>*<space>" lines.
2000-03-02 13:28:59 -08:00
hniksic
f4f8e83327 [svn] Applied Edward Sabol's patch.
2000-03-02 05:28:31 -08:00
kwget
31d6616c48 [svn] Initial revision
1999-12-01 23:42:23 -08:00