micah
248cb3e907
[svn] Various small fixes, courtesy of Gisle Vanem <giva@bgnett.no>.
2007-08-27 10:48:16 -07:00
micah
7d2066b221
[svn] Make indentation consistent (all-spaces, no tabs).
2007-08-02 20:38:21 -07:00
micah
c17f57f1fa
[svn] Fix for bug #20296 : User:pass@ given in Referer header.
2007-07-29 18:22:34 -07:00
micah
4d7c5e087b
[svn] Merge of fix for bugs 20341 and 20410.
2007-07-09 22:53:22 -07:00
mtortonesi
763229b67f
[svn] #include'd spider.h to get rid of compiler warnings.
2006-08-28 07:41:40 -07:00
mtortonesi
0dbef4ccb4
[svn] Several fixes for recursive spider mode.
2006-08-24 08:27:57 -07:00
mtortonesi
1c7493b83e
[svn] Added sanity checks for -k, -p, -r and -N when -O is given. Added fixes for 64-bit platforms. Updated copyright and maintainer information.
2006-07-14 06:25:50 -07:00
mtortonesi
01093c0c33
[svn] Fixed recursive spider mode.
2006-05-25 09:11:29 -07:00
mtortonesi
ea4ffded27
[svn] Restricted operational semantics of frontcmp and proclist from generic strings to directory names, and fixed dirname matching algorithm. Renamed above mentioned functions to subdir_p and dir_matches_p respectively. Added testcases for subdir_p and dir_matches_p.
2006-03-15 06:55:29 -08:00
mtortonesi
0bd1751372
[svn] recur.c: changed type of html_allowed member of struct queue_element to bool.
2006-03-14 05:52:17 -08:00
hniksic
097695b723
[svn] New option --ignore-case for case-insensitive matching.
2005-07-06 12:44:00 -07:00
hniksic
db9de5b075
[svn] Update FSF's address and copyright years.
2005-07-01 19:26:52 -07:00
hniksic
2447fb9a9b
[svn] Move extern declarations to .h files.
2005-06-27 11:19:22 -07:00
hniksic
002def87d2
[svn] Rename LARGE_INT to SUM_SIZE_INT, and simplify its handling.
2005-06-25 07:39:51 -07:00
hniksic
74fbb03b10
[svn] Use bool type for boolean variables and values.
2005-06-22 12:38:10 -07:00
hniksic
277e840a0f
[svn] Remove K&R support.
2005-06-19 15:34:58 -07:00
hniksic
b49b6db4f1
[svn] Correct logic of check #6 in download_child_p.
By Larry Jones and Hrvoje Niksic.
2005-04-09 15:18:36 -07:00
hniksic
e2e9b753e4
[svn] Retired the `boolean' type. Renamed FREE_MAYBE to xfree_null and moved the
definition from wget.h to xmalloc.h. Moved the DEFAULT_LOGFILE
define to log.h. Moved the INFINITE_RECURSION define to recur.h.
2003-11-02 11:56:37 -08:00
hniksic
5f0a2b3f08
[svn] Use new macros xnew, xnew0, xnew_array, and xnew0_array in various places.
2003-10-31 06:55:50 -08:00
hniksic
711bf72609
[svn] Remove VERY_LONG_TYPE; use LARGE_INT instead. Remove special code
for handling VERY_LONG_TYPE overflows.
Make opt.quota a LARGE_INT.
2003-10-11 06:57:11 -07:00
hniksic
1b3cdef574
[svn] Don't descend into HTML that was downloaded by following <img src=...>
and such.
2003-10-10 07:25:10 -07:00
hniksic
097923f7b1
[svn] Move fnmatch() to cmpt.c and don't use it under GNU libc.
2003-10-07 16:53:31 -07:00
hniksic
95c647eb44
[svn] Split off non-URL related stuff from url.c to convert.c.
2003-09-21 15:47:14 -07:00
hniksic
d7673d398b
[svn] Check whether downloaded_html_set is non-NULL before using it.
Posted in <sxsr8hsvnhh.fsf@florida.munich.redhat.com>.
2002-07-24 14:16:30 -07:00
hniksic
b2be7522c7
[svn] Update the license to include the OpenSSL exception.
2002-05-17 19:16:36 -07:00
abbotti
83dc077b17
[svn] (download_child_p): Minor optimization to avoid unnecessary call to
schemes_are_similar_p function.
Published in <kvq7eu4okekh2ohb0rdvavt16nbgb02v00@farscape.privy.mev.co.uk>.
2002-05-16 10:38:30 -07:00
abbotti
e863a6323b
[svn] New function schemes_are_similar_p to test enumerated scheme codes for
similarity (SCHEME_HTTP and SCHEME_HTTPS are similar). Use it in recur.c
(download_child_p). Fixes a bug that caused -H option to be ignored when
child scheme different to parent scheme.
Published in <agn4eu8apduek7magfu9bfe63gto8i7cdh@farscape.privy.mev.co.uk>.
2002-05-16 10:22:24 -07:00
hniksic
6fe9ec9f16
[svn] Indentation change.
2002-04-20 21:25:07 -07:00
hniksic
bf018d5721
[svn] Revert order of check number 6 in download_child_p for clarity.
2002-04-20 19:15:11 -07:00
hniksic
d4b0486cc4
[svn] Remove needless level of indentation.
2002-04-20 17:54:13 -07:00
hniksic
f8b4b8bd12
[svn] When downloading recursively, don't ignore rejection of HTML
documents that are themselves leaves of recursion.
2002-04-15 14:57:10 -07:00
abbotti
cfd7b9a951
[svn] Use new function to test filename for common html suffixes.
Submitted by Ian Abbott in <3CB72D29.4898.1F34872@localhost> with minor
changes to formatting and comments.
2002-04-12 11:53:39 -07:00
hniksic
1fa3b90235
[svn] Handle starting URL of recursing download being non-parsable.
Published in <sxszo26t33k.fsf@florida.arsdigita.de>.
2002-02-18 22:09:57 -08:00
hniksic
75a080ad0d
[svn] Follow https links from http.
Submitted by Christian Lackas in <20020211202444.GA20371@lackas.desy.de>.
2002-02-18 21:23:35 -08:00
hniksic
8db1264218
[svn] Enqueue start_url in the canonical form.
Published in <sxsofkvi8zx.fsf@florida.arsdigita.de>.
2001-12-19 06:27:29 -08:00
hniksic
2cf87bea8b
[svn] Fix crash introduced by previous patch.
2001-12-18 14:20:14 -08:00
hniksic
40fd876c57
[svn] Descend into HTML files we've already downloaded.
2001-12-18 14:14:31 -08:00
hniksic
416671063a
[svn] Propagate referrer information from retrieve_tree to retrieve_url.
Submitted by Ian Abbott in <3C1F4BFE.17436.D2D7B2@localhost>.
2001-12-18 07:22:03 -08:00
hniksic
f031900662
[svn] Don't abort when one URL references more than one file.
Published in <sxs1yhz0w1m.fsf@florida.arsdigita.de>.
2001-12-13 11:18:31 -08:00
hniksic
8a2ab60263
[svn] Fix overzealous URL-removal in register_download.
Published in <sxszo4yqq91.fsf@florida.arsdigita.de>.
2001-12-04 19:51:23 -08:00
hniksic
0fdc1bd8c0
[svn] Fix downloading of duplicate URLs.
Published in <sxsvgfmu2bj.fsf@florida.arsdigita.de>.
2001-12-04 13:03:35 -08:00
hniksic
7ab7f93f8d
[svn] Make -p work with framed pages.
Published in <sxsu1vby71t.fsf@florida.arsdigita.de>.
2001-11-30 19:06:41 -08:00
hniksic
a4db28e20f
[svn] Ignore -np when in -p mode.
Published in <sxsg06w2c52.fsf@florida.arsdigita.de>.
2001-11-30 13:17:53 -08:00
hniksic
39482df431
[svn] descend_url_p: When resolving no_parent, compare with the start url,
not the parent url.
Published in <sxspu614ikm.fsf@florida.arsdigita.de>.
2001-11-29 09:04:28 -08:00
hniksic
024cb5ed3a
[svn] A lot of host name changes.
Published in <sxs3d32856s.fsf@florida.arsdigita.de>.
2001-11-25 21:36:33 -08:00
hniksic
f6921edc73
[svn] Be careful whether we want to descend into results of redirection.
Published in <sxs7kse8hmq.fsf@florida.arsdigita.de>.
2001-11-25 17:11:48 -08:00
hniksic
3afb9c659a
[svn] Recursion and progress bar tweaks.
Published in <sxsd727cvc0.fsf@florida.arsdigita.de>.
2001-11-25 13:03:30 -08:00
hniksic
df05e7ff10
[svn] Handle <base href=...> when converting links.
Published in <sxsadxaae3t.fsf@florida.arsdigita.de>.
2001-11-25 10:40:55 -08:00
hniksic
222e9465b7
[svn] Implemented breadth-first retrieval.
Published in <sxsherjczw2.fsf@florida.arsdigita.de>.
2001-11-24 19:10:34 -08:00
hniksic
1da2947d50
[svn] Fix typo that made us never use robots.txt.
2001-11-23 17:48:28 -08:00
hniksic
d5be8ecca4
[svn] Rewrite parsing and handling of URLs.
Published in <sxs4rnnlklo.fsf@florida.arsdigita.de>.
2001-11-21 16:24:28 -08:00
hniksic
f178e6c613
[svn] Clean up handling of schemes.
Published in <sxswv0n7h7s.fsf@florida.arsdigita.de>.
2001-11-18 16:12:05 -08:00
hniksic
05f90bb302
[svn] Plug in new implementation of RES.
Published in <sxselmwddt0.fsf@florida.arsdigita.de>.
2001-11-17 18:17:30 -08:00
hniksic
2255a89b24
[svn] After canonicalizing the URL, check for its existence among undesirable_urls.
Published in <sxs7kyeohte.fsf@florida.arsdigita.de>.
2001-06-14 14:48:00 -07:00
hniksic
0b056d1720
[svn] Update copyright notices.
2001-05-27 12:35:15 -07:00
hniksic
72eca0976b
[svn] Commit several minor changes:
* main.c (print_help): Document `--no-http-keep-alive'.
* utils.c (numdigit): Handle negative numbers *correctly*.
* hash.c (make_nocase_string_hash_table): Use term "nocase" rather
than the confusing "unsigned".
* utils.c (string_set_contains): Renamed from string_set_exists.
* hash.c (hash_table_contains): Renamed from hash_table_exists.
* cookies.c: Move case-insensitive hash tables to hash.c.
Published in <sxsheyq9vvl.fsf@florida.arsdigita.de>.
2001-05-12 13:06:41 -07:00
hniksic
aa888ba8da
[svn] Don't clear dl_file_url_map and dl_url_file_map in recursive_retrieve.
Published in <sxsk856le2y.fsf@florida.arsdigita.de> under the subject
"Link conversion fix".
2001-03-31 18:41:26 -08:00
hniksic
8c4cd805e2
[svn] Oops! Fix braino in recur.c -- clear the hash tables only when
they are defined.
2001-03-30 18:21:20 -08:00
hniksic
728584d072
[svn] Record downloaded files and downloaded HTML files in all cases.
Published under the subject "Link conversion fix" in
<sxsn1a2n2zd.fsf@florida.arsdigita.de>.
2001-03-30 18:05:54 -08:00
hniksic
1a6058b1ec
[svn] Applied Philipp Thomas's safe-ctype patch. Published in
<20010330025159.U21662@jeffreys.suse.de>.
2001-03-30 14:36:59 -08:00
hniksic
5099ec0306
[svn] Apply lint-expired fixes from <sxsn1du7ufa.fsf@florida.arsdigita.de>.
2000-12-17 10:52:52 -08:00
hniksic
2e8fc46b7b
[svn] Include <netdb.h> where h_errno is used. Likewise for <errno.h> and errno.
From <sxsvgsi7wcw.fsf@florida.arsdigita.de>.
2000-12-17 10:12:02 -08:00
hniksic
2ffb47eabf
[svn] Committed <sxsbsv854j9.fsf@florida.arsdigita.de>.
2000-11-22 08:58:28 -08:00
hniksic
6e598c81e3
[svn] Committed a bunch of different tweaks of mine.
Published in <sxsr9463wrx.fsf@florida.arsdigita.de>.
2000-11-20 18:06:36 -08:00
hniksic
b27144fcce
[svn] My patch "persistent connection tweaks".
Published in <sxshf531qhj.fsf@florida.arsdigita.de>.
(Applied with the addition of correct calculation for the
length of the request.)
2000-11-19 15:42:13 -08:00
hniksic
b0b1c815c1
[svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
and read'ing it.
- use a new, more general, HTML parser (html-parse.c) and interface to
it from Wget (html-url.c).
- respect <meta name=robots content=nofollow> (easy with the new HTML
parser).
- use hash tables instead of linked lists in places where the lists
were used to facilitate mappings.
- rewrite the code in host.c to be more readable and faster (hash
tables instead of home-grown lists.)
- make convert_links properly convert partial URLs to complete ones
for those URLs that have *not* been downloaded.
- use HTTP persistent connections where available. very
simple-minded, caches the last connection to the server.
Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
hniksic
e1f1c1ff40
[svn] Better version of read_whole_line().
Published in <sxsr94jd7z4.fsf@florida.arsdigita.de>.
2000-11-10 10:01:35 -08:00
hniksic
eef4a668b7
[svn] Update copyright blurbs with the year 2000.
2000-11-01 17:50:03 -08:00
hniksic
b7a8c6d3f5
[svn] Gracefully handle opt.downloaded overflowing.
Published in <sxsd7gfnv17.fsf@florida.arsdigita.de>.
2000-11-01 15:17:31 -08:00
dan
f4673bcdaf
[svn] --delete-after wasn't implemented for files retrieved by FTP or corresponding to
files specified on the commandline. Made --convert-links be ignored when
--delete-after is specified. Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
2000-10-23 20:43:47 -07:00
dan
7931200609
[svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.
* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines. When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.
* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.
* init.c: Added new -p / --page-requisites / page_requisites option.
* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion. Changed the unhelpful --mirror description
to simply give the options it's equivalent to. Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.
* options.h (struct options): Added new page_requisites field.
* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html(). Use new INFINITE_RECURSION #define.
* retr.c: Changed "URL-s" to "URLs". get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.
* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.
* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.
* wget.h: Added some comments and new INFINITE_RECURSION #define.
* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
hniksic
6b4a85888e
[svn] Commit several fixes.
2000-04-12 06:23:35 -07:00
dan
03e5e4fe4d
[svn] recur.c (parse_robots): Applied Edward J. Sabol's patch for Guan Yang's reported
problem with "User-agent:<space>*<space>" lines.
2000-03-02 13:28:59 -08:00
hniksic
f4f8e83327
[svn] Applied Edward Sabol's patch.
2000-03-02 05:28:31 -08:00
kwget
31d6616c48
[svn] Initial revision
1999-12-01 23:42:23 -08:00