mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00

44 Commits

Author SHA1 Message Date
hniksic
b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  very
  simple-minded, caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
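The first item in this commit replaces allocating memory and read()ing a file with mmap(). Below is a minimal sketch of that pattern, assuming a POSIX system. The `mapped_file` struct and the `load_file`/`unload_file` names are hypothetical, not Wget's own. Falling back to read() for empty or non-regular files is also an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical container for a file loaded into memory (not Wget's type). */
struct mapped_file {
  const char *content; /* the file's bytes (read-only) */
  size_t length;       /* number of bytes in content */
  int is_mmapped;      /* 1: content came from mmap(); 0: from malloc() */
};

/* Read FD to EOF into a growing heap buffer; used when mmap() is not possible. */
static char *read_all(int fd, size_t *len_out)
{
  size_t cap = 4096, len = 0;
  char *buf = malloc(cap);
  while (buf) {
    ssize_t n = read(fd, buf + len, cap - len);
    if (n < 0) { free(buf); return NULL; }
    if (n == 0) break;                 /* EOF */
    len += (size_t) n;
    if (len == cap) {                  /* buffer full: double it */
      char *bigger = realloc(buf, cap * 2);
      if (!bigger) { free(buf); return NULL; }
      buf = bigger;
      cap *= 2;
    }
  }
  *len_out = len;
  return buf;
}

/* Load FILE: map regular, non-empty files with mmap(); otherwise read(). */
struct mapped_file *load_file(const char *file)
{
  struct stat st;
  struct mapped_file *fm;
  int fd = open(file, O_RDONLY);
  if (fd < 0)
    return NULL;
  fm = malloc(sizeof *fm);
  if (!fm) { close(fd); return NULL; }

  /* mmap() of a zero-length file fails, so only try it when size > 0. */
  if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode) && st.st_size > 0) {
    void *p = mmap(NULL, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p != MAP_FAILED) {
      fm->content = p;
      fm->length = (size_t) st.st_size;
      fm->is_mmapped = 1;
      close(fd);                       /* the mapping outlives the descriptor */
      return fm;
    }
  }

  size_t len = 0;
  char *buf = read_all(fd, &len);
  close(fd);
  if (!buf) { free(fm); return NULL; }
  fm->content = buf;
  fm->length = len;
  fm->is_mmapped = 0;
  return fm;
}

/* Release the memory with the call that matches how it was obtained. */
void unload_file(struct mapped_file *fm)
{
  assert(fm != NULL);                  /* caller passes a load_file() result */
  if (fm->is_mmapped)
    munmap((void *) fm->content, fm->length);
  else
    free((void *) fm->content);
  free(fm);
}
```

The `is_mmapped` flag records whether munmap() or free() applies. The fallback path keeps the function usable for inputs that cannot be mapped, such as empty files or pipes.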
hniksic
69cb9be79c [svn] *** empty log message *** 2000-11-16 08:29:46 -08:00
hniksic
06825343f1 [svn] Doc tweaks.
Published by me in <sxs8zqkayf2.fsf@florida.arsdigita.de>.
2000-11-16 04:35:27 -08:00
hniksic
1a5c5a006a [svn] Robots doc changes.
Published at <sxsn1f1o6s2.fsf@florida.arsdigita.de>.
2000-11-15 02:44:18 -08:00
hniksic
d889ef73f4 [svn] Introduce GFDL; remove warnings.
Published at <sxsaeb2qigu.fsf@florida.arsdigita.de>.
2000-11-14 14:49:07 -08:00
hniksic
cc7b24c138 [svn] cc.fer.hr -> srk.fer.hr 2000-11-10 06:47:30 -08:00
hniksic
4960a155c4 [svn] Minor fixes: use $(srcdir) in sample.wgetrc.munged_for_texi_inclusion;
added names of many contributors.
2000-11-04 23:06:41 -08:00
hniksic
6a5009aee8 [svn] Added more contributors to the contributors section. 2000-11-04 20:56:11 -08:00
dan
e7fb0946fa [svn] Forgot to mention that I documented Rob's new bind_address command in the Wgetrc
Commands section as well.
2000-10-24 15:39:18 -07:00
hniksic
2aa950f52d [svn] *** empty log message *** 2000-10-24 02:14:40 -07:00
dan
1396b30055 [svn] Manually applied Rob Mayoff <mayoff@dqd.com>'s patch (vs. 1.5.3, not 1.5.3+dev)
to add --bind-address, making many necessary alphabetization, coding style,
comment, documentation, and naming fixes and additions.
2000-10-23 23:19:17 -07:00
dan
f4673bcdaf [svn] --delete-after wasn't implemented for files retrieved by FTP or corresponding to
files specified on the commandline.  Made --convert-links be ignored when
--delete-after is specified.  Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
2000-10-23 20:43:47 -07:00
hniksic
778160a155 [svn] hniksic@iskon.hr -> hniksic@arsdigita.com 2000-10-23 08:43:04 -07:00
dan
5781c8b006 [svn] Include -E on my preferred commandline for downloading a single page and
requisites and making sure it displays properly locally.
2000-10-20 16:06:45 -07:00
dan
f1f1c3956b [svn] Hack Kampbjorn noticed that I accidentally repeated a word. 2000-10-20 14:46:41 -07:00
dan
8cf52e0dd3 [svn] Applied John Daily <jdaily@cyberdude.com>'s patch for his "quad" commands (which
I renamed to "lockable_boolean") in the .wgetrc (currently just passive_ftp).
Wrote documentation for his changes and added the missing "referer" to the
.wgetrc section (making mention of the issue of "referrer" being the correct
spelling).
2000-10-19 23:59:30 -07:00
dan
b3e2c0ff97 [svn] Implemented and documented new -E / --html-extension / html_extension option. 2000-10-19 22:55:46 -07:00
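The Wget manual describes -E as follows: when a `text/html` file is downloaded and its URL does not already end in `.htm` or `.html` (any case), `.html` is appended to the local filename. The sketch below illustrates only that rule. `html_extension_name` is a hypothetical helper. Matching the content type by its `text/html` prefix, so that a `; charset=...` parameter is tolerated, is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* Return 1 if NAME already ends in ".htm" or ".html" (case-insensitive). */
static int has_html_suffix(const char *name)
{
  size_t n = strlen(name);
  return (n >= 4 && strcasecmp(name + n - 4, ".htm") == 0)
      || (n >= 5 && strcasecmp(name + n - 5, ".html") == 0);
}

/* Hypothetical helper: the local name -E would produce for NAME.
   Returns a malloc'd string, or NULL if allocation fails. */
char *html_extension_name(const char *name, const char *content_type)
{
  assert(name != NULL);
  size_t n = strlen(name);
  int is_html = content_type != NULL
                && strncasecmp(content_type, "text/html", 9) == 0;
  size_t extra = (is_html && !has_html_suffix(name)) ? 5 : 0; /* strlen(".html") */
  char *out = malloc(n + extra + 1);
  if (out == NULL)
    return NULL;
  memcpy(out, name, n);
  memcpy(out + n, ".html", extra);  /* copies nothing when extra == 0 */
  out[n + extra] = '\0';
  return out;
}
```

For example, a page served as `text/html` from `index.asp` would be saved locally as `index.asp.html`. This lets a browser viewing the local copy recognize it as HTML.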
dan
cbf018d0c0 [svn] --retr-symlinks was not previously documented properly. Based on my newfound
understanding of what its limitations are, added a TODO item.  Also made a minor
tweak in html.c to silence a warning.
2000-10-09 15:43:11 -07:00
dan
7931200609 [svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.

* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines.  When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.

* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.

* init.c: Added new -p / --page-requisites / page_requisites option.

* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion.  Changed the unhelpful --mirror description
to simply give the options it's equivalent to.  Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.

* options.h (struct options): Added new page_requisites field.

* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html().  Use new INFINITE_RECURSION #define.

* retr.c: Changed "URL-s" to "URLs".  get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.

* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.

* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.

* wget.h: Added some comments and new INFINITE_RECURSION #define.

* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
dan
f4fcbd194b [svn] * wget.texi (Logging and Input File Options): -B / --base was not documented as
a separate item, and the .wgetrc version was misleading.

* wget.texi (Wgetrc Commands): Changed all instances of ", the same as" to the
  more grammatical " -- the same as".
2000-08-23 15:41:21 -07:00
dan
f21839b197 [svn] * wget.texi (Download Options): Using -c on a file that's already fully
downloaded results in an unchanged file and no second ".1" copy.
2000-08-23 14:36:31 -07:00
dan
28668d2875 [svn] * wget.texi (Download Options): --no-clobber's documentation was
severely lacking -- ameliorated the situation.  Some of the
previously-undocumented stuff (like the multiple-file-version numeric-suffixing)
that's now mentioned for the first (and only) time in the -nc documentation
should probably be mentioned elsewhere, but due to the way that wget.texi's
hierarchy is laid out, I had a hard time finding anywhere else appropriate.
2000-08-22 20:04:20 -07:00
dan
cf1e1c68de [svn] wget.texi (HTTP Options): Minor clarification in "download a single HTML page
and all files necessary to display it" example.
2000-07-17 17:19:47 -07:00
dan
b05feb3ae2 [svn] Damir Dzeko <ddzeko@zesoi.fer.hr> did not document his new --referer option.
Did so (--help output and wget.texi).  Also tweaked --help output for --execute.
2000-05-22 19:29:38 -07:00
dan
13fdc3ae17 [svn] Regenerated after change to waitretry entry in sample.wgetrc. 2000-04-20 15:08:11 -07:00
dan
ed80cb342b [svn] Really using "stepwise refinement" on this file, aren't I? Realized during the
usual shower meditation session this morning that I hadn't changed the text 'The
"wait" command above' when I moved waitretry up so "wait" no longer _is_ above.
Fixed to say "below" and got a little more wordy on the "linear backoff".
2000-04-20 15:06:43 -07:00
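The Wget manual describes the "linear backoff" for --waitretry as follows: wait 1 second after the first failure on a given file, 2 seconds after the second, and so on, up to the configured maximum. A later entry in this log sets `waitretry = 10` in sample.wgetrc. A sketch of that schedule, using the hypothetical helper name `retry_delay`:

```c
#include <assert.h>

/* Hypothetical helper: seconds to wait before retrying a file that has
   failed FAILURES times, growing by 1 s per failure, capped at WAITRETRY. */
unsigned retry_delay(unsigned failures, unsigned waitretry)
{
  assert(failures >= 1);            /* only called after a failure */
  return failures < waitretry ? failures : waitretry;
}
```

With `waitretry = 10`, a file that keeps failing would wait 1, 2, ... up to 10 seconds, then 10 seconds before every later retry.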
dan
7ba90c0cbb [svn] Oops. Forgot to regenerate and commit wget.info-2 after changing sample.wgetrc. 2000-04-19 13:08:58 -07:00
dan
b27657dd51 [svn] Realized this morning in the shower that I put the (uncommented) waitretry
setting in the local section, which is prefaced by a comment saying that stuff
in there shouldn't be set in the global file.  Moved the setting to the global
section.
2000-04-18 19:09:28 -07:00
dan
0a8054755c [svn] * Makefile.in (sample.wgetrc.munged_for_texi_inclusion): Added build,
dependencies, and distclean cleanup of this new file.

* sample.wgetrc: Uncommented waitretry and set it to 10, clarified some wording,
  and re-wrapped some text to 71 columns due to @sample indentation in
  wget.texi.

* wget.texi: Herold further expounded on the behavior of waitretry -- reworded
  docs again.  Changed note saying _all_ lines in sample.wgetrc are commented
  out.  Don't have an entire hand- cut-and-pasted copy of sample.wgetrc in this
  file -- use @include.
2000-04-13 12:37:52 -07:00
dan
14e5a6d11e [svn] Try more aggressively to prevent line-wrapping (e.g. on an 80 column display) on
that "wgetrc exists" message.
2000-04-12 21:55:35 -07:00
dan
367f3b15d7 [svn] Makefile (install.wgetrc): I completely missed the message that the new wgetrc
wasn't being installed the first couple of times I ran `make install' after
changing sample.wgetrc.  Added blank lines around the message and a "<Hit RETURN
to acknowledge>", and reworded the message to be a bit more clear.
2000-04-12 21:37:51 -07:00
dan
6098c2ae46 [svn] Oops. I intentionally did my "cvs diff" before regenerating the .info* files
to make it easy to send a patch to the list, and thus those files weren't noted
as having been changed and I forgot to regenerate and commit them.
2000-04-12 20:41:58 -07:00
dan
b89e043ade [svn] D'oh! Forgot to change the month to 04. 2000-04-12 18:44:43 -07:00
dan
63fecba717 [svn] * sample.wgetrc: Added entries for backup_converted and waitretry.
* wget.texi (waitretry): Herold Heiko <Heiko.Herold@previnet.it>'s
new option was undocumented until now.  Reworded the suggested documentation he
sent to the list.
2000-04-12 18:42:34 -07:00
dan
4454f6ce0a [svn] * TODO: Removed done item: we now have an option (-G) that makes it easy to
download a single HTML document and all its constituents.

* po/*.{gmo,po,pot}: Regenerated after adding new options.

* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.

* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.

* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.

* init.c (commands): Added comment reminding people adding new entries doing
allocation to add corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.

* main.c: Use of "comma-separated list" was random -- normalized it.  Did some
alphabetization.  Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[].  Added new options
--follow-tags and -G / --ignore-tags.  Added comment that Damir's --referer is
currently undocumented.  Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi).  Moved improperly
sorted 24, 129, and 'G' cases.

* options.h (struct options): Added new fields follow_tags and ignore_tags.

* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
2000-03-10 22:48:06 -08:00
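The new --follow-tags and -G / --ignore-tags options each take a comma-separated list of HTML tag names. The sketch below shows one way such a check could work. It assumes that an option which was not given is passed as NULL, and that a tag must appear in the follow list (when given) and be absent from the ignore list (when given). The function names are hypothetical, and this is not the idmatch() code in html.c. Entries are compared case-insensitively and are not whitespace-trimmed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Return 1 if TAG appears (case-insensitively) in the comma-separated LIST. */
static int tag_in_list(const char *tag, const char *list)
{
  size_t taglen = strlen(tag);
  while (*list) {
    const char *end = strchr(list, ',');
    size_t len = end ? (size_t) (end - list) : strlen(list);
    if (len == taglen && strncasecmp(list, tag, len) == 0)
      return 1;
    if (!end)
      break;                        /* last entry checked */
    list = end + 1;                 /* skip past the comma */
  }
  return 0;
}

/* Hypothetical check: NULL means the corresponding option was not given. */
int tag_wanted(const char *tag, const char *follow_tags, const char *ignore_tags)
{
  assert(tag != NULL);
  if (follow_tags && !tag_in_list(tag, follow_tags))
    return 0;                       /* not in the --follow-tags subset */
  if (ignore_tags && tag_in_list(tag, ignore_tags))
    return 0;                       /* explicitly listed in --ignore-tags */
  return 1;
}
```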
dan
d2e1d7fe9d [svn] Hrvoje didn't regenerate the .info files after changing wget.texi.
Got rid of newly-introduced nested-if warnings in ftp.c and http.c.  Fixed
apparently completely untested code in main.c that was trying to provide --wait
/ --waitretry backwards compatibility, but had multiple fundamental bugs.
2000-03-02 13:17:47 -08:00
hniksic
a04bf0f734 [svn] *** empty log message *** 2000-03-02 06:56:48 -08:00
hniksic
18959ba4ab [svn] Spelling fixes. 2000-03-02 05:44:56 -08:00
hniksic
a8622f4462 [svn] Update. 2000-03-02 05:36:47 -08:00
dan
fce9edf954 [svn] Added a note about my newly-implemented interaction between -K and -N. 2000-03-01 23:06:10 -08:00
dan
c33c857eb2 [svn] Upped version number from 1.5.3 to 1.5.3+dev. Because the development source
is available via anonymous CVS and desirable features are being added, it's
quite possible for end-users to be getting their hands on development versions.
They may report bugs, so if we don't change the version number, we'll have to
continually follow up the statement "I'm using version 1.5.3" with the question
"The FTP archive or the CVS source?"  Better to just make this development
version have a unique number.  Once we're ready to actually release the next
version, we can up the version from 1.5.3+dev to 1.5.4, or 1.6, or whatever it
turns out to be (depending on how much development gets done).

Also made minor updates (dates, email addresses) to wget.texi.
2000-02-29 17:03:39 -08:00
dan
e0a58713f7 [svn] Upped version number from 1.5.3 to 1.5.3+dev. Because the development source
is available via anonymous CVS and desirable features are being added, it's
quite possible for end-users to be getting their hands on development versions.
They may report bugs, so if we don't change the version number, we'll have to
continually follow up the statement "I'm using version 1.5.3" with the question
"The FTP archive or the CVS source?"  Better to just make this development
version have a unique number.  Once we're ready to actually release the next
version, we can up the version from 1.5.3+dev to 1.5.4, or 1.6, or whatever it
turns out to be (depending on how much development gets done).

Also made minor updates (dates, email addresses) to wget.texi.
2000-02-29 16:50:52 -08:00
dan
e5408e7db8 [svn] Implemented new -K / --backup-converted / backup_converted = on option. 2000-02-29 16:17:23 -08:00
kwget
31d6616c48 [svn] Initial revision 1999-12-01 23:42:23 -08:00