mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00
Commit Graph

281 Commits

Author SHA1 Message Date
hniksic
b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  Very
  simple-minded: caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
hniksic
69cb9be79c [svn] *** empty log message *** 2000-11-16 08:29:46 -08:00
hniksic
06825343f1 [svn] Doc tweaks.
Published by me in <sxs8zqkayf2.fsf@florida.arsdigita.de>.
2000-11-16 04:35:27 -08:00
hniksic
1a5c5a006a [svn] Robots doc changes.
Published at <sxsn1f1o6s2.fsf@florida.arsdigita.de>.
2000-11-15 02:44:18 -08:00
hniksic
d889ef73f4 [svn] Introduce GFDL; remove warnings.
Published at <sxsaeb2qigu.fsf@florida.arsdigita.de>.
2000-11-14 14:49:07 -08:00
hniksic
cc7b24c138 [svn] cc.fer.hr -> srk.fer.hr 2000-11-10 06:47:30 -08:00
hniksic
6a5009aee8 [svn] Added more contributors to the contributors section. 2000-11-04 20:56:11 -08:00
dan
1396b30055 [svn] Manually applied Rob Mayoff <mayoff@dqd.com>'s patch (vs. 1.5.3, not 1.5.3+dev)
to add --bind-address, making many necessary alphabetization, coding style,
comment, documentation, and naming fixes and additions.
2000-10-23 23:19:17 -07:00
dan
f4673bcdaf [svn] --delete-after wasn't implemented for files retrieved by FTP or corresponding to
files specified on the commandline.  Made --convert-links be ignored when
--delete-after is specified.  Added note about this fact to --delete-after docs
and made general improvements to them, including the clarification that
--delete-after only deletes local files.
2000-10-23 20:43:47 -07:00
hniksic
778160a155 [svn] hniksic@iskon.hr -> hniksic@arsdigita.com 2000-10-23 08:43:04 -07:00
dan
5781c8b006 [svn] Include -E on my preferred commandline for downloading a single page and
requisites and making sure it displays properly locally.
2000-10-20 16:06:45 -07:00
dan
f1f1c3956b [svn] Hack Kampbjorn noticed that I accidentally repeated a word. 2000-10-20 14:46:41 -07:00
dan
8cf52e0dd3 [svn] Applied John Daily <jdaily@cyberdude.com>'s patch for his "quad" commands (which
I renamed to "lockable_boolean") in the .wgetrc (currently just passive_ftp).
Wrote documentation for his changes and added the missing "referer" to the
.wgetrc section (making mention of the issue of "referrer" being the correct
spelling).
2000-10-19 23:59:30 -07:00
dan
b3e2c0ff97 [svn] Implemented and documented new -E / --html-extension / html_extension option. 2000-10-19 22:55:46 -07:00
dan
cbf018d0c0 [svn] --retr-symlinks was not previously documented properly. Based on my newfound
understanding of what its limitations are, added a TODO item.  Also made a minor
tweak in html.c to silence a warning.
2000-10-09 15:43:11 -07:00
dan
7931200609 [svn] * *.{gmo,po,pot}: Regenerated after modifying wget --help output.
* ftp.c (ftp_retrieve_list): Use new INFINITE_RECURSION #define.

* html.c: htmlfindurl() now takes final `dash_p_leaf_HTML' parameter.
Wrapped some > 80-column lines.  When -p is specified and we're at a
leaf node, do not traverse <A>, <AREA>, or <LINK> tags other than
<LINK REL="stylesheet">.

* html.h (htmlfindurl): Now takes final `dash_p_leaf_HTML' parameter.

* init.c: Added new -p / --page-requisites / page_requisites option.

* main.c (print_help): Clarified that -l inf and -l 0 both allow
infinite recursion.  Changed the unhelpful --mirror description
to simply give the options it's equivalent to.  Added new -p option.
(main): Added some comments; handle new -p / --page-requisites.

* options.h (struct options): Added new page_requisites field.

* recur.c: Changed "URL-s" to "URLs" and "HTML-s" to "HTMLs".
Calculate and pass down new `dash_p_leaf_HTML' parameter to
get_urls_html().  Use new INFINITE_RECURSION #define.

* retr.c: Changed "URL-s" to "URLs".  get_urls_html() now takes
final `dash_p_leaf_HTML' parameter.

* url.c: get_urls_html() and htmlfindurl() now take final
`dash_p_leaf_HTML' parameter.

* url.h (get_urls_html): Now takes final `dash_p_leaf_HTML' parameter.

* wget.h: Added some comments and new INFINITE_RECURSION #define.

* wget.texi (Recursive Retrieval Options): Documented new -p option.
2000-08-30 04:26:21 -07:00
dan
f4fcbd194b [svn] * wget.texi (Logging and Input File Options): -B / --base was not documented as
a separate item, and the .wgetrc version was misleading.

* wget.texi (Wgetrc Commands): Changed all instances of ", the same as" to the
  more grammatical " -- the same as".
2000-08-23 15:41:21 -07:00
dan
f21839b197 [svn] * wget.texi (Download Options): Using -c on a file that's already fully
downloaded results in an unchanged file and no second ".1" copy.
2000-08-23 14:36:31 -07:00
dan
28668d2875 [svn] * wget.texi (Download Options): --no-clobber's documentation was
severely lacking -- ameliorated the situation.  Some of the
previously-undocumented stuff (like the multiple-file-version numeric-suffixing)
that's now mentioned for the first (and only) time in the -nc documentation
should probably be mentioned elsewhere, but due to the way that wget.texi's
hierarchy is laid out, I had a hard time finding anywhere else appropriate.
2000-08-22 20:04:20 -07:00
dan
cf1e1c68de [svn] wget.texi (HTTP Options): Minor clarification in "download a single HTML page
and all files necessary to display it" example.
2000-07-17 17:19:47 -07:00
dan
b05feb3ae2 [svn] Damir Dzeko <ddzeko@zesoi.fer.hr> did not document his new --referer option.
Did so (--help output and wget.texi).  Also tweaked --help output for --execute.
2000-05-22 19:29:38 -07:00
dan
0a8054755c [svn] * Makefile.in (sample.wgetrc.munged_for_texi_inclusion): Added build,
dependencies, and distclean cleanup of this new file.

* sample.wgetrc: Uncommented waitretry and set it to 10, clarified some wording,
  and re-wrapped some text to 71 columns due to @sample indentation in
  wget.texi.

* wget.texi: Herold further expounded on the behavior of waitretry -- reworded
  docs again.  Changed note saying _all_ lines in sample.wgetrc are commented
  out.  Don't have an entire hand-cut-and-pasted copy of sample.wgetrc in this
  file -- use @include.
2000-04-13 12:37:52 -07:00
dan
63fecba717 [svn] * sample.wgetrc: Added entries for backup_converted and waitretry.
* wget.texi (waitretry): Herold Heiko <Heiko.Herold@previnet.it>'s
new option was undocumented until now.  Reworded the suggested documentation he
sent to the list.
2000-04-12 18:42:34 -07:00
dan
4454f6ce0a [svn] * TODO: Removed done item: we now have an option (-G) that makes it easy to
download a single HTML document and all its constituents.

* po/*.{gmo,po,pot}: Regenerated after adding new options.

* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.

* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.

* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.

* init.c (commands): Added comment reminding people adding new entries doing
allocation to add corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.

* main.c: Use of "comma-separated list" was random -- normalized it.  Did some
alphabetization.  Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[].  Added new options
--follow-tags and -G / --ignore-tags.  Added comment that Damir's --referer is
currently undocumented.  Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi).  Moved improperly
sorted 24, 129, and 'G' cases.

* options.h (struct options): Added new fields follow_tags and ignore_tags.

* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
2000-03-10 22:48:06 -08:00
hniksic
a04bf0f734 [svn] *** empty log message *** 2000-03-02 06:56:48 -08:00
hniksic
18959ba4ab [svn] Spelling fixes. 2000-03-02 05:44:56 -08:00
hniksic
a8622f4462 [svn] Update. 2000-03-02 05:36:47 -08:00
dan
fce9edf954 [svn] Added a note about my newly-implemented interaction between -K and -N. 2000-03-01 23:06:10 -08:00
dan
e0a58713f7 [svn] Upped version number from 1.5.3 to 1.5.3+dev. Because the development source
is available via anonymous CVS and desirable features are being added, it's
quite possible for end-users to be getting their hands on development versions.
They may report bugs, so if we don't change the version number, we'll have to
continually follow up the statement "I'm using version 1.5.3" with the question
"The FTP archive or the CVS source?"  Better to just make this development
version have a unique number.  Once we're ready to actually release the next
version, we can up the version from 1.5.3+dev to 1.5.4, or 1.6, or whatever it
turns out to be (depending on how much development gets done).

Also made minor updates (dates, email addresses) to wget.texi.
2000-02-29 16:50:52 -08:00
dan
e5408e7db8 [svn] Implemented new -K / --backup-converted / backup_converted = on option. 2000-02-29 16:17:23 -08:00
kwget
31d6616c48 [svn] Initial revision 1999-12-01 23:42:23 -08:00