mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00
28 Commits

Author SHA1 Message Date
hniksic
6663d70a0a [svn] New TODO item. 2000-11-21 06:58:46 -08:00
hniksic
b0b1c815c1 [svn] A bunch of new features:
- use mmap() to read whole files in core instead of allocating memory
  and read'ing it.

- use a new, more general, HTML parser (html-parse.c) and interface to
  it from Wget (html-url.c).

- respect <meta name=robots content=nofollow> (easy with the new HTML
  parser).

- use hash tables instead of linked lists in places where the lists
  were used to facilitate mappings.

- rewrite the code in host.c to be more readable and faster (hash
  tables instead of home-grown lists.)

- make convert_links properly convert partial URLs to complete ones
  for those URLs that have *not* been downloaded.

- use HTTP persistent connections where available.  Very
  simple-minded, caches the last connection to the server.

Published in <sxshf533d5r.fsf@florida.arsdigita.de>.
2000-11-19 12:50:10 -08:00
hniksic
fa90bec240 [svn] One more todo item. 2000-11-14 14:45:43 -08:00
hniksic
bfca6f02b4 [svn] Spelling fixlet. 2000-11-06 02:16:37 -08:00
hniksic
ac96041552 [svn] As of recently, path simplification does stop at '?'. 2000-11-06 02:03:57 -08:00
dan
24c465b5ad [svn] retr.c (retrieve_url): Manually applied T. Bharath
<TBharath@responsenetworks.com>'s patch to get wget to grok illegal relative URL
redirects.  Reformatted and re-commented it.
2000-10-27 20:18:20 -07:00
dan
e863bff640 [svn] --mime-extensions would be more appropriate than --mime-extension.
^
2000-10-24 15:40:22 -07:00
dan
71994021f3 [svn] TODO: Generalize --html-extension to something like --mime-extension. 2000-10-20 16:20:24 -07:00
dan
d9dd14a995 [svn] * AUTHORS: Added -E to the list of my stuff.
* TODO: We need to check the HTTP spec w.r.t. simplification of absolute URLs.

* MAILING-LIST: I didn't realize <wget@sunsite.auc.dk> allowed posting by
  non-subscribers.  <bug-wget@gnu.org> soon to be an alias for it.

* NEWS: Always forget to update this file when making user-vis. changes.
2000-10-20 15:29:42 -07:00
dan
6dd2357558 [svn] TODO: -k needs to convert '?' to "%3F" in links to saved files containing the
'?' character (e.g. CGI output).
2000-10-20 14:44:26 -07:00
dan
da17e06a1e [svn] TODO: Make -I and -X allow an optional hostname before the directory name?
When simplifying paths, wget needs to stop at any '?' character.
2000-10-19 23:06:03 -07:00
dan
b3e2c0ff97 [svn] Implemented and documented new -E / --html-extension / html_extension option. 2000-10-19 22:55:46 -07:00
dan
de7c00c095 [svn] TODO: Add option to save local filenames without extra %-encoding. 2000-10-18 23:29:20 -07:00
dan
cbf018d0c0 [svn] --retr-symlinks was not previously documented properly. Based on my newfound
understanding of what its limitations are, added a TODO item.  Also made a minor
tweak in html.c to silence a warning.
2000-10-09 15:43:11 -07:00
dan
2358c437c5 [svn] TODO: Make wget follow (illegal) relative URL HTTP redirects. 2000-09-25 17:42:50 -07:00
dan
737daec8e6 [svn] TODO: Make wget return nonzero in situations like bad HTTP auth. 2000-09-25 15:09:25 -07:00
dan
51642074f4 [svn] Just fixed a typo. 2000-07-21 18:36:44 -07:00
dan
88c07d546e [svn] TODO: -k should convert "hostless absolute" URLs, like <A HREF="/index.html">.
However, Brian McMahon <bm@iucr.org> wants the old incorrect behavior to still
be available as an option, as he depends on it to allow mirrors of his site to
send CGI queries to his original site, but still get graphics off of the mirror
site.  Perhaps this would be better dealt with by adding an option to tell -k
not to convert certain URL patterns?
2000-07-21 16:16:10 -07:00
dan
e1d4d0995f [svn] -k should convert "hostless absolute" URLs, like <A HREF="/index.html">. 2000-07-19 18:19:58 -07:00
dan
fe387ce432 [svn] TODO: Timestamps are sometimes not copied over on files retrieved by FTP. 2000-05-24 13:29:18 -07:00
dan
6d218bc4ab [svn] TODO: Wget does not currently handle "fragment identifiers" (the part of a URL
starting with the '#' character) properly.
2000-05-22 19:40:09 -07:00
dan
f7c83b6ee3 [svn] TODO: Make `-k' check for files that were downloaded in the past and convert
links to them in newly-downloaded documents.
2000-05-17 19:19:59 -07:00
dan
532c3fb65a [svn] Reworded the opening paragraph to reflect that there are now more developers
besides Hrvoje, and added the following three items I've been meaning to get to:

* Make -K compare X.orig to X and move the former on top of the latter if
  they're the same, rather than leaving identical .orig files lying around.

* Add an option to save all text/html files with a .html extension so that when
  grabbing the output of a dynamically-generated remote page, you'll end up with
  a filename that will cause _your_ webserver to realize the saved static HTML
  file isn't text/plain.

* Allow mirroring of FTP URLs where logging in puts you somewhere else besides
  '/'.
2000-04-05 20:36:28 -07:00
dan
4454f6ce0a [svn] * TODO: Removed done item: we now have an option (-G) that makes it easy to
download a single HTML document and all its constituents.

* po/*.{gmo,po,pot}: Regenerated after adding new options.

* po/hr.po: Hrvoje forgot '\n's on his translations of my altered messages,
causing msgfmt to balk and `make install' to fail.


* wget.texi (Recursive Retrieval Options): In -K description, added a link to
the discussion of interaction with -N.
(Recursive Accept/Reject Options): Did some alphabetizing and added descriptions
of new --follow-tags and -G / --ignore-tags options.
(Following Links): Changed "the loads of" to "loads of".
(Wgetrc Commands): Added descriptions of new follow_tags and ignore_tags
commands.


* html.c (idmatch): Implemented checking of my new --follow-tags and
--ignore-tags options.

* init.c (commands): Added comment reminding people adding new entries doing
allocation to add corresponding freeing in cleanup().
(commands): Added new followtags and ignoretags commands.
(cleanup): Free storage for new followtags and ignoretags.

* main.c: Use of "comma-separated list" was random -- normalized it.  Did some
alphabetization.  Added comments pointing out "Options without arguments" and
"Options accepting an argument" sections of long_options[].  Added new options
--follow-tags and -G / --ignore-tags.  Added comment that Damir's --referer is
currently undocumented.  Added comment that Heiko's --waitretry is partially
undocumented (mentioned in --help but not in wget.texi).  Moved improperly
sorted 24, 129, and 'G' cases.

* options.h (struct options): Added new fields follow_tags and ignore_tags.

* wget.h: Added "#define EQ 0" so we can say "strcmp(a, b) == EQ".
2000-03-10 22:48:06 -08:00
dan
4331c39c9a [svn] Implemented the item I formerly had in the TODO: When -K and -N are used
together, we compare local file X.orig (if extant) against server file X.
Previously -k and -N were worthless in combination because the local converted
files always differed from the server versions.
2000-03-01 22:33:48 -08:00
dan
e5408e7db8 [svn] Implemented new -K / --backup-converted / backup_converted = on option. 2000-02-29 16:17:23 -08:00
dan
4c00181dd5 [svn] Really just a test to see if my write access works. Changed "through SSLeay" to
"through SSLeay or OpenSSL" (I believe someone's actually already gotten the
latter working, and hopefully they'll delete this item when they commit their
changes).
2000-02-29 11:24:17 -08:00
kwget
31d6616c48 [svn] Initial revision 1999-12-01 23:42:23 -08:00