
[svn] Implemented breadth-first retrieval.

Published in <sxsherjczw2.fsf@florida.arsdigita.de>.
hniksic 2001-11-24 19:10:34 -08:00
parent b88223f99d
commit 222e9465b7
23 changed files with 1073 additions and 853 deletions


@ -1,3 +1,9 @@
2001-11-25 Hrvoje Niksic <hniksic@arsdigita.com>
* TODO: Ditto.
* NEWS: Updated with the latest stuff.
2001-11-23 Hrvoje Niksic <hniksic@arsdigita.com>
* po/hr.po: A major overhaul.

NEWS

@ -7,9 +7,19 @@ Please send GNU Wget bug reports to <bug-wget@gnu.org>.
* Changes in Wget 1.8.
** "Recursive retrieval" now uses a breadth-first algorithm.
Recursive downloads are faster and consume *significantly* less memory
than before.
** A new progress indicator is now available. Try it with
--progress=bar or using `progress = bar' in `.wgetrc'.
** Host directories now contain port information if the URL is at a
non-standard port.
** Wget now supports the robots.txt directives specified in
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
** URL parser has been fixed, especially the infamous overzealous
quoting bug. Wget no longer dequotes reserved characters, e.g. `%3F'
is no longer translated to `?', nor `%2B' to `+'. Unsafe characters

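The breadth-first change announced in the first NEWS item above replaces the old depth-first recursion with an explicit FIFO queue plus a set of already-seen URLs, so pages are fetched level by level and memory no longer grows with recursion depth. Below is a minimal, self-contained sketch of that idea on a toy in-memory link graph (no networking; all names are made up); the real implementation is retrieve_tree() in src/recur.c, later in this commit.

#include <stdio.h>
#include <string.h>

/* Toy link graph: page -> up to two outgoing links. */
static const char *pages[][3] = {
  { "index.html", "a.html", "b.html" },
  { "a.html",     "c.html", NULL },
  { "b.html",     "c.html", NULL },
  { "c.html",     NULL,     NULL },
};
#define NPAGES (sizeof pages / sizeof pages[0])

int
main (void)
{
  const char *queue[16];        /* FIFO queue of URLs to fetch */
  int head = 0, tail = 0;
  int i, j, k;

  queue[tail++] = "index.html";
  while (head < tail)
    {
      const char *url = queue[head++];   /* dequeue the oldest URL */
      printf ("downloading %s\n", url);
      for (i = 0; i < (int) NPAGES; i++)
        {
          if (strcmp (pages[i][0], url) != 0)
            continue;
          for (j = 1; j <= 2 && pages[i][j]; j++)
            {
              int known = 0;             /* skip URLs already queued */
              for (k = 0; k < tail; k++)
                if (strcmp (queue[k], pages[i][j]) == 0)
                  known = 1;
              if (!known)
                queue[tail++] = pages[i][j];
            }
        }
    }
  return 0;   /* prints index.html, a.html, b.html, c.html -- level by level */
}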
TODO

@ -20,15 +20,6 @@ changes.
file, though forcibly disconnecting from the server at the desired endpoint
might be workable).
* RFC 1738 says that if logging on to an FTP server puts you in a directory
other than '/', the way to specify a file relative to '/' in a URL (let's use
"/bin/ls" in this example) is "ftp://host/%2Fbin/ls". Wget needs to support
this (and ideally not consider "ftp://host//bin/ls" to be equivalent, as that
would equate to the command "CWD " rather than "CWD /"). To accommodate people
used to broken FTP clients like Internet Explorer and Netscape, if
"ftp://host/bin/ls" doesn't exist, Wget should try again (perhaps under
control of an option), acting as if the user had typed "ftp://host/%2Fbin/ls".
* If multiple FTP URLs are specified that are on the same host, Wget should
re-use the connection rather than opening a new one for each file.
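To make the RFC 1738 item above concrete, here is a small standalone sketch (illustration only, not part of the commit) of how a url-path maps to FTP commands: the path is split on '/' and each segment is percent-decoded before being used, so "%2Fbin" yields "CWD /bin" while the doubled slash in "ftp://host//bin/ls" yields the bare "CWD " mentioned in that item.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Decode %XX escapes in-place (just enough for this illustration). */
static void
percent_decode (char *s)
{
  char *d = s;
  while (*s)
    {
      if (*s == '%' && s[1] && s[2])
        {
          char hex[3] = { s[1], s[2], '\0' };
          *d++ = (char) strtol (hex, NULL, 16);
          s += 3;
        }
      else
        *d++ = *s++;
    }
  *d = '\0';
}

/* Print the FTP commands implied by the RFC 1738 url-path PATH,
   i.e. the part of the URL after "ftp://host/". */
static void
show_commands (const char *path)
{
  char buf[256], *seg, *next;
  strcpy (buf, path);
  printf ("url-path \"%s\":\n", path);
  for (seg = buf; seg; seg = next)
    {
      next = strchr (seg, '/');
      if (next)
        *next++ = '\0';
      percent_decode (seg);
      printf ("  %s %s\n", next ? "CWD" : "RETR", seg);
    }
}

int
main (void)
{
  show_commands ("%2Fbin/ls");  /* ftp://host/%2Fbin/ls -> CWD /bin; RETR ls */
  show_commands ("/bin/ls");    /* ftp://host//bin/ls   -> CWD ""; CWD bin; RETR ls */
  return 0;
}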
@ -37,16 +28,9 @@ changes.
* Limit the number of successive redirections to max. 20 or so.
* If -c used on a file that's already completely downloaded, don't re-download
it (unless normal --timestamping processing would cause you to do so).
* If -c used with -N, check to make sure a file hasn't changed on the server
before "continuing" to download it (preventing a bogus hybrid file).
* Take a look at
<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>
and support the new directives.
* Generalize --html-extension to something like --mime-extensions and have it
look at mime.types/mimecap file for preferred extension. Non-HTML files with
filenames changed this way would be re-downloaded each time despite -N unless
@ -87,9 +71,6 @@ changes.
turning it off. Get rid of `--foo=no' stuff. Short options would
be handled as `-x' vs. `-nx'.
* Implement "thermometer" display (not all that hard; use an
alternative show_progress() if the output goes to a terminal.)
* Add option to only list wildcard matches without doing the download.
* Add case-insensitivity as an option.
@ -102,19 +83,13 @@ changes.
* Allow time-stamping by arbitrary date.
* Fix Unix directory parser to allow for spaces in file names.
* Allow size limit to files (perhaps with an option to download oversize files
up through the limit or not at all, to get more functionality than [u]limit).
* Implement breadth-first retrieval.
* Download to .in* when mirroring.
* Add an option to delete or move no-longer-existent files when mirroring.
* Implement a switch to avoid downloading multiple files (e.g. x and x.gz).
* Implement uploading (--upload URL?) in FTP and HTTP.
* Rewrite FTP code to allow for easy addition of new commands. It
@ -129,13 +104,10 @@ changes.
* Implement a concept of "packages" a la mirror.
* Implement correct RFC1808 URL parsing.
* Implement more HTTP/1.1 bells and whistles (ETag, Content-MD5 etc.)
* Add a "rollback" option to have --continue throw away a configurable number of
bytes at the end of a file before resuming download. Apparently, some stupid
proxies insert a "transfer interrupted" string we need to get rid of.
* Add a "rollback" option to have continued retrieval throw away a
configurable number of bytes at the end of a file before resuming
download. Apparently, some stupid proxies insert a "transfer
interrupted" string we need to get rid of.
* When using --accept and --reject, you can end up with empty directories. Have
Wget delete any such at the end.


@ -1,3 +1,68 @@
2001-11-25 Hrvoje Niksic <hniksic@arsdigita.com>
* url.c (reencode_string): Use unsigned char, not char --
otherwise the hex digits come out wrong for 8-bit chars such as
nbsp.
(lowercase_str): New function.
(url_parse): Canonicalize u->url if needed.
(get_urls_file): Parse each URL, and return only the valid ones.
(free_urlpos): Call url_free.
(mkstruct): Add :port if the port is non-standard.
(mkstruct): Append the query string to the file name, if any.
(urlpath_length): Use strpbrk_or_eos.
(uri_merge_1): Handle the cases where LINK is an empty string,
where LINK consists only of query, and where LINK consists only of
fragment.
(convert_links): Count and report both kinds of conversion.
(downloaded_file): Use a hash table, not a list.
(downloaded_files_free): Free the hash table.
* retr.c (retrieve_from_file): Ditto.
* main.c (main): Call either retrieve_url or retrieve_tree
for each URL, not both.
* retr.c (register_all_redirections): New function.
(register_redirections_mapper): Ditto.
(retrieve_url): Register the redirections.
(retrieve_url): Make the string "Error parsing proxy ..."
translatable.
* res.c (add_path): Strip leading slash from robots.txt paths so
that the path representations are "compatible".
(free_specs): Free each individual path, too.
(res_cleanup): New function.
(cleanup_hash_table_mapper): Ditto.
* recur.c (url_queue_new): New function.
(url_queue_delete): Ditto.
(url_enqueue): Ditto.
(url_dequeue): Ditto.
(retrieve_tree): New function, replacement for recursive_retrieve.
(descend_url_p): New function.
(register_redirection): New function.
* progress.c (create_image): Cosmetic changes.
* init.c (cleanup): Do all those complex cleanups only if
DEBUG_MALLOC is defined.
* main.c: Removed --simple-check and the corresponding
simple_host_check in init.c.
* html-url.c (handle_link): Parse the URL here, and propagate the
parsed URL to the caller, who would otherwise have to parse it
again.
* host.c (xstrdup_lower): Moved to utils.c.
(realhost): Removed.
(same_host): Ditto.
2001-11-24 Hrvoje Niksic <hniksic@arsdigita.com>
* utils.c (path_simplify): Preserve the (non-)existence of
leading slash. Return non-zero if changes were made.
2001-11-24 Hrvoje Niksic <hniksic@arsdigita.com>
* progress.c (bar_update): Don't modify bp->total_length if it is


@ -162,8 +162,10 @@ main$o: wget.h utils.h init.h retr.h recur.h host.h cookies.h
gnu-md5$o: wget.h gnu-md5.h
mswindows$o: wget.h url.h
netrc$o: wget.h utils.h netrc.h init.h
progress$o: wget.h progress.h utils.h retr.h
rbuf$o: wget.h rbuf.h connect.h
recur$o: wget.h url.h recur.h utils.h retr.h ftp.h fnmatch.h host.h hash.h
res$o: wget.h utils.h hash.h url.h retr.h res.h
retr$o: wget.h utils.h retr.h url.h recur.h ftp.h host.h connect.h hash.h
snprintf$o:
safe-ctype$o: safe-ctype.h


@ -60,8 +60,14 @@ extern int errno;
#endif
/* Mapping of all known hosts to their addresses (n.n.n.n). */
/* #### We should map to *lists* of IP addresses. */
struct hash_table *host_name_address_map;
/* The following two tables are obsolete, since we no longer do host
canonicalization. */
/* Mapping of all known addresses (n.n.n.n) to their hosts. This
is the inverse of host_name_address_map. These two tables share
the strdup'ed strings. */
@ -70,18 +76,6 @@ struct hash_table *host_address_name_map;
/* Mapping between auxiliary (slave) and master host names. */
struct hash_table *host_slave_master_map;
/* Utility function: like xstrdup(), but also lowercases S. */
static char *
xstrdup_lower (const char *s)
{
char *copy = xstrdup (s);
char *p = copy;
for (; *p; p++)
*p = TOLOWER (*p);
return copy;
}
/* The same as gethostbyname, but supports internet addresses of the
form `N.N.N.N'. On some systems gethostbyname() knows how to do
this automatically. */
@ -216,114 +210,6 @@ store_hostaddress (unsigned char *where, const char *hostname)
return 1;
}
/* Determine the "real" name of HOST, as perceived by Wget. If HOST
is referenced by more than one name, "real" name is considered to
be the first one encountered in the past. */
char *
realhost (const char *host)
{
struct in_addr in;
struct hostent *hptr;
char *master_name;
DEBUGP (("Checking for %s in host_name_address_map.\n", host));
if (hash_table_contains (host_name_address_map, host))
{
DEBUGP (("Found; %s was already used, by that name.\n", host));
return xstrdup_lower (host);
}
DEBUGP (("Checking for %s in host_slave_master_map.\n", host));
master_name = hash_table_get (host_slave_master_map, host);
if (master_name)
{
has_master:
DEBUGP (("Found; %s was already used, by the name %s.\n",
host, master_name));
return xstrdup (master_name);
}
DEBUGP (("First time I hear about %s by that name; looking it up.\n",
host));
hptr = ngethostbyname (host);
if (hptr)
{
char *inet_s;
/* Originally, we copied to in.s_addr, but it appears to be
missing on some systems. */
memcpy (&in, *hptr->h_addr_list, sizeof (in));
inet_s = inet_ntoa (in);
add_host_to_cache (host, inet_s);
/* add_host_to_cache() can establish a slave-master mapping. */
DEBUGP (("Checking again for %s in host_slave_master_map.\n", host));
master_name = hash_table_get (host_slave_master_map, host);
if (master_name)
goto has_master;
}
return xstrdup_lower (host);
}
/* Compare two hostnames (out of URL-s if the arguments are URL-s),
taking care of aliases. It uses realhost() to determine a unique
hostname for each of two hosts. If simple_check is non-zero, only
strcmp() is used for comparison. */
int
same_host (const char *u1, const char *u2)
{
const char *s;
char *p1, *p2;
char *real1, *real2;
/* Skip protocol, if present. */
u1 += url_skip_scheme (u1);
u2 += url_skip_scheme (u2);
/* Skip username and password, if present. */
u1 += url_skip_uname (u1);
u2 += url_skip_uname (u2);
for (s = u1; *u1 && *u1 != '/' && *u1 != ':'; u1++);
p1 = strdupdelim (s, u1);
for (s = u2; *u2 && *u2 != '/' && *u2 != ':'; u2++);
p2 = strdupdelim (s, u2);
DEBUGP (("Comparing hosts %s and %s...\n", p1, p2));
if (strcasecmp (p1, p2) == 0)
{
xfree (p1);
xfree (p2);
DEBUGP (("They are quite alike.\n"));
return 1;
}
else if (opt.simple_check)
{
xfree (p1);
xfree (p2);
DEBUGP (("Since checking is simple, I'd say they are not the same.\n"));
return 0;
}
real1 = realhost (p1);
real2 = realhost (p2);
xfree (p1);
xfree (p2);
if (strcasecmp (real1, real2) == 0)
{
DEBUGP (("They are alike, after realhost()->%s.\n", real1));
xfree (real1);
xfree (real2);
return 1;
}
else
{
DEBUGP (("They are not the same (%s, %s).\n", real1, real2));
xfree (real1);
xfree (real2);
return 0;
}
}
/* Determine whether a URL is acceptable to be followed, according to
a list of domains to accept. */
int
@ -383,7 +269,7 @@ herrmsg (int error)
}
void
clean_hosts (void)
host_cleanup (void)
{
/* host_name_address_map and host_address_name_map share the
strings. Because of that, calling free_keys_and_values once


@ -27,15 +27,11 @@ struct url;
struct hostent *ngethostbyname PARAMS ((const char *));
int store_hostaddress PARAMS ((unsigned char *, const char *));
void clean_hosts PARAMS ((void));
void host_cleanup PARAMS ((void));
char *realhost PARAMS ((const char *));
int same_host PARAMS ((const char *, const char *));
int accept_domain PARAMS ((struct url *));
int sufmatch PARAMS ((const char **, const char *));
char *ftp_getaddress PARAMS ((void));
char *herrmsg PARAMS ((int));
#endif /* HOST_H */


@ -284,7 +284,7 @@ struct collect_urls_closure {
char *text; /* HTML text. */
char *base; /* Base URI of the document, possibly
changed through <base href=...>. */
urlpos *head, *tail; /* List of URLs */
struct urlpos *head, *tail; /* List of URLs */
const char *parent_base; /* Base of the current document. */
const char *document_file; /* File name of this document. */
int dash_p_leaf_HTML; /* Whether -p is specified, and this
@ -301,59 +301,67 @@ static void
handle_link (struct collect_urls_closure *closure, const char *link_uri,
struct taginfo *tag, int attrid)
{
int no_scheme = !url_has_scheme (link_uri);
urlpos *newel;
int link_has_scheme = url_has_scheme (link_uri);
struct urlpos *newel;
const char *base = closure->base ? closure->base : closure->parent_base;
char *complete_uri;
char *fragment = strrchr (link_uri, '#');
if (fragment)
{
/* Nullify the fragment identifier, i.e. everything after the
last occurrence of `#', inclusive. This copying is
relatively inefficient, but it doesn't matter because
fragment identifiers don't come up all that often. */
int hashlen = fragment - link_uri;
char *p = alloca (hashlen + 1);
memcpy (p, link_uri, hashlen);
p[hashlen] = '\0';
link_uri = p;
}
struct url *url;
if (!base)
{
if (no_scheme)
DEBUGP (("%s: no base, merge will use \"%s\".\n",
closure->document_file, link_uri));
if (!link_has_scheme)
{
/* We have no base, and the link does not have a host
attached to it. Nothing we can do. */
/* #### Should we print a warning here? Wget 1.5.x used to. */
return;
}
else
complete_uri = xstrdup (link_uri);
url = url_parse (link_uri, NULL);
if (!url)
{
DEBUGP (("%s: link \"%s\" doesn't parse.\n",
closure->document_file, link_uri));
return;
}
}
else
complete_uri = uri_merge (base, link_uri);
{
/* Merge BASE with LINK_URI, but also make sure the result is
canonicalized, i.e. that "../" have been resolved.
(url_parse will do that for us.) */
DEBUGP (("%s: merge(\"%s\", \"%s\") -> %s\n",
closure->document_file, base ? base : "(null)",
link_uri, complete_uri));
char *complete_uri = uri_merge (base, link_uri);
newel = (urlpos *)xmalloc (sizeof (urlpos));
DEBUGP (("%s: merge(\"%s\", \"%s\") -> %s\n",
closure->document_file, base, link_uri, complete_uri));
url = url_parse (complete_uri, NULL);
if (!url)
{
DEBUGP (("%s: merged link \"%s\" doesn't parse.\n",
closure->document_file, complete_uri));
xfree (complete_uri);
return;
}
xfree (complete_uri);
}
newel = (struct urlpos *)xmalloc (sizeof (struct urlpos));
memset (newel, 0, sizeof (*newel));
newel->next = NULL;
newel->url = complete_uri;
newel->url = url;
newel->pos = tag->attrs[attrid].value_raw_beginning - closure->text;
newel->size = tag->attrs[attrid].value_raw_size;
/* A URL is relative if the host is not named, and the name does not
start with `/'. */
if (no_scheme && *link_uri != '/')
if (!link_has_scheme && *link_uri != '/')
newel->link_relative_p = 1;
else if (!no_scheme)
else if (link_has_scheme)
newel->link_complete_p = 1;
if (closure->tail)
@ -542,7 +550,7 @@ collect_tags_mapper (struct taginfo *tag, void *arg)
If dash_p_leaf_HTML is non-zero, only the elements needed to render
FILE ("non-external" links) will be returned. */
urlpos *
struct urlpos *
get_urls_html (const char *file, const char *this_url, int dash_p_leaf_HTML,
int *meta_disallow_follow)
{


@ -1452,8 +1452,8 @@ File `%s' already there, will not retrieve.\n"), *hstat.local_file);
if (((suf = suffix (*hstat.local_file)) != NULL)
&& (!strcmp (suf, "html") || !strcmp (suf, "htm")))
*dt |= TEXTHTML;
xfree (suf);
FREE_MAYBE (suf);
FREE_MAYBE (dummy);
return RETROK;
}


@ -171,7 +171,6 @@ static struct {
{ "savecookies", &opt.cookies_output, cmd_file },
{ "saveheaders", &opt.save_headers, cmd_boolean },
{ "serverresponse", &opt.server_response, cmd_boolean },
{ "simplehostcheck", &opt.simple_check, cmd_boolean },
{ "spanhosts", &opt.spanhost, cmd_boolean },
{ "spider", &opt.spider, cmd_boolean },
#ifdef HAVE_SSL
@ -1009,6 +1008,7 @@ check_user_specified_header (const char *s)
}
void cleanup_html_url PARAMS ((void));
void res_cleanup PARAMS ((void));
void downloaded_files_free PARAMS ((void));
@ -1016,13 +1016,27 @@ void downloaded_files_free PARAMS ((void));
void
cleanup (void)
{
extern acc_t *netrc_list;
/* Free external resources, close files, etc. */
recursive_cleanup ();
clean_hosts ();
free_netrc (netrc_list);
if (opt.dfp)
fclose (opt.dfp);
/* We're exiting anyway so there's no real need to call free()
hundreds of times. Skipping the frees will make Wget exit
faster.
However, when detecting leaks, it's crucial to free() everything
because then you can find the real leaks, i.e. the allocated
memory which grows with the size of the program. */
#ifdef DEBUG_MALLOC
recursive_cleanup ();
res_cleanup ();
host_cleanup ();
{
extern acc_t *netrc_list;
free_netrc (netrc_list);
}
cleanup_html_url ();
downloaded_files_free ();
cookies_cleanup ();
@ -1037,6 +1051,7 @@ cleanup (void)
free_vec (opt.domains);
free_vec (opt.follow_tags);
free_vec (opt.ignore_tags);
FREE_MAYBE (opt.progress_type);
xfree (opt.ftp_acc);
FREE_MAYBE (opt.ftp_pass);
FREE_MAYBE (opt.ftp_proxy);
@ -1055,4 +1070,5 @@ cleanup (void)
FREE_MAYBE (opt.bind_address);
FREE_MAYBE (opt.cookies_input);
FREE_MAYBE (opt.cookies_output);
#endif
}


@ -402,9 +402,6 @@ hpVqvdkKsxmNWrHSLcFbEY:G:g:T:U:O:l:n:i:o:a:t:D:A:R:P:B:e:Q:X:I:w:C:",
case 149:
setval ("removelisting", "off");
break;
case 150:
setval ("simplehostcheck", "on");
break;
case 155:
setval ("bindaddress", optarg);
break;
@ -604,7 +601,7 @@ GNU General Public License for more details.\n"));
break;
case 'n':
{
/* #### The n? options are utter crock! */
/* #### What we really want here is --no-foo. */
char *p;
for (p = optarg; *p; p++)
@ -613,9 +610,6 @@ GNU General Public License for more details.\n"));
case 'v':
setval ("verbose", "off");
break;
case 'h':
setval ("simplehostcheck", "on");
break;
case 'H':
setval ("addhostdir", "off");
break;
@ -806,17 +800,17 @@ Can't timestamp and not clobber old files at the same time.\n"));
#endif /* HAVE_SIGNAL */
status = RETROK; /* initialize it, just-in-case */
recursive_reset ();
/*recursive_reset ();*/
/* Retrieve the URLs from argument list. */
for (t = url; *t; t++)
{
char *filename, *redirected_URL;
char *filename = NULL, *redirected_URL = NULL;
int dt;
status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt);
if (opt.recursive && status == RETROK && (dt & TEXTHTML))
status = recursive_retrieve (filename,
redirected_URL ? redirected_URL : *t);
if (opt.recursive && url_scheme (*t) != SCHEME_FTP)
status = retrieve_tree (*t);
else
status = retrieve_url (*t, &filename, &redirected_URL, NULL, &dt);
if (opt.delete_after && file_exists_p(filename))
{


@ -36,9 +36,6 @@ struct options
int relative_only; /* Follow only relative links. */
int no_parent; /* Restrict access to the parent
directory. */
int simple_check; /* Should we use simple checking
(strcmp) or do we create a host
hash and call gethostbyname? */
int reclevel; /* Maximum level of recursion */
int dirstruct; /* Do we build the directory structure
as we go along? */


@ -27,6 +27,9 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
# include <strings.h>
#endif /* HAVE_STRING_H */
#include <assert.h>
#ifdef HAVE_UNISTD_H
# include <unistd.h>
#endif
#include "wget.h"
#include "progress.h"
@ -470,14 +473,14 @@ create_image (struct bar_progress *bp, long dltime)
Calculate its geometry:
"xxx% " - percentage - 5 chars
"| ... | " - progress bar decorations - 3 chars
"| ... |" - progress bar decorations - 2 chars
"1012.56 K/s " - dl rate - 12 chars
"nnnn " - downloaded bytes - 11 chars
"ETA: xx:xx:xx" - ETA - 13 chars
"=====>..." - progress bar content - the rest
*/
int progress_len = screen_width - (5 + 3 + 12 + 11 + 13);
int progress_len = screen_width - (5 + 2 + 12 + 11 + 13);
if (progress_len < 7)
progress_len = 0;
@ -530,7 +533,7 @@ create_image (struct bar_progress *bp, long dltime)
}
else
{
strcpy (p, "----.-- K/s ");
strcpy (p, " --.-- K/s ");
p += 12;
}


@ -1,5 +1,5 @@
/* Handling of recursive HTTP retrieving.
Copyright (C) 1995, 1996, 1997, 2000 Free Software Foundation, Inc.
Copyright (C) 1995, 1996, 1997, 2000, 2001 Free Software Foundation, Inc.
This file is part of GNU Wget.
@ -54,452 +54,480 @@ static struct hash_table *dl_file_url_map;
static struct hash_table *dl_url_file_map;
/* List of HTML files downloaded in this Wget run. Used for link
conversion after Wget is done. */
conversion after Wget is done. This list should only be traversed
in order. If you need to check whether a file has been downloaded,
use a hash table, e.g. dl_file_url_map. */
static slist *downloaded_html_files;
/* Functions for maintaining the URL queue. */
/* List of undesirable-to-load URLs. */
static struct hash_table *undesirable_urls;
struct queue_element {
const char *url;
const char *referer;
int depth;
struct queue_element *next;
};
/* Current recursion depth. */
static int depth;
struct url_queue {
struct queue_element *head;
struct queue_element *tail;
int count, maxcount;
};
/* Base directory we're recursing from (used by no_parent). */
static char *base_dir;
/* Create a URL queue. */
static int first_time = 1;
/* Cleanup the data structures associated with recursive retrieving
(the variables above). */
void
recursive_cleanup (void)
static struct url_queue *
url_queue_new (void)
{
if (undesirable_urls)
{
string_set_free (undesirable_urls);
undesirable_urls = NULL;
}
if (dl_file_url_map)
{
free_keys_and_values (dl_file_url_map);
hash_table_destroy (dl_file_url_map);
dl_file_url_map = NULL;
}
if (dl_url_file_map)
{
free_keys_and_values (dl_url_file_map);
hash_table_destroy (dl_url_file_map);
dl_url_file_map = NULL;
}
undesirable_urls = NULL;
slist_free (downloaded_html_files);
downloaded_html_files = NULL;
FREE_MAYBE (base_dir);
first_time = 1;
struct url_queue *queue = xmalloc (sizeof (*queue));
memset (queue, '\0', sizeof (*queue));
return queue;
}
/* Reset FIRST_TIME to 1, so that some action can be taken in
recursive_retrieve(). */
void
recursive_reset (void)
/* Delete a URL queue. */
static void
url_queue_delete (struct url_queue *queue)
{
first_time = 1;
xfree (queue);
}
/* The core of recursive retrieving. Endless recursion is avoided by
having all URLs stored to a linked list of URLs, which is checked
before loading any URL. That way no URL can get loaded twice.
/* Enqueue a URL in the queue. The queue is FIFO: the items will be
retrieved ("dequeued") from the queue in the order they were placed
into it. */
static void
url_enqueue (struct url_queue *queue,
const char *url, const char *referer, int depth)
{
struct queue_element *qel = xmalloc (sizeof (*qel));
qel->url = url;
qel->referer = referer;
qel->depth = depth;
qel->next = NULL;
++queue->count;
if (queue->count > queue->maxcount)
queue->maxcount = queue->count;
DEBUGP (("Enqueuing %s at depth %d\n", url, depth));
DEBUGP (("Queue count %d, maxcount %d.\n", queue->count, queue->maxcount));
if (queue->tail)
queue->tail->next = qel;
queue->tail = qel;
if (!queue->head)
queue->head = queue->tail;
}
/* Take a URL out of the queue. Return 1 if this operation succeeded,
or 0 if the queue is empty. */
static int
url_dequeue (struct url_queue *queue,
const char **url, const char **referer, int *depth)
{
struct queue_element *qel = queue->head;
if (!qel)
return 0;
queue->head = queue->head->next;
if (!queue->head)
queue->tail = NULL;
*url = qel->url;
*referer = qel->referer;
*depth = qel->depth;
--queue->count;
DEBUGP (("Dequeuing %s at depth %d\n", qel->url, qel->depth));
DEBUGP (("Queue count %d, maxcount %d.\n", queue->count, queue->maxcount));
xfree (qel);
return 1;
}
static int descend_url_p PARAMS ((const struct urlpos *, struct url *, int,
struct url *, struct hash_table *));
/* Retrieve a part of the web beginning with START_URL. This used to
be called "recursive retrieval", because the old function was
recursive and implemented depth-first search. retrieve_tree on the
other hand implements breadth-first traversal of the tree, which
results in much nicer ordering of downloads.
The algorithm this function uses is simple:
1. put START_URL in the queue.
2. while there are URLs in the queue:
3. get next URL from the queue.
4. download it.
5. if the URL is HTML and its depth does not exceed maximum depth,
get the list of URLs embedded therein.
6. for each of those URLs do the following:
7. if the URL is not one of those downloaded before, and if it
satisfies the criteria specified by the various command-line
options, add it to the queue. */
The function also supports specification of maximum recursion depth
and a number of other goodies. */
uerr_t
recursive_retrieve (const char *file, const char *this_url)
retrieve_tree (const char *start_url)
{
char *constr, *filename, *newloc;
char *canon_this_url = NULL;
int dt, inl, dash_p_leaf_HTML = FALSE;
int meta_disallow_follow;
int this_url_ftp; /* See below the explanation */
urlpos *url_list, *cur_url;
struct url *u;
uerr_t status = RETROK;
assert (this_url != NULL);
assert (file != NULL);
/* If quota was exceeded earlier, bail out. */
if (downloaded_exceeds_quota ())
return QUOTEXC;
/* Cache the current URL in the list. */
if (first_time)
/* The queue of URLs we need to load. */
struct url_queue *queue = url_queue_new ();
/* The URLs we decided we don't want to load. */
struct hash_table *blacklist = make_string_hash_table (0);
/* We'll need various components of this, so better get it over with
now. */
struct url *start_url_parsed = url_parse (start_url, NULL);
url_enqueue (queue, xstrdup (start_url), NULL, 0);
string_set_add (blacklist, start_url);
while (1)
{
/* These three operations need to be done only once per Wget
run. They should probably be at a different location. */
if (!undesirable_urls)
undesirable_urls = make_string_hash_table (0);
int descend = 0;
char *url, *referer, *file = NULL;
int depth;
boolean dash_p_leaf_HTML = FALSE;
hash_table_clear (undesirable_urls);
string_set_add (undesirable_urls, this_url);
/* Enter this_url to the hash table, in original and "enhanced" form. */
u = url_parse (this_url, NULL);
if (u)
{
string_set_add (undesirable_urls, u->url);
if (opt.no_parent)
base_dir = xstrdup (u->dir); /* Set the base dir. */
/* Set the canonical this_url to be sent as referer. This
problem exists only when running the first time. */
canon_this_url = xstrdup (u->url);
}
else
{
DEBUGP (("Double yuck! The *base* URL is broken.\n"));
base_dir = NULL;
}
url_free (u);
depth = 1;
first_time = 0;
}
else
++depth;
if (opt.reclevel != INFINITE_RECURSION && depth > opt.reclevel)
/* We've exceeded the maximum recursion depth specified by the user. */
{
if (opt.page_requisites && depth <= opt.reclevel + 1)
/* When -p is specified, we can do one more partial recursion from the
"leaf nodes" on the HTML document tree. The recursion is partial in
that we won't traverse any <A> or <AREA> tags, nor any <LINK> tags
except for <LINK REL="stylesheet">. */
dash_p_leaf_HTML = TRUE;
else
/* Either -p wasn't specified or it was and we've already gone the one
extra (pseudo-)level that it affords us, so we need to bail out. */
{
DEBUGP (("Recursion depth %d exceeded max. depth %d.\n",
depth, opt.reclevel));
--depth;
return RECLEVELEXC;
}
}
/* Determine whether this_url is an FTP URL. If it is, it means
that the retrieval is done through proxy. In that case, FTP
links will be followed by default and recursion will not be
turned off when following them. */
this_url_ftp = (url_scheme (this_url) == SCHEME_FTP);
/* Get the URL-s from an HTML file: */
url_list = get_urls_html (file, canon_this_url ? canon_this_url : this_url,
dash_p_leaf_HTML, &meta_disallow_follow);
if (opt.use_robots && meta_disallow_follow)
{
/* The META tag says we are not to follow this file. Respect
that. */
free_urlpos (url_list);
url_list = NULL;
}
/* Decide what to do with each of the URLs. A URL will be loaded if
it meets several requirements, discussed later. */
for (cur_url = url_list; cur_url; cur_url = cur_url->next)
{
/* If quota was exceeded earlier, bail out. */
if (downloaded_exceeds_quota ())
break;
/* Parse the URL for convenient use in other functions, as well
as to get the optimized form. It also checks URL integrity. */
u = url_parse (cur_url->url, NULL);
if (!u)
{
DEBUGP (("Yuck! A bad URL.\n"));
continue;
}
assert (u->url != NULL);
constr = xstrdup (u->url);
/* Several checkings whether a file is acceptable to load:
1. check if URL is ftp, and we don't load it
2. check for relative links (if relative_only is set)
3. check for domain
4. check for no-parent
5. check for excludes && includes
6. check for suffix
7. check for same host (if spanhost is unset), with possible
gethostbyname baggage
8. check for robots.txt
if (status == FWRITEERR)
break;
Addendum: If the URL is FTP, and it is to be loaded, only the
domain and suffix settings are "stronger".
/* Get the next URL from the queue. */
Note that .html and (yuck) .htm will get loaded regardless of
suffix rules (but that is remedied later with unlink) unless
the depth equals the maximum depth.
if (!url_dequeue (queue,
(const char **)&url, (const char **)&referer,
&depth))
break;
More time- and memory- consuming tests should be put later on
the list. */
/* And download it. */
/* inl is set if the URL we are working on (constr) is stored in
undesirable_urls. Using it is crucial to avoid unnecessary
repeated continuous hits to the hash table. */
inl = string_set_contains (undesirable_urls, constr);
{
int dt = 0;
char *redirected = NULL;
int oldrec = opt.recursive;
/* If it is FTP, and FTP is not followed, chuck it out. */
if (!inl)
if (u->scheme == SCHEME_FTP && !opt.follow_ftp && !this_url_ftp)
opt.recursive = 0;
status = retrieve_url (url, &file, &redirected, NULL, &dt);
opt.recursive = oldrec;
if (redirected)
{
DEBUGP (("Uh, it is FTP but i'm not in the mood to follow FTP.\n"));
string_set_add (undesirable_urls, constr);
inl = 1;
xfree (url);
url = redirected;
}
/* If it is absolute link and they are not followed, chuck it
out. */
if (!inl && u->scheme != SCHEME_FTP)
if (opt.relative_only && !cur_url->link_relative_p)
{
DEBUGP (("It doesn't really look like a relative link.\n"));
string_set_add (undesirable_urls, constr);
inl = 1;
}
/* If its domain is not to be accepted/looked-up, chuck it out. */
if (!inl)
if (!accept_domain (u))
{
DEBUGP (("I don't like the smell of that domain.\n"));
string_set_add (undesirable_urls, constr);
inl = 1;
}
/* Check for parent directory. */
if (!inl && opt.no_parent
/* If the new URL is FTP and the old was not, ignore
opt.no_parent. */
&& !(!this_url_ftp && u->scheme == SCHEME_FTP))
{
/* Check for base_dir first. */
if (!(base_dir && frontcmp (base_dir, u->dir)))
{
/* Failing that, check for parent dir. */
struct url *ut = url_parse (this_url, NULL);
if (!ut)
DEBUGP (("Double yuck! The *base* URL is broken.\n"));
else if (!frontcmp (ut->dir, u->dir))
{
/* Failing that too, kill the URL. */
DEBUGP (("Trying to escape parental guidance with no_parent on.\n"));
string_set_add (undesirable_urls, constr);
inl = 1;
}
url_free (ut);
}
}
/* If the file does not match the acceptance list, or is on the
rejection list, chuck it out. The same goes for the
directory exclude- and include- lists. */
if (!inl && (opt.includes || opt.excludes))
{
if (!accdir (u->dir, ALLABS))
{
DEBUGP (("%s (%s) is excluded/not-included.\n", constr, u->dir));
string_set_add (undesirable_urls, constr);
inl = 1;
}
}
if (!inl)
{
char *suf = NULL;
/* We check for acceptance/rejection rules only for non-HTML
documents. Since we don't know whether they really are
HTML, it will be deduced from (an OR-ed list):
if (file && status == RETROK
&& (dt & RETROKF) && (dt & TEXTHTML))
descend = 1;
}
1) u->file is "" (meaning it is a directory)
2) suffix exists, AND:
a) it is "html", OR
b) it is "htm"
If the file *is* supposed to be HTML, it will *not* be
subject to acc/rej rules, unless a finite maximum depth has
been specified and the current depth is the maximum depth. */
if (!
(!*u->file
|| (((suf = suffix (constr)) != NULL)
&& ((!strcmp (suf, "html") || !strcmp (suf, "htm"))
&& ((opt.reclevel != INFINITE_RECURSION) &&
(depth != opt.reclevel))))))
{
if (!acceptable (u->file))
{
DEBUGP (("%s (%s) does not match acc/rej rules.\n",
constr, u->file));
string_set_add (undesirable_urls, constr);
inl = 1;
}
}
FREE_MAYBE (suf);
}
/* Optimize the URL (which includes possible DNS lookup) only
after all other possibilities have been exhausted. */
if (!inl)
if (descend
&& depth >= opt.reclevel && opt.reclevel != INFINITE_RECURSION)
{
if (!opt.simple_check)
{
/* Find the "true" host. */
char *host = realhost (u->host);
xfree (u->host);
u->host = host;
/* Refresh the printed representation of the URL. */
xfree (u->url);
u->url = url_string (u, 0);
}
if (opt.page_requisites && depth == opt.reclevel)
/* When -p is specified, we can do one more partial
recursion from the "leaf nodes" on the HTML document
tree. The recursion is partial in that we won't
traverse any <A> or <AREA> tags, nor any <LINK> tags
except for <LINK REL="stylesheet">. */
/* #### This would be the place to implement the TODO
entry saying that -p should do two more hops on
framesets. */
dash_p_leaf_HTML = TRUE;
else
{
char *p;
/* Just lowercase the hostname. */
for (p = u->host; *p; p++)
*p = TOLOWER (*p);
xfree (u->url);
u->url = url_string (u, 0);
/* Either -p wasn't specified or it was and we've
already gone the one extra (pseudo-)level that it
affords us, so we need to bail out. */
DEBUGP (("Not descending further; at depth %d, max. %d.\n",
depth, opt.reclevel));
descend = 0;
}
xfree (constr);
constr = xstrdup (u->url);
/* After we have canonicalized the URL, check if we have it
on the black list. */
if (string_set_contains (undesirable_urls, constr))
inl = 1;
/* This line is bogus. */
/*string_set_add (undesirable_urls, constr);*/
if (!inl && !((u->scheme == SCHEME_FTP) && !this_url_ftp))
if (!opt.spanhost && this_url && !same_host (this_url, constr))
{
DEBUGP (("This is not the same hostname as the parent's.\n"));
string_set_add (undesirable_urls, constr);
inl = 1;
}
}
/* What about robots.txt? */
if (!inl && opt.use_robots && u->scheme == SCHEME_HTTP)
/* If the downloaded document was HTML, parse it and enqueue the
links it contains. */
if (descend)
{
struct robot_specs *specs = res_get_specs (u->host, u->port);
if (!specs)
int meta_disallow_follow = 0;
struct urlpos *children = get_urls_html (file, url, dash_p_leaf_HTML,
&meta_disallow_follow);
if (opt.use_robots && meta_disallow_follow)
{
char *rfile;
if (res_retrieve_file (constr, &rfile))
{
specs = res_parse_from_file (rfile);
xfree (rfile);
}
else
{
/* If we cannot get real specs, at least produce
dummy ones so that we can register them and stop
trying to retrieve them. */
specs = res_parse ("", 0);
}
res_register_specs (u->host, u->port, specs);
free_urlpos (children);
children = NULL;
}
/* Now that we have (or don't have) robots.txt specs, we can
check what they say. */
if (!res_match_path (specs, u->path))
if (children)
{
DEBUGP (("Not following %s because robots.txt forbids it.\n",
constr));
string_set_add (undesirable_urls, constr);
inl = 1;
struct urlpos *child = children;
struct url *url_parsed = url_parse (url, NULL);
assert (url_parsed != NULL);
for (; child; child = child->next)
{
if (descend_url_p (child, url_parsed, depth, start_url_parsed,
blacklist))
{
url_enqueue (queue, xstrdup (child->url->url),
xstrdup (url), depth + 1);
/* We blacklist the URL we have enqueued, because we
don't want to enqueue (and hence download) the
same URL twice. */
string_set_add (blacklist, child->url->url);
}
}
url_free (url_parsed);
free_urlpos (children);
}
}
filename = NULL;
/* If it wasn't chucked out, do something with it. */
if (!inl)
if (opt.delete_after || (file && !acceptable (file)))
{
DEBUGP (("I've decided to load it -> "));
/* Add it to the list of already-loaded URL-s. */
string_set_add (undesirable_urls, constr);
/* Automatically followed FTPs will *not* be downloaded
recursively. */
if (u->scheme == SCHEME_FTP)
{
/* Don't you adore side-effects? */
opt.recursive = 0;
}
/* Reset its type. */
dt = 0;
/* Retrieve it. */
retrieve_url (constr, &filename, &newloc,
canon_this_url ? canon_this_url : this_url, &dt);
if (u->scheme == SCHEME_FTP)
{
/* Restore... */
opt.recursive = 1;
}
if (newloc)
{
xfree (constr);
constr = newloc;
}
/* If there was no error, and the type is text/html, parse
it recursively. */
if (dt & TEXTHTML)
{
if (dt & RETROKF)
recursive_retrieve (filename, constr);
}
else
DEBUGP (("%s is not text/html so we don't chase.\n",
filename ? filename: "(null)"));
if (opt.delete_after || (filename && !acceptable (filename)))
/* Either --delete-after was specified, or we loaded this otherwise
rejected (e.g. by -R) HTML file just so we could harvest its
hyperlinks -- in either case, delete the local file. */
{
DEBUGP (("Removing file due to %s in recursive_retrieve():\n",
opt.delete_after ? "--delete-after" :
"recursive rejection criteria"));
logprintf (LOG_VERBOSE,
(opt.delete_after ? _("Removing %s.\n")
: _("Removing %s since it should be rejected.\n")),
filename);
if (unlink (filename))
logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
dt &= ~RETROKF;
}
/* If everything was OK, and links are to be converted, let's
store the local filename. */
if (opt.convert_links && (dt & RETROKF) && (filename != NULL))
{
cur_url->convert = CO_CONVERT_TO_RELATIVE;
cur_url->local_name = xstrdup (filename);
}
/* Either --delete-after was specified, or we loaded this
otherwise rejected (e.g. by -R) HTML file just so we
could harvest its hyperlinks -- in either case, delete
the local file. */
DEBUGP (("Removing file due to %s in recursive_retrieve():\n",
opt.delete_after ? "--delete-after" :
"recursive rejection criteria"));
logprintf (LOG_VERBOSE,
(opt.delete_after ? _("Removing %s.\n")
: _("Removing %s since it should be rejected.\n")),
file);
if (unlink (file))
logprintf (LOG_NOTQUIET, "unlink: %s\n", strerror (errno));
}
else
DEBUGP (("%s already in list, so we don't load.\n", constr));
/* Free filename and constr. */
FREE_MAYBE (filename);
FREE_MAYBE (constr);
url_free (u);
/* Increment the pbuf for the appropriate size. */
xfree (url);
FREE_MAYBE (referer);
FREE_MAYBE (file);
}
if (opt.convert_links && !opt.delete_after)
/* This is merely the first pass: the links that have been
successfully downloaded are converted. In the second pass,
convert_all_links() will also convert those links that have NOT
been downloaded to their canonical form. */
convert_links (file, url_list);
/* Free the linked list of URL-s. */
free_urlpos (url_list);
/* Free the canonical this_url. */
FREE_MAYBE (canon_this_url);
/* Decrement the recursion depth. */
--depth;
/* If anything is left of the queue due to a premature exit, free it
now. */
{
char *d1, *d2;
int d3;
while (url_dequeue (queue, (const char **)&d1, (const char **)&d2, &d3))
{
xfree (d1);
FREE_MAYBE (d2);
}
}
url_queue_delete (queue);
if (start_url_parsed)
url_free (start_url_parsed);
string_set_free (blacklist);
if (downloaded_exceeds_quota ())
return QUOTEXC;
else if (status == FWRITEERR)
return FWRITEERR;
else
return RETROK;
}
/* Based on the context provided by retrieve_tree, decide whether a
URL is to be descended to. This is only ever called from
retrieve_tree, but is in a separate function for clarity. */
static int
descend_url_p (const struct urlpos *upos, struct url *parent, int depth,
struct url *start_url_parsed, struct hash_table *blacklist)
{
struct url *u = upos->url;
const char *url = u->url;
DEBUGP (("Deciding whether to enqueue \"%s\".\n", url));
if (string_set_contains (blacklist, url))
{
DEBUGP (("Already on the black list.\n"));
goto out;
}
/* Several things to check for:
1. if scheme is not http, and we don't load it
2. check for relative links (if relative_only is set)
3. check for domain
4. check for no-parent
5. check for excludes && includes
6. check for suffix
7. check for same host (if spanhost is unset), with possible
gethostbyname baggage
8. check for robots.txt
Addendum: If the URL is FTP, and it is to be loaded, only the
domain and suffix settings are "stronger".
Note that .html files will get loaded regardless of suffix rules
(but that is remedied later with unlink) unless the depth equals
the maximum depth.
More time- and memory- consuming tests should be put later on
the list. */
/* 1. Schemes other than HTTP are normally not recursed into. */
if (u->scheme != SCHEME_HTTP
&& !(u->scheme == SCHEME_FTP && opt.follow_ftp))
{
DEBUGP (("Not following non-HTTP schemes.\n"));
goto blacklist;
}
/* 2. If it is an absolute link and they are not followed, throw it
out. */
if (u->scheme == SCHEME_HTTP)
if (opt.relative_only && !upos->link_relative_p)
{
DEBUGP (("It doesn't really look like a relative link.\n"));
goto blacklist;
}
/* 3. If its domain is not to be accepted/looked-up, chuck it
out. */
if (!accept_domain (u))
{
DEBUGP (("The domain was not accepted.\n"));
goto blacklist;
}
/* 4. Check for parent directory.
If we descended to a different host or changed the scheme, ignore
opt.no_parent. Also ignore it for -p leaf retrievals. */
if (opt.no_parent
&& u->scheme == parent->scheme
&& 0 == strcasecmp (u->host, parent->host)
&& u->port == parent->port)
{
if (!frontcmp (parent->dir, u->dir))
{
DEBUGP (("Trying to escape the root directory with no_parent in effect.\n"));
goto blacklist;
}
}
/* 5. If the file does not match the acceptance list, or is on the
rejection list, chuck it out. The same goes for the directory
exclusion and inclusion lists. */
if (opt.includes || opt.excludes)
{
if (!accdir (u->dir, ALLABS))
{
DEBUGP (("%s (%s) is excluded/not-included.\n", url, u->dir));
goto blacklist;
}
}
/* 6. */
{
char *suf = NULL;
/* Check for acceptance/rejection rules. We ignore these rules
for HTML documents because they might lead to other files which
need to be downloaded. Of course, we don't know which
documents are HTML before downloading them, so we guess.
A file is subject to acceptance/rejection rules if:
* u->file is not "" (i.e. it is not a directory)
and either:
+ there is no file suffix,
+ or there is a suffix, but is not "html" or "htm",
+ both:
- recursion is not infinite,
- and we are at its very end. */
if (u->file[0] != '\0'
&& ((suf = suffix (url)) == NULL
|| (0 != strcmp (suf, "html") && 0 != strcmp (suf, "htm"))
|| (opt.reclevel == INFINITE_RECURSION && depth >= opt.reclevel)))
{
if (!acceptable (u->file))
{
DEBUGP (("%s (%s) does not match acc/rej rules.\n",
url, u->file));
FREE_MAYBE (suf);
goto blacklist;
}
}
FREE_MAYBE (suf);
}
/* 7. */
if (u->scheme == parent->scheme)
if (!opt.spanhost && 0 != strcasecmp (parent->host, u->host))
{
DEBUGP (("This is not the same hostname as the parent's (%s and %s).\n",
u->host, parent->host));
goto blacklist;
}
/* 8. */
if (opt.use_robots && u->scheme == SCHEME_HTTP)
{
struct robot_specs *specs = res_get_specs (u->host, u->port);
if (!specs)
{
char *rfile;
if (res_retrieve_file (url, &rfile))
{
specs = res_parse_from_file (rfile);
xfree (rfile);
}
else
{
/* If we cannot get real specs, at least produce
dummy ones so that we can register them and stop
trying to retrieve them. */
specs = res_parse ("", 0);
}
res_register_specs (u->host, u->port, specs);
}
/* Now that we have (or don't have) robots.txt specs, we can
check what they say. */
if (!res_match_path (specs, u->path))
{
DEBUGP (("Not following %s because robots.txt forbids it.\n", url));
goto blacklist;
}
}
/* The URL has passed all the tests. It can be placed in the
download queue. */
DEBUGP (("Decided to load it.\n"));
return 1;
blacklist:
string_set_add (blacklist, url);
out:
DEBUGP (("Decided NOT to load it.\n"));
return 0;
}
/* Register that URL has been successfully downloaded to FILE. */
void
register_download (const char *url, const char *file)
{
@ -507,12 +535,35 @@ register_download (const char *url, const char *file)
return;
if (!dl_file_url_map)
dl_file_url_map = make_string_hash_table (0);
hash_table_put (dl_file_url_map, xstrdup (file), xstrdup (url));
if (!dl_url_file_map)
dl_url_file_map = make_string_hash_table (0);
hash_table_put (dl_url_file_map, xstrdup (url), xstrdup (file));
if (!hash_table_contains (dl_file_url_map, file))
hash_table_put (dl_file_url_map, xstrdup (file), xstrdup (url));
if (!hash_table_contains (dl_url_file_map, url))
hash_table_put (dl_url_file_map, xstrdup (url), xstrdup (file));
}
/* Register that FROM has been redirected to TO. This assumes that TO
is successfully downloaded and already registered using
register_download() above. */
void
register_redirection (const char *from, const char *to)
{
char *file;
if (!opt.convert_links)
return;
file = hash_table_get (dl_url_file_map, to);
assert (file != NULL);
if (!hash_table_contains (dl_url_file_map, from))
hash_table_put (dl_url_file_map, xstrdup (from), xstrdup (file));
}
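A hypothetical standalone sketch of the bookkeeping above, using the same string hash-table helpers that recur.c already uses (make_string_hash_table, hash_table_put, hash_table_get, hash_table_contains); the URLs and file name are made up, and the local table stands in for the module-private dl_url_file_map.

#include <stdio.h>
#include "wget.h"
#include "hash.h"

int
main (void)
{
  /* Stand-in for dl_url_file_map: URL -> local file name. */
  struct hash_table *url_file_map = make_string_hash_table (0);

  /* register_download(): the URL that was finally fetched maps to its file. */
  hash_table_put (url_file_map, "http://www.example.com/new/",
                  "www.example.com/new/index.html");

  /* register_redirection(): a URL that redirected to it maps to the same
     local file, so convert_links can also rewrite links that still point
     at the old location. */
  if (!hash_table_contains (url_file_map, "http://www.example.com/old/"))
    hash_table_put (url_file_map, "http://www.example.com/old/",
                    hash_table_get (url_file_map, "http://www.example.com/new/"));

  printf ("%s\n", (char *) hash_table_get (url_file_map,
                                           "http://www.example.com/old/"));
  return 0;
}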
/* Register that URL corresponds to the HTML file FILE. */
void
register_html (const char *url, const char *file)
{
@ -558,10 +609,11 @@ convert_all_links (void)
for (html = downloaded_html_files; html; html = html->next)
{
urlpos *urls, *cur_url;
struct urlpos *urls, *cur_url;
char *url;
DEBUGP (("Rescanning %s\n", html->string));
/* Determine the URL of the HTML file. get_urls_html will need
it. */
url = hash_table_get (dl_file_url_map, html->string);
@ -569,19 +621,19 @@ convert_all_links (void)
DEBUGP (("It should correspond to %s.\n", url));
else
DEBUGP (("I cannot find the corresponding URL.\n"));
/* Parse the HTML file... */
urls = get_urls_html (html->string, url, FALSE, NULL);
/* We don't respect meta_disallow_follow here because, even if
the file is not followed, we might still want to convert the
links that have been followed from other files. */
for (cur_url = urls; cur_url; cur_url = cur_url->next)
{
char *local_name;
struct url *u = cur_url->url;
/* The URL must be in canonical form to be compared. */
struct url *u = url_parse (cur_url->url, NULL);
if (!u)
continue;
/* We decide the direction of conversion according to whether
a URL was downloaded. Downloaded URLs will be converted
ABS2REL, whereas non-downloaded will be converted REL2ABS. */
@ -589,6 +641,7 @@ convert_all_links (void)
if (local_name)
DEBUGP (("%s marked for conversion, local %s\n",
u->url, local_name));
/* Decide on the conversion direction. */
if (local_name)
{
@ -610,7 +663,6 @@ convert_all_links (void)
cur_url->convert = CO_CONVERT_TO_COMPLETE;
cur_url->local_name = NULL;
}
url_free (u);
}
/* Convert the links in the file. */
convert_links (html->string, urls);
@ -618,3 +670,24 @@ convert_all_links (void)
free_urlpos (urls);
}
}
/* Cleanup the data structures associated with recursive retrieving
(the variables above). */
void
recursive_cleanup (void)
{
if (dl_file_url_map)
{
free_keys_and_values (dl_file_url_map);
hash_table_destroy (dl_file_url_map);
dl_file_url_map = NULL;
}
if (dl_url_file_map)
{
free_keys_and_values (dl_url_file_map);
hash_table_destroy (dl_url_file_map);
dl_url_file_map = NULL;
}
slist_free (downloaded_html_files);
downloaded_html_files = NULL;
}


@ -21,10 +21,10 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
#define RECUR_H
void recursive_cleanup PARAMS ((void));
void recursive_reset PARAMS ((void));
uerr_t recursive_retrieve PARAMS ((const char *, const char *));
uerr_t retrieve_tree PARAMS ((const char *));
void register_download PARAMS ((const char *, const char *));
void register_redirection PARAMS ((const char *, const char *));
void register_html PARAMS ((const char *, const char *));
void convert_all_links PARAMS ((void));


@ -125,6 +125,10 @@ add_path (struct robot_specs *specs, const char *path_b, const char *path_e,
int allowedp, int exactp)
{
struct path_info pp;
if (path_b < path_e && *path_b == '/')
/* Our path representation doesn't use a leading slash, so remove
one from theirs. */
++path_b;
pp.path = strdupdelim (path_b, path_e);
pp.allowedp = allowedp;
pp.user_agent_exact_p = exactp;
@ -390,6 +394,9 @@ res_parse_from_file (const char *filename)
static void
free_specs (struct robot_specs *specs)
{
int i;
for (i = 0; i < specs->count; i++)
xfree (specs->paths[i].path);
FREE_MAYBE (specs->paths);
xfree (specs);
}
@ -546,3 +553,22 @@ res_retrieve_file (const char *url, char **file)
}
return err == RETROK;
}
static int
cleanup_hash_table_mapper (void *key, void *value, void *arg_ignored)
{
xfree (key);
free_specs (value);
return 0;
}
void
res_cleanup (void)
{
if (registered_specs)
{
hash_table_map (registered_specs, cleanup_hash_table_mapper, NULL);
hash_table_destroy (registered_specs);
registered_specs = NULL;
}
}
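A hypothetical standalone driver for the robots interface touched above, assuming the res.h declarations that recur.c relies on in this commit (res_parse taking a buffer and its length, res_register_specs, res_get_specs, res_match_path returning non-zero when a path is allowed, and res_cleanup); the host name and paths are made up. Note that paths are passed without a leading slash, matching the representation add_path now stores.

#include <stdio.h>
#include <string.h>
#include "wget.h"
#include "res.h"

int
main (void)
{
  const char *txt =
    "User-agent: *\n"
    "Disallow: /cgi-bin/\n";
  struct robot_specs *specs = res_parse (txt, (int) strlen (txt));

  /* Register and look up the specs per host:port, the way retrieve_tree does. */
  res_register_specs ("www.example.com", 80, specs);
  specs = res_get_specs ("www.example.com", 80);

  printf ("cgi-bin/query -> %s\n",
          res_match_path (specs, "cgi-bin/query") ? "allowed" : "forbidden");
  printf ("index.html    -> %s\n",
          res_match_path (specs, "index.html") ? "allowed" : "forbidden");

  res_cleanup ();
  return 0;
}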


@ -29,3 +29,4 @@ struct robot_specs *res_get_specs PARAMS ((const char *, int));
int res_retrieve_file PARAMS ((const char *, char **));
void res_cleanup PARAMS ((void));


@ -184,6 +184,26 @@ rate (long bytes, long msecs, int pad)
return res;
}
static int
register_redirections_mapper (void *key, void *value, void *arg)
{
const char *redirected_from = (const char *)key;
const char *redirected_to = (const char *)arg;
if (0 != strcmp (redirected_from, redirected_to))
register_redirection (redirected_from, redirected_to);
return 0;
}
/* Register the redirections that lead to the successful download of
this URL. This is necessary so that the link converter can convert
redirected URLs to the local file. */
static void
register_all_redirections (struct hash_table *redirections, const char *final)
{
hash_table_map (redirections, register_redirections_mapper, (void *)final);
}
#define USE_PROXY_P(u) (opt.use_proxy && getproxy((u)->scheme) \
&& no_proxy_match((u)->host, \
(const char **)opt.no_proxy))
@ -254,7 +274,7 @@ retrieve_url (const char *origurl, char **file, char **newloc,
proxy_url = url_parse (proxy, &up_error_code);
if (!proxy_url)
{
logprintf (LOG_NOTQUIET, "Error parsing proxy URL %s: %s.\n",
logprintf (LOG_NOTQUIET, _("Error parsing proxy URL %s: %s.\n"),
proxy, url_error (up_error_code));
if (redirections)
string_set_free (redirections);
@ -310,7 +330,7 @@ retrieve_url (const char *origurl, char **file, char **newloc,
if (location_changed)
{
char *construced_newloc;
struct url *newloc_struct;
struct url *newloc_parsed;
assert (mynewloc != NULL);
@ -326,12 +346,11 @@ retrieve_url (const char *origurl, char **file, char **newloc,
mynewloc = construced_newloc;
/* Now, see if this new location makes sense. */
newloc_struct = url_parse (mynewloc, &up_error_code);
if (!newloc_struct)
newloc_parsed = url_parse (mynewloc, &up_error_code);
if (!newloc_parsed)
{
logprintf (LOG_NOTQUIET, "%s: %s.\n", mynewloc,
url_error (up_error_code));
url_free (newloc_struct);
url_free (u);
if (redirections)
string_set_free (redirections);
@ -340,11 +359,11 @@ retrieve_url (const char *origurl, char **file, char **newloc,
return result;
}
/* Now mynewloc will become newloc_struct->url, because if the
/* Now mynewloc will become newloc_parsed->url, because if the
Location contained relative paths like .././something, we
don't want that propagating as url. */
xfree (mynewloc);
mynewloc = xstrdup (newloc_struct->url);
mynewloc = xstrdup (newloc_parsed->url);
if (!redirections)
{
@ -356,11 +375,11 @@ retrieve_url (const char *origurl, char **file, char **newloc,
/* The new location is OK. Check for redirection cycle by
peeking through the history of redirections. */
if (string_set_contains (redirections, newloc_struct->url))
if (string_set_contains (redirections, newloc_parsed->url))
{
logprintf (LOG_NOTQUIET, _("%s: Redirection cycle detected.\n"),
mynewloc);
url_free (newloc_struct);
url_free (newloc_parsed);
url_free (u);
if (redirections)
string_set_free (redirections);
@ -368,12 +387,12 @@ retrieve_url (const char *origurl, char **file, char **newloc,
xfree (mynewloc);
return WRONGCODE;
}
string_set_add (redirections, newloc_struct->url);
string_set_add (redirections, newloc_parsed->url);
xfree (url);
url = mynewloc;
url_free (u);
u = newloc_struct;
u = newloc_parsed;
goto redirected;
}
@ -382,6 +401,8 @@ retrieve_url (const char *origurl, char **file, char **newloc,
if (*dt & RETROKF)
{
register_download (url, local_file);
if (redirections)
register_all_redirections (redirections, url);
if (*dt & TEXTHTML)
register_html (url, local_file);
}
@ -415,16 +436,16 @@ uerr_t
retrieve_from_file (const char *file, int html, int *count)
{
uerr_t status;
urlpos *url_list, *cur_url;
struct urlpos *url_list, *cur_url;
url_list = (html ? get_urls_html (file, NULL, FALSE, NULL)
: get_urls_file (file));
status = RETROK; /* Suppose everything is OK. */
*count = 0; /* Reset the URL count. */
recursive_reset ();
for (cur_url = url_list; cur_url; cur_url = cur_url->next, ++*count)
{
char *filename, *new_file;
char *filename = NULL, *new_file;
int dt;
if (downloaded_exceeds_quota ())
@ -432,10 +453,10 @@ retrieve_from_file (const char *file, int html, int *count)
status = QUOTEXC;
break;
}
status = retrieve_url (cur_url->url, &filename, &new_file, NULL, &dt);
if (opt.recursive && status == RETROK && (dt & TEXTHTML))
status = recursive_retrieve (filename, new_file ? new_file
: cur_url->url);
if (opt.recursive && cur_url->url->scheme != SCHEME_FTP)
status = retrieve_tree (cur_url->url->url);
else
status = retrieve_url (cur_url->url->url, &filename, &new_file, NULL, &dt);
if (filename && opt.delete_after && file_exists_p (filename))
{

src/url.c

@ -37,6 +37,7 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
#include "utils.h"
#include "url.h"
#include "host.h"
#include "hash.h"
#ifndef errno
extern int errno;
@ -182,7 +183,7 @@ encode_string_maybe (const char *s)
{
if (UNSAFE_CHAR (*p1))
{
const unsigned char c = *p1++;
unsigned char c = *p1++;
*p2++ = '%';
*p2++ = XDIGIT_TO_XCHAR (c >> 4);
*p2++ = XDIGIT_TO_XCHAR (c & 0xf);
@ -378,7 +379,7 @@ reencode_string (const char *s)
{
case CM_ENCODE:
{
char c = *p1++;
unsigned char c = *p1++;
*p2++ = '%';
*p2++ = XDIGIT_TO_XCHAR (c >> 4);
*p2++ = XDIGIT_TO_XCHAR (c & 0xf);
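This hunk is the reencode_string fix described in the ChangeLog: with plain char, which is signed on most platforms, a byte such as 0xA0 (nbsp in Latin-1) sign-extends, and the shift/mask no longer produce the digits 10 and 0. A tiny standalone demonstration follows (TO_XCHAR below is a local stand-in, not wget's XDIGIT_TO_XCHAR).

#include <stdio.h>

/* Local stand-in for a digit-to-hex-character conversion. */
#define TO_XCHAR(d) ((d) < 10 ? (d) + '0' : (d) - 10 + 'A')

int
main (void)
{
  char          sc = (char) 0xA0;           /* nbsp in Latin-1 */
  unsigned char uc = (unsigned char) 0xA0;

  /* Where plain char is signed, sc is -96, so sc >> 4 is not 10. */
  printf ("plain char:    %%%c%c\n", TO_XCHAR (sc >> 4), TO_XCHAR (sc & 0xf));
  printf ("unsigned char: %%%c%c\n", TO_XCHAR (uc >> 4), TO_XCHAR (uc & 0xf));
  return 0;
}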
@ -586,6 +587,22 @@ strpbrk_or_eos (const char *s, const char *accept)
return p;
}
/* Turn STR into lowercase; return non-zero if a character was
actually changed. */
static int
lowercase_str (char *str)
{
int change = 0;
for (; *str; str++)
if (!ISLOWER (*str))
{
change = 1;
*str = TOLOWER (*str);
}
return change;
}
static char *parse_errors[] = {
#define PE_NO_ERROR 0
"No error",
@ -614,6 +631,7 @@ url_parse (const char *url, int *error)
{
struct url *u;
const char *p;
int path_modified, host_modified;
enum url_scheme scheme;
@ -627,9 +645,7 @@ url_parse (const char *url, int *error)
int port;
char *user = NULL, *passwd = NULL;
const char *url_orig = url;
p = url = reencode_string (url);
char *url_encoded;
scheme = url_scheme (url);
if (scheme == SCHEME_INVALID)
@ -638,6 +654,9 @@ url_parse (const char *url, int *error)
return NULL;
}
url_encoded = reencode_string (url);
p = url_encoded;
p += strlen (supported_schemes[scheme].leading_string);
uname_b = p;
p += url_skip_uname (p);
@ -749,11 +768,6 @@ url_parse (const char *url, int *error)
u = (struct url *)xmalloc (sizeof (struct url));
memset (u, 0, sizeof (*u));
if (url == url_orig)
u->url = xstrdup (url);
else
u->url = (char *)url;
u->scheme = scheme;
u->host = strdupdelim (host_b, host_e);
u->port = port;
@ -761,7 +775,10 @@ url_parse (const char *url, int *error)
u->passwd = passwd;
u->path = strdupdelim (path_b, path_e);
path_simplify (u->path);
path_modified = path_simplify (u->path);
parse_path (u->path, &u->dir, &u->file);
host_modified = lowercase_str (u->host);
if (params_b)
u->params = strdupdelim (params_b, params_e);
@ -770,7 +787,26 @@ url_parse (const char *url, int *error)
if (fragment_b)
u->fragment = strdupdelim (fragment_b, fragment_e);
parse_path (u->path, &u->dir, &u->file);
if (path_modified || u->fragment || host_modified)
{
/* If path_simplify modified the path, or if a fragment is
present, or if the original host name had caps in it, make
sure that u->url is equivalent to what would be printed by
url_string. */
u->url = url_string (u, 0);
if (url_encoded != url)
xfree ((char *) url_encoded);
}
else
{
if (url_encoded == url)
u->url = xstrdup (url);
else
u->url = url_encoded;
}
url_encoded = NULL;
return u;
}
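A hypothetical standalone driver (URL made up) showing the effect of the canonicalization above, using only the url_parse/url_free interface this commit already exercises elsewhere: because the host contains uppercase letters, the path needs simplifying and a fragment is present, u->url is re-printed via url_string and comes out in canonical form.

#include <stdio.h>
#include "wget.h"
#include "url.h"

int
main (void)
{
  struct url *u = url_parse ("http://WWW.Example.COM/a/../b?q=1#frag", NULL);
  if (!u)
    return 1;
  /* Expected (roughly): http://www.example.com/b?q=1 -- lowercased host,
     simplified path, fragment dropped from the printed form. */
  printf ("%s\n", u->url);
  url_free (u);
  return 0;
}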
@ -927,17 +963,18 @@ url_free (struct url *url)
FREE_MAYBE (url->fragment);
FREE_MAYBE (url->user);
FREE_MAYBE (url->passwd);
FREE_MAYBE (url->dir);
FREE_MAYBE (url->file);
xfree (url->dir);
xfree (url->file);
xfree (url);
}
urlpos *
struct urlpos *
get_urls_file (const char *file)
{
struct file_memory *fm;
urlpos *head, *tail;
struct urlpos *head, *tail;
const char *text, *text_end;
/* Load the file. */
@ -968,10 +1005,28 @@ get_urls_file (const char *file)
--line_end;
if (line_end > line_beg)
{
urlpos *entry = (urlpos *)xmalloc (sizeof (urlpos));
int up_error_code;
char *url_text;
struct urlpos *entry;
struct url *url;
/* We must copy the URL to a zero-terminated string. *sigh*. */
url_text = strdupdelim (line_beg, line_end);
url = url_parse (url_text, &up_error_code);
if (!url)
{
logprintf (LOG_NOTQUIET, "%s: Invalid URL %s: %s\n",
file, url_text, url_error (up_error_code));
xfree (url_text);
continue;
}
xfree (url_text);
entry = (struct urlpos *)xmalloc (sizeof (struct urlpos));
memset (entry, 0, sizeof (*entry));
entry->next = NULL;
entry->url = strdupdelim (line_beg, line_end);
entry->url = url;
if (!head)
head = entry;
else
@ -985,12 +1040,13 @@ get_urls_file (const char *file)
/* Free the linked list of urlpos. */
void
free_urlpos (urlpos *l)
free_urlpos (struct urlpos *l)
{
while (l)
{
urlpos *next = l->next;
xfree (l->url);
struct urlpos *next = l->next;
if (l->url)
url_free (l->url);
FREE_MAYBE (l->local_name);
xfree (l);
l = next;
@ -1088,7 +1144,9 @@ count_slashes (const char *s)
static char *
mkstruct (const struct url *u)
{
char *host, *dir, *file, *res, *dirpref;
char *dir, *dir_preencoding;
char *file, *res, *dirpref;
char *query = u->query && *u->query ? u->query : NULL;
int l;
if (opt.cut_dirs)
@ -1104,36 +1162,35 @@ mkstruct (const struct url *u)
else
dir = u->dir + (*u->dir == '/');
host = xstrdup (u->host);
/* Check for the true name (or at least a consistent name for saving
to directory) of HOST, reusing the hlist if possible. */
if (opt.add_hostdir && !opt.simple_check)
{
char *nhost = realhost (host);
xfree (host);
host = nhost;
}
/* Add dir_prefix and hostname (if required) to the beginning of
dir. */
if (opt.add_hostdir)
{
/* Add dir_prefix and hostname (if required) to the beginning of
dir. */
dirpref = (char *)alloca (strlen (opt.dir_prefix) + 1
+ strlen (u->host)
+ 1 + numdigit (u->port)
+ 1);
if (!DOTP (opt.dir_prefix))
{
dirpref = (char *)alloca (strlen (opt.dir_prefix) + 1
+ strlen (host) + 1);
sprintf (dirpref, "%s/%s", opt.dir_prefix, host);
}
sprintf (dirpref, "%s/%s", opt.dir_prefix, u->host);
else
STRDUP_ALLOCA (dirpref, host);
strcpy (dirpref, u->host);
if (u->port != scheme_default_port (u->scheme))
{
int len = strlen (dirpref);
dirpref[len] = ':';
long_to_string (dirpref + len + 1, u->port);
}
}
else /* not add_hostdir */
else /* not add_hostdir */
{
if (!DOTP (opt.dir_prefix))
dirpref = opt.dir_prefix;
else
dirpref = "";
}
xfree (host);
/* If there is a prefix, prepend it. */
if (*dirpref)
@ -1142,7 +1199,10 @@ mkstruct (const struct url *u)
sprintf (newdir, "%s%s%s", dirpref, *dir == '/' ? "" : "/", dir);
dir = newdir;
}
dir = encode_string (dir);
dir_preencoding = dir;
dir = reencode_string (dir_preencoding);
l = strlen (dir);
if (l && dir[l - 1] == '/')
dir[l - 1] = '\0';
@ -1153,9 +1213,17 @@ mkstruct (const struct url *u)
file = u->file;
/* Finally, construct the full name. */
res = (char *)xmalloc (strlen (dir) + 1 + strlen (file) + 1);
res = (char *)xmalloc (strlen (dir) + 1 + strlen (file)
+ (query ? (1 + strlen (query)) : 0)
+ 1);
sprintf (res, "%s%s%s", dir, *dir ? "/" : "", file);
xfree (dir);
if (query)
{
strcat (res, "?");
strcat (res, query);
}
if (dir != dir_preencoding)
xfree (dir);
return res;
}
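
Editor's note: the reworked mkstruct() above now gives the host directory a ":port" suffix when the port is non-standard and keeps a non-empty query string as part of the local file name. A standalone sketch of that naming scheme (a hypothetical helper, not the function from the patch):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Compose "prefix/host[:port]/dir/file[?query]" in the spirit of the
   new mkstruct(): the port appears only when it differs from the
   scheme's default, the query only when it is non-empty. */
static char *
local_name (const char *prefix, const char *host, int port, int default_port,
            const char *dir, const char *file, const char *query)
{
  char portbuf[16] = "";
  char *res = (char *) malloc (strlen (prefix) + strlen (host) + sizeof portbuf
                               + strlen (dir) + strlen (file)
                               + (query && *query ? strlen (query) + 1 : 0) + 4);

  if (port != default_port)
    sprintf (portbuf, ":%d", port);

  sprintf (res, "%s/%s%s/%s/%s", prefix, host, portbuf, dir, file);
  if (query && *query)
    {
      strcat (res, "?");
      strcat (res, query);
    }
  return res;
}

int
main (void)
{
  char *name = local_name (".", "www.gnu.org", 8080, 80,
                           "software/wget", "index.html", "lang=en");
  puts (name);   /* ./www.gnu.org:8080/software/wget/index.html?lang=en */
  free (name);
  return 0;
}
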
@ -1177,7 +1245,7 @@ compose_file_name (char *base, char *query)
{
if (UNSAFE_CHAR (*from))
{
const unsigned char c = *from++;
unsigned char c = *from++;
*to++ = '%';
*to++ = XDIGIT_TO_XCHAR (c >> 4);
*to++ = XDIGIT_TO_XCHAR (c & 0xf);
@ -1282,10 +1350,8 @@ url_filename (const struct url *u)
static int
urlpath_length (const char *url)
{
const char *q = strchr (url, '?');
if (q)
return q - url;
return strlen (url);
const char *q = strpbrk_or_eos (url, "?;#");
return q - url;
}
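
Editor's note: urlpath_length() used to stop only at '?'; it now treats ';' (params) and '#' (fragment) as ending the path as well, via strpbrk_or_eos. A standalone sketch of both helpers (strpbrk_or_eos is re-implemented here for illustration; the real one lives earlier in url.c):

#include <stdio.h>
#include <string.h>

/* Like strpbrk, but return a pointer to the terminating '\0' instead of
   NULL when none of the characters in ACCEPT occurs in S. */
static const char *
strpbrk_or_eos (const char *s, const char *accept)
{
  const char *p = strpbrk (s, accept);
  if (!p)
    p = s + strlen (s);
  return p;
}

/* Length of the path component: everything before '?', ';' or '#'. */
static int
urlpath_length (const char *url)
{
  const char *q = strpbrk_or_eos (url, "?;#");
  return q - url;
}

int
main (void)
{
  printf ("%d\n", urlpath_length ("dir/page.html?a=1"));   /* 13 */
  printf ("%d\n", urlpath_length ("dir/page.html#frag"));  /* 13 */
  printf ("%d\n", urlpath_length ("dir/page.html"));       /* 13 */
  return 0;
}
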
/* Find the last occurrence of character C in the range [b, e), or
@ -1323,63 +1389,42 @@ uri_merge_1 (const char *base, const char *link, int linklength, int no_scheme)
{
const char *end = base + urlpath_length (base);
if (*link != '/')
if (!*link)
{
/* LINK is a relative URL: we need to replace everything
after last slash (possibly empty) with LINK.
So, if BASE is "whatever/foo/bar", and LINK is "qux/xyzzy",
our result should be "whatever/foo/qux/xyzzy". */
int need_explicit_slash = 0;
int span;
const char *start_insert;
const char *last_slash = find_last_char (base, end, '/');
if (!last_slash)
{
/* No slash found at all. Append LINK to what we have,
but we'll need a slash as a separator.
Example: if base == "foo" and link == "qux/xyzzy", then
we cannot just append link to base, because we'd get
"fooqux/xyzzy", whereas what we want is
"foo/qux/xyzzy".
To make sure the / gets inserted, we set
need_explicit_slash to 1. We also set start_insert
to end + 1, so that the length calculations work out
correctly for one more (slash) character. Accessing
that character is fine, since it will be the
delimiter, '\0' or '?'. */
/* example: "foo?..." */
/* ^ ('?' gets changed to '/') */
start_insert = end + 1;
need_explicit_slash = 1;
}
else if (last_slash && last_slash != base && *(last_slash - 1) == '/')
{
/* example: http://host" */
/* ^ */
start_insert = end + 1;
need_explicit_slash = 1;
}
else
{
/* example: "whatever/foo/bar" */
/* ^ */
start_insert = last_slash + 1;
}
span = start_insert - base;
constr = (char *)xmalloc (span + linklength + 1);
if (span)
memcpy (constr, base, span);
if (need_explicit_slash)
constr[span - 1] = '/';
if (linklength)
memcpy (constr + span, link, linklength);
constr[span + linklength] = '\0';
/* Empty LINK points back to BASE, query string and all. */
constr = xstrdup (base);
}
else /* *link == `/' */
else if (*link == '?')
{
/* LINK points to the same location, but changes the query
string. Examples: */
/* uri_merge("path", "?new") -> "path?new" */
/* uri_merge("path?foo", "?new") -> "path?new" */
/* uri_merge("path?foo#bar", "?new") -> "path?new" */
/* uri_merge("path#foo", "?new") -> "path?new" */
int baselength = end - base;
constr = xmalloc (baselength + linklength + 1);
memcpy (constr, base, baselength);
memcpy (constr + baselength, link, linklength);
constr[baselength + linklength] = '\0';
}
else if (*link == '#')
{
/* uri_merge("path", "#new") -> "path#new" */
/* uri_merge("path#foo", "#new") -> "path#new" */
/* uri_merge("path?foo", "#new") -> "path?foo#new" */
/* uri_merge("path?foo#bar", "#new") -> "path?foo#new" */
int baselength;
const char *end1 = strchr (base, '#');
if (!end1)
end1 = base + strlen (base);
baselength = end1 - base;
constr = xmalloc (baselength + linklength + 1);
memcpy (constr, base, baselength);
memcpy (constr + baselength, link, linklength);
constr[baselength + linklength] = '\0';
}
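
Editor's note: the comment examples in the two new branches above can be reproduced with a small standalone model of just those cases; this is a sketch of the query/fragment handling only, not the full uri_merge_1:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A link that is only a query ("?new") replaces everything from '?' on;
   a link that is only a fragment ("#new") replaces everything from '#'
   on, keeping any existing query. */
static char *
merge_query_or_fragment (const char *base, const char *link)
{
  int baselength;
  char *constr;

  if (*link == '?')
    baselength = strcspn (base, "?;#");        /* path ends at '?', ';' or '#' */
  else /* *link == '#' */
    {
      const char *end1 = strchr (base, '#');
      baselength = end1 ? (int) (end1 - base) : (int) strlen (base);
    }

  constr = (char *) malloc (baselength + strlen (link) + 1);
  memcpy (constr, base, baselength);
  strcpy (constr + baselength, link);
  return constr;
}

int
main (void)
{
  char *a = merge_query_or_fragment ("path?foo#bar", "?new");
  char *b = merge_query_or_fragment ("path?foo#bar", "#new");
  printf ("%s\n%s\n", a, b);   /* path?new  and  path?foo#new */
  free (a);
  free (b);
  return 0;
}
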
else if (*link == '/')
{
/* LINK is an absolute path: we need to replace everything
after (and including) the FIRST slash with LINK.
@ -1435,6 +1480,62 @@ uri_merge_1 (const char *base, const char *link, int linklength, int no_scheme)
memcpy (constr + span, link, linklength);
constr[span + linklength] = '\0';
}
else
{
/* LINK is a relative URL: we need to replace everything
after last slash (possibly empty) with LINK.
So, if BASE is "whatever/foo/bar", and LINK is "qux/xyzzy",
our result should be "whatever/foo/qux/xyzzy". */
int need_explicit_slash = 0;
int span;
const char *start_insert;
const char *last_slash = find_last_char (base, end, '/');
if (!last_slash)
{
/* No slash found at all. Append LINK to what we have,
but we'll need a slash as a separator.
Example: if base == "foo" and link == "qux/xyzzy", then
we cannot just append link to base, because we'd get
"fooqux/xyzzy", whereas what we want is
"foo/qux/xyzzy".
To make sure the / gets inserted, we set
need_explicit_slash to 1. We also set start_insert
to end + 1, so that the length calculations work out
correctly for one more (slash) character. Accessing
that character is fine, since it will be the
delimiter, '\0' or '?'. */
/* example: "foo?..." */
/* ^ ('?' gets changed to '/') */
start_insert = end + 1;
need_explicit_slash = 1;
}
else if (last_slash && last_slash != base && *(last_slash - 1) == '/')
{
/* example: http://host" */
/* ^ */
start_insert = end + 1;
need_explicit_slash = 1;
}
else
{
/* example: "whatever/foo/bar" */
/* ^ */
start_insert = last_slash + 1;
}
span = start_insert - base;
constr = (char *)xmalloc (span + linklength + 1);
if (span)
memcpy (constr, base, span);
if (need_explicit_slash)
constr[span - 1] = '/';
if (linklength)
memcpy (constr + span, link, linklength);
constr[span + linklength] = '\0';
}
}
else /* !no_scheme */
{
@ -1602,12 +1703,13 @@ static void replace_attr PARAMS ((const char **, int, FILE *, const char *));
/* Change the links in an HTML document. Accepts a structure that
defines the positions of all the links. */
void
convert_links (const char *file, urlpos *l)
convert_links (const char *file, struct urlpos *l)
{
struct file_memory *fm;
FILE *fp;
const char *p;
downloaded_file_t downloaded_file_return;
int to_url_count = 0, to_file_count = 0;
logprintf (LOG_VERBOSE, _("Converting %s... "), file);
@ -1615,12 +1717,12 @@ convert_links (const char *file, urlpos *l)
/* First we do a "dry run": go through the list L and see whether
any URL needs to be converted in the first place. If not, just
leave the file alone. */
int count = 0;
urlpos *dry = l;
int dry_count = 0;
struct urlpos *dry = l;
for (dry = l; dry; dry = dry->next)
if (dry->convert != CO_NOCONVERT)
++count;
if (!count)
++dry_count;
if (!dry_count)
{
logputs (LOG_VERBOSE, _("nothing to do.\n"));
return;
@ -1674,7 +1776,7 @@ convert_links (const char *file, urlpos *l)
/* If the URL is not to be converted, skip it. */
if (l->convert == CO_NOCONVERT)
{
DEBUGP (("Skipping %s at position %d.\n", l->url, l->pos));
DEBUGP (("Skipping %s at position %d.\n", l->url->url, l->pos));
continue;
}
@ -1689,19 +1791,21 @@ convert_links (const char *file, urlpos *l)
char *quoted_newname = html_quote_string (newname);
replace_attr (&p, l->size, fp, quoted_newname);
DEBUGP (("TO_RELATIVE: %s to %s at position %d in %s.\n",
l->url, newname, l->pos, file));
l->url->url, newname, l->pos, file));
xfree (newname);
xfree (quoted_newname);
++to_file_count;
}
else if (l->convert == CO_CONVERT_TO_COMPLETE)
{
/* Convert the link to absolute URL. */
char *newlink = l->url;
char *newlink = l->url->url;
char *quoted_newlink = html_quote_string (newlink);
replace_attr (&p, l->size, fp, quoted_newlink);
DEBUGP (("TO_COMPLETE: <something> to %s at position %d in %s.\n",
newlink, l->pos, file));
xfree (quoted_newlink);
++to_url_count;
}
}
/* Output the rest of the file. */
@ -1709,7 +1813,8 @@ convert_links (const char *file, urlpos *l)
fwrite (p, 1, fm->length - (p - fm->content), fp);
fclose (fp);
read_file_free (fm);
logputs (LOG_VERBOSE, _("done.\n"));
logprintf (LOG_VERBOSE,
_("%d-%d\n"), to_file_count, to_url_count);
}
/* Construct and return a malloced copy of the relative link from two
@ -1766,20 +1871,6 @@ construct_relative (const char *s1, const char *s2)
return res;
}
/* Add URL to the head of the list L. */
urlpos *
add_url (urlpos *l, const char *url, const char *file)
{
urlpos *t;
t = (urlpos *)xmalloc (sizeof (urlpos));
memset (t, 0, sizeof (*t));
t->url = xstrdup (url);
t->local_name = xstrdup (file);
t->next = l;
return t;
}
static void
write_backup_file (const char *file, downloaded_file_t downloaded_file_return)
{
@ -1850,15 +1941,9 @@ write_backup_file (const char *file, downloaded_file_t downloaded_file_return)
-- Dan Harkless <wget@harkless.org>
This [adding a field to the urlpos structure] didn't work
because convert_file() is called twice: once after all its
sublinks have been retrieved in recursive_retrieve(), and
once at the end of the day in convert_all_links(). The
original linked list collected in recursive_retrieve() is
lost after the first invocation of convert_links(), and
convert_all_links() makes a new one (it calls get_urls_html()
for each file it covers.) That's why your first approach didn't
work. The way to make it work is perhaps to make this flag a
field in the `urls_html' list.
because convert_file() is called from convert_all_links at
the end of the retrieval with a freshly built new urlpos
list.
-- Hrvoje Niksic <hniksic@arsdigita.com>
*/
converted_file_ptr = xmalloc(sizeof(*converted_file_ptr));
@ -1941,13 +2026,40 @@ find_fragment (const char *beg, int size, const char **bp, const char **ep)
return 0;
}
typedef struct _downloaded_file_list {
char* file;
downloaded_file_t download_type;
struct _downloaded_file_list* next;
} downloaded_file_list;
/* We're storing "modes" of type downloaded_file_t in the hash table.
However, our hash tables only accept pointers for keys and values.
So when we need a pointer, we use the address of a
downloaded_file_t variable of static storage. */
static downloaded_file_t *
downloaded_mode_to_ptr (downloaded_file_t mode)
{
static downloaded_file_t
v1 = FILE_NOT_ALREADY_DOWNLOADED,
v2 = FILE_DOWNLOADED_NORMALLY,
v3 = FILE_DOWNLOADED_AND_HTML_EXTENSION_ADDED,
v4 = CHECK_FOR_FILE;
static downloaded_file_list *downloaded_files;
switch (mode)
{
case FILE_NOT_ALREADY_DOWNLOADED:
return &v1;
case FILE_DOWNLOADED_NORMALLY:
return &v2;
case FILE_DOWNLOADED_AND_HTML_EXTENSION_ADDED:
return &v3;
case CHECK_FOR_FILE:
return &v4;
}
return NULL;
}
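
Editor's note: downloaded_mode_to_ptr() above exists because the hash table stores only void pointers; each enum value is therefore given a permanent address in static storage, and dereferencing the stored pointer recovers the enum. A minimal standalone illustration of the trick (demo names, not the patch's):

#include <stdio.h>

typedef enum { NOT_DOWNLOADED, DOWNLOADED_NORMALLY, HTML_EXTENSION_ADDED } dl_mode;

/* Hand out the address of a static variable that permanently holds each
   possible value; that pointer is what goes into the table. */
static dl_mode *
mode_to_ptr (dl_mode mode)
{
  static dl_mode v1 = NOT_DOWNLOADED,
                 v2 = DOWNLOADED_NORMALLY,
                 v3 = HTML_EXTENSION_ADDED;
  switch (mode)
    {
    case NOT_DOWNLOADED:       return &v1;
    case DOWNLOADED_NORMALLY:  return &v2;
    case HTML_EXTENSION_ADDED: return &v3;
    }
  return NULL;
}

int
main (void)
{
  void *stored = mode_to_ptr (DOWNLOADED_NORMALLY);  /* value as stored */
  dl_mode got = *(dl_mode *) stored;                 /* value as read back */
  printf ("%d\n", got == DOWNLOADED_NORMALLY);       /* 1 */
  return 0;
}
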
/* This should really be merged with dl_file_url_map and
downloaded_html_files in recur.c. This was originally a list, but
I changed it to a hash table because it was actually taking a lot of
time to find things in it. */
static struct hash_table *downloaded_files_hash;
/* Remembers which files have been downloaded. In the standard case, should be
called with mode == FILE_DOWNLOADED_NORMALLY for each file we actually
@ -1962,46 +2074,47 @@ static downloaded_file_list *downloaded_files;
it, call with mode == CHECK_FOR_FILE. Please be sure to call this function
with local filenames, not remote URLs. */
downloaded_file_t
downloaded_file (downloaded_file_t mode, const char* file)
downloaded_file (downloaded_file_t mode, const char *file)
{
boolean found_file = FALSE;
downloaded_file_list* rover = downloaded_files;
downloaded_file_t *ptr;
while (rover != NULL)
if (strcmp(rover->file, file) == 0)
{
found_file = TRUE;
break;
}
else
rover = rover->next;
if (found_file)
return rover->download_type; /* file had already been downloaded */
else
if (mode == CHECK_FOR_FILE)
{
if (mode != CHECK_FOR_FILE)
{
rover = xmalloc(sizeof(*rover));
rover->file = xstrdup(file); /* use xstrdup() so die on out-of-mem. */
rover->download_type = mode;
rover->next = downloaded_files;
downloaded_files = rover;
}
return FILE_NOT_ALREADY_DOWNLOADED;
if (!downloaded_files_hash)
return FILE_NOT_ALREADY_DOWNLOADED;
ptr = hash_table_get (downloaded_files_hash, file);
if (!ptr)
return FILE_NOT_ALREADY_DOWNLOADED;
return *ptr;
}
if (!downloaded_files_hash)
downloaded_files_hash = make_string_hash_table (0);
ptr = hash_table_get (downloaded_files_hash, file);
if (ptr)
return *ptr;
ptr = downloaded_mode_to_ptr (mode);
hash_table_put (downloaded_files_hash, xstrdup (file), &ptr);
return FILE_NOT_ALREADY_DOWNLOADED;
}
static int
df_free_mapper (void *key, void *value, void *ignored)
{
xfree (key);
return 0;
}
void
downloaded_files_free (void)
{
downloaded_file_list* rover = downloaded_files;
while (rover)
if (downloaded_files_hash)
{
downloaded_file_list *next = rover->next;
xfree (rover->file);
xfree (rover);
rover = next;
hash_table_map (downloaded_files_hash, df_free_mapper, NULL);
hash_table_destroy (downloaded_files_hash);
downloaded_files_hash = NULL;
}
}
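
Editor's note: the rewritten downloaded_file() follows a two-mode protocol: called with CHECK_FOR_FILE it only queries and returns the recorded mode (or FILE_NOT_ALREADY_DOWNLOADED); called with any other mode it records that mode unless one already exists, in which case the earlier record wins. A standalone miniature of the same protocol (a fixed-size array stands in for Wget's string hash table; all names here are illustrative):

#include <stdio.h>
#include <string.h>

typedef enum { NOT_ALREADY_DOWNLOADED, DOWNLOADED_NORMALLY,
               HTML_EXT_ADDED, CHECK_ONLY } dlmode;

static struct { char name[128]; dlmode mode; } table[64];
static int table_len;

static dlmode *
lookup (const char *file)
{
  int i;
  for (i = 0; i < table_len; i++)
    if (!strcmp (table[i].name, file))
      return &table[i].mode;
  return NULL;
}

/* Query with CHECK_ONLY, record with anything else; a file already
   recorded keeps its first mode. */
static dlmode
downloaded_file_demo (dlmode mode, const char *file)
{
  dlmode *ptr = lookup (file);

  if (mode == CHECK_ONLY)
    return ptr ? *ptr : NOT_ALREADY_DOWNLOADED;

  if (ptr)
    return *ptr;                        /* already recorded: first mode wins */

  strcpy (table[table_len].name, file);
  table[table_len].mode = mode;
  table_len++;
  return NOT_ALREADY_DOWNLOADED;
}

int
main (void)
{
  downloaded_file_demo (DOWNLOADED_NORMALLY, "index.html");
  printf ("%d\n", downloaded_file_demo (CHECK_ONLY, "index.html"));  /* 1 */
  printf ("%d\n", downloaded_file_demo (CHECK_ONLY, "other.html"));  /* 0 */
  return 0;
}
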

src/url.h View File

@ -72,11 +72,11 @@ enum convert_options {
/* A structure that defines the whereabouts of a URL, i.e. its
position in an HTML document, etc. */
typedef struct _urlpos
{
char *url; /* linked URL, after it has been
merged with the base */
char *local_name; /* Local file to which it was saved */
struct urlpos {
struct url *url; /* the URL of the link, after it has
been merged with the base */
char *local_name; /* local file to which it was saved
(used by convert_links) */
/* Information about the original link: */
int link_relative_p; /* was the link relative? */
@ -89,8 +89,8 @@ typedef struct _urlpos
/* URL's position in the buffer. */
int pos, size;
struct _urlpos *next; /* Next struct in list */
} urlpos;
struct urlpos *next; /* next list element */
};
/* downloaded_file() takes a parameter of this type and returns this type. */
typedef enum
@ -126,9 +126,9 @@ int url_skip_uname PARAMS ((const char *));
char *url_string PARAMS ((const struct url *, int));
urlpos *get_urls_file PARAMS ((const char *));
urlpos *get_urls_html PARAMS ((const char *, const char *, int, int *));
void free_urlpos PARAMS ((urlpos *));
struct urlpos *get_urls_file PARAMS ((const char *));
struct urlpos *get_urls_html PARAMS ((const char *, const char *, int, int *));
void free_urlpos PARAMS ((struct urlpos *));
char *uri_merge PARAMS ((const char *, const char *));
@ -136,11 +136,10 @@ void rotate_backups PARAMS ((const char *));
int mkalldirs PARAMS ((const char *));
char *url_filename PARAMS ((const struct url *));
char *getproxy PARAMS ((uerr_t));
char *getproxy PARAMS ((enum url_scheme));
int no_proxy_match PARAMS ((const char *, const char **));
void convert_links PARAMS ((const char *, urlpos *));
urlpos *add_url PARAMS ((urlpos *, const char *, const char *));
void convert_links PARAMS ((const char *, struct urlpos *));
downloaded_file_t downloaded_file PARAMS ((downloaded_file_t, const char *));

src/utils.c View File

@ -307,6 +307,18 @@ xstrdup_debug (const char *s, const char *source_file, int source_line)
#endif /* DEBUG_MALLOC */
/* Utility function: like xstrdup(), but also lowercases S. */
char *
xstrdup_lower (const char *s)
{
char *copy = xstrdup (s);
char *p = copy;
for (; *p; p++)
*p = TOLOWER (*p);
return copy;
}
/* Copy the string formed by two pointers (one on the beginning, other
on the char after the last char) to a new, malloc-ed location.
0-terminate it. */
@ -443,6 +455,8 @@ fork_to_background (void)
}
#endif /* not WINDOWS */
#if 0
/* debug */
char *
ps (char *orig)
{
@ -450,6 +464,7 @@ ps (char *orig)
path_simplify (r);
return r;
}
#endif
/* Canonicalize PATH, and return a new path. The new path differs from PATH
in that:
@ -468,45 +483,31 @@ ps (char *orig)
Change the original string instead of strdup-ing.
React correctly when beginning with `./' and `../'.
Don't zip out trailing slashes. */
void
int
path_simplify (char *path)
{
register int i, start, ddot;
register int i, start;
int changes = 0;
char stub_char;
if (!*path)
return;
return 0;
/*stub_char = (*path == '/') ? '/' : '.';*/
stub_char = '/';
/* Addition: Remove all `./'-s preceding the string. If `../'-s
precede, put `/' in front and remove them too. */
i = 0;
ddot = 0;
while (1)
{
if (path[i] == '.' && path[i + 1] == '/')
i += 2;
else if (path[i] == '.' && path[i + 1] == '.' && path[i + 2] == '/')
{
i += 3;
ddot = 1;
}
else
break;
}
if (i)
strcpy (path, path + i - ddot);
if (path[0] == '/')
/* Preserve initial '/'. */
++path;
/* Replace single `.' or `..' with `/'. */
/* Nix out a leading `.' or `..'. */
if ((path[0] == '.' && path[1] == '\0')
|| (path[0] == '.' && path[1] == '.' && path[2] == '\0'))
{
path[0] = stub_char;
path[1] = '\0';
return;
path[0] = '\0';
changes = 1;
return changes;
}
/* Walk along PATH looking for things to compact. */
i = 0;
while (1)
@ -531,6 +532,7 @@ path_simplify (char *path)
{
strcpy (path + start + 1, path + i);
i = start + 1;
changes = 1;
}
/* Check for `../', `./' or trailing `.' by itself. */
@ -540,6 +542,7 @@ path_simplify (char *path)
if (!path[i + 1])
{
path[--i] = '\0';
changes = 1;
break;
}
@ -548,6 +551,7 @@ path_simplify (char *path)
{
strcpy (path + i, path + i + 1);
i = (start < 0) ? 0 : start;
changes = 1;
continue;
}
@ -556,12 +560,32 @@ path_simplify (char *path)
(path[i + 2] == '/' || !path[i + 2]))
{
while (--start > -1 && path[start] != '/');
strcpy (path + start + 1, path + i + 2);
strcpy (path + start + 1, path + i + 2 + (start == -1 && path[i + 2]));
i = (start < 0) ? 0 : start;
changes = 1;
continue;
}
} /* path == '.' */
} /* while */
/* Addition: Remove all `./'-s and `../'-s preceding the string. */
i = 0;
while (1)
{
if (path[i] == '.' && path[i + 1] == '/')
i += 2;
else if (path[i] == '.' && path[i + 1] == '.' && path[i + 2] == '/')
i += 3;
else
break;
}
if (i)
{
strcpy (path, path + i - 0);
changes = 1;
}
return changes;
}
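
Editor's note: path_simplify() now returns whether it changed anything (which url_parse uses to decide whether to rebuild u->url) and removes leading "./" and "../" components at the end rather than up front. A conceptual standalone sketch of that contract, built on a segment stack rather than the patch's in-place scanning loop, so corner cases such as leading or trailing slashes may differ:

#include <stdio.h>
#include <string.h>

/* Collapse "./" and "dir/../" segments and report, via the return
   value, whether the path changed at all.  Demo only: fixed buffers,
   relative paths. */
static int
simplify_sketch (char *path)
{
  char work[1024], out[1024];
  char *segs[64];
  int nsegs = 0, i;
  char *p;

  strcpy (work, path);
  for (p = strtok (work, "/"); p; p = strtok (NULL, "/"))
    {
      if (!strcmp (p, "."))
        continue;                       /* "./" contributes nothing */
      if (!strcmp (p, ".."))
        {
          if (nsegs > 0)
            nsegs--;                    /* "dir/../" cancels out */
          continue;                     /* a leading "../" is dropped, as above */
        }
      segs[nsegs++] = p;
    }

  out[0] = '\0';
  for (i = 0; i < nsegs; i++)
    {
      if (i)
        strcat (out, "/");
      strcat (out, segs[i]);
    }

  if (!strcmp (out, path))
    return 0;                           /* nothing to do: report no change */
  strcpy (path, out);
  return 1;
}

int
main (void)
{
  char a[] = "a/./b/../c";
  char b[] = "already/simple";
  int ca = simplify_sketch (a);
  int cb = simplify_sketch (b);
  printf ("%s (changed: %d)\n", a, ca);   /* a/c (changed: 1) */
  printf ("%s (changed: %d)\n", b, cb);   /* already/simple (changed: 0) */
  return 0;
}
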
/* "Touch" FILE, i.e. make its atime and mtime equal to the time

src/utils.h View File

@ -48,12 +48,13 @@ char *datetime_str PARAMS ((time_t *));
void print_malloc_debug_stats ();
#endif
char *xstrdup_lower PARAMS ((const char *));
char *strdupdelim PARAMS ((const char *, const char *));
char **sepstring PARAMS ((const char *));
int frontcmp PARAMS ((const char *, const char *));
char *pwd_cuserid PARAMS ((char *));
void fork_to_background PARAMS ((void));
void path_simplify PARAMS ((char *));
int path_simplify PARAMS ((char *));
void touch PARAMS ((const char *, time_t));
int remove_link PARAMS ((const char *));
@ -98,4 +99,6 @@ long wtimer_granularity PARAMS ((void));
char *html_quote_string PARAMS ((const char *));
int determine_screen_width PARAMS ((void));
#endif /* UTILS_H */

src/wget.h View File

@ -28,6 +28,11 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
# define NDEBUG /* To kill off assertions */
#endif /* not DEBUG */
/* Define this if you want primitive but extensive malloc debugging.
It will make Wget extremely slow, so only do it in development
builds. */
#undef DEBUG_MALLOC
#ifndef PARAMS
# if PROTOTYPES
# define PARAMS(args) args
@ -60,7 +65,7 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */
3) Finally, the debug messages are meant to be a clue for me to
debug problems with Wget. If I get them in a language I don't
understand, debugging will become a new challenge of its own! :-) */
understand, debugging will become a new challenge of its own! */
/* Include these, so random files need not include them. */