mirror of https://github.com/moparisthebest/wget synced 2024-07-03 16:38:41 -04:00

[svn] Fix the broken URLs that pointed to info.webcrawler.com to point to the new www.robotstxt.org site.
This commit is contained in:
hniksic 2001-12-12 23:29:05 -08:00
parent 3ddcea34a4
commit 3b44ca73ab
5 changed files with 20 additions and 12 deletions

NEWS

@@ -56,7 +56,7 @@ addresses when accessing the first one fails.
 non-standard port.
 ** Wget now supports the robots.txt directives specified in
-<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
+<http://www.robotstxt.org/wc/norobots-rfc.txt>.
 ** URL parser has been fixed, especially the infamous overzealous
 quoting. Wget no longer dequotes reserved characters, e.g. `%3F' is


@@ -1,3 +1,8 @@
+2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
+* wget.texi (Robots): Fix broken URLs that point to the webcrawler
+web site.
 2001-12-11 Hrvoje Niksic <hniksic@arsdigita.com>
 * wget.texi (HTTP Options): Explain how to make IE produce a


@@ -2743,12 +2743,12 @@ server.
 Until version 1.8, Wget supported the first version of the standard,
 written by Martijn Koster in 1994 and available at
-@url{http://info.webcrawler.com/mak/projects/robots/norobots.html}. As
-of version 1.8, Wget has supported the additional directives specified
-in the internet draft @samp{<draft-koster-robots-00.txt>} titled ``A
-Method for Web Robots Control''. The draft, which has as far as I know
-never made to an @sc{rfc}, is available at
-@url{http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html}.
+@url{http://www.robotstxt.org/wc/norobots.html}. As of version 1.8,
+Wget has supported the additional directives specified in the internet
+draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web
+Robots Control''. The draft, which has as far as I know never made to
+an @sc{rfc}, is available at
+@url{http://www.robotstxt.org/wc/norobots-rfc.txt}.
 This manual no longer includes the text of the Robot Exclusion Standard.
@@ -2762,9 +2762,9 @@ this:
 @end example
 This is explained in some detail at
-@url{http://info.webcrawler.com/mak/projects/robots/meta-user.html}.
-Wget supports this method of robot exclusion in addition to the usual
-@file{/robots.txt} exclusion.
+@url{http://www.robotstxt.org/wc/meta-user.html}. Wget supports this
+method of robot exclusion in addition to the usual @file{/robots.txt}
+exclusion.
 @node Security Considerations, Contributors, Robots, Appendices
 @section Security Considerations
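The META-tag exclusion referenced in the manual text above can be illustrated with a small sketch. It assumes the conventions described on the robotstxt.org page. The CONTENT value of `<meta name="robots" ...>` is treated as a comma-separated, case-insensitive list of directives, and `NONE` is taken to imply `NOFOLLOW`. The names `meta_forbids_follow` and `token_equals` are hypothetical. This is not Wget's own HTML-parsing code.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Case-insensitive comparison of a LEN-byte token with a lowercase WORD. */
static bool token_equals(const char *tok, size_t len, const char *word)
{
    for (size_t i = 0; i < len; i++)
        if (word[i] == '\0' || tolower((unsigned char)tok[i]) != word[i])
            return false;
    return word[len] == '\0';   /* WORD must not be longer than the token */
}

/* Hypothetical helper: true if the CONTENT value of a robots META tag
   forbids following links, i.e. it contains NOFOLLOW or NONE. */
static bool meta_forbids_follow(const char *content)
{
    const char *p = content;
    while (*p) {
        /* Skip commas and whitespace between directives. */
        while (*p == ',' || isspace((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p && *p != ',' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        if (token_equals(start, len, "nofollow") || token_equals(start, len, "none"))
            return true;
    }
    return false;
}
```

With this sketch, `"NOINDEX, NOFOLLOW"` and `"none"` forbid following links, while `"index,follow"` and `"all"` do not.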


@@ -1,3 +1,7 @@
+2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
+* res.c (matches): Fix broken URL in the docstring.
 2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
 * html-url.c (tag_url_attributes): Mark <embed href=...> as


@@ -422,8 +422,7 @@ free_specs (struct robot_specs *specs)
 /* The inner matching engine: return non-zero if RECORD_PATH matches
 URL_PATH. The rules for matching are described at
-<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>,
-section 3.2.2. */
+<http://www.robotstxt.org/wc/norobots-rfc.txt>, section 3.2.2. */
 static int
 matches (const char *record_path, const char *url_path)
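The comment on `matches` defers to section 3.2.2 of the robots draft for the matching rules. The following is a minimal sketch of those rules as the draft describes them:

- Each octet of the record path is compared with the URL path.
- A `%xx` escape is decoded before comparison, except when it encodes `/`.
- A match succeeds when the record path runs out before any difference is found.
- Allow/Disallow lines are tried in record order and the first match decides; if none matches, access is allowed.

The names `path_matches`, `is_allowed` and `struct rule` are hypothetical, and this is not the body of Wget's `matches`. Parsing robots.txt and special handling of an empty Disallow value are left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Numeric value of a hex digit; the caller has already checked isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Return the next octet at *p and advance *p past it.  A %xx escape is
   decoded to one octet unless it encodes '/', in which case the escape
   is left alone and its characters are compared one by one. */
static unsigned char next_octet(const char **p)
{
    const char *s = *p;
    if (s[0] == '%' && isxdigit((unsigned char)s[1])
        && isxdigit((unsigned char)s[2])) {
        int v = hex_value(s[1]) * 16 + hex_value(s[2]);
        if (v != '/') {
            *p = s + 3;
            return (unsigned char)v;
        }
    }
    *p = s + 1;
    return (unsigned char)s[0];
}

/* Prefix match: true if the record path is exhausted before a
   differing octet is found in the URL path. */
static bool path_matches(const char *record_path, const char *url_path)
{
    const char *rp = record_path;
    const char *up = url_path;
    for (;;) {
        if (*rp == '\0')
            return true;    /* whole record path matched */
        if (*up == '\0')
            return false;   /* URL path is shorter than the record path */
        if (next_octet(&rp) != next_octet(&up))
            return false;   /* first differing octet */
    }
}

/* Hypothetical representation of one Allow/Disallow line. */
struct rule {
    bool allow;         /* true for Allow, false for Disallow */
    const char *path;   /* path value as written in robots.txt */
};

/* Try the rules in record order; the first match decides.
   If nothing matches, access is allowed. */
static bool is_allowed(const struct rule *rules, size_t n, const char *url_path)
{
    for (size_t i = 0; i < n; i++)
        if (path_matches(rules[i].path, url_path))
            return rules[i].allow;
    return true;
}
```

In this sketch `path_matches("/tmp", "/tmp.html")` is true and `path_matches("/tmp/", "/tmp")` is false. `/%7ejoe/` in a record matches `/~joe/` in a URL, but `/a%2fb.html` does not match `/a/b.html`, because an encoded slash is never treated as a path separator.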