[svn] Fix the broken URLs that pointed to info.webcrawler.com to point to
the new www.robotstxt.org site.
parent 3ddcea34a4
commit 3b44ca73ab
--- a/NEWS
+++ b/NEWS
@@ -56,7 +56,7 @@ addresses when accessing the first one fails.
 non-standard port.
 
 ** Wget now supports the robots.txt directives specified in
-<http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>.
+<http://www.robotstxt.org/wc/norobots-rfc.txt>.
 
 ** URL parser has been fixed, especially the infamous overzealous
 quoting. Wget no longer dequotes reserved characters, e.g. `%3F' is
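The NEWS item above is cut off by the hunk context, but the dequoting change it mentions is easy to illustrate. The sketch below is not wget's URL parser; the helper names (is_reserved, safe_to_decode) are invented for the example. It shows why an escape such as %3F has to stay encoded: decoding it yields '?', a reserved character that would turn the rest of a path into a query string.

/* Illustration only -- not wget's URL parser.  Decoding a percent
   escape is safe only when the decoded octet is not a reserved
   character; "%3F" decodes to '?', so it must be left alone. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Reserved characters from RFC 2396 that change URL structure. */
static int
is_reserved (int c)
{
  return strchr (";/?:@&=+$,", c) != NULL;
}

/* Return 1 if the "%XX" escape at ESCAPE may be decoded in place. */
static int
safe_to_decode (const char *escape)
{
  unsigned int c;
  if (escape[0] != '%'
      || !isxdigit ((unsigned char) escape[1])
      || !isxdigit ((unsigned char) escape[2]))
    return 0;                   /* not a well-formed escape */
  sscanf (escape + 1, "%2x", &c);
  return !is_reserved ((int) c);
}

int
main (void)
{
  printf ("%%3F -> %d\n", safe_to_decode ("%3F"));  /* 0: '?' is reserved */
  printf ("%%7E -> %d\n", safe_to_decode ("%7E"));  /* 1: '~' is not */
  return 0;
}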
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
+
+	* wget.texi (Robots): Fix broken URLs that point to the webcrawler
+	web site.
+
 2001-12-11 Hrvoje Niksic <hniksic@arsdigita.com>
 
 	* wget.texi (HTTP Options): Explain how to make IE produce a
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -2743,12 +2743,12 @@ server.
 
 Until version 1.8, Wget supported the first version of the standard,
 written by Martijn Koster in 1994 and available at
-@url{http://info.webcrawler.com/mak/projects/robots/norobots.html}. As
-of version 1.8, Wget has supported the additional directives specified
-in the internet draft @samp{<draft-koster-robots-00.txt>} titled ``A
-Method for Web Robots Control''. The draft, which has as far as I know
-never made to an @sc{rfc}, is available at
-@url{http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html}.
+@url{http://www.robotstxt.org/wc/norobots.html}. As of version 1.8,
+Wget has supported the additional directives specified in the internet
+draft @samp{<draft-koster-robots-00.txt>} titled ``A Method for Web
+Robots Control''. The draft, which has as far as I know never made to
+an @sc{rfc}, is available at
+@url{http://www.robotstxt.org/wc/norobots-rfc.txt}.
 
 This manual no longer includes the text of the Robot Exclusion Standard.
 
@@ -2762,9 +2762,9 @@ this:
 @end example
 
 This is explained in some detail at
-@url{http://info.webcrawler.com/mak/projects/robots/meta-user.html}.
-Wget supports this method of robot exclusion in addition to the usual
-@file{/robots.txt} exclusion.
+@url{http://www.robotstxt.org/wc/meta-user.html}. Wget supports this
+method of robot exclusion in addition to the usual @file{/robots.txt}
+exclusion.
 
 @node Security Considerations, Contributors, Robots, Appendices
 @section Security Considerations
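As background for the wget.texi passage above: the 1994 standard only defined User-agent and Disallow records, while the draft it cites adds directives such as Allow. The robots.txt below is a made-up illustration of that kind of file (the paths are invented); it is not taken from the manual, which keeps its own example for the meta-tag form of exclusion.

# Hypothetical robots.txt -- paths invented for illustration.
# User-agent and Disallow come from the 1994 standard; Allow is one of
# the draft directives Wget understands as of version 1.8.
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

User-agent: Wget
Allow: /private/mirror-ok/
Disallow: /private/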
--- a/src/ChangeLog
+++ b/src/ChangeLog
@@ -1,3 +1,7 @@
+2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
+
+	* res.c (matches): Fix broken URL in the docstring.
+
 2001-12-13 Hrvoje Niksic <hniksic@arsdigita.com>
 
 	* html-url.c (tag_url_attributes): Mark <embed href=...> as
--- a/src/res.c
+++ b/src/res.c
@@ -422,8 +422,7 @@ free_specs (struct robot_specs *specs)
 
 /* The inner matching engine: return non-zero if RECORD_PATH matches
    URL_PATH. The rules for matching are described at
-   <http://info.webcrawler.com/mak/projects/robots/norobots-rfc.html>,
-   section 3.2.2. */
+   <http://www.robotstxt.org/wc/norobots-rfc.txt>, section 3.2.2. */
 
 static int
 matches (const char *record_path, const char *url_path)
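For readers following the pointer in that res.c comment: roughly, section 3.2.2 of the draft treats a record path as matching when it is an octet-by-octet prefix of the URL path, with %XX escapes compared by their decoded value, the encoded slash being the usual exception. The sketch below is a simplified reading of that rule, not the matches() implementation from src/res.c; matches_sketch and next_octet are invented names.

/* Simplified sketch of the prefix match described in the robots
   draft; this is not the code from src/res.c. */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the next octet of *PP, decoding a "%XX" escape if present.
   *ENCODED is set to 1 when the octet came from an escape. */
static int
next_octet (const char **pp, int *encoded)
{
  const char *p = *pp;
  *encoded = 0;
  if (p[0] == '%' && isxdigit ((unsigned char) p[1])
      && isxdigit ((unsigned char) p[2]))
    {
      char hex[3] = { p[1], p[2], '\0' };
      *pp = p + 3;
      *encoded = 1;
      return (int) strtol (hex, NULL, 16);
    }
  *pp = p + 1;
  return (unsigned char) *p;
}

/* Non-zero if RECORD_PATH is a prefix of URL_PATH under the draft's
   comparison rules.  An encoded slash in one path ("%2F") matches
   only an encoded slash in the other, never a literal '/'. */
static int
matches_sketch (const char *record_path, const char *url_path)
{
  while (*record_path)
    {
      int renc, uenc, rc, uc;
      if (!*url_path)
        return 0;               /* record longer than the URL path */
      rc = next_octet (&record_path, &renc);
      uc = next_octet (&url_path, &uenc);
      if (rc != uc)
        return 0;
      if (rc == '/' && renc != uenc)
        return 0;               /* "%2F" vs literal '/' */
    }
  return 1;
}

int
main (void)
{
  printf ("%d\n", matches_sketch ("/tmp", "/tmp/index.html"));      /* 1 */
  printf ("%d\n", matches_sketch ("/%7Ejoe/", "/~joe/index.html")); /* 1 */
  printf ("%d\n", matches_sketch ("/a%2Fb", "/a/b"));               /* 0 */
  return 0;
}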