mirror of
https://github.com/moparisthebest/wget
synced 2024-07-03 16:38:41 -04:00
[svn] Improve documentation of "reserved" and "unsafe" chars.
parent ab15dd054b
commit 99625a869b
32
src/url.c
@@ -76,20 +76,34 @@ static struct scheme_data supported_schemes[] =
 
 static int path_simplify PARAMS ((char *));
 
-/* Support for encoding and decoding of URL strings. We determine
-   whether a character is unsafe through static table lookup. This
-   code assumes ASCII character set and 8-bit chars.
+/* Support for escaping and unescaping of URL strings. */
 
-   Note that rfc2396 chose a different terminology from rfc1738. The
-   recoding that URL does should be compliant with both specs,
-   although escaping the "unsafe" ("unreserved" in rfc2396 parlance)
-   chars where not strictly necessary is now frowned upon. */
+/* Table of "reserved" and "unsafe" characters. Those terms are
+   rfc1738-speak, as such largely obsoleted by rfc2396 and later
+   specs, but the general idea remains.
+
+   A reserved character is the one that you can't decode without
+   changing the meaning of the URL. For example, you can't decode
+   "/foo/%2f/bar" into "/foo///bar" because the number and contents of
+   path components is different. Non-reserved characters can be
+   changed, so "/foo/%78/bar" is safe to change to "/foo/x/bar". Wget
+   uses the rfc1738 set of reserved characters, plus "$" and ",", as
+   recommended by rfc2396.
+
+   An unsafe characters is the one that should be encoded when URLs
+   are placed in foreign environments. E.g. space and newline are
+   unsafe in HTTP contexts because HTTP uses them as separator and
+   terminator, so they must be encoded to %20 and %0A respectively.
+   "*" is unsafe in shell context, etc.
+
+   We determine whether a character is unsafe through static table
+   lookup. This code assumes ASCII character set and 8-bit chars. */
 
 enum {
-  /* rfc1738 reserved chars, preserved from encoding. */
+  /* rfc1738 reserved chars + "$" and ",". */
   urlchr_reserved = 1,
 
-  /* rfc1738 unsafe chars, plus some more. */
+  /* rfc1738 unsafe chars, plus non-printables. */
   urlchr_unsafe = 2
 };
 
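The reserved/unsafe distinction described in the new comment can be illustrated with a small, self-contained sketch. This is not the table or the functions from wget's url.c: `urlchr_init`, `char_flags`, `escape_unsafe` and `unescape_safe` are hypothetical names, and the character sets are an assumption based on the rfc1738 lists plus "$" and "," as the comment states. Unlike a real implementation, this sketch treats "%" as an ordinary unsafe character, so escaping an already-escaped string would double-encode it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum {
  urlchr_reserved = 1,   /* decoding would change the URL's meaning */
  urlchr_unsafe = 2      /* must be encoded in foreign environments */
};

/* Lookup table indexed by an 8-bit char; filled once by urlchr_init.  */
static unsigned char urlchr_table[256];

/* Hypothetical classifier (assumed sets, not wget's actual table).  */
static int
char_flags (unsigned char c)
{
  int flags = 0;
  /* rfc1738 reserved chars plus "$" and ",".  */
  if (c != '\0' && strchr (";/?:@=&$,", c))
    flags |= urlchr_reserved;
  /* Space, non-printables, and the rfc1738 unsafe punctuation.  */
  if (c <= 0x20 || c >= 0x7f || strchr ("<>\"#%{}|\\^~[]`", c))
    flags |= urlchr_unsafe;
  return flags;
}

static void
urlchr_init (void)
{
  int i;
  for (i = 0; i < 256; i++)
    urlchr_table[i] = (unsigned char) char_flags ((unsigned char) i);
}

/* Copy IN to OUT, writing each unsafe char as %XX.  Output is
   truncated (but still NUL-terminated) if OUT is too small.  */
static void
escape_unsafe (const char *in, char *out, size_t outsize)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;
  assert (outsize > 0);
  for (; *in; in++)
    {
      unsigned char c = (unsigned char) *in;
      if (urlchr_table[c] & urlchr_unsafe)
        {
          if (o + 3 >= outsize)
            break;
          out[o++] = '%';
          out[o++] = hex[c >> 4];
          out[o++] = hex[c & 0xf];
        }
      else
        {
          if (o + 1 >= outsize)
            break;
          out[o++] = (char) c;
        }
    }
  out[o] = '\0';
}

static int
hexval (int c)
{
  return isdigit (c) ? c - '0' : toupper (c) - 'A' + 10;
}

/* Decode %XX only when the result is neither reserved nor unsafe.
   OUT must hold at least strlen (IN) + 1 bytes.  */
static void
unescape_safe (const char *in, char *out)
{
  while (*in)
    {
      if (in[0] == '%' && isxdigit ((unsigned char) in[1])
          && isxdigit ((unsigned char) in[2]))
        {
          unsigned char c = (unsigned char)
            (hexval ((unsigned char) in[1]) * 16
             + hexval ((unsigned char) in[2]));
          if (!(urlchr_table[c] & (urlchr_reserved | urlchr_unsafe)))
            {
              *out++ = (char) c;
              in += 3;
              continue;
            }
        }
      *out++ = *in++;
    }
  *out = '\0';
}
```

After calling `urlchr_init ()`, `escape_unsafe` turns a space and a newline into %20 and %0A, and `unescape_safe` rewrites "/foo/%78/bar" to "/foo/x/bar" while leaving "/foo/%2f/bar" untouched, matching the examples in the comment. Keeping the two properties as separate bit flags lets one table answer both questions with a single lookup.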