From c784c334d3ddaeb6c628931eca87056e152181fe Mon Sep 17 00:00:00 2001
From: Micah Cowan
Date: Tue, 28 Jul 2009 00:19:48 -0700
Subject: [PATCH] Document new features in --restrict-file-names.

---
 ChangeLog     |  5 +++++
 NEWS          | 11 ++++++++---
 doc/ChangeLog |  5 +++++
 doc/wget.texi | 52 ++++++++++++++++++++++++++++++++++-----------------
 4 files changed, 53 insertions(+), 20 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index bbc6e00f..1eeb63ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,8 @@
+2009-07-28 Micah Cowan
+
+ * NEWS: Mention some more previously undocumented items, and the
+ new "ascii" specifier for --restrict-file-names.
+
 2009-07-27 Petr Pisar
 
 * po/Makevars (MSGID_BUGS_ADDRESS): Fixed.
diff --git a/NEWS b/NEWS
index 2d0092f4..df2c7742 100644
--- a/NEWS
+++ b/NEWS
@@ -1,7 +1,7 @@
 GNU Wget NEWS -- history of user-visible changes.
 
 Copyright (C) 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
-2006, 2007, 2008 Free Software Foundation, Inc.
+2006, 2007, 2008, 2009 Free Software Foundation, Inc.
 See the end for copying conditions.
 
 Please send GNU Wget bug reports to .
@@ -41,8 +41,13 @@ an external file.
 information on how it was built, and the set of configure-time
 options that were selected.
 
-** Several previously existing, but undocumented .wgetrc options
-are now documented: save_headers, spider, and user_agent.
+** An "ascii" specifier is now accepted by --restrict-file-names, which
+forces the percent-encoding of all non-ASCII bytes.
+
+** Several previously existing, but undocumented .wgetrc options are
+now documented: save_headers, spider, and user_agent,
+auth_no_challenge, and keep_session_cookies. Also added documentation
+for the "lowercase" and "uppercase" values for --restrict-file-names, which had been present since Wget 1.11.
 
 * Changes in Wget 1.11.4
 
diff --git a/doc/ChangeLog b/doc/ChangeLog
index 76395344..d5494fa2 100644
--- a/doc/ChangeLog
+++ b/doc/ChangeLog
@@ -1,3 +1,8 @@
+2009-07-28 Micah Cowan
+
+ * wget.texi (Download Options): Document "lowercase", "uppercase",
+ and the new "ascii" specifier for --restrict-file-names.
+
 2009-07-26 Micah Cowan
 
 * wget.texi (Download Options): Change --iri item to --no-iri;
diff --git a/doc/wget.texi b/doc/wget.texi
index 73fc5278..aab1a890 100644
--- a/doc/wget.texi
+++ b/doc/wget.texi
@@ -904,24 +904,36 @@ won't need it.
 
 @cindex file names, restrict
 @cindex Windows file names
-@item --restrict-file-names=@var{mode}
-Change which characters found in remote URLs may show up in local file
-names generated from those URLs. Characters that are @dfn{restricted}
+@item --restrict-file-names=@var{modes}
+Change which characters found in remote URLs must be escaped during
+generation of local filenames. Characters that are @dfn{restricted}
 by this option are escaped, i.e. replaced with @samp{%HH}, where
 @samp{HH} is the hexadecimal number that corresponds to the restricted
-character.
+character. This option may also be used to force all alphabetical
+cases to be either lower- or uppercase.
 
-By default, Wget escapes the characters that are not valid as part of
-file names on your operating system, as well as control characters that
-are typically unprintable. This option is useful for changing these
-defaults, either because you are downloading to a non-native partition,
-or because you want to disable escaping of the control characters.
+By default, Wget escapes the characters that are not valid or safe as
+part of file names on your operating system, as well as control
+characters that are typically unprintable. This option is useful for
+changing these defaults, perhaps because you are downloading to a
+non-native partition, or because you want to disable escaping of the
+control characters, or you want to further restrict characters to only
+those in the @sc{ascii} range of values.
 
-When mode is set to ``unix'', Wget escapes the character @samp{/} and
+The @var{modes} are a comma-separated set of text values. The
+acceptable values are @samp{unix}, @samp{windows}, @samp{nocontrol},
+@samp{ascii}, @samp{lowercase}, and @samp{uppercase}. The values
+@samp{unix} and @samp{windows} are mutually exclusive (one will
+override the other), as are @samp{lowercase} and
+@samp{uppercase}. Those last are special cases, as they do not change
+the set of characters that would be escaped, but rather force local
+file paths to be converted either to lower- or uppercase.
+
+When ``unix'' is specified, Wget escapes the character @samp{/} and
 the control characters in the ranges 0--31 and 128--159. This is the
-default on Unix-like OS'es.
+default on Unix-like operating systems.
 
-When mode is set to ``windows'', Wget escapes the characters @samp{\},
+When ``windows'' is given, Wget escapes the characters @samp{\},
 @samp{|}, @samp{/}, @samp{:}, @samp{?}, @samp{"}, @samp{*}, @samp{<},
 @samp{>}, and the control characters in the ranges 0--31 and 128--159.
 In addition to this, Wget in Windows mode uses @samp{+} instead of
@@ -932,11 +944,17 @@ name from the rest. Therefore, a URL that would be saved as
 saved as @samp{www.xemacs.org+4300/search.pl@@input=blah} in Windows
 mode. This mode is the default on Windows.
 
-If you append @samp{,nocontrol} to the mode, as in
-@samp{unix,nocontrol}, escaping of the control characters is also
-switched off. You can use @samp{--restrict-file-names=nocontrol} to
-turn off escaping of control characters without affecting the choice of
-the OS to use as file name restriction mode.
+If you specify @samp{nocontrol}, then the escaping of the control
+characters is also switched off. This option may make sense
+when you are downloading URLs whose names contain UTF-8 characters, on
+a system which can save and display filenames in UTF-8 (some possible
+byte values used in UTF-8 byte sequences fall in the range of values
+designated by Wget as ``controls'').
+
+The @samp{ascii} mode is used to specify that any bytes whose values
+are outside the range of @sc{ascii} characters (that is, greater than
+127) shall be escaped. This can be useful when saving filenames
+whose encoding does not match the one used locally.
 
 @cindex IPv6
 @itemx -4