mirror of https://github.com/moparisthebest/wget
synced 2024-07-03 16:38:41 -04:00

commit b30a0dd817

    Automated merge.

@@ -36,6 +36,7 @@ src/.deps
 src/stamp-h1
 src/config.h
 src/config.h.in
+src/css.c
 src/wget
 src/cscope.out
 src/libunittest.a

AUTHORS | 5
@@ -45,3 +45,8 @@ Micah Cowan. Current Wget maintainer, from mid-2007.
 
 Ralf Wildenhues. Contributed patches to convert Wget to use Automake as
 part of its build process, and various bugfixes.
+
+Steven Schubiger. Many helpful patches, bugfixes and improvements.
+Notably, conversion of Wget to use the Gnulib quotes and quoteargs
+modules, and the addition of password prompts at the console, via the
+Gnulib getpasswd-gnu module.

ChangeLog | 19
@@ -1,3 +1,9 @@
+2008-06-30  Micah Cowan  <micah@cowan.name>
+
+        * NEWS: Entries for 1.11.4.
+
+        * AUTHORS: Added Steven Schubiger.
+
 2008-06-26  Xavier Saint  <wget@sxav.eu>
 
         * configure.ac : IRIs support required libiconv, check it.
@@ -54,6 +60,19 @@
         md5/m4/stdint.m4, md5/md5.c, md5/md5.h, md5/stdint.in.h,
         md5/wchar.in.h: Updated from gnulib.
 
+2008-04-24  Micah Cowan  <micah@cowan.name>
+
+        * NEWS: Removed info about move to Automake, Gnulib. Added item
+        about the addition of CSS support.
+
+2008-04-22  Micah Cowan  <micah@cowan.name>
+
+        * ylwrap: Added via automake -ac.
+
+2008-04-22  Ted Mielczarek  <ted.mielczarek@gmail.com>
+
+        * configure.ac: Added check for lex.
+
 2008-04-14  Micah Cowan  <micah@cowan.name>
 
         * GNUmakefile, lib/Makefile.am, lib/error.c, lib/error.h,

NEWS | 27
@@ -8,9 +8,36 @@ Please send GNU Wget bug reports to <bug-wget@gnu.org>.
 
 * Changes in Wget 1.12 (MAINLINE)
 
+** Added support for CSS. This includes:
+   - Parsing links from CSS files, and from CSS content found in HTML
+     style tags and attributes.
+   - Supporting conversion of links found within CSS content, when
+     --convert-links is specified.
+   - Ensuring that CSS files end in the ".css" filename extension,
+     when --convert-links is specified.
+
+   CSS support in Wget is thanks to Ted Mielczarek
+   <ted.mielczarek@gmail.com>.
+
 ** --ask-password option (and associated wgetrc command) added to
    support password prompts at the console.
 
+** The --input-file option now also handles retrieving links from
+   an external file.
+
+* Changes in Wget 1.11.4
+
+** Fixed an issue (apparently a regression) where -O would refuse to
+   download when -nc was given, even though the file didn't exist.
+
+** Fixed a situation where Wget could abort with --continue if the
+   remote server gives a content-length of zero when the file exists
+   locally with content.
+
+** Fixed a crash on some systems, due to Wget casting a pointer-to-long
+   to a pointer-to-time_t.
+
+** Translation updates for Catalan.
+
 * Changes in Wget 1.11.3
 
@@ -113,6 +113,8 @@ md5_EARLY
 
 AC_PROG_RANLIB
 
+AC_PROG_LEX
+
 dnl Turn on optimization by default. Specifically:
 dnl
 dnl if the user hasn't specified CFLAGS, then
@@ -1,3 +1,10 @@
+2008-06-29  Micah Cowan  <micah@cowan.name>
+
+        * wget.texi <Contributors>: Added Joao Ferreira, Mike Frysinger,
+        Alain Guibert, Madhusudan Hosaagrahara, Jim Paris, Kenny
+        Parnell, Benno Schulenberg, and Pranab Shenoy. Added Steven
+        Schubiger to the "Special Thanks" section.
+
 2008-06-13  Micah Cowan  <micah@cowan.name>
 
         * wget.texi (Mailing List): The wget-notify mailing list no longer
@@ -26,6 +33,11 @@
         * wget.texi (Download Options) <-O>: Elaborate on why certain
         options make poor combinations with -O.
 
+2008-04-24  Micah Cowan  <micah@cowan.name>
+
+        * wget.texi: Adjusted documentation to account for CSS support;
+        added Ted Mielczarek to contributors.
+
 2008-04-22  Mike Frysinger  <vapier@gentoo.org>
 
         * sample.wgetrc: Added prefer_family example. Resolves bug
@@ -3,7 +3,7 @@
 @c %**start of header
 @setfilename wget.info
 @include version.texi
-@set UPDATED Mar 2008
+@set UPDATED Jun 2008
 @settitle GNU Wget @value{VERSION} Manual
 @c Disable the monstrous rectangles beside overfull hbox-es.
 @finalout
@@ -133,13 +133,13 @@ which can be a great hindrance when transferring a lot of data.
 @c man end
 @end ignore
 @c man begin DESCRIPTION
-Wget can follow links in @sc{html} and @sc{xhtml} pages and create local
-versions of remote web sites, fully recreating the directory structure of
-the original site. This is sometimes referred to as ``recursive
-downloading.'' While doing that, Wget respects the Robot Exclusion
-Standard (@file{/robots.txt}). Wget can be instructed to convert the
-links in downloaded @sc{html} files to the local files for offline
-viewing.
+Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
+create local versions of remote web sites, fully recreating the
+directory structure of the original site. This is sometimes referred to
+as ``recursive downloading.'' While doing that, Wget respects the Robot
+Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
+convert the links in downloaded files to point at the local files, for
+offline viewing.
 @c man end
 
 @item
@@ -480,9 +480,9 @@ printed.
 @cindex input-file
 @item -i @var{file}
 @itemx --input-file=@var{file}
-Read @sc{url}s from @var{file}. If @samp{-} is specified as
-@var{file}, @sc{url}s are read from the standard input. (Use
-@samp{./-} to read from a file literally named @samp{-}.)
+Read @sc{url}s from a local or external @var{file}. If @samp{-} is
+specified as @var{file}, @sc{url}s are read from the standard input.
+(Use @samp{./-} to read from a file literally named @samp{-}.)
 
 If this function is used, no @sc{url}s need be present on the command
 line. If there are @sc{url}s both on the command line and in an input
@@ -1093,6 +1093,11 @@ re-downloading, you must use @samp{-k} and @samp{-K} so that the original
 version of the file will be saved as @file{@var{X}.orig} (@pxref{Recursive
 Retrieval Options}).
 
+As of version 1.12, Wget will also ensure that any downloaded files of
+type @samp{text/css} end in the suffix @samp{.css}. Obviously, this
+makes the name @samp{--html-extension} misleading; a better name is
+expected to be offered as an alternative in the near future.
+
 @cindex http user
 @cindex http password
 @cindex authentication
@@ -1943,16 +1948,17 @@ GNU Wget is capable of traversing parts of the Web (or a single
 @sc{http} or @sc{ftp} server), following links and directory structure.
 We refer to this as to @dfn{recursive retrieval}, or @dfn{recursion}.
 
-With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} from
-the given @sc{url}, documents, retrieving the files the @sc{html}
-document was referring to, through markup like @code{href}, or
-@code{src}. If the freshly downloaded file is also of type
-@code{text/html} or @code{application/xhtml+xml}, it will be parsed and
-followed further.
+With @sc{http} @sc{url}s, Wget retrieves and parses the @sc{html} or
+@sc{css} from the given @sc{url}, retrieving the files the document
+refers to, through markup like @code{href} or @code{src}, or @sc{css}
+@sc{uri} values specified using the @samp{url()} functional notation.
+If the freshly downloaded file is also of type @code{text/html},
+@code{application/xhtml+xml}, or @code{text/css}, it will be parsed
+and followed further.
 
-Recursive retrieval of @sc{http} and @sc{html} content is
+Recursive retrieval of @sc{http} and @sc{html}/@sc{css} content is
 @dfn{breadth-first}. This means that Wget first downloads the requested
-@sc{html} document, then the documents linked from that document, then the
+document, then the documents linked from that document, then the
 documents linked by them, and so on. In other words, Wget first
 downloads the documents at depth 1, then those at depth 2, and so on
 until the specified maximum depth.
@@ -2741,7 +2747,8 @@ Define a header for HTTP downloads, like using
 
 @item html_extension = on/off
 Add a @samp{.html} extension to @samp{text/html} or
-@samp{application/xhtml+xml} files without it, like @samp{-E}.
+@samp{application/xhtml+xml} files without it, or a @samp{.css}
+extension to @samp{text/css} files without it, like @samp{-E}.
 
 @item http_keep_alive = on/off
 Turn the keep-alive feature on or off (defaults to on). Turning it
@@ -3103,7 +3110,7 @@ wget -r http://www.gnu.org/ -o gnulog
 @end example
 
 @item
-The same as the above, but convert the links in the @sc{html} files to
+The same as the above, but convert the links in the downloaded files to
 point to local files, so you can view the documents off-line:
 
 @example
@@ -3749,21 +3756,30 @@ Junio Hamano---donated support for Opie and @sc{http} @code{Digest}
 authentication.
 
 @item
-Mauro Tortonesi---Improved IPv6 support, adding support for dual
+Mauro Tortonesi---improved IPv6 support, adding support for dual
 family systems. Refactored and enhanced FTP IPv6 code. Maintained GNU
 Wget from 2004--2007.
 
 @item
-Christopher G.@: Lewis---Maintenance of the Windows version of GNU WGet.
+Christopher G.@: Lewis---maintenance of the Windows version of GNU WGet.
 
 @item
-Gisle Vanem---Many helpful patches and improvements, especially for
+Gisle Vanem---many helpful patches and improvements, especially for
 Windows and MS-DOS support.
 
 @item
-Ralf Wildenhues---Contributed patches to convert Wget to use Automake as
+Ralf Wildenhues---contributed patches to convert Wget to use Automake as
 part of its build process, and various bugfixes.
 
+@item
+Steven Schubiger---Many helpful patches, bugfixes and improvements.
+Notably, conversion of Wget to use the Gnulib quotes and quoteargs
+modules, and the addition of password prompts at the console, via the
+Gnulib getpasswd-gnu module.
+
+@item
+Ted Mielczarek---donated support for CSS.
+
 @item
 People who provided donations for development---including Brian Gough.
 @end itemize
@@ -3819,8 +3835,15 @@ Aleksandar Erkalovi@'{c},
 Aleksandar Erkalovic,
 @end ifnottex
 Andy Eskilsson,
+@iftex
+Jo@~{a}o Ferreira,
+@end iftex
+@ifnottex
+Joao Ferreira,
+@end ifnottex
 Christian Fraenkel,
 David Fritz,
+Mike Frysinger,
 Charles C.@: Fu,
 FUJISHIMA Satsuki,
 Masashi Fujita,
@@ -3828,10 +3851,12 @@ Howard Gayle,
 Marcel Gerrits,
 Lemble Gregory,
 Hans Grobler,
+Alain Guibert,
 Mathieu Guillaume,
 Aaron Hawley,
 Jochen Hein,
 Karl Heuer,
+Madhusudan Hosaagrahara,
 HIROSE Masaaki,
 Ulf Harnhammar,
 Gregor Hoffleit,
@@ -3895,6 +3920,7 @@ Andre Majorel,
 Aurelien Marchand,
 Matthew J.@: Mellon,
 Jordan Mendelson,
+Ted Mielczarek,
 Lin Zhe Min,
 Jan Minar,
 Tim Mooney,
@@ -3903,6 +3929,8 @@ Adam D.@: Moss,
 Simon Munton,
 Charlie Negyesi,
 R.@: K.@: Owen,
+Jim Paris,
+Kenny Parnell,
 Leonid Petrov,
 Simone Piunno,
 Andrew Pollock,
@@ -3937,9 +3965,11 @@ Edward J.@: Sabol,
 Heinz Salzmann,
 Robert Schmidt,
 Nicolas Schodet,
+Benno Schulenberg,
 Andreas Schwab,
 Steven M.@: Schweda,
 Chris Seawood,
+Pranab Shenoy,
 Dennis Smit,
 Toomas Soome,
 Tage Stabell-Kulo,
@@ -6,6 +6,11 @@
         * host.c : Show hostname to be resolved both in locale and
         ASCII encoded.
 
+2008-06-28  Steven Schubiger  <stsc@members.fsf.org>
+
+        * retr.c (retrieve_from_file): Allow for reading the links from
+        an external file (HTTP/FTP).
+
 2008-06-26  Xavier Saint  <wget@sxav.eu>
 
         * iri.c, iri.h : New functions locale_to_utf8() and
@@ -14,6 +19,11 @@
         * url.c : Convert URLs from locale to UTF-8 allowing a basic
         support of IRI/IDN
 
+2008-06-25  Steven Schubiger  <stsc@members.fsf.org>
+
+        * ftp.c (getftp): When spidering a FTP URL, emit a diagnostic
+        message if the remote file exists.
+
 2008-06-24  Steven Schubiger  <stsc@members.fsf.org>
 
         * http.c (http_loop): Replace escnonprint() occurence with
@@ -210,11 +220,55 @@
 
         * Makefile.am: -I foo -> -Ifoo.
 
+2008-04-24  Micah Cowan  <micah@cowan.name>
+
+        * main.c: Revised usage description of --convert-links to apply
+        to CSS as well as to HTML.
+
 2008-04-23  Micah Cowan  <micah@cowan.name>
 
         * utils.c (test_dir_matches_p): Added a test for the case
         described in issue #20518.
 
+2008-04-22  Micah Cowan  <micah@cowan.name>
+
+        * Makefile.am, css.lex, css.l: Renamed css.lex to css.l.
+        * recur.c (retrieve_tree): Fix typo to allow text/css files to
+        be parsed.
+
+2008-04-22  Ted Mielczarek  <ted.mielczarek@gmail.com>
+
+        * css.lex, css-url.c, css-url.h: Added to implement support for
+        parsing CSS in Wget.
+        * convert.c: Convert links in CSS files, too.
+        * convert.h (convert_options): Added for options link_css_p,
+        link_expect_css.
+        * convert.h: Added prototype for new register_css function.
+        * html-parse.c: Added support for parsing element content, in
+        addition to tag starts and ends.
+        * html-parse.h (taginfo): Added delimiter fields for element
+        content.
+        * html-url.h: Added.
+        * html-url.c (append_url): No longer internal-linkage only. Now
+        takes position and size as explicit parameters.
+        * html-url.c: Use new html-url.h header, add support for
+        handling of "style" HTML attributes. Mark URIs obtained from
+        link tags with rel="stylesheet" with link_expect_css. Adapt
+        uses of append_url to supply the newly-added parameters for
+        position and size.
+        * http.c: Add detection for when the content-type is text/css;
+        and ensure that such files have the ".css" filename extension,
+        when --convert-links is active.
+        * recur.h: Remove declarations for functions found in
+        html-url.c (moved to html-url.h).
+        * recur.c: Add support for culling links from CSS files, too,
+        and tracking for when we're expecting the file to be CSS (even
+        when its content type isn't text/css).
+        * retr.c (retrieve_url): Add registration of CSS files.
+        * wget.h: Added TEXTCSS to dt flags enum.
+        * Makefile.am: Added css.lex, css-url.c, css-url.h, html-url.h
+        to wget_SOURCES.
+
 2008-04-22  Jim Paris  <jim@jtan.com>
 
         * openssl.c (ssl_init): Enable combined certificate/key in
@@ -40,13 +40,14 @@ LIBS = @LIBSSL@ @LIBGNUTLS@ @LIBINTL@ @LIBS@
 
 bin_PROGRAMS = wget
 wget_SOURCES = build_info.c cmpt.c connect.c convert.c cookies.c ftp.c \
+               css.l css-url.c \
               ftp-basic.c ftp-ls.c hash.c host.c html-parse.c html-url.c \
               http.c init.c log.c main.c netrc.c progress.c ptimer.c \
               recur.c res.c retr.c snprintf.c spider.c url.c \
               utils.c $(IRI_OBJ) \
-              connect.h convert.h cookies.h \
-              ftp.h gen-md5.h hash.h host.h html-parse.h \
-              http.h http-ntlm.h init.h iri.h log.h mswindows.h netrc.h \
+              css-url.h connect.h convert.h cookies.h \
+              ftp.h gen-md5.h hash.h host.h html-parse.h html-url.h \
+              http.h http-ntlm.h init.h log.h mswindows.h netrc.h \
               options.h progress.h ptimer.h recur.h res.h retr.h \
               spider.h ssl.h sysdep.h url.h utils.h wget.h
 nodist_wget_SOURCES = version.c

src/convert.c | 104
@@ -45,50 +45,37 @@ as that of the covered work. */
 #include "hash.h"
 #include "ptimer.h"
 #include "res.h"
+#include "html-url.h"
+#include "css-url.h"
 
 static struct hash_table *dl_file_url_map;
 struct hash_table *dl_url_file_map;
 
-/* Set of HTML files downloaded in this Wget run, used for link
+/* Set of HTML/CSS files downloaded in this Wget run, used for link
    conversion after Wget is done. */
 struct hash_table *downloaded_html_set;
+struct hash_table *downloaded_css_set;
 
 static void convert_links (const char *, struct urlpos *);
 
-/* This function is called when the retrieval is done to convert the
-   links that have been downloaded.  It has to be called at the end of
-   the retrieval, because only then does Wget know conclusively which
-   URLs have been downloaded, and which not, so it can tell which
-   direction to convert to.
-
-   The "direction" means that the URLs to the files that have been
-   downloaded get converted to the relative URL which will point to
-   that file.  And the other URLs get converted to the remote URL on
-   the server.
-
-   All the downloaded HTMLs are kept in downloaded_html_files, and
-   downloaded URLs in urls_downloaded.  All the information is
-   extracted from these two lists. */
-
 void
-convert_all_links (void)
+convert_links_in_hashtable (struct hash_table *downloaded_set,
+                            int is_css,
+                            int *file_count)
 {
   int i;
-  double secs;
-  int file_count = 0;
-
-  struct ptimer *timer = ptimer_new ();
-
   int cnt;
   char **file_array;
 
   cnt = 0;
-  if (downloaded_html_set)
-    cnt = hash_table_count (downloaded_html_set);
+  if (downloaded_set)
+    cnt = hash_table_count (downloaded_set);
   if (cnt == 0)
-    goto cleanup;
+    return;
   file_array = alloca_array (char *, cnt);
-  string_set_to_array (downloaded_html_set, file_array);
+  string_set_to_array (downloaded_set, file_array);
 
   for (i = 0; i < cnt; i++)
     {
@@ -96,7 +83,7 @@ convert_all_links (void)
       char *url;
       char *file = file_array[i];
 
-      /* Determine the URL of the HTML file.  get_urls_html will need
+      /* Determine the URL of the file.  get_urls_{html,css} will need
          it. */
       url = hash_table_get (dl_file_url_map, file);
       if (!url)
@@ -107,8 +94,9 @@ convert_all_links (void)
 
       DEBUGP (("Scanning %s (from %s)\n", file, url));
 
-      /* Parse the HTML file... */
-      urls = get_urls_html (file, url, NULL);
+      /* Parse the file... */
+      urls = is_css ? get_urls_css_file (file, url) :
+                      get_urls_html (file, url, NULL);
 
       /* We don't respect meta_disallow_follow here because, even if
          the file is not followed, we might still want to convert the
@@ -160,27 +148,55 @@ convert_all_links (void)
 
       /* Convert the links in the file. */
       convert_links (file, urls);
-      ++file_count;
+      ++*file_count;
 
       /* Free the data. */
       free_urlpos (urls);
     }
+}
+
+/* This function is called when the retrieval is done to convert the
+   links that have been downloaded.  It has to be called at the end of
+   the retrieval, because only then does Wget know conclusively which
+   URLs have been downloaded, and which not, so it can tell which
+   direction to convert to.
+
+   The "direction" means that the URLs to the files that have been
+   downloaded get converted to the relative URL which will point to
+   that file.  And the other URLs get converted to the remote URL on
+   the server.
+
+   All the downloaded HTMLs are kept in downloaded_html_files, and
+   downloaded URLs in urls_downloaded.  All the information is
+   extracted from these two lists. */
+
+void
+convert_all_links (void)
+{
+  double secs;
+  int file_count = 0;
+
+  struct ptimer *timer = ptimer_new ();
+
+  convert_links_in_hashtable (downloaded_html_set, 0, &file_count);
+  convert_links_in_hashtable (downloaded_css_set, 1, &file_count);
 
   secs = ptimer_measure (timer);
   logprintf (LOG_VERBOSE, _("Converted %d files in %s seconds.\n"),
              file_count, print_decimal (secs));
- cleanup:
   ptimer_destroy (timer);
 }
 
 static void write_backup_file (const char *, downloaded_file_t);
+static const char *replace_plain (const char*, int, FILE*, const char *);
 static const char *replace_attr (const char *, int, FILE *, const char *);
 static const char *replace_attr_refresh_hack (const char *, int, FILE *,
                                               const char *, int);
 static char *local_quote_string (const char *);
 static char *construct_relative (const char *, const char *);
 
-/* Change the links in one HTML file.  LINKS is a list of links in the
+/* Change the links in one file.  LINKS is a list of links in the
    document, along with their positions and the desired direction of
    the conversion. */
 static void
@@ -277,7 +293,9 @@ convert_links (const char *file, struct urlpos *links)
             char *newname = construct_relative (file, link->local_name);
             char *quoted_newname = local_quote_string (newname);
 
-            if (!link->link_refresh_p)
+            if (link->link_css_p)
+              p = replace_plain (p, link->size, fp, quoted_newname);
+            else if (!link->link_refresh_p)
               p = replace_attr (p, link->size, fp, quoted_newname);
             else
               p = replace_attr_refresh_hack (p, link->size, fp, quoted_newname,
@@ -296,7 +314,9 @@ convert_links (const char *file, struct urlpos *links)
             char *newlink = link->url->url;
             char *quoted_newlink = html_quote_string (newlink);
 
-            if (!link->link_refresh_p)
+            if (link->link_css_p)
+              p = replace_plain (p, link->size, fp, quoted_newlink);
+            else if (!link->link_refresh_p)
               p = replace_attr (p, link->size, fp, quoted_newlink);
             else
               p = replace_attr_refresh_hack (p, link->size, fp, quoted_newlink,
@@ -406,6 +426,7 @@ write_backup_file (const char *file, downloaded_file_t downloaded_file_return)
   size_t filename_len = strlen (file);
   char* filename_plus_orig_suffix;
 
+  /* TODO: hack this to work with css files */
   if (downloaded_file_return == FILE_DOWNLOADED_AND_HTML_EXTENSION_ADDED)
     {
       /* Just write "orig" over "html".  We need to do it this way
@@ -465,6 +486,15 @@ write_backup_file (const char *file, downloaded_file_t downloaded_file_return)
 
 static bool find_fragment (const char *, int, const char **, const char **);
 
+/* Replace a string with NEW_TEXT.  Ignore quoting. */
+static const char *
+replace_plain (const char *p, int size, FILE *fp, const char *new_text)
+{
+  fputs (new_text, fp);
+  p += size;
+  return p;
+}
+
 /* Replace an attribute's original text with NEW_TEXT. */
 
 static const char *
@@ -832,6 +862,16 @@ register_html (const char *url, const char *file)
   string_set_add (downloaded_html_set, file);
 }
 
+/* Register that FILE is a CSS file that has been downloaded. */
+
+void
+register_css (const char *url, const char *file)
+{
+  if (!downloaded_css_set)
+    downloaded_css_set = make_string_hash_table (0);
+  string_set_add (downloaded_css_set, file);
+}
+
 static void downloaded_files_free (void);
 
 /* Cleanup the data structures associated with this file. */
@@ -33,6 +33,7 @@ as that of the covered work. */
 struct hash_table;              /* forward decl */
 extern struct hash_table *dl_url_file_map;
 extern struct hash_table *downloaded_html_set;
+extern struct hash_table *downloaded_css_set;
 
 enum convert_options {
   CO_NOCONVERT = 0,             /* don't convert this URL */
@@ -64,7 +65,9 @@ struct urlpos {
   unsigned int link_complete_p  :1; /* the link was complete (had host name) */
   unsigned int link_base_p      :1; /* the url came from <base href=...> */
   unsigned int link_inline_p    :1; /* needed to render the page */
+  unsigned int link_css_p       :1; /* the url came from CSS */
   unsigned int link_expect_html :1; /* expected to contain HTML */
+  unsigned int link_expect_css  :1; /* expected to contain CSS */
 
   unsigned int link_refresh_p   :1; /* link was received from
                                        <meta http-equiv=refresh content=...> */
@@ -98,6 +101,7 @@ downloaded_file_t downloaded_file (downloaded_file_t, const char *);
 void register_download (const char *, const char *);
 void register_redirection (const char *, const char *);
 void register_html (const char *, const char *);
+void register_css (const char *, const char *);
 void register_delete_file (const char *);
 void convert_all_links (void);
 void convert_cleanup (void);
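
Note: per the src/ChangeLog entry above, the actual call site for register_css is in
retr.c (retrieve_url), keyed off the new TEXTCSS dt flag added to wget.h. That retr.c
hunk is not shown in this diff, so the following dispatch is a hedged sketch — treat
the exact condition and variable names as assumptions:

    /* Hedged sketch, not the commit's literal retr.c code: after a
       download, record HTML and CSS files so convert_all_links () can
       later rewrite their links.  `dt' is Wget's document-type flag
       word; TEXTCSS is the flag this commit adds in wget.h. */
    if (dt & TEXTHTML)
      register_html (url, local_file);
    else if (dt & TEXTCSS)
      register_css (url, local_file);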

src/css-tokens.h | 66 (new file)
@@ -0,0 +1,66 @@
+/* Declarations for css.lex
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Wget.
+
+GNU Wget is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+GNU Wget is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Wget; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+In addition, as a special exception, the Free Software Foundation
+gives permission to link the code of its release of Wget with the
+OpenSSL project's "OpenSSL" library (or with modified versions of it
+that use the same license as the "OpenSSL" library), and distribute
+the linked executables.  You must obey the GNU General Public License
+in all respects for all of the code used other than "OpenSSL".  If you
+modify this file, you may extend this exception to your version of the
+file, but you are not obligated to do so.  If you do not wish to do
+so, delete this exception statement from your version. */
+
+#ifndef CSS_TOKENS_H
+#define CSS_TOKENS_H
+
+enum {
+  CSSEOF,
+  S,
+  CDO,
+  CDC,
+  INCLUDES,
+  DASHMATCH,
+  LBRACE,
+  PLUS,
+  GREATER,
+  COMMA,
+  STRING,
+  INVALID,
+  IDENT,
+  HASH,
+  IMPORT_SYM,
+  PAGE_SYM,
+  MEDIA_SYM,
+  CHARSET_SYM,
+  IMPORTANT_SYM,
+  EMS,
+  EXS,
+  LENGTH,
+  ANGLE,
+  TIME,
+  FREQ,
+  DIMENSION,
+  PERCENTAGE,
+  NUMBER,
+  URI,
+  FUNCTION
+} css_tokens;
+
+#endif /* CSS_TOKENS_H */

src/css-url.c | 273 (new file)
@@ -0,0 +1,273 @@
+/* Collect URLs from CSS source.
+   Copyright (C) 1998, 2000, 2001, 2002, 2003 Free Software Foundation, Inc.
+
+This file is part of GNU Wget.
+
+GNU Wget is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+GNU Wget is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Wget; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+In addition, as a special exception, the Free Software Foundation
+gives permission to link the code of its release of Wget with the
+OpenSSL project's "OpenSSL" library (or with modified versions of it
+that use the same license as the "OpenSSL" library), and distribute
+the linked executables.  You must obey the GNU General Public License
+in all respects for all of the code used other than "OpenSSL".  If you
+modify this file, you may extend this exception to your version of the
+file, but you are not obligated to do so.  If you do not wish to do
+so, delete this exception statement from your version. */
+
+/*
+  Note that this is not an actual CSS parser, but just a lexical
+  scanner with a tiny bit more smarts bolted on top.  A full parser
+  is somewhat overkill for this job.  The only things we're interested
+  in are @import rules and url() tokens, so it's easy enough to
+  grab those without truly understanding the input.  The only downside
+  to this is that we might be coerced into downloading files that
+  a browser would ignore.  That might merit some more investigation.
+*/
+
+#include <config.h>
+
+#include <stdio.h>
+#ifdef HAVE_STRING_H
+# include <string.h>
+#else
+# include <strings.h>
+#endif
+#include <stdlib.h>
+#include <ctype.h>
+#include <errno.h>
+#include <assert.h>
+
+#include "wget.h"
+#include "utils.h"
+#include "convert.h"
+#include "html-url.h"
+#include "css-tokens.h"
+
+/* from lex.yy.c */
+extern char *yytext;
+extern int yyleng;
+typedef struct yy_buffer_state *YY_BUFFER_STATE;
+extern YY_BUFFER_STATE yy_scan_bytes (const char *bytes,int len );
+extern int yylex (void);
+
+#if 1
+const char *token_names[] = {
+  "CSSEOF",
+  "S",
+  "CDO",
+  "CDC",
+  "INCLUDES",
+  "DASHMATCH",
+  "LBRACE",
+  "PLUS",
+  "GREATER",
+  "COMMA",
+  "STRING",
+  "INVALID",
+  "IDENT",
+  "HASH",
+  "IMPORT_SYM",
+  "PAGE_SYM",
+  "MEDIA_SYM",
+  "CHARSET_SYM",
+  "IMPORTANT_SYM",
+  "EMS",
+  "EXS",
+  "LENGTH",
+  "ANGLE",
+  "TIME",
+  "FREQ",
+  "DIMENSION",
+  "PERCENTAGE",
+  "NUMBER",
+  "URI",
+  "FUNCTION"
+};
+#endif
+
+/*
+  Given a detected URI token, get only the URI specified within.
+  Also adjust the starting position and length of the string.
+
+  A URI can be specified with or without quotes, and the quotes
+  can be single or double quotes.  In addition there can be
+  whitespace after the opening parenthesis and before the closing
+  parenthesis.
+*/
+char *
+get_uri_string (const char *at, int *pos, int *length)
+{
+  char *uri;
+  /*char buf[1024];
+  strncpy(buf,at + *pos, *length);
+  buf[*length] = '\0';
+  DEBUGP (("get_uri_string: \"%s\"\n", buf));*/
+
+  if (0 != strncasecmp (at + *pos, "url(", 4))
+    return NULL;
+
+  *pos += 4;
+  *length -= 5; /* url() */
+  /* skip leading space */
+  while (isspace (at[*pos]))
+    {
+      (*pos)++;
+      (*length)--;
+    }
+  /* skip trailing space */
+  while (isspace (at[*pos + *length - 1]))
+    {
+      (*length)--;
+    }
+  /* trim off quotes */
+  if (at[*pos] == '\'' || at[*pos] == '"')
+    {
+      (*pos)++;
+      *length -= 2;
+    }
+
+  uri = xmalloc (*length + 1);
+  if (uri)
+    {
+      strncpy (uri, at + *pos, *length);
+      uri[*length] = '\0';
+    }
+
+  return uri;
+}
+
+void
+get_urls_css (struct map_context *ctx, int offset, int buf_length)
+{
+  int token;
+  /*char tmp[2048];*/
+  int buffer_pos = 0;
+  int pos, length;
+  char *uri;
+
+  /*
+  strncpy(tmp,ctx->text + offset, buf_length);
+  tmp[buf_length] = '\0';
+  DEBUGP (("get_urls_css: \"%s\"\n", tmp));
+  */
+
+  /* tell flex to scan from this buffer */
+  yy_scan_bytes (ctx->text + offset, buf_length);
+
+  while((token = yylex()) != CSSEOF)
+    {
+      /*DEBUGP (("%s ", token_names[token]));*/
+      /* @import "foo.css"
+         or @import url(foo.css)
+      */
+      if(token == IMPORT_SYM)
+        {
+          do {
+            buffer_pos += yyleng;
+          } while((token = yylex()) == S);
+
+          /*DEBUGP (("%s ", token_names[token]));*/
+
+          if (token == STRING || token == URI)
+            {
+              /*DEBUGP (("Got URI "));*/
+              pos = buffer_pos + offset;
+              length = yyleng;
+
+              if (token == URI)
+                {
+                  uri = get_uri_string (ctx->text, &pos, &length);
+                }
+              else
+                {
+                  /* cut out quote characters */
+                  pos++;
+                  length -= 2;
+                  uri = xmalloc (length + 1);
+                  strncpy (uri, yytext + 1, length);
+                  uri[length] = '\0';
+                }
+
+              if (uri)
+                {
+                  struct urlpos *up = append_url (uri, pos, length, ctx);
+                  DEBUGP (("Found @import: [%s] at %d [%s]\n", yytext, buffer_pos, uri));
+
+                  if (up)
+                    {
+                      up->link_inline_p = 1;
+                      up->link_css_p = 1;
+                      up->link_expect_css = 1;
+                    }
+
+                  xfree(uri);
+                }
+            }
+        }
+      /* background-image: url(foo.png)
+         note that we don't care what
+         property this is actually on.
+      */
+      else if(token == URI)
+        {
+          pos = buffer_pos + offset;
+          length = yyleng;
+          uri = get_uri_string (ctx->text, &pos, &length);
+
+          if (uri)
+            {
+              struct urlpos *up = append_url (uri, pos, length, ctx);
+              DEBUGP (("Found URI: [%s] at %d [%s]\n", yytext, buffer_pos, uri));
+              if (up)
+                {
+                  up->link_inline_p = 1;
+                  up->link_css_p = 1;
+                }
+
+              xfree (uri);
+            }
+        }
+      buffer_pos += yyleng;
+    }
+  DEBUGP (("\n"));
+}
+
+struct urlpos *
+get_urls_css_file (const char *file, const char *url)
+{
+  struct file_memory *fm;
+  struct map_context ctx;
+
+  /* Load the file. */
+  fm = read_file (file);
+  if (!fm)
+    {
+      logprintf (LOG_NOTQUIET, "%s: %s\n", file, strerror (errno));
+      return NULL;
+    }
+  DEBUGP (("Loaded %s (size %s).\n", file, number_to_static_string (fm->length)));
+
+  ctx.text = fm->content;
+  ctx.head = ctx.tail = NULL;
+  ctx.base = NULL;
+  ctx.parent_base = url ? url : opt.base_href;
+  ctx.document_file = file;
+  ctx.nofollow = 0;
+
+  get_urls_css (&ctx, 0, fm->length);
+  read_file_free (fm);
+  return ctx.head;
+}

src/css-url.h | 36 (new file)
@@ -0,0 +1,36 @@
+/* Declarations for css-url.c.
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Wget.
+
+GNU Wget is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+GNU Wget is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Wget; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+In addition, as a special exception, the Free Software Foundation
+gives permission to link the code of its release of Wget with the
+OpenSSL project's "OpenSSL" library (or with modified versions of it
+that use the same license as the "OpenSSL" library), and distribute
+the linked executables.  You must obey the GNU General Public License
+in all respects for all of the code used other than "OpenSSL".  If you
+modify this file, you may extend this exception to your version of the
+file, but you are not obligated to do so.  If you do not wish to do
+so, delete this exception statement from your version. */
+
+#ifndef CSS_URL_H
+#define CSS_URL_H
+
+void get_urls_css (struct map_context *, int, int);
+struct urlpos *get_urls_css_file (const char *, const char *);
+
+#endif /* CSS_URL_H */

src/css.l | 137 (new file)
@@ -0,0 +1,137 @@
+%option case-insensitive
+%option noyywrap
+%option never-interactive
+
+%{
+/* Lex source for CSS tokenizing.
+   Taken from http://www.w3.org/TR/CSS21/grammar.html#q2
+   Copyright (C) 2006 Free Software Foundation, Inc.
+
+This file is part of GNU Wget.
+
+GNU Wget is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+GNU Wget is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Wget; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+In addition, as a special exception, the Free Software Foundation
+gives permission to link the code of its release of Wget with the
+OpenSSL project's "OpenSSL" library (or with modified versions of it
+that use the same license as the "OpenSSL" library), and distribute
+the linked executables.  You must obey the GNU General Public License
+in all respects for all of the code used other than "OpenSSL".  If you
+modify this file, you may extend this exception to your version of the
+file, but you are not obligated to do so.  If you do not wish to do
+so, delete this exception statement from your version. */
+
+#include "css-tokens.h"
+
+/* {s}+\/\*[^*]*\*+([^/*][^*]*\*+)*\/ {unput(' '); } */
+/*replace by space*/
+%}
+
+h [0-9a-f]
+nonascii [\200-\377]
+unicode \\{h}{1,6}(\r\n|[ \t\r\n\f])?
+escape {unicode}|\\[^\r\n\f0-9a-f]
+nmstart [_a-z]|{nonascii}|{escape}
+nmchar [_a-z0-9-]|{nonascii}|{escape}
+string1 \"([^\n\r\f\\"]|\\{nl}|{escape})*\"
+string2 \'([^\n\r\f\\']|\\{nl}|{escape})*\'
+invalid1 \"([^\n\r\f\\"]|\\{nl}|{escape})*
+invalid2 \'([^\n\r\f\\']|\\{nl}|{escape})*
+
+comment \/\*[^*]*\*+([^/*][^*]*\*+)*\/
+ident -?{nmstart}{nmchar}*
+name {nmchar}+
+num [0-9]+|[0-9]*"."[0-9]+
+string {string1}|{string2}
+invalid {invalid1}|{invalid2}
+url ([!#$%&*-~]|{nonascii}|{escape})*
+s [ \t\r\n\f]
+w ({s}|{comment})*
+nl \n|\r\n|\r|\f
+
+A a|\\0{0,4}(41|61)(\r\n|[ \t\r\n\f])?
+C c|\\0{0,4}(43|63)(\r\n|[ \t\r\n\f])?
+D d|\\0{0,4}(44|64)(\r\n|[ \t\r\n\f])?
+E e|\\0{0,4}(45|65)(\r\n|[ \t\r\n\f])?
+G g|\\0{0,4}(47|67)(\r\n|[ \t\r\n\f])?|\\g
+H h|\\0{0,4}(48|68)(\r\n|[ \t\r\n\f])?|\\h
+I i|\\0{0,4}(49|69)(\r\n|[ \t\r\n\f])?|\\i
+K k|\\0{0,4}(4b|6b)(\r\n|[ \t\r\n\f])?|\\k
+M m|\\0{0,4}(4d|6d)(\r\n|[ \t\r\n\f])?|\\m
+N n|\\0{0,4}(4e|6e)(\r\n|[ \t\r\n\f])?|\\n
+P p|\\0{0,4}(50|70)(\r\n|[ \t\r\n\f])?|\\p
+R r|\\0{0,4}(52|72)(\r\n|[ \t\r\n\f])?|\\r
+S s|\\0{0,4}(53|73)(\r\n|[ \t\r\n\f])?|\\s
+T t|\\0{0,4}(54|74)(\r\n|[ \t\r\n\f])?|\\t
+X x|\\0{0,4}(58|78)(\r\n|[ \t\r\n\f])?|\\x
+Z z|\\0{0,4}(5a|7a)(\r\n|[ \t\r\n\f])?|\\z
+
+%%
+
+{s} {return S;}
+
+\/\*[^*]*\*+([^/*][^*]*\*+)*\/ {return S;} /* ignore comments */
+
+"<!--" {return CDO;}
+"-->" {return CDC;}
+"~=" {return INCLUDES;}
+"|=" {return DASHMATCH;}
+
+{w}"{" {return LBRACE;}
+{w}"+" {return PLUS;}
+{w}">" {return GREATER;}
+{w}"," {return COMMA;}
+
+{string} {return STRING;}
+{invalid} {return INVALID; /* unclosed string */}
+
+{ident} {return IDENT;}
+
+"#"{name} {return HASH;}
+
+"@import" {return IMPORT_SYM;}
+"@page" {return PAGE_SYM;}
+"@media" {return MEDIA_SYM;}
+"@charset " {return CHARSET_SYM;}
+
+"!"{w}"important" {return IMPORTANT_SYM;}
+
+{num}{E}{M} {return EMS;}
+{num}{E}{X} {return EXS;}
+{num}{P}{X} {return LENGTH;}
+{num}{C}{M} {return LENGTH;}
+{num}{M}{M} {return LENGTH;}
+{num}{I}{N} {return LENGTH;}
+{num}{P}{T} {return LENGTH;}
+{num}{P}{C} {return LENGTH;}
+{num}{D}{E}{G} {return ANGLE;}
+{num}{R}{A}{D} {return ANGLE;}
+{num}{G}{R}{A}{D} {return ANGLE;}
+{num}{M}{S} {return TIME;}
+{num}{S} {return TIME;}
+{num}{H}{Z} {return FREQ;}
+{num}{K}{H}{Z} {return FREQ;}
+{num}{ident} {return DIMENSION;}
+
+{num}% {return PERCENTAGE;}
+{num} {return NUMBER;}
+
+"url("{w}{string}{w}")" {return URI;}
+"url("{w}{url}{w}")" {return URI;}
+{ident}"(" {return FUNCTION;}
+
+. {return *yytext;}
+
+%%
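
A small standalone driver can make the scanner's behavior concrete. This is not part
of the commit; it reuses the extern declarations css-url.c makes against the
generated scanner, and assumes you link it with the output of `flex css.l`:

    /* Hypothetical test driver (an assumption, not commit code):
       tokenize a CSS snippet and print each raw token. */
    #include <stdio.h>
    #include "css-tokens.h"

    extern char *yytext;
    extern int yyleng;
    typedef struct yy_buffer_state *YY_BUFFER_STATE;
    extern YY_BUFFER_STATE yy_scan_bytes (const char *bytes, int len);
    extern int yylex (void);

    int
    main (void)
    {
      const char css[] =
        "@import url(imported.css);\n"
        "body { background: url('bg.png'); }\n";
      int token;

      yy_scan_bytes (css, sizeof css - 1);
      while ((token = yylex ()) != CSSEOF)
        printf ("token %d: %.*s\n", token, yyleng, yytext);
      return 0;
    }

Run against the snippet above, the IMPORT_SYM and URI tokens it prints are exactly
what get_urls_css keys on.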

src/ftp.c | 10
@@ -805,8 +805,14 @@ Error in server response, closing control connection.\n"));
             }
           f = f->next;
         }
-      if (!exists)
+      if (exists)
         {
+          logputs (LOG_VERBOSE, "\n");
+          logprintf (LOG_NOTQUIET, _("File %s exists.\n"),
+                     quote (u->file));
+        }
+      else
+        {
           logputs (LOG_VERBOSE, "\n");
           logprintf (LOG_NOTQUIET, _("No such file %s.\n"),
                      quote (u->file));

src/html-parse.c | 128
@@ -272,6 +272,94 @@ struct pool {
    to "<foo", but "<,foo" to "<,foo".  */
 #define SKIP_SEMI(p, inc) (p += inc, p < end && *p == ';' ? ++p : p)
 
+struct tagstack_item {
+  const char *tagname_begin;
+  const char *tagname_end;
+  const char *contents_begin;
+  struct tagstack_item *prev;
+  struct tagstack_item *next;
+};
+
+struct tagstack_item *
+tagstack_push (struct tagstack_item **head, struct tagstack_item **tail)
+{
+  struct tagstack_item *ts = xmalloc (sizeof (struct tagstack_item));
+  if (*head == NULL)
+    {
+      *head = *tail = ts;
+      ts->prev = ts->next = NULL;
+    }
+  else
+    {
+      (*tail)->next = ts;
+      ts->prev = *tail;
+      *tail = ts;
+      ts->next = NULL;
+    }
+
+  return ts;
+}
+
+/* remove ts and everything after it from the stack */
+void
+tagstack_pop (struct tagstack_item **head, struct tagstack_item **tail,
+              struct tagstack_item *ts)
+{
+  if (*head == NULL)
+    return;
+
+  if (ts == *tail)
+    {
+      if (ts == *head)
+        {
+          xfree (ts);
+          *head = *tail = NULL;
+        }
+      else
+        {
+          ts->prev->next = NULL;
+          *tail = ts->prev;
+          xfree (ts);
+        }
+    }
+  else
+    {
+      if (ts == *head)
+        {
+          *head = NULL;
+        }
+      *tail = ts->prev;
+
+      if (ts->prev)
+        {
+          ts->prev->next = NULL;
+        }
+      while (ts)
+        {
+          struct tagstack_item *p = ts->next;
+          xfree (ts);
+          ts = p;
+        }
+    }
+}
+
+struct tagstack_item *
+tagstack_find (struct tagstack_item *tail, const char *tagname_begin,
+               const char *tagname_end)
+{
+  int len = tagname_end - tagname_begin;
+  while (tail)
+    {
+      if (len == (tail->tagname_end - tail->tagname_begin))
+        {
+          if (0 == strncasecmp (tail->tagname_begin, tagname_begin, len))
+            return tail;
+        }
+      tail = tail->prev;
+    }
+  return NULL;
+}
+
 /* Decode the HTML character entity at *PTR, considering END to be end
    of buffer.  It is assumed that the "&" character that marks the
    beginning of the entity has been seen at *PTR-1.  If a recognized
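
Taken together, push/pop/find maintain a doubly-linked stack of currently open tags so that when a closing tag arrives the parser can recover where the element's contents began. A minimal standalone sketch of that lifecycle (hypothetical driver code, using plain malloc/free in place of Wget's xmalloc/xfree):

    #include <stdlib.h>
    #include <strings.h>              /* strncasecmp */

    struct item {                     /* same shape as tagstack_item */
      const char *name_begin, *name_end, *contents_begin;
      struct item *prev, *next;
    };

    int
    main (void)
    {
      const char *html = "<style>body{}</style>";
      struct item *head = NULL, *tail = NULL;

      /* "<style>" seen: push an item remembering the name span and
         where the contents start (just past the '>').  */
      struct item *ts = malloc (sizeof *ts);
      ts->name_begin = html + 1;      /* "style" occupies bytes 1..5 */
      ts->name_end = html + 6;
      ts->contents_begin = html + 7;
      ts->prev = ts->next = NULL;
      head = tail = ts;

      /* "</style>" seen at byte 13: walk back from the tail to find
         the matching open tag, case-insensitively.  */
      struct item *found = tail;
      while (found && !(found->name_end - found->name_begin == 5
                        && !strncasecmp (found->name_begin, "style", 5)))
        found = found->prev;

      if (found)
        {
          /* contents span [found->contents_begin, html + 13): "body{}" */
          free (found);               /* pop */
          head = tail = NULL;
        }
      return 0;
    }
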
@@ -757,6 +845,9 @@ map_html_tags (const char *text, int size,
   bool attr_pair_resized = false;
   struct attr_pair *pairs = attr_pair_initial_storage;
 
+  struct tagstack_item *head = NULL;
+  struct tagstack_item *tail = NULL;
+
   if (!size)
     return;
 
@@ -823,6 +914,18 @@ map_html_tags (const char *text, int size,
     goto look_for_tag;
   tag_name_end = p;
   SKIP_WS (p);
+
+  if (!end_tag)
+    {
+      struct tagstack_item *ts = tagstack_push (&head, &tail);
+      if (ts)
+        {
+          ts->tagname_begin  = tag_name_begin;
+          ts->tagname_end    = tag_name_end;
+          ts->contents_begin = NULL;
+        }
+    }
+
   if (end_tag && *p != '>')
     goto backout_tag;
 
@@ -984,6 +1087,11 @@ map_html_tags (const char *text, int size,
       ++nattrs;
     }
 
+  if (!end_tag && tail && (tail->tagname_begin == tag_name_begin))
+    {
+      tail->contents_begin = p+1;
+    }
+
   if (uninteresting_tag)
     {
       ADVANCE (p);
@@ -995,6 +1103,7 @@ map_html_tags (const char *text, int size,
     {
       int i;
      struct taginfo taginfo;
+      struct tagstack_item *ts = NULL;
 
       taginfo.name      = pool.contents;
       taginfo.end_tag_p = end_tag;
@@ -1011,6 +1120,23 @@ map_html_tags (const char *text, int size,
       taginfo.attrs = pairs;
       taginfo.start_position = tag_start_position;
       taginfo.end_position   = p + 1;
+      taginfo.contents_begin = NULL;
+      taginfo.contents_end = NULL;
+
+      if (end_tag)
+        {
+          ts = tagstack_find (tail, tag_name_begin, tag_name_end);
+          if (ts)
+            {
+              if (ts->contents_begin)
+                {
+                  taginfo.contents_begin = ts->contents_begin;
+                  taginfo.contents_end   = tag_start_position;
+                }
+              tagstack_pop (&head, &tail, ts);
+            }
+        }
+
       mapfun (&taginfo, maparg);
       ADVANCE (p);
     }
@@ -1030,6 +1156,8 @@ map_html_tags (const char *text, int size,
   POOL_FREE (&pool);
   if (attr_pair_resized)
     xfree (pairs);
+  /* pop any tag stack that's left */
+  tagstack_pop (&head, &tail, head);
 }
 
 #undef ADVANCE
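
With these hooks in place, any mapper callback handed to map_html_tags can read the contents of elements it cares about. A small hypothetical caller (assuming the declarations from html-parse.h; this sketch also assumes that passing NULL for the tag and attribute whitelists means "no filtering", which is how the patch uses the tag argument):

    #include <stdio.h>

    static void
    print_tag (struct taginfo *tag, void *arg)
    {
      (void) arg;
      printf ("%s%s", tag->end_tag_p ? "/" : "", tag->name);
      if (tag->end_tag_p && tag->contents_begin && tag->contents_end)
        printf (" (%d bytes of contents)",
                (int) (tag->contents_end - tag->contents_begin));
      putchar ('\n');
    }

    /* ... with a document loaded into buf/len: */
    map_html_tags (buf, len, print_tag, NULL, 0, NULL, NULL);
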
src/html-parse.h

@@ -52,6 +52,9 @@ struct taginfo {
 
   const char *start_position;   /* start position of tag */
   const char *end_position;     /* end position of tag */
+
+  const char *contents_begin;   /* delimiters of tag contents */
+  const char *contents_end;     /* only valid if end_tag_p */
 };
 
 struct hash_table;              /* forward declaration */
117  src/html-url.c
@@ -41,11 +41,11 @@ as that of the covered work.  */
 #include "utils.h"
 #include "hash.h"
 #include "convert.h"
-#include "recur.h"              /* declaration of get_urls_html */
+#include "recur.h"
+#include "html-url.h"
+#include "css-url.h"
 #include "iri.h"
 
-struct map_context;
-
 typedef void (*tag_handler_t) (int, struct taginfo *, struct map_context *);
 
 #define DECLARE_TAG_HANDLER(fun) \
@@ -164,11 +164,12 @@ static struct {
    from the information above.  However, some places in the code refer
    to the attributes not mentioned here.  We add them manually.  */
 static const char *additional_attributes[] = {
   "rel",                        /* used by tag_handle_link  */
   "http-equiv",                 /* used by tag_handle_meta  */
   "name",                       /* used by tag_handle_meta  */
   "content",                    /* used by tag_handle_meta  */
-  "action"                      /* used by tag_handle_form  */
+  "action",                     /* used by tag_handle_form  */
+  "style"                       /* used by check_style_attr */
 };
 
 static struct hash_table *interesting_tags;
@@ -247,28 +248,20 @@ find_attr (struct taginfo *tag, const char *name, int *attrind)
   return NULL;
 }
 
-struct map_context {
-  char *text;                   /* HTML text. */
-  char *base;                   /* Base URI of the document, possibly
-                                   changed through <base href=...>. */
-  const char *parent_base;      /* Base of the current document. */
-  const char *document_file;    /* File name of this document. */
-  bool nofollow;                /* whether NOFOLLOW was specified in a
-                                   <meta name=robots> tag. */
-
-  struct urlpos *head, *tail;   /* List of URLs that is being
-                                   built. */
-};
+/* used for calls to append_url */
+#define ATTR_POS(tag, attrind, ctx) \
+ (tag->attrs[attrind].value_raw_beginning - ctx->text)
+#define ATTR_SIZE(tag, attrind) \
+ (tag->attrs[attrind].value_raw_size)
 
 /* Append LINK_URI to the urlpos structure that is being built.
 
-   LINK_URI will be merged with the current document base.  TAG and
-   ATTRIND are the necessary context to store the position and
-   size.  */
+   LINK_URI will be merged with the current document base.
+*/
 
-static struct urlpos *
-append_url (const char *link_uri,
-            struct taginfo *tag, int attrind, struct map_context *ctx)
+struct urlpos *
+append_url (const char *link_uri, int position, int size,
+            struct map_context *ctx)
 {
   int link_has_scheme = url_has_scheme (link_uri);
   struct urlpos *newel;
@@ -330,8 +323,8 @@ append_url (const char *link_uri,
 
   newel = xnew0 (struct urlpos);
   newel->url = url;
-  newel->pos = tag->attrs[attrind].value_raw_beginning - ctx->text;
-  newel->size = tag->attrs[attrind].value_raw_size;
+  newel->pos = position;
+  newel->size = size;
 
   /* A URL is relative if the host is not named, and the name does not
      start with `/'.  */
@@ -351,6 +344,18 @@ append_url (const char *link_uri,
   return newel;
 }
 
+static void
+check_style_attr (struct taginfo *tag, struct map_context *ctx)
+{
+  int attrind;
+  char *style = find_attr (tag, "style", &attrind);
+  if (!style)
+    return;
+
+  /* raw pos and raw size include the quotes, hence the +1 -2 */
+  get_urls_css (ctx, ATTR_POS(tag,attrind,ctx)+1, ATTR_SIZE(tag,attrind)-2);
+}
+
 /* All the tag_* functions are called from collect_tags_mapper, as
    specified by KNOWN_TAGS.  */
 
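
The +1/-2 adjustment is easy to misread, so a worked example may help (a runnable sketch with made-up offsets, not code from the patch). The raw attribute value recorded by the parser includes the surrounding quotes; the CSS scanner wants the span between them:

    #include <stdio.h>

    int
    main (void)
    {
      const char *text = "<div style=\"color:red\">";
      int raw_pos = 11;   /* ATTR_POS: offset of the opening quote   */
      int raw_size = 11;  /* ATTR_SIZE: strlen ("\"color:red\"")     */

      /* The +1/-2 passed to get_urls_css strips the quotes: */
      printf ("%.*s\n", raw_size - 2, text + raw_pos + 1);  /* color:red */
      return 0;
    }
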
@@ -399,7 +404,8 @@ tag_find_urls (int tagid, struct taginfo *tag, struct map_context *ctx)
         if (0 == strcasecmp (tag->attrs[attrind].name,
                              tag_url_attributes[i].attr_name))
           {
-            struct urlpos *up = append_url (link, tag, attrind, ctx);
+            struct urlpos *up = append_url (link, ATTR_POS(tag,attrind,ctx),
+                                            ATTR_SIZE(tag,attrind), ctx);
             if (up)
               {
                 int flags = tag_url_attributes[i].flags;
@@ -424,7 +430,8 @@ tag_handle_base (int tagid, struct taginfo *tag, struct map_context *ctx)
   if (!newbase)
     return;
 
-  base_urlpos = append_url (newbase, tag, attrind, ctx);
+  base_urlpos = append_url (newbase, ATTR_POS(tag,attrind,ctx),
+                            ATTR_SIZE(tag,attrind), ctx);
   if (!base_urlpos)
     return;
   base_urlpos->ignore_when_downloading = 1;
@@ -445,9 +452,11 @@ tag_handle_form (int tagid, struct taginfo *tag, struct map_context *ctx)
 {
   int attrind;
   char *action = find_attr (tag, "action", &attrind);
+
   if (action)
     {
-      struct urlpos *up = append_url (action, tag, attrind, ctx);
+      struct urlpos *up = append_url (action, ATTR_POS(tag,attrind,ctx),
+                                      ATTR_SIZE(tag,attrind), ctx);
       if (up)
         up->ignore_when_downloading = 1;
     }
@@ -470,14 +479,23 @@ tag_handle_link (int tagid, struct taginfo *tag, struct map_context *ctx)
   */
   if (href)
     {
-      struct urlpos *up = append_url (href, tag, attrind, ctx);
+      struct urlpos *up = append_url (href, ATTR_POS(tag,attrind,ctx),
+                                      ATTR_SIZE(tag,attrind), ctx);
       if (up)
         {
           char *rel = find_attr (tag, "rel", NULL);
-          if (rel
-              && (0 == strcasecmp (rel, "stylesheet")
-                  || 0 == strcasecmp (rel, "shortcut icon")))
-            up->link_inline_p = 1;
+          if (rel)
+            {
+              if (0 == strcasecmp (rel, "stylesheet"))
+                {
+                  up->link_inline_p = 1;
+                  up->link_expect_css = 1;
+                }
+              else if (0 == strcasecmp (rel, "shortcut icon"))
+                {
+                  up->link_inline_p = 1;
+                }
+            }
           else
             /* The external ones usually point to HTML pages, such as
                <link rel="next" href="...">  */
@@ -531,7 +549,8 @@ tag_handle_meta (int tagid, struct taginfo *tag, struct map_context *ctx)
       while (c_isspace (*p))
         ++p;
 
-      entry = append_url (p, tag, attrind, ctx);
+      entry = append_url (p, ATTR_POS(tag,attrind,ctx),
+                          ATTR_SIZE(tag,attrind), ctx);
       if (entry)
         {
           entry->link_refresh_p = 1;
@@ -595,11 +614,26 @@ collect_tags_mapper (struct taginfo *tag, void *arg)
   struct map_context *ctx = (struct map_context *)arg;
 
   /* Find the tag in our table of tags.  This must not fail because
-     map_html_tags only returns tags found in interesting_tags.  */
+     map_html_tags only returns tags found in interesting_tags.
+
+     I've changed this for now, I'm passing NULL as interesting_tags
+     to map_html_tags.  This way we can check all tags for a style
+     attribute.
+  */
   struct known_tag *t = hash_table_get (interesting_tags, tag->name);
-  assert (t != NULL);
 
-  t->handler (t->tagid, tag, ctx);
+  if (t != NULL)
+    t->handler (t->tagid, tag, ctx);
+
+  check_style_attr (tag, ctx);
+
+  if (tag->end_tag_p && (0 == strcasecmp (tag->name, "style")) &&
+      tag->contents_begin && tag->contents_end)
+    {
+      /* parse contents */
+      get_urls_css (ctx, tag->contents_begin - ctx->text,
+                    tag->contents_end - tag->contents_begin);
+    }
 }
 
 /* Analyze HTML tags FILE and construct a list of URLs referenced from
@@ -643,8 +677,9 @@ get_urls_html (const char *file, const char *url, bool *meta_disallow_follow)
   if (opt.strict_comments)
     flags |= MHT_STRICT_COMMENTS;
 
+  /* the NULL here used to be interesting_tags */
   map_html_tags (fm->content, fm->length, collect_tags_mapper, &ctx, flags,
-                 interesting_tags, interesting_attributes);
+                 NULL, interesting_attributes);
 
   DEBUGP (("no-follow in %s: %d\n", file, ctx.nofollow));
   if (meta_disallow_follow)
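
Because the collect_tags_mapper change is split across several hunks, its resulting control flow is easier to see consolidated (a sketch of the function's new shape, not a verbatim copy):

    static void
    collect_tags_mapper (struct taginfo *tag, void *arg)
    {
      struct map_context *ctx = (struct map_context *) arg;

      /* Known tags keep their dedicated URL handlers... */
      struct known_tag *t = hash_table_get (interesting_tags, tag->name);
      if (t != NULL)
        t->handler (t->tagid, tag, ctx);

      /* ...but every tag is now inspected for an inline style="" ... */
      check_style_attr (tag, ctx);

      /* ...and a closing </style> hands its contents to the CSS parser. */
      if (tag->end_tag_p && 0 == strcasecmp (tag->name, "style")
          && tag->contents_begin && tag->contents_end)
        get_urls_css (ctx, tag->contents_begin - ctx->text,
                      tag->contents_end - tag->contents_begin);
    }

This only works because get_urls_html now passes NULL instead of interesting_tags to map_html_tags, so tags outside the known set reach the mapper at all.
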
51  src/html-url.h  (new file)
@@ -0,0 +1,51 @@
+/* Declarations for html-url.c.
+   Copyright (C) 1995, 1996, 1997 Free Software Foundation, Inc.
+
+This file is part of GNU Wget.
+
+GNU Wget is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+GNU Wget is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with Wget; if not, write to the Free Software
+Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+
+In addition, as a special exception, the Free Software Foundation
+gives permission to link the code of its release of Wget with the
+OpenSSL project's "OpenSSL" library (or with modified versions of it
+that use the same license as the "OpenSSL" library), and distribute
+the linked executables.  You must obey the GNU General Public License
+in all respects for all of the code used other than "OpenSSL".  If you
+modify this file, you may extend this exception to your version of the
+file, but you are not obligated to do so.  If you do not wish to do
+so, delete this exception statement from your version.  */
+
+#ifndef HTML_URL_H
+#define HTML_URL_H
+
+struct map_context {
+  char *text;                   /* HTML text. */
+  char *base;                   /* Base URI of the document, possibly
+                                   changed through <base href=...>. */
+  const char *parent_base;      /* Base of the current document. */
+  const char *document_file;    /* File name of this document. */
+  bool nofollow;                /* whether NOFOLLOW was specified in a
+                                   <meta name=robots> tag. */
+
+  struct urlpos *head, *tail;   /* List of URLs that is being
+                                   built. */
+};
+
+struct urlpos *get_urls_file (const char *);
+struct urlpos *get_urls_html (const char *, const char *, bool *);
+struct urlpos *append_url (const char *, int, int, struct map_context *);
+void free_urlpos (struct urlpos *);
+
+#endif /* HTML_URL_H */
82  src/http.c
@@ -70,11 +70,13 @@ as that of the covered work.  */
 extern char *version_string;
 
 /* Forward decls. */
+struct http_stat;
 static char *create_authorization_line (const char *, const char *,
                                         const char *, const char *,
                                         const char *, bool *);
 static char *basic_authentication_encode (const char *, const char *);
 static bool known_authentication_scheme_p (const char *, const char *);
+static void ensure_extension (struct http_stat *, const char *, int *);
 static void load_cookies (void);
 
 #ifndef MIN
@@ -87,6 +89,7 @@ static struct cookie_jar *wget_cookie_jar;
 
 #define TEXTHTML_S "text/html"
 #define TEXTXHTML_S "application/xhtml+xml"
+#define TEXTCSS_S "text/css"
 
 /* Some status code validation macros: */
 #define H_20X(x)        (((x) >= 200) && ((x) < 300))
@@ -2130,34 +2133,25 @@ File %s already there; not retrieving.\n\n"), quote (hs->local_file));
       else
         *dt &= ~TEXTHTML;
 
-      if (opt.html_extension && (*dt & TEXTHTML))
-        /* -E / --html-extension / html_extension = on was specified, and this is a
-           text/html file.  If some case-insensitive variation on ".htm[l]" isn't
-           already the file's suffix, tack on ".html". */
-        {
-          char *last_period_in_local_filename = strrchr (hs->local_file, '.');
-
-          if (last_period_in_local_filename == NULL
-              || !(0 == strcasecmp (last_period_in_local_filename, ".htm")
-                   || 0 == strcasecmp (last_period_in_local_filename, ".html")))
-            {
-              int local_filename_len = strlen (hs->local_file);
-              /* Resize the local file, allowing for ".html" preceded by
-                 optional ".NUMBER".  */
-              hs->local_file = xrealloc (hs->local_file,
-                                         local_filename_len + 24 + sizeof (".html"));
-              strcpy(hs->local_file + local_filename_len, ".html");
-              /* If clobbering is not allowed and the file, as named,
-                 exists, tack on ".NUMBER.html" instead. */
-              if (!ALLOW_CLOBBER && file_exists_p (hs->local_file))
-                {
-                  int ext_num = 1;
-                  do
-                    sprintf (hs->local_file + local_filename_len,
-                             ".%d.html", ext_num++);
-                  while (file_exists_p (hs->local_file));
-                }
-              *dt |= ADDED_HTML_EXTENSION;
-            }
-        }
+      if (type &&
+          0 == strncasecmp (type, TEXTCSS_S, strlen (TEXTCSS_S)))
+        *dt |= TEXTCSS;
+      else
+        *dt &= ~TEXTCSS;
+
+      if (opt.html_extension)
+        {
+          if (*dt & TEXTHTML)
+            /* -E / --html-extension / html_extension = on was specified,
+               and this is a text/html file.  If some case-insensitive
+               variation on ".htm[l]" isn't already the file's suffix,
+               tack on ".html". */
+            {
+              ensure_extension (hs, ".html", dt);
+            }
+          else if (*dt & TEXTCSS)
+            {
+              ensure_extension (hs, ".css", dt);
+            }
+        }
     }
 
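
Note that the Content-Type test uses a prefix comparison with strncasecmp rather than an exact match, so parameterised header values still qualify. A minimal standalone check (illustration only):

    #include <stdio.h>
    #include <string.h>
    #include <strings.h>

    #define TEXTCSS_S "text/css"

    int
    main (void)
    {
      /* Content-Type values as a server might send them. */
      const char *types[] = { "text/css", "text/css; charset=utf-8",
                              "text/html", NULL };
      int i;
      for (i = 0; types[i]; i++)
        printf ("%-26s -> %s\n", types[i],
                0 == strncasecmp (types[i], TEXTCSS_S, strlen (TEXTCSS_S))
                ? "TEXTCSS set" : "TEXTCSS cleared");
      return 0;
    }
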
@@ -3222,6 +3216,42 @@ http_cleanup (void)
   cookie_jar_delete (wget_cookie_jar);
 }
 
+void
+ensure_extension (struct http_stat *hs, const char *ext, int *dt)
+{
+  char *last_period_in_local_filename = strrchr (hs->local_file, '.');
+  char shortext[8];
+  int len = strlen (ext);
+  if (len == 5)
+    {
+      strncpy (shortext, ext, len - 1);
+      shortext[len - 2] = '\0';
+    }
+
+  if (last_period_in_local_filename == NULL
+      || !(0 == strcasecmp (last_period_in_local_filename, shortext)
+           || 0 == strcasecmp (last_period_in_local_filename, ext)))
+    {
+      int local_filename_len = strlen (hs->local_file);
+      /* Resize the local file, allowing for ".html" preceded by
+         optional ".NUMBER".  */
+      hs->local_file = xrealloc (hs->local_file,
+                                 local_filename_len + 24 + len);
+      strcpy (hs->local_file + local_filename_len, ext);
+      /* If clobbering is not allowed and the file, as named,
+         exists, tack on ".NUMBER.html" instead. */
+      if (!ALLOW_CLOBBER && file_exists_p (hs->local_file))
+        {
+          int ext_num = 1;
+          do
+            sprintf (hs->local_file + local_filename_len,
+                     ".%d%s", ext_num++, ext);
+          while (file_exists_p (hs->local_file));
+        }
+      *dt |= ADDED_HTML_EXTENSION;
+    }
+}
+
+
 #ifdef TESTING
 
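
Two details worth flagging in ensure_extension as committed here: the shortext terminator is written at index len - 2, which truncates the short form of ".html" to ".ht" rather than ".htm" (apparently an off-by-one slip), and when ext is four characters (".css") shortext is compared without ever being initialized. The intended suffix test itself is easy to model (a reduced, runnable sketch of the ".css" case; the real function also appends ".NUMBER" variants when clobbering is forbidden):

    #include <stdio.h>
    #include <string.h>
    #include <strings.h>

    /* Mimics ensure_extension's suffix test for ".css" (illustration). */
    static int
    needs_suffix (const char *local_file, const char *ext)
    {
      const char *dot = strrchr (local_file, '.');
      return dot == NULL || 0 != strcasecmp (dot, ext);
    }

    int
    main (void)
    {
      const char *files[] = { "a/style.css", "a/style.CSS", "a/style",
                              "a/style.php", NULL };
      int i;
      for (i = 0; files[i]; i++)
        printf ("%-12s -> %s\n", files[i],
                needs_suffix (files[i], ".css") ? "append .css" : "keep");
      return 0;
    }
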
src/main.c

@@ -422,7 +422,7 @@ Logging and input file:\n"),
     N_("\
   -nv, --no-verbose          turn off verboseness, without being quiet.\n"),
     N_("\
-  -i,  --input-file=FILE     download URLs found in FILE.\n"),
+  -i,  --input-file=FILE     download URLs found in local or external FILE.\n"),
     N_("\
   -F,  --force-html          treat input file as HTML.\n"),
     N_("\
@@ -615,7 +615,8 @@ Recursive download:\n"),
     N_("\
       --delete-after       delete files locally after downloading them.\n"),
     N_("\
-  -k,  --convert-links      make links in downloaded HTML point to local files.\n"),
+  -k,  --convert-links      make links in downloaded HTML or CSS point to\n\
+                            local files.\n"),
     N_("\
   -K,  --backup-converted   before converting file X, back up as X.orig.\n"),
     N_("\
59  src/recur.c
@@ -48,6 +48,8 @@ as that of the covered work.  */
 #include "hash.h"
 #include "res.h"
 #include "convert.h"
+#include "html-url.h"
+#include "css-url.h"
 #include "spider.h"
 #include "iri.h"
 
@@ -60,6 +62,8 @@ struct queue_element {
   bool html_allowed;            /* whether the document is allowed to
                                    be treated as HTML. */
   char *remote_encoding;
+  bool css_allowed;             /* whether the document is allowed to
+                                   be treated as CSS. */
   struct queue_element *next;   /* next element in queue */
 };
 
@@ -92,7 +96,8 @@ url_queue_delete (struct url_queue *queue)
 
 static void
 url_enqueue (struct url_queue *queue,
-             const char *url, const char *referer, int depth, bool html_allowed)
+             const char *url, const char *referer, int depth,
+             bool html_allowed, bool css_allowed)
 {
   struct queue_element *qel = xnew (struct queue_element);
   char *charset = get_current_charset ();
@@ -100,6 +105,7 @@ url_enqueue (struct url_queue *queue,
   qel->referer = referer;
   qel->depth = depth;
   qel->html_allowed = html_allowed;
+  qel->css_allowed = css_allowed;
   qel->next = NULL;
 
   if (charset)
@@ -130,7 +136,7 @@ url_enqueue (struct url_queue *queue,
 static bool
 url_dequeue (struct url_queue *queue,
              const char **url, const char **referer, int *depth,
-             bool *html_allowed)
+             bool *html_allowed, bool *css_allowed)
 {
   struct queue_element *qel = queue->head;
 
@@ -149,6 +155,7 @@ url_dequeue (struct url_queue *queue,
   *referer = qel->referer;
   *depth = qel->depth;
   *html_allowed = qel->html_allowed;
+  *css_allowed = qel->css_allowed;
 
   --queue->count;
 
@@ -216,7 +223,7 @@ retrieve_tree (const char *start_url)
 
   /* Enqueue the starting URL.  Use start_url_parsed->url rather than
      just URL so we enqueue the canonical form of the URL.  */
-  url_enqueue (queue, xstrdup (start_url_parsed->url), NULL, 0, true);
+  url_enqueue (queue, xstrdup (start_url_parsed->url), NULL, 0, true, false);
   string_set_add (blacklist, start_url_parsed->url);
 
   while (1)
@@ -224,7 +231,8 @@ retrieve_tree (const char *start_url)
       bool descend = false;
       char *url, *referer, *file = NULL;
       int depth;
-      bool html_allowed;
+      bool html_allowed, css_allowed;
+      bool is_css = false;
       bool dash_p_leaf_HTML = false;
 
       if (opt.quota && total_downloaded_bytes > opt.quota)
@@ -236,7 +244,7 @@ retrieve_tree (const char *start_url)
 
       if (!url_dequeue (queue,
                         (const char **)&url, (const char **)&referer,
-                        &depth, &html_allowed))
+                        &depth, &html_allowed, &css_allowed))
         break;
 
       /* ...and download it.  Note that this download is in most cases
@@ -254,10 +262,21 @@ retrieve_tree (const char *start_url)
           DEBUGP (("Already downloaded \"%s\", reusing it from \"%s\".\n",
                    url, file));
 
+          /* this sucks, needs to be combined! */
           if (html_allowed
               && downloaded_html_set
               && string_set_contains (downloaded_html_set, file))
-            descend = true;
+            {
+              descend = true;
+              is_css = false;
+            }
+          if (css_allowed
+              && downloaded_css_set
+              && string_set_contains (downloaded_css_set, file))
+            {
+              descend = true;
+              is_css = true;
+            }
         }
       else
         {
@@ -268,7 +287,21 @@ retrieve_tree (const char *start_url)
 
           if (html_allowed && file && status == RETROK
               && (dt & RETROKF) && (dt & TEXTHTML))
-            descend = true;
+            {
+              descend = true;
+              is_css = false;
+            }
+
+          /* a little different, css_allowed can override content type
+             lots of web servers serve css with an incorrect content type
+          */
+          if (file && status == RETROK
+              && (dt & RETROKF) &&
+              ((dt & TEXTCSS) || css_allowed))
+            {
+              descend = true;
+              is_css = true;
+            }
 
           if (redirected)
             {
@@ -322,14 +355,15 @@ retrieve_tree (const char *start_url)
             }
         }
 
-      /* If the downloaded document was HTML, parse it and enqueue the
+      /* If the downloaded document was HTML or CSS, parse it and enqueue the
         links it contains. */
 
       if (descend)
        {
          bool meta_disallow_follow = false;
          struct urlpos *children
-            = get_urls_html (file, url, &meta_disallow_follow);
+            = is_css ? get_urls_css_file (file, url) :
+                       get_urls_html (file, url, &meta_disallow_follow);
 
          if (opt.use_robots && meta_disallow_follow)
            {
@@ -363,7 +397,8 @@ retrieve_tree (const char *start_url)
                 {
                   url_enqueue (queue, xstrdup (child->url->url),
                                xstrdup (referer_url), depth + 1,
-                               child->link_expect_html);
+                               child->link_expect_html,
+                               child->link_expect_css);
                   /* We blacklist the URL we have enqueued, because we
                      don't want to enqueue (and hence download) the
                      same URL twice. */
@@ -412,9 +447,9 @@ retrieve_tree (const char *start_url)
     {
       char *d1, *d2;
       int d3;
-      bool d4;
+      bool d4, d5;
       while (url_dequeue (queue,
-                          (const char **)&d1, (const char **)&d2, &d3, &d4))
+                          (const char **)&d1, (const char **)&d2, &d3, &d4, &d5))
         {
           xfree (d1);
           xfree_null (d2);
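
The descend/is_css decision spread over the hunks above is worth seeing in isolation. A runnable sketch of the fresh-download case (illustration only; the real code also handles already-downloaded files via downloaded_html_set/downloaded_css_set, as shown above):

    #include <stdbool.h>
    #include <stdio.h>

    enum { RETROKF = 0x01, TEXTHTML = 0x02, TEXTCSS = 0x40 };

    static void
    decide (int dt, bool html_allowed, bool css_allowed,
            bool *descend, bool *is_css)
    {
      *descend = *is_css = false;
      if (html_allowed && (dt & RETROKF) && (dt & TEXTHTML))
        *descend = true;                   /* parse with get_urls_html */
      /* css_allowed can override the Content-Type: many servers send
         CSS with the wrong type, but <link rel="stylesheet"> told us
         what to expect. */
      if ((dt & RETROKF) && ((dt & TEXTCSS) || css_allowed))
        *descend = *is_css = true;         /* parse with get_urls_css_file */
    }

    int
    main (void)
    {
      bool d, c;
      decide (RETROKF | TEXTHTML, true, false, &d, &c);
      printf ("html page:  descend=%d is_css=%d\n", d, c);   /* 1 0 */
      decide (RETROKF, false, true, &d, &c);   /* mislabeled stylesheet */
      printf ("linked css: descend=%d is_css=%d\n", d, c);   /* 1 1 */
      return 0;
    }
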
src/recur.h

@@ -44,9 +44,4 @@ struct urlpos;
 void recursive_cleanup (void);
 uerr_t retrieve_tree (const char *);
 
-/* These are really in html-url.c. */
-struct urlpos *get_urls_file (const char *);
-struct urlpos *get_urls_html (const char *, const char *, bool *);
-void free_urlpos (struct urlpos *);
-
 #endif /* RECUR_H */
29  src/retr.c
@@ -52,6 +52,7 @@ as that of the covered work.  */
 #include "convert.h"
 #include "ptimer.h"
 #include "iri.h"
+#include "html-url.h"
 
 /* Total size of downloaded files.  Used to enforce quota.  */
 SUM_SIZE_INT total_downloaded_bytes;
@@ -795,6 +796,16 @@ retrieve_url (const char *origurl, char **file, char **newloc,
         register_redirection (origurl, u->url);
       if (*dt & TEXTHTML)
         register_html (u->url, local_file);
+      if (*dt & RETROKF)
+        {
+          register_download (u->url, local_file);
+          if (redirection_count && 0 != strcmp (origurl, u->url))
+            register_redirection (origurl, u->url);
+          if (*dt & TEXTHTML)
+            register_html (u->url, local_file);
+          if (*dt & TEXTCSS)
+            register_css (u->url, local_file);
+        }
     }
 
   if (file)
@@ -835,10 +846,24 @@ retrieve_from_file (const char *file, bool html, int *count)
   uerr_t status;
   struct urlpos *url_list, *cur_url;
 
-  url_list = (html ? get_urls_html (file, NULL, NULL)
-              : get_urls_file (file));
+  char *input_file = NULL;
+  const char *url = file;
+
   status = RETROK;             /* Suppose everything is OK. */
   *count = 0;                  /* Reset the URL count. */
 
+  if (url_has_scheme (url))
+    {
+      uerr_t status;
+      status = retrieve_url (url, &input_file, NULL, NULL, NULL, false);
+      if (status != RETROK)
+        return status;
+    }
+  else
+    input_file = (char *) file;
+
+  url_list = (html ? get_urls_html (input_file, NULL, NULL)
+              : get_urls_file (input_file));
+
   for (cur_url = url_list; cur_url; cur_url = cur_url->next, ++*count)
     {
src/wget.h

@@ -317,7 +317,8 @@ enum
   HEAD_ONLY            = 0x0004,        /* only send the HEAD request */
   SEND_NOCACHE         = 0x0008,        /* send Pragma: no-cache directive */
   ACCEPTRANGES         = 0x0010,        /* Accept-ranges header was found */
-  ADDED_HTML_EXTENSION = 0x0020         /* added ".html" extension due to -E */
+  ADDED_HTML_EXTENSION = 0x0020,        /* added ".html" extension due to -E */
+  TEXTCSS              = 0x0040         /* document is of type text/css */
 };
 
 /* Universal error type -- used almost everywhere.  Error reporting of
223  ylwrap  (new executable file)
@@ -0,0 +1,223 @@
+#! /bin/sh
+# ylwrap - wrapper for lex/yacc invocations.
+
+scriptversion=2005-05-14.22
+
+# Copyright (C) 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005
+# Free Software Foundation, Inc.
+#
+# Written by Tom Tromey <tromey@cygnus.com>.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2, or (at your option)
+# any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+# 02110-1301, USA.
+
+# As a special exception to the GNU General Public License, if you
+# distribute this file as part of a program that contains a
+# configuration script generated by Autoconf, you may include it under
+# the same distribution terms that you use for the rest of that program.
+
+# This file is maintained in Automake, please report
+# bugs to <bug-automake@gnu.org> or send patches to
+# <automake-patches@gnu.org>.
+
+case "$1" in
+  '')
+    echo "$0: No files given.  Try \`$0 --help' for more information." 1>&2
+    exit 1
+    ;;
+  --basedir)
+    basedir=$2
+    shift 2
+    ;;
+  -h|--h*)
+    cat <<\EOF
+Usage: ylwrap [--help|--version] INPUT [OUTPUT DESIRED]... -- PROGRAM [ARGS]...
+
+Wrapper for lex/yacc invocations, renaming files as desired.
+
+  INPUT is the input file
+  OUTPUT is one file PROG generates
+  DESIRED is the file we actually want instead of OUTPUT
+  PROGRAM is program to run
+  ARGS are passed to PROG
+
+Any number of OUTPUT,DESIRED pairs may be used.
+
+Report bugs to <bug-automake@gnu.org>.
+EOF
+    exit $?
+    ;;
+  -v|--v*)
+    echo "ylwrap $scriptversion"
+    exit $?
+    ;;
+esac
+
+
+# The input.
+input="$1"
+shift
+case "$input" in
+  [\\/]* | ?:[\\/]*)
+    # Absolute path; do nothing.
+    ;;
+  *)
+    # Relative path.  Make it absolute.
+    input="`pwd`/$input"
+    ;;
+esac
+
+pairlist=
+while test "$#" -ne 0; do
+  if test "$1" = "--"; then
+    shift
+    break
+  fi
+  pairlist="$pairlist $1"
+  shift
+done
+
+# The program to run.
+prog="$1"
+shift
+# Make any relative path in $prog absolute.
+case "$prog" in
+  [\\/]* | ?:[\\/]*) ;;
+  *[\\/]*) prog="`pwd`/$prog" ;;
+esac
+
+# FIXME: add hostname here for parallel makes that run commands on
+# other machines.  But that might take us over the 14-char limit.
+dirname=ylwrap$$
+trap "cd `pwd`; rm -rf $dirname > /dev/null 2>&1" 1 2 3 15
+mkdir $dirname || exit 1
+
+cd $dirname
+
+case $# in
+  0) $prog "$input" ;;
+  *) $prog "$@" "$input" ;;
+esac
+ret=$?
+
+if test $ret -eq 0; then
+  set X $pairlist
+  shift
+  first=yes
+  # Since DOS filename conventions don't allow two dots,
+  # the DOS version of Bison writes out y_tab.c instead of y.tab.c
+  # and y_tab.h instead of y.tab.h.  Test to see if this is the case.
+  y_tab_nodot="no"
+  if test -f y_tab.c || test -f y_tab.h; then
+    y_tab_nodot="yes"
+  fi
+
+  # The directory holding the input.
+  input_dir=`echo "$input" | sed -e 's,\([\\/]\)[^\\/]*$,\1,'`
+  # Quote $INPUT_DIR so we can use it in a regexp.
+  # FIXME: really we should care about more than `.' and `\'.
+  input_rx=`echo "$input_dir" | sed 's,\\\\,\\\\\\\\,g;s,\\.,\\\\.,g'`
+
+  while test "$#" -ne 0; do
+    from="$1"
+    # Handle y_tab.c and y_tab.h output by DOS
+    if test $y_tab_nodot = "yes"; then
+      if test $from = "y.tab.c"; then
+        from="y_tab.c"
+      else
+        if test $from = "y.tab.h"; then
+          from="y_tab.h"
+        fi
+      fi
+    fi
+    if test -f "$from"; then
+      # If $2 is an absolute path name, then just use that,
+      # otherwise prepend `../'.
+      case "$2" in
+        [\\/]* | ?:[\\/]*) target="$2";;
+        *) target="../$2";;
+      esac
+
+      # We do not want to overwrite a header file if it hasn't
+      # changed.  This avoid useless recompilations.  However the
+      # parser itself (the first file) should always be updated,
+      # because it is the destination of the .y.c rule in the
+      # Makefile.  Divert the output of all other files to a temporary
+      # file so we can compare them to existing versions.
+      if test $first = no; then
+        realtarget="$target"
+        target="tmp-`echo $target | sed s/.*[\\/]//g`"
+      fi
+      # Edit out `#line' or `#' directives.
+      #
+      # We don't want the resulting debug information to point at
+      # an absolute srcdir; it is better for it to just mention the
+      # .y file with no path.
+      #
+      # We want to use the real output file name, not yy.lex.c for
+      # instance.
+      #
+      # We want the include guards to be adjusted too.
+      FROM=`echo "$from" | sed \
+            -e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'\
+            -e 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZ]/_/g'`
+      TARGET=`echo "$2" | sed \
+            -e 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/'\
+            -e 's/[^ABCDEFGHIJKLMNOPQRSTUVWXYZ]/_/g'`
+
+      sed -e "/^#/!b" -e "s,$input_rx,," -e "s,$from,$2," \
+          -e "s,$FROM,$TARGET," "$from" >"$target" || ret=$?
+
+      # Check whether header files must be updated.
+      if test $first = no; then
+        if test -f "$realtarget" && cmp -s "$realtarget" "$target"; then
+          echo "$2" is unchanged
+          rm -f "$target"
+        else
+          echo updating "$2"
+          mv -f "$target" "$realtarget"
+        fi
+      fi
+    else
+      # A missing file is only an error for the first file.  This
+      # is a blatant hack to let us support using "yacc -d".  If -d
+      # is not specified, we don't want an error when the header
+      # file is "missing".
+      if test $first = yes; then
+        ret=1
+      fi
+    fi
+    shift
+    shift
+    first=no
+  done
+else
+  ret=$?
+fi
+
+# Remove the directory.
+cd ..
+rm -rf $dirname
+
+exit $ret
+
+# Local Variables:
+# mode: shell-script
+# sh-indentation: 2
+# eval: (add-hook 'write-file-hooks 'time-stamp)
+# time-stamp-start: "scriptversion="
+# time-stamp-format: "%:y-%02m-%02d.%02H"
+# time-stamp-end: "$"
+# End:
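
ylwrap is the stock helper script that Automake installs alongside lex/yacc build rules: it runs the tool in a throwaway scratch directory (ylwrap$$), renames outputs such as y.tab.c to the names the Makefile expects, and leaves unchanged header files alone to avoid needless recompilation. Here it supports generating src/css.c from the new src/css.l reliably, including under parallel builds.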