http://www.w3.org/TR/PR-xml-&iso6.doc.date;
http://www.w3.org/TR/WD-xml-961114
http://www.w3.org/TR/WD-xml-lang-970331
http://www.w3.org/TR/WD-xml-lang-970630
http://www.w3.org/TR/WD-xml-970807
http://www.w3.org/TR/WD-xml-971117
Tim Bray, Textuality and Netscape, tbray@textuality.com
Jean Paoli, Microsoft, jeanpa@microsoft.com
C. M. Sperberg-McQueen, University of Illinois at Chicago, cmsmcq@uic.edu
This &eTR-or-Rec; is a translation of the XML Proposed Recommendation 1.0
published by the World Wide Web Consortium in December 1997. It is
intended that this &eTR-or-Rec; be technically identical to the original.
The original copyright notice is shown below:
This version of the XML specification is for
public review and discussion. It may be distributed freely,
as long as all text and legal notices remain intact.
The XML Proposed Recommendation on which this &eTR-or-Rec; is based
has been superseded by the XML Recommendation, which was published by
the World Wide Web Consortium in February 1998. It is intended that
this &eTR-or-Rec; be revised accordingly in the near future.
This &eTR-or-Rec; is a stable document derived from a series of working
drafts produced over the last year as deliverables of the XML activity.
It specifies a language created by subsetting an existing, widely used
international text processing standard (Standard Generalized Markup
Language, ISO 8879:1986 as amended and corrected) for use on the World
Wide Web. Details of the decisions regarding which features of ISO 8879
to retain in the subset are available separately. XML is already
supported by some commercial products, and there is a growing number of
free implementations. Public discussions of XML are accessible online.
This specification uses the term URI, which is defined by
[Berners-Lee et al.], a work in progress expected to update
[IETF RFC 1738] and [IETF RFC 1808]. Should the work not be
accepted as an RFC, the references to uniform resource identifiers
(URIs) in this specification will become references to uniform
resource locators (URLs).
The normative version of the specification is
the English version found at the W3C site.
Although this technical report is
intended to be technically identical to the original, it may
contain errors from the translation.
Chicago, Vancouver, Mountain View, et al.:
World-Wide Web Consortium, XML Working Group, 1996, 1997.
Created in electronic form.
English
Extended Backus-Naur Form (formal grammar)
1997-12-03 : CMSMcQ : yet further changes
1997-12-02 : TB : further changes (see TB to XML WG, 2 December 1997)
1997-12-02 : CMSMcQ : deal with as many corrections and
comments from the proofreaders as possible:
entify hard-coded document date in pubdate element,
change expansion of entity WebSGML,
update status description as per Dan Connolly (am not sure
about reference to Berners-Lee et al.),
add 'The' to abstract as per WG decision,
move Relationship to Existing Standards to back matter and
combine with References,
re-order back matter so normative appendices come first,
re-tag back matter so informative appendices are tagged informdiv1,
remove XXX XXX from list of 'normative' specs in prose,
move some references from Other References to Normative References,
add RFC 1738, 1808, and 2141 to Other References (they are not
normative since we do not require the processor to enforce any
rules based on them),
add reference to 'Fielding draft' (Berners-Lee et al.),
move notation section to end of body,
drop URIchar non-terminal and use SkipLit instead,
lose stray reference to defunct nonterminal 'markupdecls',
move reference to Aho et al. into appendix (Tim's right),
add prose note saying that hash marks and fragment identifiers are
NOT part of the URI formally speaking, and are NOT legal in
system identifiers (processor 'may' signal an error).
Work through:
Tim Bray reacting to James Clark,
Tim Bray on his own,
Eve Maler,
NOT DONE YET:
change binary / text to unparsed / parsed.
handle James's suggestion about < in attribute values
uppercase hex characters,
namechar list,
1997-12-01 : JB : add some column-width parameters
1997-12-01 : CMSMcQ : begin round of changes to incorporate
recent WG decisions and other corrections:
binding sources of character encoding info (27 Aug / 3 Sept),
correct wording of Faust quotation (restore dropped line),
drop SDD from EncodingDecl,
change text at version number 1.0,
drop misleading (wrong!) sentence about ignorables and extenders,
modify definition of PCData to make bar on msc grammatical,
change grammar's handling of internal subset (drop non-terminal markupdecls),
change definition of includeSect to allow conditional sections,
add integral-declaration constraint on internal subset,
drop misleading / dangerous sentence about relationship of
entities with system storage objects,
change table body tag to htbody as per EM change to DTD,
add rule about space normalization in public identifiers,
add description of how to generate our name-space rules from
Unicode character database (needs further work!).
1997-10-08 : TB : Removed %-constructs again, new rules
for PE appearance.
1997-10-01 : TB : Case-sensitive markup; cleaned up
element-type defs, lotsa little edits for style
1997-09-25 : TB : Change to elm's new DTD, with
substantial detail cleanup as a side-effect
1997-07-24 : CMSMcQ : correct error (lost *) in definition
of ignoreSectContents (thanks to Makoto Murata)
Allow all empty elements to have end-tags, consistent with
SGML TC (as per JJC).
1997-07-23 : CMSMcQ : pre-emptive strike on pending corrections:
introduce the term 'empty-element tag', note that all empty elements
may use it, and elements declared EMPTY must use it.
Add WFC requiring encoding decl to come first in an entity.
Redefine notations to point to PIs as well as binary entities.
Change autodetection table by removing bytes 3 and 4 from
examples with Byte Order Mark.
Add content model as a term and clarify that it applies to both
mixed and element content.
1997-06-30 : CMSMcQ : change date, some cosmetic changes,
changes to productions for choice, seq, Mixed, NotationType,
Enumeration. Follow James Clark's suggestion and prohibit
conditional sections in internal subset. TO DO: simplify
production for ignored sections as a result, since we don't
need to worry about parsers which don't expand PErefs finding
a conditional section.
1997-06-29 : TB : various edits
1997-06-29 : CMSMcQ : further changes:
Suppress old FINAL EDIT comments and some dead material.
Revise occurrences of % in grammar to exploit Henry Thompson's pun,
especially markupdecl and attdef.
Remove RMD requirement relating to element content (?).
1997-06-28 : CMSMcQ : Various changes for 1 July draft:
Add text for draconian error handling (introduce
the term Fatal Error).
RE deleta est (changing wording from
original announcement to restrict the requirement to validating
parsers).
Tag definition of validating processor and link to it.
Add colon as name character.
Change def of %operator.
Change standard definitions of lt, gt, amp.
Strip leading zeros from #x00nn forms.
1997-04-02 : CMSMcQ : final corrections of editorial errors
found in last night's proofreading. Reverse course once more on
well-formed: Webster's Second hyphenates it, and that's enough
for me.
1997-04-01 : CMSMcQ : corrections from JJC, EM, HT, and self
1997-03-31 : Tim Bray : many changes
1997-03-29 : CMSMcQ : some Henry Thompson (on entity handling),
some Charles Goldfarb, some ERB decisions (PE handling in miscellaneous
declarations). Changed Ident element to accept def attribute.
Allow normalization of Unicode characters. Move def of systemliteral
into section on literals.
1997-03-28 : CMSMcQ : make as many corrections as possible, from
Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson,
Paul Grosso, and self. Among other things: give in on "well formed"
(Terry is right), tentatively rename QuotedCData as AttValue
and Literal as EntityValue to be more informative, since attribute
values are the only place QuotedCData was used, and
vice versa for entity text and Literal. (I'd call it Entity Text,
but 8879 uses that name for both internal and external entities.)
1997-03-26 : CMSMcQ : resynch the two forks of this draft, reapply
my changes dated 03-20 and 03-21. Normalize old 'may not' to 'must not'
except in the one case where it meant 'may or may not'.
1997-03-21 : TB : massive changes on plane flight from Chicago
to Vancouver
1997-03-21 : CMSMcQ : correct as many reported errors as possible.
1997-03-20 : CMSMcQ : correct typos listed in CMSMcQ hand copy of spec.
1997-03-20 : CMSMcQ : cosmetic changes preparatory to revision for
WWW conference April 1997: restore some of the internal entity
references (e.g. to docdate, etc.), change character xA0 to &nbsp;
and define nbsp as &#160;, and refill a lot of paragraphs for
legibility.
1996-11-12 : CMSMcQ : revise using Tim's edits:
Add list type of NUMBERED and change most lists either to
BULLETS or to NUMBERED.
Suppress QuotedNames, Names (not used).
Correct trivial-grammar doc type decl.
Rename 'marked section' as 'CDATA section' passim.
Also edits from James Clark:
Define the set of characters from which [^abc] subtracts.
Charref should use just [0-9] not Digit.
Location info needs cleaner treatment: remove? (ERB
question).
One example of a PI has wrong pic.
Clarify discussion of encoding names.
Encoding failure should lead to unspecified results; don't
prescribe error recovery.
Don't require exposure of entity boundaries.
Ignore white space in element content.
Reserve entity names of the form u-NNNN.
Clarify relative URLs.
And some of my own:
Correct productions for content model: model cannot
consist of a name, so "elements ::= cp" is no good.
1996-11-11 : CMSMcQ : revise for style.
Add new rhs to entity declaration, for parameter entities.
1996-11-10 : CMSMcQ : revise for style.
Fix / complete section on names, characters.
Add sections on parameter entities, conditional sections.
Still to do: Add compatibility note on deterministic content models.
Finish stylistic revision.
1996-10-31 : TB : Add Entity Handling section
1996-10-30 : TB : Clean up term & termdef. Slip in
ERB decision re EMPTY.
1996-10-28 : TB : Change DTD. Implement some of Michael's
suggestions. Change comments back to //. Introduce language for
XML namespace reservation. Add section on white-space handling.
Lots more cleanup.
1996-10-24 : CMSMcQ : quick tweaks, implement some ERB
decisions. Characters are not integers. Comments are /* */ not //.
Add bibliographic refs to 10646, HyTime, Unicode.
Rename old Cdata as MsData since it's only seen
in marked sections. Call them attribute-value pairs not
name-value pairs, except once. Internal subset is optional, needs
'?'. Implied attributes should be signaled to the app, not
have values supplied by processor.
1996-10-16 : TB : track down & excise all DSD references;
introduce some EBNF for entity declarations.
1996-10-?? : TB : consistency check, fix up scraps so
they all parse, get formatter working, correct a few productions.
1996-10-10/11 : CMSMcQ : various maintenance, stylistic, and
organizational changes:
Replace a few literals with xmlpio and
pic entities, to make them consistent and ensure we can change pic
reliably when the ERB votes.
Drop paragraph on recognizers from notation section.
Add match, exact match to terminology.
Move old 2.2 XML Processors and Apps into intro.
Mention comments, PIs, and marked sections in discussion of
delimiter escaping.
Streamline discussion of doctype decl syntax.
Drop old section of 'PI syntax' for doctype decl, and add
section on partial-DTD summary PIs to end of Logical Structures
section.
Revise DSD syntax section to use Tim's subset-in-a-PI
mechanism.
1996-10-10 : TB : eliminate name recognizers (and more?)
1996-10-09 : CMSMcQ : revise for style, consistency through 2.3
(Characters)
1996-10-09 : CMSMcQ : re-unite everything for convenience,
at least temporarily, and revise quickly
1996-10-08 : TB : first major homogenization pass
1996-10-08 : TB : turn "current" attribute on div type into
CDATA
1996-10-02 : TB : remould into skeleton + entities
1996-09-30 : CMSMcQ : add a few more sections prior to exchange
with Tim.
1996-09-20 : CMSMcQ : finish transcribing notes.
1996-09-19 : CMSMcQ : begin transcribing notes for draft.
1996-09-13 : CMSMcQ : made outline from notes of 09-06,
do some housekeeping
General
Examples of attribute-list declarations:
<!ATTLIST termdef
id ID #REQUIRED
name CDATA #IMPLIED>
<!ATTLIST list
type (bullets|ordered|glossary) "ordered">
<!ATTLIST form
method CDATA #FIXED "POST">
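As an informal illustration of how the defaults in these declarations behave, the following Python sketch parses a small document whose internal subset carries the three declarations. The document element doc, the id value t1, and the choice of Python's xml.etree.ElementTree (a non-validating parser) are assumptions made for this example only; nothing here is prescribed by the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element "doc" and the id value "t1" are
# illustration-only names; the ATTLIST declarations are the ones above.
DOCUMENT = """<!DOCTYPE doc [
<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
<!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered">
<!ATTLIST form
          method  CDATA   #FIXED "POST">
]>
<doc><termdef id="t1"/><list/><form/></doc>"""

root = ET.fromstring(DOCUMENT)

# Attributes as reported by the parser, including declared defaults.
termdef_attrs = dict(root.find("termdef").attrib)
list_attrs = dict(root.find("list").attrib)
form_attrs = dict(root.find("form").attrib)

print(termdef_attrs, list_attrs, form_attrs)
```

In this sketch the #IMPLIED attribute name is simply absent when not specified, while the declared default for type and the #FIXED value for method are supplied by the parser. A non-validating parser does not check the #REQUIRED constraint on id.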
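In a document that has an external subset or external parameter entities and has "standalone='no'", the Name used in an entity reference must match the name given in the declaration of that entity. For interoperability, valid documents should declare the entities amp, lt, gt, apos, and quot in the form specified in the section on predefined entities. The declaration of a parameter entity must precede any reference to it. Similarly, the declaration of a general entity must precede any reference to it that appears in a default value in an attribute-list declaration.
The SystemLiteral that follows the keyword SYSTEM is called the entity's system identifier. It is a URI, which may be used to retrieve the content of the entity. The hash mark ("#") and fragment identifier frequently used with URIs are not, formally, part of the URI itself. If a fragment identifier is given as part of a system identifier, an XML processor may signal an error. Unless overridden by information outside the scope of this &eTR-or-Rec; (for example, a special XML element of a particular DTD, or a processing instruction defined by the specification of a particular application), a relative URI is relative to the location of the entity, that is, to the file in which the declaration of the entity occurs. A relative URI in an entity declaration in the internal DTD subset is therefore relative to the location of the document. A relative URI in an entity declaration in the external subset is relative to the location of the file that contains the external subset.
Examples of external entity declarations: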
<!ENTITY open-hatch
SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY open-hatch
PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN"
"http://www.textuality.com/boilerplate/OpenHatch.xml">
<!ENTITY hatch-pic
SYSTEM "../grafix/OpenHatch.gif"
NDATA gif >
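The rule for relative system identifiers can be sketched in Python with the standard urllib.parse module. The function name resolve_system_identifier, its strict flag, and the example.org locations are hypothetical names introduced for this illustration. Whether a processor rejects or ignores a fragment identifier is left open by the text (it "may" signal an error), so the sketch makes that a caller's choice.

```python
from urllib.parse import urldefrag, urljoin


def resolve_system_identifier(system_id, declared_in, strict=False):
    """Resolve a system identifier against the resource that declares the entity.

    declared_in is the URI of the document (for declarations in the internal
    subset) or of the file holding the external subset. A fragment identifier
    is not part of the URI proper, so it is dropped, or rejected when strict
    is True.
    """
    uri, fragment = urldefrag(system_id)
    if fragment and strict:
        raise ValueError(f"fragment identifier in system identifier: {system_id!r}")
    return urljoin(declared_in, uri)


# Hypothetical locations, used only for this illustration.
DOCUMENT = "http://example.org/docs/manual.xml"
EXTERNAL_SUBSET = "http://example.org/dtds/v1/manual.dtd"

from_internal = resolve_system_identifier("../grafix/OpenHatch.gif", DOCUMENT)
from_external = resolve_system_identifier("../grafix/OpenHatch.gif", EXTERNAL_SUBSET)
absolute = resolve_system_identifier(
    "http://www.textuality.com/boilerplate/OpenHatch.xml", DOCUMENT
)

print(from_internal)
print(from_external)
print(absolute)
```

The same relative identifier resolves to different locations depending on whether the declaration sits in the internal subset (relative to the document) or in the external subset (relative to the file containing it). An absolute identifier is unaffected by either.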
References
Normative References
IETF (Internet Engineering Task Force).
RFC 1766: Tags for the Identification of Languages,
ed. H. Alvestrand.
1995.
ISO
(International Organization for Standardization).
ISO 639:1988 (E).
Code for the representation of names of languages.
[Geneva]: International Organization for
Standardization, 1988.
ISO
(International Organization for Standardization).
ISO 3166-1:1997 (E).
Codes for the representation of names of countries and their subdivisions
— Part 1: Country codes
[Geneva]: International Organization for
Standardization, 1997.
ISO
(International Organization for Standardization).
ISO/IEC 10646-1993 (E). Information technology — Universal
Multiple-Octet Coded Character Set (UCS) — Part 1:
Architecture and Basic Multilingual Plane.
[Geneva]: International Organization for
Standardization, 1993 (plus amendments AM 1 through AM 7).
The Unicode Consortium.
The Unicode Standard, Version 2.0.
Reading, Mass.: Addison-Wesley Developers Press, 1996.
Other References
Aho, Alfred V.,
Ravi Sethi, and Jeffrey D. Ullman.
Compilers: Principles, Techniques, and Tools.
Reading: Addison-Wesley, 1986, rpt. corr. 1988.
Berners-Lee, T., R. Fielding, and L. Masinter.
Uniform Resource Identifiers (URI): Generic Syntax and
Semantics.
1997.
(Work in progress; see updates to RFC1738.)
Brüggemann-Klein, Anne.
Regular Expressions into Finite Automata.
Extended abstract in I. Simon, Hrsg., LATIN 1992,
S. 97-98. Springer-Verlag, Berlin 1992.
Full Version in Theoretical Computer Science 120: 197-213, 1993.
Brüggemann-Klein, Anne,
and Derick Wood.
Deterministic Regular Languages.
Universität Freiburg, Institut für Informatik,
Bericht 38, Oktober 1991.
IETF (Internet Engineering Task Force).
RFC 1738: Uniform Resource Locators (URL),
ed. T. Berners-Lee, L. Masinter, M. McCahill.
1994.
IETF (Internet Engineering Task Force).
RFC 1808: Relative Uniform Resource Locators,
ed. R. Fielding.
1995.
IETF (Internet Engineering Task Force).
RFC 2141: URN Syntax,
ed. R. Moats.
1997.
ISO
(International Organization for Standardization).
ISO/IEC 8879-1986 (E). Information processing — Text and Office
Systems — Standard Generalized Markup Language (SGML). First
edition — 1986-10-15. [Geneva]: International Organization for
Standardization, 1986.
ISO
(International Organization for Standardization).
ISO/IEC 10744-1992 (E). Information technology —
Hypermedia/Time-based Structuring Language (HyTime).
[Geneva]: International Organization for
Standardization, 1992.
Extended Facilities Annexe.
[Geneva]: International Organization for
Standardization, 1996.
Character Classes
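Suppose the DTD contains the following declaration:
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >
The XML processor recognizes the character references when it parses the entity declaration and resolves them. It stores the following string as the value of the entity "example":
<p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity
(&amp;amp;).</p>
A reference to "&example;" in the document causes this text to be parsed again. At that point the start-tag and end-tag of the "p" element are recognized, and the three references are recognized and expanded. As a result, the "p" element has the following content (all data, with no delimiters or markup):
An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&amp;).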
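The two-stage expansion described above can be checked informally with a parser. The sketch below assumes the entity "example" is declared with doubly escaped references (&#38;#38;, &#38;#38;#38;, and &amp;amp;), as the stored value and the final content imply. The wrapper element doc and the use of Python's xml.etree.ElementTree are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# The entity declaration follows the example in the text; the wrapper
# element "doc" is a hypothetical name added so the reference has a parent.
DOCUMENT = '''<!DOCTYPE doc [
<!ENTITY example "<p>An ampersand (&#38;#38;) may be escaped
numerically (&#38;#38;#38;) or with a general entity
(&amp;amp;).</p>" >
]>
<doc>&example;</doc>'''

root = ET.fromstring(DOCUMENT)

# The reference to "example" is expanded and reparsed, which yields a "p" element.
p_element = root.find("p")

# Collapse the line breaks of the literal so the content is easy to compare.
p_text = " ".join(p_element.text.split())
print(p_text)
```

The content of "p" is pure character data. The ampersands that remain in it are the result of the second pass and are not recognized again as markup.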
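To show the rules and their effects in more detail, a more complex example follows. In this example, the line numbers are given solely for reference.
1 <?xml version='1.0'?>
2 <!DOCTYPE test [
3 <!ELEMENT test (#PCDATA) >
4 <!ENTITY % xx '&#37;zz;'>
5 <!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
6 %xx;
7 ]>
8 <test>This sample shows a &tricky; method.</test>
Processing this produces the following:
a) In line 4, the reference to character 37 is expanded immediately, and the parameter entity "xx" is stored in the symbol table with the value "%zz;". Because the replacement text is not rescanned, the reference to the parameter entity "zz" is not recognized. (If it were scanned, it would be an error, since "zz" is not yet declared.)
b) In line 5, the character reference "&#60;" is expanded immediately, and the parameter entity "zz" is stored with the replacement text "<!ENTITY tricky "error-prone" >". This is a well-formed entity declaration.
c) In line 6, the reference to "xx" is recognized, and the replacement text of "xx" (namely "%zz;") is parsed. The reference to "zz" is recognized in turn, and its replacement text ("<!ENTITY tricky "error-prone" >") is parsed. The general entity "tricky" has now been declared, and its replacement text is "error-prone".
d) In line 8, the reference to the general entity "tricky" is recognized and expanded. The full content of the "test" element is the following self-describing string: This sample shows a error-prone method.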
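Steps a) through d) can be traced with a small Python model. This is a deliberately simplified sketch, not an XML processor: it assumes the internal subset of the example declares "xx" as '&#37;zz;' and "zz" as '&#60;!ENTITY tricky "error-prone" >', it handles only decimal character references and entity declarations with simple names, and every identifier in it (replacement_text, process_declarations, and so on) is hypothetical.

```python
import re

CHAR_REF = re.compile(r"&#(\d+);")
PE_REF = re.compile(r"%(\w+);")
GE_REF = re.compile(r"&(\w+);")
ENTITY_DECL = re.compile(r"<!ENTITY\s+(%\s+)?(\w+)\s+(['\"])(.*?)\3\s*>", re.S)


def replacement_text(literal):
    """Steps a) and b): resolve character references once, with no rescanning."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), literal)


def process_declarations(text, params, generals):
    """Step c): read entity declarations and follow parameter-entity references."""
    pos = 0
    while pos < len(text):
        decl = ENTITY_DECL.match(text, pos)
        ref = PE_REF.match(text, pos)
        if decl:
            table = params if decl.group(1) else generals
            table[decl.group(2)] = replacement_text(decl.group(4))
            pos = decl.end()
        elif ref:
            # The replacement text of the parameter entity is parsed in turn.
            process_declarations(params[ref.group(1)], params, generals)
            pos = ref.end()
        else:
            pos += 1  # anything else is outside this toy model


# Lines 3 to 6 of the example (the internal subset).
INTERNAL_SUBSET = """<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '&#37;zz;'>
<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
%xx;
"""

params, generals = {}, {}
process_declarations(INTERNAL_SUBSET, params, generals)

# Step d): expand the general-entity reference in the content of "test".
content = GE_REF.sub(lambda m: generals[m.group(1)],
                     "This sample shows a &tricky; method.")
print(params, generals, content)
```

The model works because re.sub never rescans the text it has just substituted, which mirrors the rule that a replacement text is not scanned again when it is stored. The reference to "zz" only takes effect later, when the replacement text of "xx" is itself parsed in line 6.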