From d45325589df2f4f97eae872fcf9f46c3dc38abfa Mon Sep 17 00:00:00 2001
From: Daniel Veillard
Date: Tue, 25 Nov 2003 18:29:55 +0000
Subject: [PATCH] fixed #127877, never output &quot; in element content this changes the

* entities.c: fixed #127877, never output &quot; in element content
* result/isolat3 result/slashdot16.xml result/noent/isolat3
  result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
  result/valid/index.xml result/valid/xlink.xml: this changes the
  output of a few tests
Daniel
---
 ChangeLog                         |   8 +
 entities.c                        |   2 +
 result/isolat3                    |   2 +-
 result/noent/isolat3              |   2 +-
 result/noent/slashdot16.xml       | Bin 10414 -> 10374 bytes
 result/slashdot16.xml             | Bin 10414 -> 10374 bytes
 result/valid/REC-xml-19980210.xml | 252 +++++++++++++++---------------
 result/valid/index.xml            |   2 +-
 result/valid/xlink.xml            |  58 +++----
 9 files changed, 168 insertions(+), 158 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index afea5216..3442c3f6 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,11 @@
+Tue Nov 25 18:39:44 CET 2003 Daniel Veillard
+
+    * entities.c: fixed #127877, never output &quot; in element content
+    * result/isolat3 result/slashdot16.xml result/noent/isolat3
+      result/noent/slashdot16.xml result/valid/REC-xml-19980210.xml
+      result/valid/index.xml result/valid/xlink.xml: this changes the
+      output of a few tests
+
 Tue Nov 25 16:36:21 CET 2003 Daniel Veillard
 
     * include/libxml/schemasInternals.h include/libxml/xmlerror.h
diff --git a/entities.c b/entities.c
index fb582054..0b0df65c 100644
--- a/entities.c
+++ b/entities.c
@@ -625,6 +625,7 @@ xmlEncodeSpecialChars(xmlDocPtr doc ATTRIBUTE_UNUSED, const xmlChar *input) {
             *out++ = 'm';
             *out++ = 'p';
             *out++ = ';';
+#if 0
         } else if (*cur == '"') {
             *out++ = '&';
             *out++ = 'q';
@@ -632,6 +633,7 @@ xmlEncodeSpecialChars(xmlDocPtr doc ATTRIBUTE_UNUSED, const xmlChar *input) {
             *out++ = 'o';
             *out++ = 't';
             *out++ = ';';
+#endif
         } else if (*cur == '\r') {
             *out++ = '&';
             *out++ = '#';
diff --git a/result/isolat3 b/result/isolat3
index
9d5bb5bb..1abf7b42 100644
--- a/result/isolat3
+++ b/result/isolat3
@@ -4,7 +4,7 @@
 ]]>
-then the replacement text for the entity &quot;book&quot; is:
+then the replacement text for the entity "book" is:
 La Peste: Albert Camus,
 © 1947 Éditions Gallimard. &rights;
diff --git a/result/noent/isolat3 b/result/noent/isolat3
index 9d5bb5bb..1abf7b42 100644
--- a/result/noent/isolat3
+++ b/result/noent/isolat3
@@ -4,7 +4,7 @@
 ]]>
-then the replacement text for the entity &quot;book&quot; is:
+then the replacement text for the entity "book" is:
 La Peste: Albert Camus,
 © 1947 Éditions Gallimard. &rights;
diff --git a/result/noent/slashdot16.xml b/result/noent/slashdot16.xml
index f4b168d686f418bfade53b4d86f8c41cbfad45fc..f6a7f2a589ac72d5c214b3dcba088a7dc23d3fac 100644
GIT binary patch
delta 65 zcmZ1%*cP}Uh?8H5!Gj@@A( APyhe` delta 89 zcmZn*To APyhe` delta 89 zcmZn*To
 1997-03-28 : CMSMcQ : make as many corrections as possible, from
 Terry Allen, Norbert Mikula, James Clark, Jon Bosak, Henry Thompson,
-Paul Grosso, and self. Among other things: give in on &quot;well formed&quot;
+Paul Grosso, and self. Among other things: give in on "well formed"
 (Terry is right), tentatively rename QuotedCData as AttValue
 and Literal as EntityValue to be more informative, since attribute
 values are the only place QuotedCData was used, and
@@ -289,7 +289,7 @@ Reserve entity names of the form u-NNNN.
 Clarify relative URLs.
 And some of my own:
 Correct productions for content model: model cannot
-consist of a name, so &quot;elements ::= cp&quot; is no good.
+consist of a name, so "elements ::= cp" is no good.
 1996-11-11 : CMSMcQ : revise for style.
 Add new rhs to entity declaration, for parameter entities.
@@ -339,7 +339,7 @@ mechanism.
 1996-10-09 : CMSMcQ : re-unite everything for convenience,
 at least temporarily, and revise quickly
 1996-10-08 : TB : first major homogenization pass
-1996-10-08 : TB : turn &quot;current&quot; attribute on div type into
+1996-10-08 : TB : turn "current" attribute on div type into
 CDATA
 1996-10-02 : TB : remould into skeleton + entities
 1996-09-30 : CMSMcQ : add a few more sections prior to exchange
@@ -550,7 +550,7 @@ constraints.

Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their -inclusion in the document. A document begins in a "root" or document entity. +inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and @@ -634,7 +634,7 @@ is an atomic unit of text as specified by ISO/IEC 10646 . Legal characters are tab, carriage return, line feed, and the legal graphic characters of Unicode and ISO/IEC 10646. -The use of "compatibility characters", as defined in section 6.8 +The use of "compatibility characters", as defined in section 6.8 of , is discouraged. @@ -689,7 +689,7 @@ are given in .

beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters. -Names beginning with the string "xml", or any string +Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification. @@ -743,39 +743,39 @@ can be parsed without scanning for markup. Literals EntityValue -'"' -([^%&"] +'"' +([^%&"] | PEReference | Reference)* -'"' +'"' |  -"'" +"'" ([^%&'] | PEReference | Reference)* -"'" +"'" AttValue -'"' -([^<&"] +'"' +([^<&"] | Reference)* -'"' +'"' |  -"'" +"'" ([^<&'] | Reference)* -"'" +"'" SystemLiteral -('"' [^"]* '"') | ("'" [^']* "'") +('"' [^"]* '"') | ("'" [^']* "'") PubidLiteral -'"' PubidChar* -'"' -| "'" (PubidChar - "'")* "'" +'"' PubidChar* +'"' +| "'" (PubidChar - "'")* "'" PubidChar #x20 | #xD | #xA @@ -822,15 +822,15 @@ If they are needed elsewhere, they must be escaped using either numeric character references or the strings -"&amp;" and "&lt;" respectively. +"&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string -"&gt;", and must, for +"&gt;", and must, for compatibility, be escaped using -"&gt;" or a character reference +"&gt;" or a character reference when it appears in the string -"]]>" +"]]>" in content, when that string is not marking the end of a CDATA section. @@ -841,12 +841,12 @@ is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close -delimiter, "]]>".

+delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as -"&apos;", and the double-quote character (") as -"&quot;". +"&apos;", and the double-quote character (") as +"&quot;". Character Data @@ -870,7 +870,7 @@ data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string -"--" (double-hyphen) must not occur within +"--" (double-hyphen) must not occur within comments. Comments @@ -911,7 +911,7 @@ PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. -The target names "XML", "xml", and so on are +The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The @@ -929,8 +929,8 @@ may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the -string "<![CDATA[" and end with the string -"]]>": +string "<![CDATA[" and end with the string +"]]>": CDATA Sections CDSect @@ -953,12 +953,12 @@ string "<![CDATA[" and end with the string Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using -"&lt;" and "&amp;". CDATA sections +"&lt;" and "&amp;". CDATA sections cannot nest.

-

An example of a CDATA section, in which "<greeting>" and -"</greeting>" +

An example of a CDATA section, in which "<greeting>" and +"</greeting>" are recognized as character data, not markup: <![CDATA[<greeting>Hello, world!</greeting>]]> @@ -983,13 +983,13 @@ and so is this: ]]>

-

The version number "1.0" should be used to indicate +

The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error -for a document to use the value "1.0" +for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML working group to give later versions of this specification -numbers other than "1.0", but this intent does not +numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor if any are produced, to use any particular numbering scheme. @@ -1030,7 +1030,7 @@ the first element in the document. VersionInfo S 'version' Eq (' VersionNum ' -| " VersionNum ") +| " VersionNum ") Eq S? '=' S? @@ -1183,7 +1183,7 @@ not only between markup declarations.

Hello, world! ]]> The system identifier -"hello.dtd" gives the URI of a DTD for the document.

+"hello.dtd" gives the URI of a DTD for the document.

The declarations can also be given locally, as in this example: @@ -1217,20 +1217,20 @@ the document entity. S 'standalone' Eq -(("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) +(("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))

-In a standalone document declaration, the value "yes" indicates +In a standalone document declaration, the value "yes" indicates that there are no markup declarations external to the document entity (either in the DTD external subset, or in an external parameter entity referenced from the internal subset) which affect the information passed from the XML processor to the application. -The value "no" indicates that there are or may be such +The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a @@ -1241,14 +1241,14 @@ does not change its standalone status.

If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone -document declaration, the value "no" is assumed.

-

Any XML document for which standalone="no" holds can +document declaration, the value "no" is assumed.

+

Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

Standalone Document Declaration

The standalone document declaration must have -the value "no" if any external markup declarations +the value "no" if any external markup declarations contain declarations of:

attributes with default values, if elements to which @@ -1271,17 +1271,17 @@ directly within any instance of those types. -

An example XML declaration with a standalone document declaration:<?xml version="&XML.version;" standalone='yes'?>

+

An example XML declaration with a standalone document declaration:<?xml version="&XML.version;" standalone='yes'?>

White Space Handling -

In editing XML documents, it is often convenient to use "white space" +

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines, denoted by the nonterminal S in this specification) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. -On the other hand, "significant" white space that should be preserved in the +On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor @@ -1299,11 +1299,11 @@ In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose only -possible values are "default" and "preserve". +possible values are "default" and "preserve". For example:]]>

-

The value "default" signals that applications' +

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the -value "preserve" indicates the intent that applications preserve +value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overriden with another instance @@ -1325,7 +1325,7 @@ carriage-return (#xD) and line-feed (#xA).

To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal -two-character sequence "#xD#xA" or a standalone literal +two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can @@ -1347,7 +1347,7 @@ of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined -by , "Tags for the Identification of Languages": +by , "Tags for the Identification of Languages": Language Identification LanguageID @@ -1370,28 +1370,28 @@ by , "Tags for the Identification of Languages" The Langcode may be any of the following:

a two-letter language code as defined by -, "Codes -for the representation of names of languages"

+, "Codes +for the representation of names of languages"

a language identifier registered with the Internet Assigned Numbers Authority ; these begin with the -prefix "i-" (or "I-")

+prefix "i-" (or "I-")

a language identifier assigned by the user, or agreed on between parties in private use; these must begin with the -prefix "x-" or "X-" in order to ensure that they do not conflict +prefix "x-" or "X-" in order to ensure that they do not conflict with names later standardized or registered with IANA

There may be any number of Subcode segments; if the first subcode segment exists and the Subcode consists of two letters, then it must be a country code from -, "Codes -for the representation of names of countries." +, "Codes +for the representation of names of countries." If the first subcode consists of more than two letters, it must be a subcode for the language in question registered with IANA, unless the Langcode begins with the prefix -"x-" or -"X-".

+"x-" or +"X-".

It is customary to give the language code in lower case, and the country code (if any) in upper case. Note that these values, unlike other names in XML documents, @@ -1451,8 +1451,8 @@ notes in English, the xml:lang attribute might be declared this way: elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, -identified by name, sometimes called its "generic -identifier" (GI), and may have a set of +identified by name, sometimes called its "generic +identifier" (GI), and may have a set of attribute specifications. Each attribute specification has a name and a value.

@@ -1541,7 +1541,7 @@ the attribute specifications of the element, referred to as the attribute name and the content of the AttValue (the text between the -' or " delimiters) +' or " delimiters) as the attribute value.

@@ -1570,11 +1570,11 @@ to external entities. No < in Attribute Values

The replacement text of any entity referred to directly or indirectly in an attribute -value (other than "&lt;") must not contain +value (other than "&lt;") must not contain a <.

An example of a start-tag: -<termdef id="dt-dog" term="dog">

+<termdef id="dt-dog" term="dog">

The end of every element that begins with a start-tag must be marked by an end-tag @@ -1629,8 +1629,8 @@ content, whether or not it is declared using the keyword tag must be used, and can only be used, for elements which are declared EMPTY.

Examples of empty elements: -<IMG align="left" - src="http://www.w3.org/Icons/WWW/w3c_home" /> +<IMG align="left" + src="http://www.w3.org/Icons/WWW/w3c_home" /> <br></br> <br/>

@@ -2117,9 +2117,9 @@ match the default value. id ID #REQUIRED name CDATA #IMPLIED> <!ATTLIST list - type (bullets|ordered|glossary) "ordered"> + type (bullets|ordered|glossary) "ordered"> <!ATTLIST form - method CDATA #FIXED "POST">

+ method CDATA #FIXED "POST">

Attribute-Value Normalization @@ -2133,7 +2133,7 @@ character to the attribute value

replacement text of the entity

a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 -is appended for a "#xD#xA" sequence that is part of an external +is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity

other characters are processed by appending them to the normalized @@ -2327,10 +2327,10 @@ available input devices. match the production for Char.

-If the character reference begins with "&#x", the digits and +If the character reference begins with "&#x", the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. -If it begins just with "&#", the digits up to the terminating +If it begins just with "&#", the digits up to the terminating ; provide a decimal representation of the character's code point. @@ -2370,7 +2370,7 @@ semicolon Entity Declared

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with -"standalone='yes'", +"standalone='yes'", the Name given in the entity reference must match that in an entity declaration, except that @@ -2390,7 +2390,7 @@ if standalone='yes'.

Entity Declared

In a document with an external subset or external parameter -entities with "standalone='no'", +entities with "standalone='no'", the Name given in the entity reference must match that in an entity declaration. For interoperability, valid documents should declare the entities @@ -2500,8 +2500,8 @@ text: see .

An internal entity is a parsed entity.

Example of an internal entity declaration: -<!ENTITY Pub-Status "This is a pre-release of the - specification.">

+<!ENTITY Pub-Status "This is a pre-release of the + specification.">

@@ -2573,12 +2573,12 @@ of white space in the public identifier must be normalized to single space chara and leading and trailing white space must be removed.

Examples of external entity declarations: <!ENTITY open-hatch - SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml"> + SYSTEM "http://www.textuality.com/boilerplate/OpenHatch.xml"> <!ENTITY open-hatch - PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" - "http://www.textuality.com/boilerplate/OpenHatch.xml"> + PUBLIC "-//Textuality//TEXT Standard open-hatch boilerplate//EN" + "http://www.textuality.com/boilerplate/OpenHatch.xml"> <!ENTITY hatch-pic - SYSTEM "../grafix/OpenHatch.gif" + SYSTEM "../grafix/OpenHatch.gif" NDATA gif >

@@ -2673,8 +2673,8 @@ declaration containing an encoding declaration: EncodingDecl S 'encoding' Eq -('"' EncName '"' | -"'" EncName "'" ) +('"' EncName '"' | +"'" EncName "'" ) EncName @@ -2688,19 +2688,19 @@ The EncName is the name of the encoding used.

In an encoding declaration, the values -"UTF-8", -"UTF-16", -"ISO-10646-UCS-2", and -"ISO-10646-UCS-4" should be +"UTF-8", +"UTF-16", +"ISO-10646-UCS-2", and +"ISO-10646-UCS-4" should be used for the various encodings and transformations of Unicode / ISO/IEC 10646, the values -"ISO-8859-1", -"ISO-8859-2", ... -"ISO-8859-9" should be used for the parts of ISO 8859, and +"ISO-8859-1", +"ISO-8859-2", ... +"ISO-8859-9" should be used for the parts of ISO 8859, and the values -"ISO-2022-JP", -"Shift_JIS", and -"EUC-JP" +"ISO-2022-JP", +"Shift_JIS", and +"EUC-JP" should be used for the various encoded forms of JIS X-0208-1997. XML processors may recognize other encodings; it is recommended that character encodings registered (as charsets) @@ -2859,8 +2859,8 @@ and (except for parameter entities) markup, which must be recognized in the usual way, except that the replacement text of entities used to escape markup delimiters (the entities &magicents;) is always treated as -data. (The string "AT&amp;T;" expands to -"AT&T;" and the remaining ampersand is not recognized +data. (The string "AT&amp;T;" expands to +"AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.) A character reference is included when the indicated character is processed in place of the reference itself. @@ -2919,7 +2919,7 @@ For example, this is well-formed: ]]> while this is not: -<!ENTITY EndAttr "27'" > +<!ENTITY EndAttr "27'" > <element attribute='a-&EndAttr;>

@@ -2989,11 +2989,11 @@ For example, given the following declarations: ]]> -then the replacement text for the entity "book" is: +then the replacement text for the entity "book" is: La Peste: Albert Camus, © 1947 Éditions Gallimard. &rights; -The general-entity reference "&rights;" would be expanded -should the reference "&book;" appear in the document's +The general-entity reference "&rights;" would be expanded +should the reference "&book;" appear in the document's content or an attribute value.

These simple rules may have complex interactions; for a detailed discussion of a difficult example, see @@ -3010,7 +3010,7 @@ ampersand, and other delimiters. A set of general entities Numeric character references may also be used; they are expanded immediately when recognized and must be treated as character data, so the numeric character references -"&#60;" and "&#38;" may be used to +"&#60;" and "&#38;" may be used to escape < and & when they occur in character data.

All XML processors must recognize these entities whether they @@ -3029,7 +3029,7 @@ that character, as shown below. ]]> Note that the < and & characters -in the declarations of "lt" and "amp" +in the declarations of "lt" and "amp" are doubly escaped to meet the requirement that entity replacement be well-formed.

@@ -3229,7 +3229,7 @@ range indicated.

with a value not among the characters given.

- +

matches a literal string matching that given inside the double quotes.

@@ -3811,8 +3811,8 @@ names.

Characters which have a font or compatibility decomposition (i.e. those -with a "compatibility formatting tag" in field 5 of the database -- -marked by field 5 beginning with a "<") are not allowed.

+with a "compatibility formatting tag" in field 5 of the database -- +marked by field 5 beginning with a "<") are not allowed.

The following characters are treated as name-start characters @@ -3864,16 +3864,16 @@ numerically (&#38;#38;) or with a general entity then the XML processor will recognize the character references when it parses the entity declaration, and resolve them before storing the following string as the -value of the entity "example": +value of the entity "example": An ampersand (&) may be escaped numerically (&#38;) or with a general entity (&amp;).

]]> -A reference in the document to "&example;" +A reference in the document to "&example;" will cause the text to be reparsed, at which time the -start- and end-tags of the "p" element will be recognized +start- and end-tags of the "p" element will be recognized and the three references will be recognized and expanded, -resulting in a "p" element with the following content +resulting in a "p" element with the following content (all data, no delimiters or markup):

in line 4, the reference to character 37 is expanded immediately, -and the parameter entity "xx" is stored in the symbol -table with the value "%zz;". Since the replacement text -is not rescanned, the reference to parameter entity "zz" +and the parameter entity "xx" is stored in the symbol +table with the value "%zz;". Since the replacement text +is not rescanned, the reference to parameter entity "zz" is not recognized. (And it would be an error if it were, since -"zz" is not yet declared.)

-

in line 5, the character reference "&#60;" is -expanded immediately and the parameter entity "zz" is +"zz" is not yet declared.)

+

in line 5, the character reference "&#60;" is +expanded immediately and the parameter entity "zz" is stored with the replacement text -"<!ENTITY tricky "error-prone" >", +"<!ENTITY tricky "error-prone" >", which is a well-formed entity declaration.

-

in line 6, the reference to "xx" is recognized, -and the replacement text of "xx" (namely -"%zz;") is parsed. The reference to "zz" +

in line 6, the reference to "xx" is recognized, +and the replacement text of "xx" (namely +"%zz;") is parsed. The reference to "zz" is recognized in its turn, and its replacement text -("<!ENTITY tricky "error-prone" >") is parsed. -The general entity "tricky" has now been -declared, with the replacement text "error-prone".

+("<!ENTITY tricky "error-prone" >") is parsed. +The general entity "tricky" has now been +declared, with the replacement text "error-prone".

-in line 8, the reference to the general entity "tricky" is +in line 8, the reference to the general entity "tricky" is recognized, and it is expanded, so the full content of the -"test" element is the self-describing (and ungrammatical) string +"test" element is the self-describing (and ungrammatical) string This sample shows a error-prone method.

@@ -3930,7 +3930,7 @@ that content models in element type declarations be deterministic.

SGML requires deterministic content models (it calls them -"unambiguous"); XML processors built using SGML systems may +"unambiguous"); XML processors built using SGML systems may flag non-deterministic content models as errors.

For example, the content model ((b, c) | (b, d)) is non-deterministic, because given an initial b the parser @@ -3986,8 +3986,8 @@ begin with an XML encoding declaration, in which the first characters must be '<?xml', any conforming processor can detect, after two to four octets of input, which of the following cases apply. In reading this list, it may help to know that in UCS-4, '<' is -"#x0000003C" and '?' is "#x0000003F", and the Byte -Order Mark required of UTF-16 data streams is "#xFEFF".

+"#x0000003C" and '?' is "#x0000003F", and the Byte +Order Mark required of UTF-16 data streams is "#xFEFF".

diff --git a/result/valid/index.xml b/result/valid/index.xml
index d0ab20da..734fa4d6 100644
--- a/result/valid/index.xml
+++ b/result/valid/index.xml
@@ -787,7 +787,7 @@
- Sibyll Klotz: Vollblutpolitikerin mit &quot;Berliner Schnauze&quot;
+ Sibyll Klotz: Vollblutpolitikerin mit "Berliner Schnauze"
diff --git a/result/valid/xlink.xml b/result/valid/xlink.xml
index 70096cdd..7b35a0f8 100644
--- a/result/valid/xlink.xml
+++ b/result/valid/xlink.xml
@@ -54,7 +54,7 @@ type="text/css"?>
-

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR.

+

This is a W3C Working Draft for review by W3C members and other interested parties. It is a draft document and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR.

Note: Since working drafts are subject to frequent change, you are advised to reference the above URI, rather than the URIs for working drafts themselves. Some of the work remaining is described in .

This work is part of the W3C XML Activity (for current status, see http://www.w3.org/XML/Activity ). For information about the XPointer language which is expected to be used with XLink, see http://www.w3.org/TR/WD-xptr.

@@ -164,7 +164,7 @@ document. bent--> -

A representation of the relevant structure specified by the tags and attributes in an XML document, based on "groves" as defined in the ISO DSSSL standard.

+

A representation of the relevant structure specified by the tags and attributes in an XML document, based on "groves" as defined in the ISO DSSSL standard.

@@ -191,14 +191,14 @@ document. bent--> -

A link whose traversal can be initiated from more than one of its participating resources. Note that being able to "go back" after following a one-directional link does not make the link multidirectional.

+

A link whose traversal can be initiated from more than one of its participating resources. Note that being able to "go back" after following a one-directional link does not make the link multidirectional.

A link whose content does not serve as one of the link's participating resources . Such links presuppose a notion like extended link groups, which instruct application software where to look for links. Out-of-line links are generally required for supporting multidirectional traversal and for allowing read-only resources to have outgoing links.

-

In the context of link behavior, a parsed link is any link whose content is transcluded into the document where the link originated. The use of the term "parsed" directly refers to the concept in XML of a +

In the context of link behavior, a parsed link is any link whose content is transcluded into the document where the link originated. The use of the term "parsed" directly refers to the concept in XML of a parsed entity.

@@ -237,7 +237,7 @@ document. bent--> Locator Syntax

The locator for a resource is typically provided by means of a Uniform Resource Identifier, or URI. XPointers can be used in conjunction with the URI structure, as fragment identifiers, to specify a more precise sub-resource.

-

A locator generally contains a URI, as described in IETF RFCs and . As these RFCs state, the URI may include a trailing query (marked by a leading "?"), and be followed by a "#" and a fragment identifier, with the query interpreted by the host providing the indicated resource, and the interpretation of the fragment identifier dependent on the data type of the indicated resource.

+

A locator generally contains a URI, as described in IETF RFCs and . As these RFCs state, the URI may include a trailing query (marked by a leading "?"), and be followed by a "#" and a fragment identifier, with the query interpreted by the host providing the indicated resource, and the interpretation of the fragment identifier dependent on the data type of the indicated resource.

In order to locate XML documents and portions of documents, a locator value may contain either a URI or a fragment identifier, or both. Any fragment identifier for pointing into XML must be an XPointer.

Special syntax may be used to request the use of particular processing models in accessing the locator's resource. This is designed to reflect the realities of network operation, where it may or may not be desirable to exercise fine control over the distribution of work between local and remote processors. @@ -273,15 +273,15 @@ document. bent--> -

If the Connector is followed directly by a Name, the Name is shorthand for the XPointer"id(Name)"; that is, the sub-resource is the element in the containing resource that has an XML ID attribute whose value matches the Name. This shorthand is to encourage use of the robust id addressing mode.

+

If the Connector is followed directly by a Name, the Name is shorthand for the XPointer"id(Name)"; that is, the sub-resource is the element in the containing resource that has an XML ID attribute whose value matches the Name. This shorthand is to encourage use of the robust id addressing mode.

-

If the connector is "#", this signals an intent that the containing resource is to be fetched as a whole from the host that provides it, and that the XPointer processing to extract the sub-resource

+

If the connector is "#", this signals an intent that the containing resource is to be fetched as a whole from the host that provides it, and that the XPointer processing to extract the sub-resource
is to be performed on the client, that is to say on the same system where the linking element is recognized and processed.

-

If the connector is "|", no intent is signaled as to what processing model is to be used to go about accessing the designated resource.

+

If the connector is "|", no intent is signaled as to what processing model is to be used to go about accessing the designated resource.
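
The connector rules quoted in this fixture (a URI, then "#" or "|", then an XPointer, with a bare Name standing for id(Name)) can be sketched in C as follows. This is only an illustration under stated assumptions: `locator_parts` and `split_locator` are hypothetical names, not libxml2 API, and treating any fragment without a "(" as a bare Name is a simplification of the draft's grammar.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the pieces of a locator; not a libxml2 type. */
typedef struct {
    char uri[256];      /* text before the connector, may be empty */
    char connector;     /* '#', '|', or '\0' when no connector is present */
    char xpointer[256]; /* XPointer after the connector, shorthand expanded */
} locator_parts;

/*
 * Split a locator at its first '#' or '|' connector.
 * Assumption: a fragment containing no '(' is a bare Name and is
 * expanded to id(Name). Returns 0 on success, -1 if a piece does not fit.
 */
int split_locator(const char *locator, locator_parts *out)
{
    assert(locator != NULL && out != NULL);

    size_t uri_len = strcspn(locator, "#|");
    if (uri_len >= sizeof out->uri)
        return -1;
    memcpy(out->uri, locator, uri_len);
    out->uri[uri_len] = '\0';

    out->connector = locator[uri_len];
    out->xpointer[0] = '\0';
    if (out->connector == '\0')
        return 0; /* plain URI, no fragment identifier */

    const char *frag = locator + uri_len + 1;
    int n;
    if (*frag != '\0' && strchr(frag, '(') == NULL)
        n = snprintf(out->xpointer, sizeof out->xpointer, "id(%s)", frag);
    else
        n = snprintf(out->xpointer, sizeof out->xpointer, "%s", frag);

    return (n < 0 || (size_t)n >= sizeof out->xpointer) ? -1 : 0;
}
```

The connector is kept in the result rather than acted on, since the text only says that "#" signals client-side XPointer processing while "|" signals no particular processing model.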

@@ -301,10 +301,10 @@ document. bent-->

Link Recognition

The existence of a link is asserted by a linking element. Linking elements must be recognized reliably by application software in order to provide appropriate display and behavior. There are several ways link recognition could be accomplished: for example, reserving element type names, reserving attributes names, leaving the matter of recognition entirely up to stylesheets and application software, or using the XLink namespace to specify element names and attribute names that would be recognized by namespace and XLink-aware processors. Using element and attribute names within the XLink namespace provides a balance between giving users control of their own markup language design and keeping the identification of linking elements simple and unambiguous.

The two approaches to identifying linking elements are relatively simple to implement. For example, here's how the HTML A element would be declared using attributes within the XLink namespace, and then how an element within the XLink namespace might do the same:
- &lt;A xlink:type="simple" xlink:href="http://www.w3.org/TR/wd-xlink/"
-xlink:title="The Xlink Working Draft"&gt;The XLink Working Draft.&lt;/A&gt;
- &lt;xlink:simple href="http://www.w3.org/TR/wd-xlink/"
-title="The XLink Working Draft"&gt;The XLink Working Draft&lt;/xlink:simple&gt;
+ &lt;A xlink:type="simple" xlink:href="http://www.w3.org/TR/wd-xlink/"
+xlink:title="The Xlink Working Draft"&gt;The XLink Working Draft.&lt;/A&gt;
+ &lt;xlink:simple href="http://www.w3.org/TR/wd-xlink/"
+title="The XLink Working Draft"&gt;The XLink Working Draft&lt;/xlink:simple&gt;
Any arbitrary element can be made into an XLink by using the xlink:type attribute. And, of course, the explicit XLink elements may be used, as well. This document will go on to describe the linking attributes that are associated with linking elements. It may be assumed by the reader that these attributes would require the xlink namespace prefix if they existed within an arbitrary element, or that they may be used directly if they exist within an explicit Xlink element.

@@ -336,7 +336,7 @@ title="The XLink Working Draft"&gt;The XLink Working Draft&lt;/xlink:s

Semantic Attributes

-

There are two attributes associated with semantics, role and title. The role attribute is a generic string used to describe the function of the link's content. For example, a poem might have a link with a role="stanza". The role is also used as an identifier for the from and to attributes of arcs.

+

There are two attributes associated with semantics, role and title. The role attribute is a generic string used to describe the function of the link's content. For example, a poem might have a link with a role="stanza". The role is also used as an identifier for the from and to attributes of arcs.

The title attribute is designed to provide human-readable text describing the link. It is very useful for those who have text-based applications, whether that be due to a constricted device that cannot display the link's content, or if it's being read by an application to a visually-impaired user, or if it's being used to create a table of links. The title attribute contains a simple, descriptive string.

@@ -355,28 +355,28 @@ title="The XLink Working Draft">The XLink Working Draft</xlink:s href CDATA #REQUIRED role CDATA #IMPLIED title CDATA #IMPLIED - show (new|parsed|replace) "replace" - actuate (user|auto) "user" + show (new|parsed|replace) "replace" + actuate (user|auto) "user" > And here is how to make an arbitrary element into a simple link. <!ELEMENT xlink:simple ANY> <!ATTLIST foo - xlink:type (simple|extended|locator|arc) #FIXED "simple" + xlink:type (simple|extended|locator|arc) #FIXED "simple" xlink:href CDATA #REQUIRED xlink:role CDATA #IMPLIED xlink:title CDATA #IMPLIED - xlink:show (new|parsed|replace) "replace" - xlink:actuate (user|auto) "user" + xlink:show (new|parsed|replace) "replace" + xlink:actuate (user|auto) "user" > Here is how the first example might look in a document: -<xlink:simple href="http://www.w3.org/TR/wd-xlink" role="working draft" - title="The XLink Working Draft" show="replace" actuate="user"> +<xlink:simple href="http://www.w3.org/TR/wd-xlink" role="working draft" + title="The XLink Working Draft" show="replace" actuate="user"> The XLink Working Draft.</xlink:simple> -<foo xlink:href="http://www.w3.org/TR/wd-xlink" xlink:role="working draft" - xlink:title="The XLink Working Draft" xlink:show="new" xlink:actuate="user"> +<foo xlink:href="http://www.w3.org/TR/wd-xlink" xlink:role="working draft" + xlink:title="The XLink Working Draft" xlink:show="new" xlink:actuate="user"> The XLink Working Draft.</foo> Alternately, a simple link could be as terse as this: -<foo xlink:href="#stanza1">The First Stanza.</foo> +<foo xlink:href="#stanza1">The First Stanza.</foo>

There are no constraints on the contents of a simple linking element. In

@@ -385,7 +385,7 @@ The XLink Working Draft.&lt;/foo&gt;
a valid document, every element that is significant to XLink must still conform to the constraints expressed in its governing DTD.

Note that it is meaningful to have an out-of-line simple link, although
- such links are uncommon. They are called "one-ended" and are typically used
+ such links are uncommon. They are called "one-ended" and are typically used
to associate discrete semantic properties with locations. The properties might be expressed by attributes on the link, the link's element type name, or in some other way, and are not considered full-fledged resources of the link.

@@ -427,7 +427,7 @@ The XLink Working Draft.&lt;/foo&gt;
&lt;!ELEMENT foo ((xlink:arc | xlink:locator)*)&gt;
&lt;!ATTLIST foo
- xlink:type (simple|extended|locator|arc) #FIXED "extended"
+ xlink:type (simple|extended|locator|arc) #FIXED "extended"
xlink:role CDATA #IMPLIED
xlink:title CDATA #IMPLIED
xlink:showdefault (new|parsed|replace) #IMPLIED

@@ -435,11 +435,11 @@ The XLink Working Draft.&lt;/foo&gt;
The following two examples demonstrate how each of the above might appear within a document instance. Note that the content of these examples would be other elements. For brevity's sake, they've been left blank. The first example shows how the link might appear, using an explicit XLink extended link:
-&lt;xlink:extended role="address book" title="Ben's Address Book" showdefault="replace" actuatedefault="user"&gt; ... &lt;/xlink:extended&gt;
+&lt;xlink:extended role="address book" title="Ben's Address Book" showdefault="replace" actuatedefault="user"&gt; ... &lt;/xlink:extended&gt;
And the second shows how the link might appear, using an arbitrary element:
-&lt;foo xlink:type="extended" xlink:role="address book" xlink:title="Ben's Address Book" xlink:showdefault="replace" xlink:actuatedefault="user"&gt; ... &lt;/foo&gt;
+&lt;foo xlink:type="extended" xlink:role="address book" xlink:title="Ben's Address Book" xlink:showdefault="replace" xlink:actuatedefault="user"&gt; ... &lt;/foo&gt;

@@ -514,8 +514,8 @@ for Computers and the Humanities (ACH), Association for Computational Linguistics (ACL), and Association for Literary and Linguistic Computing (ALLC). Chicago, Oxford: Text Encoding Initiative, 1994.
-]Steven J. DeRose and David G. Durand. 1995. "The
-TEI Hypertext Guidelines." In Computing and the Humanities
+]Steven J. DeRose and David G. Durand. 1995. "The
+TEI Hypertext Guidelines." In Computing and the Humanities
29(3). Reprinted in Text Encoding Initiative: Background and Context,