mirror of https://gitlab.gnome.org/GNOME/libxml2
synced 2025-03-28 21:33:13 +00:00
parser: Document that XML_PARSE_NOBLANKS is broken
Long text content can generate multiple "characters" callbacks, which can lead to NOBLANKS removing whitespace from text nodes that are not whitespace-only. So the NOBLANKS option doesn't work reliably even with the pull parser. This would be extremely hard to fix.

Unfortunately, `xmllint --format` relies on this option, which is another reason why the formatting feature never really worked.
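The failure mode described in the commit message can be illustrated with a simplified model. This is only a sketch and not libxml2 code: `chunk_is_blank` and `deliver_in_chunks` are hypothetical names, and the whitespace-only test stands in for the real `areBlanks` heuristic, which also looks at surrounding context. The assumption is that one text node is handed over in several pieces and that each piece is checked on its own.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a blanks heuristic:
 * returns 1 if every byte of the chunk is whitespace. */
static int chunk_is_blank(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (!isspace((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

/* Deliver `text` in pieces of at most `chunk_size` bytes, as if each piece
 * were a separate "characters" callback. Pieces judged blank are dropped,
 * all other pieces are appended to `out`. */
static void deliver_in_chunks(const char *text, size_t chunk_size,
                              char *out, size_t out_size) {
    size_t len = strlen(text);
    size_t used = 0;

    assert(out_size > 0 && chunk_size > 0);
    out[0] = '\0';

    for (size_t pos = 0; pos < len; pos += chunk_size) {
        size_t n = (len - pos < chunk_size) ? len - pos : chunk_size;

        if (chunk_is_blank(text + pos, n))
            continue;               /* treated as ignorable whitespace */
        if (used + n + 1 > out_size)
            break;                  /* output buffer full, stop early */
        memcpy(out + used, text + pos, n);
        used += n;
        out[used] = '\0';
    }
}

/* Print the result of delivering the same text whole and in pieces. */
static void print_demo(void) {
    const char *text = "abcd    efgh";
    char whole[32], pieces[32];

    deliver_in_chunks(text, strlen(text), whole, sizeof whole);
    deliver_in_chunks(text, 4, pieces, sizeof pieces);
    printf("one callback:    \"%s\"\n", whole);
    printf("three callbacks: \"%s\"\n", pieces);
}
```

Calling `print_demo()` from a `main` function shows that the text survives intact when delivered in one piece, while the chunked delivery loses the four inner spaces because the middle piece happens to consist of whitespace only. The checker cannot tell that the piece belongs to a larger non-whitespace node, which is the situation the commit describes.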
parent 40e423d6c2
commit 7a8722f557
@@ -283,6 +283,10 @@
 environment variable controls the indentation. The default value is two
 spaces " ").
 </para>
+<para>
+Especially in the absence of a DTD, this feature has never worked reliably
+and is fundamentally broken.
+</para>
 </listitem>
 </varlistentry>
 
13
parser.c
@@ -4914,6 +4914,11 @@ get_more_space:
     (ctxt->disableSAX == 0) &&
     (ctxt->sax->ignorableWhitespace !=
      ctxt->sax->characters)) {
+    /*
+     * Calling areBlanks with only parts of a text node
+     * is fundamentally broken, making the NOBLANKS option
+     * essentially unusable.
+     */
     if (areBlanks(ctxt, tmp, nbchar, 1)) {
         if (ctxt->sax->ignorableWhitespace != NULL)
             ctxt->sax->ignorableWhitespace(ctxt->userData,
@@ -13715,11 +13720,9 @@ xmlCtxtSetOptionsInternal(xmlParserCtxtPtr ctxt, int options, int keepMask)
  *
  * XML_PARSE_NOBLANKS
  *
- * Remove some text nodes containing only whitespace from the
- * result document. Which nodes are removed depends on DTD
- * element declarations or a conservative heuristic. The
- * reindenting feature of the serialization code relies on this
- * option to be set when parsing. Use of this option is
+ * Remove some whitespace from the result document. Where to
+ * remove whitespace depends on DTD element declarations or a
+ * broken heuristic with unfixable bugs. Use of this option is
  * DISCOURAGED.
  *
  * Not supported by the push parser.