Passing a NULL systemId results in snprintf("%s", NULL) which crashes on
some platforms. Regressed with commit 4ff2dccf.
Note that systemId should never be NULL during normal parsing. It can
only be NULL if API functions are called with a NULL systemId.
Should fix #825.
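The defensive pattern behind this kind of fix can be sketched as follows. This is a minimal illustration, not the actual patch: the helper name `format_system_id` and the choice of an empty string as fallback are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a libxml2 function): format a system ID
 * into buf. Passing NULL to a "%s" conversion is undefined behavior
 * and crashes on some platforms, so a NULL systemId is replaced with
 * an empty string before formatting.
 */
static int
format_system_id(char *buf, size_t size, const char *systemId) {
    assert(buf != NULL && size > 0);

    if (systemId == NULL)
        systemId = "";

    /* Returns the number of characters that were (or would be) written. */
    return snprintf(buf, size, "%s", systemId);
}
```

Whether an empty string, a placeholder, or an early error return is the right fallback depends on the caller; the sketch only shows that the NULL check has to happen before the format call.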
The Document Object Model (DOM) Level 3 Core Specification says:
> Adjacent CDATASection nodes are not merged by use of the normalize
> method of the Node interface.
Fixes #412.
Revert a change from d025cfbb and don't overwrite ID table entries, so
that the first attribute will be returned if there are duplicate IDs.
This requires two other changes:
- Attributes in entity content are never added to the ID table. This
seems reasonable.
- Remove the optimization to skip ID lookup when copying and the target
document has an empty ID table. This also seems more correct since the
document could have ID declarations nevertheless or we could be
copying xml:ids into the document for the first time.
Fixes #757.
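The "first attribute wins" behavior can be illustrated with a toy ID table. This is only a sketch under stated assumptions: `IdTable`, `id_add`, and `id_lookup` are hypothetical names, a linear array stands in for the real hash table, and strings stand in for attribute nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_IDS 16

/* Toy ID table (hypothetical): maps an ID value to its attribute. */
typedef struct {
    const char *id[MAX_IDS];
    const char *attr[MAX_IDS];  /* stand-in for an attribute node */
    size_t count;
} IdTable;

/* Return the attribute registered for id, or NULL if there is none. */
static const char *
id_lookup(const IdTable *t, const char *id) {
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->id[i], id) == 0)
            return t->attr[i];
    }
    return NULL;
}

/*
 * Register an ID without overwriting an existing entry.
 * Returns 1 if added, 0 if the ID was a duplicate (the first entry is
 * kept), and -1 if the table is full.
 */
static int
id_add(IdTable *t, const char *id, const char *attr) {
    assert(t != NULL && id != NULL && attr != NULL);

    if (id_lookup(t, id) != NULL)
        return 0;
    if (t->count == MAX_IDS)
        return -1;

    t->id[t->count] = id;
    t->attr[t->count] = attr;
    t->count++;
    return 1;
}
```

With this policy, a lookup for a duplicated ID always returns the attribute that was registered first.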
Search parent inputs of internal parameter entities for base URI.
Fixes a long-standing bug, which manifested in a different way after
commit 955c177f. Reproduce with
xmllint --noent xmlconf/eduni/errata-2e/E18.xml
After the failed experiment with a static XML namespace, introduce
versions of xmlSearchNs that report malloc failures.
Optimize the no-document case by only adding the XML namespace
declaration if it wasn't found in an ancestor.
Replace xmlStringGetNodeList and xmlStringLenGetNodeList with
xmlNodeParseContentInternal which also updates an optional parent
node.
Don't look up entities a second time via xmlNewReference.
Don't use a separate function to handle "complex" attributes. Validate
UTF-8 byte sequences without decoding. This should improve performance
considerably when parsing multi-byte UTF-8 sequences.
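Validating without decoding means checking byte ranges directly instead of assembling a code point first. The following is a minimal sketch of that idea, not the parser's actual code: `utf8_seq_len` and `utf8_is_valid` are hypothetical names, and only UTF-8 well-formedness is checked, not the additional character restrictions that XML imposes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the length (1 to 4) of the well-formed UTF-8 sequence at s,
 * or 0 if the sequence is invalid or truncated. Only byte ranges are
 * inspected; the code point value is never computed.
 */
static size_t
utf8_seq_len(const unsigned char *s, size_t avail) {
    unsigned char c;

    assert(s != NULL);
    if (avail == 0)
        return 0;
    c = s[0];

    if (c < 0x80)
        return 1;                       /* ASCII */
    if (c < 0xC2)
        return 0;                       /* stray continuation or overlong */
    if (c < 0xE0) {
        if (avail < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        return 2;
    }
    if (c < 0xF0) {
        if (avail < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0)
            return 0;                   /* overlong three-byte form */
        if (c == 0xED && s[1] >= 0xA0)
            return 0;                   /* UTF-16 surrogate range */
        return 3;
    }
    if (c < 0xF5) {
        if (avail < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90)
            return 0;                   /* overlong four-byte form */
        if (c == 0xF4 && s[1] >= 0x90)
            return 0;                   /* above U+10FFFF */
        return 4;
    }
    return 0;                           /* 0xF5..0xFF never appear */
}

/* Return 1 if the first len bytes of s are well-formed UTF-8, else 0. */
static int
utf8_is_valid(const unsigned char *s, size_t len) {
    size_t i = 0;

    while (i < len) {
        size_t n = utf8_seq_len(s + i, len - i);
        if (n == 0)
            return 0;
        i += n;
    }
    return 1;
}
```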
Use a string buffer to avoid unnecessary allocations and copying when
expanding entities.
Normalize attribute values in a single pass while expanding entities.
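As a rough illustration of single-pass normalization, the sketch below applies the whitespace rules for non-CDATA attribute values in place. It deliberately leaves out entity expansion and character references, which the real parser handles in the same pass; `normalize_attr_value` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Normalize a non-CDATA attribute value in place in a single pass:
 * tab, newline, and carriage return are treated like spaces, runs of
 * spaces collapse into one, and leading and trailing spaces are
 * dropped. Returns the new length of the string.
 */
static size_t
normalize_attr_value(char *s) {
    size_t in, out = 0;
    int pending_space = 0;

    assert(s != NULL);

    for (in = 0; s[in] != '\0'; in++) {
        char c = s[in];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            /* Remember the gap, but never emit a leading space. */
            if (out > 0)
                pending_space = 1;
        } else {
            if (pending_space) {
                s[out++] = ' ';
                pending_space = 0;
            }
            s[out++] = c;
        }
    }

    /* A pending space at the end is trailing whitespace and is dropped. */
    s[out] = '\0';
    return out;
}
```

Because the write position never overtakes the read position, no second buffer or second pass is needed.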
Be more lenient in recovery mode.
If no entity substitution was requested, validate entities without
expanding. Fixes #596.
Also fixes #655.
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.
Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
Today I learned that the TSCII character encoding [1] can blow up the
size of text 12 times when converted to UTF-8:
$ printf '\x82' |iconv -f TSCII -t UTF-8 |hexdump -C
00000000 e0 ae b8 e0 af 8d e0 ae b0 e0 af 80
0000000c
[1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange
Introduce xmlCtxtSetErrorHandler, which allows setting a structured
error handler for a parser context. There already was the "serror" SAX
handler, but this always receives the parser context as argument.
Start to use xmlRaiseMemoryError.
Remove useless arguments from memory error functions. Rename
xmlErrMemory to xmlCtxtErrMemory.
Remove a few calls to xmlGenericError.
Remove support for runtime entity debugging.
Set the dictionary for newDoc in xmlParseBalancedChunkMemoryRecover.
This is a long-standing bug which was masked by
- xmlParseBalancedChunkMemoryRecover changing the document of the root
node. This is a really bad idea, resulting in a mismatch between
ctxt->myDoc and ctxt->node->doc.
- SAX2.c preferring ctxt->node->doc over ctxt->myDoc until commit
a31e1b06.
Fixes #641.
Use a hash table to look up namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array alongside the main namespace table.
Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.
Reuse hash values from the dictionary to avoid computing them twice.
See #346.
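The layout described here, a hash table whose buckets hold indices into a separate table, can be sketched as follows. This is an illustration under stated assumptions rather than the parser's real data structure: `NsMap`, `ns_push`, `ns_lookup`, and `toy_hash` are hypothetical names, the table sizes are fixed, and duplicate prefixes are simply rejected instead of being scoped per element. The caller passes the hash in, which models reusing a value that was already computed elsewhere (here by `toy_hash`, standing in for the dictionary).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NS_MAX    8     /* capacity of the namespace table */
#define HASH_SIZE 16    /* power of two, larger than NS_MAX */

typedef struct {
    const char *prefix;
    const char *uri;
} NsEntry;

typedef struct {
    NsEntry  table[NS_MAX];       /* main namespace table */
    unsigned hashes[NS_MAX];      /* auxiliary data, parallel to table */
    unsigned buckets[HASH_SIZE];  /* table index plus one; 0 means empty */
    size_t   count;
} NsMap;

/* Stand-in for a hash value that a string dictionary would supply. */
static unsigned
toy_hash(const char *s) {
    unsigned h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Return the table index for prefix, or -1 if it is not present. */
static int
ns_lookup(const NsMap *m, const char *prefix, unsigned hash) {
    unsigned pos = hash & (HASH_SIZE - 1);

    /* Linear probing; terminates because the table is never full. */
    while (m->buckets[pos] != 0) {
        unsigned idx = m->buckets[pos] - 1;

        if (m->hashes[idx] == hash &&
            strcmp(m->table[idx].prefix, prefix) == 0)
            return (int) idx;
        pos = (pos + 1) & (HASH_SIZE - 1);
    }
    return -1;
}

/* Append a namespace; returns its index, or -1 if full or duplicate. */
static int
ns_push(NsMap *m, const char *prefix, const char *uri, unsigned hash) {
    unsigned pos;

    assert(m != NULL && prefix != NULL && uri != NULL);
    if (m->count == NS_MAX || ns_lookup(m, prefix, hash) >= 0)
        return -1;

    pos = hash & (HASH_SIZE - 1);
    while (m->buckets[pos] != 0)
        pos = (pos + 1) & (HASH_SIZE - 1);

    m->table[m->count].prefix = prefix;
    m->table[m->count].uri = uri;
    m->hashes[m->count] = hash;
    m->buckets[pos] = (unsigned) m->count + 1;
    return (int) m->count++;
}
```

Storing indices instead of pointers keeps the buckets valid if the main table is reallocated, and the same duplicate check is the basic idea behind verifying attribute uniqueness.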