Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
When push-parsing with an encoding handler, we must convert the whole
buffer in the initial conversion. Otherwise, parsing a single chunk
larger than ~4KB would fail.
Regressed with commit 34c9108f.
Allow drive letters in URI paths. Technically, these should be treated
as URI schemes, but this is not what users expect. This also makes sure
that paths with drive letters are resolved as filesystem paths and
unescaped, for example when used in libxslt's document() function.
Should fix#832.
Make sure to return NULL for node types except elements or text to match
the old behavior.
Note that CDATA sections are still treated like text nodes and will have
their content returned.
Fixes#838.
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.
Regressed with commit d025cfbb.
Fixes#815.
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.
xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.
Fixes#727.
Reuse some of the old members.
The "input" and "output" function pointers are actually of type
xmlCharEncConvFunc, accepting an additional argument. For default
handlers, this argument is unused, so this should work with most ABIs.
For iconv handlers, these function pointers used to be NULL but now
point to a function which requires the extra argument.
"iconv_in" and "iconv_out" are made void pointers. "uconv_in" and
"uconv_out" are renamed and made void pointers. This is unlikely to
cause issues.
We now expect that the built-in conversion functions correctly report
XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be
done in the following commits.
When looking up encodings with xmlLookupCharEncodingHandler, the
returned handler can have a different name than requested
(capitalization, internal aliases). This should eventually be fixed.
For now we revert part of commit 5b893fa9, start the lookup with
xmlFindHandler and add an explicit check for UTF-8.
Should fix the encoding name issue mentioned in #749.
It's possible to create references to predefined entities using the tree
API. This edge case was exposed by making predefined entities const in
commit 63ce5f9a.
Only use structured error handlers for parser, Schemas and RelaxNG
contexts. Also use structured error handler for XInclude context.
Remove TODO macro.