Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
Fix another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
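
As a rough illustration of treating "-" as an opt-in, a caller-side check
could look like the sketch below. The helper name wantsStdin and its
allowStdin flag are hypothetical and not part of libxml2; the point is only
that "-" maps to stdin when the caller explicitly asked for it.

```c
#include <string.h>

/*
 * Hypothetical helper, not libxml2 API.
 * Returns 1 only if the caller opted in (allowStdin != 0) and the
 * filename is exactly "-". Any other name is treated as a regular path.
 */
int
wantsStdin(const char *filename, int allowStdin) {
    if (filename == NULL || !allowStdin)
        return 0;
    return strcmp(filename, "-") == 0;
}
```

With a check like this, a library entry point would pass 0 by default and
only a command-line tool that documents "-" would pass 1.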
This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
Allow drive letters in URI paths. Technically, these should be treated
as URI schemes, but this is not what users expect. This also makes sure
that paths with drive letters are resolved as filesystem paths and
unescaped, for example when used in libxslt's document() function.
Should fix #832.
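
A minimal sketch of how a drive-letter prefix might be told apart from a
URI scheme. The function hasDriveLetter is hypothetical and is not the
check libxml2 uses; it assumes a drive letter is a single ASCII letter
followed by ":" and a path separator.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not libxml2 code.
 * Returns 1 if `path` starts with an ASCII letter, a colon and a
 * slash or backslash (e.g. "C:/dir" or "c:\dir"), 0 otherwise.
 */
int
hasDriveLetter(const char *path) {
    char c;

    if (path == NULL)
        return 0;
    c = path[0];
    if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')))
        return 0;
    if (path[1] != ':')
        return 0;
    return path[2] == '/' || path[2] == '\\';
}
```

Under this assumption "C:/dir/file.xml" is classified as a filesystem
path, while "http://example.com/" and an opaque single-letter scheme such
as "x:opaque" are not.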
Meson wrapdb provides a wrap for ICU, so libxml2 and ICU could both be
built as subprojects of the same Meson parent project. In this case, with
the icu option enabled, setup was failing with:
subprojects/libxml2-2.13.5/meson.build:603:22: ERROR: Could not get an internal variable and no default provided for <InternalDependency dep228908115162702543524838879388991448872: True>
This is because we can't get a dependency variable from a subproject that
hasn't been built yet. Fall back to assuming DEFS is empty, as it is on
my system.
Suffixes like "//IGNORE" change the behavior of iconv.
Also add a comment on how we currently rely on GNU libiconv behavior
which technically violates the POSIX spec.
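
For illustration, a name could be screened for such suffixes before it is
handed to iconv_open. The helper below is a hypothetical sketch, not the
libxml2 implementation; it assumes that iconv suffixes are introduced by
"//", as in "//IGNORE" or "//TRANSLIT".

```c
#include <string.h>

/*
 * Hypothetical helper, not libxml2 code.
 * Returns 1 if the encoding name contains "//", which is how suffixes
 * such as "//IGNORE" or "//TRANSLIT" are attached to a name.
 */
int
hasIconvSuffix(const char *name) {
    return name != NULL && strstr(name, "//") != NULL;
}
```

A caller could then refuse names for which hasIconvSuffix returns 1
instead of passing them through unchanged.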
Use "UCS-*" instead of "ISO-10646-UCS-*". While the XML spec recommends
"ISO-10646-UCS-2" and "ISO-10646-UCS-4", GNU iconv doesn't understand
these names.
Ignore UCS4_2143 and UCS4_3412 which were never supported.
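
The renaming can be pictured as stripping the "ISO-10646-" prefix before
the name reaches iconv. The function mapUcsName is a hypothetical sketch
and not the actual libxml2 code; for brevity it compares case-sensitively,
although encoding names are normally matched without regard to case.

```c
#include <string.h>

/*
 * Hypothetical helper, not libxml2 code.
 * Maps "ISO-10646-UCS-2" to "UCS-2" and "ISO-10646-UCS-4" to "UCS-4"
 * by skipping the "ISO-10646-" prefix. Other names are returned as is.
 * The result points into the original string; nothing is allocated.
 */
const char *
mapUcsName(const char *name) {
    static const char prefix[] = "ISO-10646-";
    size_t len = sizeof(prefix) - 1;

    if (name != NULL &&
        strncmp(name, prefix, len) == 0 &&
        strncmp(name + len, "UCS-", 4) == 0)
        return name + len;
    return name;
}
```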
Make sure to return NULL for node types other than elements or text, to
match the old behavior.
Note that CDATA sections are still treated like text nodes and will have
their content returned.
Fixes #838.