The initial clang patch to support __counted_by__ was landed and
reverted several times. There are some clang toolchains (e.g. the
Android toolchain) that report themselves as version 18 but do not
support __counted_by__. While it is debatable if Android should be
shipping a pre-release clang, using __has_attribute should be a bit
simpler overall.
Note that this doesn't migrate everything else to use __has_attribute:
while clang has always supported __has_attribute, gcc didn't support
it until a bit later.
While looking over the code in the fallback method for `vstateVPush` in
valid.c when `LIBXML_REGEXP_ENABLED` is not defined, I noticed that
there is an ungated `return(-1)` after attempting to allocate memory.
I believe this should be inside a check, for if the malloc fails.
Not long ago, Apple decided to replace GNU libiconv with a patched up
version of FreeBSD's iconv implementation in their operating systems.
Unfortunately, the quality of both the original implementation as well
as Apple's patches is so abysmal that you routinely find issues when
fuzzing your own code.
For now, incomplete UTF-8 is always an error in push mode.
Eventually, we could pass chunked data to the character handler when
push-parsing. Then we'd have to handle incomplete sequences.
Don't check for UTF-32 or EBCDIC.
We now perform BOM sniffing and the first step of the HTML5 prescan
algorithm (detect UTF-16 XML declarations). The rest of the algorithm
still has to be implemented.
Don't change the SAX handler.
Use a helper function to invoke "characters" SAX callback.
The old code didn't advance the input pointer consistently before
invoking the callback. There was also some inconsistency wrt to
ctxt->space handling. I don't understand the ctxt->space thing, but
now we always behave like the non-complex case before.
Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.
Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.
Also serialize the result of push-parsing and compare whether pull and
push parser produce the same result (differential fuzzing).
We lose the ability to inject IO errors when serializing for now, but
this isn't too important.
Use variable chunk size for push parser.
Fixes#849.
When push-parsing with an encoding handler, we must convert the whole
buffer in the initial conversion. Otherwise, parsing a single chunk
larger than ~4KB would fail.
Regressed with commit 34c9108f.
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
Another regression related to reading from stdin.
Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.
This now enables compressed input when using the "Fd" API functions
which wan't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.
Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
Allow drive letters in URI paths. Technically, these should be treated
as URI schemes, but this is not what users expect. This also makes sure
that paths with drive letters are resolved as filesystem paths and
unescaped, for example when used in libxslt's document() function.
Should fix#832.
Meson wrapdb provides a wrap for ICU, so libxml2 and ICU could both be
built as subprojects of the same Meson parent project. In this case, with
the icu option enabled, setup was failing with:
subprojects/libxml2-2.13.5/meson.build:603:22: ERROR: Could not get an internal variable and no default provided for <InternalDependency dep228908115162702543524838879388991448872: True>
This is because we can't get a dependency variable from a subproject that
hasn't been built yet. Fall back to assuming DEFS is empty, as it is on
my system.
Suffixes like "//IGNORE" change the behavior of iconv.
Also add comment on how we currently rely on GNU libiconv behavior
which technically violates the POSIX spec.