7269 Commits

Author SHA1 Message Date
Nick Wellnhofer
05bd1720ce parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
552864f109 Remove os400 port
This is based on an ancient version and completely outdated.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
e60f0712ea Update NEWS 2025-03-01 15:18:20 +01:00
Nick Wellnhofer
e50d314a27 build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
ce1b704e33 doc: Regenerate libxml2-api.xml 2025-02-25 20:09:36 +01:00
Nick Wellnhofer
6ab430ca2e Remove unnecessary #includes 2025-02-22 21:55:58 +01:00
Nick Wellnhofer
7ae8e8ac7d schemas: Make xmlSchemaDump depend on DEBUG_ENABLED 2025-02-22 21:06:34 +01:00
Nick Wellnhofer
6fc260760a regexp: Hide debugging code behind DEBUG_REGEXP
xmlRegexpPrint is now a deprecated no-op.
2025-02-22 20:55:06 +01:00
Florin Haja
4649f28f77 xmlregexp: add support for compact form of automata in xmlRegexpPrint 2025-02-22 19:29:07 +00:00
Nick Wellnhofer
c82270a9a7 regexp: Avoid dangling start/stop pointers in atom
States could be eliminated later, so set start/stop pointers to NULL
after they're used in xmlFAGenerateTransitions.
2025-02-22 18:55:43 +01:00
Nick Wellnhofer
5ed4eafd8a html: Don't invoke SAX callbacks if parser was stopped 2025-02-22 14:52:47 +01:00
Nick Wellnhofer
6dfa68ac7f SAX2: Fix ctxt->nodemem check
In some error cases and maybe other situations, nodemem can have a
value of -1.
2025-02-22 14:52:47 +01:00
Nick Wellnhofer
73514f2d2e gitlab-ci: Stop downloading and installing CMake for MSVC
CMake should already be installed.
2025-02-20 18:50:58 +01:00
Jan Alexander Steffens (heftig)
064a02114a meson: Fix Python module build 2025-02-20 13:53:25 +01:00
Jan Alexander Steffens (heftig)
c2e2d76211 python: Pass destination dir to generator.py
Simplify usage across build systems.
2025-02-20 13:53:25 +01:00
Jan Alexander Steffens (heftig)
82fb5caee5 meson: Use project_name instead of 'libxml2' 2025-02-20 13:53:25 +01:00
Nick Wellnhofer
e649c97246 fuzz: Add utility scripts
Add scripts to minimize a corpus and generate HTML coverage reports.
2025-02-20 12:22:12 +01:00
Nick Wellnhofer
63dfcca670 fuzz: Reduce initial array size 2025-02-20 12:22:12 +01:00
Nick Wellnhofer
6f903d434f fuzz: Rework fixed parser options
Remove XML_PARSE_XINCLUDE. This is only honored by the XML Reader
interface which is now fuzzed in reader.c.

Don't validate in XInclude fuzzer. This doesn't increase coverage after
moving the Reader fuzzer.
2025-02-20 12:22:12 +01:00
Nick Wellnhofer
44628d4559 fuzz: Harden leak check in lint fuzzer
Check for undetected memory leaks from previous iterations. This also
makes sure that the maxmem limit is checked deterministically.
2025-02-20 12:22:12 +01:00
Nick Wellnhofer
c6c6d8afef fuzz: Mutate fuzz data chunks separately
Implement a custom mutator that takes a list of fixed-size chunks which
are mutated with a given probability. This makes sure that values like
parser options or failure position are mutated regularly even as the
fuzz data grows large. Values can also be adjusted temporarily to make
the fuzzer focus on failure injection, for example.

Thanks to David Kilzer for the idea.
2025-02-20 12:22:12 +01:00
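The chunk-wise mutation idea from c6c6d8afef can be sketched roughly as follows. This is a minimal illustration, not the mutator in the fuzz directory: `ChunkDesc`, `mutate_chunks`, the xorshift generator and the single bit-flip mutation are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one fixed-size chunk at the start of the
 * fuzz data, e.g. parser options or the failure position. */
typedef struct {
    size_t size;            /* chunk size in bytes */
    unsigned prob_percent;  /* chance (0-100) that the chunk is mutated */
} ChunkDesc;

/* Small xorshift PRNG so the sketch is deterministic for a given seed. */
static uint32_t next_rand(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Walk the chunks in order and, with each chunk's own probability, flip
 * one random bit inside it. Returns the number of chunks mutated. */
size_t mutate_chunks(uint8_t *data, size_t data_size,
                     const ChunkDesc *chunks, size_t num_chunks,
                     uint32_t seed) {
    uint32_t state = seed ? seed : 1; /* xorshift state must be nonzero */
    size_t offset = 0;
    size_t mutated = 0;

    for (size_t i = 0; i < num_chunks; i++) {
        /* Stop if the chunk is empty or doesn't fit in the remaining data. */
        if (chunks[i].size == 0 || chunks[i].size > data_size - offset)
            break;
        if (next_rand(&state) % 100 < chunks[i].prob_percent) {
            size_t byte = next_rand(&state) % chunks[i].size;
            data[offset + byte] ^= (uint8_t)(1u << (next_rand(&state) % 8));
            mutated++;
        }
        offset += chunks[i].size;
    }
    return mutated;
}
```

Because every chunk has its own probability, a small chunk such as the parser options keeps being mutated no matter how large the rest of the input grows, and raising one chunk's probability temporarily shifts the fuzzer's focus.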
Nick Wellnhofer
f5257d92bf fuzz: Fix failure injection in schema fuzzer 2025-02-20 12:10:50 +01:00
Nick Wellnhofer
fd359a7e49 fuzz: Start to fuzz XML Schema validator 2025-02-20 11:35:47 +01:00
Nick Wellnhofer
9f86dae989 test: Add test case for UAF in xmlSchemaIDCFillNodeTables 2025-02-20 11:35:47 +01:00
Himanshibansal
fe7f835f32 Fix C4296 warning: Resolve comparison of unsigned int with 0 2025-02-20 10:24:50 +00:00
Nick Wellnhofer
b8234e8c73 html: Fix check for partial named character references
Digits are allowed after the first character.
2025-02-19 12:53:32 +01:00
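The rule stated in b8234e8c73 (digits are allowed after the first character) can be illustrated with a small check. This is only a sketch: `could_be_charref_name` is a hypothetical helper, not the function changed by the commit, and it assumes the name starts with an ASCII letter.

```c
#include <stddef.h>

/* ASCII-only helpers so the result doesn't depend on the locale. */
static int is_ascii_alpha(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int is_ascii_digit(unsigned char c) {
    return c >= '0' && c <= '9';
}

/* Returns 1 if buf[0..len) could be the start of a named character
 * reference (the text after '&'): a letter followed by letters or
 * digits. Returns 0 otherwise. */
int could_be_charref_name(const char *buf, size_t len) {
    if (len == 0 || !is_ascii_alpha((unsigned char)buf[0]))
        return 0;
    for (size_t i = 1; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (!is_ascii_alpha(c) && !is_ascii_digit(c))
            return 0;
    }
    return 1;
}
```

With this check, a partial name such as `frac1` (the beginning of `frac12`) is still accepted as a candidate, while input starting with a digit is rejected.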
Nick Wellnhofer
f68c70d298 html: Remove htmlSaveErr
This function is useless now.
2025-02-19 12:27:26 +01:00
Nick Wellnhofer
0315ac9390 html: Handle error from htmlFindOutputEncoder 2025-02-19 12:27:26 +01:00
Nick Wellnhofer
22ada0a0bf tests: Look for xmlconf in source directory
Add -d option to runxmlconf for automake.

Fix extraction of xmlconf.tar.gz on Windows.

Make runxmlconf work with Meson CI.
2025-02-18 23:55:28 +01:00
Nick Wellnhofer
aedc1f3d14 gitlab-ci: Run meson tests verbosely 2025-02-18 23:15:20 +01:00
Nick Wellnhofer
9037dce918 fuzz: Add dictionary for lint fuzzer
Mostly a combination of xml.dict and xpath.dict. This should help with
fuzzing pattern.c.
2025-02-18 19:38:28 +01:00
Nick Wellnhofer
51622c058e doc: Update release instructions 2025-02-18 17:27:16 +01:00
Nick Wellnhofer
8c8753ad52 [CVE-2025-24928] Fix stack-buffer-overflow in xmlSnprintfElements
Fixes #847.
2025-02-18 15:07:51 +01:00
Nick Wellnhofer
5880a9a6bd [CVE-2024-56171] Fix use-after-free after xmlSchemaItemListAdd
xmlSchemaItemListAdd can reallocate the items array. Update local
variables after adding item in

- xmlSchemaIDCFillNodeTables
- xmlSchemaBubbleIDCNodeTables

Fixes #828.
2025-02-18 15:07:44 +01:00
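The bug class fixed in 5880a9a6bd can be shown with a simplified list. The sketch below is not the schema code: `ItemList`, `item_list_add` and `duplicate_items` are hypothetical stand-ins that only demonstrate why a local copy of the array pointer must be refreshed after an add that may reallocate.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable item list. */
typedef struct {
    void **items;
    int nbItems;
    int sizeItems;
} ItemList;

/* Append an item, growing the array if needed. The realloc call may
 * move the array, which invalidates any pointer a caller saved from
 * list->items before the call. */
int item_list_add(ItemList *list, void *item) {
    if (list->nbItems >= list->sizeItems) {
        int newSize = list->sizeItems ? list->sizeItems * 2 : 4;
        void **tmp = realloc(list->items, (size_t)newSize * sizeof(void *));
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->sizeItems = newSize;
    }
    list->items[list->nbItems++] = item;
    return 0;
}

/* Append a copy of every existing entry to the same list. */
int duplicate_items(ItemList *list) {
    void **items = list->items; /* local alias of the array */
    int n = list->nbItems;

    for (int i = 0; i < n; i++) {
        if (item_list_add(list, items[i]) < 0)
            return -1;
        /* Refresh the alias: the add above may have reallocated.
         * Without this line, items could point to freed memory. */
        items = list->items;
    }
    return 0;
}
```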
Nick Wellnhofer
06b3965086 fuzz: Stop testing xmllint --memory option
The --memory option mmaps files directly, bypassing the resource loader.
We'd need a temp file to make it work when fuzzing.
2025-02-17 12:19:23 +01:00
Nick Wellnhofer
25ae533b3e xmllint: Fix SIGBUS with --memory option
If the input file size is a multiple of page size, the byte after the
file's content is on a new page and accessing it will lead to SIGBUS.

Remove XML_INPUT_BUF_ZERO_TERMINATED hint for mmapped files.

Regressed with a221cd78.

Fixes #864.
2025-02-17 11:45:16 +01:00
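The condition described in 25ae533b3e comes down to simple arithmetic on the file size. A minimal sketch, assuming a hypothetical helper `byte_after_content_is_mapped` that is not part of xmllint:

```c
#include <stddef.h>

/* Returns 1 if the byte directly after the file content still lies on
 * the last mapped page, so reading it is safe. Returns 0 if the file
 * size is a multiple of the page size, in which case that byte would
 * be on a new, unmapped page. Empty files and a zero page size are
 * treated as unsafe. */
int byte_after_content_is_mapped(size_t file_size, size_t page_size) {
    if (file_size == 0 || page_size == 0)
        return 0;
    return file_size % page_size != 0;
}
```

For a 4096-byte page, a 4097-byte file leaves room for a readable byte after the content, while a file of exactly 4096 or 8192 bytes does not, which is why a zero-terminator hint cannot be relied on for mmapped input.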
Nick Wellnhofer
7a61c32bfa html: Use enum instead of magic values for insertion modes 2025-02-17 11:41:57 +01:00
Nick Wellnhofer
3793eaadb7 fuzz: Fix build 2025-02-16 13:55:18 +01:00
Nick Wellnhofer
69b91da3a8 Revert "xpath: Make contextSize and proximityPosition default to 1"
This reverts commit afbc0a0405236de4ab8cbac94745e9885db0a198.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
9c16a153d8 Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
6c716d491d pattern: Fix compilation of explicit child axis
The child axis is the default axis and should generate XML_OP_ELEM like
the case without an axis.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
8cf6129bbd html: Stop implying <p> start tags
Only <html>, <head> or <body> should be implied. Opening extra <p> tags
has always been a libxml2 quirk.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
71122421a1 html: Make implied <p> tags more deterministic
libxml2's HTML parser adds <p> start tags in some situations. This
behavior, which doesn't follow any standard, was added in 2000, see
here: http://veillard.com/XML/messages/0655.html

Text nodes that only contain whitespace don't imply a <p> tag, but the
whitespace check cannot work reliably if we're parsing partial text data
which can happen with both pull and push parser.

The logic in `areBlanks` is hard to follow. The checks involving `CUR`
depend on the position of the input pointer and seem dubious. It's also
possible that the behavior changed inadvertently with a later commit.
As a result, it's hard to come up with good test cases.

We now process leading whitespace before creating implied tags. This is
more in line with HTML5 and should avoid at least some issues with
partial text data.

For example, parsing the string "<head>   x" used to result in:

<html>
<head></head>
<body><p>   x</p></body>
</html>

And now results in:

<html>
<head>   </head>
<body><p>x</p></body>
</html>

Except for the implied <p> tag, this matches HTML5.
2025-02-13 14:31:44 +01:00
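The step "process leading whitespace before creating implied tags" from 71122421a1 can be pictured as splitting the text at the first non-whitespace byte. The helper below is a hypothetical sketch, not the parser's code; it assumes the ASCII whitespace set used by HTML (space, tab, LF, FF, CR).

```c
#include <stddef.h>

/* Returns the number of leading whitespace bytes in text[0..len).
 * In this sketch, the caller would handle that prefix first and only
 * then look at the remainder to decide whether tags are implied. */
size_t count_leading_whitespace(const char *text, size_t len) {
    size_t i = 0;
    while (i < len) {
        char c = text[i];
        if (c != ' ' && c != '\t' && c != '\n' && c != '\f' && c != '\r')
            break;
        i++;
    }
    return i;
}
```

For the text `   x` from the example in the commit message, the first three bytes are whitespace and only the remaining `x` is left to trigger implied tags.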
Nick Wellnhofer
ebbc31cc6b malloc-fail: Check for malloc failure in xhtmlNodeDumpOutput 2025-02-13 12:09:58 +01:00
Nick Wellnhofer
79ab721cb3 tests: Fix error return in testHugeEncodedChunk
Fixes #859.
2025-02-11 11:39:08 +01:00
Nick Wellnhofer
cfc854b839 fuzz: Work around glibc iconv() bug 2025-02-11 00:21:12 +01:00
Nick Wellnhofer
3a1526a5f7 xpath: Don't raise OOM error on long names
Short-lived regression.
2025-02-10 19:32:32 +01:00
Daniel Cheng
3dcde736d0 Use __has_attribute to check for __counted_by__ support
The initial clang patch to support __counted_by__ was landed and
reverted several times. There are some clang toolchains (e.g. the
Android toolchain) that report themselves as version 18 but do not
support __counted_by__. While it is debatable if Android should be
shipping a pre-release clang, using __has_attribute should be a bit
simpler overall.

Note that this doesn't migrate everything else to use __has_attribute:
while clang has always supported __has_attribute, gcc didn't support
it until a bit later.
2025-02-06 10:17:09 +01:00
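The approach described in 3dcde736d0 can be sketched as below. The macro name `MY_COUNTED_BY` and the `IntArray` type are made up for the example and are not the names used in libxml2; the fallback definition of `__has_attribute` is an assumption added here for compilers that lack it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fallback for compilers that don't provide __has_attribute at all. */
#ifndef __has_attribute
  #define __has_attribute(x) 0
#endif

/* Ask the compiler directly instead of comparing version numbers. */
#if __has_attribute(__counted_by__)
  #define MY_COUNTED_BY(field) __attribute__((__counted_by__(field)))
#else
  #define MY_COUNTED_BY(field)
#endif

/* Hypothetical array type whose flexible member is bounded by len. */
typedef struct {
    size_t len;
    int values[] MY_COUNTED_BY(len);
} IntArray;

/* Allocate a zero-filled array; len is set before values is touched. */
IntArray *int_array_new(size_t len) {
    IntArray *arr = calloc(1, sizeof(*arr) + len * sizeof(int));
    if (arr == NULL)
        return NULL;
    arr->len = len;
    return arr;
}
```

A toolchain that reports a recent version number but lacks the attribute simply gets the empty macro, so the struct compiles either way.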
Nick Wellnhofer
35d8a230a8 tests: Fix expected errors in runxmlconf
The extra failure if regexps weren't enabled was actually a regression
fixed by the previous commit.
2025-02-06 10:14:56 +01:00
Zak Ridouh
b466e70ae5 Fix early return in vstateVPush in valid.c
While looking over the code in the fallback method for `vstateVPush` in
valid.c when `LIBXML_REGEXP_ENABLED` is not defined, I noticed that
there is an ungated `return(-1)` after attempting to allocate memory.

I believe this should be inside a check for malloc failure.
2025-02-05 14:11:04 -08:00
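The pattern described in b466e70ae5 looks roughly like the following. This is a simplified sketch with a hypothetical `StateStack` type and `state_push` function, not the code in valid.c; it only shows the error return gated on the allocation result.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a validation state stack. */
typedef struct {
    int *tab;
    int nr;
    int max;
} StateStack;

/* Push a value and return its index, or -1 if memory allocation fails. */
int state_push(StateStack *st, int value) {
    if (st->nr >= st->max) {
        int newMax = st->max ? st->max * 2 : 8;
        int *tmp = realloc(st->tab, (size_t)newMax * sizeof(int));
        /* The error return belongs inside this check. An ungated
         * return(-1) here would make every push fail. */
        if (tmp == NULL)
            return -1;
        st->tab = tmp;
        st->max = newMax;
    }
    st->tab[st->nr] = value;
    return st->nr++;
}
```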