1057 Commits

Author SHA1 Message Date
Nick Wellnhofer
cdc5cfed0b legacy: Remove legacy symbols 2025-03-04 16:54:05 +01:00
Nick Wellnhofer
c42b32277d parser: Convert inputPush and inputPop to macros 2025-03-04 16:53:28 +01:00
Nick Wellnhofer
361f7bff92 parser: Make nodePush, nodePop, namePush, namePop private 2025-03-04 16:47:14 +01:00
Nick Wellnhofer
05bd1720ce parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
e50d314a27 build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
b4d3d87ed2 parser: Fix parsing of doctype declarations
Fix some long-standing issues.

Fixes #504.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
57e4bbd803 parser: Improve handling of NOCDATA option
Don't modify the callback structure. This makes sure that unsetting the
option works.
2025-01-31 18:41:35 +01:00
Nick Wellnhofer
1f5b5371cf parser: Improve handling of NOBLANKS option
Don't change the SAX handler.

Use a helper function to invoke "characters" SAX callback.

The old code didn't advance the input pointer consistently before
invoking the callback. There was also some inconsistency with respect to
ctxt->space handling. I don't understand the ctxt->space thing, but
now we always behave like the non-complex case before.
2025-01-31 18:09:22 +01:00
Nick Wellnhofer
7a8722f557 parser: Document that XML_PARSE_NOBLANKS is broken
Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.

Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.
2025-01-31 18:09:03 +01:00
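The failure mode described in 7a8722f557 can be illustrated with a small, self-contained model. This is only a sketch and not libxml2 code: `chunk_is_blank` and `filter_blank_chunks` are hypothetical names, and the fixed chunk size stands in for whatever boundaries the parser happens to pick when it splits long text into several "characters" callbacks. A filter that decides per callback whether a piece is blank will drop whitespace that actually sits in the middle of non-whitespace text.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the chunk consists only of whitespace. */
static int chunk_is_blank(const char *s, size_t len) {
    for (size_t i = 0; i < len; i++) {
        if (!isspace((unsigned char) s[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical model of a per-callback blank filter: the text is delivered
 * in pieces of at most chunk_size bytes, and every piece that is entirely
 * whitespace is dropped. Returns the number of bytes written to out.
 */
static size_t filter_blank_chunks(const char *text, size_t chunk_size,
                                  char *out, size_t out_size) {
    assert(text != NULL && out != NULL && chunk_size > 0 && out_size > 0);
    size_t len = strlen(text);
    size_t written = 0;

    for (size_t pos = 0; pos < len; pos += chunk_size) {
        size_t n = len - pos < chunk_size ? len - pos : chunk_size;
        if (chunk_is_blank(text + pos, n))
            continue;               /* whitespace-only piece is discarded */
        if (written + n >= out_size)
            break;                  /* keep room for the terminator */
        memcpy(out + written, text + pos, n);
        written += n;
    }
    out[written] = '\0';
    return written;
}
```

With the input "abcd    efgh" and a chunk size of 4, the middle piece is whitespace-only and is dropped, so the output becomes "abcdefgh". With a chunk size large enough to deliver the text in one piece, the whitespace survives. The result therefore depends on where the chunk boundaries fall, which is the unreliability the commit message documents.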
Nick Wellnhofer
9efe141422 parser: Fix detection of ']]>' when push-parsing
Fixes #850.
2025-01-31 15:50:00 +01:00
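Commit 9efe141422 does not describe how the fix works, so the following is only a generic sketch of the underlying problem, not the libxml2 implementation. When data arrives in chunks, the terminator ']]>' can be split across chunk boundaries, so a scanner has to carry state from one chunk to the next. The type `cdata_end_scanner` and the functions `scanner_init` and `scanner_feed` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scanner state: consecutive ']' seen so far, capped at 2. */
typedef struct {
    int brackets;
} cdata_end_scanner;

static void scanner_init(cdata_end_scanner *s) {
    assert(s != NULL);
    s->brackets = 0;
}

/*
 * Feed one chunk. Returns the offset just past "]]>" within this chunk
 * if the terminator completes here, or -1 if more data is needed.
 * The bracket count persists across calls, so a terminator split over
 * several chunks is still found.
 */
static long scanner_feed(cdata_end_scanner *s, const char *chunk, size_t len) {
    assert(s != NULL && chunk != NULL);
    for (size_t i = 0; i < len; i++) {
        char c = chunk[i];
        if (c == ']') {
            if (s->brackets < 2)
                s->brackets++;      /* "]]]" still ends in two brackets */
        } else if (c == '>' && s->brackets == 2) {
            s->brackets = 0;
            return (long) (i + 1);
        } else {
            s->brackets = 0;        /* any other byte breaks the run */
        }
    }
    return -1;
}
```

Feeding "data]", then "]", then ">rest" reports the terminator in the third chunk at offset 1, even though no single chunk contains the full sequence.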
Nick Wellnhofer
115b13f9d1 parser: Document push parser limitations 2025-01-31 15:50:00 +01:00
Nick Wellnhofer
53a48468ae xmllint: Make --push report parse errors
The push parser leaves documents in ctxt->myDoc even if they're invalid.

Also fix documentation.

Regressed with f8ff4d86.
2025-01-31 15:50:00 +01:00
Nick Wellnhofer
5535721f04 parser: Grow input buffer after lots of whitespace
Make sure that the input buffer is grown after consuming large amounts
of whitespace.

Also move a comment.
2025-01-31 15:49:53 +01:00
Nick Wellnhofer
218264fada parser: Always shrink input buffer
Shrinking the input buffer is cheap now and should be done as soon as
possible.
2025-01-30 01:26:01 +01:00
Nick Wellnhofer
93506d41cb parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
2025-01-29 00:50:47 +01:00
Nick Wellnhofer
1082d813e8 parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
ca81916023 include: Use intptr_t to cast between pointers and ints 2025-01-03 20:59:10 +01:00
Nick Wellnhofer
2e3a91a766 doc: Fix documentation 2024-12-26 21:05:39 +01:00
Nick Wellnhofer
8231c03663 parser: Check reallocations for overflow 2024-12-21 19:37:37 +01:00
Nick Wellnhofer
6548ba11b8 parser: Fix argument checks in xmlCtxtParse*
- Raise invalid argument error.
- Free input stream if ctxt is NULL.
2024-12-13 17:57:11 +01:00
Nick Wellnhofer
eae9a1bd8b parser: Pop input stream in xmlCtxtValidateDtd 2024-11-26 14:30:54 +01:00
Nick Wellnhofer
dafcefb228 parser: Fail on catastrophic errors in recovery mode 2024-11-26 00:47:48 +01:00
Nick Wellnhofer
0dc26910c1 parser: Deprecate more internal functions 2024-11-21 22:31:20 +01:00
Nick Wellnhofer
84a6eece62 parser: Remove unneeded call to xmlDetectEncoding 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
497081baab parser: Remove remaining calls to xml{Push|Pop}Input 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
0f4f89005d parser: Rename inputPush to xmlCtxtPushInput 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23 parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
631778f679 parser: Check for malloc failure in xmlCtxtParseDtd 2024-11-17 12:11:41 +01:00
Nick Wellnhofer
7f8c436c75 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.

Fixes #808.
2024-11-15 16:30:52 +01:00
Ruslan Garipov
aaecdc92e2 parser: Assign value without if-statement
This removes an if-statement that effectively does nothing. For
example, the binary generated by GCC with -O2 optimization settings
does not contain that if-statement; the code just uses the
hprefix->name field directly.

No functional changes intended.

Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
2024-11-12 16:42:36 +05:00
Nick Wellnhofer
869e3fd421 parser: Fix loading of parameter entities in external DTDs
Regressed with commit 12f0bb94.

Fixes #816.
2024-11-01 16:53:18 +01:00
Nick Wellnhofer
efb57ddba3 parser: Fix downstream code that swaps DTDs
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.

Regressed with commit d025cfbb.

Fixes #815.
2024-10-30 14:13:38 +01:00
Nick Wellnhofer
0ec5687e06 parser: Rework xmlCtxtGrowAttrs
Remove unneeded argument.

Check for integer overflow. We probably hit the buffer size limit in
xmlParserGrow before, but better be safe.
2024-10-28 21:06:52 +01:00
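The kind of check mentioned in 0ec5687e06 can be sketched as follows. This is a minimal illustration of overflow-safe array growth, not the actual xmlCtxtGrowAttrs code: the function name `grow_array`, its parameters, the initial capacity of 8 and the doubling policy are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: grow an array of elem_size-byte elements by
 * doubling its capacity, never exceeding max_capacity. Returns 0 on
 * success and -1 on overflow, limit or allocation failure. On failure,
 * *array and *capacity are left untouched.
 */
static int grow_array(void **array, int *capacity, size_t elem_size,
                      int max_capacity) {
    assert(array != NULL && capacity != NULL && elem_size > 0);
    int old_cap = *capacity;
    int new_cap;

    if (max_capacity <= 0 || old_cap >= max_capacity)
        return -1;                          /* limit reached */
    if (old_cap <= 0)
        new_cap = max_capacity < 8 ? max_capacity : 8;
    else if (old_cap > max_capacity / 2)
        new_cap = max_capacity;             /* doubling would pass the limit */
    else
        new_cap = old_cap * 2;              /* cannot overflow: <= max_capacity */

    /* Check the byte count before multiplying. */
    if ((size_t) new_cap > SIZE_MAX / elem_size)
        return -1;

    void *tmp = realloc(*array, (size_t) new_cap * elem_size);
    if (tmp == NULL)
        return -1;

    *array = tmp;
    *capacity = new_cap;
    return 0;
}
```

Comparing against `max_capacity / 2` before doubling and against `SIZE_MAX / elem_size` before multiplying keeps both the element count and the byte count from wrapping around, independent of any buffer size limit enforced elsewhere.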
Nick Wellnhofer
ffb058f484 parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-10-28 20:26:55 +01:00
Nick Wellnhofer
b52a3044aa parser: Use counted_by attribute if supported
We only have a single struct with a flexible array member.
2024-10-24 18:18:47 +02:00
Nick Wellnhofer
74dfc49b5f parser: Clarify logic in xmlParseStartTag2 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0ce7bfe559 html: Try to avoid passing XML options to HTML parser 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
16de1346eb parser: Make new options actually work 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
dde62ae5d5 parser: Align push parsing of CDATA sections with pull parser
Remove special handling of CDATA sections in push parser. This makes
sure that only a single callback is generated for large sections.

Fixes #22 and needed for #412.
2024-08-29 01:28:49 +02:00
Nick Wellnhofer
4d10e53af1 parser: Make sure to set and increment input id
Revert part of commits 410931e3 and b9d2f3c9.
2024-08-28 22:47:20 +02:00
Nick Wellnhofer
6d365ca02c doc: XML_PARSE_NO_XXE is available since 2.13.0 2024-08-28 22:09:30 +02:00
makise-homura
103aadbc66 parser: Suppress EDG maybe-uninitialized warning 2024-08-16 22:26:07 +03:00
Nick Wellnhofer
02fcb1effb parser: Make xmlParseChunk return an error if parser was stopped
This regressed after enhancing the disableSAX member in 2.13.

Should fix #777.
2024-07-25 17:07:18 +02:00
Nick Wellnhofer
1a89323039 [CVE-2024-40896] Fix XXE protection in downstream code
Some users set an entity's children manually in the getEntity SAX
callback to restrict entity expansion. This stopped working after
renaming the "checked" member of xmlEntity, making at least one
downstream project and its dependants susceptible to XXE attacks.

See #761.
2024-07-24 17:19:32 +02:00
Nick Wellnhofer
6a3c0b0d93 parser: Increase XML_MAX_DICTIONARY_LIMIT
This limit is somewhat arbitrary and can be reached when fuzzing
documents up to 1 MB.

Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
2024-07-22 12:53:00 +02:00
Nick Wellnhofer
5d36664fc9 memory: Deprecate xmlGcMemSetup 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
7148b77820 parser: Optimize memory buffer I/O
Reenable zero-copy IO for zero-terminated static memory buffers.

Don't stream zero-terminated dynamic memory buffers on top of creating
a copy.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
34c9108f15 encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
2024-07-16 17:42:10 +02:00