1062 Commits

Author SHA1 Message Date
Nick Wellnhofer
8696ebe182 parser: Fix ignorableWhitespace callback
If ignorableWhitespace differs from the "characters" callback, we have
to check for blanks as well.

Regressed with 1f5b537.
2025-03-11 16:34:30 +01:00
Nick Wellnhofer
25490528af parser: Fix spurious error in SAX mode
Short-lived regression from 5f0b1378.
2025-03-11 16:34:30 +01:00
Nick Wellnhofer
5f0b1378d7 parser: Add more parser context accessors
Fixes #763.
2025-03-08 22:36:06 +01:00
Nick Wellnhofer
94d8a3e231 parser: Convert xmlParserMaxDepth to macro 2025-03-05 14:56:46 +01:00
Nick Wellnhofer
03a8d5f93d unicode: Make Unicode functions private 2025-03-04 17:31:11 +01:00
Nick Wellnhofer
cdc5cfed0b legacy: Remove legacy symbols 2025-03-04 16:54:05 +01:00
Nick Wellnhofer
c42b32277d parser: Convert inputPush and inputPop to macros 2025-03-04 16:53:28 +01:00
Nick Wellnhofer
361f7bff92 parser: Make nodePush, nodePop, namePush, namePop private 2025-03-04 16:47:14 +01:00
Nick Wellnhofer
05bd1720ce parser: Fix parsing of DTD content
Regressed in 2.11. Fixes #868.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
e50d314a27 build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
b4d3d87ed2 parser: Fix parsing of doctype declarations
Fix some long-standing issues.

Fixes #504.
2025-02-02 11:15:45 +01:00
Nick Wellnhofer
57e4bbd803 parser: Improve handling of NOCDATA option
Don't modify the callback structure. This makes sure that unsetting the
option works.
2025-01-31 18:41:35 +01:00
Nick Wellnhofer
1f5b5371cf parser: Improve handling of NOBLANKS option
Don't change the SAX handler.

Use a helper function to invoke "characters" SAX callback.

The old code didn't advance the input pointer consistently before
invoking the callback. There was also some inconsistency with respect to
ctxt->space handling. I don't understand the ctxt->space thing, but
now we always behave like the non-complex case before.
2025-01-31 18:09:22 +01:00
Nick Wellnhofer
7a8722f557 parser: Document that XML_PARSE_NOBLANKS is broken
Long text content can generate multiple "characters" callbacks which can
lead to NOBLANKS removing whitespace in non-whitespace text nodes. So
the NOBLANKS option doesn't even work reliably with the pull parser.
This would be extremely hard to fix.

Unfortunately, `xmllint --format` relies on this option which is another
reason why this feature never really worked.
2025-01-31 18:09:03 +01:00
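The failure mode described in 7a8722f557 can be illustrated with a small standalone sketch. It is not libxml2 code: `sketchIsBlank` and `sketchFilterChunks` are hypothetical names, and the filter is a deliberately naive stand-in that drops any callback chunk made up only of XML whitespace. It assumes one text node is delivered as several "characters" chunks, as the commit message describes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the chunk holds only XML whitespace
 * (space, tab, LF, CR). An empty chunk counts as blank. */
static int
sketchIsBlank(const char *s, size_t len) {
    size_t i;

    for (i = 0; i < len; i++) {
        if (s[i] != ' ' && s[i] != '\t' && s[i] != '\n' && s[i] != '\r')
            return 0;
    }
    return 1;
}

/* Hypothetical per-chunk filter: concatenates the chunks into out but
 * skips every chunk that is blank on its own. This mimics deciding
 * "blank or not" per callback instead of per text node. */
static size_t
sketchFilterChunks(const char *const *chunks, size_t n,
                   char *out, size_t outSize) {
    size_t used = 0;
    size_t i;

    if (outSize == 0)
        return 0;
    for (i = 0; i < n; i++) {
        size_t len = strlen(chunks[i]);

        if (sketchIsBlank(chunks[i], len))
            continue;
        if (len > outSize - 1 - used)
            break; /* output buffer full, stop copying */
        memcpy(out + used, chunks[i], len);
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

If the text "foo   bar" happens to arrive as the chunks "foo", "   " and "bar", this filter produces "foobar", although the text node as a whole is not blank. A reliable fix would need to see the whole text node before deciding, which is why the commit calls the problem hard to fix.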
Nick Wellnhofer
9efe141422 parser: Fix detection of ']]>' when push-parsing
Fixes #850.
2025-01-31 15:50:00 +01:00
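Why detecting ']]>' is harder in push mode can be sketched as follows. This is an illustration under assumptions, not the code changed in 9efe141422: `SketchEndScanner` and `sketchScanChunk` are hypothetical names. The point is only that the sequence may be split across two or three chunks, so the scanner has to carry state from one chunk to the next.

```c
#include <stddef.h>

/* Hypothetical scanner state: number of consecutive ']' characters seen
 * at the end of the data scanned so far, capped at 2. */
typedef struct {
    int brackets;
} SketchEndScanner;

/* Scans one chunk. Returns 1 as soon as "]]>" has been completed,
 * possibly using ']' characters remembered from earlier chunks,
 * and 0 if the chunk ends without a match. */
static int
sketchScanChunk(SketchEndScanner *s, const char *buf, size_t len) {
    size_t i;

    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (c == ']') {
            if (s->brackets < 2)
                s->brackets++;
        } else if (c == '>' && s->brackets == 2) {
            s->brackets = 0;
            return 1;
        } else {
            s->brackets = 0;
        }
    }
    return 0;
}
```

Feeding "abc]", then "]", then ">tail" through the same scanner reports a match on the third call. A scanner that restarted from scratch on every chunk would miss it.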
Nick Wellnhofer
115b13f9d1 parser: Document push parser limitations 2025-01-31 15:50:00 +01:00
Nick Wellnhofer
53a48468ae xmllint: Make --push report parse errors
The push parser leaves documents in ctxt->myDoc even if they're invalid.

Also fix documentation.

Regressed with f8ff4d86.
2025-01-31 15:50:00 +01:00
Nick Wellnhofer
5535721f04 parser: Grow input buffer after lots of whitespace
Make sure that the input buffer is grown after consuming large amounts
of whitespace.

Also move a comment.
2025-01-31 15:49:53 +01:00
Nick Wellnhofer
218264fada parser: Always shrink input buffer
Shrinking the input buffer is cheap now and should be done as soon as
possible.
2025-01-30 01:26:01 +01:00
Nick Wellnhofer
93506d41cb parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
2025-01-29 00:50:47 +01:00
Nick Wellnhofer
1082d813e8 parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
ca81916023 include: Use intptr_t to cast between pointers and ints 2025-01-03 20:59:10 +01:00
Nick Wellnhofer
2e3a91a766 doc: Fix documentation 2024-12-26 21:05:39 +01:00
Nick Wellnhofer
8231c03663 parser: Check reallocations for overflow 2024-12-21 19:37:37 +01:00
Nick Wellnhofer
6548ba11b8 parser: Fix argument checks in xmlCtxtParse*
- Raise invalid argument error.
- Free input stream if ctxt is NULL.
2024-12-13 17:57:11 +01:00
Nick Wellnhofer
eae9a1bd8b parser: Pop input stream in xmlCtxtValidateDtd 2024-11-26 14:30:54 +01:00
Nick Wellnhofer
dafcefb228 parser: Fail on catastrophic errors in recovery mode 2024-11-26 00:47:48 +01:00
Nick Wellnhofer
0dc26910c1 parser: Deprecate more internal functions 2024-11-21 22:31:20 +01:00
Nick Wellnhofer
84a6eece62 parser: Remove unneeded call to xmlDetectEncoding 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
497081baab parser: Remove remaining calls to xml{Push|Pop}Input 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
0f4f89005d parser: Rename inputPush to xmlCtxtPushInput 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23 parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
631778f679 parser: Check for malloc failure in xmlCtxtParseDtd 2024-11-17 12:11:41 +01:00
Nick Wellnhofer
7f8c436c75 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.

Fixes #808.
2024-11-15 16:30:52 +01:00
Ruslan Garipov
aaecdc92e2 parser: Assign value without if-statement
This avoids an if-statement, because effectively it does nothing.  And,
for example, binary artifact generated by GCC with -O2 optimization
settings does not contain that if-statement -- the code just uses the
hprefix->name field explicitly.

No functional changes intended.

Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
2024-11-12 16:42:36 +05:00
Nick Wellnhofer
869e3fd421 parser: Fix loading of parameter entities in external DTDs
Regressed with commit 12f0bb94.

Fixes #816.
2024-11-01 16:53:18 +01:00
Nick Wellnhofer
efb57ddba3 parser: Fix downstream code that swaps DTDs
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.

Regressed with commit d025cfbb.

Fixes #815.
2024-10-30 14:13:38 +01:00
Nick Wellnhofer
0ec5687e06 parser: Rework xmlCtxtGrowAttrs
Remove unneeded argument.

Check for integer overflow. We probably hit the buffer size limit in
xmlParserGrow before, but better be safe.
2024-10-28 21:06:52 +01:00
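The kind of overflow check mentioned in 0ec5687e06 can be sketched in isolation. The helper below is hypothetical (`sketchGrowCapacity` is not a libxml2 function, and its parameters are assumptions made for illustration). It doubles a capacity up to a limit and refuses to return a capacity whose size in bytes would not fit in a `size_t`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the new capacity (in elements) for a
 * growing array, or 0 if the array cannot grow.
 *
 * capacity: current number of elements
 * elemSize: size of one element in bytes
 * initial:  capacity to use when the array is still empty
 * max:      upper limit for the capacity */
static size_t
sketchGrowCapacity(size_t capacity, size_t elemSize,
                   size_t initial, size_t max) {
    size_t newCap;

    if (elemSize == 0 || initial == 0)
        return 0;

    if (capacity == 0)
        newCap = initial;
    else if (capacity >= max)
        return 0;              /* already at the limit */
    else if (capacity > max / 2)
        newCap = max;          /* doubling would exceed the limit */
    else
        newCap = capacity * 2; /* cannot overflow: capacity <= max / 2 */

    if (newCap > max)
        newCap = max;          /* only possible if initial > max */

    /* Make sure newCap * elemSize fits in a size_t before the caller
     * passes the product to realloc. */
    if (newCap > SIZE_MAX / elemSize)
        return 0;

    return newCap;
}
```

A caller would treat a return value of 0 as an error and otherwise call `realloc` with `newCap * elemSize`. Checking before multiplying is the key design choice, since an unsigned product wraps around silently and would be too late to test afterwards.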
Nick Wellnhofer
ffb058f484 parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-10-28 20:26:55 +01:00
Nick Wellnhofer
b52a3044aa parser: Use counted_by attribute if supported
We only have a single struct with a flexible array member.
2024-10-24 18:18:47 +02:00
Nick Wellnhofer
74dfc49b5f parser: Clarify logic in xmlParseStartTag2 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0ce7bfe559 html: Try to avoid passing XML options to HTML parser 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
16de1346eb parser: Make new options actually work 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
dde62ae5d5 parser: Align push parsing of CDATA sections with pull parser
Remove special handling of CDATA sections in push parser. This makes
sure that only a single callback is generated for large sections.

Fixes #22 and needed for #412.
2024-08-29 01:28:49 +02:00
Nick Wellnhofer
4d10e53af1 parser: Make sure to set and increment input id
Revert part of commits 410931e3 and b9d2f3c9.
2024-08-28 22:47:20 +02:00
Nick Wellnhofer
6d365ca02c doc: XML_PARSE_NO_XXE is available since 2.13.0 2024-08-28 22:09:30 +02:00
makise-homura
103aadbc66 parser: Suppress EDG maybe-uninitialized warning 2024-08-16 22:26:07 +03:00
Nick Wellnhofer
02fcb1effb parser: Make xmlParseChunk return an error if parser was stopped
This regressed after enhancing the disableSAX member in 2.13.

Should fix #777.
2024-07-25 17:07:18 +02:00