1309 Commits

Author SHA1 Message Date
Nick Wellnhofer
b349225952 include: Change some return types from int to enum
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
fd1b939168 include: Convert some macros to enums 2025-03-14 00:35:40 +01:00
Nick Wellnhofer
84c6524e26 encoding: Support input-only and output-only converters
Make it possible to open an encoding handler only for input or output.
This avoids the creation of unnecessary converters.

Should also fix #863.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
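
The flush-based check described in 69b83bb68e can be illustrated with a small standalone sketch. This is not libxml2 code: `sketch_consume`, `sketch_utf8_len` and the `SKETCH_*` status codes are hypothetical names, and the input is assumed to be UTF-8 so that no converter library is needed. The point is only that leftover input is harmless while more data may follow, and becomes a truncation error once the caller signals a flush.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch (not libxml2 constants). */
enum sketch_status {
    SKETCH_OK = 0,
    SKETCH_INVALID = -1,
    SKETCH_TRUNCATED = -2
};

/* Sequence length implied by a UTF-8 lead byte, or 0 if it cannot start one. */
static size_t
sketch_utf8_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/*
 * Consume complete sequences from in[0..len) and store the number of
 * bytes consumed in *consumed. An incomplete trailing sequence is left
 * unconsumed so the caller can retry once more input arrives. If `flush`
 * is non-zero, no more input will follow, so leftover bytes are reported
 * as a truncated sequence.
 *
 * Simplification: only lead bytes and continuation bytes are checked;
 * overlong forms and surrogates are not rejected.
 */
int
sketch_consume(const unsigned char *in, size_t len, int flush,
               size_t *consumed)
{
    size_t pos = 0;

    assert(in != NULL && consumed != NULL);

    while (pos < len) {
        size_t need = sketch_utf8_len(in[pos]);
        size_t i;

        if (need == 0) {
            *consumed = pos;
            return SKETCH_INVALID;
        }
        if (need > len - pos)
            break;              /* incomplete: wait for more input */
        for (i = 1; i < need; i++) {
            if ((in[pos + i] & 0xC0) != 0x80) {
                *consumed = pos;
                return SKETCH_INVALID;
            }
        }
        pos += need;
    }

    *consumed = pos;
    if (flush && pos < len)
        return SKETCH_TRUNCATED;
    return SKETCH_OK;
}
```

A caller would pass `flush = 0` for intermediate chunks and `flush = 1` for the final one. A converter that swallows the partial sequence itself (as the commit message says ICU does) leaves nothing behind to inspect, which is why the commit checks for U_TRUNCATED_CHAR_FOUND after flushing in that case.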
Nick Wellnhofer
03a8f1dd75 doc: Document SAX handlers a little more 2025-03-11 18:53:59 +01:00
Nick Wellnhofer
87c9e000e5 encoding: Rework custom encoding implementation API 2025-03-09 22:37:13 +01:00
Nick Wellnhofer
ba9148d8a5 parser: Undeprecate input->consumed
Should be deprecated after fixing #762.
2025-03-09 20:30:49 +01:00
Nick Wellnhofer
a0dbf030ee parser: Undeprecate ctxt->loadsubset
Should be deprecated after fixing #873.
2025-03-09 20:24:06 +01:00
Nick Wellnhofer
d96911f100 doc: Documentation fixes 2025-03-08 23:03:26 +01:00
Nick Wellnhofer
5f0b1378d7 parser: Add more parser context accessors
Fixes #763.
2025-03-08 22:36:06 +01:00
Nick Wellnhofer
38f475072a encoding: Make conversion callbacks more type-safe 2025-03-05 22:25:14 +01:00
Nick Wellnhofer
a846d96468 encoding: Remove compatibility struct members 2025-03-05 16:49:42 +01:00
Nick Wellnhofer
94d8a3e231 parser: Convert xmlParserMaxDepth to macro 2025-03-05 14:56:46 +01:00
Nick Wellnhofer
696572248f globals: Remove unused globals
- xmlBufferAllocScheme
- xmlDefaultBufferSize
- xmlParserDebugEntities
2025-03-05 12:24:38 +01:00
Nick Wellnhofer
92d7b0cd90 xpath: Rename valuePush and valuePop 2025-03-05 12:24:38 +01:00
Nick Wellnhofer
03be993ce5 Use memcpy to avoid pointer cast warnings 2025-03-05 12:24:38 +01:00
Nick Wellnhofer
f502e9b2f6 include: Add more deprecation warnings 2025-03-04 17:38:10 +01:00
Nick Wellnhofer
85bd58ef56 globals: Remove functions related to global state handling
- xmlGetGlobalState
- xmlInitializeGlobalState
- xmlGetThreadId
- xmlIsMainThread
2025-03-04 17:38:10 +01:00
Nick Wellnhofer
03a8d5f93d unicode: Make Unicode functions private 2025-03-04 17:31:11 +01:00
Nick Wellnhofer
3d37ff84c3 globals: Also use global state struct if threads are disabled 2025-03-04 16:54:41 +01:00
Nick Wellnhofer
a15ad9b268 parser: Remove compatibility symbols 2025-03-04 16:54:41 +01:00
Nick Wellnhofer
8e871162a6 parser: Remove oldXMLWDcompatibility 2025-03-04 16:54:41 +01:00
Nick Wellnhofer
cdc5cfed0b legacy: Remove legacy symbols 2025-03-04 16:54:05 +01:00
Nick Wellnhofer
3250a01dc2 error: Convert initGenericErrorDefaultFunc to macro 2025-03-04 16:53:59 +01:00
Nick Wellnhofer
c42b32277d parser: Convert inputPush and inputPop to macros 2025-03-04 16:53:28 +01:00
Nick Wellnhofer
361f7bff92 parser: Make nodePush, nodePop, namePush, namePop private 2025-03-04 16:47:14 +01:00
Nick Wellnhofer
0b27097a92 encoding: Rename unprefixed public functions 2025-03-04 16:46:53 +01:00
Nick Wellnhofer
e50d314a27 build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
2025-03-01 15:18:20 +01:00
Nick Wellnhofer
7ae8e8ac7d schemas: Make xmlSchemaDump depend on DEBUG_ENABLED 2025-02-22 21:06:34 +01:00
Nick Wellnhofer
6fc260760a regexp: Hide debugging code behind DEBUG_REGEXP
xmlRegexpPrint is now a deprecated no-op.
2025-02-22 20:55:06 +01:00
Nick Wellnhofer
9c16a153d8 Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
93506d41cb parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
2025-01-29 00:50:47 +01:00
Nick Wellnhofer
1082d813e8 parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
bfe6af2eed fuzz: Remove hacks to build lint fuzzer
Don't include source file directly.
2025-01-17 20:06:45 +01:00
Nick Wellnhofer
e41941109d schemas: Make ValidateStream take a const SAXHandler 2025-01-17 20:05:57 +01:00
Nick Wellnhofer
c134e8b4dc include: Make INPUT_CHUNK macro private 2024-12-21 20:02:34 +01:00
Nick Wellnhofer
84a6c82ff8 include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
2024-12-21 20:01:30 +01:00
Nick Wellnhofer
2e18e5dc6d memory: Grow dynamic arrays by 50%
Growing by a factor lower than the golden ratio increases the chances of
reusing memory freed from earlier allocations. Set growth rate to 1.5
which also reduces internal fragmentation.
2024-12-21 19:37:38 +01:00
Nick Wellnhofer
5320a4aa38 memory: Implement xmlGrowCapacity to safely grow arrays
xmlGrowCapacity makes sure that dynamic arrays don't grow beyond an
explicit maximum size. size_t considerations are also taken into account.
A macro XML_MAX_ITEMS is provided as default maximum with value
1 billion.

When fuzzing, the initial size is set to 1 to cause more reallocations.
This can require adjustments if callers really need larger arrays.
2024-12-21 19:37:37 +01:00
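
The growth policy from 2e18e5dc6d and 5320a4aa38 (grow by 50%, never beyond an explicit maximum, guard against `size_t` overflow) can be sketched roughly as follows. This is an illustrative approximation, not the actual xmlGrowCapacity implementation: the function name, its signature, the rounding choice and `SKETCH_MAX_ITEMS` are assumptions. Only the 1.5 growth factor and the default maximum of 1 billion come from the commit messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the default maximum of 1 billion items. */
#define SKETCH_MAX_ITEMS 1000000000

/*
 * Return the next capacity for a dynamic array, or -1 if it cannot grow.
 *
 *   capacity   current number of slots (0 for an unallocated array)
 *   item_size  size of one element in bytes
 *   initial    capacity to use for the first allocation
 *   max_items  hard upper bound on the number of slots
 */
int
sketch_grow_capacity(int capacity, size_t item_size, int initial,
                     int max_items)
{
    int extra;

    assert(capacity >= 0 && item_size > 0 && initial > 0 && max_items > 0);

    /* First allocation: start at the initial size, clamped to the maximum. */
    if (capacity == 0)
        return initial < max_items ? initial : max_items;

    /*
     * Refuse to grow past the maximum, or if up to twice the current
     * byte size would not fit in a size_t.
     */
    if (capacity >= max_items ||
        (size_t) capacity > SIZE_MAX / 2 / item_size)
        return -1;

    /* Grow by 50%, rounded up so that a capacity of 1 still grows to 2. */
    extra = (capacity + 1) / 2;

    /* Clamp to the maximum without overflowing the addition. */
    if (capacity > max_items - extra)
        return max_items;

    return capacity + extra;
}
```

A caller would typically pass the result to a `realloc`-style function and treat -1 as an out-of-memory condition. Passing `initial = 1` reproduces the fuzzing behavior mentioned in the commit message, where small initial sizes force more reallocations.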
Nick Wellnhofer
0dd910e82b save: Fix handling of catastrophic errors
Don't overwrite catastrophic errors in xmlSaveErr.

Overwrite non-catastrophic errors in xmlOutputBufferClose.
2024-12-19 02:30:36 +01:00
Nick Wellnhofer
57087e5fc7 parser: Don't overwrite catastrophic errors
Stop reporting errors after a catastrophic error.

Also make sure that ctxt->errNo matches ctxt->lastError.code.
2024-11-26 00:47:48 +01:00
Nick Wellnhofer
0dc26910c1 parser: Deprecate more internal functions 2024-11-21 22:31:20 +01:00
Nick Wellnhofer
a227a71ac9 regexp: Deprecate internal functions 2024-11-20 17:03:11 +01:00
Nick Wellnhofer
0f4f89005d parser: Rename inputPush to xmlCtxtPushInput 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23 parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
4d1f35b0a9 valid: Deprecate more internal functions 2024-11-19 00:03:37 +01:00
Nick Wellnhofer
5a51f08517 valid: Implement xmlCtxtValidateDocument
This allows using the error handler or resource loader of a parser
context.
2024-11-19 00:03:37 +01:00
Nick Wellnhofer
7f8c436c75 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.

Fixes #808.
2024-11-15 16:30:52 +01:00
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00