Nick Wellnhofer
69b83bb68e
encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated
multi-byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
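The flush-based check described in this commit can be illustrated with a small standalone sketch. This is not libxml2's code: the helper names `utf8_incomplete_tail` and `check_input` and the `input_status` enum are hypothetical, and the sketch only handles UTF-8. The point it tries to show is that an incomplete trailing sequence is harmless while more input may still arrive, and only becomes an error once the caller signals a flush.

```c
#include <stddef.h>

/* Hypothetical result codes, not part of any library. */
enum input_status { INPUT_OK, INPUT_NEED_MORE, INPUT_TRUNCATED };

/*
 * Return the number of bytes at the end of buf that form an incomplete
 * UTF-8 sequence, or 0 if the buffer ends on a sequence boundary.
 * Invalid bytes are not reported here; that is left to a real decoder.
 */
static size_t
utf8_incomplete_tail(const unsigned char *buf, size_t len) {
    size_t back = 0;    /* continuation bytes (10xxxxxx) seen at the end */
    size_t need, have;
    unsigned char lead;

    while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == len)
        return 0;       /* empty buffer or no lead byte to inspect */

    lead = buf[len - 1 - back];
    if ((lead & 0x80) == 0x00)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;       /* stray continuation or invalid lead byte */

    have = back + 1;
    return have < need ? have : 0;
}

/*
 * With flush == 0 an incomplete tail only means "wait for more input".
 * With flush != 0 no more input will come, so the tail is a truncation.
 */
static enum input_status
check_input(const unsigned char *buf, size_t len, int flush) {
    if (utf8_incomplete_tail(buf, len) == 0)
        return INPUT_OK;
    return flush ? INPUT_TRUNCATED : INPUT_NEED_MORE;
}
```

For the two-byte buffer `{ 'a', 0xC3 }`, `check_input` returns `INPUT_NEED_MORE` while streaming and `INPUT_TRUNCATED` once the flush flag is set.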
Nick Wellnhofer
03a8f1dd75
doc: Document SAX handlers a little more
2025-03-11 18:53:59 +01:00
Nick Wellnhofer
87c9e000e5
encoding: Rework custom encoding implementation API
2025-03-09 22:37:13 +01:00
Nick Wellnhofer
ba9148d8a5
parser: Undeprecate input->consumed
Should be deprecated after fixing #762.
2025-03-09 20:30:49 +01:00
Nick Wellnhofer
a0dbf030ee
parser: Undeprecate ctxt->loadsubset
Should be deprecated after fixing #873.
2025-03-09 20:24:06 +01:00
Nick Wellnhofer
d96911f100
doc: Documentation fixes
2025-03-08 23:03:26 +01:00
Nick Wellnhofer
5f0b1378d7
parser: Add more parser context accessors
Fixes #763.
2025-03-08 22:36:06 +01:00
Nick Wellnhofer
38f475072a
encoding: Make conversion callbacks more type-safe
2025-03-05 22:25:14 +01:00
Nick Wellnhofer
a846d96468
encoding: Remove compatibility struct members
2025-03-05 16:49:42 +01:00
Nick Wellnhofer
94d8a3e231
parser: Convert xmlParserMaxDepth to macro
2025-03-05 14:56:46 +01:00
Nick Wellnhofer
696572248f
globals: Remove unused globals
- xmlBufferAllocScheme
- xmlDefaultBufferSize
- xmlParserDebugEntities
2025-03-05 12:24:38 +01:00
Nick Wellnhofer
92d7b0cd90
xpath: Rename valuePush and valuePop
2025-03-05 12:24:38 +01:00
Nick Wellnhofer
03be993ce5
Use memcpy to avoid pointer cast warnings
2025-03-05 12:24:38 +01:00
Nick Wellnhofer
f502e9b2f6
include: Add more deprecation warnings
2025-03-04 17:38:10 +01:00
Nick Wellnhofer
85bd58ef56
globals: Remove functions related to global state handling
- xmlGetGlobalState
- xmlInitializeGlobalState
- xmlGetThreadId
- xmlIsMainThread
2025-03-04 17:38:10 +01:00
Nick Wellnhofer
03a8d5f93d
unicode: Make Unicode functions private
2025-03-04 17:31:11 +01:00
Nick Wellnhofer
3d37ff84c3
globals: Also use global state struct if threads are disabled
2025-03-04 16:54:41 +01:00
Nick Wellnhofer
a15ad9b268
parser: Remove compatibility symbols
2025-03-04 16:54:41 +01:00
Nick Wellnhofer
8e871162a6
parser: Remove oldXMLWDcompatibility
2025-03-04 16:54:41 +01:00
Nick Wellnhofer
cdc5cfed0b
legacy: Remove legacy symbols
2025-03-04 16:54:05 +01:00
Nick Wellnhofer
3250a01dc2
error: Convert initGenericErrorDefaultFunc to macro
2025-03-04 16:53:59 +01:00
Nick Wellnhofer
c42b32277d
parser: Convert inputPush and inputPop to macros
2025-03-04 16:53:28 +01:00
Nick Wellnhofer
361f7bff92
parser: Make nodePush, nodePop, namePush, namePop private
2025-03-04 16:47:14 +01:00
Nick Wellnhofer
0b27097a92
encoding: Rename unprefixed public functions
2025-03-04 16:46:53 +01:00
Nick Wellnhofer
e50d314a27
build: Add separate configuration option for RELAX NG
Support for RELAX NG used to be enabled together with XML Schema support
(--with-schemas). Now there's a separate option and a new feature macro
LIBXML_RELAXNG_ENABLED.
2025-03-01 15:18:20 +01:00
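A minimal sketch of how downstream code might test the new feature macro at compile time. The helper name `have_relaxng` is hypothetical. The `__has_include` guard is only there so the snippet compiles with or without the libxml2 headers on the include path; it is a compiler extension before C23, and real code would normally just include `<libxml/xmlversion.h>`. Releases that predate this commit presumably do not define `LIBXML_RELAXNG_ENABLED` at all, so the macro alone cannot distinguish "old library" from "RELAX NG disabled".

```c
/* Pull in libxml2's feature macros when the headers are available. */
#if defined(__has_include)
#  if __has_include(<libxml/xmlversion.h>)
#    include <libxml/xmlversion.h>
#  endif
#endif

/*
 * Hypothetical helper: returns 1 if this translation unit was compiled
 * against a libxml2 that advertises RELAX NG support, 0 otherwise.
 */
static int
have_relaxng(void) {
#ifdef LIBXML_RELAXNG_ENABLED
    return 1;
#else
    return 0;
#endif
}
```

Code that calls the RELAX NG API can wrap those calls in the same `#ifdef LIBXML_RELAXNG_ENABLED` guard.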
Nick Wellnhofer
7ae8e8ac7d
schemas: Make xmlSchemaDump depend on DEBUG_ENABLED
2025-02-22 21:06:34 +01:00
Nick Wellnhofer
6fc260760a
regexp: Hide debugging code behind DEBUG_REGEXP
xmlRegexpPrint is now a deprecated no-op.
2025-02-22 20:55:06 +01:00
Nick Wellnhofer
9c16a153d8
Revert "include: Make most IS_* macros private"
This reverts commit 84a6c82ff83d04963d6e1c5cd18ded68ea02d99f.
2025-02-13 20:20:17 +01:00
Nick Wellnhofer
93506d41cb
parser: Make catalog PIs opt-in
This is an obscure feature that shouldn't be enabled by default.
2025-01-29 00:50:47 +01:00
Nick Wellnhofer
1082d813e8
parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
e41941109d
schemas: Make ValidateStream take a const SAXHandler
2025-01-17 20:05:57 +01:00
Nick Wellnhofer
c134e8b4dc
include: Make INPUT_CHUNK macro private
2024-12-21 20:02:34 +01:00
Nick Wellnhofer
84a6c82ff8
include: Make most IS_* macros private
Macros like IS_DIGIT or IS_LETTER severely pollute the C namespace.
2024-12-21 20:01:30 +01:00
Nick Wellnhofer
0dc26910c1
parser: Deprecate more internal functions
2024-11-21 22:31:20 +01:00
Nick Wellnhofer
a227a71ac9
regexp: Deprecate internal functions
2024-11-20 17:03:11 +01:00
Nick Wellnhofer
0f4f89005d
parser: Rename inputPush to xmlCtxtPushInput
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23
parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
4d1f35b0a9
valid: Deprecate more internal functions
2024-11-19 00:03:37 +01:00
Nick Wellnhofer
5a51f08517
valid: Implement xmlCtxtValidateDocument
This allows using the error handler or resource loader of a parser
context.
2024-11-19 00:03:37 +01:00
Nick Wellnhofer
7f8c436c75
parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.
Fixes #808.
2024-11-15 16:30:52 +01:00
Nick Wellnhofer
c32397d51f
html: Improve character class macros
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c34d0ae9cc
html: Deprecate htmlIsBooleanAttr
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
6040785ac4
html: Deprecate AutoClose API
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
188cad68a4
html: Remove obsolete content model
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
462bf0b7a5
html: Rework options
Introduce htmlCtxtSetOptions, see similar changes made to XML parser.
Add HTML_PARSE_HUGE alias. Support HTML_PARSE_BIG_LINES.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e062a4a9b3
html: Add HTML5 parser option
This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.
This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX-like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.
An HTML5 tree builder could then be implemented on top of the SAX
callbacks.
2024-10-06 18:13:05 +02:00
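Because the tokenizer output can contain end tags without a matching start tag, a handler built on these callbacks cannot assume balanced nesting. The following is a sketch under stated assumptions, not libxml2 code: the `tag_tracker` struct and both callbacks are hypothetical, and the signatures are simplified (libxml2's own element callbacks use `xmlChar` names and the start callback also receives attributes).

```c
#include <stddef.h>

/* Hypothetical user data for a filter that tolerates unbalanced tags. */
typedef struct {
    int depth;      /* current nesting level, never below zero */
    int strayEnds;  /* end tags seen without a matching start tag */
} tag_tracker;

/* Simplified stand-in for a startElement callback. */
static void
on_start_element(void *ctx, const char *name) {
    tag_tracker *t = ctx;

    (void) name;
    t->depth++;
}

/* Simplified stand-in for an endElement callback. */
static void
on_end_element(void *ctx, const char *name) {
    tag_tracker *t = ctx;

    (void) name;
    if (t->depth > 0)
        t->depth--;
    else
        t->strayEnds++;     /* unbalanced end tag: count it, don't underflow */
}
```

A tree builder layered on top would replace the counter with a stack of open elements and apply the HTML5 rules for implied and misnested tags.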
Nick Wellnhofer
f9ed30e972
html: HTML5 character data states
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
b1c5aa6544
xpath: Deprecate xmlXPathNAN and xmlXPath*INF
Users should simply use the C99 macros.
2024-09-19 12:50:59 +02:00
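As a sketch of what using the C99 macros looks like in practice, the snippet below relies only on `<math.h>`: `NAN` and `INFINITY` produce the special values, and `isnan` and `isinf` test for them. The `num_class` enum and `classify_number` helper are hypothetical names used for illustration.

```c
#include <math.h>

/* Hypothetical classification of a floating-point value. */
enum num_class { NUM_FINITE, NUM_NAN, NUM_POS_INF, NUM_NEG_INF };

/* Classify a double using only C99 facilities from <math.h>. */
static enum num_class
classify_number(double v) {
    if (isnan(v))
        return NUM_NAN;
    if (isinf(v))
        return v > 0 ? NUM_POS_INF : NUM_NEG_INF;
    return NUM_FINITE;
}
```

Callers can then write `classify_number(NAN)` or `classify_number(-INFINITY)` instead of reading the deprecated globals.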
Nick Wellnhofer
c46b89e243
xpath: Deprecate xmlXPathEvalExpr
Also check the argument instead of crashing if there's no context.
2024-09-13 21:06:36 +02:00
Nick Wellnhofer
de10d4cd5f
include: Check whether _MSC_VER is defined
Should fix #795.
2024-09-04 16:32:22 +02:00