6958 Commits

Author SHA1 Message Date
Nick Wellnhofer
97680d6c08 io: Rework xmlParserInputBufferGrow
Remove dubious (len != 4) check.

Remove compression-related code. This should already be set when
opening the input.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a6f54f055b io: Fine-tune initial IO buffer size 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
7148b77820 parser: Optimize memory buffer I/O
Reenable zero-copy IO for zero-terminated static memory buffers.

Don't stream zero-terminated dynamic memory buffers on top of creating
a copy.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
34c9108f15 encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
2024-07-16 17:42:10 +02:00
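The push/pull trade-off behind the sizeOut argument can be illustrated with a converter that reports how much input it consumed and how much output it produced, and stops once the output limit is reached. This is a minimal sketch, not libxml2's xmlCharEncInput: `latin1_to_utf8` and `convert_input` are hypothetical names, and the in/out length convention is an assumption loosely modeled on libxml2's built-in converters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical converter: on entry *inlen and *outlen hold the available
 * input bytes and output space; on return they hold the bytes consumed and
 * produced. It stops early when the output is full, without splitting a
 * multi-byte UTF-8 sequence. */
static void latin1_to_utf8(unsigned char *out, size_t *outlen,
                           const unsigned char *in, size_t *inlen) {
    size_t i = 0, o = 0;
    while (i < *inlen) {
        unsigned char c = in[i];
        size_t need = c < 0x80 ? 1 : 2;
        if (o + need > *outlen)
            break;                      /* output limit reached: stop here */
        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        i++;
    }
    *inlen = i;
    *outlen = o;
}

/* Hypothetical wrapper with a sizeOut-style limit: a huge limit converts
 * everything that fits (push parsing), a small one converts a single chunk
 * (pull parsing from memory). Returns the number of bytes produced. */
static size_t convert_input(unsigned char *out, size_t outcap,
                            const unsigned char *in, size_t inlen,
                            size_t sizeOut, size_t *consumed) {
    assert(out != NULL && in != NULL && consumed != NULL);
    size_t outlen = outcap < sizeOut ? outcap : sizeOut;
    latin1_to_utf8(out, &outlen, in, &inlen);
    *consumed = inlen;
    return outlen;
}
```

With a limit of 4 bytes, "caf\xE9!" yields only "caf", because the two-byte encoding of "\xE9" would not fit; the unconsumed input stays available for the next chunk.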
Nick Wellnhofer
8e871a31f8 buf: Rework xmlBuffer code
Port most changes made to the xmlBuf code in f3807d76, except that
"size" still includes the terminating NULL byte.

Make xmlSetBufferAllocationScheme, xmlBufferAllocScheme and
xmlDefaultBufferSize no-ops.

Deprecate a few functions.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
888f70c77e buf: Move xmlBuffer code to buf.c 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
92f30711de parser: Optimize buffer shrinking
Remove checks now that we can shrink memory buffers efficiently.

Shrink more aggressively.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, which allows moving the content pointer past the initial
allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating NULL byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
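The idea of a content pointer that may move past the start of the allocation, combined with a grow step that reuses shrunken space, can be sketched as follows. This is an illustrative assumption, not the actual xmlBuf code: the `Buf` type and the `buf_*` functions are hypothetical. As in the reworked xmlBuf, the stored size excludes the terminating NUL byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *mem;      /* start of the allocation */
    unsigned char *content;  /* start of live data; may move past mem */
    size_t use;              /* live bytes, excluding the trailing NUL */
    size_t cap;              /* allocated bytes, including room for NUL */
} Buf;

static int buf_init(Buf *b, size_t initial) {
    b->mem = malloc(initial + 1);
    if (b->mem == NULL)
        return -1;
    b->content = b->mem;
    b->use = 0;
    b->cap = initial + 1;
    b->content[0] = 0;
    return 0;
}

/* Drop `len` bytes from the front in O(1) by advancing the content pointer. */
static void buf_shrink(Buf *b, size_t len) {
    assert(len <= b->use);
    b->content += len;
    b->use -= len;
}

/* Make room for `len` more bytes. Reuse the space freed by earlier shrinks
 * before falling back to realloc. Returns -1 on overflow or OOM. */
static int buf_grow(Buf *b, size_t len) {
    size_t start = (size_t)(b->content - b->mem);
    if (len <= b->cap - start - b->use - 1)
        return 0;                       /* enough room after the data */
    if (len > SIZE_MAX - b->use - 1)
        return -1;                      /* size would overflow */
    size_t need = b->use + len + 1;
    /* Slide live data (and its NUL) back to the start of the allocation. */
    memmove(b->mem, b->content, b->use + 1);
    b->content = b->mem;
    if (need <= b->cap)
        return 0;                       /* shrunken space was enough */
    size_t newcap = b->cap > SIZE_MAX / 2 ? SIZE_MAX : b->cap * 2;
    if (newcap < need)
        newcap = need;
    unsigned char *m = realloc(b->mem, newcap);
    if (m == NULL)
        return -1;                      /* old buffer is still valid */
    b->mem = b->content = m;
    b->cap = newcap;
    return 0;
}

static int buf_add(Buf *b, const void *data, size_t len) {
    if (buf_grow(b, len) < 0)
        return -1;
    memcpy(b->content + b->use, data, len);
    b->use += len;
    b->content[b->use] = 0;
    return 0;
}
```

Shrinking never copies; the cost of compaction is paid only when a later grow actually needs the reclaimed space, and a realloc happens only if compaction is not enough.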
Nick Wellnhofer
2adcde3920 save: Optimize xmlSerializeText
Use lookup tables.
2024-07-16 17:42:10 +02:00
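A lookup-table escaper replaces a chain of per-character comparisons with one table load per byte. The sketch below is hypothetical (`serialize_text`, `text_table` and `attr_table` are not libxml2 names), and the exact escape sets are assumptions chosen for illustration. It emits CR as the decimal reference "&#13;" in both contexts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escape strings indexed by byte value; NULL means "copy as is".
 * The escape sets below are illustrative assumptions. */
static const char *const text_table[256] = {
    ['\r'] = "&#13;", ['&'] = "&amp;", ['<'] = "&lt;", ['>'] = "&gt;",
};
static const char *const attr_table[256] = {
    ['\t'] = "&#9;", ['\n'] = "&#10;", ['\r'] = "&#13;",
    ['"'] = "&quot;", ['&'] = "&amp;", ['<'] = "&lt;", ['>'] = "&gt;",
};

/* Escape `len` bytes of `in` into `out` (capacity `cap`, NUL-terminated).
 * Returns the bytes written excluding NUL, or -1 if `out` is too small. */
static long serialize_text(char *out, size_t cap, const char *in,
                           size_t len, int in_attribute) {
    const char *const *table = in_attribute ? attr_table : text_table;
    size_t o = 0;
    assert(out != NULL && in != NULL);
    if (cap == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* One table load per byte decides between copying and escaping. */
        const char *esc = table[(unsigned char)in[i]];
        size_t n = esc != NULL ? strlen(esc) : 1;
        if (o + n + 1 > cap)
            return -1;
        if (esc != NULL)
            memcpy(out + o, esc, n);
        else
            out[o] = in[i];
        o += n;
    }
    out[o] = '\0';
    return (long)o;
}
```

A production version would likely also store escape lengths in the table and copy runs of unescaped bytes in bulk; this sketch keeps the per-byte loop for clarity.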
Nick Wellnhofer
1b06708271 save: Always serialize CR as decimal "&#13;"
We used to serialize CR as "&#xD;" when there was no encoding and we
weren't in an attribute. This was somewhat inconsistent.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
1cfc5b8089 entities: Rework serialization of numeric character references 2024-07-16 17:42:10 +02:00
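Serializing a numeric character reference in decimal form, as the CR change above does with "&#13;", needs nothing more than digit extraction. The following is a minimal sketch that avoids `snprintf`; `format_char_ref` and `CHAR_REF_MAX` are hypothetical names, not libxml2 API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* "&#" + up to 10 decimal digits (any 32-bit value) + ";" + NUL */
#define CHAR_REF_MAX 14

/* Write code point `c` as a decimal character reference, e.g. 13 -> "&#13;".
 * `out` must hold CHAR_REF_MAX bytes. Returns the length without NUL. */
static int format_char_ref(char out[CHAR_REF_MAX], uint32_t c) {
    char digits[10];
    int nd = 0, n = 0;
    do {                        /* collect digits, least significant first */
        digits[nd++] = (char)('0' + c % 10);
        c /= 10;
    } while (c > 0);
    out[n++] = '&';
    out[n++] = '#';
    while (nd > 0)              /* emit digits most significant first */
        out[n++] = digits[--nd];
    out[n++] = ';';
    out[n] = '\0';
    return n;
}
```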
Nick Wellnhofer
8d1606265d entities: Rework text escaping 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
cc45f618ae save: Rework text escaping
Stop using xmlOutputBufferWriteEscape except when using deprecated
xmlSaveSetEscape. Rewrite xmlOutputBufferWriteEscape to use an extra
buffer and call xmlOutputBufferWrite.

Introduce xmlSerializeText to serialize both text and attribute content.

Don't read encoding from document when serializing and remove all hacks
that temporarily changed the document's encoding.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
e488695b1a save: Deprecate xmlSaveSet*Escape
xmlSaveSetAttrEscape never had an effect.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
0ab07b21dd io: Rework xmlOutputBufferWrite
Simplify code, handle short writes from callback.
2024-07-16 17:42:10 +02:00
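Handling short writes means looping until the callback has accepted every byte, instead of assuming a single call writes everything. This sketch uses a hypothetical `write_cb` type with `size_t` lengths; libxml2's real xmlOutputWriteCallback takes and returns `int`. Treating a zero return as an error is a design choice made here so that a stuck callback cannot loop forever.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: writes up to `len` bytes, returns the number
 * actually written (possibly fewer), or -1 on error. */
typedef long (*write_cb)(void *ctx, const char *data, size_t len);

/* Call `cb` until all `len` bytes are written. A return of 0 is treated
 * as an error so a stuck callback cannot cause an infinite loop. */
static int write_all(write_cb cb, void *ctx, const char *data, size_t len) {
    while (len > 0) {
        long n = cb(ctx, data, len);
        if (n <= 0 || (size_t)n > len)
            return -1;          /* error or bogus byte count */
        data += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Demo sink that accepts at most 3 bytes per call. */
typedef struct {
    char buf[64];
    size_t used;
} Sink;

static long short_sink(void *ctx, const char *data, size_t len) {
    Sink *s = ctx;
    size_t n = len < 3 ? len : 3;
    if (n > sizeof s->buf - s->used)
        return -1;
    memcpy(s->buf + s->used, data, n);
    s->used += n;
    return (long)n;
}
```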
Markus Rickert
bb1884cb13 Enable CMake checks for MSVC 2024-07-16 10:19:23 +02:00
Nick Wellnhofer
e0494c0d43 io: Add some deprecation warnings 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
2dcd561dc8 regexp: Don't print to stderr 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
4b1832c115 relaxng: Use error handler for internal errors
Don't print to stderr.
2024-07-15 16:33:38 +02:00
Nick Wellnhofer
728869809e error: Add helper functions to print errors and abort 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
f6170b489c memory: Don't report OOM to stderr 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
6be79014d7 Remove unused code 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
fee0006a06 parser: Fix memory leak after malloc failure in xml*ParseDTD 2024-07-15 13:03:55 +02:00
Nick Wellnhofer
69f12d6d47 encoding: Deprecate xmlByteConsumed
This was only used by Chromium/WebKit to detect whether xmlParseContent
really succeeded. It's a horrible, overcomplicated hack.

See 8c5848bd and #767.
2024-07-13 15:42:02 +02:00
Nick Wellnhofer
440d11afd4 reader: Deprecate xmlTextReaderByteConsumed
Document that this function is useless.

Stop trying to handle encoding via xmlByteConsumed which can be
expensive.
2024-07-13 15:42:02 +02:00
Nick Wellnhofer
3528b81f8a tools: Move codegen tools to 'tools' directory 2024-07-13 15:42:02 +02:00
Nick Wellnhofer
c3b2f4713c cmake: Update option description 2024-07-13 15:42:02 +02:00
Nick Wellnhofer
3048793251 meson: Also disable icu and thread_alloc by default 2024-07-13 15:42:02 +02:00
Nick Wellnhofer
aa6aec19b0 parser: Fix xmlInputSetEncodingHandler again
Short-lived regression.
2024-07-11 12:42:13 +02:00
Nick Wellnhofer
8af55c8d20 parser: Rename new input API functions
These weren't made public yet.
2024-07-11 01:33:29 +02:00
Nick Wellnhofer
d74ca59491 parser: Rename internal xmlNewInput functions 2024-07-11 01:31:50 +02:00
Nick Wellnhofer
4f329dc524 parser: Implement xmlCtxtParseContent
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.

xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.

Fixes #727.
2024-07-11 01:26:32 +02:00
Nick Wellnhofer
673ca0edaf tests: Regenerate testapi.c 2024-07-11 01:23:57 +02:00
Nick Wellnhofer
4fec0889e0 parser: Fix memory leak in xmlInputSetEncodingHandler
Short-lived regression.
2024-07-10 22:32:33 +02:00
Nick Wellnhofer
d099795611 encoding: Re-add some UTF-8 validation to encoders
This isn't strictly needed but avoids generating invalid UTF-16 and
unsigned integer overflows.
2024-07-10 22:26:19 +02:00
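Validating UTF-8 before encoding to UTF-16 rejects overlong forms, surrogates and values above U+10FFFF. Without that check, such values could be split into garbage surrogate pairs. The functions below are a minimal sketch with hypothetical names (`utf8_decode`, `utf8_to_utf16`), not libxml2's encoder code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence. Returns its byte length (1-4) and stores the
 * code point in *cp, or returns 0 if the sequence is invalid or truncated. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp) {
    size_t n;
    uint32_t v, min;
    if (len == 0)
        return 0;
    unsigned char c = s[0];
    if (c < 0x80) { *cp = c; return 1; }
    else if ((c & 0xE0) == 0xC0) { n = 2; v = c & 0x1F; min = 0x80; }
    else if ((c & 0xF0) == 0xE0) { n = 3; v = c & 0x0F; min = 0x800; }
    else if ((c & 0xF8) == 0xF0) { n = 4; v = c & 0x07; min = 0x10000; }
    else return 0;              /* stray continuation byte or 0xF8..0xFF */
    if (len < n)
        return 0;               /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* missing continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */
    *cp = v;
    return n;
}

/* Convert UTF-8 to UTF-16 code units. Returns the number of units written,
 * or -1 on invalid input or if `out` (capacity `cap` units) is too small. */
static long utf8_to_utf16(uint16_t *out, size_t cap,
                          const unsigned char *in, size_t len) {
    size_t o = 0;
    while (len > 0) {
        uint32_t cp;
        size_t n = utf8_decode(in, len, &cp);
        if (n == 0)
            return -1;
        if (cp >= 0x10000) {    /* needs a surrogate pair */
            if (o + 2 > cap)
                return -1;
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (o + 1 > cap)
                return -1;
            out[o++] = (uint16_t)cp;
        }
        in += n;
        len -= n;
    }
    return (long)o;
}
```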
Nick Wellnhofer
ae6e2ee7ec fuzz: Adjust reader fuzzer 2024-07-10 22:26:11 +02:00
Nick Wellnhofer
f48eefe3d0 encoding: Rework xmlByteConsumed
Don't loop infinitely if input buffer is too large. Allocate conversion
buffer on the heap.
2024-07-09 14:25:32 +02:00
Nick Wellnhofer
8c4cc0be35 fuzz: Improve debug output of reader fuzzer 2024-07-09 14:25:16 +02:00
Nick Wellnhofer
5935471732 parser: Fix malloc failure handling in xmlInputSetEncodingHandler
Don't set encoder if allocating buffer failed. This could lead to
xmlByteConsumed processing invalid UTF-8.
2024-07-09 14:11:28 +02:00
Nick Wellnhofer
da68639926 io: Fix return value of xmlFileRead
This broke in commit 6d27c54.

Fixes #766.
2024-07-09 13:02:31 +02:00
Nick Wellnhofer
f51ad063a7 parser: Fix error return of xmlParseBalancedChunkMemory
Only return an error code if the chunk is not well-formed to match the
2.12 behavior. Return 0 on non-fatal errors like invalid namespaces.

Fixes #765.
2024-07-08 11:28:33 +02:00
Nick Wellnhofer
2e63656ec6 parser: Check return value of inputPush
inputPush typically doesn't fail because we pre-allocate the input
table. The return value should be checked nevertheless.
2024-07-08 11:27:52 +02:00
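Even when a table is pre-allocated so that pushes normally succeed, growing it can still fail, so callers should check the result. The following sketch shows that pattern with a hypothetical `InputStack` and `stack_push`; it is not libxml2's inputPush. On failure, the caller releases the item it could not push instead of leaking it or using it later.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical input stack, pre-allocated so pushes usually don't allocate. */
typedef struct {
    void **items;
    int nr;
    int max;
} InputStack;

static int stack_init(InputStack *st, int prealloc) {
    assert(prealloc > 0);
    st->items = malloc((size_t)prealloc * sizeof *st->items);
    st->nr = 0;
    st->max = st->items != NULL ? prealloc : 0;
    return st->items != NULL ? 0 : -1;
}

/* Returns the new depth, or -1 if growing the table failed. */
static int stack_push(InputStack *st, void *item) {
    if (st->nr >= st->max) {
        int newmax = st->max > 0 ? st->max * 2 : 4;
        void **tmp = realloc(st->items, (size_t)newmax * sizeof *tmp);
        if (tmp == NULL)
            return -1;          /* table unchanged; caller must handle this */
        st->items = tmp;
        st->max = newmax;
    }
    st->items[st->nr++] = item;
    return st->nr;
}

/* Caller side: check the result and free the item on failure. */
static int push_or_free(InputStack *st, void *item) {
    if (stack_push(st, item) < 0) {
        free(item);
        return -1;
    }
    return 0;
}
```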
Nick Wellnhofer
ea31ac5bba fuzz: Fix spaceMax 2024-07-07 04:19:09 +02:00
Nick Wellnhofer
82e0455cf6 Undeprecate some symbols for now
- xmlKeepBlanksDefault is needed as a work-around for
  xmlParseBalancedChunk, see issue #727.
- ctxt->options already has an accessor and will be deprecated
  later.
- input->cur, input->base, input->end: See #762.
2024-07-06 20:19:51 +02:00
Nick Wellnhofer
29e3ab92f0 fuzz: Make reallocs more likely 2024-07-06 15:48:43 +02:00
Nick Wellnhofer
de3221b179 fuzz: Adjust for xmlNodeParseContent changes
xmlStringGetNodeList returns NULL again for empty strings.
2024-07-06 15:33:06 +02:00
Nick Wellnhofer
1e5375c1b4 SAX2: Check return value of xmlPushInput
Fix null deref in case of malloc failure.
2024-07-06 15:33:06 +02:00
Nick Wellnhofer
38195cf596 parser: Don't produce names with invalid UTF-8 in recovery mode 2024-07-06 15:33:06 +02:00
Nick Wellnhofer
c45c15f5af ci: Add job for perl-XML-LibXML 2024-07-04 15:47:49 +02:00
Nick Wellnhofer
ec0881099b parser: Upgrade XML_IO_NETWORK_ATTEMPT to error
Fixes XML::LibXML test suite.
2024-07-04 15:47:20 +02:00