337 Commits

Author SHA1 Message Date
Nick Wellnhofer
b349225952 include: Change some return types from int to enum
This also affects some new functions from 2.13.
2025-03-14 02:31:01 +01:00
Nick Wellnhofer
fd1b939168 include: Convert some macros to enums 2025-03-14 00:35:40 +01:00
Nick Wellnhofer
69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
d96911f100 doc: Documentation fixes 2025-03-08 23:03:26 +01:00
Nick Wellnhofer
a0f156fffb io: Fix compressed flag for uncompressed stdin
This could cause xmlstarlet to generate compressed output unexpectedly.

Regressed with a78843be. Should fix #869.
2025-03-02 13:22:56 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
1c82bca6bd xmllint: Improve error reports from reader 2025-01-17 23:29:30 +01:00
Nick Wellnhofer
41c10c0cec io: Don't cast file descriptors to pointers
This doesn't work if open() returns 0 which is rare but can happen. Wrap
the fd in a context struct.

Fixes #835.
2025-01-03 20:15:52 +01:00
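
A minimal sketch of the idea behind 41c10c0cec, not the actual libxml2 code: instead of casting the descriptor to a pointer (where fd 0 becomes NULL and looks like a missing context), the fd is stored in a small heap-allocated struct. The names `FdIOCtxt`, `fdCtxtNew`, `fdCtxtRead` and `fdCtxtFree` are hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical context type; the struct used by libxml2 may differ. */
typedef struct {
    int fd;
} FdIOCtxt;

/* Wrap the descriptor in a heap-allocated context. A cast such as
 * (void *) (ptrdiff_t) fd would map fd 0 to NULL, which callers cannot
 * tell apart from "no context". */
static void *
fdCtxtNew(int fd) {
    FdIOCtxt *ctxt = malloc(sizeof(*ctxt));

    if (ctxt == NULL)
        return NULL;
    ctxt->fd = fd;
    return ctxt;
}

/* Read callback: recover the descriptor from the context. */
static int
fdCtxtRead(void *context, char *buffer, int len) {
    FdIOCtxt *ctxt = context;

    return (int) read(ctxt->fd, buffer, (size_t) len);
}

/* Close callback: free only the wrapper and leave the fd open,
 * assuming the caller owns the descriptor. */
static int
fdCtxtFree(void *context) {
    free(context);
    return 0;
}
```

With this layout, a context for fd 0 is a valid non-NULL pointer, so a NULL context can keep its meaning of "allocation failed" or "not set".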
Nick Wellnhofer
b3871dd138 io: Fix memory leaks of encoding handler in error cases
xmlOutputBufferCreate* must always free the encoding handler.
2024-12-21 21:58:25 +01:00
Nick Wellnhofer
0dd910e82b save: Fix handling of catastrophic errors
Don't overwrite catastrophic errors in xmlSaveErr.

Overwrite non-catastrophic errors in xmlOutputBufferClose.
2024-12-19 02:30:36 +01:00
Nick Wellnhofer
1e4d8c55f0 xmlIO: Fix reading from non-regular files like pipes
Commit 7e14c05d removed unnecessary copying of uncompressed input
through zlib or xzlib. This broke input from non-regular files like
pipes which can't be reopened. Try to detect such files by checking
whether they're seekable and always pipe them through zlib or xzlib.

Also remove seemingly unnecessary calls to gzread and gzrewind to
support unseekable files.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/124.
2024-11-06 16:49:53 +01:00
Nick Wellnhofer
55ddccb645 io: Make sure not to pass partial UTF-8 to write callback
We cannot split UTF-8 at arbitrary boundaries.
2024-09-14 00:05:13 +02:00
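
To illustrate why 55ddccb645 matters, here is a small sketch that finds the longest prefix of a buffer ending on a UTF-8 character boundary. It assumes the data is valid UTF-8 apart from a possibly truncated final sequence; the function name `utf8CompletePrefix` is hypothetical and not part of libxml2.

```c
#include <stddef.h>

/* Return the length of the longest prefix of buf[0..len) that does not
 * end in the middle of a multi-byte UTF-8 sequence. */
static size_t
utf8CompletePrefix(const unsigned char *buf, size_t len) {
    size_t i = len;
    size_t trailing = 0;   /* continuation bytes seen at the end */
    size_t need;
    unsigned char lead;

    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && trailing < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        trailing++;
    }
    if (i == 0)
        return len;        /* no lead byte found, nothing to trim */

    lead = buf[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;        /* invalid lead byte, leave it to the consumer */

    /* The last sequence starts at i - 1 and has trailing + 1 bytes. */
    if (trailing + 1 < need)
        return i - 1;      /* truncated: cut before the lead byte */
    return len;
}
```

A writer would pass only the returned number of bytes to the callback and keep the remainder for the next call.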
triallax
67ff748c3e io: don't set the executable bit when creating files
Issue seems to have been introduced in
0bef93bf24def68c448af0e71844b942e0ed93ec.
2024-08-26 23:53:29 +01:00
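
As a hedged illustration of 67ff748c3e, assuming output files are created with POSIX `open`: passing mode 0666 instead of 0777 leaves the executable bits unset, and the process umask can only remove further bits. The helper name `openForWrite` is hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>

/* Create or truncate a file for writing. Mode 0666 requests read and
 * write permission only; the umask is applied on top by the kernel. */
static int
openForWrite(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
}
```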
Nick Wellnhofer
f2c48847fa io: Add missing calls to xmlInitParser
This is required after c9a46a91.

Should fix #782.
2024-08-13 14:38:59 +02:00
Nick Wellnhofer
a530ff125d io: Always consume encoding handler when creating output buffers
Also free encoding handler in error case.

Remove xmlAllocOutputBufferInternal which was identical to
xmlAllocOutputBuffer.
2024-07-29 14:25:39 +02:00
Nick Wellnhofer
36ea881b9d malloc-fail: Fix memory leak in xmlOutputBufferCreateFilename
Close encoding handler on error.
2024-07-26 18:07:27 +02:00
Nick Wellnhofer
7b98e8d695 io: Don't call getcwd in xmlParserGetDirectory
The "directory" value isn't used internally. Calling getcwd is
unnecessary and can cause problems in sandboxed environments.

Fixes #770.
2024-07-18 03:22:20 +02:00
Nick Wellnhofer
eb66d03ef7 io: Deprecate a few functions 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
97680d6c08 io: Rework xmlParserInputBufferGrow
Remove dubious (len != 4) check.

Remove compression-related code. This should already be set when
opening the input.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a6f54f055b io: Fine-tune initial IO buffer size 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
7148b77820 parser: Optimize memory buffer I/O
Reenable zero-copy IO for zero-terminated static memory buffers.

Don't stream zero-terminated dynamic memory buffers on top of creating
a copy.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
34c9108f15 encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, which allows moving the content pointer past the start of the
initial allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating null byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
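
The allocation scheme described in a221cd7849 can be sketched as follows. This is a simplified model written for illustration, not the xmlBuf implementation: the type `SketchBuf` and all function names are hypothetical. Shrinking advances the content pointer in constant time, and growing first reclaims the space in front of the content before reallocating.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *mem;      /* start of the allocation */
    char *content;  /* start of live data, content >= mem */
    size_t use;     /* bytes of live data */
    size_t size;    /* capacity from content, excluding the terminator */
} SketchBuf;

static int
sketchBufInit(SketchBuf *buf, size_t size) {
    buf->mem = malloc(size + 1);
    if (buf->mem == NULL)
        return -1;
    buf->content = buf->mem;
    buf->use = 0;
    buf->size = size;
    buf->content[0] = 0;
    return 0;
}

/* Drop up to len bytes from the front by advancing the content pointer. */
static size_t
sketchBufShrink(SketchBuf *buf, size_t len) {
    if (len > buf->use)
        len = buf->use;
    buf->content += len;
    buf->use -= len;
    buf->size -= len;
    return len;
}

/* Make room for len more bytes, reusing shrunken memory if possible. */
static int
sketchBufGrow(SketchBuf *buf, size_t len) {
    size_t start = (size_t) (buf->content - buf->mem);
    size_t newSize;
    char *newMem;

    if (buf->size - buf->use >= len)
        return 0;
    if (start > 0) {
        /* Move live data, including the terminator, back to the start. */
        memmove(buf->mem, buf->content, buf->use + 1);
        buf->content = buf->mem;
        buf->size += start;
        if (buf->size - buf->use >= len)
            return 0;
    }
    newSize = buf->use + len;
    if (newSize < buf->use)
        return -1;  /* size_t overflow */
    newMem = realloc(buf->mem, newSize + 1);
    if (newMem == NULL)
        return -1;
    buf->mem = newMem;
    buf->content = newMem;
    buf->size = newSize;
    return 0;
}

static int
sketchBufAdd(SketchBuf *buf, const char *data, size_t len) {
    if (sketchBufGrow(buf, len) < 0)
        return -1;
    memcpy(buf->content + buf->use, data, len);
    buf->use += len;
    buf->content[buf->use] = 0;
    return 0;
}
```

In this model, a buffer that is repeatedly filled and drained from the front stays within its original allocation as long as the live data fits.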
Nick Wellnhofer
8d1606265d entities: Rework text escaping 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
cc45f618ae save: Rework text escaping
Stop using xmlOutputBufferWriteEscape except when using deprecated
xmlSaveSetEscape. Rewrite xmlOutputBufferWriteEscape to use an extra
buffer and call xmlOutputBufferWrite.

Introduce xmlSerializeText to serialize both text and attribute content.

Don't read encoding from document when serializing and remove all hacks
that temporarily changed the document's encoding.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
0ab07b21dd io: Rework xmlOutputBufferWrite
Simplify code, handle short writes from callback.
2024-07-16 17:42:10 +02:00
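
Handling short writes, as mentioned in 0ab07b21dd, usually means calling the write callback in a loop until everything has been accepted. The following is a generic sketch of that pattern, not the code of xmlOutputBufferWrite; `WriteCallback`, `writeAll`, `ChunkSink` and `chunkSinkWrite` are hypothetical names. Treating a return value of 0 as an error is an assumption made here to avoid looping forever.

```c
#include <string.h>

typedef int (*WriteCallback)(void *context, const char *buffer, int len);

/* Call cb until all len bytes are written. Returns 0 on success and
 * -1 if the callback fails, makes no progress, or reports more bytes
 * than it was given. */
static int
writeAll(WriteCallback cb, void *context, const char *buf, int len) {
    while (len > 0) {
        int n = cb(context, buf, len);

        if (n <= 0 || n > len)
            return -1;
        buf += n;
        len -= n;
    }
    return 0;
}

/* Demo sink that accepts at most 3 bytes per call, to force short writes. */
typedef struct {
    char data[64];
    int used;
} ChunkSink;

static int
chunkSinkWrite(void *context, const char *buffer, int len) {
    ChunkSink *sink = context;
    int n = len < 3 ? len : 3;

    if (sink->used + n > (int) sizeof(sink->data))
        return -1;
    memcpy(sink->data + sink->used, buffer, (size_t) n);
    sink->used += n;
    return n;
}
```

Passing `chunkSinkWrite` and a zero-initialized `ChunkSink` to `writeAll` delivers the full input across several callback invocations.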
Nick Wellnhofer
e0494c0d43 io: Add some deprecation warnings 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
da68639926 io: Fix return value of xmlFileRead
This broke in commit 6d27c54.

Fixes #766.
2024-07-09 13:02:31 +02:00
Nick Wellnhofer
84a4f84c1c build: Don't check for required headers and functions
Unless we are on Windows, the following POSIX headers are required.
They're part of the earliest POSIX specs and it doesn't make sense to
check for them.

- fcntl.h
- unistd.h
- sys/stat.h
- sys/time.h

On Windows, io.h, fcntl.h and sys/stat.h are always available.
2024-06-22 18:41:00 +02:00
Nick Wellnhofer
dba1ed85a3 ftp: Remove FTP support
Remove the built-in FTP client. If you configure --with-legacy, old
symbols are retained for ABI compatibility.
2024-06-12 18:19:55 +02:00
Nick Wellnhofer
ab5e6debd1 parser: Introduce XML_INPUT_NETWORK input flag
This allows disabling network access when creating parser inputs with
xmlInputCreateUrl.
2024-06-12 16:36:12 +02:00
Nick Wellnhofer
64ad272525 parser: Introduce per-context resource loader 2024-06-12 16:22:52 +02:00
Nick Wellnhofer
b9d2f3c911 parser: Introduce new input API
- xmlInputCreateUrl
- xmlInputCreateMemory
- xmlInputCreateString
- xmlInputCreateFd
- xmlInputCreateIO
- xmlInputSetEncoding

These functions don't take a parser context and work on xmlParserInputs,
replacing functions working on xmlParserInputBuffers.

xmlInputCreateUrl and xmlInputSetEncoding offer fine-grained error
handling.

Several XML_INPUT_* flags offer additional control.
2024-06-12 16:22:52 +02:00
Nick Wellnhofer
ff3b091910 parser: Implement XML_PARSE_NO_UNZIP option 2024-06-12 16:14:15 +02:00
Nick Wellnhofer
1432949d3c io: Pass input flags to xmlParserInputBufferCreateUrl 2024-06-12 16:14:15 +02:00
Nick Wellnhofer
b5890cb425 io: Remove xmlParserInputBufferCreateFilenameSafe 2024-06-12 16:14:15 +02:00
Nick Wellnhofer
1b1e8b3c12 io: Stop invoking generic error handler for IO errors 2024-06-12 16:14:15 +02:00
Nick Wellnhofer
a331526c8e io: Don't report write errors twice 2024-06-12 16:07:20 +02:00
Nick Wellnhofer
717f3a7b21 io: Fix resetting xmlParserInputBufferCreateFilename hook
We don't want to invoke the default function.
2024-06-12 16:04:45 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
a4c2b7233f io: Don't set close callback in xmlParserInputBufferCreateFd 2024-05-05 17:27:12 +02:00
Nick Wellnhofer
a279aae30f io: Allocate output buffer with XML_BUFFER_ALLOC_IO
This allows efficient shrinking of memory buffers.

Support IO buffers in xmlBufDetach.
2024-03-18 15:14:43 +01:00
Nick Wellnhofer
c1fe9e72ef io: Report more malloc failures when writing to output buffer 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
67e475b78e http: Improve error message for HTTPS redirects 2024-02-19 11:09:39 +01:00
Nick Wellnhofer
e314109ad1 save: Don't write directly to internal buffer
Make sure that OOM errors are reported.
2024-02-16 16:14:05 +01:00
Nick Wellnhofer
0d170acaba io: Report malloc failure in xmlOutputBufferWrite
Fixes #676.
2024-02-01 11:51:58 +01:00
Nick Wellnhofer
d2b55a7a02 writer: Implement xmlTextWriterClose
This function can be used to make sure that closing the output stream
succeeded.

Fixes #513.
2024-01-05 20:50:00 +01:00
Nick Wellnhofer
e45a4d7115 io: Always forward IO errors to global handler
The HTTP module raises errors without context. This won't be fixed,
so send them to the global error handler.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
7e0bbbc143 parser: New input API
Provide a new set of functions to create xmlParserInputs. These can be
used for the document entity or from external entity loaders.

- Don't require xmlParserInputBuffer.
- All functions take a base URI.
- All functions take an encoding as string.
- xmlNewInputURL also takes a public ID.
- xmlNewInputMemory takes a size_t.
- Optimization hints for memory buffers.

Improve documentation.

Only call xmlInitParser before allocating a new parser context.

Call xmlCtxtUseOptions as early as possible.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
c2ef78f76e io: Fix close error handling
There's no way to report error codes from closing an output buffer yet.
2023-12-25 23:38:40 +01:00