libxml2

c/libxml2

Fork 0

mirror of https://gitlab.gnome.org/GNOME/libxml2 synced 2025-03-28 21:33:13 +00:00

Commit Graph

Author	SHA1	Message	Date
Nick Wellnhofer	69b83bb68e	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi- byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.	2025-03-13 22:15:10 +01:00
Nick Wellnhofer	855818bd2b	parser: Check for truncated multi-byte sequences When decoding input data, check whether the "raw" buffer is empty after parsing the document. Otherwise, the input ends with a truncated multi-byte sequence which shouldn't be silently ignored.	2023-08-08 15:21:37 +02:00

Author

SHA1

Message

Date

Nick Wellnhofer

69b83bb68e

encoding: Detect truncated multi-byte sequences with ICU

Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.

2025-03-13 22:15:10 +01:00

Nick Wellnhofer

855818bd2b

parser: Check for truncated multi-byte sequences

When decoding input data, check whether the "raw" buffer is empty after
parsing the document. Otherwise, the input ends with a truncated
multi-byte sequence which shouldn't be silently ignored.

2023-08-08 15:21:37 +02:00

2 Commits