5 Commits

Author SHA1 Message Date
Nick Wellnhofer
69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this check
fails with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00
Nick Wellnhofer
87c9e000e5 encoding: Rework custom encoding implementation API
2025-03-09 22:37:13 +01:00
Nick Wellnhofer
cdfb54ff7b Fix typos
2025-01-31 18:41:41 +01:00
Nick Wellnhofer
ec909ed27e example: Fix indentation in icu.c, mention in NEWS
2024-11-23 15:40:44 +01:00
Nick Wellnhofer
9cd4748799 doc: Add example for ICU with xmlCtxtSetCharEncConvImpl
See #819.
2024-11-22 19:51:32 +01:00
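Commit 69b83bb68e explains why checking for a non-empty raw input buffer misses truncated sequences when a converter (like ICU) consumes the incomplete bytes into internal state, and why a `flush` flag fixes that. The following is a minimal C sketch of that idea using a toy, simplified UTF-8 decoder. `utf8_state`, `utf8_decode`, and the `CONV_*` result codes are hypothetical illustration names, not libxml2 or ICU API; in ICU the comparable mechanism is the `flush` argument of `ucnv_toUnicode` together with the `U_TRUNCATED_CHAR_FOUND` error the commit mentions.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch only. */
typedef enum { CONV_OK = 0, CONV_TRUNCATED = 1, CONV_INVALID = 2 } conv_result;

/* Hypothetical stateful decoder. Like ICU, it consumes an incomplete
 * multi-byte sequence at the end of the input and buffers it internally,
 * so the caller's input buffer ends up empty either way. */
typedef struct {
    unsigned char pending[4];
    int npending; /* bytes of the current sequence buffered so far */
    int needed;   /* total length of the current sequence */
} utf8_state;

/* Length of a UTF-8 sequence from its lead byte, or 0 if invalid.
 * Simplified: does not reject overlong forms, surrogates, or 0xF5..0xF7. */
static int seq_len(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Decodes `in`, adding the number of complete code points to *ncp.
 * All valid input is consumed. Set `flush` on the final call. */
static conv_result utf8_decode(utf8_state *st, const unsigned char *in,
                               size_t inlen, int flush, size_t *ncp) {
    for (size_t i = 0; i < inlen; i++) {
        unsigned char b = in[i];
        if (st->npending == 0) {
            int n = seq_len(b);
            if (n == 0) return CONV_INVALID;
            if (n == 1) { (*ncp)++; continue; }
            st->pending[0] = b;
            st->npending = 1;
            st->needed = n;
        } else {
            if ((b & 0xC0) != 0x80) return CONV_INVALID; /* not a continuation byte */
            st->pending[st->npending++] = b;
            if (st->npending == st->needed) { st->npending = 0; (*ncp)++; }
        }
    }
    /* Without flush, buffered bytes may simply continue in the next chunk.
     * Only when the caller says "no more input" is leftover state an error. */
    if (flush && st->npending > 0) {
        st->npending = 0;
        return CONV_TRUNCATED;
    }
    return CONV_OK;
}
```

Feeding the bytes `'h', 0xC3` with `flush` unset returns `CONV_OK` and consumes everything, so a "remaining raw input" check sees nothing wrong. A final call with an empty buffer and `flush` set then returns `CONV_TRUNCATED`, which is the signal the commit relies on instead.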