Nick Wellnhofer 69b83bb68e encoding: Detect truncated multi-byte sequences with ICU
Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.

It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.

Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
2025-03-13 22:15:10 +01:00

20 lines
536 B
C

#ifndef XML_ENC_H_PRIVATE__
#define XML_ENC_H_PRIVATE__
#include <libxml/encoding.h>
#include <libxml/tree.h>
XML_HIDDEN void
xmlInitEncodingInternal(void);
XML_HIDDEN int
xmlEncInputChunk(xmlCharEncodingHandler *handler, unsigned char *out,
int *outlen, const unsigned char *in, int *inlen,
int flush);
XML_HIDDEN int
xmlCharEncInput(xmlParserInputBufferPtr input, size_t *sizeOut, int flush);
XML_HIDDEN int
xmlCharEncOutput(xmlOutputBufferPtr output, int init);
#endif /* XML_ENC_H_PRIVATE__ */