libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2 synced 2025-03-28 21:33:13 +00:00

Author	SHA1	Message	Date
Nick Wellnhofer	69b83bb68e	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.	2025-03-13 22:15:10 +01:00
Nick Wellnhofer	87c9e000e5	encoding: Rework custom encoding implementation API	2025-03-09 22:37:13 +01:00
Nick Wellnhofer	38f475072a	encoding: Make conversion callbacks more type-safe	2025-03-05 22:25:14 +01:00
Nick Wellnhofer	a846d96468	encoding: Remove compatibility struct members	2025-03-05 16:49:42 +01:00
Nick Wellnhofer	0b27097a92	encoding: Rename unprefixed public functions	2025-03-04 16:46:53 +01:00
Nick Wellnhofer	1167c3340e	encoding: Don't include iconv.h from libxml/encoding.h	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	95d3633350	encoding: Rework conversion error codes This should match the old code more closely. Remove XML_ERR_PARTIAL. It's unlikely that anyone is using these codes already.	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	282ec1d548	encoding: Rework xmlCharEncodingHandler layout Reuse some of the old members. The "input" and "output" function pointers are actually of type xmlCharEncConvFunc, accepting an additional argument. For default handlers, this argument is unused, so this should work with most ABIs. For iconv handlers, these function pointers used to be NULL but now point to a function which requires the extra argument. "iconv_in" and "iconv_out" are made void pointers. "uconv_in" and "uconv_out" are renamed and made void pointers. This is unlikely to cause issues. We now expect that the built-in conversion functions correctly report XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be done in the following commits.	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	501e5d195d	encoding: Stop using XML_ENC_ERR_PARTIAL	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	c59c24494d	encoding: Support custom implementations	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	1e3da9f4d4	encoding: Start with callbacks	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	6d8427dc97	encoding: Rework encoding lookup Add missing xmlCharEncoding enum values. Simplify and speed up encoding lookup by using a table mapping names to xmlCharEncoding enums and binary search. Rearrange the default handler table to match the enum layout. For some encodings we now only lookup the provided or most canonical name instead of trying several names, expecting that iconv or ICU handle aliases: - IBM037 (EBCDIC) - UCS-2 - UCS-4 - Shift_JIS	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	3b4a84e4b7	encoding: Deprecate xmlCharEncodingHandler members	2024-06-13 18:09:17 +02:00
Nick Wellnhofer	e75e878e02	doc: Update and fix documentation	2024-05-20 14:23:39 +02:00
Nick Wellnhofer	0821efc8ee	encoding: Check whether encoding handlers support input/output The "HTML" encoding handler doesn't support input which could lead to a wrong error report.	2024-01-02 19:48:23 +01:00
Nick Wellnhofer	bd5ad0308d	encoding: Report malloc failures Introduce new API functions that return a separate error code if a memory allocation fails. - xmlOpenCharEncodingHandler - xmlLookupCharEncodingHandler Fix a few places where malloc failures weren't reported.	2023-12-11 22:05:47 +01:00
Nick Wellnhofer	7909ff08e2	include: Remove unnecessary includes - Don't include tree.h from encoding.h - Don't include parser.h from xmlIO.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	3ff6abbf58	encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.	2023-04-30 16:43:29 +02:00
Nick Wellnhofer	98840d40da	parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.	2023-03-21 21:35:15 +01:00
Nick Wellnhofer	ce9baf94d5	Remove XMLCALL and XMLCDECL macros from public headers	2022-12-08 02:48:27 +01:00
Nick Wellnhofer	40483d0ce2	Deprecate module init and cleanup functions These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.	2022-03-06 15:59:43 +01:00
Nick Wellnhofer	b66ce0bba8	Don't include ICU headers in public headers There's no need to make these implementation details public.	2022-03-01 13:02:49 +01:00
Joel Hockey	0b19f236a2	Fixed ICU to set flush correctly and provide pivot buffer. By always setting flush=TRUE when doing multiple reads, ICU will not correctly handle truncated utf8 chars across read boundaries. The fix is to set flush=TRUE only on final read, and to provide a pivot buffer which is maintained by libxml between calls to ucnv_convertEx.	2017-11-04 15:25:31 +01:00
Daniel Veillard	f8e3db0445	Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.	2012-09-11 13:26:36 +08:00
Giuseppe Iuculano	48f7dcb724	480323 add code to plug in ICU converters by default This is not configured in by default but after some serious massaging incorporate that patch from Chromium/Chrome.	2010-11-04 17:42:42 +01:00
William M. Brack	21e4ef20f6	Re-examined the problems of configuring a "minimal" library. Synchronized the header files with the library code in order to assure that all the various conditionals (LIBXML_xxxx_ENABLED) were the same in both. Modified the API database content to more accurately reflect the conditionals. Enhanced the generation of that database. Although there was no substantial change to any of the library code's logic, a large number of files were modified to achieve the above, and the configuration script was enhanced to do some automatic enabling of features (e.g. --with-xinclude forces --with-xpath). Additionally, all the format errors discovered by apibuild.py were corrected. * configure.in: enhanced cross-checking of options * doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml, doc/libxml2-api.xml, gentest.py: changed the usage of the <cond> element in module descriptions * elfgcchack.h, testapi.c: regenerated with proper conditionals * HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c, testSAX.c: cleaned up conditionals * include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h, hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h, valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]: synchronized the conditionals with the corresponding module code * doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c: added additional conditions required for compilation * doc/.html, doc/html/.html: rebuilt the docs	2005-01-02 09:53:13 +00:00
Daniel Veillard	3671190b54	added xmlByteConsumed() interface updated the benchmark rebuilt the docs * parserInternals.c xmlIO.c encoding.c include/libxml/parser.h include/libxml/xmlIO.h: added xmlByteConsumed() interface * doc/: updated the benchmark rebuilt the docs python/tests/Makefile.am python/tests/indexes.py: added a specific regression test for xmlByteConsumed() * include/libxml/encoding.h rngparser.c tree.c: small cleanups Daniel	2004-02-11 13:25:26 +00:00
William M. Brack	a2e844a3b3	moved string and UTF8 routines out of parser.c and encoding.c into a new * encoding.c, parser.c, xmlstring.c, Makefile.am, include/libxml/Makefile.am, include/libxml/catalog.c, include/libxml/chvalid.h, include/libxml/encoding.h, include/libxml/parser.h, include/libxml/relaxng.h, include/libxml/tree.h, include/libxml/xmlwriter.h, include/libxml/xmlstring.h: moved string and UTF8 routines out of parser.c and encoding.c into a new module xmlstring.c with include file include/libxml/xmlstring.h mostly using patches from Reid Spencer. Since xmlChar now defined in xmlstring.h, several include files needed to have a #include added for safety. * doc/apibuild.py: added some additional sorting for various references displayed in the APIxxx.html files. Rebuilt the docs, and also added new file for xmlstring module. * configure.in: small addition to help my testing; no effect on normal usage. * doc/search.php: added $_GET[query] so that persistent globals can be disabled (for recent versions of PHP)	2004-01-06 11:52:13 +00:00
William M. Brack	f9415e4989	Enhanced the handling of UTF-16, UTF-16LE and UTF-16BE encodings. Now * encoding.c, include/libxml/encoding.h: Enhanced the handling of UTF-16, UTF-16LE and UTF-16BE encodings. Now UTF-16 output is handled internally by default, with proper BOM and UTF-16LE encoding. Native UTF-16LE and UTF-16BE encoding will not generate a BOM on output, and will be automatically recognized on input. * test/utf16lebom.xml, test/utf16bebom.xml, result/utf16?ebom*: added regression tests for above.	2003-11-28 09:39:10 +00:00
Daniel Veillard	be5869729a	modified the file header to add more informations, painful... updated to * include/libxml/.h include/libxml/.h.in: modified the file header to add more informations, painful... * genChRanges.py genUnicode.py: updated to generate said changes in headers * doc/apibuild.py: extract headers, add them to libxml2-api.xml * .html .xsl .xml: updated the stylesheets to flag geprecated APIs modules. Updated the stylesheets, some cleanups, regenerated doc/html/*.html: regenerated added back book1 and libxml-lib.html Daniel	2003-11-18 20:56:51 +00:00
William M. Brack	60f394e96d	Finally - found the problem with the page generation (XMLPUBFUN not * doc/html/.html: Finally - found the problem with the page generation (XMLPUBFUN not recognized by gtkdoc). Re-created the pages using a temporary version of include/libxml/.h. * testOOMlib.c,include/libxml/encoding.h, include/libxml/schemasInternals.h,include/libxml/valid.h, include/libxml/xlink.h,include/libxml/xmlwin32version.h, include/libxml/xmlwin32version.h.in, include/libxml/xpathInternals.h: minor edit of comments to help automatic documentation generation * doc/docdescr.doc: small elaboration * doc/examples/test1.c,doc/examples/Makefile.am: re-commit (messed up on last try) * xmlreader.c: minor change to clear warning.	2003-11-16 06:25:42 +00:00
Igor Zlatkovic	76874e4516	Exportability taint of the headers	2003-08-25 09:05:12 +00:00
William M. Brack	4a557d97bf	fixed problem with comments reported by Nick Kew added routines * HTMLparser.c: fixed problem with comments reported by Nick Kew * encoding.c: added routines xmlUTF8Size and xmlUTF8Charcmp for some future cleanup of UTF8 handling	2003-07-29 04:28:04 +00:00
Igor Zlatkovic	7ae91bcd9e	retired xmlwin32version.h	2002-11-08 17:18:52 +00:00
Daniel Veillard	f000f07303	made xmlGetUTF8Char public Daniel * include/libxml/encoding.h encoding.c: made xmlGetUTF8Char public Daniel	2002-10-22 14:28:17 +00:00
Daniel Veillard	6f46f6c5b8	Opening the interface xmlNewCharEncodingHandler as requested in #89415 * encoding.c include/libxml/encoding.h: Opening the interface xmlNewCharEncodingHandler as requested in #89415 * python/generator.py python/setup.py.in: applied cleanup patches from Marc-Andre Lemburg * tree.c: fixing bug #89332 on a specific case of loosing the XML-1.0 namespace on xml:xxx attributes Daniel	2002-08-01 12:22:24 +00:00
Igor Zlatkovic	a6f2d90669	* empty log message *	2002-04-16 17:57:17 +00:00
Daniel Veillard	61f261749f	Heiko W. Rupp fixed a lot of comments to generate better API descriptions * include/libxml/*.h: Heiko W. Rupp fixed a lot of comments to generate better API descriptions etc... Daniel	2002-03-12 18:46:39 +00:00
Daniel Veillard	6c4ffafd8f	trying to fix the include mess Daniel * include/libxml/encoding.h include/libxml/entities.h include/libxml/globals.h include/libxml/parser.h include/libxml/threads.h include/libxml/tree.h include/libxml/xmlmemory.h: trying to fix the include mess Daniel	2002-02-11 08:54:05 +00:00
Daniel Veillard	963d2ae415	cleanup patch from Anthony Jones fix the headers to avoid in make scan * SAX.c: cleanup patch from Anthony Jones * doc/Makefile.am: fix the headers to avoid in make scan * parserInternals.c xpath.c include/libxml/.h: cleanup of the includes, vs Ptr and general cleanup * parsedecl.py: first version of a script to extract the module interfaces, the goal will be to provide .decl or XML specification of the interfaces to build wrappers. Daniel	2002-01-20 22:08:18 +00:00
Daniel Veillard	cbaf399537	applied 42 documentation patches from Charlie Bozeman. Regenerated the * .c include/libxml/.h doc/html/*: applied 42 documentation patches from Charlie Bozeman. Regenerated the HTML docs. Daniel	2001-12-31 16:16:02 +00:00
Daniel Veillard	60087f30f3	preparing 2.4.6 release updated and rebuilt the docs fixed a number of * configure.in: preparing 2.4.6 release * doc/xml.html doc/html/: updated and rebuilt the docs include/libxml/.h .c: fixed a number of teh/the widht/width typos Daniel	2001-10-10 09:45:09 +00:00
Daniel Veillard	c5d64345cf	Summer's cleanup, a really big one: * AUTHORS: added William and Bjorn * include/libxml/.h .c README doc/.html etc.: changed old email to daniel@veillard.com hopefully I won't have to do this again doc/Makefile.am doc/html/.html: cleanup makefile, checked that docs can be rebuilt cleanly now include/libxml/xmlversion.h: removed include/libxml/xmlversion.h from CVs it's generated, added include/libxml/xmlwin32version.h also generated but which should change far less frequently. * catalog.c nanoftp.c: made sure to include libxml.h not libxml/xmlversion.h directly * include/libxml/*.h: include xmlwin32version.h instead of xmlversion.h when compiling on WIN32 and MSC Daniel	2001-06-24 12:13:24 +00:00
Daniel Veillard	97ac13197c	- xpath.c encoding.[ch]: William M. Brack provided a set of UTF8 string oriented functions and started cleaning the related areas in xpath.c which needed fixing in this respect Daniel	2001-05-30 19:14:17 +00:00
Daniel Veillard	f69bb4b5bf	- HTMLparser.c: Closed bug #54891 - result/HTML/cf_128.html* test/HTML/cf_128.html: added the test to the suite forgot to commit this one yesterday - encoding.h hash.c nanoftp.h parser.h tree.h uri.h xlink.h xpointer.c: applied a documentation patch from LotR and filled in a few missing descriptions Daniel	2001-05-19 13:24:56 +00:00
Daniel Veillard	e043ee17c2	- xpath.c: fixed xmlXPathNodeCollectAndTest() to do proper prefix lookup. - parserInternals.c: fixed the bug reported by Morus Walter due to an off by one typo in xmlStringCurrentChar() Daniel	2001-04-16 14:08:07 +00:00
Daniel Veillard	56a4cb8c4d	Huge cleanup, I switched to compile with -Wall -g -O -ansi -pedantic -W -Wunused -Wimplicit -Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat -Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow -Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline - HTMLparser.[ch] HTMLtree.c SAX.c debugXML.c encoding.[ch] encoding.h entities.c error.c list.[ch] nanoftp.c nanohttp.c parser.[ch] parserInternals.[ch] testHTML.c testSAX.c testURI.c testXPath.c tree.[ch] uri.c valid.[ch] xinclude.c xmlIO.[ch] xmllint.c xmlmemory.c xpath.c xpathInternals.h xpointer.[ch] example/gjobread.c: Cleanup, staticfied a number of non-exported functions, detected and cleaned up a dozen of problem found this way, avoided a lot of public function name/typedef/system names clashes - doc/xml.html: updated - configure.in: switched private flags to the really pedantic ones. Daniel	2001-03-24 17:00:36 +00:00
Owen Taylor	3473f88a7a	Revert directory structure changes	2001-02-23 17:55:21 +00:00
CET 2001 Tomasz Kłoczko	64636e7f6e	moved to libxml directory - this allow simplify automake/autoconf. Now Thu Feb 23 02:03:56 CET 2001 Tomasz Kłoczko <kloczek@pld.org.pl> * .c .h libxml files: moved to libxml directory - this allow simplify automake/autoconf. Now isn't neccessary hack on am/ac level for make and remove libxml symlink (modified for this also configure.in and main Makefile.am). Now automake abilities are used in best way (like in many other projects with libraries). * include/win32config.h: moved to libxml directory (now include directory isn't neccessary). * Makefile.am, examples/Makefile.am, libxml/Makefile.am: added empty DEFS and in INCLUDES rest only -I$(top_builddir) - this allow minimize parameters count passed to libtool script (now compilation is also slyghtly more quiet). * configure.in: simplifies libzdetestion - prepare separated variables for keep libz name and path to libz header files isn't realy neccessary (if someone have libz installed in non standard prefix path to header files ald library can be passed as: $ CFALGS="-I</libz.h/path>" LDFLAGS="-L</libz/path>" ./configure * autogen.sh: check now for libxml/entities.h. After above building libxml pass correctly and also pass "make install DESTDIR=</install/prefix>" from tar ball generated by "make dist". Seems ac/am reorganization is finished. This changes not touches any other things on *.{c,h} files level.	2001-02-23 01:37:32 +00:00
Daniel Veillard	f0cc7ccc7d	libxml now grok Docbook-3.1.5 and Docbook-4.1.1 DTDs, this popped out a couple of bugs and 3 speed issues, there is only on minor speed issue left. Assorted collection of user reported bugs and fixes: - doc/encoding.html: added encoding aliases doc - doc/xml.html: updates - encoding.[ch]: added EncodingAliases functions - entities.[ch] valid.[ch] debugXML.c: removed two serious bottleneck affecting large DTDs like Docbook - parser.[ch] xmllint.c: added a pedantic option, will be useful - SAX.c: redefinition of entities is reported in pedantic mode - testHTML.c: uninitialized warning from gcc - uri.c: fixed a couple of bugs - TODO: added issue raised by Michael Daniel	2000-08-26 21:40:43 +00:00

63 Commits