61 Commits

Author SHA1 Message Date
Nick Wellnhofer
38f475072a encoding: Make conversion callbacks more type-safe 2025-03-05 22:25:14 +01:00
Nick Wellnhofer
a846d96468 encoding: Remove compatibility struct members 2025-03-05 16:49:42 +01:00
Nick Wellnhofer
0b27097a92 encoding: Rename unprefixed public functions 2025-03-04 16:46:53 +01:00
Nick Wellnhofer
1167c3340e encoding: Don't include iconv.h from libxml/encoding.h 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
95d3633350 encoding: Rework conversion error codes
This should match the old code more closely. Remove XML_ERR_PARTIAL.

It's unlikely that anyone is using these codes already.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
282ec1d548 encoding: Rework xmlCharEncodingHandler layout
Reuse some of the old members.

The "input" and "output" function pointers are actually of type
xmlCharEncConvFunc, accepting an additional argument. For default
handlers, this argument is unused, so this should work with most ABIs.
For iconv handlers, these function pointers used to be NULL but now
point to a function which requires the extra argument.

"iconv_in" and "iconv_out" are made void pointers. "uconv_in" and
"uconv_out" are renamed and made void pointers. This is unlikely to
cause issues.

We now expect that the built-in conversion functions correctly report
XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be
done in the following commits.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
501e5d195d encoding: Stop using XML_ENC_ERR_PARTIAL 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
c59c24494d encoding: Support custom implementations 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
1e3da9f4d4 encoding: Start with callbacks 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
6d8427dc97 encoding: Rework encoding lookup
Add missing xmlCharEncoding enum values.

Simplify and speed up encoding lookup by using a table mapping names to
xmlCharEncoding enums and binary search. Rearrange the default handler
table to match the enum layout.

For some encodings we now only lookup the provided or most canonical
name instead of trying several names, expecting that iconv or ICU handle
aliases:

- IBM037 (EBCDIC)
- UCS-2
- UCS-4
- Shift_JIS
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
3b4a84e4b7 encoding: Deprecate xmlCharEncodingHandler members 2024-06-13 18:09:17 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
0821efc8ee encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input which could lead to a
wrong error report.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
bd5ad0308d encoding: Report malloc failures
Introduce new API functions that return a separate error code if a
memory allocation fails.

- xmlOpenCharEncodingHandler
- xmlLookupCharEncodingHandler

Fix a few places where malloc failures weren't reported.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
7909ff08e2 include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
3ff6abbf58 encoding: Rework error codes
Use an enum instead of magic numbers. Fix a few error codes. Simplify
handling of "space" and "partial" errors.

See #506.
2023-04-30 16:43:29 +02:00
Nick Wellnhofer
98840d40da parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.

Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
2023-03-21 21:35:15 +01:00
Nick Wellnhofer
ce9baf94d5 Remove XMLCALL and XMLCDECL macros from public headers 2022-12-08 02:48:27 +01:00
Nick Wellnhofer
40483d0ce2 Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
2022-03-06 15:59:43 +01:00
Nick Wellnhofer
b66ce0bba8 Don't include ICU headers in public headers
There's no need to make these implementation details public.
2022-03-01 13:02:49 +01:00
Joel Hockey
0b19f236a2 Fixed ICU to set flush correctly and provide pivot buffer.
By always setting flush=TRUE when doing multiple reads, ICU
will not correctly handle truncated utf8 chars across read
boundaries.

The fix is to set flush=TRUE only on final read, and to
provide a pivot buffer which is maintained by libxml
between calls to ucnv_convertEx.
2017-11-04 15:25:31 +01:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Giuseppe Iuculano
48f7dcb724 480323 add code to plug in ICU converters by default
This is not configured in by default but after some serious massaging
incorporate that patch from Chromium/Chrome.
2010-11-04 17:42:42 +01:00
William M. Brack
21e4ef20f6 Re-examined the problems of configuring a "minimal" library.
Synchronized the header files with the library code in order
to assure that all the various conditionals (LIBXML_xxxx_ENABLED)
were the same in both.  Modified the API database content to more
accurately reflect the conditionals.  Enhanced the generation
of that database.  Although there was no substantial change to
any of the library code's logic, a large number of files were
modified to achieve the above, and the configuration script
was enhanced to do some automatic enabling of features (e.g.
--with-xinclude forces --with-xpath).  Additionally, all the format
errors discovered by apibuild.py were corrected.
* configure.in: enhanced cross-checking of options
* doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml,
  doc/libxml2-api.xml, gentest.py: changed the usage of the
  <cond> element in module descriptions
* elfgcchack.h, testapi.c: regenerated with proper conditionals
* HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c,
  testSAX.c: cleaned up conditionals
* include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h,
  hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h,
  valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]:
  synchronized the conditionals with the corresponding module code
* doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c:
  added additional conditions required for compilation
* doc/*.html, doc/html/*.html: rebuilt the docs
2005-01-02 09:53:13 +00:00
Daniel Veillard
3671190b54 added xmlByteConsumed() interface updated the benchmark rebuilt the docs
* parserInternals.c xmlIO.c encoding.c include/libxml/parser.h
  include/libxml/xmlIO.h: added xmlByteConsumed() interface
* doc/*: updated the benchmark rebuilt the docs
* python/tests/Makefile.am python/tests/indexes.py: added a
  specific regression test for xmlByteConsumed()
* include/libxml/encoding.h rngparser.c tree.c: small cleanups
Daniel
2004-02-11 13:25:26 +00:00
William M. Brack
a2e844a3b3 moved string and UTF8 routines out of parser.c and encoding.c into a new
* encoding.c, parser.c, xmlstring.c, Makefile.am,
  include/libxml/Makefile.am, include/libxml/catalog.c,
  include/libxml/chvalid.h, include/libxml/encoding.h,
  include/libxml/parser.h, include/libxml/relaxng.h,
  include/libxml/tree.h, include/libxml/xmlwriter.h,
  include/libxml/xmlstring.h:
  moved string and UTF8 routines out of parser.c and encoding.c
  into a new module xmlstring.c with include file
  include/libxml/xmlstring.h mostly using patches from Reid
  Spencer.  Since xmlChar now defined in xmlstring.h, several
  include files needed to have a #include added for safety.
* doc/apibuild.py: added some additional sorting for various
  references displayed in the APIxxx.html files.  Rebuilt the
  docs, and also added new file for xmlstring module.
* configure.in: small addition to help my testing; no effect on
  normal usage.
* doc/search.php: added $_GET[query] so that persistent globals
  can be disabled (for recent versions of PHP)
2004-01-06 11:52:13 +00:00
William M. Brack
f9415e4989 Enhanced the handling of UTF-16, UTF-16LE and UTF-16BE encodings. Now
* encoding.c, include/libxml/encoding.h: Enhanced the handling of UTF-16,
  UTF-16LE and UTF-16BE encodings.  Now UTF-16 output is handled internally
  by default, with proper BOM and UTF-16LE encoding.  Native UTF-16LE and
  UTF-16BE encoding will not generate a BOM on output, and will be
  automatically recognized on input.
* test/utf16lebom.xml, test/utf16bebom.xml, result/utf16?ebom*: added
  regression tests for above.
2003-11-28 09:39:10 +00:00
Daniel Veillard
be5869729a modified the file header to add more informations, painful... updated to
* include/libxml/*.h include/libxml/*.h.in: modified the file
  header to add more informations, painful...
* genChRanges.py genUnicode.py: updated to generate said changes
  in headers
* doc/apibuild.py: extract headers, add them to libxml2-api.xml
* *.html *.xsl *.xml: updated the stylesheets to flag geprecated
  APIs modules. Updated the stylesheets, some cleanups, regenerated
* doc/html/*.html: regenerated added back book1 and libxml-lib.html
Daniel
2003-11-18 20:56:51 +00:00
William M. Brack
60f394e96d Finally - found the problem with the page generation (XMLPUBFUN not
* doc/html/*.html: Finally - found the problem with the
  page generation (XMLPUBFUN not recognized by gtkdoc).
  Re-created the pages using a temporary version of
  include/libxml/*.h.
* testOOMlib.c,include/libxml/encoding.h,
  include/libxml/schemasInternals.h,include/libxml/valid.h,
  include/libxml/xlink.h,include/libxml/xmlwin32version.h,
  include/libxml/xmlwin32version.h.in,
  include/libxml/xpathInternals.h: minor edit of comments
  to help automatic documentation generation
* doc/docdescr.doc: small elaboration
* doc/examples/test1.c,doc/examples/Makefile.am: re-commit
  (messed up on last try)
* xmlreader.c: minor change to clear warning.
2003-11-16 06:25:42 +00:00
Igor Zlatkovic
76874e4516 Exportability taint of the headers 2003-08-25 09:05:12 +00:00
William M. Brack
4a557d97bf fixed problem with comments reported by Nick Kew added routines
* HTMLparser.c: fixed problem with comments reported by Nick Kew
* encoding.c: added routines xmlUTF8Size and xmlUTF8Charcmp for
  some future cleanup of UTF8 handling
2003-07-29 04:28:04 +00:00
Igor Zlatkovic
7ae91bcd9e retired xmlwin32version.h 2002-11-08 17:18:52 +00:00
Daniel Veillard
f000f07303 made xmlGetUTF8Char public Daniel
* include/libxml/encoding.h encoding.c: made xmlGetUTF8Char public
Daniel
2002-10-22 14:28:17 +00:00
Daniel Veillard
6f46f6c5b8 Opening the interface xmlNewCharEncodingHandler as requested in #89415
* encoding.c include/libxml/encoding.h: Opening the interface
  xmlNewCharEncodingHandler as requested in #89415
* python/generator.py python/setup.py.in: applied cleanup
  patches from Marc-Andre Lemburg
* tree.c: fixing bug #89332 on a specific case of loosing
  the XML-1.0 namespace on xml:xxx attributes
Daniel
2002-08-01 12:22:24 +00:00
Igor Zlatkovic
a6f2d90669 *** empty log message *** 2002-04-16 17:57:17 +00:00
Daniel Veillard
61f261749f Heiko W. Rupp fixed a lot of comments to generate better API descriptions
* include/libxml/*.h: Heiko W. Rupp fixed a lot of comments
  to generate better API descriptions etc...
Daniel
2002-03-12 18:46:39 +00:00
Daniel Veillard
6c4ffafd8f trying to fix the include mess Daniel
* include/libxml/encoding.h include/libxml/entities.h
  include/libxml/globals.h include/libxml/parser.h
  include/libxml/threads.h include/libxml/tree.h
  include/libxml/xmlmemory.h: trying to fix the include mess
Daniel
2002-02-11 08:54:05 +00:00
Daniel Veillard
963d2ae415 cleanup patch from Anthony Jones fix the headers to avoid in make scan
* SAX.c: cleanup patch from Anthony Jones
* doc/Makefile.am: fix the headers to avoid in make scan
* parserInternals.c xpath.c include/libxml/*.h: cleanup of the
  includes, * vs Ptr and general cleanup
* parsedecl.py: first version of a script to extract the
  module interfaces, the goal will be to provide .decl or XML
  specification of the interfaces to build wrappers.
Daniel
2002-01-20 22:08:18 +00:00
Daniel Veillard
cbaf399537 applied 42 documentation patches from Charlie Bozeman. Regenerated the
* *.c include/libxml/*.h doc/html/*: applied 42 documentation
  patches from Charlie Bozeman. Regenerated the HTML docs.
Daniel
2001-12-31 16:16:02 +00:00
Daniel Veillard
60087f30f3 preparing 2.4.6 release updated and rebuilt the docs fixed a number of
* configure.in: preparing 2.4.6 release
* doc/xml.html doc/html/*: updated and rebuilt the docs
* include/libxml/*.h *.c: fixed a number of teh/the widht/width typos
Daniel
2001-10-10 09:45:09 +00:00
Daniel Veillard
c5d64345cf Summer's cleanup, a really big one:
* AUTHORS: added William and Bjorn
* include/libxml/*.h *.c README doc/*.html etc.: changed old email to
  daniel@veillard.com hopefully I won't have to do this again
* doc/Makefile.am doc/html/*.html: cleanup makefile, checked that
  docs can be rebuilt cleanly now
* include/libxml/xml*version.h*: removed include/libxml/xmlversion.h
  from CVs it's generated, added include/libxml/xmlwin32version.h
  also generated but which should change far less frequently.
* catalog.c nanoftp.c: made sure to include libxml.h not
  libxml/xmlversion.h directly
* include/libxml/*.h: include xmlwin32version.h instead of xmlversion.h
  when compiling on WIN32 and MSC
Daniel
2001-06-24 12:13:24 +00:00
Daniel Veillard
97ac13197c - xpath.c encoding.[ch]: William M. Brack provided a set of UTF8
string oriented functions and started cleaning the related areas
  in xpath.c which needed fixing in this respect
Daniel
2001-05-30 19:14:17 +00:00
Daniel Veillard
f69bb4b5bf - HTMLparser.c: Closed bug #54891
- result/HTML/cf_128.html* test/HTML/cf_128.html: added the test
  to the suite
forgot to commit this one yesterday
- encoding.h hash.c nanoftp.h parser.h tree.h uri.h xlink.h xpointer.c:
  applied a documentation patch from LotR and filled in a few missing
  descriptions
Daniel
2001-05-19 13:24:56 +00:00
Daniel Veillard
e043ee17c2 - xpath.c: fixed xmlXPathNodeCollectAndTest() to do proper
prefix lookup.
- parserInternals.c: fixed the bug reported by Morus Walter
  due to an off by one typo in xmlStringCurrentChar()
Daniel
2001-04-16 14:08:07 +00:00
Daniel Veillard
56a4cb8c4d Huge cleanup, I switched to compile with
-Wall -g -O -ansi -pedantic -W -Wunused -Wimplicit
-Wreturn-type -Wswitch -Wcomment -Wtrigraphs -Wformat
-Wchar-subscripts -Wuninitialized -Wparentheses -Wshadow
-Wpointer-arith -Wcast-align -Wwrite-strings -Waggregate-return
-Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Winline
- HTMLparser.[ch] HTMLtree.c SAX.c debugXML.c encoding.[ch]
  encoding.h entities.c error.c list.[ch] nanoftp.c
  nanohttp.c parser.[ch] parserInternals.[ch] testHTML.c
  testSAX.c testURI.c testXPath.c tree.[ch] uri.c
  valid.[ch] xinclude.c xmlIO.[ch] xmllint.c xmlmemory.c
  xpath.c xpathInternals.h xpointer.[ch] example/gjobread.c:
  Cleanup, staticfied a number of non-exported functions,
  detected and cleaned up a dozen of problem found this way,
  avoided a lot of public function name/typedef/system names clashes
- doc/xml.html: updated
- configure.in: switched private flags to the really pedantic ones.
Daniel
2001-03-24 17:00:36 +00:00
Owen Taylor
3473f88a7a Revert directory structure changes 2001-02-23 17:55:21 +00:00
CET 2001 Tomasz K³oczko
64636e7f6e moved to libxml directory - this allow simplify automake/autoconf. Now
Thu Feb 23 02:03:56 CET 2001 Tomasz K³oczko <kloczek@pld.org.pl>

        * *.c *.h libxml files: moved to libxml directory - this allow
	  simplify automake/autoconf. Now isn't neccessary hack on
	  am/ac level for make and remove libxml symlink (modified for this
	  also configure.in and main Makefile.am). Now automake abilities
	  are used in best way (like in many other projects with libraries).
	* include/win32config.h: moved to libxml directory (now include
	  directory isn't neccessary).
	* Makefile.am, examples/Makefile.am, libxml/Makefile.am:
	  added empty DEFS and in INCLUDES rest only -I$(top_builddir) -
	  this allow minimize parameters count passed to libtool script
	  (now compilation is also slyghtly more quiet).
	* configure.in: simplifies libzdetestion - prepare separated
	  variables for keep libz name and path to libz header files isn't
	  realy neccessary (if someone have libz installed in non standard
	  prefix path to header files ald library can be passed as:
	  $ CFALGS="-I</libz.h/path>" LDFLAGS="-L</libz/path>" ./configure
	* autogen.sh: check now for libxml/entities.h.

	After above building libxml pass correctly and also pass
	"make install DESTDIR=</install/prefix>" from tar ball generated by
	"make dist". Seems ac/am reorganization is finished. This changes
	not touches any other things on *.{c,h} files level.
2001-02-23 01:37:32 +00:00
Daniel Veillard
f0cc7ccc7d libxml now grok Docbook-3.1.5 and Docbook-4.1.1 DTDs, this
popped out a couple of bugs and 3 speed issues, there is only
on minor speed issue left. Assorted collection of user reported
bugs and fixes:
- doc/encoding.html: added encoding aliases doc
- doc/xml.html: updates
- encoding.[ch]: added EncodingAliases functions
- entities.[ch] valid.[ch] debugXML.c: removed two serious
  bottleneck affecting large DTDs like Docbook
- parser.[ch] xmllint.c: added a pedantic option, will be useful
- SAX.c: redefinition of entities is reported in pedantic mode
- testHTML.c: uninitialized warning from gcc
- uri.c: fixed a couple of bugs
- TODO: added issue raised by Michael
Daniel
2000-08-26 21:40:43 +00:00
Daniel Veillard
32bc74ef98 - doc/encoding.html doc/xml.html: added I18N doc
- encoding.[ch] HTMLtree.[ch] parser.c HTMLparser.c: I18N encoding
  improvements, both parser and filters, added ASCII & HTML,
  fixed the ISO-Latin-1 one
- xmllint.c testHTML.c: added/made visible --encode
- debugXML.c : cleanup
- most .c files: applied patches due to warning on Windows and
  when using Sun Pro cc compiler
- xpath.c : cleanup memleaks
- nanoftp.c : added a TESTING preprocessor flag for standalong
  compile so that people can report bugs more easilly
- nanohttp.c : ditched socklen_t which was a portability mess
  and replaced it with unsigned int.
- tree.[ch]: added xmlHasProp()
- TODO: updated
- test/ : added more test for entities, NS, encoding, HTML, wap
- configure.in: preparing for 2.2.0 release
Daniel
2000-07-14 14:49:25 +00:00
Daniel Veillard
be803967db - Large resync between W3C and Gnome tree
- configure.in: 2.1.0 prerelease
- example/Makefile.am example/gjobread.c tree.h: work on
  libxml1 libxml2 convergence.
- nanoftp, nanohttp.c: fixed stalled connections probs
- HTMLtree.c SAX.c : support for attribute without values in
  HTML for andersca
- valid.c: Fixed most validation + namespace problems
- HTMLparser.c: start document callback for andersca
- debugXML.c xpath.c: lots of XPath fixups from Picdar Technology
- parser.h, SAX.c: serious speed improvement for large
  CDATA blocks
- encoding.[ch] xmlIO.[ch]: Improved seriously saving to
  different encoding
- config.h.in parser.c xmllint.c: added xmlCheckVersion()
  and the LIBXML_TEST_VERSION macro
Daniel
2000-06-28 23:40:59 +00:00