Unlike iconv or the internal converters, ICU consumes truncated multi-
byte sequences at the end of an input buffer. We currently check for a
non-empty raw input buffer to detect truncated sequences, so this fails
with ICU.
It might be possible to inspect the pivot buffer pointers, but it seems
cleaner to implement a `flush` flag for some encoding and I/O functions.
After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or
detect remaining input with other converters.
Also fix detection of truncated sequences for HTML, XML content and
DTDs with iconv.
Reuse some of the old members.
The "input" and "output" function pointers are actually of type
xmlCharEncConvFunc, accepting an additional argument. For default
handlers, this argument is unused, so this should work with most ABIs.
For iconv handlers, these function pointers used to be NULL but now
point to a function which requires the extra argument.
"iconv_in" and "iconv_out" are made void pointers. "uconv_in" and
"uconv_out" are renamed and made void pointers. This is unlikely to
cause issues.
We now expect that the built-in conversion functions correctly report
XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be
done in the following commits.
Add missing xmlCharEncoding enum values.
Simplify and speed up encoding lookup by using a table mapping names to
xmlCharEncoding enums and binary search. Rearrange the default handler
table to match the enum layout.
For some encodings we now only lookup the provided or most canonical
name instead of trying several names, expecting that iconv or ICU handle
aliases:
- IBM037 (EBCDIC)
- UCS-2
- UCS-4
- Shift_JIS
Introduce new API functions that return a separate error code if a
memory allocation fails.
- xmlOpenCharEncodingHandler
- xmlLookupCharEncodingHandler
Fix a few places where malloc failures weren't reported.
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.
Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
By always setting flush=TRUE when doing multiple reads, ICU
will not correctly handle truncated utf8 chars across read
boundaries.
The fix is to set flush=TRUE only on final read, and to
provide a pivot buffer which is maintained by libxml
between calls to ucnv_convertEx.
Synchronized the header files with the library code in order
to assure that all the various conditionals (LIBXML_xxxx_ENABLED)
were the same in both. Modified the API database content to more
accurately reflect the conditionals. Enhanced the generation
of that database. Although there was no substantial change to
any of the library code's logic, a large number of files were
modified to achieve the above, and the configuration script
was enhanced to do some automatic enabling of features (e.g.
--with-xinclude forces --with-xpath). Additionally, all the format
errors discovered by apibuild.py were corrected.
* configure.in: enhanced cross-checking of options
* doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml,
doc/libxml2-api.xml, gentest.py: changed the usage of the
<cond> element in module descriptions
* elfgcchack.h, testapi.c: regenerated with proper conditionals
* HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c,
testSAX.c: cleaned up conditionals
* include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h,
hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h,
valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]:
synchronized the conditionals with the corresponding module code
* doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c:
added additional conditions required for compilation
* doc/*.html, doc/html/*.html: rebuilt the docs
* parserInternals.c xmlIO.c encoding.c include/libxml/parser.h
include/libxml/xmlIO.h: added xmlByteConsumed() interface
* doc/*: updated the benchmark rebuilt the docs
* python/tests/Makefile.am python/tests/indexes.py: added a
specific regression test for xmlByteConsumed()
* include/libxml/encoding.h rngparser.c tree.c: small cleanups
Daniel
* encoding.c, parser.c, xmlstring.c, Makefile.am,
include/libxml/Makefile.am, include/libxml/catalog.c,
include/libxml/chvalid.h, include/libxml/encoding.h,
include/libxml/parser.h, include/libxml/relaxng.h,
include/libxml/tree.h, include/libxml/xmlwriter.h,
include/libxml/xmlstring.h:
moved string and UTF8 routines out of parser.c and encoding.c
into a new module xmlstring.c with include file
include/libxml/xmlstring.h mostly using patches from Reid
Spencer. Since xmlChar now defined in xmlstring.h, several
include files needed to have a #include added for safety.
* doc/apibuild.py: added some additional sorting for various
references displayed in the APIxxx.html files. Rebuilt the
docs, and also added new file for xmlstring module.
* configure.in: small addition to help my testing; no effect on
normal usage.
* doc/search.php: added $_GET[query] so that persistent globals
can be disabled (for recent versions of PHP)
* encoding.c, include/libxml/encoding.h: Enhanced the handling of UTF-16,
UTF-16LE and UTF-16BE encodings. Now UTF-16 output is handled internally
by default, with proper BOM and UTF-16LE encoding. Native UTF-16LE and
UTF-16BE encoding will not generate a BOM on output, and will be
automatically recognized on input.
* test/utf16lebom.xml, test/utf16bebom.xml, result/utf16?ebom*: added
regression tests for above.
* include/libxml/*.h include/libxml/*.h.in: modified the file
header to add more informations, painful...
* genChRanges.py genUnicode.py: updated to generate said changes
in headers
* doc/apibuild.py: extract headers, add them to libxml2-api.xml
* *.html *.xsl *.xml: updated the stylesheets to flag geprecated
APIs modules. Updated the stylesheets, some cleanups, regenerated
* doc/html/*.html: regenerated added back book1 and libxml-lib.html
Daniel
* doc/html/*.html: Finally - found the problem with the
page generation (XMLPUBFUN not recognized by gtkdoc).
Re-created the pages using a temporary version of
include/libxml/*.h.
* testOOMlib.c,include/libxml/encoding.h,
include/libxml/schemasInternals.h,include/libxml/valid.h,
include/libxml/xlink.h,include/libxml/xmlwin32version.h,
include/libxml/xmlwin32version.h.in,
include/libxml/xpathInternals.h: minor edit of comments
to help automatic documentation generation
* doc/docdescr.doc: small elaboration
* doc/examples/test1.c,doc/examples/Makefile.am: re-commit
(messed up on last try)
* xmlreader.c: minor change to clear warning.
* HTMLparser.c: fixed problem with comments reported by Nick Kew
* encoding.c: added routines xmlUTF8Size and xmlUTF8Charcmp for
some future cleanup of UTF8 handling
* encoding.c include/libxml/encoding.h: Opening the interface
xmlNewCharEncodingHandler as requested in #89415
* python/generator.py python/setup.py.in: applied cleanup
patches from Marc-Andre Lemburg
* tree.c: fixing bug #89332 on a specific case of loosing
the XML-1.0 namespace on xml:xxx attributes
Daniel
* include/libxml/encoding.h include/libxml/entities.h
include/libxml/globals.h include/libxml/parser.h
include/libxml/threads.h include/libxml/tree.h
include/libxml/xmlmemory.h: trying to fix the include mess
Daniel
* SAX.c: cleanup patch from Anthony Jones
* doc/Makefile.am: fix the headers to avoid in make scan
* parserInternals.c xpath.c include/libxml/*.h: cleanup of the
includes, * vs Ptr and general cleanup
* parsedecl.py: first version of a script to extract the
module interfaces, the goal will be to provide .decl or XML
specification of the interfaces to build wrappers.
Daniel
* configure.in: preparing 2.4.6 release
* doc/xml.html doc/html/*: updated and rebuilt the docs
* include/libxml/*.h *.c: fixed a number of teh/the widht/width typos
Daniel
* AUTHORS: added William and Bjorn
* include/libxml/*.h *.c README doc/*.html etc.: changed old email to
daniel@veillard.com hopefully I won't have to do this again
* doc/Makefile.am doc/html/*.html: cleanup makefile, checked that
docs can be rebuilt cleanly now
* include/libxml/xml*version.h*: removed include/libxml/xmlversion.h
from CVs it's generated, added include/libxml/xmlwin32version.h
also generated but which should change far less frequently.
* catalog.c nanoftp.c: made sure to include libxml.h not
libxml/xmlversion.h directly
* include/libxml/*.h: include xmlwin32version.h instead of xmlversion.h
when compiling on WIN32 and MSC
Daniel
- result/HTML/cf_128.html* test/HTML/cf_128.html: added the test
to the suite
forgot to commit this one yesterday
- encoding.h hash.c nanoftp.h parser.h tree.h uri.h xlink.h xpointer.c:
applied a documentation patch from LotR and filled in a few missing
descriptions
Daniel
Thu Feb 23 02:03:56 CET 2001 Tomasz K³oczko <kloczek@pld.org.pl>
* *.c *.h libxml files: moved to libxml directory - this allow
simplify automake/autoconf. Now isn't neccessary hack on
am/ac level for make and remove libxml symlink (modified for this
also configure.in and main Makefile.am). Now automake abilities
are used in best way (like in many other projects with libraries).
* include/win32config.h: moved to libxml directory (now include
directory isn't neccessary).
* Makefile.am, examples/Makefile.am, libxml/Makefile.am:
added empty DEFS and in INCLUDES rest only -I$(top_builddir) -
this allow minimize parameters count passed to libtool script
(now compilation is also slyghtly more quiet).
* configure.in: simplifies libzdetestion - prepare separated
variables for keep libz name and path to libz header files isn't
realy neccessary (if someone have libz installed in non standard
prefix path to header files ald library can be passed as:
$ CFALGS="-I</libz.h/path>" LDFLAGS="-L</libz/path>" ./configure
* autogen.sh: check now for libxml/entities.h.
After above building libxml pass correctly and also pass
"make install DESTDIR=</install/prefix>" from tar ball generated by
"make dist". Seems ac/am reorganization is finished. This changes
not touches any other things on *.{c,h} files level.
popped out a couple of bugs and 3 speed issues, there is only
on minor speed issue left. Assorted collection of user reported
bugs and fixes:
- doc/encoding.html: added encoding aliases doc
- doc/xml.html: updates
- encoding.[ch]: added EncodingAliases functions
- entities.[ch] valid.[ch] debugXML.c: removed two serious
bottleneck affecting large DTDs like Docbook
- parser.[ch] xmllint.c: added a pedantic option, will be useful
- SAX.c: redefinition of entities is reported in pedantic mode
- testHTML.c: uninitialized warning from gcc
- uri.c: fixed a couple of bugs
- TODO: added issue raised by Michael
Daniel