106 Commits

Author SHA1 Message Date
Nick Wellnhofer
0f4f89005d parser: Rename inputPush to xmlCtxtPushInput 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23 parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
c32397d51f html: Improve character class macros 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
6a3c0b0d93 parser: Increase XML_MAX_DICTIONARY_LIMIT
This limit is somewhat arbitrary and can be reached when fuzzing
documents up to 1 MB.

Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
2024-07-22 12:53:00 +02:00
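A minimal C sketch of a size limit that XML_PARSE_HUGE switches off, roughly as the commit above describes. SKETCH_MAX_DICTIONARY_LIMIT, SKETCH_PARSE_HUGE, sketch_dict_limit and sketch_dict_can_grow are hypothetical stand-ins, not libxml2 symbols. The byte value used for "100 MB" is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not libxml2 symbols. "100 MB" comes from the
 * commit message; the exact byte value used here is an assumption. */
#define SKETCH_MAX_DICTIONARY_LIMIT ((size_t)100000000)
#define SKETCH_PARSE_HUGE (1 << 19)

/* Returns the dictionary size limit in bytes for the given options,
 * or 0 meaning "no limit" when the HUGE option is set. */
size_t sketch_dict_limit(int options) {
    if (options & SKETCH_PARSE_HUGE)
        return 0;
    return SKETCH_MAX_DICTIONARY_LIMIT;
}

/* Returns 1 if adding `len` bytes to a dictionary that already holds `used`
 * bytes stays within the limit, 0 otherwise. Comparing against
 * `limit - used` avoids size_t overflow in `used + len`. */
int sketch_dict_can_grow(size_t used, size_t len, int options) {
    size_t limit = sketch_dict_limit(options);
    if (limit == 0)
        return 1;
    return used <= limit && len <= limit - used;
}
```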
Nick Wellnhofer
8af55c8d20 parser: Rename new input API functions
These weren't made public yet.
2024-07-11 01:33:29 +02:00
Nick Wellnhofer
30ef77554b parser: Don't use deprecated xmlCopyChar 2024-07-02 13:34:11 +02:00
Nick Wellnhofer
221df37529 parser: Support custom charset conversion implementations
Implement xmlCtxtSetCharEncConvImpl. I agree that the name is terrible.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
3ff8a2c4b8 parser: Deprecate xmlIsLetter 2024-06-27 14:43:10 +02:00
Nick Wellnhofer
1112699cfa legacy: Remove most legacy functions from public headers
Also remove warning messages.
2024-06-17 15:47:42 +02:00
Nick Wellnhofer
4967277931 parser: Make XML_INPUT constants signed
Avoid conversion to unsigned which triggers (harmless) UBSan warnings.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
ab5e6debd1 parser: Introduce XML_INPUT_NETWORK input flag
This allows disabling network access when creating parser inputs with
xmlInputCreateUrl.
2024-06-12 16:36:12 +02:00
Nick Wellnhofer
b9d2f3c911 parser: Introduce new input API
- xmlInputCreateUrl
- xmlInputCreateMemory
- xmlInputCreateString
- xmlInputCreateFd
- xmlInputCreateIO
- xmlInputSetEncoding

These functions don't take a parser context and work on xmlParserInputs,
replacing functions working on xmlParserInputBuffers.

xmlInputCreateUrl and xmlInputSetEncoding offer fine-grained error
handling.

Several XML_INPUT_* flags offer additional control.
2024-06-12 16:22:52 +02:00
Nick Wellnhofer
b47a95fe31 parser: Don't make xmlCtxtErrIO public 2024-05-20 14:22:56 +02:00
Nick Wellnhofer
a2cc7f5f04 parser: Set depth limit to 2048 with XML_PARSE_HUGE
Deeply nested documents can cause performance problems, so the nesting
depth should always be limited to a reasonable value.

Also remove the global xmlParserMaxDepth setting which isn't thread-safe
and seems unused.
2024-01-02 19:42:06 +01:00
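A sketch of a nesting-depth check whose limit depends on XML_PARSE_HUGE. The 2048 figure comes from the commit message. The default of 256 without the option is an assumption, and the toy '(' / ')' event string stands in for start and end tags. All names are hypothetical.

```c
/* Hypothetical option bit standing in for XML_PARSE_HUGE. */
#define SKETCH_PARSE_HUGE (1 << 19)

/* 2048 with HUGE per the commit message; 256 otherwise is an assumption. */
int sketch_max_depth(int options) {
    return (options & SKETCH_PARSE_HUGE) ? 2048 : 256;
}

/* Walks a toy event string where '(' opens an element and ')' closes one.
 * Returns the maximum depth reached, or -1 as soon as the nesting exceeds
 * `limit` (mimicking a fatal "document too deep" error). */
int sketch_check_nesting(const char *events, int limit) {
    int depth = 0, max = 0;
    for (const char *p = events; *p != '\0'; p++) {
        if (*p == '(') {
            if (++depth > limit)
                return -1;
            if (depth > max)
                max = depth;
        } else if (*p == ')' && depth > 0) {
            depth--;
        }
    }
    return max;
}
```

A caller would pass sketch_max_depth(options) as the limit, so the same check serves both modes. A per-call limit also avoids the thread-safety problem of a global setting such as xmlParserMaxDepth.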
Nick Wellnhofer
23345a1cb1 io: Report IO errors through xmlCtxtErrIO
This is also a new public API function to be used in external entity
loaders.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
54c70ed57f parser: Improve error handling
Introduce xmlCtxtSetErrorHandler, allowing a structured error handler to be
set for a parser context. There already was the "serror" SAX handler but this
always receives the parser context as argument.

Start to use xmlRaiseMemoryError.

Remove useless arguments from memory error functions. Rename
xmlErrMemory to xmlCtxtErrMemory.

Remove a few calls to xmlGenericError.

Remove support for runtime entity debugging.
2023-12-21 02:46:27 +01:00
Nick Wellnhofer
f19a95108a parser: Report malloc failures
Fix many places where malloc failures aren't reported.

Make xmlErrMemory public. This is useful for custom external entity
loaders.

Introduce new API function xmlSwitchEncodingName.

Change the way we store whether the parser is stopped. This used
to be signaled by setting ctxt->instate to XML_PARSER_EOF which was
misdesigned and error-prone. Set ctxt->disableSAX to 2 instead and
introduce a macro PARSER_STOPPED. Also stop removing parser inputs in
xmlHaltParser. This allows many checks of ctxt->instate to be removed.

Introduce xmlErrParser to handle errors if a parser context is
available.
2023-12-11 22:13:05 +01:00
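The stop flag described in this commit can be modeled in a few lines. sketch_ctxt, SKETCH_PARSER_STOPPED, sketch_halt_parser and sketch_parse_loop are hypothetical stand-ins. Only the convention that disableSAX == 2 means "stopped" comes from the commit message.

```c
/* Toy stand-in for a parser context; only the field this sketch needs. */
typedef struct {
    int disableSAX; /* 0: SAX enabled, 1: SAX disabled, 2: parser stopped */
} sketch_ctxt;

/* Modeled on the PARSER_STOPPED macro named in the commit message. */
#define SKETCH_PARSER_STOPPED(ctxt) ((ctxt)->disableSAX > 1)

/* Halting only sets the flag; as the commit describes, inputs are no
 * longer removed, so nothing else about the context changes. */
void sketch_halt_parser(sketch_ctxt *ctxt) {
    ctxt->disableSAX = 2;
}

/* A parse loop checks the macro instead of comparing an instate field
 * against an EOF sentinel. `halt_at` simulates a fatal error at that
 * token (-1 means never). Returns the number of tokens consumed. */
int sketch_parse_loop(sketch_ctxt *ctxt, int tokens, int halt_at) {
    int consumed = 0;
    while (consumed < tokens && !SKETCH_PARSER_STOPPED(ctxt)) {
        if (consumed == halt_at) {
            sketch_halt_parser(ctxt);
            continue; /* loop condition now sees the stopped state */
        }
        consumed++;
    }
    return consumed;
}
```

Note that a context with SAX merely disabled (1) keeps parsing, while only the value 2 ends the loop.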
Nick Wellnhofer
a77f9ab84c globals: Don't include SAX2.h from globals.h 2023-09-20 22:06:49 +02:00
Nick Wellnhofer
3ffcc03b16 parser: Deprecate more internal functions 2023-04-26 20:23:23 +02:00
Nick Wellnhofer
e7c3a4ca1b parser: Deprecate some parser input functions 2023-03-13 19:19:46 +01:00
Nick Wellnhofer
bd63d730b8 html: Impose some length limits
Impose length limits on names, attribute values, PIs and comments,
similar to the XML parser.
2023-03-12 17:40:55 +01:00
Nick Wellnhofer
b47ebf047e parser: Deprecate xmlString*DecodeEntities
These are internal functions.
2022-12-21 21:06:03 +01:00
Nick Wellnhofer
ce9baf94d5 Remove XMLCALL and XMLCDECL macros from public headers 2022-12-08 02:48:27 +01:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
48f84ea8ed Remove internal macros from parserInternals.h
Replace MOVETO_ENDTAG with code that updates line and column numbers.
2022-08-25 21:31:08 +02:00
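MOVETO_ENDTAG, as its name suggests, advanced a pointer to the closing '>' of a tag. The hypothetical sketch below is not the code from this commit. It illustrates the replacement idea: advance to the '>' while keeping line and column numbers current. sketch_cursor and sketch_move_to_endtag are invented names.

```c
/* Hypothetical cursor; libxml2's parser inputs keep similar line/col fields. */
typedef struct {
    const char *cur;
    int line;
    int col;
} sketch_cursor;

/* Advances to the next '>' (or the end of the string), updating line and
 * column numbers on the way instead of just skipping bytes. Returns the
 * updated cursor by value. */
sketch_cursor sketch_move_to_endtag(sketch_cursor in) {
    while (*in.cur != '\0' && *in.cur != '>') {
        if (*in.cur == '\n') {
            in.line++;
            in.col = 1;
        } else {
            in.col++;
        }
        in.cur++;
    }
    return in;
}
```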
Nick Wellnhofer
58fc89e8a9 Deprecate internal parser functions 2022-08-25 21:04:57 +02:00
Nick Wellnhofer
34a050cdee Move some HTML functions to correct header file 2022-08-24 16:44:39 +02:00
Nick Wellnhofer
aab584dc31 Clean up encoding switching code
- Remove xmlSwitchToEncodingInt which was basically just a wrapper
  around xmlSwitchInputEncodingInt.
- Simplify xmlSwitchEncoding.
- Improve error handling in xmlSwitchInputEncodingInt.
- Deprecate xmlSwitchInputEncoding.
2022-04-02 19:09:12 +02:00
Nick Wellnhofer
40483d0ce2 Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
2022-03-06 15:59:43 +01:00
Nick Wellnhofer
cf4893f7b3 Deprecate legacy functions 2022-02-20 21:49:04 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
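LIBXML_ATTR_FORMAT(fmt,args) lets GCC-compatible compilers check printf-style calls against their format string. This is a minimal sketch of the pattern, assuming the GCC/Clang __format__ attribute. SKETCH_ATTR_FORMAT and sketch_report are hypothetical names, and libxml2's own macro may be guarded differently.

```c
#include <stdarg.h>
#include <stdio.h>

/* On GCC/Clang, expand to the format attribute so -Wformat can check
 * callers; on other compilers, expand to nothing. */
#if defined(__GNUC__)
#define SKETCH_ATTR_FORMAT(fmt, args) \
    __attribute__((__format__(__printf__, fmt, args)))
#else
#define SKETCH_ATTR_FORMAT(fmt, args)
#endif

/* Parameter 3 is the format string; variadic arguments start at 4. */
int sketch_report(char *buf, size_t size, const char *fmt, ...)
    SKETCH_ATTR_FORMAT(3, 4);

/* Formats into `buf`; returns the length the full message would need,
 * as vsnprintf does. */
int sketch_report(char *buf, size_t size, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a mismatched call such as sketch_report(buf, sizeof buf, "%s", 42) produces a -Wformat warning at compile time instead of undefined behavior at run time.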
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all spaces before tabs, and trailing spaces and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
52d8ade7a7 Introduce some default parser limits
These can be overridden by the XML_PARSE_HUGE option; they
are just default limits for name length, dictionary size
and the maximum amount of parser lookup.
* include/libxml/parserInternals.h: define the limits
* include/libxml/xmlerror.h: add a new error
* parser.c parserInternals.c: implements the new limits
2012-07-30 10:08:45 +08:00
Daniel Veillard
97ff9b367a preparing 0.7.3 release fix a typo in a name Daniel
* configure.in doc/xml.html doc/*: preparing 0.7.3 release
* include/libxml/parserInternals.h SAX2.c: fix a typo in a name
Daniel

svn path=/trunk/; revision=3814
2009-01-18 21:43:30 +00:00
Daniel Veillard
d4d4705780 apply patch from Marcus Meissner to add gcc attribute alloc_size should
* include/libxml/xmlversion.h.in include/libxml/xmlmemory.h:
  apply patch from Marcus Meissner to add gcc attribute alloc_size
  should fix #552505
* doc/apibuild.py doc/* testapi.c: regenerate the API
* include/libxml/parserInternals.h: fix a comment problem raised
  by apibuild.py
daniel

svn path=/trunk/; revision=3811
2009-01-18 17:26:02 +00:00
Daniel Veillard
1fb2e0dfc6 add a new define XML_MAX_TEXT_LENGHT limiting the maximum size of a single
* include/libxml/parserInternals.h SAX2.c: add a new define
  XML_MAX_TEXT_LENGHT limiting the maximum size of a single text
  node; the default is 10MB and can be removed with the HUGE
  parsing option
Daniel

svn path=/trunk/; revision=3808
2009-01-18 14:08:36 +00:00
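A sketch of growing a text node under such a cap. SKETCH_MAX_TEXT_LENGTH, SKETCH_PARSE_HUGE, sketch_text and sketch_text_append are hypothetical names. "10MB" is taken from the commit message, and its byte value (10000000) is assumed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; the byte value for "10MB" is an assumption. */
#define SKETCH_MAX_TEXT_LENGTH ((size_t)10000000)
#define SKETCH_PARSE_HUGE (1 << 19)

typedef struct {
    char *content; /* NUL-terminated, or NULL while empty */
    size_t len;
} sketch_text;

/* Appends `len` bytes to a text node. Returns 0 on success, -1 if the cap
 * would be exceeded (and HUGE is not set) or allocation fails; on failure
 * the node is left unchanged. */
int sketch_text_append(sketch_text *t, const char *data, size_t len, int options) {
    /* Cap check written as a subtraction to avoid size_t wrap-around. */
    if (!(options & SKETCH_PARSE_HUGE) &&
        (t->len > SKETCH_MAX_TEXT_LENGTH || len > SKETCH_MAX_TEXT_LENGTH - t->len))
        return -1;
    /* Even with HUGE, refuse sizes that would overflow the allocation. */
    if (t->len >= SIZE_MAX || len > SIZE_MAX - t->len - 1)
        return -1;
    char *p = realloc(t->content, t->len + len + 1);
    if (p == NULL)
        return -1;
    memcpy(p + t->len, data, len);
    t->len += len;
    p[t->len] = '\0';
    t->content = p;
    return 0;
}
```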
Daniel Veillard
a8f09ce8d3 cleanup entity pushing error handling based on a patch from Ashwin daniel
* include/libxml/parserInternals.h parser.c: cleanup entity
  pushing error handling based on a patch from Ashwin
daniel

svn path=/trunk/; revision=3779
2008-08-27 13:02:01 +00:00
William M. Brack
21e4ef20f6 Re-examined the problems of configuring a "minimal" library.
Synchronized the header files with the library code in order
to assure that all the various conditionals (LIBXML_xxxx_ENABLED)
were the same in both.  Modified the API database content to more
accurately reflect the conditionals.  Enhanced the generation
of that database.  Although there was no substantial change to
any of the library code's logic, a large number of files were
modified to achieve the above, and the configuration script
was enhanced to do some automatic enabling of features (e.g.
--with-xinclude forces --with-xpath).  Additionally, all the format
errors discovered by apibuild.py were corrected.
* configure.in: enhanced cross-checking of options
* doc/apibuild.py, doc/elfgcchack.xsl, doc/libxml2-refs.xml,
  doc/libxml2-api.xml, gentest.py: changed the usage of the
  <cond> element in module descriptions
* elfgcchack.h, testapi.c: regenerated with proper conditionals
* HTMLparser.c, SAX.c, globals.c, tree.c, xmlschemas.c, xpath.c,
  testSAX.c: cleaned up conditionals
* include/libxml/[SAX.h, SAX2.h, debugXML.h, encoding.h, entities.h,
  hash.h, parser.h, parserInternals.h, schemasInternals.h, tree.h,
  valid.h, xlink.h, xmlIO.h, xmlautomata.h, xmlreader.h, xpath.h]:
  synchronized the conditionals with the corresponding module code
* doc/examples/tree2.c, doc/examples/xpath1.c, doc/examples/xpath2.c:
  added additional conditions required for compilation
* doc/*.html, doc/html/*.html: rebuilt the docs
2005-01-02 09:53:13 +00:00
William M. Brack
d1757abcb8 added two new macros IS_ASCII_LETTER and IS_ASCII_DIGIT used with (html)
* include/libxml/parserInternals.h: added two new macros
  IS_ASCII_LETTER and IS_ASCII_DIGIT used with (html)
  parsing and xpath for testing data not necessarily
  unicode.
* HTMLparser.c, xpath.c: changed use of IS_LETTER_CH and
  IS_DIGIT_CH macros to ascii versions (bug 153936).
2004-10-02 22:07:48 +00:00
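ASCII-only character classes like IS_ASCII_LETTER and IS_ASCII_DIGIT can be written as plain range tests. This sketch is in that spirit only. The SKETCH_* macros and sketch_ascii_name_len are hypothetical, and the real definitions in include/libxml/parserInternals.h may differ in form.

```c
/* ASCII-only range tests: 'A'-'Z', 'a'-'z' and '0'-'9'. */
#define SKETCH_IS_ASCII_LETTER(c) \
    ((((c) >= 0x41) && ((c) <= 0x5A)) || (((c) >= 0x61) && ((c) <= 0x7A)))
#define SKETCH_IS_ASCII_DIGIT(c) (((c) >= 0x30) && ((c) <= 0x39))

/* Counts the leading ASCII letters and digits in a byte string, as an
 * HTML-style name scanner might. Bytes >= 0x80 never match, so non-ASCII
 * input simply ends the scan. */
int sketch_ascii_name_len(const unsigned char *s) {
    int n = 0;
    while (SKETCH_IS_ASCII_LETTER(s[n]) || SKETCH_IS_ASCII_DIGIT(s[n]))
        n++;
    return n;
}
```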
Daniel Veillard
29b1748205 small typo pointed out by Mike Hommey slightly improved the --c14n
* xmlIO.c: small typo pointed out by Mike Hommey
* doc/xmllint.xml, xmllint.html, xmllint.1: slightly improved
  the --c14n description, c.f. #144675 .
* nanohttp.c nanoftp.c: applied a first simple patch from
  Mike Hommey for $no_proxy, c.f. #133470
* parserInternals.c include/libxml/parserInternals.h
  include/libxml/xmlerror.h: cleanup to avoid 'error' identifier
  in includes #
* parser.c SAX2.c debugXML.c include/libxml/parser.h:
  first version of the implementation of parsing within
  the context of a node in the tree #142359, new function
  xmlParseInNodeContext(), added support at the xmllint --shell
  level as the "set" function
* test/scripts/set* result/scripts/* Makefile.am: extended
  the script based regression tests to instrument the new function.
Daniel
2004-08-16 00:39:03 +00:00
Daniel Veillard
be5869729a modified the file header to add more information, painful... updated to
* include/libxml/*.h include/libxml/*.h.in: modified the file
  header to add more information, painful...
* genChRanges.py genUnicode.py: updated to generate said changes
  in headers
* doc/apibuild.py: extract headers, add them to libxml2-api.xml
* *.html *.xsl *.xml: updated the stylesheets to flag deprecated
  APIs modules. Updated the stylesheets, some cleanups, regenerated
* doc/html/*.html: regenerated added back book1 and libxml-lib.html
Daniel
2003-11-18 20:56:51 +00:00
Daniel Veillard
61b9338c0f implemented the XML_PARSE_NONET parser option. converted xmllint.c to use
* parser.c xmlIO.c include/libxml/parserInternals.h: implemented
  the XML_PARSE_NONET parser option.
* xmllint.c: converted xmllint.c to use the option instead of
  relying on the global resolver variable.
Daniel
2003-11-03 14:28:31 +00:00
Daniel Veillard
a840b69261 Fixed the HTTP<->parser interaction, which should fix 2 long standing
* include/libxml/nanohttp.h include/libxml/parserInternals.h
  include/libxml/xmlIO.h nanohttp.c parserInternals.c xmlIO.c:
  Fixed the HTTP<->parser interaction, which should fix 2 long
  standing bugs #104790 and #124054. This also fixes the fact that
  HTTP error codes (> 400) should not generate data; we usually
  don't want to parse the HTML error information instead of the
  resource looked at.
Daniel
2003-10-19 13:35:37 +00:00
William M. Brack
76e95df055 Changed all (?) occurrences where validation macros (IS_xxx) had
* include/libxml/parserInternals.h HTMLparser.c HTMLtree.c
  SAX2.c catalog.c debugXML.c entities.c parser.c relaxng.c
  testSAX.c tree.c valid.c xmlschemas.c xmlschemastypes.c
  xpath.c: Changed all (?) occurrences where validation macros
  (IS_xxx) had single-byte arguments to use IS_xxx_CH instead
  (e.g. IS_BLANK changed to IS_BLANK_CH).  This gets rid of
  many warning messages on certain platforms, and also high-
  lights places in the library which may need to be enhanced
  for proper UTF8 handling.
2003-10-18 16:20:14 +00:00
William M. Brack
871611bb03 enhanced macros to avoid breaking ABI from previous versions. modified to
* genChRanges.py, chvalid.c, include/libxml/chvalid.h,
  include/libxml/parserInternals.h: enhanced macros to avoid
  breaking ABI from previous versions.
* catalog.c, parser.c, tree.c: modified to use IS_* macros
  defined in parserInternals.h.  Makes maintenance much easier.
* testHTML.c, testSAX.c, python/libxml.c: minor fixes to avoid
  compilation warnings
* configuration.in: fixed pushHTML test error; enhanced for
  better devel (me) testing
2003-10-18 04:53:14 +00:00
Daniel Veillard
4aede2e66b remove the warning for startDocument(), as it is used by glade (or
* legacy.c: remove the warning for startDocument(), as it is used by
  glade (or glade-python)
* parser.c relaxng.c xmlschemastypes.c: fixed an assorted set of
  invalid accesses found by running some Python based regression
  tests under valgrind. There are still a few leaks reported by the
  relaxng regressions which need some attention.
* doc/Makefile.am: fixed a make install problem c.f. #124539
* include/libxml/parserInternals.h: addition of xmlParserMaxDepth
  patch from crutcher
Daniel
2003-10-17 12:43:59 +00:00
William M. Brack
68aca051a6 new files for a different method for doing range validation of character
* genChRange.py, chvalid.def, chvalid.c, include/libxml/chvalid.h:
  new files for a different method for doing range validation
  of character data.
* Makefile.am, parserInternals.c, include/libxml/Makefile.am,
  include/libxml/parserInternals.h: modified for new range method.
* catalog.c: small enhancement for warning message (using one
  of the new range routines)
2003-10-11 15:22:13 +00:00
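A range-based character check can be pictured as a binary search over a sorted table of [low, high] code-point ranges. This is an illustrative sketch only. sketch_range, sketch_char_in_range and the table are hypothetical, the table holds only the first five ranges of the XML 1.0 BaseChar production, and chvalid's actual data layout is not reproduced.

```c
#include <stddef.h>

/* Hypothetical inclusive code-point range. */
typedef struct {
    unsigned int low;
    unsigned int high;
} sketch_range;

/* First five ranges of the XML 1.0 BaseChar production (sorted, disjoint). */
static const sketch_range sketch_base_char_head[] = {
    {0x41, 0x5A}, {0x61, 0x7A}, {0xC0, 0xD6}, {0xD8, 0xF6}, {0xF8, 0xFF},
};
static const size_t sketch_base_char_head_len =
    sizeof sketch_base_char_head / sizeof sketch_base_char_head[0];

/* Binary search over a sorted, non-overlapping range table.
 * Returns 1 if `val` falls inside some [low, high] range, else 0. */
int sketch_char_in_range(unsigned int val, const sketch_range *ranges, size_t n) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (val < ranges[mid].low)
            hi = mid;
        else if (val > ranges[mid].high)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}
```

A lookup costs O(log n) comparisons regardless of how many ranges a production has, which is the practical advantage over long chains of hand-written comparisons.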
Daniel Veillard
ce9457f3aa more cleanup of error handling in parserInternals, sharing the routine for
* parserInternals.c parser.c valid.c include/libxml/parserInternals.h:
  more cleanup of error handling in parserInternals, sharing the
  routine for memory errors.
Daniel
2003-10-05 21:33:18 +00:00
Daniel Veillard
2b8c4a151b changed 'make tests' to use a concise output, scrolling to see where thing
* Makefile.am: changed 'make tests' to use a concise output,
  scrolling to see where things broke wasn't pleasant
* configure.in: some beta4 preparation, but not ready yet
* error.c globals.c include/libxml/globals.h include/libxml/xmlerror.h:
  new error handling code, last error information is stored
  in the parsing context or a global variable, new APIs to
  handle the xmlErrorPtr type.
* parser.c parserInternals.c valid.c : started migrating to the
  new error handling code, it's a royal pain.
* include/libxml/parser.h include/libxml/parserInternals.h:
  moved the definition of xmlNewParserCtxt()
* parser.c: small potential buffer access problem in push code
  provided by Justin Fletcher
* result/*.sax result/VC/PENesting* result/namespaces/*
  result/valid/*.err: some error messages were slightly changed.
Daniel
2003-10-02 22:28:19 +00:00