libxml2

c/libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2 synced 2025-03-28 21:33:13 +00:00

Author	SHA1	Message	Date
Nick Wellnhofer	d944a41515	parser: Fix in-parameter-entity and in-external-dtd checks Use ctxt->input->entity instead of ctxt->inputNr to determine whether we are inside a parameter entity. Stop using ctxt->external to check whether we're in an external DTD. This is signaled by ctxt->inSubset == 2.	2023-12-29 01:19:56 +01:00
Nick Wellnhofer	c1bddd4c26	parser: Mark 'length' member of xmlParserInput as unused	2023-12-25 23:38:40 +01:00
Nick Wellnhofer	955c177f69	parser: Stop using 'directory' struct member This was only used as a pointless fallback for URI resolution.	2023-12-25 23:38:40 +01:00
Nick Wellnhofer	54c70ed57f	parser: Improve error handling Introduce xmlCtxtSetErrorHandler allowing to set a structured error for a parser context. There already was the "serror" SAX handler but this always receives the parser context as argument. Start to use xmlRaiseMemoryError. Remove useless arguments from memory error functions. Rename xmlErrMemory to xmlCtxtErrMemory. Remove a few calls to xmlGenericError. Remove support for runtime entity debugging.	2023-12-21 02:46:27 +01:00
Nick Wellnhofer	5d2dbe79fa	parser: Fix build --without-output Fixes #647	2023-12-14 13:48:41 +01:00
Nick Wellnhofer	df0b540b3e	include: Rename XML_EMPTY helper macro Avoid name clash with downstream projects.	2023-12-07 14:59:47 +01:00
Nick Wellnhofer	a9738e311c	include: Move declaration of xmlInitGlobals Fix downstream build issues after reworking globals.h.	2023-12-07 14:59:40 +01:00
Nick Wellnhofer	9122ad0ce6	include: Move globals from xmlsave.h to parser.h Fix downstream build issues after reworking globals.h.	2023-12-07 12:31:06 +01:00
Nick Wellnhofer	c011e7605d	globals: Remove unused globals from thread storage Setting these deprecated globals hasn't had an effect for a long time. Make them constants. This reduces the size of per-thread storage from ~700 to ~250 bytes.	2023-12-06 20:07:54 +01:00
Nick Wellnhofer	ff6c318862	include: Remove useless 'const' from function arguments	2023-11-23 15:27:00 +01:00
Nick Wellnhofer	aca37d8c77	parser: Only enable SAX2 if there are SAX2 element handlers This reverts part of commit 235b15a5 for backward compatibility and adds some comments trying to clarify the whole mess. Fixes #623.	2023-11-20 15:20:37 +01:00
Nick Wellnhofer	e0dd330b8f	parser: Use hash tables to avoid quadratic behavior Use a hash table to lookup namespaces by prefix. The hash table stores an index into the namespace table. Auxiliary data for namespaces is stored in a separate array along the main namespace table. Use a hash table to verify attribute uniqueness. The hash table stores an index into the attribute table. Reuse hash value from the dictionary to avoid computing them twice. See #346.	2023-09-29 12:43:22 +02:00
Nick Wellnhofer	8c084ebdc7	doc: Make apibuild.py happy	2023-09-21 22:57:33 +02:00
Nick Wellnhofer	72262030a6	parser: Readd some includes to parser.h and xmlreader.h Fix backward compatibility.	2023-09-21 15:06:05 +02:00
Nick Wellnhofer	da274bfa55	build: Fix build when certain modules are disabled	2023-09-21 02:26:43 +02:00
Nick Wellnhofer	d6ba403368	globals: Move remaining declarations to correct places globals.h is now deprecated. Sanity is restored.	2023-09-20 22:22:51 +02:00
Nick Wellnhofer	11a1839ddd	globals: Move remaining globals back to correct header files This undoes a lot of damage.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	d1336fd393	globals: Move malloc hooks back to xmlmemory.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	2e6c49a74d	globals: Don't store xmlParserVersion in global state This is a constant.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	db8b9722cb	parser: Deprecate global parser options Note that setting global options has no effect anyway when using any of the modern parser API functions which take an option argument like xmlReadMemory or when using xmlCtxtUseOptions. Global options only have an effect when using old API functions xmlParse* or xmlSAXParse* or when using an xmlParserCtxt without calling xmlCtxtUseOptions. Unfortunately, many downstream projects still modify global parser options often without realizing that it has no effect. If necessary, switch to the modern API. Then you can safely remove all code that changes global options. Here's a list of deprecated functions and global variables together with the corresponding parser options. - xmlSubstituteEntitiesDefault, xmlSubstituteEntitiesDefaultValue Parser option XML_PARSE_NOENT - xmlKeepBlanksDefault, xmlKeepBlanksDefaultValue Inverse of parser option XML_PARSE_NOBLANKS - xmlPedanticParserDefault, xmlPedanticParserDefaultValue Parser option XML_PARSE_PEDANTIC - xmlLineNumbersDefault, xmlLineNumbersDefaultValue Always enabled by new API - xmlDoValidityCheckingDefaultValue Parser option XML_PARSE_DTDVALID - xmlGetWarningsDefaultValue Inverse of parser option XML_PARSE_NOWARNING - xmlLoadExtDtdDefaultValue Parser options XML_PARSE_DTDLOAD and XML_PARSE_DTDATTR	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	ed3bd05284	parser: Allow to set maximum amplification factor	2023-08-20 20:49:16 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	e7c3a4ca1b	parser: Deprecate some parser input functions	2023-03-13 19:19:46 +01:00
Nick Wellnhofer	59b3366178	error: Limit number of parser errors Reporting errors is expensive and some abusive test cases can generate an error for each invalid input byte. This causes the parser to spend most of the time with error handling. Limit the number of errors and warnings to 100.	2022-12-27 14:41:19 +01:00
Nick Wellnhofer	ce76ebfd13	entities: Stop counting entities This was only used in the old version of xmlParserEntityCheck.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	463bbeeca1	entities: Rework entity amplification checks This commit implements robust detection of entity amplification attacks, better known as the "billion laughs" attack. We now limit the size of the document after substitution of entities to 10 times the size before expansion. This guarantees linear behavior by definition. There already was a similar check before, but the accounting of "sizeentities" (size of external entities) and "sizeentcopy" (size of all copies created by entity references) wasn't accurate. We also need saturation arithmetic since we're historically limited to "unsigned long" which is 32-bit on many platforms. A maximum of 10 MB of substitutions is always allowed. This should make use cases like DITA work which have caused problems in the past. The old checks based on the number of entities were removed. This is accounted for by adding a fixed cost to each entity reference. Entity amplification checks are now enabled even if XML_PARSE_HUGE is set. This option is mainly used to allow larger text nodes. Most users were unaware that it also disabled entity expansion checks. Some of the limits might be adjusted later. If this change turns out to affect legitimate use cases, we can add a separate parser option to disable the checks. Fixes #294. Fixes #345.	2022-12-21 20:19:10 +01:00
Nick Wellnhofer	ce9baf94d5	Remove XMLCALL and XMLCDECL macros from public headers	2022-12-08 02:48:27 +01:00
Nick Wellnhofer	68a6518c45	parser: Rewrite push parser boundary checks Remove inaccurate xmlParseCheckTransition check. Remove non-incremental xmlParseGetLasts check. Add functions that check for several boundary constructs more accurately, keeping track of progress in ctxt->checkIndex. Fixes #439.	2022-11-20 21:27:08 +01:00
Nick Wellnhofer	65dc8a63ac	Make xmlNewSAXParserCtxt take a const SAX handler Also improve documentation.	2022-09-01 00:17:45 +02:00
Nick Wellnhofer	51035c539e	Generate deprecation warnings for old SAX API	2022-08-25 20:17:03 +02:00
Nick Wellnhofer	9a82b94a94	Introduce xmlNewSAXParserCtxt and htmlNewSAXParserCtxt Add API functions to create a parser context with a custom SAX handler without having to mess with ctxt->sax manually.	2022-08-24 14:07:55 +02:00
Nick Wellnhofer	4a8c71eb7c	Remove DOCBparser This code has been broken and deprecated since version 2.6.0, released in 2003. Because of a bug in commit 961b535c, DOCBparser.c was never compiled since 2012. I couldn't find a Debian package using any of its symbols, so it seems safe to remove this module.	2022-03-04 22:56:21 +01:00
Nick Wellnhofer	ebb1797030	Remove unneeded #includes	2022-03-04 22:11:49 +01:00
Nick Wellnhofer	cf4893f7b3	Deprecate legacy functions	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	ce00c36e65	Store per-element parser state in a struct Make the parser context's "pushTab" point to an array of structs instead of void pointers. This avoids casting unrelated types to void pointers, improving readability and portability, and allows for more efficient packing. Ultimately, the struct could be extended to include the contents of "nameTab" and "spaceTab", further simplifying the code. Historically, "pushTab" was only used by the push parser (hence the name), so the change to the public headers should be safe. Also remove an unused parameter from xmlParseEndTag2.	2021-05-08 22:16:49 +02:00
Nick Wellnhofer	438e595a8c	Stop counting nbChars in parser context The value was inaccurate and never used.	2020-08-09 15:01:45 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Nick Wellnhofer	030b1f7a27	Revert "Add an XML_PARSE_NOXXE flag to block all entities loading even local" This reverts commit 2304078555896cf1638c628f50326aeef6f0e0d0. The new flag doesn't work and the change even broke the XML_PARSE_NONET option.	2017-06-06 15:53:42 +02:00
Doran Moppert	2304078555	Add an XML_PARSE_NOXXE flag to block all entities loading even local For https://bugzilla.gnome.org/show_bug.cgi?id=772726 * include/libxml/parser.h: Add a new parser flag XML_PARSE_NOXXE * elfgcchack.h, xmlIO.h, xmlIO.c: associated loading routine * include/libxml/xmlerror.h: new error raised * xmllint.c: adds --noxxe flag to activate the option	2017-04-07 16:55:05 +02:00
Jan Pokorný	bb654feb9a	Fix typos: dictio{ nn -> n }ar{y,ies} Signed-off-by: Jan Pokorný <jpokorny@redhat.com>	2016-04-15 22:22:48 +08:00
Daniel Veillard	23f05e0c33	Detect excessive entities expansion upon replacement If entities expansion in the XML parser is asked for, it is possible to craft a relatively small input document leading to excessive on-the-fly content generation. This patch accounts for those replacements and stops parsing after a given threshold. It can be bypassed as usual with the HUGE parser option.	2013-02-19 10:21:49 +08:00
Daniel Veillard	f8e3db0445	Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.	2012-09-11 13:26:36 +08:00
Daniel Veillard	968a03a2e5	Add support for big line numbers in error reporting Fix the lack of line number as reported by Johan Corveleyn <jcorvel@gmail.com> * parser.c include/libxml/parser.h: add an XML_PARSE_BIG_LINES parser option, not switched on by default; it's an opt-in * SAX2.c: if XML_PARSE_BIG_LINES is set store the long line numbers in the psvi field of text nodes * tree.c: expand xmlGetLineNo to extract that information, also make sure we can't fail on recursive behaviour * error.c: in __xmlRaiseError, if a node is provided, call xmlGetLineNo() if we can't get a valid line number. * xmllint.c: switch on XML_PARSE_BIG_LINES in xmllint	2012-08-13 12:41:33 +08:00
Daniel Veillard	0d51cfebc9	Fix a race in xmlNewInputStream For https://bugzilla.gnome.org/show_bug.cgi?id=643148 Reported by Bill Clarke <llib@computer.org>, it used a global variable as a counter for the input id and this was not thread safe. To avoid the race without adding unneeded locking in the parser path, move the id to the parser context instead.	2012-05-15 11:18:40 +08:00
Anders F Bjorklund	eae5261779	add lzma compression support	2012-01-27 22:19:52 +08:00
Daniel Veillard	c62efc847c	Add options to ignore the internal encoding For both XML and HTML, the document can provide an encoding either in XMLDecl in XML, or as a meta element in HTML head. This adds options to ignore those encodings if the encoding is known in advance, for example if the content had been converted before being passed to the parser. * parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option for XML parsing * include/libxml/HTMLparser.h HTMLparser.c: adds the HTML_PARSE_IGNORE_ENC for HTML parsing * HTMLtree.c: fix the handling of saving when an unknown encoding is defined in meta document header * xmllint.c: add a --noenc option to activate the new parser options	2011-05-26 11:47:37 +08:00
Giuseppe Iuculano	48f7dcb724	480323 add code to plug in ICU converters by default This is not configured in by default but after some serious massaging incorporate that patch from Chromium/Chrome.	2010-11-04 17:42:42 +01:00
Eugene Pimenov	615904f582	Switch the HTML parser to be non-recursive * HTMLparser.c: new htmlParseElementInternal non recursive, with htmlParseContentInternal and new function to handle node info and element end. * include/libxml/parser.h: add new stack for element info in parser context * parserInternals.c: free element info stack	2010-03-15 15:16:02 +01:00
Daniel Veillard	029a04d265	541335 HTML avoid creating 2 head or 2 body elements * HTMLparser.c: check when we see a head or a body tag and avoid autogenerating them * include/libxml/parser.h: the values for ctxt->html change depending on the head or body tags being seen	2009-08-24 12:50:23 +02:00
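
Commit 463bbeeca1 in the table above describes limiting the document size after entity substitution to 10 times the size before expansion, always allowing 10 MB of substitutions, charging a fixed cost per entity reference, and using saturation arithmetic because the counters are "unsigned long". The sketch below illustrates that accounting in isolation. It is not the code from parser.c: all names are hypothetical, the per-reference cost of 20 is an arbitrary placeholder, and "10 MB" is assumed to mean 10,000,000 bytes.

```c
#include <limits.h>

/* Hypothetical constants; only the factor of 10 and the 10 MB allowance
 * come from the commit message. The fixed cost value is a placeholder. */
#define SKETCH_MAX_AMPLIFICATION 10UL
#define SKETCH_ALLOWED_EXPANSION (10UL * 1000UL * 1000UL)
#define SKETCH_ENT_FIXED_COST    20UL

/* Saturating addition: clamps at ULONG_MAX instead of wrapping around. */
static unsigned long sketch_sat_add(unsigned long a, unsigned long b) {
    return (b > ULONG_MAX - a) ? ULONG_MAX : a + b;
}

/* Saturating multiplication: clamps at ULONG_MAX instead of wrapping. */
static unsigned long sketch_sat_mul(unsigned long a, unsigned long b) {
    if (a != 0 && b > ULONG_MAX / a)
        return ULONG_MAX;
    return a * b;
}

/* Charge one entity reference: its replacement length plus a fixed cost,
 * so that many references to tiny entities are still accounted for. */
static unsigned long sketch_account_reference(unsigned long expanded,
                                              unsigned long replacement_len) {
    unsigned long cost = sketch_sat_add(replacement_len, SKETCH_ENT_FIXED_COST);
    return sketch_sat_add(expanded, cost);
}

/* Returns 1 if the expanded size exceeds both the fixed allowance and
 * SKETCH_MAX_AMPLIFICATION times the number of input bytes consumed. */
static int sketch_amplification_exceeded(unsigned long consumed,
                                         unsigned long expanded) {
    if (expanded <= SKETCH_ALLOWED_EXPANSION)
        return 0;
    return expanded > sketch_sat_mul(consumed, SKETCH_MAX_AMPLIFICATION);
}
```

Because the expanded size is bounded by a constant multiple of the input size (plus a constant), total work stays linear in the input, which is the guarantee the commit message refers to.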

183 Commits
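
Commit db8b9722cb lists each deprecated global parser setting together with the parser option that replaces it. As a rough aid for migrating downstream code, the sketch below encodes that table as a function. It is self-contained and deliberately does not include libxml2: the SKETCH_PARSE_* flags are placeholders with arbitrary values standing in for the XML_PARSE_* options named in the commit, and the struct and function names are hypothetical.

```c
/* Placeholder flags standing in for libxml2's XML_PARSE_* options.
 * Real code would include <libxml/parser.h> and use the library's own
 * constants; the numeric values here are illustrative only. */
enum sketch_parse_option {
    SKETCH_PARSE_NOENT     = 1 << 0,
    SKETCH_PARSE_DTDLOAD   = 1 << 1,
    SKETCH_PARSE_DTDATTR   = 1 << 2,
    SKETCH_PARSE_DTDVALID  = 1 << 3,
    SKETCH_PARSE_NOWARNING = 1 << 4,
    SKETCH_PARSE_PEDANTIC  = 1 << 5,
    SKETCH_PARSE_NOBLANKS  = 1 << 6
};

/* Hypothetical snapshot of the values a project used to set globally. */
struct sketch_legacy_globals {
    int substitute_entities;   /* xmlSubstituteEntitiesDefaultValue */
    int keep_blanks;           /* xmlKeepBlanksDefaultValue */
    int pedantic;              /* xmlPedanticParserDefaultValue */
    int do_validity_checking;  /* xmlDoValidityCheckingDefaultValue */
    int get_warnings;          /* xmlGetWarningsDefaultValue */
    int load_ext_dtd;          /* xmlLoadExtDtdDefaultValue */
};

/* Translate legacy global settings into an explicit option bitmask,
 * following the mapping listed in the commit message. */
static int sketch_options_from_globals(const struct sketch_legacy_globals *g) {
    int options = 0;

    if (g->substitute_entities)
        options |= SKETCH_PARSE_NOENT;
    if (!g->keep_blanks)            /* inverse relationship */
        options |= SKETCH_PARSE_NOBLANKS;
    if (g->pedantic)
        options |= SKETCH_PARSE_PEDANTIC;
    if (g->do_validity_checking)
        options |= SKETCH_PARSE_DTDVALID;
    if (!g->get_warnings)           /* inverse relationship */
        options |= SKETCH_PARSE_NOWARNING;
    if (g->load_ext_dtd)
        options |= SKETCH_PARSE_DTDLOAD | SKETCH_PARSE_DTDATTR;

    return options;
}
```

Line numbers have no entry in the mapping because, per the commit message, they are always enabled by the new API. In real code the resulting bitmask would be passed as the option argument of a function such as xmlReadMemory, or to xmlCtxtUseOptions, both of which the commit message names.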