libxml2

c/libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2 synced 2025-03-28 21:33:13 +00:00

Author	SHA1	Message	Date
Nick Wellnhofer	69b83bb68e	encoding: Detect truncated multi-byte sequences with ICU Unlike iconv or the internal converters, ICU consumes truncated multi-byte sequences at the end of an input buffer. We currently check for a non-empty raw input buffer to detect truncated sequences, so this fails with ICU. It might be possible to inspect the pivot buffer pointers, but it seems cleaner to implement a `flush` flag for some encoding and I/O functions. After flushing, we can check for U_TRUNCATED_CHAR_FOUND with ICU, or detect remaining input with other converters. Also fix detection of truncated sequences for HTML, XML content and DTDs with iconv.	2025-03-13 22:15:10 +01:00
Nick Wellnhofer	05bd1720ce	parser: Fix parsing of DTD content Regressed in 2.11. Fixes #868.	2025-03-01 15:18:20 +01:00
Nick Wellnhofer	9f86dae989	test: Add test case for UAF in xmlSchemaIDCFillNodeTables	2025-02-20 11:35:47 +01:00
Nick Wellnhofer	8cf6129bbd	html: Stop implying <p> start tags Only <html>, <head> or <body> should be implied. Opening extra <p> tags has always been a libxml2 quirk.	2025-02-13 20:20:17 +01:00
Nick Wellnhofer	71122421a1	html: Make implied <p> tags more deterministic libxml2's HTML parser adds <p> start tags in some situations. This behavior, which doesn't follow any standard, was added in 2000, see here: http://veillard.com/XML/messages/0655.html Text nodes that only contain whitespace don't imply a <p> tag, but the whitespace check cannot work reliably if we're parsing partial text data which can happen with both pull and push parser. The logic in `areBlanks` is hard to follow. The checks involving `CUR` depend on the position of the input pointer and seem dubious. It's also possible that the behavior changed inadvertently with a later commit. As a result, it's hard to come up with good test cases. We now process leading whitespace before creating implied tags. This is more in line with HTML5 and should avoid at least some issues with partial text data. For example, parsing the string "<head> x" used to result in: <html> <head></head> <body><p> x</p></body> </html> And now results in: <html> <head> </head> <body><p>x</p></body> </html> Except for the implied <p> tag, this matches HTML5.	2025-02-13 14:31:44 +01:00
Nick Wellnhofer	b4d3d87ed2	parser: Fix parsing of doctype declarations Fix some long-standing issues. Fixes #504.	2025-02-02 11:15:45 +01:00
Nick Wellnhofer	080285724b	html: Make data parsing modes work with push parser This can't be solved with a simple scan for a terminator. Instead, we make htmlParseCharData handle incomplete data if the "partial" flag is set.	2025-02-02 11:15:45 +01:00
Nick Wellnhofer	cd220b93d8	valid: Remove duplicate error messages when streaming	2024-12-28 11:55:24 +01:00
Nick Wellnhofer	459146140a	xpath: Fix parsing of non-ASCII names Fix a long-standing issue where QNames starting with a non-ASCII character would be rejected. This became more visible after "streaming" XPath evaluation was disabled since the latter handled non-ASCII names correctly. Fixes #818.	2024-11-05 12:30:44 +01:00
Nick Wellnhofer	ffb058f484	parser: Fix detection of duplicate attributes We really need a second scan if more than one namespace clash was detected.	2024-10-28 20:26:55 +01:00
Nick Wellnhofer	f77ec16db0	html: Optimize htmlParseCharData	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	575be6c1f1	html: Fix line numbers with CRs	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	e179f3ec0e	html: Stop reporting syntax errors It doesn't make much sense to keep the old syntax error handling which doesn't conform to HTML5. Handling HTML5 parser errors is rather involved and not essential for parsers.	2024-10-06 20:04:00 +02:00
Nick Wellnhofer	c6af101728	html: Test tokenizer against html5lib test suite	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	9678163f54	html: Don't check for valid XML characters	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	4eeac30944	html: Start to fix EOF and U+0000 handling	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	17da54c522	html: Normalize newlines	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	3adb396d87	html: Parse bogus comments instead of ignoring them Also treat XML processing instructions as bogus comments.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	e1834745e0	html: Add character data tests	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	f9ed30e972	html: HTML5 character data states	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	5951179239	html: Parse named character references according to HTML5	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	a80f8b64a9	html: Allow attributes in end tags Attributes are syntactically allowed in HTML5 end tags but otherwise ignored.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	dcb2abb2fe	html: Parse tag and attribute names according to HTML5 HTML5 allows basically all characters in tag and attribute names.	2024-10-06 18:13:05 +02:00
Nick Wellnhofer	bd9eed4694	parser: Make unsupported encodings an error in declarations This was changed in 45157261, but in encoding declarations, unsupported encodings should raise a fatal error. Fixes #794.	2024-09-02 19:29:39 +02:00
Nick Wellnhofer	8ae06d5223	SAX2: Don't merge CDATA sections The Document Object Model (DOM) Level 3 Core Specification says: > Adjacent CDATASection nodes are not merged by use of the normalize > method of the Node interface. Fixes #412.	2024-08-29 01:31:19 +02:00
Nick Wellnhofer	322e733b84	xinclude: Fix fallback for text includes Fixes #772.	2024-07-18 19:32:23 +02:00
Nick Wellnhofer	842a044831	valid: Restore ID lookup Revert a change from d025cfbb and don't overwrite ID table entries, so that the first attribute will be returned if there are duplicate IDs. This requires two other changes: - Attributes in entity content are never added to the ID table. This seems reasonable. - Remove the optimization to skip ID lookup when copying and the target document has an empty ID table. This also seems more correct since the document could have ID declarations nevertheless or we could be copying xml:ids into the document for the first time. Fixes #757.	2024-07-03 11:46:06 +02:00
Nick Wellnhofer	30be984a0f	encoding: Rework ISO-8859-X conversion Optimize code. Pass tables as context parameter. Check for XML_ENC_ERR_SPACE.	2024-07-01 18:05:40 +02:00
Nick Wellnhofer	7c11da2d98	tests: Clarify licence of test/intsubset2.xml	2024-06-27 12:49:06 +02:00
Nick Wellnhofer	b8903b9e0d	runtest: Remove result handling from schemasOneTest We only care about errors.	2024-06-22 21:59:03 +02:00
Nick Wellnhofer	e68ccfa988	tests: Port Schematron tests to C	2024-06-22 21:59:03 +02:00
Nick Wellnhofer	1dd5e76a69	xinclude: Don't remove root element Don't replace include element at root with empty nodeset.	2024-06-18 20:12:03 +02:00
Nick Wellnhofer	52ce0d70f9	tests: Add XInclude test for issue #733	2024-06-17 17:35:12 +02:00
Nick Wellnhofer	2608baaf92	parser: Make failure to load main document a warning Revert the change that made failures to load the main document an error. This fixes the --path option of xmllint and xsltproc. Should fix #733.	2024-06-14 20:06:07 +02:00
Nick Wellnhofer	669bd34993	xpointer: Remove support for XPointer locations The latest spec for what is essentially an XPath extension seems to be this working draft from 2002: https://www.w3.org/TR/xptr-xpointer/ The xpointer() scheme is listed as "being reviewed" in the XPointer registry since at least 2006. libxml2 seems to be the only modern software that tries to implement this spec, but the code has many bugs and quality issues. If you configure --with-legacy, old symbols are retained for ABI compatibility.	2024-06-12 18:20:01 +02:00
Nick Wellnhofer	4fefba4cf6	parser: Rework handling of undeclared entities Throw an error if entity substitution was requested. Now we only downgrade to a warning if - XML_PARSE_DTDLOAD wasn't specified, and - entities aren't substituted or XML_PARSE_NO_XXE was specified. Should fix #724.	2024-05-15 17:58:48 +02:00
Nick Wellnhofer	fdc5ff3657	parser: Always throw entity errors if external DTD is loaded When parsing with XML_PARSE_DTDLOAD, missing entities are always an error. Also consolidate behavior when validating. See b717abdd.	2024-05-03 11:52:54 +02:00
Nick Wellnhofer	39e5b35bd0	parser: Don't create undeclared entity refs in substitution mode We never want to create entity reference nodes if entity substitution is enabled. This also applies to undeclared entities.	2024-05-03 11:46:01 +02:00
Nick Wellnhofer	45fe9924f0	parser: Don't create reference in xmlLookupGeneralEntity This should only be done in xmlParseReference. The handling of undeclared entities is still somewhat inconsistent. In element content we create references even if entity substitution is enabled. In attribute values undeclared entities are always ignored.	2024-04-23 18:36:15 +02:00
Nick Wellnhofer	b717abdd09	parser: Consolidate error handling for undeclared entities Always use XML_WAR_UNDECLARED_ENTITY with warning error level in documents with external subset or parameter entities. Use XML_ERR_UNDECLARED_ENTITY otherwise.	2024-04-23 18:36:15 +02:00
Nick Wellnhofer	f506ec6654	parser: Always decode entities in namespace URIs Also decode entities in namespace URIs if entity substitution wasn't requested. This should fix some corner cases when comparing namespace URIs. The Namespaces in XML 1.0 spec says: > In a namespace declaration, the URI reference is the normalized value > of the attribute, so replacement of XML character and entity > references has already been done before any comparison. Make the serialization code escape special characters in namespace URIs like in attribute values. This fixes serialization if entities were substituted when parsing. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/106	2024-04-15 12:34:26 +02:00
Seiya Nakata	5bb84b47b8	relaxng: Fix tree corruption in xmlRelaxNGParseNameClass Don't create cycles in tree structure. This will lead to an infinite loop or call stack overflow later. Closes: https://gitlab.gnome.org/GNOME/libxml2/-/issues/711	2024-04-05 13:45:06 +02:00
Nick Wellnhofer	f43197fca7	tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and xmlAddNextSibling would only try to merge text nodes with one of its new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml and possibly other downstream code depend on text nodes not being merged. To avoid breaking downstream code while still having somewhat consistent API behavior, it's probably best to make these functions never coalesce text nodes.	2024-03-29 14:21:11 +01:00
Nick Wellnhofer	4ccd3eb80f	tree: Refactor node insertion Also fixes a text coalescing bug.	2024-03-15 19:54:26 +01:00
Nick Wellnhofer	186562a182	parser: Fix detection of duplicate attributes in XML namespace Fixes a regression from commit e0dd330b, resulting in duplicate attributes in the predefined XML namespace not being detected or extraneous default attributes being passed. Fixes #704.	2024-03-12 20:02:52 +01:00
Nick Wellnhofer	63986c45b9	parser: Report fatal error if document entity couldn't be loaded Only lower error level when loading entities. Fixes #667.	2024-01-22 21:07:41 +01:00
Nick Wellnhofer	29beef653c	parser: Pop inputs if parsing DTD failed This should provide some statistics in ctxt->sizeentcopy even in the error or recovery case.	2024-01-10 15:58:23 +01:00
Nick Wellnhofer	f237e5b934	parser: Avoid duplicate namespace errors Don't report an extra attribute uniqueness error if a namespace is undeclared. This matches old behavior.	2024-01-05 20:39:40 +01:00
Nick Wellnhofer	07c05546fa	error: Make xmlFormatError public This is a useful function to get a verbose error report. Allows removing duplicated code from runtest.c. Also reactivate check for schema parser failures.	2024-01-04 15:41:43 +01:00
Nick Wellnhofer	d0eb5a7e54	parser: Remove xmlErrEncodingInt Convert the last user to xmlFatalErr.	2024-01-04 15:28:57 +01:00
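
The `flush` idea described in commit 69b83bb68e can be illustrated outside libxml2. The sketch below is only an analogy in Python, not libxml2 or ICU code: it assumes that the `final` argument of Python's standard incremental decoders plays a role similar to the `flush` flag from the commit message, and `decode_chunks` is a hypothetical helper name. While chunks are still arriving, an incomplete trailing sequence is buffered silently; only the final flush can tell that the input ended in the middle of a multi-byte sequence.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks and flush at the end.

    Returns (text, truncated). `truncated` is True when the input ends
    in the middle of a multi-byte sequence. Invalid bytes in the middle
    of the stream still raise UnicodeDecodeError to the caller.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for chunk in chunks:
        # final=False: an incomplete trailing sequence is buffered,
        # because more input may still follow.
        parts.append(decoder.decode(chunk, final=False))
    try:
        # final=True acts like a flush: bytes still buffered at this
        # point can never be completed, so the decoder reports them.
        parts.append(decoder.decode(b"", final=True))
    except UnicodeDecodeError:
        return "".join(parts), True
    return "".join(parts), False
```

For example, `decode_chunks([b"a\xc3", b"\xa9b"])` decodes a two-byte sequence split across chunks without complaint, while `decode_chunks([b"a\xc3"])` reports truncation only at the final flush.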

645 Commits