libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2 synced 2025-03-28 21:33:13 +00:00

Author	SHA1	Message	Date
Nick Wellnhofer	10d0947249	Fix .gitattributes The files in 'test' and 'result' have mixed line endings, so disable end-of-line conversion.	2020-07-23 20:46:42 +02:00
Nick Wellnhofer	173a0830dc	Fix quadratic runtime when push parsing HTML start tags Make sure that htmlParseStartTag doesn't terminate on characters for which IS_CHAR_CH is false like control chars. In htmlParseTryOrFinish, only switch to START_TAG if the next character starts a valid name. Otherwise, htmlParseStartTag might return without consuming all characters up to the final '>'. Found by OSS-Fuzz.	2020-07-22 23:33:04 +02:00
David Kilzer	0e5c4fec15	Reset XML parser input before reporting errors Apply changes to htmlParseChunk() in 13ba5b61 and 3f18e748 to xmlParseChunk().	2020-07-19 14:10:33 +02:00
Nick Wellnhofer	6995eed077	Fix quadratic runtime when push parsing HTML entity refs The HTML push parser would look ahead for characters in "; >/" to terminate an entity reference but actual parsing could stop earlier, potentially resulting in quadratic runtime. Parse char data and references alternately in htmlParseTryOrFinish and only look ahead once for a terminating '<' character. Found by OSS-Fuzz.	2020-07-19 14:05:57 +02:00
Nick Wellnhofer	8e219b154e	Fix HTML push parser lookahead The parsing rules when looking for terminating chars or sequences in the push parser differed from the actual parsing code. This could cause the lookahead to overshoot and data to be rescanned, potentially leading to quadratic runtime. Comments must never be handled during lookahead. Attribute values must only be skipped for start tags and doctype declarations, not for end tags, comments, PIs and script content.	2020-07-15 16:44:36 +02:00
Nick Wellnhofer	e050062ca9	Make htmlCurrentChar always translate U+0000 The general assumption is that htmlCurrentChar only returns 0 if the end of the input buffer is reached. The UTF-8 path already logged an error if a zero byte U+0000 was found and returned a space character instead. Make the ASCII code path do the same. htmlParseTryOrFinish skips zero bytes at the beginning of a buffer, so even if 0 was returned from htmlCurrentChar, the push parser would make progress. But rescanning the input could cause performance problems. The pull parser would abort parsing and now handles zero bytes in ASCII mode the same way as the push parser or as in UTF-8 mode. It would be better to return the replacement character U+FFFD instead, but some of the client code assumes that the UTF-8 length of input and output matches.	2020-07-15 16:10:13 +02:00
Nick Wellnhofer	dfd4e33048	Rework control flow in htmlCurrentChar Don't call xmlCurrentChar after switching encodings. Rearrange code blocks and fall through to normal UTF-8 handling.	2020-07-15 16:10:13 +02:00
Nick Wellnhofer	922bebccdd	Make 'xmllint --html --push -' read from stdin	2020-07-15 14:20:42 +02:00
Nick Wellnhofer	1493130ef2	Fix UTF-8 decoder in HTML parser Reject sequences starting with a continuation byte as well as overlong sequences like the XML parser. Also fixes an infinite loop in connection with previous commit 50078922 since htmlCurrentChar would return 0 even if not at the end of the buffer. Found by OSS-Fuzz.	2020-07-15 12:54:25 +02:00
Nick Wellnhofer	beb7d71a8f	Remove misleading comments in xpath.c Fixes #169	2020-07-13 12:41:19 +02:00
Nick Wellnhofer	500789224b	Fix quadratic runtime when parsing HTML script content If htmlParseScript returns upon hitting an invalid character, htmlParseLookupSequence will be called again with checkIndex reset to zero, potentially resulting in quadratic runtime. Make sure that htmlParseScript consumes all input in one go and simply skips over invalid characters similar to htmlParseCharDataInternal. Found by OSS-Fuzz.	2020-07-13 12:19:24 +02:00
Andre Klapper	d6761e706f	Update to Devhelp index file format version 2 Fixes #89	2020-07-13 12:18:24 +02:00
Markus Rickert	d514e2bd40	Set project language to C	2020-07-12 18:42:49 +02:00
Markus Rickert	5ddf02f2a5	Update config.h.cmake.in	2020-07-12 18:42:18 +02:00
Markus Rickert	8bec210d4d	Add variable for working directory of XML Conformance Test Suite	2020-07-12 18:42:18 +02:00
Markus Rickert	270e165552	Add additional tests and XML Conformance Test Suite	2020-07-12 18:33:35 +02:00
Markus Rickert	e6ba4bd775	Add command line option for temp directory in runtest	2020-07-12 18:33:35 +02:00
Markus Rickert	40e7ceaaaf	Ensure LF line endings for test files	2020-07-12 18:33:35 +02:00
Markus Rickert	9ecf5ad6b1	Enable runtests and testThreads	2020-07-12 18:33:35 +02:00
Nick Wellnhofer	3f18e7486d	Reset HTML parser input before reporting error Avoid use-after-free, similar to 13ba5b61. Also make sure that xmlBufSetInputBaseCur sets valid pointers in case of buffer errors. Found by OSS-Fuzz.	2020-07-11 14:39:52 +02:00
Nick Wellnhofer	3da8d947df	Fix more quadratic runtime issues in HTML push parser Make sure that checkIndex is set when returning without match from inside a comment. Also track parser state in htmlParseLookupChars. Found by OSS-Fuzz.	2020-07-09 16:08:38 +02:00
Nick Wellnhofer	741b0d0a8b	Fix regression introduced with 477c7f6a The 'inSubset' member is actually used by the SAX2 handlers. Store extra parser state in 'hasPErefs'.	2020-07-07 12:57:01 +02:00
Nick Wellnhofer	fc842f6eba	Limit regexp nesting depth Enforce a maximum nesting depth of 50 for regular expressions. Avoids stack overflows with deeply nested regexes. Found by OSS-Fuzz.	2020-07-06 15:22:12 +02:00
Nick Wellnhofer	1e41e4fa8e	Fix return values and documentation in encoding.c Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and never a positive value. Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and xmlCharEncOutFunc return the number of bytes written.	2020-07-06 15:06:13 +02:00
David Kilzer	6b4717d61d	Add regexp regression tests - Bug 757711: heap-buffer-overflow in xmlFAParsePosCharGroup <https://bugzilla.gnome.org/show_bug.cgi?id=757711> - Bug 783015 - Integer-overflow in xmlFAParseQuantExact <https://bugzilla.gnome.org/show_bug.cgi?id=783015> (Regexptests): Add support for checking stderr output when running regexp tests. This makes it possible to check in test cases that fail and not see false-positive error output when running the tests. Unlike other libxml2 test suites, if there is no stderr output, no *.err file needs to be created.	2020-07-06 12:37:53 +02:00
Nick Wellnhofer	477c7f6aff	Fix quadratic runtime in HTML parser Commit eeb99329 removed an important optimization avoiding quadratic runtime when repeatedly scanning the input buffer for terminating characters in the HTML push parser. The related bug is https://bugzilla.gnome.org/show_bug.cgi?id=444994 Make sure that ctxt->checkIndex is always written and store additional parser state in ctxt->inSubset which is unused in the HTML parser. Found by OSS-Fuzz.	2020-07-06 12:17:20 +02:00
Nick Wellnhofer	f8329fdc23	Report error for invalid regexp quantifiers	2020-07-02 11:54:28 +02:00
Nick Wellnhofer	13ba5b619a	Reset HTML parser input before reporting encoding error If charset conversion fails, reset the input pointers before reporting the error and bailing out. Otherwise, the input pointers are left in an invalid state which could lead to use-after-free and other memory errors. Similar to f9e7997e. Found by OSS-Fuzz.	2020-06-28 13:21:50 +02:00
Nick Wellnhofer	1e7851b5ae	Fix integer overflow in xmlFAParseQuantExact Found by OSS-Fuzz.	2020-06-25 12:18:21 +02:00
Nick Wellnhofer	84bab955fe	Fix return value of xmlC14NDocDumpMemory Make sure to return -1 in case of buffer errors. Fixes #174.	2020-06-24 20:07:32 +02:00
Martin Vidner	43a8836cde	Fix rebuilding docs, by hiding __attribute__((...)) behind a macro. When enabled via `./configure --enable-rebuild-docs`, `make -C doc libxml2-api.xml` will invoke apibuild.py to rebuild libxml2-api.xml from the sources. But the code added in 9fa3200cb366c726f7c8ef234282603bb9e8816d made it error out with ``` Parsing ../parser.c Parse Error: parsing type : expecting a name ('Got token ', ('sep', '(')) ('Last token: ', ('sep', '(')) ('Token queue: ', [('name', 'destructor'), ('sep', ')'), ('sep', ')')]) ('Line 14689 end: ', '') ```	2020-06-24 19:55:52 +02:00
Nick Wellnhofer	9f42f6baaa	Don't follow next pointer on documents in xmlXPathRunStreamEval RVTs from libxslt are document nodes which are linked using the 'next' pointer. These pointers must never be used to navigate the document tree. Otherwise, random content from other RVTs could be returned when evaluating XPath expressions. It's interesting that this seemingly long-standing bug wasn't discovered earlier. This issue could also cause severe performance degradation. Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/37	2020-06-24 15:33:38 +02:00
Nick Wellnhofer	c0440868c3	Copy xs:duration parser from libexslt The duration parser in libexslt checks for integer overflows.	2020-06-23 16:20:28 +02:00
Nick Wellnhofer	18425d3ad5	Fix integer overflow in _xmlSchemaParseGYear Found with libFuzzer and UBSan.	2020-06-23 16:20:28 +02:00
Nick Wellnhofer	070d635e77	Fix integer overflow when parsing {min,max}Occurs Clamp value to INT_MAX. Found with libFuzzer and UBSan.	2020-06-23 16:20:28 +02:00
Nick Wellnhofer	50f18830e1	Fix another memory leak in xmlSchemaValAtomicType Don't collapse language IDs twice. Found with libFuzzer and ASan.	2020-06-23 16:20:28 +02:00
Nick Wellnhofer	eac1c7e2e5	Fuzz target for XML Schemas This only tests the schema parser for now.	2020-06-23 16:20:27 +02:00
Nick Wellnhofer	ffd31dbefd	Move entity recorder to fuzz.c	2020-06-21 12:15:46 +02:00
Nick Wellnhofer	681f094e5b	Fix unsigned integer overflow in htmlParseTryOrFinish Cast to signed type before subtraction to avoid unsigned integer overflow. Also use ptrdiff_t to avoid potential integer truncation. Found with libFuzzer and UBSan.	2020-06-15 21:25:22 +02:00
Nick Wellnhofer	31ca4a728c	Fix integer overflow in htmlParseCharRef Fixes #115.	2020-06-15 21:23:54 +02:00
Nick Wellnhofer	2f9382033e	Fix undefined behavior in UTF16LEToUTF8 Don't perform arithmetic on null pointer. Found with libFuzzer and UBSan.	2020-06-15 21:23:54 +02:00
Nick Wellnhofer	536f421d37	Fuzz target for HTML parser	2020-06-15 15:23:38 +02:00
Nick Wellnhofer	a697ed1e24	Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.	2020-06-15 15:23:38 +02:00
Nick Wellnhofer	af893a58c6	Update GitLab CI container	2020-06-11 16:08:16 +02:00
Nick Wellnhofer	a28f7d8789	Never expand parameter entities in text declaration When parsing the text declaration of external DTDs or entities, make sure that parameter entities are not expanded. This also fixes a memory leak in certain error cases. The change to xmlSkipBlankChars assumes that the parser state is maintained correctly when parsing external DTDs or parameter entities, and might expose bugs in the code that were hidden previously. Found by OSS-Fuzz.	2020-06-10 14:25:19 +02:00
Nick Wellnhofer	487871b0e3	Fix undefined behavior in xmlXPathTryStreamCompile &NULL[0] is undefined behavior.	2020-06-10 13:23:43 +02:00
Nick Wellnhofer	e98150d444	Add options file for xml fuzzer This will be picked up by OSS-Fuzz, limiting the maximum input size to 80 KB and hopefully avoiding timeouts. Some of the timeouts seem to be related to our suboptimal handling of excessive entity expansion. The new fuzzers support external entities and make this problem even more prominent.	2020-06-09 13:53:06 +02:00
Nick Wellnhofer	2af3c2a8b9	Fix use-after-free with validating reader Just like IDs, IDREF attributes must be removed from the document's refs table when they're freed by a reader. This bug is often hidden because xmlAttr structs are reused and strings are stored in a dictionary unless XML_PARSE_NODICT is specified. Found by OSS-Fuzz.	2020-06-08 14:05:42 +02:00
Nick Wellnhofer	00ed736eec	Add a couple of libFuzzer targets - XML fuzzer Currently tests the pull parser, push parser and reader, as well as serialization. Supports splitting fuzz data into multiple documents for things like external DTDs or entities. The seed corpus is built from parts of the test suite. - Regexp fuzzer Seed corpus was statically generated from test suite. - URI fuzzer Tests parsing and most other functions from uri.c.	2020-06-05 13:53:11 +02:00
Nick Wellnhofer	2e8cc66d8f	xmlParseBalancedChunkMemory must not be called with NULL doc There is no way to avoid memory leaks without a document to hold the namespace list.	2020-05-30 15:43:34 +02:00

4923 Commits