15 Commits

Author SHA1 Message Date
Nick Wellnhofer
5951179239 html: Parse named character references according to HTML5 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
a80f8b64a9 html: Allow attributes in end tags
Attributes are syntactically allowed in HTML5 end tags but otherwise
ignored.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dcb2abb2fe html: Parse tag and attribute names according to HTML5
HTML5 allows basically all characters in tag and attribute names.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
93ce33c2b8 Fix several quadratic runtime issues in HTML push parser
Fix a few remaining cases where the HTML push parser would scan more
content during lookahead than would be parsed later.

Make sure that htmlParseDocTypeDecl consumes all content up to the
final '>' in case of errors. The old comment said "We shouldn't try to
resynchronize", but ignoring invalid content is also what the HTML5
spec mandates.

Likewise, make htmlParseEndTag skip to the final '>' in invalid end
tags even if not in recovery mode. This is probably the most visible
change in practice and leads to different output for some tests but is
also more in line with HTML5.

Make sure that htmlParsePI and htmlParseComment don't abort if invalid
characters are encountered but log an error and ignore the character.

Change some other end-of-buffer checks to test for a zero byte instead
of relying on IS_CHAR.

Fix usage of IS_CHAR macro in htmlParseScript.
2020-07-23 20:47:35 +02:00
Daniel Veillard
05bcb7ed30 fixed to not send NULL to %s printing cleaning up some of the regression
* HTMLparser.c: fixed to not send NULL to %s printing
* python/tests/error.py result/HTML/doc3.htm.err
  result/HTML/test3.html.err result/HTML/wired.html.err
  result/valid/t8.xml.err result/valid/t8a.xml.err: cleaning
  up some of the regression tests error
Daniel
2003-10-19 14:26:34 +00:00
Daniel Veillard
f403d298c3 more code cleanup, especially around error messages, the HTML parser has
* HTMLparser.c Makefile.am legacy.c parser.c parserInternals.c
  include/libxml/xmlerror.h: more code cleanup, especially around
  error messages, the HTML parser has now been upgraded to the new
  handling.
* result/HTML/*: a few changes in the resulting error messages
Daniel
2003-10-05 13:51:35 +00:00
William M. Brack
3b811174f7 Updated testfiles for error.c fix 2003-05-14 02:53:43 +00:00
Daniel Veillard
77a90a7f8e patch from johan@evenhuis.nl for #107937 fixing some line counting
* HTMLparser.c parser.c parserInternals.c: patch from
  johan@evenhuis.nl for #107937 fixing some line counting
  problems, and some other cleanups.
* result/HTML/: this results in some line number changes
Daniel
2003-03-22 00:04:05 +00:00
Daniel Veillard
0a2a163d2e - HTMLparser.c: Patch from Jonas Borgström
(htmlGetEndPriority): New function, returns
the priority of a certain element.
(htmlAutoCloseOnClose): Only close inline elements if they
all have lower or equal priority.
- result/HTML: this of course changed a number of test results.
Daniel
2001-05-11 14:18:03 +00:00
Daniel Veillard
a2bc368bc9 - HTMLparser.c: trying to fix the problem reported by Jonas Borgström
- results/HTML/ : a few changes in the output of the HTML tests as
  a result.
- configure.in: trying to fix -liconv where needed
Daniel
2001-05-03 08:27:20 +00:00
Daniel Veillard
56098d4f35 - HTMLparser.c : HTML parsing still sucks ... trying to deal
with madness
- result/HTML/ : this modified the result of the regression tests
  a lot.
Daniel
2001-04-24 12:51:09 +00:00
Daniel Veillard
a3bfca59bf parsing real HTML is a nightmare.
- HTMLparser.c result/HTML/*: revamped the way the HTML
  parser handles end of tags or end of input
Daniel
2001-04-12 15:42:58 +00:00
Daniel Veillard
87b9539573 Large sync between my W3C base and Gnome's one:
- parser.[ch]: added xmlGetFeaturesList() xmlGetFeature() and xmlAddFeature()
- tree.[ch]: added xmlAddChildList()
- xmllint.c: MAP_FAILED macro test
- parser.h: added xmlParseCtxtExternalEntity()
- valid.c: applied bug fixes, removed warning
- tree.c: added CDATA block to elements content
- testSAX.c: cleanup of output
- testHTML.c: added SAX testing
- encoding.c: better error recovery
- SAX.c, parser.c: fixed one of the external entity processing of the OASis testsuite
- Makefile.am: added HTML SAX regression tests
- configure.in: bumped to 2.2.2
- test/HTML/ result/HTML: added a few of HTML tests, and added the SAX results

Daniel
2000-08-12 21:12:04 +00:00
Daniel Veillard
32bc74ef98 - doc/encoding.html doc/xml.html: added I18N doc
- encoding.[ch] HTMLtree.[ch] parser.c HTMLparser.c: I18N encoding
  improvements, both parser and filters, added ASCII & HTML,
  fixed the ISO-Latin-1 one
- xmllint.c testHTML.c: added/made visible --encode
- debugXML.c : cleanup
- most .c files: applied patches due to warning on Windows and
  when using Sun Pro cc compiler
- xpath.c : cleanup memleaks
- nanoftp.c : added a TESTING preprocessor flag for standalone
  compile so that people can report bugs more easily
- nanohttp.c : ditched socklen_t which was a portability mess
  and replaced it with unsigned int.
- tree.[ch]: added xmlHasProp()
- TODO: updated
- test/ : added more tests for entities, NS, encoding, HTML, wap
- configure.in: preparing for 2.2.0 release
Daniel
2000-07-14 14:49:25 +00:00
Daniel Veillard
eacbb8d807 Added one of the testsuite results, Daniel. 2000-07-01 09:13:46 +00:00