Nick Wellnhofer
5951179239
html: Parse named character references according to HTML5
2024-10-06 18:13:05 +02:00
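In the HTML5 tokenizer, a named character reference resolves to the longest name in the reference table that prefixes the input. A few legacy names are also accepted without a trailing semicolon. Below is a minimal sketch of that matching rule, not libxml2's code. It assumes a tiny hypothetical table excerpt, and `struct named_ref`, `refs` and `match_named_ref` are illustrative names only. The special case for attribute values and the full table are omitted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of the HTML5 named character reference table.
 * Names without a trailing ';' are legacy forms that HTML5 also accepts
 * without a semicolon. */
struct named_ref {
    const char *name;   /* text following '&' */
    const char *utf8;   /* replacement text, UTF-8 encoded */
};

static const struct named_ref refs[] = {
    { "amp",    "&" },
    { "amp;",   "&" },
    { "not",    "\xC2\xAC" },       /* U+00AC NOT SIGN */
    { "not;",   "\xC2\xAC" },
    { "notin;", "\xE2\x88\x89" },   /* U+2209 NOT AN ELEMENT OF */
};

/* Return the entry whose name is the longest prefix of 'str' (the text
 * right after '&') and store the matched length in *len. Returns NULL
 * and sets *len to 0 if no name matches. */
static const struct named_ref *
match_named_ref(const char *str, size_t *len) {
    const struct named_ref *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < sizeof(refs) / sizeof(refs[0]); i++) {
        size_t n = strlen(refs[i].name);

        if (n > best_len && strncmp(str, refs[i].name, n) == 0) {
            best = &refs[i];
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With this rule, `&notin;` matches the full name `notin;`. In `&notit;`, only the legacy name `not` matches, so the text becomes the not sign followed by `it;`. This follows the example given in the HTML5 specification.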
Nick Wellnhofer
d5cd0f07f8
html: Prefer SKIP(1) over NEXT in HTML parser
Use SKIP(1) where it's safe, to avoid a function call.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dc2d498318
html: Rework htmlLookupSequence
Rename to htmlLookupString and use strstr for increased performance.
2024-10-06 18:13:05 +02:00
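Below is a sketch of a strstr-based lookup. It assumes a NUL-terminated input buffer, because strstr requires one. The function name `lookup_string` and its `resume` out-parameter are hypothetical and do not reflect the real htmlLookupString signature. When the needle is not found, only the last `strlen(needle) - 1` bytes of the buffer can still begin a match that completes once more data arrives. A push parser can therefore restart the search there instead of rescanning from the beginning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. 'buf' holds 'len' bytes followed by a NUL byte,
 * 'start' <= len is where the search begins, and 'needle' must not be
 * empty. Returns the offset of the match relative to 'buf', or -1 if the
 * needle isn't present yet; in that case *resume receives the offset
 * from which the next search should start. */
static long
lookup_string(const char *buf, size_t len, size_t start,
              const char *needle, size_t *resume) {
    size_t keep = strlen(needle) - 1;
    const char *hit = strstr(buf + start, needle);

    if (hit != NULL)
        return (long) (hit - buf);

    /* A match could still straddle the end of the buffer, so only the
       last 'keep' bytes have to be examined again. */
    *resume = len > keep ? len - keep : 0;
    if (*resume < start)
        *resume = start;
    return -1;
}
```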
Nick Wellnhofer
637215a4de
html: Always terminate doctype declarations on '>'
Align with the HTML5 spec. This makes it possible to remove the old quote
handling in htmlLookupSequence.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
72e29f9a3d
html: Fix quadratic behavior in push parser
Fix quadratic behavior related to unquoted attribute values. We really
have to replicate parts of the HTML5 state machine to find the end of
tags reliably.
Fixes #533.
2024-10-06 18:13:05 +02:00
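The commit message doesn't show the code. What follows is a hedged sketch of the general technique, not libxml2's implementation. A hypothetical `struct tag_scanner` keeps a reduced set of HTML5 tokenizer states and the scan position between calls. Each byte is then examined only once, even when a long tag arrives in many small chunks. Restarting from the `<` on every chunk would instead be quadratic. Quotes only delimit a value directly after `=`. In an unquoted value such as `x"y`, they are ordinary characters. Character references, NUL bytes and end of input are not handled.

```c
#include <stddef.h>

/* Hypothetical subset of the HTML5 tokenizer states needed to find the
 * '>' that ends a start tag. */
enum tag_state {
    TAG_NAME,
    BEFORE_ATTR_NAME,   /* also used after a quoted value and after '/' */
    ATTR_NAME,
    AFTER_ATTR_NAME,
    BEFORE_ATTR_VALUE,
    DQUOTED_VALUE,
    SQUOTED_VALUE,
    UNQUOTED_VALUE
};

struct tag_scanner {
    enum tag_state state;   /* must start as TAG_NAME */
    size_t pos;             /* next byte to examine; scanning resumes here */
};

static int is_ws(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

/* 'buf' starts with '<' followed by the tag name. Returns the offset of
 * the '>' ending the tag, or -1 if more input is needed. */
static long
find_tag_end(struct tag_scanner *s, const char *buf, size_t len) {
    if (s->pos == 0)
        s->pos = 1;   /* skip the '<' */
    for (; s->pos < len; s->pos++) {
        char c = buf[s->pos];

        switch (s->state) {
        case TAG_NAME:
            if (c == '>') return (long) s->pos;
            if (is_ws(c) || c == '/') s->state = BEFORE_ATTR_NAME;
            break;
        case BEFORE_ATTR_NAME:
            if (c == '>') return (long) s->pos;
            if (!is_ws(c) && c != '/') s->state = ATTR_NAME; /* even '=' */
            break;
        case ATTR_NAME:
            if (c == '>') return (long) s->pos;
            if (c == '=') s->state = BEFORE_ATTR_VALUE;
            else if (c == '/') s->state = BEFORE_ATTR_NAME;
            else if (is_ws(c)) s->state = AFTER_ATTR_NAME;
            break;
        case AFTER_ATTR_NAME:
            if (c == '>') return (long) s->pos;
            if (c == '=') s->state = BEFORE_ATTR_VALUE;
            else if (c == '/') s->state = BEFORE_ATTR_NAME;
            else if (!is_ws(c)) s->state = ATTR_NAME;
            break;
        case BEFORE_ATTR_VALUE:
            if (c == '>') return (long) s->pos;
            if (c == '"') s->state = DQUOTED_VALUE;
            else if (c == '\'') s->state = SQUOTED_VALUE;
            else if (!is_ws(c)) s->state = UNQUOTED_VALUE;
            break;
        case DQUOTED_VALUE:
            if (c == '"') s->state = BEFORE_ATTR_NAME;
            break;
        case SQUOTED_VALUE:
            if (c == '\'') s->state = BEFORE_ATTR_NAME;
            break;
        case UNQUOTED_VALUE:
            if (c == '>') return (long) s->pos;
            if (is_ws(c)) s->state = BEFORE_ATTR_NAME;
            break;
        }
    }
    return -1;
}
```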
Nick Wellnhofer
a80f8b64a9
html: Allow attributes in end tags
Attributes are syntactically allowed in HTML5 end tags but are otherwise
ignored.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
f2272c231b
html: Handle unexpected-solidus-in-tag according to HTML5
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
939b53ee12
html: Stop skipping tag content
Tag and attribute names should now always be parsed successfully.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
dcb2abb2fe
html: Parse tag and attribute names according to HTML5
HTML5 allows basically all characters in tag and attribute names.
2024-10-06 18:13:05 +02:00
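In the HTML5 tag name state, only whitespace, `/` and `>` end the name. ASCII uppercase letters are lowercased, and every other byte is kept, including characters like `:` or `$`. Below is a C sketch of that rule using a hypothetical `parse_tag_name` helper, which is not libxml2 API. It omits the replacement of NUL bytes with U+FFFD.

```c
#include <stddef.h>

/* Hypothetical helper following the HTML5 tag name state. Consumes bytes
 * from 'in' until whitespace, '/' or '>', writes the lowercased name to
 * 'out' (at most cap - 1 bytes, NUL-terminated) and returns the number of
 * bytes consumed. */
static size_t
parse_tag_name(const char *in, size_t len, char *out, size_t cap) {
    size_t i, o = 0;

    for (i = 0; i < len; i++) {
        char c = in[i];

        if (c == ' ' || c == '\t' || c == '\n' || c == '\f' ||
            c == '/' || c == '>')
            break;
        if (c >= 'A' && c <= 'Z')
            c = (char) (c + ('a' - 'A'));   /* ASCII lowercase only */
        if (o + 1 < cap)
            out[o++] = c;
    }
    if (cap > 0)
        out[o] = '\0';
    return i;
}
```

Attribute names follow the same pattern, except that `=` also ends the name unless it is the first character.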
Nick Wellnhofer
5d36664fc9
memory: Deprecate xmlGcMemSetup
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
8af55c8d20
parser: Rename new input API functions
These haven't been made public yet.
2024-07-11 01:33:29 +02:00
Nick Wellnhofer
d74ca59491
parser: Rename internal xmlNewInput functions
2024-07-11 01:31:50 +02:00
Nick Wellnhofer
4f329dc524
parser: Implement xmlCtxtParseContent
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.
xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.
Fixes #727.
2024-07-11 01:26:32 +02:00
Nick Wellnhofer
2e63656ec6
parser: Check return value of inputPush
inputPush typically doesn't fail because we pre-allocate the input
table. The return value should be checked nevertheless.
2024-07-08 11:27:52 +02:00
Nick Wellnhofer
fdfeecfe5e
parser: Reenable ctxt->directory
Unused internally, but used in downstream code.
Should fix #753.
2024-07-02 22:06:53 +02:00
Nick Wellnhofer
30ef77554b
parser: Don't use deprecated xmlCopyChar
2024-07-02 13:34:11 +02:00
Nick Wellnhofer
dd8e378513
HTML: Rework UTF8ToHtml
Optimize code. Check for XML_ENC_ERR_SPACE. Use error macros.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
f505dcaea0
tree: Remove underscores from xmlRegisterCallbacks
2024-06-27 14:45:35 +02:00
Nick Wellnhofer
1112699cfa
legacy: Remove most legacy functions from public headers
Also remove warning messages.
2024-06-17 15:47:42 +02:00
Nick Wellnhofer
039ce1e821
parser: Pass global object to sax->setDocumentLocator
Revert part of commit c011e760.
Fixes #732.
2024-06-14 16:41:43 +02:00
Nick Wellnhofer
89fcae4dfd
parser: Don't report malloc failures when creating context
We don't want messages to stderr before an error handler could be set on
a parser context.
2024-06-12 16:36:12 +02:00
Nick Wellnhofer
e75e878e02
doc: Update and fix documentation
2024-05-20 14:23:39 +02:00
Nick Wellnhofer
a4c2b7233f
io: Don't set close callback in xmlParserInputBufferCreateFd
2024-05-05 17:27:12 +02:00
Nick Wellnhofer
05654cfe00
html: Deprecate htmlHandleOmittedElem
2024-04-28 18:58:27 +02:00
Nick Wellnhofer
aa04838eab
html: Use binary search in htmlEntityValueLookup
2024-03-26 14:21:11 +01:00
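If the entity table is sorted by code point, the C library's `bsearch` can look up an entity by value, replacing a linear scan with an O(log n) search. Below is a sketch that assumes a small hypothetical table. `struct entity_desc` and `entity_value_lookup` are illustrative stand-ins, not libxml2's own definitions.

```c
#include <stdlib.h>

/* Hypothetical entity descriptor; the table must be sorted by value. */
struct entity_desc {
    unsigned int value;   /* Unicode code point */
    const char *name;
};

static const struct entity_desc entities[] = {
    { 34, "quot" }, { 38, "amp" }, { 60, "lt" }, { 62, "gt" },
    { 160, "nbsp" }, { 169, "copy" }, { 8364, "euro" },
};

/* bsearch comparator: key is an unsigned int, elem a table entry. */
static int
compare_value(const void *key, const void *elem) {
    unsigned int v = *(const unsigned int *) key;
    unsigned int e = ((const struct entity_desc *) elem)->value;

    return (v > e) - (v < e);   /* avoids overflow from subtraction */
}

/* Returns the entry for 'value', or NULL if there is none. */
static const struct entity_desc *
entity_value_lookup(unsigned int value) {
    return bsearch(&value, entities,
                   sizeof(entities) / sizeof(entities[0]),
                   sizeof(entities[0]), compare_value);
}
```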
Nick Wellnhofer
3efbe916a1
parser: Mark 'token' member as unused in xmlParserCtxt
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
b82fd81d06
parser: Rework xmlCtxtParseDocument
Make xmlCtxtParseDocument take a parser input which can be popped after
parsing.
2024-01-05 20:39:40 +01:00
Nick Wellnhofer
7e0bbbc143
parser: New input API
Provide a new set of functions to create xmlParserInputs. These can be
used for the document entity or from external entity loaders.
- Don't require xmlParserInputBuffer.
- All functions take a base URI.
- All functions take an encoding as string.
- xmlNewInputURL also takes a public ID.
- xmlNewInputMemory takes a size_t.
- Optimization hints for memory buffers.
Improve documentation.
Only call xmlInitParser before allocating a new parser context.
Call xmlCtxtUseOptions as early as possible.
2023-12-29 01:22:13 +01:00
Nick Wellnhofer
6a9a88a17f
parser: Move progressive flag into input struct
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
d944a41515
parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.
Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
477a7ed82c
html: Abort earlier on fatal errors
2023-12-28 19:43:48 +01:00
Nick Wellnhofer
c1bddd4c26
parser: Mark 'length' member of xmlParserInput as unused
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
955c177f69
parser: Stop using 'directory' struct member
This was only used as a pointless fallback for URI resolution.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
8cd563174a
html: Don't close fd in htmlCtxtReadFd
Long-standing bug. The XML fix from 2003 was never ported to the HTML
parser. htmlReadFd was fixed with fe6890e2.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
130436917c
parser: Rename xmlErrParser to xmlCtxtErr
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
8d0aaf4b95
parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
54c70ed57f
parser: Improve error handling
Introduce xmlCtxtSetErrorHandler, which allows setting a structured error
handler for a parser context. There already was the "serror" SAX handler,
but it always receives the parser context as its argument.
Start to use xmlRaiseMemoryError.
Remove useless arguments from memory error functions. Rename
xmlErrMemory to xmlCtxtErrMemory.
Remove a few calls to xmlGenericError.
Remove support for runtime entity debugging.
2023-12-21 02:46:27 +01:00
Nick Wellnhofer
c2bbeed1fd
io: Fix memory lifetime issue with input buffers
xmlParserInputBufferCreateMem must make a copy of the buffer.
This fixes a regression from 2.11 which could cause reads from freed
memory depending on the use case.
Undeprecate xmlParserInputBufferCreateStatic, which can avoid copying
the whole buffer.
2023-12-12 23:51:32 +01:00
Nick Wellnhofer
abd74186f9
html: Report malloc failures
Fix many places where malloc failures aren't reported.
Stop checking for ctxt->instate.
2023-12-11 22:13:06 +01:00
Nick Wellnhofer
c011e7605d
globals: Remove unused globals from thread storage
Setting these deprecated globals hasn't had an effect for a long time.
Make them constants. This reduces the size of per-thread storage from
~700 to ~250 bytes.
2023-12-06 20:07:54 +01:00
Nick Wellnhofer
c7629c9eb1
parser: Clarify documentation regarding xmlReadMemory buffer size
Fixes #638.
2023-11-30 16:52:34 +01:00
Nick Wellnhofer
e395946194
html: Reenable buggy detection of XML declarations
Switch to UTF-8 if a document starts with '<?xm' to match old behavior.
Also enable this check in the push parser.
Fixes #637.
2023-11-30 16:22:59 +01:00
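The check described in this commit compares only a fixed prefix. Below is a C sketch built around a hypothetical helper, not libxml2 API. It also covers the push-parser case, where fewer than four bytes may have arrived so far. The helper returns 1 for a match, 0 for a mismatch and -1 when more input is needed. Because only the prefix is compared, a document beginning with `<?xml-stylesheet` also matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does the document start with "<?xm"?
 * Returns 1 on a match and 0 on a mismatch. Returns -1 if fewer than
 * four bytes are available and they are consistent with the prefix,
 * in which case a push parser should wait for more data. */
static int
check_xml_decl_prefix(const char *buf, size_t len) {
    static const char prefix[] = "<?xm";
    size_t n = len < 4 ? len : 4;

    if (memcmp(buf, prefix, n) != 0)
        return 0;
    return len >= 4 ? 1 : -1;
}
```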
Nick Wellnhofer
ff6c318862
include: Remove useless 'const' from function arguments
2023-11-23 15:27:00 +01:00
Nick Wellnhofer
b9db3d7d02
parser: Simplify xmlStringCurrentChar
Start to move away from using this function.
2023-09-22 19:01:11 +02:00
Nick Wellnhofer
8c084ebdc7
doc: Make apibuild.py happy
2023-09-21 22:57:33 +02:00
Nick Wellnhofer
c5890716a6
html: Fix logic in htmlAutoClose
Note that the function is never called with a NULL newtag.
Fixes #591.
2023-09-21 17:01:35 +02:00
Nick Wellnhofer
9b5cce7a71
include: Remove more unnecessary includes
2023-09-21 01:50:53 +02:00
Nick Wellnhofer
11a1839ddd
globals: Move remaining globals back to correct header files
This undoes a lot of damage.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
4e1c13ebfd
debug: Remove debugging code
This is barely useful these days and only clutters the code base.
2023-09-19 17:35:09 +02:00
Nick Wellnhofer
e48f2695fe
parser: Remove push parser debugging code
2023-08-29 18:17:09 +02:00