599 Commits

Author SHA1 Message Date
Nick Wellnhofer
0c56eb8215 tree: Restore return value of xmlNodeListGetString with NULL list
When passing a NULL list to xmlNodeListGetString or
xmlNodeListGetRawString, return NULL instead of "" to match the old
behavior.

Fixes #783.
2024-08-12 21:38:50 +02:00
Nick Wellnhofer
5d36664fc9 memory: Deprecate xmlGcMemSetup 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
888f70c77e buf: Move xmlBuffer code to buf.c 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, which allows moving the content pointer past the initial
allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating NUL byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
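The allocation scheme described in the xmlBuf rework above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not libxml2's actual code: the type `SketchBuf` and every `sketchBuf*` function are hypothetical names invented for illustration. It shows a content pointer that may sit past the start of the allocation, O(1) shrinking from the front, a grow step that reuses shrunken memory before allocating, and a "size" that excludes the terminating NUL byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model, not the real xmlBuf structure. */
typedef struct {
    char *mem;      /* start of the allocation */
    char *content;  /* start of live data, always >= mem */
    size_t use;     /* live bytes, excluding the NUL terminator */
    size_t size;    /* capacity counted from mem, excluding the NUL terminator */
} SketchBuf;

/* Allocate with an explicit initial size. Returns 0 on success, -1 on failure. */
int sketchBufInit(SketchBuf *buf, size_t size) {
    if (size == SIZE_MAX)
        return -1;
    buf->mem = malloc(size + 1);
    if (buf->mem == NULL)
        return -1;
    buf->content = buf->mem;
    buf->use = 0;
    buf->size = size;
    buf->mem[0] = '\0';
    return 0;
}

/* Drop up to len bytes from the front by advancing the content pointer. */
size_t sketchBufShrink(SketchBuf *buf, size_t len) {
    if (len > buf->use)
        len = buf->use;
    buf->content += len;
    buf->use -= len;
    return len;
}

/* Make room for len more bytes, reusing shrunken memory when possible. */
int sketchBufGrow(SketchBuf *buf, size_t len) {
    size_t start = (size_t) (buf->content - buf->mem);

    /* Enough room already available behind the live data. */
    if (len <= buf->size - start - buf->use)
        return 0;

    /* Reclaim the gap in front of the live data instead of allocating. */
    if (len <= buf->size - buf->use) {
        memmove(buf->mem, buf->content, buf->use + 1);
        buf->content = buf->mem;
        return 0;
    }

    /* Report overflow as an error rather than wrapping around. */
    if (len >= SIZE_MAX - buf->use)
        return -1;

    size_t newSize = buf->use + len;
    char *newMem = malloc(newSize + 1);
    if (newMem == NULL)
        return -1;
    memcpy(newMem, buf->content, buf->use + 1);
    free(buf->mem);
    buf->mem = newMem;
    buf->content = newMem;
    buf->size = newSize;
    return 0;
}

/* Append len bytes and keep the data NUL-terminated. */
int sketchBufAdd(SketchBuf *buf, const char *data, size_t len) {
    if (sketchBufGrow(buf, len) != 0)
        return -1;
    memcpy(buf->content + buf->use, data, len);
    buf->use += len;
    buf->content[buf->use] = '\0';
    return 0;
}

/* Fill the buffer, shrink it, then append again without reallocating.
 * Returns 1 if the original allocation was reused, 0 otherwise. */
int sketchBufDemo(void) {
    SketchBuf buf;
    if (sketchBufInit(&buf, 8) != 0)
        return -1;
    if (sketchBufAdd(&buf, "abcdefgh", 8) != 0)
        return -1;
    sketchBufShrink(&buf, 5);
    char *before = buf.mem;
    if (sketchBufAdd(&buf, "ijk", 3) != 0)
        return -1;
    int reused = (buf.mem == before) && (strcmp(buf.content, "fghijk") == 0);
    free(buf.mem);
    return reused;
}
```

In this model, shrinking costs nothing beyond pointer arithmetic, and the memmove in the grow step is only paid when the tail runs out of room.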
Nick Wellnhofer
8d1606265d entities: Rework text escaping 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
de3221b179 fuzz: Adjust for xmlNodeParseContent changes
xmlStringGetNodeList returns NULL again for empty strings.
2024-07-06 15:33:06 +02:00
Nick Wellnhofer
944cc23c84 tree: Fix handling of empty strings in xmlNodeParseContent
We shouldn't create an empty text node to match the old behavior.

Fixes #759.
2024-07-03 16:07:10 +02:00
Nick Wellnhofer
842a044831 valid: Restore ID lookup
Revert a change from d025cfbb and don't overwrite ID table entries, so
that the first attribute will be returned if there are duplicate IDs.

This requires two other changes:

- Attributes in entity content are never added to the ID table. This
  seems reasonable.

- Remove the optimization to skip ID lookup when copying and the target
  document has an empty ID table. This also seems more correct since the
  document could have ID declarations nevertheless or we could be
  copying xml:ids into the document for the first time.

Fixes #757.
2024-07-03 11:46:06 +02:00
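The "first attribute wins" rule from the ID lookup commit above can be shown with a toy table. This is a sketch only: `SketchIdTable`, `sketchIdAdd` and `sketchIdLookup` are hypothetical names, and a fixed-size linear array stands in for whatever table libxml2 actually uses. The point is that adding a duplicate ID leaves the existing entry untouched, so lookup returns the first attribute registered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_IDS 16

/* Hypothetical model of an ID table; not libxml2's data structure. */
typedef struct {
    const char *id;
    const void *attr;
} SketchIdEntry;

typedef struct {
    SketchIdEntry entries[SKETCH_MAX_IDS];
    size_t count;
} SketchIdTable;

/* Return the attribute registered for id, or NULL if there is none. */
const void *sketchIdLookup(const SketchIdTable *table, const char *id) {
    for (size_t i = 0; i < table->count; i++) {
        if (strcmp(table->entries[i].id, id) == 0)
            return table->entries[i].attr;
    }
    return NULL;
}

/* Register attr under id. Returns 1 if added, 0 if the ID already existed
 * (the existing entry is kept), -1 if the table is full. */
int sketchIdAdd(SketchIdTable *table, const char *id, const void *attr) {
    if (sketchIdLookup(table, id) != NULL)
        return 0;  /* duplicate ID: do not overwrite the first entry */
    if (table->count == SKETCH_MAX_IDS)
        return -1;
    table->entries[table->count].id = id;
    table->entries[table->count].attr = attr;
    table->count++;
    return 1;
}

/* Register two attributes under the same ID and check which one is found.
 * Returns 1 if lookup yields the first attribute. */
int sketchIdDemo(void) {
    SketchIdTable table = { .count = 0 };
    int first = 1, second = 2;  /* stand-ins for attribute nodes */

    int r1 = sketchIdAdd(&table, "x", &first);
    int r2 = sketchIdAdd(&table, "x", &second);
    return r1 == 1 && r2 == 0 && sketchIdLookup(&table, "x") == &first;
}
```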
Nick Wellnhofer
f505dcaea0 tree: Remove underscores from xmlRegisterCallbacks 2024-06-27 14:45:35 +02:00
Nick Wellnhofer
b0fc67aa22 build: Remove --with-tree configuration option
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.

See #717.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
2f12809612 tree: Fix freeing entities via xmlFreeNode
Call xmlFreeEntity to free all entity members.

Fixes #731.
2024-06-14 16:44:09 +02:00
Nick Wellnhofer
5198de4b1d fuzz: Make allocation in xmlBuildQName more likely
Limit size of static buffer in fuzzing mode.
2024-05-31 13:42:08 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
b8597f46df tree: Handle predefined entities in xmlBufGetEntityRefContent
It's possible to create references to predefined entities using the tree
API. This edge case was exposed by making predefined entities const in
commit 63ce5f9a.
2024-04-30 16:05:42 +02:00
Nick Wellnhofer
619e2808b5 tree: Don't call xmlNewCharRef in xmlNodeParseContent
xmlNewCharRef also tries to handle strings like '&name;' but in
xmlNodeParseContentInternal, we really want to use the possibly invalid
name without modification. Otherwise, content like '&"' could
create a reference to a predefined entity.
2024-04-30 15:53:08 +02:00
Nick Wellnhofer
5e80f4381b tree: Deprecate xmlRegisterNodeDefault
This rarely used feature should be phased out.
2024-04-28 19:30:40 +02:00
Nick Wellnhofer
88169bfda6 tree: Deprecate xmlSetCompressMode 2024-04-28 19:30:39 +02:00
Niels Dossche
6053f1ff54 Remove redundant size check
The condition size > UINT_MAX - 10 is already checked earlier, so the
check is always false.
2024-04-19 15:33:40 +02:00
Nick Wellnhofer
fbea03f3d0 tree: Remove another redundant check in xmlDOMWrapCloneNode
The node type was already checked earlier.
2024-04-19 15:22:30 +02:00
Niels Dossche
1a865567d4 Remove redundant NULL check on cur
This variable is already NULL checked in the previous if condition.
2024-04-19 15:14:15 +02:00
Niels Dossche
6fadd7980a Remove always-false check old == cur
This case is already checked at the start of the function.
There it returns NULL, which seems more correct.
2024-04-19 15:14:15 +02:00
Niels Dossche
2766520062 Remove redundant NULL check on cur
cur = node, and node cannot be NULL as it is checked at the start of the
function.
2024-04-19 15:12:56 +02:00
Nick Wellnhofer
a0341ac8e9 tree: Don't return empty localname in xmlSplitQName{2,3}
Match the behavior of xmlSplitQName and xmlSplitQName4.
2024-04-18 12:11:13 +02:00
Nick Wellnhofer
5c55332591 Revert "tree: Only allow elements in xmlDocSetRootElement"
This reverts commit 4b698dbaec9bc6775fc8341ef8a3f0d8321f8548.

lxml assumes that xmlDocSetRootElement works with non-elements.
2024-03-29 15:29:53 +01:00
Nick Wellnhofer
7c5daa3763 tree: Ignore namespace with NULL href in xmlSearchNs
Some users set href to NULL to unset a namespace without deleting it.

Also change the duplicate check in xmlNewNs which must agree with
xmlSearchNs.

Short-lived regression from f960c60d.
2024-03-29 15:28:47 +01:00
Nick Wellnhofer
f43197fca7 tree: Don't coalesce text nodes in xmlAdd{Prev,Next}Sibling
Commit 9e1c72da from 2001 introduced a bug where xmlAddPrevSibling and
xmlAddNextSibling would only try to merge text nodes with one of their
new siblings. Commit 4ccd3eb8 fixed this bug but unfortunately, lxml
and possibly other downstream code depend on text nodes not being
merged.

To avoid breaking downstream code while still having somewhat
consistent API behavior, it's probably best to make these functions
never coalesce text nodes.
2024-03-29 14:21:11 +01:00
Nick Wellnhofer
2a713a8091 tree: Document behavior if xmlSetTreeDoc fails 2024-03-29 12:57:20 +01:00
Nick Wellnhofer
f1e9c7bdf1 tree: Optimize xmlInsertNode
Relink the node directly without calling xmlUnlinkNodeInternal.
2024-03-29 12:57:20 +01:00
Nick Wellnhofer
ea0ee36546 tree: Align xmlAddChild with other node insertion functions
Make xmlAddChild unlink the child before insertion. Originally, linked
children would most likely cause tree corruption. The first fix
disallowed linked nodes, but there are cases where insertion of such
nodes could succeed.

Don't abort if the node is already a child of parent. In this case,
the node will be moved to the end of the child list.
2024-03-29 12:57:20 +01:00
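The unlink-before-insert behavior described for xmlAddChild above can be modeled on a minimal tree. This is a sketch with hypothetical names (`SketchNode`, `sketchUnlink`, `sketchAddChild`), not the library's implementation. It omits text merging, document handling and the check against inserting an ancestor, and only shows that a node which is already a child of the parent ends up at the end of the child list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal node type; the real xmlNode has many more fields. */
typedef struct SketchNode SketchNode;
struct SketchNode {
    SketchNode *parent;
    SketchNode *children;  /* first child */
    SketchNode *last;      /* last child */
    SketchNode *prev;
    SketchNode *next;
};

/* Detach node from its parent and siblings. Safe on unlinked nodes. */
void sketchUnlink(SketchNode *node) {
    if (node->parent != NULL) {
        if (node->parent->children == node)
            node->parent->children = node->next;
        if (node->parent->last == node)
            node->parent->last = node->prev;
    }
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    node->parent = NULL;
    node->prev = NULL;
    node->next = NULL;
}

/* Append child to parent's child list, unlinking it first so that an
 * already linked node cannot corrupt the tree. */
SketchNode *sketchAddChild(SketchNode *parent, SketchNode *child) {
    if (parent == NULL || child == NULL || parent == child)
        return NULL;

    sketchUnlink(child);

    child->parent = parent;
    child->prev = parent->last;
    if (parent->last != NULL)
        parent->last->next = child;
    else
        parent->children = child;
    parent->last = child;
    return child;
}

/* Add a and b, then add a again: a should move to the end of the list.
 * Returns 1 if the resulting links are consistent. */
int sketchAddChildDemo(void) {
    SketchNode root = {0}, a = {0}, b = {0};

    sketchAddChild(&root, &a);
    sketchAddChild(&root, &b);
    sketchAddChild(&root, &a);  /* a is already a child of root */

    return root.children == &b && root.last == &a &&
           b.prev == NULL && b.next == &a &&
           a.prev == &b && a.next == NULL;
}
```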
Nick Wellnhofer
e5cdb23f10 tree: Introduce xmlUnlinkNodeInternal
xmlUnlinkNode also removes references to DTD nodes which shouldn't be
done when moving nodes within a document. Introduce a new function
xmlUnlinkNodeInternal which only unlinks a node from the tree.
Remove references to DTD nodes in xmlNodeSetDoc. Note that moving
element and attribute declarations to another document will still leave
references in the source document.
2024-03-29 12:56:56 +01:00
Nick Wellnhofer
23a81841d2 tree: Work on documentation 2024-03-25 20:51:14 +01:00
Nick Wellnhofer
ad9a5637f9 tree: Fix uninitialized value in xmlSearchNsSafe
Short-lived regression.
2024-03-22 19:37:12 +01:00
Nick Wellnhofer
7b316c1139 tree: Fix uninitialized value in xmlSearchNsByHrefSafe
Short-lived regression.
2024-03-22 12:15:23 +01:00
Nick Wellnhofer
3f05508a53 tree: Report malloc failures in attribute setters 2024-03-18 15:14:43 +01:00
Nick Wellnhofer
6a49bb777c tree: Introduce xmlSearchNsSafe
After the failed experiment with a static XML namespace, introduce
versions of xmlSearchNs that report malloc failures.

Optimize the no-document case by only adding the XML namespace
declaration if it wasn't found in an ancestor.
2024-03-17 21:07:46 +01:00
Nick Wellnhofer
047ea3ecb3 Revert "tree: Allocate XML namespace statically"
This reverts commit 2840e33c5e4b51589a0b96e8102638eeaea6df72.
2024-03-17 21:04:40 +01:00
Nick Wellnhofer
2469d5d065 tree: Tighten source doc check in xmlDOMWrapAdoptNode
sourceDoc must match even if node->doc is NULL.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
37556eb32a tree: Check destParent->doc in xmlDOMWrapCloneNode
The document must match destDoc to avoid tree corruption.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
7c48c01b1c tree: Switch to xmlNodeSetDoc in xmlDOMWrapAdoptNode
Report malloc failures.

Also fixes an issue where xmlDOMWrapAdoptAttr would descend into entity
references.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
be2c26fb67 tree: Fix tree iteration in xmlDOMWrapRemoveNode
We didn't descend into elements having attributes.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
4a90ce089c tree: Don't abort early if malloc fails in DOM functions
If malloc fails halfway through updating a subtree, we must process the
rest of the tree to avoid tree corruption.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
ad019ba102 tree: Fix reallocation in xmlDOMWrapNSNormAddNsMapItem2 2024-03-15 19:54:27 +01:00
Nick Wellnhofer
e321eba0c7 tree: Set parent->last early in xmlDOMWrapCloneNode
Avoids a corrupted tree in error case.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
84e6dc9e5b tree: Declare namespace on clone in xmlDOMWrapCloneNode
The new namespace must be declared on the cloned node, not the source
node.
2024-03-15 19:54:27 +01:00
Nick Wellnhofer
09905670f4 tree: Don't free linked DOM namespaces in error case 2024-03-15 19:54:27 +01:00
Nick Wellnhofer
27f07f1002 tree: Report malloc failure in xmlDOMWrapCloneNode
Also don't store text content in dictionaries.
2024-03-15 19:54:26 +01:00
Nick Wellnhofer
8d04f0eea0 tree: Refactor text node updates 2024-03-15 19:54:26 +01:00
Nick Wellnhofer
4ccd3eb80f tree: Refactor node insertion
Also fixes a text coalescing bug.
2024-03-15 19:54:26 +01:00
Nick Wellnhofer
9f049afa6d tree: Refactor element creation and parsing of attribute values
Replace xmlStringGetNodeList and xmlStringLenGetNodeList with
xmlNodeParseContentInternal which also updates an optional parent
node.

Don't look up entities a second time via xmlNewReference.
2024-03-15 19:54:26 +01:00
Nick Wellnhofer
9991fae4f4 tree: Simplify xmlNodeGetContent, xmlBufGetNodeContent
Factor out xmlBufGetEntityRefContent and xmlBufGetChildContent.

Also allow entity declarations.

Optimize single text children.

Ignore missing or recursive entities silently.

Prefer xmlNodeGetContent over xmlNodeListGetString.

Check for entity cycles in xmlBufGetNodeContent.

Use children pointer of entity reference nodes if available to look up
entities.
2024-03-15 19:47:08 +01:00