7065 Commits

Author SHA1 Message Date
Nick Wellnhofer
0f4f89005d parser: Rename inputPush to xmlCtxtPushInput 2024-11-19 00:25:23 +01:00
Nick Wellnhofer
e2ad249c23 parser: Deprecate more internal symbols
- xmlParseExternalSubset
- xmlPushInput
- xmlPopInput
- xmlCopyCharMultiByte
- xmlCreateEntityParserCtxt
- xmlStringComment
2024-11-19 00:25:23 +01:00
Nick Wellnhofer
2fcdc5f7e7 globals: More comments on future directions 2024-11-19 00:08:39 +01:00
Nick Wellnhofer
4d1f35b0a9 valid: Deprecate more internal functions 2024-11-19 00:03:37 +01:00
Nick Wellnhofer
de0c779116 fuzz: Switch to xmlCtxtValidateDocument
This allows checking malloc failure reports during post-validation.
2024-11-19 00:03:37 +01:00
Nick Wellnhofer
5a51f08517 valid: Implement xmlCtxtValidateDocument
This allows using the error handler or resource loader of a parser
context.
2024-11-19 00:03:37 +01:00
Nick Wellnhofer
1e1731a43d valid: Add NULL check in xmlCtxtValidateDtd 2024-11-17 13:20:06 +01:00
Nick Wellnhofer
631778f679 parser: Check for malloc failure in xmlCtxtParseDtd 2024-11-17 12:11:41 +01:00
Nick Wellnhofer
7f8c436c75 parser: Implement xmlCtxtParseDtd and xmlCtxtValidateDtd
This allows using the context's error handler, options and other
settings.

Fixes #808.
2024-11-15 16:30:52 +01:00
Nick Wellnhofer
764b8086d1 tests: Fix sanitizer version check on old Apple clang
See #669.
2024-11-13 20:22:32 +01:00
Nick Wellnhofer
b57e022d75 build: Check for icu-uc instead of icu-i18n
This should be the ICU component we actually need.
2024-11-13 19:10:45 +01:00
Ruslan Garipov
aaecdc92e2 parser: Assign value without if-statement
This avoids an if-statement that effectively does nothing. For
example, the binary artifact generated by GCC with -O2 optimization
settings does not contain that if-statement; the code just uses the
hprefix->name field explicitly.

No functional changes intended.

Signed-off-by: Ruslan Garipov <ruslanngaripov@gmail.com>
2024-11-12 16:42:36 +05:00
Nick Wellnhofer
1e4d8c55f0 xmlIO: Fix reading from non-regular files like pipes
Commit 7e14c05d removed unnecessary copying of uncompressed input
through zlib or xzlib. This broke input from non-regular files like
pipes which can't be reopened. Try to detect such files by checking
whether they're seekable and always pipe them through zlib or xzlib.

Also remove seemingly unnecessary calls to gzread and gzrewind to
support unseekable files.

Fixes https://gitlab.gnome.org/GNOME/libxslt/-/issues/124.
2024-11-06 16:49:53 +01:00
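
The seekability check mentioned in 1e4d8c55f0 can be illustrated with a small POSIX sketch. This is not the libxml2 implementation: fdIsSeekable and describeInput are hypothetical names, and the sketch assumes that a no-op lseek() is an acceptable probe. It only shows the general idea that pipes, FIFOs and sockets reject seeking, so they cannot be rewound or reopened.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not libxml2 code): report whether a file
 * descriptor supports seeking. lseek() fails on pipes, FIFOs and
 * sockets, so a failed no-op seek marks the input as unseekable.
 */
static bool
fdIsSeekable(int fd) {
    return lseek(fd, 0, SEEK_CUR) != (off_t) -1;
}

/* Hypothetical helper: print which kind of input was detected. */
static void
describeInput(const char *label, int fd) {
    printf("%s: %s\n", label,
           fdIsSeekable(fd) ? "seekable, can be rewound"
                            : "not seekable, read as a single stream");
}
```

A caller could use such a probe to decide whether an input may be rewound after sniffing its header or has to be consumed as one continuous stream.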
Nick Wellnhofer
459146140a xpath: Fix parsing of non-ASCII names
Fix a long-standing issue where QNames starting with a non-ASCII
character would be rejected. This became more visible after "streaming"
XPath evaluation was disabled since the latter handled non-ASCII names
correctly.

Fixes #818.
2024-11-05 12:30:44 +01:00
Nick Wellnhofer
9201173c5a xmlreader: Fix return value of xmlTextReaderReadString
Return NULL if the node has no children or the children were already
deleted to match the 2.12 behavior.

Fixes #817.
2024-11-05 11:41:28 +01:00
Nick Wellnhofer
869e3fd421 parser: Fix loading of parameter entities in external DTDs
Regressed with commit 12f0bb94.

Fixes #816.
2024-11-01 16:53:18 +01:00
Nick Wellnhofer
36117723d4 Update README 2024-10-31 17:52:42 +01:00
Nick Wellnhofer
467f444544 SAX2: Add NULL check for ctxt->myDoc 2024-10-30 14:13:38 +01:00
Nick Wellnhofer
efb57ddba3 parser: Fix downstream code that swaps DTDs
Downstream code like the nginx xslt module can change the document's DTD
pointers in a SAX callback. If an entity from a separate DTD is parsed
lazily, its content must not reference the current document.

Regressed with commit d025cfbb.

Fixes #815.
2024-10-30 14:13:38 +01:00
Nick Wellnhofer
0ec5687e06 parser: Rework xmlCtxtGrowAttrs
Remove unneeded argument.

Check for integer overflow. We probably hit the buffer size limit in
xmlParserGrow before, but better be safe.
2024-10-28 21:06:52 +01:00
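
As an illustration of the overflow check mentioned in 0ec5687e06, here is a minimal sketch of growing an array with a guarded doubling step. It is not the code of xmlCtxtGrowAttrs: growIntArray, the int element type and the initial capacity of 8 are assumptions made for the example.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not libxml2 code): double the capacity of an
 * int array. Returns 0 on success and -1 if the new size would
 * overflow or the allocation fails. On failure the buffer and the
 * capacity are left untouched.
 */
static int
growIntArray(int **items, int *capacity) {
    int newCap;
    int *tmp;

    if (*capacity <= 0)
        newCap = 8;                 /* assumed initial capacity */
    else if (*capacity > INT_MAX / 2)
        return -1;                  /* doubling would overflow int */
    else
        newCap = *capacity * 2;

    /* Guard the byte count as well before calling realloc. */
    if ((size_t) newCap > SIZE_MAX / sizeof(int))
        return -1;

    tmp = (int *) realloc(*items, (size_t) newCap * sizeof(int));
    if (tmp == NULL)
        return -1;

    *items = tmp;
    *capacity = newCap;
    return 0;
}
```

Checking before multiplying keeps the arithmetic well defined even if another size limit would normally trigger first.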
Nick Wellnhofer
ffb058f484 parser: Fix detection of duplicate attributes
We really need a second scan if more than one namespace clash was
detected.
2024-10-28 20:26:55 +01:00
Nick Wellnhofer
89b9f45711 entities: Allow control chars when serializing HTML 2024-10-25 18:02:58 +02:00
Nick Wellnhofer
b52a3044aa parser: Use counted_by attribute if supported
We only have a single struct with a flexible array member.
2024-10-24 18:18:47 +02:00
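
The "if supported" part of b52a3044aa can be sketched with a feature-test macro. The names MY_COUNTED_BY, IntVec and intVecNew are hypothetical and do not come from libxml2; the sketch assumes a compiler that either provides __has_attribute or can safely treat the attribute as absent.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Treat every attribute as unsupported if __has_attribute is missing. */
#ifndef __has_attribute
  #define __has_attribute(x) 0
#endif

/* Expand to the attribute only where the compiler knows it. */
#if __has_attribute(counted_by)
  #define MY_COUNTED_BY(field) __attribute__((counted_by(field)))
#else
  #define MY_COUNTED_BY(field)
#endif

/* Hypothetical struct with a flexible array member. */
typedef struct {
    size_t size;
    int values[] MY_COUNTED_BY(size);
} IntVec;

/* Allocate a zero-filled vector with n elements, or return NULL. */
static IntVec *
intVecNew(size_t n) {
    IntVec *vec;
    size_t i;

    if (n > (SIZE_MAX - sizeof(IntVec)) / sizeof(int))
        return NULL;                /* size computation would overflow */

    vec = (IntVec *) malloc(sizeof(IntVec) + n * sizeof(int));
    if (vec == NULL)
        return NULL;

    vec->size = n;                  /* set the count before using values[] */
    for (i = 0; i < n; i++)
        vec->values[i] = 0;
    return vec;
}
```

With the attribute in place, bounds-checking tools can relate accesses to values[] to the size field; without it, the macro expands to nothing and the struct compiles unchanged.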
Nick Wellnhofer
944e5fe8df nanohttp: Fix another stdout file descriptor 2024-10-23 16:46:03 +02:00
Nick Wellnhofer
607ada90b8 nanohttp: Fix stdout file descriptor
Fixes #813.
2024-10-23 14:19:01 +02:00
Nick Wellnhofer
b7c0f9d2dd string: Fix va_copy fallback
Fix va_copy fallback reworked in 5cffba83.

Should fix #812.
2024-10-19 14:53:25 +02:00
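
For readers unfamiliar with va_copy fallbacks like the one b7c0f9d2dd refers to, the following is a sketch of a commonly used pattern, not a copy of the libxml2 source. The helper formatAlloc is hypothetical. The memcpy branch is a best-effort assumption for pre-C99 compilers and is only meant for va_list variables declared locally, as done here.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Provide va_copy on compilers that lack it (assumed fallback chain). */
#ifndef va_copy
  #ifdef __va_copy
    #define va_copy(dest, src) __va_copy(dest, src)
  #else
    #define va_copy(dest, src) memcpy(&(dest), &(src), sizeof(va_list))
  #endif
#endif

/*
 * Hypothetical helper: format into a newly allocated string.
 * The argument list is traversed twice, once to measure and once to
 * write, which is why a copy of the va_list is needed.
 */
static char *
formatAlloc(const char *fmt, ...) {
    va_list ap, copy;
    int len;
    char *buf;

    va_start(ap, fmt);
    va_copy(copy, ap);
    len = vsnprintf(NULL, 0, fmt, copy);    /* measure only */
    va_end(copy);

    if (len < 0) {
        va_end(ap);
        return NULL;
    }

    buf = (char *) malloc((size_t) len + 1);
    if (buf != NULL)
        vsnprintf(buf, (size_t) len + 1, fmt, ap);
    va_end(ap);
    return buf;
}
```

The caller owns the returned string and releases it with free().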
Nick Wellnhofer
a870088f94 xpath: Hide internal sort functions 2024-10-19 14:53:25 +02:00
Yegor Yefremov
513949293d python/tests: fix typos
Typos were found with codespell.
2024-10-15 11:11:38 +02:00
Nick Wellnhofer
f9a6469a47 Update NEWS 2024-10-14 16:15:11 +02:00
Satadru Pramanik
c7b2786676 Avoid Python error "'licence' distribution option is deprecated; use 'license'" 2024-10-12 11:55:50 +00:00
Nick Wellnhofer
bf3619c328 fuzz: Don't unlink DTD when replacing nodes
OP_XML_REPLACE_NODE needs the same check as OP_XML_UNLINK_NODE.
2024-10-10 12:14:47 +02:00
Nick Wellnhofer
a4c16a140c xmllint: Improve --memory and --testIO options
Support --memory and --testIO in SAX mode.

Keep memory-mapped file across repetitions.

Options `--sax --memory --noout --repeat` can now be used to benchmark
the core parser without building a DOM tree or repeatedly reading files
from disk.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
3ac214f01e xmllint: Support --html --sax 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
225ed70737 html: Accelerate htmlParseCharData 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
74dfc49b5f parser: Clarify logic in xmlParseStartTag2 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
207999793f html: Handle numeric character references directly 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
24a6149fc4 html: Make sure that character data mode is reset 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c32397d51f html: Improve character class macros 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e840655414 html: Rewrite parsing of most data 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
f77ec16db0 html: Optimize htmlParseCharData 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
440bd64c69 html: Optimize htmlParseHTMLName 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c34d0ae9cc html: Deprecate htmlIsBooleanAttr 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
6040785ac4 html: Deprecate AutoClose API 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
188cad68a4 html: Remove obsolete content model 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0144f662d7 html: Remove obsolete code 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
0ce7bfe559 html: Try to avoid passing XML options to HTML parser 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
76cc63942a test: Fix XML_PARSE_HTML constant 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
575be6c1f1 html: Fix line numbers with CRs 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
be874d7831 html: Ignore unexpected DOCTYPE declarations 2024-10-06 20:04:00 +02:00