330 Commits

Author SHA1 Message Date
Nick Wellnhofer
1082d813e8 parser: Prepare to make decompression opt-in
Add a new parser option XML_PARSE_UNZIP that enables decompression.
xmlReadFile, xmlCtxtReadFile and xmlCreateURLParserCtxt always set
this option currently, but downstream users should start to set the
option if they really need it.
2025-01-29 00:49:57 +01:00
Nick Wellnhofer
a78843be5e xmllint: Support compressed input from stdin
Another regression related to reading from stdin.

Making a "-" filename read from stdin was deeply baked into the core
IO code but is inherently insecure. I really want to reenable this
dangerous feature as sparingly as possible.

This now enables compressed input when using the "Fd" API functions
which wasn't supported before. But XML_PARSE_NO_UNZIP will be
inverted later.

Allow compressed stdin in xmlReadFile to support xmlstarlet and older
versions of xsltproc. So far, these are the only known command-line
tools that rely on "-" meaning stdin.
2025-01-28 23:20:37 +01:00
Nick Wellnhofer
e95c4b07ae fuzz: Also test xmllint --repeat option 2025-01-23 20:30:40 +01:00
Nick Wellnhofer
dc6270d110 xmllint: Fix UAF with --push --repeat
Short-lived regression. Fixes #841.
2025-01-23 20:30:25 +01:00
Nick Wellnhofer
1c82bca6bd xmllint: Improve error reports from reader 2025-01-17 23:29:30 +01:00
Nick Wellnhofer
16286dea31 xmllint: Fix memory leak in parseAndPrintFile 2025-01-17 23:14:44 +01:00
Nick Wellnhofer
9cfc723cad xmllint: Always reuse parser context
Also move push parsing into parseXml which makes "--sax --push" work.
2025-01-17 21:42:35 +01:00
Nick Wellnhofer
00167cae33 xmllint: Report OOM errors to stderr
For the validators, some work still has to be done, but for core
features, xmllint should now report OOM errors reliably.
2025-01-17 20:06:45 +01:00
Nick Wellnhofer
67b738d9a5 fuzz: Check whether xmllint reports malloc failures correctly
This relies on xmllint's "maxmem" option.
2025-01-17 20:06:45 +01:00
Nick Wellnhofer
bfe6af2eed fuzz: Remove hacks to build lint fuzzer
Don't include source file directly.
2025-01-17 20:06:45 +01:00
Nick Wellnhofer
bf1d8b9cfb xmllint: Report malloc failures from parsing patterns 2025-01-17 20:06:45 +01:00
Nick Wellnhofer
255fd5f3f1 xmllint: Store error stream in global state 2025-01-17 20:06:45 +01:00
Nick Wellnhofer
e42ded421c xmllint: Stop using global variables
The only exception is "maxmem". The custom malloc functions don't
support an extra context.
2025-01-17 20:06:45 +01:00
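
The trade-off described in this commit can be sketched as follows. This is a minimal illustration and not xmllint's actual code: `lintState`, `memBudget`, `budgetMalloc` and `reportOom` are hypothetical names. Per-run settings travel in a struct that is passed explicitly, while the memory limit stays global because a `malloc`-style callback receives only a size and has no parameter that could carry a context.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-run state, passed explicitly instead of using globals. */
typedef struct {
    FILE *errStream;   /* where diagnostics are written */
    int oomCount;      /* number of reported allocation failures */
} lintState;

/* Stays global: a malloc-style callback has no context argument. */
static size_t memBudget = 0;

/* Allocator with the same signature as malloc, limited by memBudget. */
static void *budgetMalloc(size_t size) {
    void *ptr;

    if (size > memBudget)
        return NULL;
    ptr = malloc(size);
    if (ptr != NULL)
        memBudget -= size;
    return ptr;
}

/* Report an allocation failure through the state, not a global stream. */
static void reportOom(lintState *lint) {
    assert(lint != NULL && lint->errStream != NULL);
    fprintf(lint->errStream, "out of memory\n");
    lint->oomCount += 1;
}
```

A caller would set `memBudget` once at startup, then call `reportOom` with its own `lintState` whenever `budgetMalloc` returns NULL.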
Nick Wellnhofer
d39e5714b0 xmllint: Fix memory leak in parseFile
Short-lived regression.
2025-01-17 20:05:56 +01:00
Nick Wellnhofer
0f4d36e055 xmllint: Fix memory leak in error case 2025-01-17 13:13:54 +01:00
Nick Wellnhofer
86401cc3d2 xmllint: Make --shell ignore some other options
When the shell should be launched with the --shell option, don't
post-validate, stream or dump the document. Ignore the --repeat option.
2025-01-07 19:10:19 +01:00
Nick Wellnhofer
c0c69cb868 xmllint: Always reuse parser context
Simplifies "repeat" logic.
2025-01-07 18:55:35 +01:00
Nick Wellnhofer
a5be2cc303 xmllint: Support --xpath --debug
Dump compiled expression if --debug was supplied.
2025-01-06 19:14:28 +01:00
Nick Wellnhofer
f22707f42b xmllint: Use xmlXPathOrderDocElems for XPath queries 2025-01-06 19:14:21 +01:00
Nick Wellnhofer
169857ad26 xmllint: Check return value of htmlNewParserCtxt 2024-12-13 18:07:03 +01:00
Nick Wellnhofer
1dc5e50a8e catalog: Only use XML_SYSCONFDIR if catalogs are enabled 2024-11-21 23:43:23 +01:00
Nick Wellnhofer
a5764b56d2 build: Define XML_SYSCONFDIR in config.h
Rename SYSCONFDIR macro to XML_SYSCONFDIR.

Use AX_RECURSIVE_EVAL with Autotools. This is GPL v2 with Autoconf
exception which shouldn't be a problem.

Finally support meson.
2024-11-21 22:44:02 +01:00
Nick Wellnhofer
a4c16a140c xmllint: Improve --memory and --testIO options
Support --memory and --testIO in SAX mode.

Keep memory-mapped file across repetitions.

Options `--sax --memory --noout --repeat` can now be used to benchmark
the core parser without building a DOM tree or repeatedly reading files
from disk.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
3ac214f01e xmllint: Support --html --sax 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
d67833a3c5 xmllint: Use proper type to store seconds since epoch
Should avoid year 2038 problem.

Fixes #801.
2024-09-26 19:34:34 +02:00
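
For context, the year 2038 problem arises when seconds since the epoch are stored in a signed 32-bit integer, which overflows after 2038-01-19 03:14:07 UTC. The commit message does not name the type that xmllint now uses; the sketch below assumes a 64-bit signed integer, and `fitsInInt32` and `elapsedSeconds` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Largest second count a signed 32-bit counter can hold:
 * 2038-01-19 03:14:07 UTC. */
#define LAST_32BIT_SECOND INT64_C(2147483647)

/* Returns 1 if the timestamp would survive storage in an int32_t. */
static int fitsInInt32(int64_t seconds) {
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Difference between two timestamps, computed in 64 bits so that
 * values past the 32-bit limit are handled without overflow. */
static int64_t elapsedSeconds(int64_t begin, int64_t end) {
    assert(end >= begin);
    return end - begin;
}
```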
Nick Wellnhofer
8ad618d2d6 doc: Document all xmllint options
Remove --pushsmall.

Fixes #785.
2024-08-28 22:03:30 +02:00
Nick Wellnhofer
3ef6661175 build: Rework mmap checks
Switch to AC_CHECK_DECLS/check_symbol_exists. Don't check for
sys/mman.h separately. Don't check for munmap.
2024-07-22 17:03:27 +02:00
Nick Wellnhofer
8af55c8d20 parser: Rename new input API functions
These weren't made public yet.
2024-07-11 01:33:29 +02:00
Nick Wellnhofer
37f7237050 xmllint: Fix unsigned integer overflow
Short-lived regression.
2024-07-01 18:03:06 +02:00
Nick Wellnhofer
71eb710914 xmllint: Switch to xmlCtxtSetErrorHandler 2024-06-27 14:44:55 +02:00
Nick Wellnhofer
5589c9ea6f xmllint: Set stdin/stdout to binary on Windows 2024-06-22 21:23:15 +02:00
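
On Windows, the C runtime opens the standard streams in text mode, which translates line endings and can corrupt binary data. One common way to switch a stream to binary mode is `_setmode` from the Microsoft C runtime. Whether xmllint does exactly this is an assumption, and `setBinaryMode` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Put a stream into binary mode. Returns 0 on success, -1 on failure.
 * On non-Windows systems there is no text/binary distinction, so this
 * does nothing and reports success. */
static int setBinaryMode(FILE *stream) {
    assert(stream != NULL);
#ifdef _WIN32
    return _setmode(_fileno(stream), _O_BINARY) == -1 ? -1 : 0;
#else
    return 0;
#endif
}
```

Calling `setBinaryMode(stdin)` and `setBinaryMode(stdout)` once at startup, before any I/O, would cover both streams named in the commit title.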
Nick Wellnhofer
84a4f84c1c build: Don't check for required headers and functions
Unless we are on Windows, the following POSIX headers are required.
They're part of the earliest POSIX specs and it doesn't make sense to
check for them.

- fcntl.h
- unistd.h
- sys/stat.h
- sys/time.h

On Windows, io.h, fcntl.h and sys/stat.h are always available.
2024-06-22 18:41:00 +02:00
Nick Wellnhofer
f23fc4faed xmllint: Simplify time handling
Assume that gettimeofday is always available.
2024-06-22 18:41:00 +02:00
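
As an illustration of timing code that relies on `gettimeofday` unconditionally, here is a minimal sketch. It is not taken from xmllint: `readClock` and `elapsedMs` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Read the wall clock; returns 0 on success, like gettimeofday itself. */
static int readClock(struct timeval *tv) {
    assert(tv != NULL);
    return gettimeofday(tv, NULL);
}

/* Milliseconds between two readings. */
static double elapsedMs(const struct timeval *begin,
                        const struct timeval *end) {
    assert(begin != NULL && end != NULL);
    return (double) (end->tv_sec - begin->tv_sec) * 1000.0 +
           (double) (end->tv_usec - begin->tv_usec) / 1000.0;
}
```

Note that `gettimeofday` reports wall-clock time, so a measured interval can be distorted if the system clock is adjusted between the two readings.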
Rosen Penev
2def7b4b28 clang-tidy: move assignments out of if
Found with bugprone-assignment-in-if-condition

Signed-off-by: Rosen Penev <rosenp@gmail.com>
2024-06-20 21:11:44 -07:00
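
The kind of rewrite that `bugprone-assignment-in-if-condition` asks for can be sketched as below. The function is a hypothetical example and not code from this commit: the flagged form is shown in the comment, the accepted form in the body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flagged form (assignment inside the condition):
 *     if ((sep = strchr(arg, '=')) != NULL) ...
 * Rewritten form: assign first, then test the result. */
static const char *optionValue(const char *arg) {
    const char *sep;

    assert(arg != NULL);
    sep = strchr(arg, '=');
    if (sep == NULL)
        return NULL;
    return sep + 1;   /* text after the '=' */
}
```

Both forms behave the same; separating the assignment makes it harder to confuse `=` with `==` when reading the condition.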
Nick Wellnhofer
1341deac13 xmllint: Move shell to xmllint
Move source code for xmllint shell to shell.c and move it from the
libxml2 library to the xmllint executable.

Also allow shell to run without XPath and debug modules.

Add stubs for old shell API functions in legacy build mode.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
c9b065914f xmllint: Fix resetting error in xmlHTMLPrintError
Make sure that we don't change the error handler when fuzzing.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
b0fc67aa22 build: Remove --with-tree configuration option
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.

See #717.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
0c97eaa772 xmllint: Rewrite HTML error output 2024-06-13 16:57:52 +02:00
Nick Wellnhofer
dba1ed85a3 ftp: Remove FTP support
Remove the built-in FTP client. If you configure --with-legacy, old
symbols are retained for ABI compatibility.
2024-06-12 18:19:55 +02:00
Nick Wellnhofer
5238404325 parser: Pass resource type to resource loader 2024-06-12 16:36:12 +02:00
Nick Wellnhofer
f96dca9c0e xmllint: Switch to resource loader 2024-06-12 16:36:12 +02:00
Nick Wellnhofer
e2919516bc xmllint: Fix build --with-valid --without-html 2024-06-06 19:28:23 +02:00
Nick Wellnhofer
caa8bb3848 fuzz: Move back to xmlSetExternalEntityLoader
xmlParserInputBufferCreateFilenameDefault can't report malloc failures.
2024-05-19 19:39:22 +02:00
Nick Wellnhofer
b3cb41be8b fuzz: Add xmllint fuzzer 2024-05-13 12:50:08 +02:00
Nick Wellnhofer
3dea98eff9 xmllint: Don't free DTD with --dropdtd
Entity references point to entities in the DTD, so only unlink the DTD
and don't destroy it.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
3ad7f81624 [CVE-2024-34459] Fix buffer overread with xmllint --htmlout
Add a missing bounds check.

Fixes #720.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
c83147bff2 xmllint: Fix --pedantic option
Regressed in 74c84a8c.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
3665d667f6 xmllint: Clean up option handling
Remove unnecessary globals and make some local.

Remove unnecessary calls to xmlTextReaderSetParserProp.

Remove unused "oldout" code.

Fix skipArgs.
2024-05-13 12:50:08 +02:00
Nick Wellnhofer
f8ff4d8688 xmllint: Rework parsing
Merge a few code paths, making options like --valid or --htmlout work
with some other options.

Improve error handling.
2024-05-07 17:11:18 +02:00
Nick Wellnhofer
3afaff7e8e xmllint: Check for NULL input in xmlHTMLValidityError
`ctxt->input` can be NULL after commit 61b4c42f.
2024-05-06 17:36:17 +02:00