<p>HTMLparser - interface for an HTML 4.0 non-verifying parser</p>
<p>this module implements an HTML 4.0 non-verifying parser with API compatible with the XML parser ones. It should be able to parse "real world" HTML, even if severely broken from a specification point of view. </p>
int <ahref="#htmlParseChunk">htmlParseChunk</a> (<ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br/> const char * chunk, <br/> int size, <br/> int terminate);
int <ahref="#htmlEncodeEntities">htmlEncodeEntities</a> (unsigned char * out, <br/> int * outlen, <br/> const unsigned char * in, <br/> int * inlen, <br/> int quoteChar);
<ahref="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a><ahref="#htmlNodeStatus">htmlNodeStatus</a> (const <ahref="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> node, <br/> int legacy);
int <ahref="#htmlCtxtUseOptions">htmlCtxtUseOptions</a> (<ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br/> int options);
</pre><p>Returns the default subelement for this element</p><divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>elt</tt></i>:</span></td><td>HTML element</td></tr></tbody></table></div>
</pre><p>Checks whether an HTML element description may be a direct child of the specified element. Returns 1 if allowed; 0 otherwise.</p><divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>parent</tt></i>:</span></td><td>HTML parent element</td></tr><tr><td><spanclass="term"><i><tt>elt</tt></i>:</span></td><td>HTML element</td></tr></tbody></table></div>
</pre><p>Returns the attributes required for the specified element.</p><divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>elt</tt></i>:</span></td><td>HTML element</td></tr></tbody></table></div>
<aname="HTML_NA">HTML_NA</a> = 0 /* something we don't check at all */
<aname="HTML_INVALID">HTML_INVALID</a> = 1
<aname="HTML_DEPRECATED">HTML_DEPRECATED</a> = 2
<aname="HTML_VALID">HTML_VALID</a> = 4
<aname="HTML_REQUIRED">HTML_REQUIRED</a> = 12 /* VALID bit set so ( & HTML_VALID ) is TRUE */
};
</pre><p/>
</div>
<hr/>
<divclass="refsect2"lang="en"><h3><aname="UTF8ToHtml"/>UTF8ToHtml ()</h3><preclass="programlisting">int UTF8ToHtml (unsigned char * out, <br/> int * outlen, <br/> const unsigned char * in, <br/> int * inlen)<br/>
</pre><p>Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>out</tt></i>:</span></td><td>a pointer to an array of bytes to store the result</td></tr><tr><td><spanclass="term"><i><tt>outlen</tt></i>:</span></td><td>the length of @out</td></tr><tr><td><spanclass="term"><i><tt>in</tt></i>:</span></td><td>a pointer to an array of UTF-8 chars</td></tr><tr><td><spanclass="term"><i><tt>inlen</tt></i>:</span></td><td>the length of @in</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>0 if success, -2 if the transcoding fails, or -1 otherwise The value of @inlen after return is the number of octets consumed as the return value is positive, else unpredictable. The value of @outlen after return is the number of octets consumed.</td></tr></tbody></table></div></div>
</pre><p>Checks whether an <ahref="libxml2-SAX.html#attribute">attribute</a> is valid for an element Has full knowledge of Required and Deprecated attributes</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>elt</tt></i>:</span></td><td>HTML element</td></tr><tr><td><spanclass="term"><i><tt>attr</tt></i>:</span></td><td>HTML <ahref="libxml2-SAX.html#attribute">attribute</a></td></tr><tr><td><spanclass="term"><i><tt>legacy</tt></i>:</span></td><td>whether to allow deprecated attributes</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>one of HTML_REQUIRED, HTML_VALID, HTML_DEPRECATED, <ahref="libxml2-HTMLparser.html#HTML_INVALID">HTML_INVALID</a></td></tr></tbody></table></div></div>
</pre><p>The HTML DTD allows a tag to implicitly close other tags. The list is kept in htmlStartClose array. This function checks if the element or one of it's children would autoclose the given tag.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>doc</tt></i>:</span></td><td>the HTML document</td></tr><tr><td><spanclass="term"><i><tt>name</tt></i>:</span></td><td>The tag name</td></tr><tr><td><spanclass="term"><i><tt>elem</tt></i>:</span></td><td>the HTML element</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>1 if autoclose, 0 otherwise</td></tr></tbody></table></div></div>
<hr/>
<divclass="refsect2"lang="en"><h3><aname="htmlCreateMemoryParserCtxt"/>htmlCreateMemoryParserCtxt ()</h3><preclass="programlisting"><ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> htmlCreateMemoryParserCtxt (const char * buffer, <br/> int size)<br/>
</pre><p>Create a parser context for an HTML in-memory document.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>buffer</tt></i>:</span></td><td>a pointer to a char array</td></tr><tr><td><spanclass="term"><i><tt>size</tt></i>:</span></td><td>the size of the array</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the new parser context or NULL</td></tr></tbody></table></div></div>
</pre><p>Create a parser context for using the HTML parser in push mode The value of @filename is used for fetching external entities and error/warning reports.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>sax</tt></i>:</span></td><td>a SAX handler</td></tr><tr><td><spanclass="term"><i><tt>user_data</tt></i>:</span></td><td>The user data returned on SAX callbacks</td></tr><tr><td><spanclass="term"><i><tt>chunk</tt></i>:</span></td><td>a pointer to an array of chars</td></tr><tr><td><spanclass="term"><i><tt>size</tt></i>:</span></td><td>number of chars in the array</td></tr><tr><td><spanclass="term"><i><tt>filename</tt></i>:</span></td><td>an optional file name or URI</td></tr><tr><td><spanclass="term"><i><tt>enc</tt></i>:</span></td><td>an optional encoding</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the new parser context or NULL</td></tr></tbody></table></div></div>
</pre><p>parse an XML in-memory document and build a tree. This reuses the existing @ctxt parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>cur</tt></i>:</span></td><td>a pointer to a zero terminated string</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML from a file descriptor and build a tree. This reuses the existing @ctxt parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>fd</tt></i>:</span></td><td>an open file descriptor</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML file from the filesystem or the network. This reuses the existing @ctxt parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>filename</tt></i>:</span></td><td>a file or URL</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an HTML document from I/O functions and source and build a tree. This reuses the existing @ctxt parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>ioread</tt></i>:</span></td><td>an I/O read function</td></tr><tr><td><spanclass="term"><i><tt>ioclose</tt></i>:</span></td><td>an I/O close function</td></tr><tr><td><spanclass="term"><i><tt>ioctx</tt></i>:</span></td><td>an I/O handler</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML in-memory document and build a tree. This reuses the existing @ctxt parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>buffer</tt></i>:</span></td><td>a pointer to a char array</td></tr><tr><td><spanclass="term"><i><tt>size</tt></i>:</span></td><td>the size of the array</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr></tbody></table></div></div>
<hr/>
<divclass="refsect2"lang="en"><h3><aname="htmlCtxtUseOptions"/>htmlCtxtUseOptions ()</h3><preclass="programlisting">int htmlCtxtUseOptions (<ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br/> int options)<br/>
</pre><p>Applies the options to the parser context</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>0 in case of success, the set of unknown or unimplemented options in case of error.</td></tr></tbody></table></div></div>
</pre><p>Checks whether an HTML element may be a direct child of a parent element. and if so whether it is valid or deprecated.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>parent</tt></i>:</span></td><td>HTML parent element</td></tr><tr><td><spanclass="term"><i><tt>elt</tt></i>:</span></td><td>HTML element</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>one of HTML_VALID, HTML_DEPRECATED, <ahref="libxml2-HTMLparser.html#HTML_INVALID">HTML_INVALID</a></td></tr></tbody></table></div></div>
<hr/>
<divclass="refsect2"lang="en"><h3><aname="htmlEncodeEntities"/>htmlEncodeEntities ()</h3><preclass="programlisting">int htmlEncodeEntities (unsigned char * out, <br/> int * outlen, <br/> const unsigned char * in, <br/> int * inlen, <br/> int quoteChar)<br/>
</pre><p>Take a block of UTF-8 chars in and try to convert it to an ASCII plus HTML entities block of chars out.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>out</tt></i>:</span></td><td>a pointer to an array of bytes to store the result</td></tr><tr><td><spanclass="term"><i><tt>outlen</tt></i>:</span></td><td>the length of @out</td></tr><tr><td><spanclass="term"><i><tt>in</tt></i>:</span></td><td>a pointer to an array of UTF-8 chars</td></tr><tr><td><spanclass="term"><i><tt>inlen</tt></i>:</span></td><td>the length of @in</td></tr><tr><td><spanclass="term"><i><tt>quoteChar</tt></i>:</span></td><td>the quote character to escape (' or ") or zero.</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>0 if success, -2 if the transcoding fails, or -1 otherwise The value of @inlen after return is the number of octets consumed as the return value is positive, else unpredictable. The value of @outlen after return is the number of octets consumed.</td></tr></tbody></table></div></div>
</pre><p>Free all the memory used by a parser context. However the parsed document in ctxt->myDoc is not freed.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr></tbody></table></div></div>
</pre><p>Set and return the previous value for handling HTML omitted tags.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>val</tt></i>:</span></td><td>int 0 or 1</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the last value for 0 for no handling, 1 for auto insertion.</td></tr></tbody></table></div></div>
</pre><p>The HTML DTD allows a tag to implicitly close other tags. The list is kept in htmlStartClose array. This function checks if a tag is autoclosed by one of it's child</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>doc</tt></i>:</span></td><td>the HTML document</td></tr><tr><td><spanclass="term"><i><tt>elem</tt></i>:</span></td><td>the HTML element</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>1 if autoclosed, 0 otherwise</td></tr></tbody></table></div></div>
</pre><p>Check if an <ahref="libxml2-SAX.html#attribute">attribute</a> is of content type Script</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>name</tt></i>:</span></td><td>an <ahref="libxml2-SAX.html#attribute">attribute</a> name</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>1 is the <ahref="libxml2-SAX.html#attribute">attribute</a> is a script 0 otherwise</td></tr></tbody></table></div></div>
</pre><p>Allocate and initialize a new parser context.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the <ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> or NULL in case of allocation error</td></tr></tbody></table></div></div>
<divclass="refsect2"lang="en"><h3><aname="htmlNodeStatus"/>htmlNodeStatus ()</h3><preclass="programlisting"><ahref="libxml2-HTMLparser.html#htmlStatus">htmlStatus</a> htmlNodeStatus (const <ahref="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> node, <br/> int legacy)<br/>
</pre><p>Checks whether the tree node is valid. Experimental (the author only uses the HTML enhancements in a SAX parser)</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>node</tt></i>:</span></td><td>an <ahref="libxml2-HTMLparser.html#htmlNodePtr">htmlNodePtr</a> in a tree</td></tr><tr><td><spanclass="term"><i><tt>legacy</tt></i>:</span></td><td>whether to allow deprecated elements (YES is faster here for Element nodes)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>for Element nodes, a return from <ahref="libxml2-HTMLparser.html#htmlElementAllowedHere">htmlElementAllowedHere</a> (if legacy allowed) or <ahref="libxml2-HTMLparser.html#htmlElementStatusHere">htmlElementStatusHere</a> (otherwise). for Attribute nodes, a return from <ahref="libxml2-HTMLparser.html#htmlAttrAllowed">htmlAttrAllowed</a> for other nodes, <ahref="libxml2-HTMLparser.html#HTML_NA">HTML_NA</a> (no checks performed)</td></tr></tbody></table></div></div>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the value parsed (as an int)</td></tr></tbody></table></div></div>
<hr/>
<divclass="refsect2"lang="en"><h3><aname="htmlParseChunk"/>htmlParseChunk ()</h3><preclass="programlisting">int htmlParseChunk (<ahref="libxml2-HTMLparser.html#htmlParserCtxtPtr">htmlParserCtxtPtr</a> ctxt, <br/> const char * chunk, <br/> int size, <br/> int terminate)<br/>
</pre><p>Parse a Chunk of memory</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>chunk</tt></i>:</span></td><td>an char array</td></tr><tr><td><spanclass="term"><i><tt>size</tt></i>:</span></td><td>the size in byte of the chunk</td></tr><tr><td><spanclass="term"><i><tt>terminate</tt></i>:</span></td><td>last chunk indicator</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>zero if no error, the <ahref="libxml2-xmlerror.html#xmlParserErrors">xmlParserErrors</a> otherwise.</td></tr></tbody></table></div></div>
</pre><p>parse an HTML in-memory document and build a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>cur</tt></i>:</span></td><td>a pointer to an array of <ahref="libxml2-xmlstring.html#xmlChar">xmlChar</a></td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>a free form C string describing the HTML document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an HTML document (and build a tree if using the standard SAX interface).</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>0, -1 in case of error. the parser context is augmented as a result of the parsing.</td></tr></tbody></table></div></div>
</pre><p>parse an HTML element, this is highly recursive this is kept for compatibility with previous code versions [39] element ::= EmptyElemTag | STag content ETag [41] Attribute ::= Name Eq AttValue</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr></tbody></table></div></div>
</pre><p>parse an HTML ENTITY references [68] EntityRef ::= '&' Name ';'</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ctxt</tt></i>:</span></td><td>an HTML parser context</td></tr><tr><td><spanclass="term"><i><tt>str</tt></i>:</span></td><td>location to store the entity name</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the associated <ahref="libxml2-HTMLparser.html#htmlEntityDescPtr">htmlEntityDescPtr</a> if found, or NULL otherwise, if non-NULL *str will have to be freed by the caller.</td></tr></tbody></table></div></div>
</pre><p>parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>filename</tt></i>:</span></td><td>the filename</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>a free form C string describing the HTML document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML in-memory document and build a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>cur</tt></i>:</span></td><td>a pointer to a zero terminated string</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML from a file descriptor and build a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>fd</tt></i>:</span></td><td>an open file descriptor</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML file from the filesystem or the network.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>filename</tt></i>:</span></td><td>a file or URL</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an HTML document from I/O functions and source and build a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>ioread</tt></i>:</span></td><td>an I/O read function</td></tr><tr><td><spanclass="term"><i><tt>ioclose</tt></i>:</span></td><td>an I/O close function</td></tr><tr><td><spanclass="term"><i><tt>ioctx</tt></i>:</span></td><td>an I/O handler</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>parse an XML in-memory document and build a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>buffer</tt></i>:</span></td><td>a pointer to a char array</td></tr><tr><td><spanclass="term"><i><tt>size</tt></i>:</span></td><td>the size of the array</td></tr><tr><td><spanclass="term"><i><tt>URL</tt></i>:</span></td><td>the base URL to use for the document</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>the document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>options</tt></i>:</span></td><td>a combination of htmlParserOption(s)</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree</td></tr></tbody></table></div></div>
</pre><p>Parse an HTML in-memory document. If sax is not NULL, use the SAX callbacks to handle parse events. If sax is NULL, fallback to the default DOM behavior and return a tree.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>cur</tt></i>:</span></td><td>a pointer to an array of <ahref="libxml2-xmlstring.html#xmlChar">xmlChar</a></td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>a free form C string describing the HTML document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>sax</tt></i>:</span></td><td>the SAX handler block</td></tr><tr><td><spanclass="term"><i><tt>userData</tt></i>:</span></td><td>if using SAX, this pointer will be provided on callbacks.</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree unless SAX is NULL or the document is not well formed.</td></tr></tbody></table></div></div>
</pre><p>parse an HTML file and build a tree. Automatic support for ZLIB/Compress compressed document is provided by default if found at compile-time. It use the given SAX function block to handle the parsing callback. If sax is NULL, fallback to the default DOM tree building routines.</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>filename</tt></i>:</span></td><td>the filename</td></tr><tr><td><spanclass="term"><i><tt>encoding</tt></i>:</span></td><td>a free form C string describing the HTML document encoding, or NULL</td></tr><tr><td><spanclass="term"><i><tt>sax</tt></i>:</span></td><td>the SAX handler block</td></tr><tr><td><spanclass="term"><i><tt>userData</tt></i>:</span></td><td>if using SAX, this pointer will be provided on callbacks.</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the resulting document tree unless SAX is NULL or the document is not well formed.</td></tr></tbody></table></div></div>
</pre><p>Lookup the HTML tag in the ElementTable</p>
<divclass="variablelist"><tableborder="0"><colalign="left"/><tbody><tr><td><spanclass="term"><i><tt>tag</tt></i>:</span></td><td>The tag name in lowercase</td></tr><tr><td><spanclass="term"><i><tt>Returns</tt></i>:</span></td><td>the related <ahref="libxml2-HTMLparser.html#htmlElemDescPtr">htmlElemDescPtr</a> or NULL if not found.</td></tr></tbody></table></div></div>