Introduction

This is an XML parser written in C. It can parse XML text into a tree of XML objects and generate XML text from such a tree, and it is suitable for most platforms with a C compiler.

Usage Examples

Generation

Test Code:

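#include "xml.h"   /* this parser's header (file name assumed) */

void test_write(void)
{
    xml_t root, x;

    /* create the root element */
    root = xml_create("root");
    if (!root) return;

    /* create each child, set its text, and insert it under root at index 0, 1, 2 */
    x = xml_create("name");
    xml_set_text(x, "xml parser");
    xml_insert(root, 0, x);

    x = xml_create("description");
    xml_set_text(x, "This is a C language version of xml parser.");
    xml_insert(root, 1, x);

    x = xml_create("license");
    xml_set_text(x, "GPL3.0");
    xml_insert(root, 2, x);

    /* write the tree to write.xml */
    xml_file_dump(root, "write.xml");

    /* delete the root object */
    xml_delete(root);
}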

Generated File Name: write.xml

<root>
    <name>xml parser</name>
    <description>This is a C language version of xml parser.</description>
    <license>GPL3.0</license>
</root>

Parsing

File Name: read.xml

<?xml version="1.0" encoding="utf-8"?>
<bookstore>
    <book category="CHILDREN">
        <title>Harry Potter</title>
        <author>J K.Rowling</author>
        <year>2005</year>
        <price>29.99</price>
    </book>
    <book category="WEB">
        <title>Learning XML</title>
        <author>Erik T.Ray</author>
        <year>2004</year>
        <price>39.95</price>
    </book>
</bookstore>

Test Code:

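#include <stdio.h>
#include "xml.h"   /* this parser's header (file name assumed) */

#define READ_FILE "read.xml"   /* the sample file shown above */

void test_read(void)
{
    xml_t root, x;

    root = xml_file_load(READ_FILE);
    if (!root) return;

    printf("load success!\r\n");

    /* the second <book> element (indices start at 0) */
    x = xml_to(root, "book", 1);
    if (x)
    {
        /* first attribute of that <book>: category="WEB" */
        printf("x attr: %s\r\n", xml_get_attribute(x, NULL, 0));

        /* first <author> child of that <book> */
        x = xml_to(x, "author", 0);
        if (x) printf("author: %s\r\n", xml_get_text(x));
    }

    xml_delete(root);
}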

Printed Result:

load success!
x attr: WEB
author: Erik T.Ray

XML Syntax

XML Documents Must Have a Root Element

An XML document must contain exactly one root element, which is the parent of all other elements. In the following example, root is the root element:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

XML Declaration

The XML declaration is optional. If present, it must be the very first thing in the document, like this:

<?xml version="1.0" encoding="utf-8"?>
  • This parser reads the declaration but does not act on the version and encoding values it specifies.
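The version and encoding values in the declaration are pseudo-attributes that can be read with simple string scanning. The sketch below uses a hypothetical `decl_value` helper; it is not part of this library (which, as noted, does not act on these values). It assumes the declaration starts at the first character of the text and that each value is quoted with " or '.

```c
#include <string.h>

/* Hypothetical helper: if `text` begins with an XML declaration, copy the
 * value of pseudo-attribute `key` (e.g. "version" or "encoding") into `out`.
 * Returns 1 on success, 0 if there is no declaration, no such key,
 * or `out` (of size `size`) is too small. */
int decl_value(const char *text, const char *key, char *out, size_t size)
{
    if (strncmp(text, "<?xml", 5) != 0) return 0;

    const char *end = strstr(text, "?>");       /* end of the declaration */
    if (!end) return 0;

    size_t klen = strlen(key);
    for (const char *p = text + 5; p < end; p++)
    {
        /* the key must follow whitespace and be followed by '=' */
        if ((p[-1] == ' ' || p[-1] == '\t') &&
            strncmp(p, key, klen) == 0 && p[klen] == '=')
        {
            char quote = p[klen + 1];
            if (quote != '"' && quote != '\'') return 0;

            const char *v = p + klen + 2;       /* first character of the value */
            const char *close = strchr(v, quote);
            if (!close || close > end) return 0;

            size_t n = (size_t)(close - v);
            if (n + 1 > size) return 0;         /* keep room for the terminator */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
    }
    return 0;
}
```

For the declaration shown above, decl_value(text, "encoding", buf, sizeof buf) would copy "utf-8" into buf.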

All XML Elements Must Have a Closing Tag

In XML, omitting a closing tag is illegal. All elements must have closing tags:

<p>This is a paragraph.</p>

XML Tags Are Case-Sensitive

XML tags are case-sensitive. The tag <Letter> is different from the tag <letter>. The opening and closing tags must be written in the same case:

<Message>This is wrong</message>
<message>This is correct</message>

XML Must Be Correctly Nested

In XML, all elements must be correctly nested within each other:

<b><i>This text is bold and italic</i></b>
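The closing-tag, case-sensitivity and nesting rules above can all be checked with a stack of open tag names. The following is a minimal sketch, not this parser's implementation: `check_tags` is a hypothetical helper that only looks at tag names. It skips `<?...?>` and `<!...>` constructs by jumping to the next '>', so comments or attribute values that contain '>' are not handled.

```c
#include <string.h>

#define MAX_DEPTH 32
#define MAX_NAME  64

/* Hypothetical helper: returns 1 if every start tag in `text` is closed by an
 * end tag with exactly the same (case-sensitive) name, in the correct nesting
 * order; returns 0 otherwise. */
int check_tags(const char *text)
{
    char stack[MAX_DEPTH][MAX_NAME];
    int depth = 0;
    const char *p = text;

    while ((p = strchr(p, '<')) != NULL)
    {
        const char *end = strchr(p, '>');
        if (!end) return 0;                      /* unterminated tag */

        if (p[1] == '?' || p[1] == '!')          /* declaration, comment, CDATA: skip */
        {
            p = end + 1;
            continue;
        }

        int closing = (p[1] == '/');
        const char *name = p + 1 + closing;
        size_t len = strcspn(name, " \t\r\n/>"); /* name ends at space, '/' or '>' */
        if (len == 0 || len >= MAX_NAME) return 0;

        if (closing)
        {
            /* the end tag must match the most recently opened tag exactly */
            if (depth == 0) return 0;
            depth--;
            if (strlen(stack[depth]) != len || strncmp(stack[depth], name, len) != 0)
                return 0;
        }
        else if (end[-1] != '/')                 /* self-closing <tag/> opens nothing */
        {
            if (depth == MAX_DEPTH) return 0;
            memcpy(stack[depth], name, len);
            stack[depth][len] = '\0';
            depth++;
        }
        p = end + 1;
    }
    return depth == 0;                           /* every opened tag was closed */
}
```

With this sketch, "<b><i>text</i></b>" passes, while "<b><i>text</b></i>" (wrong nesting), "<Message>text</message>" (case mismatch) and "<p>text" (missing closing tag) are rejected.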

XML Attribute Values Must Be in Quotes

In XML, the attribute values of XML elements must be enclosed in quotes.

Incorrect:

<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>

Correct:

<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>

The first example is wrong because the date attribute of the note element is not enclosed in quotes.

Entity References

In XML, some characters have special meanings. If you place the character "<" inside an element's text, the parser treats it as the start of a new element and reports an error. To avoid this, write the entity reference &lt; in place of "<":

<message>if salary &lt; 1000 then</message>

There are five predefined entity references in XML:

Entity    Character   Meaning
&lt;      <           less than
&gt;      >           greater than
&amp;     &           ampersand
&apos;    '           apostrophe
&quot;    "           quotation mark

Note: In XML, only the characters "<" and "&" are actually illegal. The greater than sign is legal, but it's a good practice to use entity references instead.
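Escaping text before placing it in an element is a character-by-character substitution using the table above. The sketch below uses a hypothetical `escape_xml_text` helper, which is not part of this library's API; it writes into a caller-supplied buffer and returns the escaped length, or -1 if the buffer is too small.

```c
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing the five XML special
 * characters with their predefined entity references. Returns the length of
 * the escaped string, or -1 if `out` (of size `size`) is too small. */
int escape_xml_text(const char *in, char *out, size_t size)
{
    size_t n = 0;

    for (; *in; in++)
    {
        const char *rep;
        char one[2] = { *in, '\0' };

        switch (*in)
        {
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '&':  rep = "&amp;";  break;
        case '\'': rep = "&apos;"; break;
        case '"':  rep = "&quot;"; break;
        default:   rep = one;      break;   /* ordinary character: copy as-is */
        }

        size_t len = strlen(rep);
        if (n + len + 1 > size) return -1;  /* keep room for the terminator */
        memcpy(out + n, rep, len);
        n += len;
    }

    if (n + 1 > size) return -1;
    out[n] = '\0';
    return (int)n;
}
```

For example, escaping "if salary < 1000 then" produces "if salary &lt; 1000 then", the text used in the message element above.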

Operation Methods

Common Methods

XML Parsing

Method Prototypes:

xml_t xml_loads(const char* text);
xml_t xml_file_load(const char* filename);

The xml_loads function parses a string of XML text and returns a handle to the resulting XML object, or NULL if parsing fails. The xml_file_load function loads and parses the named file: internally it reads the file with the standard C file I/O functions and then passes the text to xml_loads. It supports UTF-8 encoded files.
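The file-loading step described above (read the whole file with standard C I/O, then parse the text) can be sketched as follows. `read_text_file` is a hypothetical helper written for illustration, not this library's source; the hand-off to xml_loads appears only in a comment.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a NUL-terminated heap buffer.
 * Returns NULL on failure; the caller must free() the result. */
char *read_text_file(const char *filename)
{
    FILE *f = fopen(filename, "rb");
    if (!f) return NULL;

    /* find the file size */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    char *text = malloc((size_t)size + 1);
    if (!text) { fclose(f); return NULL; }

    size_t got = fread(text, 1, (size_t)size, f);
    fclose(f);
    text[got] = '\0';   /* terminate so the buffer can be parsed as a string */
    return text;
}

/* Possible use with the API above (assuming xml_loads does not keep a
 * pointer into `text` after it returns):
 *
 *     char *text = read_text_file("read.xml");
 *     xml_t root = text ? xml_loads(text) : NULL;
 *     free(text);
 */
```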

XML Generation

Method Prototypes:

char* xml_dumps(xml_t xml, int preset, int unformat, int* len);
int xml_file_dump(xml_t xml, char* filename);

The xml_dumps function converts an XML object into text and returns it. The preset parameter is an estimate of the output length used to size the initial buffer: the closer it is to the final length, the fewer memory reallocations are needed and the faster the conversion. The unformat parameter controls whether the output is formatted; unformatted output is squeezed onto a single line. The len parameter receives the length of the generated text. The xml_file_dump function uses xml_dumps to generate the text and writes it to the file with the given name.
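Why preset matters can be shown with a growable output buffer. The sketch below assumes a common technique (start with preset bytes, double the capacity whenever it runs out); it is not taken from xml_dumps, whose actual growth policy is not described here. The `strbuf` type and its functions are hypothetical, and the buffer counts how often it has to be reallocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer used to illustrate `preset`. */
typedef struct
{
    char  *data;
    size_t len;       /* bytes used, excluding the terminator */
    size_t cap;       /* bytes allocated */
    int    reallocs;  /* number of times the buffer had to grow */
} strbuf;

int strbuf_init(strbuf *b, size_t preset)
{
    b->cap = preset > 0 ? preset : 1;
    b->data = malloc(b->cap);
    b->len = 0;
    b->reallocs = 0;
    if (!b->data) return 0;
    b->data[0] = '\0';
    return 1;
}

int strbuf_append(strbuf *b, const char *s)
{
    size_t n = strlen(s);
    size_t need = b->len + n + 1;            /* +1 for the terminator */

    if (need > b->cap)
    {
        size_t cap = b->cap;
        while (cap < need) cap *= 2;         /* grow geometrically */
        char *p = realloc(b->data, cap);
        if (!p) return 0;
        b->data = p;
        b->cap = cap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, s, n + 1);      /* also copies the terminator */
    b->len += n;
    return 1;
}

void strbuf_free(strbuf *b)
{
    free(b->data);
    b->data = NULL;
}
```

For example, appending the 16-character string "<name>xml</name>" ten times reallocates four times with a preset of 1 and not at all with a preset of 256.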

XML Object Creation and Deletion

Method Prototypes:

xml_t xml_create(const char *name);
void xml_delete(xml_t xml);

The xml_create function creates and returns a new, empty XML element with the given name; for example, xml_create("root") in the generation example creates the <root> element. It returns NULL if creation fails. The xml_delete function deletes an XML object.

XML Getting Child Objects

Method Prototypes:

xml_t xml_to(xml_t xml, const char *name, int index);

Element names are not required to be unique: one level of an XML tree may contain several elements with the same name. The xml_to method selects a child element by name and index. If name is NULL, index counts all children; otherwise index counts only the children with that name. Indices start at 0, as in the parsing example, where xml_to(root, "book", 1) returns the second <book> element.

t = xml_to(xml, NULL, 3); // the child at index 3, i.e. the fourth child
t = xml_to(xml, "a", 3);  // the fourth child named "a"

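The name/index matching rule used by xml_to (and, per the descriptions below, by xml_get_attribute, xml_remove_attribute and xml_remove) can be sketched over a plain array of child names. `match_child` is a hypothetical helper, not the library code; indices are 0-based, and -1 stands in for "no match".

```c
#include <string.h>

/* Hypothetical helper: return the position in `names` of the child selected
 * by (name, index), following the rule described for xml_to:
 *   name == NULL -> the index-th child overall
 *   name != NULL -> the index-th child whose name equals `name`
 * Returns -1 if there is no such child. */
int match_child(const char *const *names, int count, const char *name, int index)
{
    if (index < 0) return -1;

    for (int i = 0; i < count; i++)
    {
        if (name == NULL || strcmp(names[i], name) == 0)
        {
            if (index == 0) return i;   /* this is the requested match */
            index--;                    /* skip an earlier match */
        }
    }
    return -1;
}
```

With children named { "a", "b", "a", "c", "a", "a" }, match_child(names, 6, NULL, 3) selects position 3 ("c"), while match_child(names, 6, "a", 3) selects position 5, the fourth "a".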
XML Setting and Getting Text

Method Prototypes:

int xml_set_text(xml_t xml, const char *text);
const char* xml_get_text(xml_t xml);

These two methods are used to set and get the text of an XML element respectively.

XML Adding and Removing Attributes

Method Prototypes:

int xml_add_attribute(xml_t xml, const char *name, const char *value);
int xml_remove_attribute(xml_t xml, const char *name, int index);

The xml_add_attribute function adds an attribute with the given name and value at the beginning of the element's attribute list. The xml_remove_attribute function selects the attribute to remove by name and index, using the same matching rules as xml_to. Both functions return 1 on success and 0 on failure.

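XML Getting Attributes

Method Prototypes:

const char* xml_get_attribute(xml_t xml, const char *name, int index);

This method selects an attribute using the same name/index matching rules as xml_to and returns its value. For example, the parsing example calls xml_get_attribute(x, NULL, 0) to read the first attribute of a <book> element.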

XML Inserting and Deleting Child Objects

Method Prototypes:

int xml_insert(xml_t xml, int index, xml_t ins);
int xml_remove(xml_t xml, const char *name, int index);

The xml_insert method inserts the object ins into xml as a child at position index. The xml_remove method works like xml_remove_attribute: it selects a child by name and index and removes it. Both methods return 1 on success and 0 on failure.

XML Parsing Error Reporting

The error types include the following:

#define XML_E_OK        0   // ok
#define XML_E_TEXT      1   // empty text
#define XML_E_MEMORY    2   // memory
#define XML_E_LABEL     3   // label
#define XML_E_VERSION   4   // version
#define XML_E_ENCODING  5   // encoding
#define XML_E_ILLEGAL   6   // illegal character
#define XML_E_END       7   // end
#define XML_E_VALUE     8   // missing value
#define XML_E_QUOTE     9   // missing quote
#define XML_E_COMMENT   10  // missing comment tail -->
#define XML_E_NOTES     11  // head notes error
#define XML_E_CDATA     12  // missing CDATA tail ]]>
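When reporting a parse failure, it can help to turn the numeric code into text. The sketch below is a hypothetical `describe_xml_error` helper, not part of this library; the values are copied from the list above, and the messages paraphrase its comments. How the library exposes the error code to the caller is not covered by this section, so the sketch only shows the mapping.

```c
/* Error codes copied from the list above; guarded in case the library
 * header that defines them is already included. */
#ifndef XML_E_OK
#define XML_E_OK        0
#define XML_E_TEXT      1
#define XML_E_MEMORY    2
#define XML_E_LABEL     3
#define XML_E_VERSION   4
#define XML_E_ENCODING  5
#define XML_E_ILLEGAL   6
#define XML_E_END       7
#define XML_E_VALUE     8
#define XML_E_QUOTE     9
#define XML_E_COMMENT   10
#define XML_E_NOTES     11
#define XML_E_CDATA     12
#endif

/* Hypothetical helper: map an error code to a short description. */
const char *describe_xml_error(int code)
{
    switch (code)
    {
    case XML_E_OK:       return "ok";
    case XML_E_TEXT:     return "empty text";
    case XML_E_MEMORY:   return "memory error";
    case XML_E_LABEL:    return "label error";
    case XML_E_VERSION:  return "version error";
    case XML_E_ENCODING: return "encoding error";
    case XML_E_ILLEGAL:  return "illegal character";
    case XML_E_END:      return "end error";
    case XML_E_VALUE:    return "missing value";
    case XML_E_QUOTE:    return "missing quote";
    case XML_E_COMMENT:  return "missing comment tail -->";
    case XML_E_NOTES:    return "head notes error";
    case XML_E_CDATA:    return "missing CDATA tail ]]>";
    default:             return "unknown error";   /* code not in the list */
    }
}
```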