### Introduction INI is a commonly used configuration file format in engineering with a simple syntax. The INI parser provided by varch offers simple and reliable APIs, enabling easy loading and generation of INI files. Here is a brief introduction to the INI specification. The components of an INI file include the following parts: **Comments** ``` INI supports comments, and comments take up a single line. Comments are marked with '#' or ';'. The comment mark should be at the beginning of the line, and spaces are allowed before it, but no other characters are permitted. ``` Example: ``` ########################### # This is a comment ########################### aaa ; What follows this is not considered a comment ``` **Section** ``` Each section occupies a single line. The section name is enclosed in "[]" brackets, and the content within the outermost brackets is taken as the section name. Any characters including brackets can be included within the brackets. Section names are case-sensitive. Sections cannot be repeated. ``` Example: ``` [section] ``` **Key** ``` Keys and values form key-value pairs. The key-value pairs are separated by ':' or '='. Spaces on both sides of the separator are allowed but not involved in the parsing (though it's recommended not to leave spaces). Keys are not case-sensitive. Keys are not allowed to be repeated within the same section, but they can be repeated in different sections. ``` **Value** ``` The content after the key-value separator is the value. Values can contain any characters (including '=', ':', etc., which are included in the value). Values can span multiple lines, provided that the indentation of the new line is greater than that of the value, and then the new line will be included in the value's parsing. ``` Example: ``` # The following example is allowed value1 = =aaaa;#[] # Values are allowed to be split into lines. What's enclosed in [] here will not be parsed as a section. value2 = 1 2 3 4 [sss] ``` An example of the **system.ini** file in the Windows system: ```ini ; for 16-bit app support [386Enh] woafont=dosapp.fon EGA80WOA.FON=EGA80WOA.FON EGA40WOA.FON=EGA40WOA.FON CGA80WOA.FON=CGA80WOA.FON CGA40WOA.FON=CGA40WOA.FON [drivers] wave=mmdrv.dll timer=timer.drv [mci] ``` ### Interface #### Creating and Deleting an INI Object ```c ini_t ini_create(void); void ini_delete(ini_t ini); ``` Here, **ini_t** is the structure type for INI. The creation method returns an empty INI object, and the deletion method deletes the passed INI object. #### Adding and Removing a Section ```c int ini_add_section(ini_t ini, const char* section); int ini_remove_section(ini_t ini, const char* section); ``` First, the method for adding a section will add a section with the specified name to the INI object. It returns 1 if the addition is successful and 0 if it fails (for example, if the name is repeated, the addition will fail). The method for removing a section also takes the specified section name as a parameter and returns 1 if the removal is successful and 0 if it fails. When a section is removed, all the key-value pairs under that section will also be removed. #### Getting a Section ```c int ini_section_index(ini_t ini, const char* section); const char* ini_section_name(ini_t ini, int index); ``` These two methods for getting a section are symmetrical. The first one gets the index of the section by its name (the index is the order of sections from the beginning to the end of the INI file, starting from 0). If the specified section is not found, it returns -1. The second one gets the section name by its index. If the section with the given index does not exist, it returns NULL. The INI parser in varch stores sections using a unidirectional linked list with a built-in iterator. Each time it is called, it records the current position. When called again, it first checks whether the position is the same as the previous one. So, when the call is the same as the previous one, the time complexity is **O(1)**, and when traversing using the **ini_section_name** method, the time complexity is **O(n)**. #### Setting and Getting a Value ```c int ini_set_value(ini_t ini, const char* section, const char* key, const char* value); const char* ini_get_value(ini_t ini, const char* section, const char* key); ``` The method for setting a value sets the value of the specified key in the specified section of the INI object to the given value. Besides modifying existing key-value pairs, this method can also be used to add new key-value pairs. First, it checks if the section exists. If not, it adds the section. Then it looks for the key. If the key is not found, it adds the key and sets the value to the specified value. It returns 1 if successful and 0 if it fails. The method for getting a value checks if the key-value pair under the specified section exists. If it exists, it returns the value; otherwise, it returns NULL. #### Removing a Key-Value Pair ```c int ini_remove_key(ini_t ini, const char* section, const char* key); ``` This method removes the key-value pair with the specified key in the specified section. It returns 1 if the removal is successful and 0 if it fails. #### Getting a Key ```c int ini_key_index(ini_t ini, const char* section, const char* key); const char* ini_key_name(ini_t ini, const char* section, int index); ``` The methods for getting a key are very similar to those for getting a section. They respectively find the index by the key name and find the key name by the index. When finding the index, it returns a negative number if it fails, and when finding the key name, it returns NULL if it fails. Similarly, key-value pairs are also stored using a unidirectional linked list with a built-in iterator. When the key name or index is the same as the previous call, the time complexity is O(1). When traversing forward by index, the time complexity is O(n). #### Getting Counts ```c int ini_section_count(ini_t ini); int ini_pair_count(ini_t ini, const char* section); ``` These two methods are used to get the count of sections in the INI object and the count of key-value pairs in the specified section, respectively. #### Dumping an INI Object ```c char* ini_dumps(ini_t ini, int preset, int *len); int ini_file_dump(ini_t ini, char* filename); ``` First, the **ini_dumps** method dumps the INI object into a string according to the format. The preset parameter is used to predict the length of the resulting string. During the conversion of the INI object to a string, space is reallocated as the conversion proceeds. If the predicted length is accurate and the pre-allocated space is sufficient, the frequency of reallocating space can be reduced, improving the conversion efficiency. The *len parameter is the length of the resulting string. When NULL is passed, the length is not obtained. The return value is the converted string, which is allocated by the function and **needs to be freed after use**. The **ini_file_dump** method dumps the INI object into a file based on the **ini_dumps** method. The filename parameter is used to pass in the file name, and the return value is the length of the dumped content. A negative value indicates that the dumping failed. #### Loading an INI Object ```c ini_t ini_loads(const char* text); ini_t ini_file_load(const char* filename); ``` Similar to the dumping methods, an INI object can be loaded from a string text or from a file. If the loading is successful, it returns an INI object; otherwise, it returns NULL. #### INI Loading Errors ```c int ini_error_info(int* line, int* type); ``` The INI parser provided by varch offers accurate error identification. When the INI loading fails, this method can be called to locate the error position and the error type. A return value of 1 indicates that an error occurred during parsing, and 0 indicates no error. The line parameter outputs the line where the error occurred, and the type parameter outputs the error type. The error types are defined as follows: ```c #define INI_E_OK (0) // ok #define INI_E_BRACKETS (1) // missing brackets ']' #define INI_E_DELIM (2) // missing delimiter #define INI_E_KEY (3) // missing key #define INI_E_SECTION (4) // missing section #define INI_E_REKEY (5) // key repeat #define INI_E_RESECTION (6) // section repeat #define INI_E_MEMORY (7) // memory allocation failed #define INI_E_OPEN (8) // fail to open file ``` ### Reference Examples #### Generating an INI File ```c static void test_dump(void) { ini_t ini = NULL; // Define an INI object, usually initialized to NULL ini = ini_create(); // Create an empty INI object if (ini == NULL) { printf("INI creation failed!\r\n"); return; } /* Add sections */ ini_add_section(ini, "Zhang San"); ini_add_section(ini, "Li Si"); ini_add_section(ini, "Wang Wu"); /* Add key-value pairs */ ini_set_value(ini, "Zhang San", "age", "18"); ini_set_value(ini, "Zhang San", "height", "178"); ini_set_value(ini, "Zhang San", "email", "123456@qq.com"); ini_set_value(ini, "Li Si", "age", "20"); ini_set_value(ini, "Li Si", "gender", "man"); ini_set_value(ini, "Li Si", "weight", "65"); ini_set_value(ini, "Wang Wu", "age", "22"); /* Dump the INI object to a file */ ini_file_dump(ini, WRITE_FILE); ini_delete(ini); // Delete the object after use } ``` The dumped file: ```ini [Zhang San] age = 18 height = 178 email = 123456@qq.com [Li Si] age = 20 gender = man weight = 65 [Wang Wu] age = 22 ``` In the example, many function return values are not checked. In actual applications, it's necessary to check the return values. #### Loading an INI File The test file is the **system.ini** file found in the Windows system as mentioned before. ```ini ; for 16-bit app support [386Enh] woafont=dosapp.fon EGA80WOA.FON=EGA80WOA.FON EGA40WOA.FON=EGA40WOA.FON CGA80WOA.FON=CGA80WOA.FON CGA40WOA.FON=CGA40WOA.FON [drivers] wave=mmdrv.dll timer=timer.drv [mci] ``` The loading test code: ```c void test(void) { ini_t ini = NULL; // Define an INI object, usually initialized to NULL /* Load the INI file */ ini = ini_file_load("system.ini"); if (ini == NULL) { int line, type; ini_error_info(&line, &type); printf("INI parsing error! Line %d, error %d.\r\n", line, type); return; } /* Traverse the INI object */ int section_count = ini_section_count(ini); // Get the section count for (int i = 0; i < section_count; i++) { char *section_name = ini_section_name(ini, i); // Get the section name printf("section: [%s]\r\n", section_name); int pair_count = ini_pair_count(ini, section_name); // Get the pair count for (int j = 0; j < pair_count; j++) { char *key = ini_key_name(ini, section_name, j); // Get the key name printf("key[%s], value[%s]\r\n", key, ini_get_value(ini, section_name, key)); // Print the key-value pair } } ini_delete(ini); // Delete the object after use } ``` The running result: ``` section: [386Enh] key[woafont], value[dosapp.fon] key[EGA80WOA.FON], value[EGA80WOA.FON] key[EGA40WOA.FON], value[EGA40WOA.FON] key[CGA80WOA.FON], value[CGA80WOA.FON] key[CGA40WOA.FON], value[CGA40WOA.FON] section: [drivers] key[wave], value[mmdrv.dll] key[timer], value[timer.drv] section: [mci] ``` #### Loading Error Based on the above example, modify the **system.ini** file by deleting the right bracket ']' of the section **[mci]**. Then load it again. ```ini ; for 16-bit app support [386Enh] woafont=dosapp.fon EGA80WOA.FON=EGA80WOA.FON EGA40WOA.FON=EGA40WOA.FON CGA80WOA.FON=CGA80WOA.FON CGA40WOA.FON=CGA40WOA.FON [drivers] wave=mmdrv.dll timer=timer.drv [mci ``` The running result: ``` INI parsing error! Line 13, error 1. ``` In this way, it can be located that an error of type 1 occurred on line 13, which corresponds to ```c #define INI_E_BRACKETS (1) // missing brackets ']' ``` ### Source Code Analysis #### INI Parser Structures All the structures of the INI parser are implicit, meaning that the members of the structures cannot be accessed directly. This approach ensures the independence and security of the module and prevents external calls from modifying the members of the structures, which could otherwise damage the storage structure of INI. Therefore, the INI parser only exposes the declaration of INI in the header file, while the definitions of the structures are placed in the source file. Only the methods provided by the INI parser can be used to operate on INI objects. The declaration of the INI type is as follows: ```c typedef struct INI* ini_t; ``` When using it, just use `ini_t`. ```c typedef struct INI { SECTION* sections; /* sections base */ ITERATOR iterator; /* section iterator */ int count; /* section count */ } INI; ``` The INI structure contains 5 members: `sections` (a linked list of sections), `count` (the count of sections), and `iterator` (the section iterator). The `iterator` is worth special mention as it records the position when accessing sections. When accessing again and it's found to be at the same position, it can quickly return without having to traverse from the beginning. The iterator will be further explained later. ```c typedef struct SECTION { struct SECTION *next; /* link */ char* name; /* section name */ PAIR* pairs; /* pairs base */ ITERATOR iterator; /* pair iterator */ int count; /* pair count */ } SECTION; ``` The `SECTION` structure contains 5 members: `next` (which points to the next `SECTION`, forming a unidirectional linked list), `name` (the section name), `pairs` (a linked list of key-value pairs), `iterator` (the pair iterator), and `count` (the count of pairs). ```c typedef struct PAIR { struct PAIR *next; /* link */ char* key; /* key */ char* value; /* value */ } PAIR; ``` The `PAIR` structure contains 3 members: `next` (which points to the next `PAIR`, forming a unidirectional linked list), `key` (the key), and `value` (the value). ```c typedef struct { void *p; /* iteration pointer */ int i; /* iteration index */ } ITERATOR; ``` Finally, let's look at this iterator. In fact, it's quite simple. It records the current node pointed to in the unidirectional linked list (the `p` member) and the index of the current position in the unidirectional linked list. How it records specifically will be discussed later. The storage structure of the INI class is clear as follows: ``` INI section[0] --> pair[0] --> pair[1] --> pair[2] | k v k v k v | v section[1] --> pair[0] --> pair[1] --> pair[2] | k v k v k v | v section[2] --> pair[0] --> pair[1] | k v k v | v section[3] --> pair[0] . k v . . ``` ### Iteration of Unidirectional Linked Lists The operation of unidirectional linked lists is not the focus here. The built-in iterator is mainly used to improve the access efficiency of unidirectional linked lists, so the iteration process will be emphasized. Taking the iteration of sections as an example, let's explain the process of how this iterator obtains the nodes of the linked list: ```c static SECTION* ini_section(ini_t ini, int index) // Pass in the index { if (index >= ini->count) return NULL; // Check whether the index is out of bounds /* This step is to reset the iteration, that is, to position the iterator back to the beginning of the linked list. The iterator will be reset if any of the following 3 conditions is met: 1. Since it's a unidirectional linked list and cannot be reversed, when the target index is less than the index of the iterator, it needs to be reset and then iterated from the beginning of the linked list to the specified index. 2. When the pointer of the iterator (the `p` member) is NULL, which means it doesn't point to a specific node, it must be reset. So, if you want to reset the iterator externally, you just need to set the `p` member to NULL. 3. When the target index `index` is 0, which means actively obtaining the 0th position, that is, the first position. */ if (index < ini->iterator.i ||!ini->iterator.p || index == 0) { ini->iterator.i = 0; ini->iterator.p = ini->sections; } /* Loop to iterate the iterator to the specified index position. The index of the unidirectional linked list increases positively, so the time complexity is O(n) when traversing forward, and it's still O(n^2) when traversing backward. */ while (ini->iterator.p && ini->iterator.i < index) { ini->iterator.p = ((SECTION *)(ini->iterator.p))->next; ini->iterator.i++; } /* Return the node pointed to by the iterator */ return ini->iterator.p; } ``` When adjusting the linked list, the iterator needs to be adjusted accordingly. The simplest way is to set the `p` member to NULL to reset the iterator. This iterator improves the efficiency when accessing by index. What about accessing by section name? When accessing randomly by section name, first check whether the section name pointed to by the current iterator matches. If it matches, return it directly; if not, it still needs to traverse the linked list from the beginning to find the matching one. ### Explanation of INI Dumping Dumping means "printing" the INI according to a format, but instead of printing it on the console, it's printed in the specified memory space. Where does this space come from? It comes from dynamic memory allocation. How much space should be allocated? This goes back to the `preset` parameter mentioned earlier. If the `preset` is set properly, the appropriate space can be allocated at one time. Otherwise, the space needs to be dynamically adjusted during the dumping process to store these "printed" characters. Now, let's first look at the structure that maintains this printing space: ```c typedef struct { char* address; /**< buffer base address */ unsigned int size; /**< size of buffer */ unsigned int end; /**< end of buffer used */ } BUFFER; ``` It has 3 members in total: `address` (the base address of the space), `size` (the size of the space), and `end` (the index of the end of the used buffer). The process of dynamically adjusting the space is as follows: ```c static int expansion(BUFFER* buf, unsigned int needed) // `needed` is the additional capacity required { char* address; int size; if (!buf ||!buf->address) return 0; /* Calculate the total space required by adding the currently used space and the required space */ needed += buf->end; /* Check if the current space can still meet the need */ if (needed <= buf->size) return 1; /* There is still enough space in the current buf */ /* If the current space size cannot meet the requirement, recalculate the space that can meet the requirement. The new space should not only meet the current need but also leave some margin for the next append operation. Otherwise, it would be a waste of efficiency to reallocate space every time a little capacity is added. And what's the best size for the newly allocated space? It's the smallest power of 2 that is larger than the required size. Why choose the power of 2? 1. The algorithm is easy to implement. It's easy to calculate the smallest power of 2 that is larger than the specified value. 2. The space utilization is good, which is (1 + 2) / 2 = 75%. 3. The number of space reallocations is small. For example, when 128 is finally needed, if the increment is fixed at 10 each time, it needs 13 reallocations, while only 7 reallocations are needed if using the power of 2. */ size = pow2gt(needed); address = (char*)realloc(buf->address, size); if (!address) return 0; buf->size = size; buf->address = address; return 1; } ``` Let's take a piece of dumping code as an example to see how it's used: ```c olen = strlen(sect->name); // First calculate the length of the section name if (expansion(&p, olen + 3) == 0) goto FAIL; // Append the corresponding space. +3 is for other characters /* Copy the characters to the printing space */ p.address[p.end++] = '['; for (k = 0; k < olen; k++) { p.address[p.end++] = sect->name[k]; } p.address[p.end++] = ']'; p.address[p.end++] = '\n'; ``` There's a point to note when dumping values. Values are allowed to span multiple lines, but the indentation of the new line should be greater than that of the key. Since there's no indentation added when printing the key, when the value has a new line, an indentation needs to be inserted after the new line to indicate that the following line belongs to the previous key-value pair. ```c /* Get the length of the value */ olen = 0; k = 0; while (1) { if (!pair->value[olen]) break; if (pair->value[olen] == '\n') k++; // Record the number of lines of the value for inserting indentations olen++; // Record the actual length of the value } if (expansion(&p, olen + k + 1) == 0) goto FAIL; /* Dump the value */ for (k = 0; k < olen; k++) { p.address[p.end++] = pair->value[k]; if (pair->value[k] == '\n') p.address[p.end++] = '\t'; // Insert an indentation } p.address[p.end++] = '\n'; ``` Finally, during the printing process, the INI is traversed, and the space is dynamically adjusted to print each section and key-value pair in order to the specified space. ### Explanation of INI Loading The most important and complex part of the INI parser is the loading and parsing part. 1. First, an empty INI object is created, and then the sections, keys, and values parsed are added to the INI object one by one as the parsing progresses. After the parsing is completed, a complete INI object can be obtained. 2. Before parsing, the line number and errors for parsing need to be reset first. Then, whenever a line break is encountered during the parsing, the line number is incremented accordingly. When a parsing error occurs, the type of the error is recorded. 3. Then comes the actual parsing process. The parsing process involves traversing and judging each character one by one to determine which range it belongs to, such as whether it's a section, a key, a value, or a comment. Once the corresponding range is determined, the corresponding parsing action is executed. The entire parsing is in a large loop: ```c while (*text) { ... ... ... } ``` Since the content specified by the INI syntax is basically divided by lines (values can span lines in special cases), basically, in this loop, information of each line is processed. In the first step of the `while` loop: ```c s = skip(text); depth = s - text; text = s; ``` Skip the useless characters, such as spaces or tabs that have no practical meaning. While skipping the characters, record the indentation, which will be used as the basis for judging whether the next line belongs to a new line or is within the range of the value of the previous line. Immediately after that, check whether it's a comment. Since comments take up a single line, once a comment mark is encountered, the content after it will be ignored and skipped. ```c /* skip comments */ if (iscomment(*text)) { text = lend(text); // Directly position to the end of the line if (*text == '\n') { text++; eline++; continue; } } ``` The rest is to define the ranges of sections, keys, and values and perform the corresponding parsing. The general structure is as follows: ```c /* `scope` represents the boundary range of the previous value. If the text being parsed is greater than `scope`, that is, it exceeds the range of the previous value and no longer belongs to the previous value, then new sections and keys can be parsed. */ if (text >= scope) { /* A section is indicated by starting with [ */ if (*text == '[') { /* A section occupies a single line, and the section name is enclosed by the outermost []. So here, we need to find the outermost []. Since the left one has been found, we need to find the right one. To find the right one, first directly position to the end of the line, and then search backward from the end of the line. Skip the useless characters and find the right ]. */ ... ... ... } else { /* Keys must be at the beginning of a line. If it's not a section part, then it's a key part. The range of a key is determined by finding the first separator, which is either = or :. Once the separator is found, the part before the separator is the range of the key, and the part after it is the range of the value. Since the range of the value can span multiple lines, the range of the value needs to be calculated separately. The premise for a value to span lines is that the indentation of the new line should be greater than that of the key (a blank line is considered separately). Then, a sentinel is used to explore forward to find the position that doesn't belong to the range of the value. */ ... ... ... } } else { /* When it still belongs to the range of the previous value, define which character blocks in each subsequent line are useful, that is, remove the useless characters at the beginning and end of the line, and then append the truly meaningful string to the value. */ ... ... ... } ``` Here, a special note on obtaining the range of the value: ```c static const char* value_scope(const char* text, int depth) { const char *s = text; const char *sentinel = NULL; s = lend(s); // The range directly includes the line after the separator : = while (*s) { sentinel = skip(s + 1); // Let the sentinel move to the meaningful character of the new line /* Skip comments */ if (iscomment(*sentinel)) { s = lend(sentinel); continue; } /* If the indentation of the current line is greater than that of the key, this line is included in the range */ if (sentinel - s - 1 > depth) { s = sentinel; s = lend(s); } /* If the indentation becomes smaller, it doesn't necessarily mean it's out of the range */ else { /* If a meaningful character is encountered, it's out of the range of the value */ if (*sentinel!= 0 && *sentinel!= '\n') break; /* Then directly go to the end of the line. If it's a blank line, it still belongs to the range of the value */ else s = sentinel; } } /* Go back to remove the blank lines at the end that don't belong to the range of the value */ while (*s == 0 || *s == '\n') { sentinel = rskip(s - 1, text); if (*sentinel!= 0 && *sentinel!= '\n') break; s = sentinel; } return s; } ``` ### Explanation of INI Addition, Deletion, Modification, and Query The rest of the operations for adding, deleting, modifying, and querying INI are actually basic operations on the linked list data structure, so there's no need to emphasize them here.