varch/txls.en.md at master

mirror of https://gitee.com/Lamdonn/varch.git synced 2025-12-06 08:46:42 +08:00

Lamdonn 381435fea8 Add readme of English version, update the test code for each module, adjust some modules

2024-12-18 01:31:03 +08:00

23 KiB

Introduction

txls is the text-based table format in varch. Its syntax is modeled on Markdown tables, with somewhat stricter rules so that the raw text stays visually clear. With txls, it's easy to access the row and column contents of a txls file and to generate a neatly formatted, standardized, readable text table file.

Here's a brief introduction to the txls specification and a comparison with the Markdown table syntax.

Table Header

The table header closely follows that of a Markdown table, but the rules are a bit stricter.
Markdown tables don't require '|' delimiters at both ends of a row; txls requires them, for a neater format. The same requirement applies to every subsequent row.
The table header is followed by a divider row, which must have the same number of columns as the table header.
Each cell of the divider row must contain consecutive '-' characters.

Example:

| col 1 | col 2 | col 3              | col 4 | col 5 |
|-------|-------|--------------------|-------|-------|
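
The delimiter rules above can be checked with a few lines of C. The following is a minimal sketch, not code from varch: count_columns is a hypothetical helper that rejects a row lacking a '|' at either end and otherwise counts its cells, treating an escaped "\|" as cell content.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the txls API.
 * Returns the number of cells in one row, or -1 if the row does not
 * start and end with an unescaped '|'. */
int count_columns(const char *row)
{
    size_t len = strlen(row);
    size_t i;
    int pipes = 0;

    /* Ignore a trailing line break. */
    while (len > 0 && (row[len - 1] == '\n' || row[len - 1] == '\r')) len--;

    if (len < 2 || row[0] != '|' || row[len - 1] != '|') return -1;
    if (row[len - 2] == '\\') return -1; /* the closing '|' is escaped */

    for (i = 0; i < len; i++)
    {
        if (row[i] == '\\' && i + 1 < len && row[i + 1] == '|')
        {
            i++; /* skip the escaped delimiter */
            continue;
        }
        if (row[i] == '|') pipes++;
    }

    return pipes - 1; /* n cells are enclosed by n + 1 delimiters */
}
```

Calling it on the header and on the divider row and comparing the two results is one way to enforce the rule that both rows have the same number of columns.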

Rows

In a row, the '|' delimiter separates the columns.
The cell content is the text between two '|' delimiters, with the spaces at both ends stripped.
The cell content can contain the escape sequences "\|", which represents '|', and "<br>", which represents '\n'.

Example:

| col 1 | Zhang san | col 3          | col 4 | col 5 |
|-------|----------:|----------------|-------|-------|
| 11    |        21 | 31             | 41    | 51    |
| 12    |        22 | 1234\|<br>5678 | 42    | 52    |
| 13    |        23 | 33             | 43    | 53    |
| 14    |        24 | 34             | 44    | 54    |
| 15    |        25 | 35             | 45    | 55    |
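
To make the cell rules concrete, here is a small sketch of how the raw text between two delimiters could be turned into cell content. cell_decode is a hypothetical name and the code is an illustration of the rules described above, not the parser used by varch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes the raw text of one cell.
 * Strips spaces at both ends, turns "\|" into '|' and "<br>" into '\n'.
 * Returns a malloc'ed string that the caller must free, or NULL. */
char *cell_decode(const char *raw, size_t len)
{
    char *out, *w;
    size_t i;

    /* Strip spaces at both ends. */
    while (len > 0 && *raw == ' ') { raw++; len--; }
    while (len > 0 && raw[len - 1] == ' ') len--;

    out = malloc(len + 1); /* decoding never makes the text longer */
    if (!out) return NULL;

    w = out;
    for (i = 0; i < len; i++)
    {
        if (raw[i] == '\\' && i + 1 < len && raw[i + 1] == '|')
        {
            *w++ = '|';
            i++;        /* consume the '|' as well */
        }
        else if (len - i >= 4 && memcmp(raw + i, "<br>", 4) == 0)
        {
            *w++ = '\n';
            i += 3;     /* consume the rest of "<br>" */
        }
        else
        {
            *w++ = raw[i];
        }
    }
    *w = '\0';

    return out;
}
```

For the third column of the second data row in the example, the raw text ` 1234\|<br>5678 ` decodes to `1234|`, a newline, and `5678`.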

Alignment

txls supports three alignments: left, right, and center. The alignment of a column is indicated by ':' at either end of the corresponding cell in the divider row.
A ':' on the left indicates left alignment, a ':' on the right indicates right alignment, and a ':' on both sides indicates center alignment. If no ':' is present, the default alignment is used (currently, the default alignment in txls is left alignment).
There can be at most one ':' at each end.

Example:

| left align       |      right align |   center align   | default align    |
|:-----------------|-----------------:|:----------------:|------------------|
| 0                |                0 |        0         | 0                |
| 0123456789abcdef | 0123456789abcdef | 0123456789abcdef | 0123456789abcdef |
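
The alignment markers can be recognized with a short routine. The sketch below is an assumption-laden illustration: divider_align and the ALIGN_* values are hypothetical names (the real constants are TXLS_ALIGN_LEFT and friends, whose values are not shown here), and tolerating spaces around the cell is a choice made for this sketch.

```c
#include <stddef.h>

/* Hypothetical result codes, local to this sketch. */
enum { ALIGN_INVALID = -1, ALIGN_DEFAULT, ALIGN_LEFT, ALIGN_RIGHT, ALIGN_CENTER };

/* Classifies one divider-row cell given as (pointer, length). */
int divider_align(const char *cell, size_t len)
{
    int left = 0, right = 0;
    size_t i;

    /* Tolerate spaces around the cell (an assumption of this sketch). */
    while (len > 0 && *cell == ' ') { cell++; len--; }
    while (len > 0 && cell[len - 1] == ' ') len--;

    /* At most one ':' is accepted at each end. */
    if (len > 0 && cell[0] == ':') { left = 1; cell++; len--; }
    if (len > 0 && cell[len - 1] == ':') { right = 1; len--; }

    /* What remains must be one or more consecutive '-' characters. */
    if (len == 0) return ALIGN_INVALID;
    for (i = 0; i < len; i++)
    {
        if (cell[i] != '-') return ALIGN_INVALID;
    }

    if (left && right) return ALIGN_CENTER;
    if (left) return ALIGN_LEFT;
    if (right) return ALIGN_RIGHT;
    return ALIGN_DEFAULT;
}
```

A second ':' at either end is left in the middle part and fails the '-' check, which is how the "at most one ':' at each end" rule falls out of the code.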

Interfaces

Creating and Deleting txls Objects

txls_t txls_create(int col, int row);
void txls_delete(txls_t txls);

Here, txls_t is the handle type of a txls object. The creation method creates a table with col columns and row rows. If the creation is successful, it returns the handle of the txls object. The deletion method deletes a txls object.

Getting the Number of Columns and Rows

int txls_col(txls_t txls);
int txls_row(txls_t txls);

These two methods are used to return the number of columns and rows of the txls table respectively.

Inserting and Deleting a Column

int txls_insert_col(txls_t txls, int col);
int txls_delete_col(txls_t txls, int col);

These two methods are used to insert and delete a column in txls respectively. The column index starts from 1. If the operation is successful, it will return 1; otherwise, it will return 0.

Inserting and Deleting a Row

int txls_insert_row(txls_t txls, int row);
int txls_delete_row(txls_t txls, int row);

These two methods are used to insert and delete a row in txls respectively. The row index starts from 1. If the operation is successful, it will return 1; otherwise, it will return 0.

Setting and Getting Cell Content

int txls_set_text(txls_t txls, int col, int row, const char* text);
const char* txls_get_text(txls_t txls, int col, int row);

The setting method sets the content of the specified cell to the given text. Avoid spaces at both ends of the text, because they are ignored once the text is written into the table. The text also cannot contain invisible characters other than the newline character. It returns 1 on success and 0 on failure. The getting method returns the content of the specified cell; NULL indicates that the retrieval failed. When row is 0, both methods operate on the table header instead, and the following macros are provided for that:

#define txls_set_head(txls, col, head)
#define txls_get_head(txls, col)

Setting the Alignment Method

int txls_set_align(txls_t txls, int col, int align);

This method is used to set the alignment method of the specified column. The alignment methods include TXLS_ALIGN_LEFT, TXLS_ALIGN_RIGHT, and TXLS_ALIGN_CENTER. Other values represent unknown alignment methods.

txls Object Dumping

char* txls_dumps(txls_t txls, int neat, int* len);
int txls_file_dump(txls_t txls, const char* filename);

The txls_dumps method dumps the txls object into a string in txls format. The neat parameter controls the layout: 0 produces compact (non-neat) output, in which each cell takes only the length of its own content, and any other value produces neat output, in which every cell in a column is padded to the same width. Neat output looks better and is easier to read, but it takes more space because of the padding spaces. The len parameter outputs the length of the resulting string; pass NULL if the length isn't needed. The return value is the resulting string, which is allocated by the function and must be freed by the caller after use.

The txls_file_dump method dumps the txls object into a file, building on txls_dumps. The filename parameter is the name of the file to write. The return value is the length of the dumped content; a negative value indicates that the dump failed.
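
What neat output means for a single cell can be sketched as follows. pad_cell and the PAD_* values are hypothetical names for this illustration only. The split of the padding in the centered case (the odd space goes to the right) matches the tables shown in this document, but the sketch counts bytes and ignores multi-byte characters, which the real implementation may handle differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alignment codes, local to this sketch. */
enum { PAD_LEFT, PAD_RIGHT, PAD_CENTER };

/* Writes text padded with spaces to `width` characters into out.
 * Text longer than width is written in full.
 * Returns the number of characters written, or -1 if out is too small. */
int pad_cell(char *out, size_t cap, const char *text, size_t width, int align)
{
    size_t len = strlen(text);
    size_t pad, left;

    if (len > width) width = len;
    if (cap < width + 1) return -1;

    pad = width - len;
    if (align == PAD_RIGHT) left = pad;           /* all padding before the text */
    else if (align == PAD_CENTER) left = pad / 2; /* odd space goes to the right */
    else left = 0;                                /* all padding after the text */

    memset(out, ' ', left);
    memcpy(out + left, text, len);
    memset(out + left + len, ' ', pad - left);
    out[width] = '\0';

    return (int)width;
}
```

Compact output corresponds to skipping the padding altogether and writing each cell at its own length.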

txls Object Loading

txls_t txls_loads(const char* text);
txls_t txls_file_load(const char* filename);

Similar to the dumping methods, the txls object can be loaded from a string text or a file. If the loading is successful, it will return a txls object; otherwise, it will return NULL.

txls Loading Errors

int txls_error_info(int* line);

The txls parser in varch reports where and why loading failed. When the loading of txls fails, call this method to get the error position and the error type. The line parameter outputs the line on which the error occurred, and the return value is the error type, which is how the loading example in this document uses it (type = txls_error_info(&line)); a return value of 0 (TXLS_E_OK) means that there was no error. The error types are defined as follows:

#define TXLS_E_OK       (0) /* no error */
#define TXLS_E_HEAD     (1) /* irregular header format */
#define TXLS_E_ALLOC    (2) /* failed to allocate space */
#define TXLS_E_BEGIN    (3) /* missing "|" separator at the beginning */
#define TXLS_E_END      (4) /* missing "|" separator at the end */
#define TXLS_E_IDENT    (5) /* missing "-" separator in the divider row */
#define TXLS_E_BRANK    (6) /* there are extra blank lines */
#define TXLS_E_MEMORY   (7) /* memory allocation failed */
#define TXLS_E_OPEN     (8) /* failed to open file */

Reference Examples

Generating a txls File

void test(void)
{
    txls_t x = NULL;  // Define a txls object, usually initialized as NULL

    x = txls_create(4, 5); // Create a 4x5 table
    if (!x)
    {
        return;
    }

    /* Set the table header, leaving the first column blank */
    txls_set_head(x, 2, "Zhang San");
    txls_set_head(x, 3, "Li Si");
    txls_set_head(x, 4, "Wang Wu");

    /* Set the alignment method */
    txls_set_align(x, 2, TXLS_ALIGN_LEFT);
    txls_set_align(x, 3, TXLS_ALIGN_CENTER);
    txls_set_align(x, 4, TXLS_ALIGN_RIGHT);

    /* Use the first column as the information category */
    txls_set_text(x, 1, 1, "age");
    txls_set_text(x, 1, 2, "gender");
    txls_set_text(x, 1, 3, "height");
    txls_set_text(x, 1, 4, "weight");
    txls_set_text(x, 1, 5, "email");

    /* Write each person's information */
    // Zhang San
    txls_set_text(x, 2, 1, "18");
    txls_set_text(x, 2, 2, "man");
    txls_set_text(x, 2, 3, "178.5");
    txls_set_text(x, 2, 4, "65");
    txls_set_text(x, 2, 5, "123321@qq.com");
    // Li Si
    txls_set_text(x, 3, 1, "24");
    txls_set_text(x, 3, 2, "woman");
    txls_set_text(x, 3, 3, "165");
    txls_set_text(x, 3, 4, "48");
    txls_set_text(x, 3, 5, "lisi@163.com");
    // Wang Wu
    txls_set_text(x, 4, 1, "20");
    txls_set_text(x, 4, 2, "man");
    txls_set_text(x, 4, 3, "175");
    txls_set_text(x, 4, 4, "75");
    txls_set_text(x, 4, 5, "ww1234567890@qq.com");

    txls_file_dump(x, "info.txls");

    txls_delete(x);
}

The dumped file info.txls:

|        | Zhang San     |    Li Si     |             Wang Wu |
|--------|:--------------|:------------:|--------------------:|
| age    | 18            |      24      |                  20 |
| gender | man           |    woman     |                 man |
| height | 178.5         |     165      |                 175 |
| weight | 65            |      48      |                  75 |
| email  | 123321@qq.com | lisi@163.com | ww1234567890@qq.com |

The display effect in a Markdown reader:

          Zhang San        Li Si           Wang Wu
age       18               24              20
gender    man              woman           man
height    178.5            165             175
weight    65               48              75
email     123321@qq.com    lisi@163.com    ww1234567890@qq.com

For brevity, the example doesn't check most return values. In real applications, the return values need to be checked.

Loading a txls File

Based on the file generated above, load the file. The loading test code:

void test(void)
{
    txls_t x = NULL;  // Define a txls object, usually initialized as NULL

    /* Load the txls file */
    x = txls_file_load("info.txls");
    if (!x) // If the loading fails, locate the error
    {
        int line, type;
        type = txls_error_info(&line);
        printf("txls parse error! line %d, error %d.\r\n", line, type);
        return;
    }

    /* Traverse the table header to locate the column */
    int col = 0;
    for (col = 1; col <= txls_col(x); col++)
    {
        const char *head = txls_get_head(x, col); // NULL means the retrieval failed
        if (head && strcmp("Li Si", head) == 0)
        {
            break;
        }
    }
    if (col > txls_col(x)) // If not found
    {
        printf("Lookup failed\r\n");
        txls_delete(x); // Release the object before returning
        return;
    }

    /* Print the information */
    printf("name: %s, age=%s, gender: %s, height=%s, weight=%s, email:%s\r\n", 
        txls_get_text(x, col, 0),
        txls_get_text(x, col, 1),
        txls_get_text(x, col, 2),
        txls_get_text(x, col, 3),
        txls_get_text(x, col, 4),
        txls_get_text(x, col, 5));

    txls_delete(x); // Delete the object after using it
}

Running result:

name: Li Si, age=24, gender: woman, height=165, weight=48, email:lisi@163.com

Loading Errors

Based on the above example, modify the file by deleting the '|' delimiter at the end of line 4 (the gender row), and then load it again.

|        | Zhang San     |    Li Si     |             Wang Wu |
|--------|:--------------|:------------:|--------------------:|
| age    | 18            |      24      |                  20 |
| gender | man           |    woman     |                 man 
| height | 178.5         |     165      |                 175 |
| weight | 65            |      48      |                  75 |
| email  | 123321@qq.com | lisi@163.com | ww1234567890@qq.com |

Running result:

txls parse error! line 4, error 4.

This shows that an error of type 4 occurred on line 4, which corresponds to:

#define TXLS_E_END (4) /* missing "|" separator at the end */
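
To illustrate how a parser can arrive at "line 4, error 4", here is a minimal line checker. It is a sketch under stated assumptions, not the varch parser: check_rows is a hypothetical function, the E_* macros are local copies of three of the codes listed above, and ignoring trailing spaces and carriage returns is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of three error codes listed in this document. */
#define E_OK    0
#define E_BEGIN 3
#define E_END   4
#define E_BRANK 6

/* Hypothetical checker: scans text line by line and reports the first
 * line that breaks the delimiter rules. Line numbers start from 1. */
int check_rows(const char *text, int *line)
{
    int n = 1;

    while (*text)
    {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        int err = E_OK;

        /* Ignore trailing spaces and '\r' (an assumption of this sketch). */
        while (len > 0 && (text[len - 1] == ' ' || text[len - 1] == '\r')) len--;

        if (len == 0) err = E_BRANK;
        else if (text[0] != '|') err = E_BEGIN;
        else if (len < 2 || text[len - 1] != '|' || text[len - 2] == '\\') err = E_END;

        if (err != E_OK)
        {
            if (line) *line = n;
            return err;
        }

        if (!eol) break;
        text = eol + 1;
        n++; /* count the line only after it has passed the checks */
    }

    return E_OK;
}
```

Run on the modified file above, such a checker stops at the gender row, because the last non-space character of that line is not '|'.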

Source Code Analysis

txls Parser Structure

The structures within the txls parser are all opaque, meaning that their members can't be accessed directly from outside the module. This keeps the module independent and safe: external code cannot modify the structure members and damage the storage structure of txls. Accordingly, only the declaration of the txls type is placed in the header file, while the actual structure definitions are in the source file. Operations on txls objects can only be carried out through the methods provided by the txls parser.

The txls type is declared as follows:

typedef struct TXLS *txls_t;

When in use, simply use txls_t.

The TXLS structure is defined like this:

typedef struct TXLS
{
    COLUMN *columns;            /* columns base */
    ITERATOR iterator;          /* column iterator */
    int col;                    /* column count */
    int row;                    /* row count */
} TXLS;

It contains four members. columns is a linked list of columns. The iterator is used to keep track of the position during column access, enabling quick returns when accessing the same position again instead of having to traverse from the beginning to the end. col and row represent the number of columns and rows of the table respectively.

The COLUMN structure is:

typedef struct COLUMN
{
    struct COLUMN *next;        /* next column */
    CELL *cells;                /* the cell base address of this column */
    ITERATOR iterator;          /* cell list iterator */
    int align;                  /* alignment */
    int width;                  /* the output width of this column when neatly outputting */
} COLUMN;

It has five members. next points to the next COLUMN to form a singly linked list. cells is a linked list of cells within this column. The iterator is for iterating over the cell list. align indicates the alignment method, and width represents the width of the column when outputting neatly.

The CELL structure is:

typedef struct CELL
{
    struct CELL *next;        /* next cell */
    char* address;            /* address of cell content */
} CELL;

It consists of two members. next points to the next CELL to form a singly linked list, and address stores the content of the cell as a string.

The ITERATOR structure is:

typedef struct
{
    void *p;                    /* iteration pointer */
    int i;                      /* iteration index */
} ITERATOR;

It simply records the node (through the p member) that the current pointer points to in the singly linked list and the index within that list.

Overall, the storage structure of TXLS is clear: columns are linked together in a list, and each column has its own list of cells linked together as well, forming a hierarchical structure like this:

TXLS            <it>
    col[1] --> col[2] --> col[3] --> col[4]
      |          |          |          |    
      V          V          V          V    
    cell[0]    cell[0]    cell[0]    cell[0]
      |          |          |          |    
      V          V          V          V    
    cell[1]    cell[1]    cell[1]    cell[1]
      |          |          |          |    
      V          V          V          V    
    cell[2]    cell[2]    cell[2]    cell[2]

Iteration of Singly Linked Lists

The operation of singly linked lists isn't the main focus here. The built-in iterator aims to improve the access efficiency of singly linked lists, and the iteration process is worth elaborating on. Taking the iteration of the column linked list as an example, here's how the iterator obtains the nodes of the linked list:

The txls_column function is as follows:

static COLUMN *txls_column(txls_t txls, int index, int col)  // Pass in the index and the limited number of columns
{
    if (index >= col) return NULL; // Check if the index is out of bounds

    // This step resets the iteration, positioning the iterator back to the head of the linked list.
    // The iterator will be reset if any of the following three conditions is met:
    // 1. Since it's a singly linked list and can't be traversed backward, when the target index is less than the iterator's index, it needs to be reset and then iterated from the head of the list to the specified index.
    // 2. When the iterator pointer (the p member) is NULL, which means it doesn't point to a specific node, it must be reset. So, externally, if you want to reset the iterator, you just need to set the p member to NULL.
    // 3. When the target index index is 0, which means actively getting the 0th position, i.e., the first position.
    if (index < txls->iterator.i || !txls->iterator.p || index == 0)
    {
        txls->iterator.i = 0;
        txls->iterator.p = txls->columns;
    }

    // Loop to advance the iterator to the specified index position.
    // A singly linked list can only be walked forward, so visiting every node in increasing index order costs O(n) in total thanks to the cached position, while visiting them in decreasing order resets the iterator on each access and costs O(n^2) in total.
    while (txls->iterator.p && txls->iterator.i < index)
    {
        txls->iterator.p = ((COLUMN *)(txls->iterator.p))->next;
        txls->iterator.i++;
    }

    // Return the node pointed to by the iterator.
    return txls->iterator.p;
}

When adjusting the linked list, the iterator also needs to be adjusted accordingly. The simplest way is to set the p member to NULL to reset the iterator.
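
The reason for the reset is that the cached pointer may refer to a node that has just been freed, or the cached index may no longer match after nodes shift. The sketch below shows the pattern on a generic list. NODE, LIST, list_at, list_push_front and list_delete are hypothetical names used only for this illustration; the ITERATOR layout is the one shown above.

```c
#include <stdlib.h>

typedef struct NODE { struct NODE *next; int value; } NODE;
typedef struct { void *p; int i; } ITERATOR;
typedef struct { NODE *head; ITERATOR it; int count; } LIST;

/* Cached lookup, following the same idea as txls_column. */
NODE *list_at(LIST *l, int index)
{
    if (index < 0 || index >= l->count) return NULL;
    if (index < l->it.i || !l->it.p)
    {
        l->it.i = 0;
        l->it.p = l->head;
    }
    while (l->it.p && l->it.i < index)
    {
        l->it.p = ((NODE *)l->it.p)->next;
        l->it.i++;
    }
    return l->it.p;
}

int list_push_front(LIST *l, int value)
{
    NODE *n = malloc(sizeof *n);
    if (!n) return 0;
    n->value = value;
    n->next = l->head;
    l->head = n;
    l->count++;
    l->it.p = NULL; /* every index shifted by one: reset the iterator */
    return 1;
}

int list_delete(LIST *l, int index)
{
    NODE *victim;

    if (index < 0 || index >= l->count) return 0;
    if (index == 0)
    {
        victim = l->head;
        l->head = victim->next;
    }
    else
    {
        NODE *prev = list_at(l, index - 1);
        victim = prev->next;
        prev->next = victim->next;
    }
    free(victim);
    l->count--;
    l->it.p = NULL; /* the cached node may be the one just freed */
    return 1;
}
```

Resetting unconditionally costs one extra walk from the head on the next access, which is a simple trade against tracking exactly which cached positions remain valid.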

Explanation of txls Dumping

Dumping means printing the txls in a specific format, not to the console but into a memory buffer. The buffer is allocated dynamically. How much space is needed? For neat output (neat is not 0), the required space can be predicted from the column widths. For compact output (neat is 0), the size has to be adjusted dynamically during the dump.
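
As a rough illustration of that prediction, the size of a neat dump follows directly from the layout of the example files: every cell is written as "| ", the padded content, and one space, and each line ends with "|" and a newline. neat_dump_size is a hypothetical function written for this sketch; how the real implementation computes its estimate is not shown here.

```c
#include <stddef.h>

/* Hypothetical estimate of the buffer size needed for a neat dump.
 * widths[c] is the content width of column c. */
size_t neat_dump_size(const int *widths, int cols, int rows)
{
    size_t line = 2; /* closing '|' and '\n' */
    int c;

    for (c = 0; c < cols; c++)
    {
        line += (size_t)widths[c] + 3; /* "| " + padded content + " " */
    }

    /* header row + divider row + data rows, plus the terminating '\0' */
    return line * ((size_t)rows + 2) + 1;
}
```

For the info.txls file shown earlier, the column widths are 6, 13, 12 and 19, so each line takes 64 characters including the newline, and the 7 lines need 449 bytes including the terminator.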

First, let's look at the structure that maintains the printing space:

typedef struct
{
    char* address;              /* buffer base address */
    int size;                   /* size of buffer */
    int end;                    /* end of buffer used */
} BUFFER;

It has three members. address is the base address of the space, size is the size of the space, and end is the index of the used end.

The process of dynamically adjusting the space is as follows:

static int buf_append(BUFFER *buf, int needed) // needed is the additional capacity required
{
    char* address;
    int size;

    if (!buf || !buf->address) return 0;

    // Calculate the total space required by adding the currently used space and the needed space.
    needed += buf->end;

    // Check if the current space can still meet the requirement.
    if (needed <= buf->size) return 1; /* there is still enough space in the current buf */

    // If the current space can't meet the requirement, recalculate a size that can.
    // The new space should not only meet the current need but also leave some margin for the next append operation.
    // Otherwise, reallocating space every time a small amount of capacity is added would be very inefficient.
    // And what's the best size for the newly allocated space? A power of 2: the smallest power of 2 greater than the required amount.
    // Why choose a power of 2?
    // 1. The algorithm is easy to implement. Calculating the smallest power of 2 greater than a specified value is straightforward.
    // 2. It has good space utilization: the buffer ends up between 50% and 100% full, so about (50% + 100%) / 2 = 75% on average.
    // 3. The number of space reallocations is small. For example, to reach 128 units of space, growing by 10 units each time takes 13 reallocations, while doubling from 1 takes only 7 (2, 4, 8, 16, 32, 64, 128).
    size = pow2gt(needed);
    address = (char*)realloc(buf->address, size);
    if (!address) return 0;

    buf->size = size;
    buf->address = address;

    return 1;
}
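
The pow2gt helper is not listed in this document. One common way to write such a function is the bit-smearing trick below. This is a sketch under assumptions: pow2gt_sketch is a hypothetical name, int is assumed to be at least 32 bits wide, and the sketch returns an exact power of 2 unchanged, which may differ from how the real helper treats that case.

```c
/* Hypothetical stand-in for pow2gt: returns the smallest power of 2 that
 * is not less than x, or 0 if the result does not fit in an int. */
int pow2gt_sketch(int x)
{
    unsigned int v;

    if (x <= 1) return 1;
    if (x > (1 << 30)) return 0; /* next power of 2 would overflow int */

    v = (unsigned int)x - 1;

    /* Copy the highest set bit into every lower position. */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    return (int)(v + 1);
}
```

After the shifts, v has the form 0...011...1, so adding 1 yields the next power of 2 without a loop.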

Here's an example of how it's used in the dumping code:

if (!buf_append(buf, 2)) return 0; // First, expand the space
buf_push(buf, '|'); // Call the function to push the character into the buf
buf_push(buf, '\n');

When dumping cells, there's a point to note. Cells are allowed to have newline characters and the delimiter '|'. The newline character is replaced with <br>, and the delimiter '|' is escaped as \|. The code for handling this is like this:

while (addr && *addr)
{
    if (*addr == '\n')
    {
        buf_push(buf, '<');
        buf_push(buf, 'b');
        buf_push(buf, 'r');
        buf_push(buf, '>');
    }
    else if (*addr == '|')
    {
        buf_push(buf, '\\');
        buf_push(buf, '|');
    }
    else
    {
        buf_push(buf, *addr);
    }
    addr++;
}
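
The same substitution can be tried in isolation, without the BUFFER machinery. The sketch below is not the varch code: cell_encode is a hypothetical function that first counts how much the text grows and then writes the escaped copy into a single allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of text in which
 * '\n' is written as "<br>" and '|' as "\|". The caller frees it. */
char *cell_encode(const char *text)
{
    size_t extra = 0;
    const char *p;
    char *out, *w;

    /* First pass: count how many extra bytes the escapes need. */
    for (p = text; *p; p++)
    {
        if (*p == '\n') extra += 3;     /* 1 character becomes 4 */
        else if (*p == '|') extra += 1; /* 1 character becomes 2 */
    }

    out = malloc(strlen(text) + extra + 1);
    if (!out) return NULL;

    /* Second pass: write the escaped text. */
    w = out;
    for (p = text; *p; p++)
    {
        if (*p == '\n')
        {
            memcpy(w, "<br>", 4);
            w += 4;
        }
        else if (*p == '|')
        {
            *w++ = '\\';
            *w++ = '|';
        }
        else
        {
            *w++ = *p;
        }
    }
    *w = '\0';

    return out;
}
```

Counting first avoids reallocation entirely, at the cost of reading the text twice; the buffer approach shown above makes the opposite trade.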

Finally, when printing, the txls is traversed, and the space is dynamically adjusted to print each cell in each column in sequence into the specified space.

Explanation of txls Loading

Loading and parsing is the most important and most complex part of the txls parser.

First, a 0x0 txls table object is created. As parsing progresses, the parsed table header, rows, and columns are added to the txls object one by one. Once parsing is complete, the txls object is complete as well.
Before parsing, the line number and the error state are reset. During parsing, the line number is incremented each time a newline character is encountered, and if an error occurs, the error type is recorded.
The actual parsing is divided into two steps. The first step parses the table header and the divider row; it determines the total number of columns, the alignment of each column, and whether the header syntax is valid. The second step, based on the columns determined in the first step, parses the rows one by one, splits each row into cells, and stores the cell contents at the corresponding row and column.

The entire parsing process is shown as follows:

/* Create an empty table */
txls = txls_create(0, 0);
...

/* Reset error information */
etype = TXLS_E_OK;
eline = 1;

/* Parse the table header */
s = parse_head(text, txls);
if (etype) goto FAIL;
while (1)
{
    /* Parse each row */
    s = parse_line(s, txls);
    if (etype) goto FAIL;
    if (!*s) break;
}
return txls;

Because txls syntax is organized line by line (with the special case that a value can span multiple lines), essentially all of the row-by-row processing happens inside this loop.

Explanation of txls Insertion, Deletion, Modification, and Query

The remaining operations regarding insertion, deletion, modification, and query for txls are basically common operations on linked list data structures and don't require special emphasis here.
