varch/vector.en.md at 70d1741babab8d09389f1caa76ebac697413d397

mirror of https://gitee.com/Lamdonn/varch.git synced 2025-12-06 16:56:42 +08:00

Lamdonn 381435fea8 Add readme of English version, update the test code for each module, adjust some modules

2024-12-18 01:31:03 +08:00


Introduction

The vector container is similar to an array. It wraps the operations commonly performed on arrays and can replace ordinary arrays in most cases while offering better safety. An array has static storage, so its size can't change once it is defined; a vector is dynamic, and its size can be adjusted at run time.

Like an array, a varch vector also supports direct random access by index.

Interfaces

Creating and Deleting vector Objects

vector_t vector_create(int dsize, int size);
void vector_delete(vector_t vector);
#define vector(type, size) // For more convenient use, a macro definition is wrapped around vector_create
#define _vector(vector) // A macro definition is wrapped around vector_delete, and the vector is set to NULL after deletion

Here, vector_t is the handle type for a vector. vector_create returns a vector object on success, or NULL on failure. The dsize parameter is the size of one data element in bytes, and size is the initial length of the vector (the number of data elements). vector_delete deletes the vector object passed to it. Creation and deletion should be used in pairs: once a vector object is no longer needed, it should be deleted. For example:

void test(void)
{
    vector_t vt = vector(int, 16); // Create an int-type vector with a length of 16
    int array[16];

    _vector(vt); // Delete vt
}
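The wrapper macros are not expanded in this document. The sketch below shows one plausible way to write such a create/delete pair. The names sk_vector_t, sk_create, sk_delete, sk_new and sk_free are hypothetical stand-ins, not the varch definitions, and capacity growth is left out.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a vector handle (not the varch type). */
typedef struct SK_VECTOR
{
    void *base; /* storage for the elements */
    int dsize;  /* size of one element in bytes */
    int size;   /* number of elements */
} *sk_vector_t;

/* Allocate a vector of `size` elements of `dsize` bytes each; NULL on failure. */
sk_vector_t sk_create(int dsize, int size)
{
    sk_vector_t v;
    if (dsize <= 0 || size < 0) return NULL;
    v = malloc(sizeof *v);
    if (!v) return NULL;
    /* Allocate at least one element so a length-0 vector still owns storage. */
    v->base = malloc((size_t)dsize * (size_t)(size > 0 ? size : 1));
    if (!v->base) { free(v); return NULL; }
    v->dsize = dsize;
    v->size = size;
    return v;
}

/* Release the storage and the handle; deleting NULL is a no-op. */
void sk_delete(sk_vector_t v)
{
    if (!v) return;
    free(v->base);
    free(v);
}

/* Convenience wrappers in the spirit of vector() and _vector(). */
#define sk_new(type, size) sk_create((int)sizeof(type), (size))
#define sk_free(v) do { sk_delete(v); (v) = NULL; } while (0)
```

Resetting the handle to NULL in the delete wrapper means an accidental reuse after deletion is rejected by the NULL checks instead of touching freed memory.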

Reading and Writing vector Data

void* vector_data(vector_t vector, int index);
#define vector_at(vector, type, i)
#define v2a(vector, type)

The vector_data method returns the address of the element at the given index, or NULL if the operation fails. The vector_at macro builds on vector_data by adding the element type, and v2a lets the vector be accessed as an ordinary array with the [] subscript. Here's an example:

void test(void)
{
    vector_t vt = vector(int, 8); // Define and create an int-type vector with a length of 8
    int i = 0;

    for (i = 0; i < 8; i++)
    {
        vector_at(vt, int, i) = i; // Use the at method to access
    }

    for (i = 0; i < 8; i++)
    {
        printf("vt[%d] = %d\r\n", i, v2a(vt, int)[i]); // Use the subscript to access
    }

    _vector(vt); // Delete the vector after using it
}

The result is:

vt[0] = 0
vt[1] = 1
vt[2] = 2
vt[3] = 3
vt[4] = 4
vt[5] = 5
vt[6] = 6
vt[7] = 7
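The definitions of vector_at and v2a are not shown here. As a rough sketch, typed access macros of this kind could be layered on an address lookup as follows. The names sk_vec, sk_data, sk_at, sk_v2a and sk_get_int are hypothetical, and the real varch macros may be written differently.

```c
#include <stddef.h>

/* Hypothetical stand-in for the vector's bookkeeping (not the varch struct). */
typedef struct
{
    void *base; /* address of the first element */
    int dsize;  /* size of one element in bytes */
    int size;   /* number of elements */
} sk_vec;

/* Address of element `index`, or NULL if the vector or index is invalid. */
void *sk_data(sk_vec *v, int index)
{
    if (!v || index < 0 || index >= v->size) return NULL;
    return (unsigned char *)v->base + (size_t)index * (size_t)v->dsize;
}

/* Typed element access: dereferences the address returned by sk_data. */
#define sk_at(v, type, i) (*(type *)sk_data((v), (i)))
/* Array view: the address of element 0, cast to the element type. */
#define sk_v2a(v, type) ((type *)sk_data((v), 0))

/* Read element `index` through the checked path; returns 0 if out of range. */
int sk_get_int(sk_vec *v, int index, int *out)
{
    int *p = sk_data(v, index);
    if (!p) return 0;
    *out = *p;
    return 1;
}
```

With macros written this way, an out-of-range index passed to sk_at would dereference NULL, and the array view performs no bounds check at all, so the checked path through the address lookup is the safer choice when the index is not known to be valid.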

Size and Capacity of vector

int vector_size(vector_t vector);
int vector_capacity(vector_t vector);

The size of a vector is easy to understand: like the length of an array, it is the number of elements the vector holds. The capacity is the number of elements that the currently allocated space can hold. Because a vector is a dynamic array, some extra space is usually reserved so that it can grow without reallocating every time. The capacity is therefore always greater than or equal to the size. In practice, it is usually enough to pay attention to the size and ignore the capacity. For example:

void test(void)
{
    vector_t vt = vector(int, 10);
    printf("size=%d, capacity=%d\r\n", vector_size(vt), vector_capacity(vt));
    _vector(vt);
}

The result is:

size=10, capacity=12
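The capacity of 12 follows from the growth rule analyzed in the Source Code Analysis section: the largest power of 2 below 10 is 8, which falls in the range that uses a step of 8 >> 1 = 4, and 10 rounded up to a multiple of 4 is 12. A minimal check of that arithmetic, using the hypothetical helpers round_up and capacity_for_10 (neither is part of varch):

```c
/* Round x up to the nearest multiple of mul (assumes x >= 1 and mul >= 1). */
int round_up(int x, int mul)
{
    return ((x + mul - 1) / mul) * mul;
}

/* Capacity for a size of 10 under the rule described above. */
int capacity_for_10(void)
{
    int pow2 = 8;         /* largest power of 2 smaller than 10 */
    int step = pow2 >> 1; /* step used while 4 <= pow2 < 16 */
    return round_up(10, step);
}
```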

Adjusting the Size of vector

int vector_resize(vector_t vector, int size);
#define vector_clear(vector)

vector_resize changes the size of the vector container, either expanding or shrinking it. When expanding, the additional space is appended after the existing elements and is not initialized (its contents are not necessarily 0). When shrinking, the elements at the tail are discarded. The vector_clear macro uses vector_resize to set the size to 0. Here's an example:

void test(void)
{
    vector_t vt = vector(int, 10);
    printf("size=%d\r\n", vector_size(vt));
    vector_resize(vt, 32);
    printf("size=%d\r\n", vector_size(vt));
    vector_resize(vt, 14);
    printf("size=%d\r\n", vector_size(vt));
    _vector(vt);
}

The result is:

size=10
size=32
size=14
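Because the appended space is not initialized, code that relies on zeroed elements has to clear the new tail itself. The sketch below shows the idea on a plain int buffer; grow_zeroed is a hypothetical helper, not a varch function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of varch): grow an int buffer from old_n to
 * new_n elements and zero the added tail. Returns the new buffer, or NULL on
 * allocation failure, in which case the old buffer is still valid.
 */
int *grow_zeroed(int *buf, int old_n, int new_n)
{
    int *p;
    if (old_n < 0 || new_n <= old_n) return buf; /* nothing to grow */
    p = realloc(buf, (size_t)new_n * sizeof *p);
    if (!p) return NULL;
    /* Only the elements beyond the old length are cleared. */
    memset(p + old_n, 0, (size_t)(new_n - old_n) * sizeof *p);
    return p;
}
```

With a varch vector the same effect could presumably be obtained by recording vector_size before the call to vector_resize and then clearing the range that starts at the address returned by vector_data for the old size.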

Inserting and Removing Elements in vector

int vector_insert(vector_t vector, int index, void* data, int num);
int vector_erase(vector_t vector, int index, int num);

The insertion method copies num elements from the address data into the vector, starting at the specified index; the removal method removes num elements starting from the specified index. Here's an example:

void test(void)
{
    vector_t vt = vector(int, 0); // Create a vector container with a length of 0
    int array[10] = {0,1,2,3,4,5,6,7,8,9};
    int i = 0;

    printf("insert:\r\n");
    vector_insert(vt, 0, array, 10); // Insert array into vt, which is equivalent to assigning the value of array to vt
    for (i = 0; i < vector_size(vt); i++)
    {
        printf("vt[%d]=%d\r\n", i, v2a(vt, int)[i]);
    }

    printf("erase:\r\n");
    vector_erase(vt, 5, 2); // Remove two data elements starting from index 5, that is, the data at index 5 and index 6
    for (i = 0; i < vector_size(vt); i++)
    {
        printf("vt[%d]=%d\r\n", i, v2a(vt, int)[i]);
    }
    
    _vector(vt);
}

The result is:

insert:
vt[0]=0
vt[1]=1
vt[2]=2
vt[3]=3
vt[4]=4
vt[5]=5
vt[6]=6
vt[7]=7
vt[8]=8
vt[9]=9
erase:
vt[0]=0
vt[1]=1
vt[2]=2
vt[3]=3
vt[4]=4
vt[5]=7
vt[6]=8
vt[7]=9

Based on the insertion and removal methods, the following convenience macros are also provided:

#define vector_push_front(vector, data)
#define vector_push_back(vector, data)
#define vector_pop_front(vector)
#define vector_pop_back(vector)
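The macro bodies are not shown in this document. The sketch below illustrates how push and pop can be expressed as insert and erase at the two ends. Everything prefixed with sk_ is a hypothetical stand-in: the container has a fixed capacity to keep the example short, and the push macros take a pointer to the element, which is an assumption about the calling convention rather than a statement about varch.

```c
#include <string.h>

#define SK_CAP 16 /* fixed capacity keeps the sketch short */

/* Hypothetical stand-in for a vector of ints (not the varch type). */
typedef struct
{
    int data[SK_CAP];
    int size;
} sk_ivec;

/* Insert num ints from src at index, shifting the tail backward. */
int sk_insert(sk_ivec *v, int index, const int *src, int num)
{
    if (!v || !src || index < 0 || index > v->size) return 0;
    if (num <= 0 || num > SK_CAP - v->size) return 0;
    memmove(v->data + index + num, v->data + index,
            (size_t)(v->size - index) * sizeof(int));
    memcpy(v->data + index, src, (size_t)num * sizeof(int));
    v->size += num;
    return 1;
}

/* Erase up to num ints starting at index, shifting the tail forward. */
int sk_erase(sk_ivec *v, int index, int num)
{
    if (!v || index < 0 || index >= v->size || num <= 0) return 0;
    if (num > v->size - index) num = v->size - index;
    memmove(v->data + index, v->data + index + num,
            (size_t)(v->size - index - num) * sizeof(int));
    v->size -= num;
    return 1;
}

/* Push and pop as thin wrappers over insert and erase at the two ends. */
#define sk_push_front(v, p) sk_insert((v), 0, (p), 1)
#define sk_push_back(v, p)  sk_insert((v), (v)->size, (p), 1)
#define sk_pop_front(v)     sk_erase((v), 0, 1)
#define sk_pop_back(v)      sk_erase((v), (v)->size - 1, 1)
```

In this sketch, popping from an empty container fails cleanly: the computed index is rejected by the range check in sk_erase, which returns 0.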

Reference Examples

void test(void)
{
    vector_t vt_vector = vector(vector_t, 3); // A vector of type vector_t
    int i = 0;
    char *name[3] = { // Use these three names as data sources to generate a new vector
        "ZhangSan",
        "LiSi",
        "WangWu"
    };

    // Traverse vt_vector
    for (i = 0; i < vector_size(vt_vector); i++)
    {
        v2a(vt_vector, vector_t)[i] = vector(char, 0); // Create a new char-type vector for each item in vt_vector
        vector_insert(v2a(vt_vector, vector_t)[i], 0, name[i], strlen(name[i]) + 1); // Insert the names into the vector
    }

    for (i = 0; i < vector_size(vt_vector); i++)
    {
        printf("vt_vector[%d]: %s\r\n", i, &vector_at(v2a(vt_vector, vector_t)[i], char, 0)); // Print the names recorded in vt_vector
        _vector(v2a(vt_vector, vector_t)[i]); // Delete the vectors under vt_vector
    }
    
    _vector(vt_vector);
}

The result is:

vt_vector[0]: ZhangSan
vt_vector[1]: LiSi
vt_vector[2]: WangWu

The examples above ignore most return values for brevity. In practical applications, the return values should be checked.

Source Code Analysis

vector Structure

All structures of the vector container are opaque: their members can't be accessed directly from outside the module. This design keeps the module independent and safe, because external code can't modify the structure members and corrupt the vector's storage. The header file therefore contains only the declaration of the vector type, and the structure definition is placed in the source file. Vector objects can be operated on only through the methods provided by the vector container. The vector type is declared as follows:

typedef struct VECTOR *vector_t;

When using it, just use vector_t.
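The same opaque-handle pattern can be sketched on a much smaller type. The counter module below is purely illustrative and has nothing to do with varch; in a real project the first part would live in a header file and the second part in a source file.

```c
#include <stdlib.h>

/* --- what a header would expose: only the handle type and the functions --- */
typedef struct COUNTER *counter_t;
counter_t counter_create(void);
void counter_delete(counter_t c);
int counter_next(counter_t c);

/* --- what the source file would keep private: the structure definition --- */
struct COUNTER
{
    int value;
};

/* Allocate a counter starting at 0; returns NULL on failure. */
counter_t counter_create(void)
{
    counter_t c = malloc(sizeof *c);
    if (c) c->value = 0;
    return c;
}

/* Release the counter; deleting NULL is a no-op. */
void counter_delete(counter_t c)
{
    free(c);
}

/* Advance the counter and return its new value, or -1 for a NULL handle. */
int counter_next(counter_t c)
{
    if (!c) return -1;
    return ++c->value;
}
```

Code that includes only the header part can pass counter_t values around but can't read or write value, because the structure is an incomplete type there.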

The VECTOR structure is defined as follows:

/* type of vector */
typedef struct VECTOR
{
    void* base;                    /* base address for storing data */
    int dsize;                    /* size of item */
    int size;                    /* size of vector */
    int capacity;                /* capacity of vector */
} VECTOR;

The VECTOR structure contains four members. base is the base address where the data is actually stored, dsize represents the size of each data element, size is the size (length) of the vector, and capacity is the capacity of the vector.

Dynamic Size of vector

Judging from the structure members, the logic implemented by vector to achieve dynamic size adjustment isn't too complicated. First, calculate the size of the data to be stored, and then call malloc to dynamically allocate the specified space. When it's time to adjust the size, use realloc to reallocate the space. Let's look at the source code:

int vector_resize(vector_t vector, int size)
{
    void* base = NULL;
    int capacity;
    if (!vector) return 0;
    if (size < 0) return 0;
    capacity = gradient_capacity(size); // Calculate how much capacity the new space needs
    if (capacity != vector->capacity) // If the calculated capacity is different from the current capacity, reallocate the space
    {
        base = realloc(vector->base, capacity * vector->dsize);
        if (!base) return 0;
        vector->base = base;
        vector->capacity = capacity;
    }
    vector->size = size; // Update the new size
    return 1;
}

As shown in the above code, implementing dynamic size adjustment is that simple.

But how is the capacity size calculated?

#define up_multiple(x, mul) ((x)+((mul)-((x)-1)%(mul))-1) /* get the smallest multiple of 'mul' that is not less than 'x' */
static int gradient_capacity(int size)
{
    int capacity = 1;
    if (size <= 1) return 1;
    while (capacity < size) capacity <<= 1; // Find the smallest power of 2 not smaller than size
    capacity >>= 1; // Find the largest power of 2 smaller than size
    if (capacity < 4) capacity = capacity << 1;
    else if (capacity < 16) capacity = up_multiple(size, capacity >> 1);
    else if (capacity < 256) capacity = up_multiple(size, capacity >> 2);
    else capacity = up_multiple(size, 64);
    return capacity;
}

This function assigns capacities in gradients: all sizes within one gradient step share the same capacity. It first finds the largest power of 2 smaller than size and then rounds size up to a multiple of a step derived from that power of 2. The step grows as the power of 2 crosses the thresholds 4, 16, and 256, and is capped at 64. The following code can be used to print the steps produced by this algorithm:

int size = 0, capacity = 0;
for (size = 1; size < 1024; )
{
    capacity = gradient_capacity(size);
    printf("+%d\t%d\r\n", capacity - size + 1, capacity);
    size = capacity + 1;
}

The result is:

+1      1
+1      2
+2      4
+2      6
+2      8
+4      12
+4      16
+4      20
+4      24
+4      28
+4      32
+8      40
+8      48
+8      56
+8      64
+16     80
+16     96
+16     112
+16     128
+32     160
+32     192
+32     224
+32     256
+64     320
+64     384
+64     448
+64     512
+64     576
+64     640
+64     704
+64     768
+64     832
+64     896
+64     960
+64     1024

At the end, the maximum increase is only 64.
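The table can also be checked programmatically. The sketch below repeats the gradient_capacity listing so that it is self-contained and adds two hypothetical helpers, count_capacity_steps and max_slack, which are not part of varch.

```c
#define up_multiple(x, mul) ((x) + ((mul) - ((x) - 1) % (mul)) - 1)

/* Copy of the gradient_capacity listing above. */
int gradient_capacity(int size)
{
    int capacity = 1;
    if (size <= 1) return 1;
    while (capacity < size) capacity <<= 1;
    capacity >>= 1;
    if (capacity < 4) capacity = capacity << 1;
    else if (capacity < 16) capacity = up_multiple(size, capacity >> 1);
    else if (capacity < 256) capacity = up_multiple(size, capacity >> 2);
    else capacity = up_multiple(size, 64);
    return capacity;
}

/* Number of distinct capacities seen while size grows from 1 to max_size. */
int count_capacity_steps(int max_size)
{
    int size, steps = 0, last = 0;
    for (size = 1; size <= max_size; size++)
    {
        int capacity = gradient_capacity(size);
        if (capacity != last) { steps++; last = capacity; }
    }
    return steps;
}

/* Largest amount of unused capacity over sizes 1..max_size. */
int max_slack(int max_size)
{
    int size, worst = 0;
    for (size = 1; size <= max_size; size++)
    {
        int slack = gradient_capacity(size) - size;
        if (slack > worst) worst = slack;
    }
    return worst;
}
```

For sizes up to 1024 this yields 35 distinct capacities, one per row of the table, and never more than 63 unused elements. The trade-off is that growth becomes linear beyond 256 elements, so very large vectors that grow one element at a time reallocate more often than they would under a doubling strategy.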

Insertion and Removal Principle of vector

Insertion works by moving the data after the specified position backward as a block to make room, and then copying the new data into the gap. Removal works the other way: the data after the removed segment is moved forward as a block to overwrite it.

First, let's look at the source code for insertion:

int vector_insert(vector_t vector, int index, void* data, int num)
{
    int i = 0;
    int size;
    if (!vector) return 0;
    if (index < 0 || index > vector->size) return 0;
    if (num == 0) return 0;
    size = vector->size; // Record the original size first, which will be needed when moving the data later. After the vector is resized, vector->size will change, and the original tail index won't be accessible.
    if (!vector_resize(vector, vector->size + num)) return 0; // Expand the vector to reserve space for the inserted data
    if (index < size) memmove(at(index + num), at(index), vector->dsize * (size - index)); // Move the data backward as a whole
    if (data) memcpy(at(index + i), data, vector->dsize * num); // Copy the new data in
    return 1;
}

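The earlier the insertion position, the more elements have to be moved, and the slower the insertion. In this listing, at(i) appears to be an internal helper that yields the address of element i, and i is still 0 when memcpy is called, so at(index + i) is the insertion position.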
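A rough cost model makes the difference concrete. Each call to vector_insert shifts the size - index elements that follow the insertion point, so the total work depends on where the elements go. The helper elements_moved below is hypothetical and only counts shifted elements; it does not touch a real vector.

```c
/*
 * Total number of elements shifted when n elements are inserted one at a
 * time, either always at the front (at_front != 0) or always at the back.
 */
long elements_moved(int n, int at_front)
{
    long total = 0;
    int size;
    for (size = 0; size < n; size++)
    {
        int index = at_front ? 0 : size;
        total += size - index; /* elements after index that must be shifted */
    }
    return total;
}
```

Inserting 1000 elements one by one at the front shifts 499500 elements in total, while appending them at the back shifts none. When the data is already available as an array, a single vector_insert call with num set to the array length shifts the existing tail only once.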

Now, let's look at the source code for removal:

int vector_erase(vector_t vector, int index, int num)
{
    unsigned char *op_ptr = NULL;
    if (!vector) return 0;
    if (vector->size == 0) return 0;
    if (index < 0 || index >= vector->size) return 0;
    if (num <= 0) return 0;
    if (num > vector->size - index) num = vector->size - index; // If num exceeds the number of data elements at the tail, just remove all the data at the tail.
    memmove(at(index), at(index + num), vector->dsize * (vector->size - (index + num))); // Move the data forward
    vector_resize(vector, vector->size - num); // Shrink the vector to remove the latter part
    return 1;
}
