Introduction

A heap is a complete binary tree that satisfies the heap property. There are two main types: the max heap and the min heap.

  1. Concept:

    • A heap is a complete binary tree: every level is full except possibly the last, and the nodes in the last level are packed as far left as possible.
    • Every node satisfies the heap property. In a max heap, the value of each node is greater than or equal to the values of its children; in a min heap, it is less than or equal to them.
  2. Structure:

    • A heap is usually implemented with an array, because a complete binary tree maps onto an array with no gaps.
    • With the root stored at index 0, the parent and children of the node at index i are found with the following formulas:
      • Parent node index: parent(i) = (i - 1) / 2, using integer division (rounded down), for i > 0
      • Left child node index: left(i) = 2i + 1
      • Right child node index: right(i) = 2i + 2
  3. Complete Binary Tree: Because a heap is always a complete binary tree, a heap of n nodes occupies array indices 0 through n - 1 with no unused slots.

  4. Heap Order Property:

    • Max Heap: No child is greater than its parent, so the top of the heap (the root node) holds the maximum value in the entire heap.
    • Min Heap: No child is less than its parent, so the top of the heap (the root node) holds the minimum value in the entire heap.
  5. Height: A heap containing n nodes has height O(log n), so operations that move an element along a path between the root and a leaf, such as insertion and removal, take O(log n) time.
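
The index formulas and the max-heap order property can be sketched in a few lines of C. This is only an illustration: the names sketch_parent, sketch_left, sketch_right, and sketch_is_max_heap are hypothetical and are not part of the module's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Index arithmetic for a heap stored in a 0-based array.
   All names here are hypothetical, chosen for this sketch only. */
static size_t sketch_parent(size_t i)
{
    assert(i > 0);          /* the root (index 0) has no parent */
    return (i - 1) / 2;     /* integer division rounds down */
}

static size_t sketch_left(size_t i)  { return 2 * i + 1; }
static size_t sketch_right(size_t i) { return 2 * i + 2; }

/* Returns 1 if a[0..n-1] satisfies the max-heap property, 0 otherwise.
   Checking every non-root node against its parent covers the whole tree. */
static int sketch_is_max_heap(const int *a, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
    {
        if (a[sketch_parent(i)] < a[i]) return 0;
    }
    return 1;
}
```

For example, the array {9, 7, 8, 1, 3} passes this check, while {1, 7, 8} fails because the root is smaller than its children.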

The heap module is a general-purpose heap container for the C language. It defines the heap data types and a set of functions for creating and deleting a heap, pushing and popping elements, modifying an element, reading the top element, and getting the heap size. It also provides macros that compute the indices of a node's parent, left child, and right child. A typical use is implementing a priority queue.

Interfaces

Creation and Deletion of heap Objects

heap_t heap_create(int dsize, int capacity, heap_root_t root);
void heap_delete(heap_t heap);

Here, heap_t is the handle type of a heap object.

Creation Method: Returns a heap object on success, or NULL if creation fails.

  • dsize: The size of each element in the heap, in bytes. It determines how much memory is allocated per element. For example, to store a structure, set this parameter to the size of that structure.
  • capacity: The initial capacity of the heap, that is, the number of elements it can hold when it is created. What happens when more elements are inserted (for example, whether the heap expands) depends on the internal implementation. Set this parameter according to the estimated number of elements.
  • root: A pointer to a function that compares a parent node with a child node and thereby determines the nature of the heap (max heap or min heap). In the examples below, the function returns 1 when the parent is correctly placed above the child (parent < child for a min heap, parent > child for a max heap) and 0 otherwise.

Deletion Method: Deletes the passed-in heap object. Creation and deletion should be used in pairs: every heap that is created should be deleted once it is no longer needed.

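#include <stdio.h>
/* Also include the varch heap header that declares heap_t, heap_create, and heap_delete. */

/* Min-heap rule: returns 1 when parent is smaller than child, 0 otherwise. */
static int heap_root_min(void *parent, void *child)
{
    return *(int *)parent < *(int *)child;
}

/* Max-heap rule: returns 1 when parent is larger than child, 0 otherwise. */
static int heap_root_max(void *parent, void *child)
{
    return *(int *)parent > *(int *)child;
}

static void test_create(void)
{
    /* Max heap of int with an initial capacity of 11 elements. */
    heap_t heap = heap_create(sizeof(int), 11, heap_root_max);

    if (!heap)
    {
        printf("[ERROR] heap create fail!!!\r\n");
        return;
    }

    printf("heap create success!!!\r\n");
    heap_delete(heap); /* pair every successful create with a delete */
}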
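
When the elements are structures rather than plain int values, the root function compares whichever member defines the order. The following is a minimal sketch under stated assumptions: task_t and task_root_min are hypothetical names, and the function signature is inferred from heap_root_min and heap_root_max above rather than from the definition of heap_root_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type for this sketch: a task ordered by priority. */
typedef struct
{
    int priority;       /* smaller value = more urgent */
    const char *name;
} task_t;

/* Min-heap rule on priority, following the same return convention as
   heap_root_min: 1 when parent belongs above child, 0 otherwise. */
static int task_root_min(void *parent, void *child)
{
    assert(parent != NULL && child != NULL);
    return ((task_t *)parent)->priority < ((task_t *)child)->priority;
}
```

Assuming heap_root_t accepts this signature, such a heap would be created with something like heap_create(sizeof(task_t), 16, task_root_min), so that dsize matches the size of the structure.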

Insertion and Removal of Elements in heap

int heap_push(heap_t heap, void *data);
int heap_pop(heap_t heap, void *data);

These two methods add data to the heap and remove data from it. For push, data is the address of the value to insert. For pop, data is the address of the memory that receives the removed value. Either method accepts NULL for data, in which case it serves only as a placeholder. Both methods return 1 on success and 0 on failure.

static void test_push(void)
{
    heap_t heap = heap_create(sizeof(int), 11, heap_root_max);
    int push = 0, top = 0;

    if (!heap) return;

    /* After each push, read the top; on a max heap it is the largest value pushed so far. */
    push = 100; heap_push(heap, &push); heap_top(heap, &top); printf("top = %d\r\n", top);
    push = 1;   heap_push(heap, &push); heap_top(heap, &top); printf("top = %d\r\n", top);
    push = 2;   heap_push(heap, &push); heap_top(heap, &top); printf("top = %d\r\n", top);
    push = 200; heap_push(heap, &push); heap_top(heap, &top); printf("top = %d\r\n", top);
    push = -10; heap_push(heap, &push); heap_top(heap, &top); printf("top = %d\r\n", top);

    heap_delete(heap);
}
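
To show what push and pop typically do on an array-based heap, here is a self-contained sketch of the usual sift-up and sift-down technique. It is not the module's implementation: sketch_heap_t, sketch_push, sketch_pop, and the fixed capacity are assumptions made for illustration, and only the comparison convention (1 when the first argument belongs above the second) is borrowed from the root functions shown earlier.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CAPACITY 16

/* Hypothetical fixed-capacity heap of int; not varch's internal layout. */
typedef struct
{
    int data[SKETCH_CAPACITY];
    int size;
    int (*root)(void *parent, void *child); /* 1 if parent belongs above child */
} sketch_heap_t;

static int sketch_root_max(void *parent, void *child)
{
    return *(int *)parent > *(int *)child;
}

static void sketch_swap(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

/* Append at the end, then sift up while the new element belongs above its parent. */
static int sketch_push(sketch_heap_t *h, int value)
{
    int i;

    assert(h != NULL);
    if (h->size >= SKETCH_CAPACITY) return 0;
    i = h->size++;
    h->data[i] = value;
    while (i > 0)
    {
        int p = (i - 1) / 2;
        if (!h->root(&h->data[i], &h->data[p])) break;
        sketch_swap(&h->data[i], &h->data[p]);
        i = p;
    }
    return 1;
}

/* Copy out the root, move the last element to the root, then sift it down. */
static int sketch_pop(sketch_heap_t *h, int *out)
{
    int i = 0;

    assert(h != NULL);
    if (h->size == 0) return 0;
    if (out) *out = h->data[0];     /* out may be NULL, as a placeholder */
    h->data[0] = h->data[--h->size];
    for (;;)
    {
        int l = 2 * i + 1, r = 2 * i + 2, best = i;
        if (l < h->size && h->root(&h->data[l], &h->data[best])) best = l;
        if (r < h->size && h->root(&h->data[r], &h->data[best])) best = r;
        if (best == i) break;
        sketch_swap(&h->data[i], &h->data[best]);
        i = best;
    }
    return 1;
}
```

Pushing 100, 1, 2, 200, -10 into this sketch with sketch_root_max and then popping until it is empty yields the values in descending order, which is the behavior a priority queue relies on.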

Modification of heap

int heap_modify(heap_t heap, int index, void *data);

Modifies the element at the specified index in the heap. After the modification, the heap is re-adjusted so that the heap property still holds. The function returns 1 on success and 0 if the heap does not exist, the index is out of range, or the modification otherwise cannot be carried out.

  • heap: The handle of the heap that contains the element to modify.
  • index: The index of the element to modify. Indices start at 0, and the value must be within the valid range (less than the current number of elements in the heap).
  • data: A pointer to the new data, which replaces the element at the specified index. Its type and size should match the element size specified when the heap was created.

static void test_base(void)
{
    heap_t h = heap_create(sizeof(int), 11, heap_root_max);
    int i = 0;

    if (!h) return;

    /* Fill the heap with the values 0..10. */
    for (i = 0; i < 11; i++)
    {
        heap_push(h, &i);
    }

    /* Replace the element at index 6 with -9; the heap then re-adjusts itself. */
    i = -9;
    if (!heap_modify(h, 6, &i))
    {
        printf("[ERROR] heap modify fail!!!\r\n");
    }

    heap_delete(h);
}
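
The re-adjustment after a modification can be sketched as follows for a max heap of int stored in a plain array. This is a generic illustration of the technique, not the module's code: sketch_modify_max and sketch_swap_int are hypothetical names. The modified element is first sifted up in case it grew and then sifted down in case it shrank; at most one of the two loops actually moves it.

```c
#include <assert.h>
#include <stddef.h>

static void sketch_swap_int(int *a, int *b)
{
    int t = *a; *a = *b; *b = t;
}

/* Overwrite a[index] in a max heap of n ints and restore the heap property.
   Returns 1 on success, 0 if index is out of range. */
static int sketch_modify_max(int *a, int n, int index, int value)
{
    int i = index;

    assert(a != NULL);
    if (index < 0 || index >= n) return 0;
    a[i] = value;

    /* Sift up: the new value may be larger than its parent. */
    while (i > 0 && a[i] > a[(i - 1) / 2])
    {
        sketch_swap_int(&a[i], &a[(i - 1) / 2]);
        i = (i - 1) / 2;
    }

    /* Sift down: the new value may be smaller than one of its children. */
    for (;;)
    {
        int l = 2 * i + 1, r = 2 * i + 2, best = i;
        if (l < n && a[l] > a[best]) best = l;
        if (r < n && a[r] > a[best]) best = r;
        if (best == i) break;
        sketch_swap_int(&a[i], &a[best]);
        i = best;
    }
    return 1;
}
```

Both loops follow a single path between the root and a leaf, so the cost of a modification stays within O(log n).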

Top of heap

int heap_top(heap_t heap, void *data);

Obtains the element at the top of the heap: the maximum element for a max heap, the minimum element for a min heap. On success, the function returns 1 and, if data is not NULL, copies the top element into the memory that data points to. If the heap is empty or the retrieval otherwise fails, it returns 0.

  • heap: The handle of the heap whose top element is to be read.
  • data: A pointer to the memory that receives the top element. Pass a valid address to have the element copied out, or NULL if the data itself is not needed.

Size of heap

int heap_size(heap_t heap);

Returns the number of elements currently stored in the heap, which can be used, for example, to determine whether the heap is empty or full.

  • heap: The handle of the heap to query.