Mirror of https://github.com/intel/llvm.git (synced 2026-01-26 12:26:52 +08:00)
[libc] [search] improve hsearch robustness (#73896)
Following up on the discussion at https://github.com/llvm/llvm-project/pull/73469#discussion_r1409593911 by @nickdesaulniers. According to the FreeBSD implementation (https://android.googlesource.com/platform/bionic/+/refs/heads/main/libc/upstream-freebsd/lib/libc/stdlib/hcreate.c), `hsearch` can handle cases where the global table has not been properly initialized. To do this, FreeBSD allows the hash table to be resized dynamically. If the global table is uninitialized at the first call, it is initialized with a minimal size, and subsequent insertions keep working because the table grows automatically. This patch mimics that behavior. More precisely, it introduces:

1. a full-table iterator that scans every element in the table,
2. a resize routine, triggered automatically whenever the load factor is reached, that iterates over the old table and inserts its entries into a new one,
3. more tests that cover the growth of the table.
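The iterator-plus-rehash approach described above can be sketched in C. This is a minimal illustration under stated assumptions and is not the code from this patch. The names `Entry`, `Table`, `table_iter_next`, `table_resize` and `table_insert` are hypothetical. So are the use of linear probing, FNV-1a hashing and a 3/4 load-factor threshold; they were chosen only for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type modelled on POSIX ENTRY: a key plus opaque data. */
typedef struct {
  char *key; /* NULL marks an empty slot */
  void *data;
} Entry;

/* Hypothetical open-addressing table; capacity is always a power of two. */
typedef struct {
  Entry *slots;
  size_t capacity;
  size_t size; /* number of occupied slots */
} Table;

/* FNV-1a string hash (an assumption; any decent string hash works). */
static uint64_t hash_key(const char *s) {
  uint64_t h = 14695981039346656037ULL;
  for (; *s != '\0'; ++s) {
    h ^= (unsigned char)*s;
    h *= 1099511628211ULL;
  }
  return h;
}

static bool table_init(Table *t, size_t capacity) {
  assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
  t->slots = calloc(capacity, sizeof(Entry)); /* zeroed: every key is NULL */
  t->capacity = capacity;
  t->size = 0;
  return t->slots != NULL;
}

/* Full-table iterator: returns the next occupied slot at index >= *pos and
   advances *pos past it, or returns NULL once every slot has been scanned. */
static Entry *table_iter_next(const Table *t, size_t *pos) {
  while (*pos < t->capacity) {
    Entry *e = &t->slots[(*pos)++];
    if (e->key != NULL)
      return e;
  }
  return NULL;
}

/* Linear probing: the slot holding key, or the empty slot where it belongs.
   Terminates because the load-factor check keeps at least one slot empty. */
static Entry *table_find_slot(const Table *t, const char *key) {
  size_t mask = t->capacity - 1;
  size_t i = (size_t)(hash_key(key) & mask);
  while (t->slots[i].key != NULL && strcmp(t->slots[i].key, key) != 0)
    i = (i + 1) & mask;
  return &t->slots[i];
}

/* Resize: allocate a bigger table, reinsert every entry produced by the
   full-table iterator, then free the old slot array. Keys are not copied. */
static bool table_resize(Table *t, size_t new_capacity) {
  Table grown;
  if (!table_init(&grown, new_capacity))
    return false;
  size_t pos = 0;
  for (Entry *e = table_iter_next(t, &pos); e != NULL;
       e = table_iter_next(t, &pos)) {
    *table_find_slot(&grown, e->key) = *e;
    grown.size++;
  }
  free(t->slots);
  *t = grown;
  return true;
}

/* Insert key if absent (like hsearch with ENTER). Doubles the capacity
   whenever the new entry would push the load factor above 3/4. */
static Entry *table_insert(Table *t, char *key, void *data) {
  Entry *e = table_find_slot(t, key);
  if (e->key != NULL)
    return e; /* already present: keep the existing data */
  if (4 * (t->size + 1) > 3 * t->capacity) {
    if (!table_resize(t, 2 * t->capacity))
      return NULL;
    e = table_find_slot(t, key); /* slot positions change after a rehash */
  }
  e->key = key;
  e->data = data;
  t->size++;
  return e;
}
```

Starting from a 4-slot table, inserting 20 keys triggers three doublings in this sketch and ends at a capacity of 32. The sketch rehashes into a fresh array instead of growing in place. That is because each key's home slot depends on the capacity mask, so old probe sequences are no longer valid after the capacity changes.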
Committed by GitHub (parent 67f9b5ae7d, commit 86e99e11e5)
@@ -62,3 +62,10 @@ Often the standard will imply an intended behavior through what it states is und
Ignoring Bug-For-Bug Compatibility
----------------------------------
Any long-running implementation will have bugs and deviations from the standard. Hyrum's Law states that “all observable behaviors of your system will be depended on by somebody”, and that includes these bugs. An example of a long-standing bug is glibc's scanf float parsing behavior. The behavior is specifically defined in the standard, but not all libc implementations adhere to it. A longstanding bug in glibc causes it to parse the string 100er incorrectly, and this led the C standard to add that specific example to the definition of scanf. The intended behavior is this: when parsing a float, scanf reads the longest prefix that is, or could still become, a valid float. It then accepts that prefix if and only if the entire prefix is a valid float. For 100er, the longest such prefix is 100e, but 100e is not itself a valid float; the only float it contains is 100. Because no digits follow the e, 100e is not a complete float, so scanf should report a matching failure. For LLVM's libc it was decided to follow the standard, even though glibc's version is slightly simpler to implement and this edge case is rare. Following the standard must be the first priority, since that's the goal of the library.
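The longest-valid-prefix rule can be illustrated with a small C sketch. The helper `match_float` is hypothetical and models only a simplified decimal grammar. It does not skip leading whitespace, and it omits field widths, hexadecimal floats, `inf` and `nan`. It is not LLVM libc's scanf implementation. It reports the length of the input item and whether that whole item is a valid float, so `"100er"` yields the four-character item `100e` and a matching failure.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper modelling the %f rule for a simplified grammar:
   [+-] digits [. digits] [(e|E) [+-] digits].
   Stores in *len the length of the input item (the longest prefix that is,
   or could still become, a valid float) and returns true only if that whole
   item is a valid float. */
static bool match_float(const char *s, size_t *len) {
  assert(s != NULL && len != NULL);
  size_t i = 0, mantissa_digits = 0, exponent_digits = 0;
  bool has_exponent = false;
  if (s[i] == '+' || s[i] == '-')
    i++;
  while (isdigit((unsigned char)s[i])) {
    i++;
    mantissa_digits++;
  }
  if (s[i] == '.') {
    i++;
    while (isdigit((unsigned char)s[i])) {
      i++;
      mantissa_digits++;
    }
  }
  /* An 'e' can only extend the prefix if the mantissa already has a digit. */
  if (mantissa_digits > 0 && (s[i] == 'e' || s[i] == 'E')) {
    has_exponent = true;
    i++;
    if (s[i] == '+' || s[i] == '-')
      i++;
    while (isdigit((unsigned char)s[i])) {
      i++;
      exponent_digits++;
    }
  }
  *len = i;
  /* "100e" is a valid prefix but not a valid float: a matching failure. */
  return mantissa_digits > 0 && (!has_exponent || exponent_digits > 0);
}
```

The key point is that the 'e' is consumed as part of the input item before validity is checked. The sketch therefore never "backs off" to 100, which is exactly the behavior the standard requires.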

Design Decisions
================

Resizable Tables for hsearch
----------------------------
The POSIX.1 standard does not specify what happens when hsearch or hdestroy is called before the hash table has been initialized with hcreate. Nor does it specify the outcome of calling hcreate repeatedly without an intervening hdestroy. Libraries such as musl and glibc do not check for these scenarios, which can lead to memory corruption or leaks.

Conversely, FreeBSD's libc and Bionic automatically initialize the hash table to a minimal size if it is found uninitialized, and destroy the table only if it has actually been initialized. This approach also avoids allocating a redundant table when an initialized one is already present. Because the table starts at a minimal size, it must be resized to accommodate further insertions. LLVM's libc mirrors the approach of FreeBSD's libc and Bionic because it is more robust and user-friendly. Notably, such resizing is itself consistent with POSIX.1, which describes the nel argument of hcreate as only an estimate of the maximum number of entries and allows the implementation to adjust it upward.
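The initialization policy described above can be sketched as three wrappers around a global table. This is a hypothetical sketch, not LLVM libc's code. The names `sketch_hcreate`, `sketch_hsearch`, `sketch_hdestroy` and `SketchEntry` are illustrative, as is the minimal capacity of 4. The table is a plain growable array searched linearly, which keeps the focus on the lifecycle checks rather than on hashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for POSIX ENTRY/ACTION, renamed to avoid clashing
   with <search.h>. */
typedef struct {
  char *key;
  void *data;
} SketchEntry;
typedef enum { SKETCH_FIND, SKETCH_ENTER } SketchAction;

#define SKETCH_MIN_CAPACITY 4 /* assumed "minimal size" */

/* Global table state. A growable array searched linearly keeps the sketch
   short; a real implementation would use a hash table. */
static SketchEntry *g_entries = NULL;
static size_t g_size = 0, g_capacity = 0;

static bool sketch_hcreate(size_t nel) {
  if (g_entries != NULL)
    return true; /* already initialized: do not allocate a second table */
  size_t cap = nel > SKETCH_MIN_CAPACITY ? nel : SKETCH_MIN_CAPACITY;
  g_entries = malloc(cap * sizeof *g_entries);
  if (g_entries == NULL)
    return false;
  g_capacity = cap;
  g_size = 0;
  return true;
}

static void sketch_hdestroy(void) {
  if (g_entries == NULL)
    return; /* never initialized: nothing to free */
  free(g_entries);
  g_entries = NULL;
  g_size = g_capacity = 0;
}

static SketchEntry *sketch_hsearch(SketchEntry item, SketchAction action) {
  assert(item.key != NULL);
  /* Lazily initialize with the minimal size if hcreate was never called. */
  if (g_entries == NULL && !sketch_hcreate(0))
    return NULL;
  for (size_t i = 0; i < g_size; ++i)
    if (strcmp(g_entries[i].key, item.key) == 0)
      return &g_entries[i];
  if (action == SKETCH_FIND)
    return NULL;
  if (g_size == g_capacity) { /* full: grow instead of failing */
    SketchEntry *grown = realloc(g_entries, 2 * g_capacity * sizeof *grown);
    if (grown == NULL)
      return NULL;
    g_entries = grown;
    g_capacity *= 2;
  }
  g_entries[g_size] = item;
  return &g_entries[g_size++];
}
```

In this sketch, calling `sketch_hsearch` first lazily creates a 4-slot table. A later `sketch_hcreate(100)` returns success without reallocating, and a second `sketch_hdestroy` is a no-op. These correspond to the three guarded scenarios above: use before initialization, repeated creation, and destruction of an uninitialized table.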