LCOV - code coverage report
Current view: top level - util/alloc - fd_alloc.h (source / functions) Hit Total Coverage
Test: cov.lcov Lines: 22 22 100.0 %
Date: 2025-01-08 12:08:44 Functions: 16 3200 0.5 %

          Line data    Source code
       1             : #ifndef HEADER_fd_src_util_alloc_fd_alloc_h
       2             : #define HEADER_fd_src_util_alloc_fd_alloc_h
       3             : 
       4             : /* fd_alloc is a high performance lockfree fast O(1) (typically)
       5             :    allocator.
       6             : 
       7             :    It is optimized for high concurrency use and small-ish clustered /
       8             :    multi-modal distributed allocation sizes.  It is further optimized
       9             :    for single-threaded use cases and/or when malloc-free pairs have have
      10             :    good thread affinity (i.e. frees done by the same thread that did the
      11             :    corresponding malloc).  It can also be used optimally in more complex
      12             :    threading use cases (e.g. malloc in one or more producer threads,
      13             :    free in one or more consumer threads).  It behaves well with
      14             :    irregular sizes and exploits ultra fine grained alignment for good
      15             :    packing (e.g. reasonable low memory overhead packing of byte strings
      16             :    with irregular small-ish sizes).
      17             : 
      18             :    A fd_alloc stores its state in a wksp in a persistent way and backs
      19             :    its allocations by that same wksp.  This avoids many of the severe
      20             :    performance and reliability issues of malloc
      21             : 
      22             :    Critically, it _doesn't_ _lie_ and it _doesn't_ _blow_ _up_.
      23             : 
      24             :    fd_alloc_malloc will not stall your program behind your back, calling
      25             :    the OS to grow or shrink the program's memory footprint during the
      26             :    call; it will never use more memory than has already be procured for
      27             :    the underlying wksp.  And, if fd_alloc_malloc succeeds, the returned
      28             :    memory is real and is ready for use.
      29             : 
      30             :    Obligatory dynamic allocation rant *********************************
      31             : 
      32             :    That is, fd_alloc is not the absolute unforgivable garbage of
      33             :    Linux/libc malloc.  malloc often just reserves page table entries and
      34             :    returns, irrespective of whether or not the request can be satisfied
      35             :    (on the apparent belief that the malloc call was a bluff and the user
      36             :    is probably a bad dev who doesn't bother with error trapping anyway),
      37             :    in hopes that that a later glacially slow page fault to the OS will
      38             :    actually reserve the memory.
      39             : 
      40             :    Which, even when it does work, it will by its very nature will be at
      41             :    the worst possible times (e.g. in the middle of incoming line rate
      42             :    network traffic bursts ... data structures try to grow to accommodate
      43             :    but slowing down throughput faster than they are growing at a time
      44             :    when keeping up is critical to surviving ... and then on a
      45             :    ridiculously awful normal page by normal page basis), exposing the
      46             :    caller to non-deterministic performance and reduced throughput.
      47             : 
      48             :    Unfortunately, getting overrun by DoS-like traffic patterns is the
      49             :    least of the worries.  When Linux can't back one of the page by DRAM
      50             :    on a page fault (skipping over some additional TLB and NUMA
      51             :    dubiousness that goes on under the hood), it goes from glacial
      52             :    performance to continental drift levels of performance.  It will try
      53             :    to honor the request by shuffling things to swap, exacerbating the
      54             :    above.  Suddenly it is a feat to even keep up with a 1980s modem.
      55             : 
      56             :    But that's not the end of the horror.  Because Linux thinks it cool
      57             :    to overcommit beyond physical limits for no discernible reason and
      58             :    gets flaky if you try to disable swap and/or overcommit, the page
      59             :    fault might not be able honor the commitment.  Finding itself caught
      60             :    in a lie (it can't go back in time and rescind the success that
      61             :    malloc already returned to the unsuspecting developer), the Linux
      62             :    kernel goes full HAL-9000 and starts randomly killing things.  A dead
      63             :    process can't complain about malloc lying to it after all.  And,
      64             :    cherry on top, the victims of the oom killer are frequently not even
      65             :    the culprits.
      66             : 
      67             :    Sigh ... all completely unacceptable behaviors in any situation, much
      68             :    less mission critical ones.
      69             : 
      70             :    TL;DR Friends don't let friends malloc.
      71             : 
      72             :    If you truly need malloc-free semantics, use fd_alloc.  This at least
      73             :    eliminates the most egregious horrors above.  It can't help the
      74             :    intrinsic horrors though.
      75             : 
      76             :    (Though it is ingrained in CS teaching and languages to the extent
      77             :    there's rarely even recognition of the faintest possibility of the
      78             :    existence of alternatives, people rarely truly need malloc/free
      79             :    semantics.  But, after they convince themselves they still do because
      80             :    of the brainwashing, they need to remind themselves that computers
      81             :    don't work remotely like malloc/free suggest and then should try to
      82             :    think about resource acquisition more fundamentally.  And, after they
      83             :    still manage to talk themselves back into needing it because of the
      84             :    teaching and linguistic traps, repeat ... at least if they want to
      85             :    make something fast and robust.  Even if they can prove dynamic
      86             :    allocation requests have an attainable worst level at all points in
      87             :    time, they still have to prove that heap fragmentation over time will
      88             :    never cause malloc to fail.  Good luck with that.)
      89             : 
      90             :    The above rant applies to any paired dynamic memory strategies,
      91             :    including non-placement new, implicit copy constructors, dynamic
      92             :    resizing containers, etc.  Real world computers aren't just funky
      93             :    implementations of infinite tape Turing machines.  This make-believe
      94             :    that they are in code that interacts with the real world is a recipe
      95             :    for real world disaster.
      96             : 
      97             :    End of obligatory dynamic allocation rant **************************
      98             : 
      99             :    Since it is backed by a wksp, allocations have the same NUMA, TLB,
     100             :    IPC and persistence properties of the underlying wksp.  This allows
     101             :    fd_alloc to go far beyond the capabilities of a typical allocator
     102             :    Allocations done by fd_alloc can be shared between processes (can
     103             :    even malloc in one process, translate the pointer into the address
     104             :    space of another process, and free it there, even after the first
     105             :    process has terminated), a process can be stopped and then other
     106             :    processes can still find the stopped process's allocations and use
     107             :    them / free them / etc.
     108             : 
     109             :    Regarding time efficiency and concurrency, large allocations are
     110             :    passed through to the underlying wksp allocator (which is neither
     111             :    O(1) and only "quasi"-lockfree in the sense described in fd_wksp.h).
     112             :    But the allocation strategies used under the hood (loosely inspired
     113             :    by Hoard-style lockfree allocators but with a lot of optimizations
     114             :    and tweaks for the above) are such that, in the common case of not
     115             :    needing to fall back to the underlying wksp allocator, the allocator
     116             :    is lockfree O(1).
     117             : 
     118             :    Regarding spatial efficiency, it is reasonably space efficient
     119             :    (overhead for a cstr-style allocation is ~4 bytes) and adapts over
     120             :    time to try to bound the amount of pre-allocation for small requests. */
     121             : 
     122             : #include "../wksp/fd_wksp.h"
     123             : #include "../valloc/fd_valloc.h"
     124             : 
     125             : /* FD_ALLOC_{ALIGN,FOOTPRINT} give the required alignment and footprint
     126             :    needed for a wksp allocation to be suitable as a fd_alloc.  ALIGN is
     127             :    an integer pointer of 2 and FOOTPRINT is an integer multiple of
     128             :    ALIGN.  These are provided to facilitate compile time declarations.
     129             :    4096 for ALIGN has been is picked to be normal-page like and match
     130             :    the minimum alignment of a fd_wksp_alloc. */
     131             : 
     132             : #define FD_ALLOC_ALIGN     (4096UL)
     133         228 : #define FD_ALLOC_FOOTPRINT (20480UL)
     134             : 
     135             : /* FD_ALLOC_MALLOC_ALIGN_DEFAULT gives the alignment that will be used
     136             :    when the user does not specify an alignment.  This will be an integer
     137             :    power of 2 of at least 16 for C/C++ allocator alignment conformance.
     138             :    (16 instead of 8 on the grounds that 128-bit is a primitive type on
     139             :    platforms with FD_HAS_INT128.) */
     140             : 
     141   132009114 : #define FD_ALLOC_MALLOC_ALIGN_DEFAULT (16UL)
     142             : 
     143             : /* FD_ALLOC_JOIN_CGROUP_HINT_MAX is maximum value for a cgroup hint.
     144             :    This is an integer power of 2 minus 1 of at most FD_ALLOC_ALIGN. */
     145             : 
     146   483729876 : #define FD_ALLOC_JOIN_CGROUP_HINT_MAX (15UL)
     147             : 
     148             : /* A "fd_alloc_t *" is an opaque handle of an fd_alloc. */
     149             : 
     150             : struct fd_alloc;
     151             : typedef struct fd_alloc fd_alloc_t;
     152             : 
     153             : FD_PROTOTYPES_BEGIN
     154             : 
     155             : /* fd_alloc_{align,footprint} return FD_ALLOC_{ALIGN,FOOTPRINT}. */
     156             : 
     157             : FD_FN_CONST ulong
     158             : fd_alloc_align( void );
     159             : 
     160             : FD_FN_CONST ulong
     161             : fd_alloc_footprint( void );
     162             : 
     163             : /* fd_alloc_new formats an unused wksp allocation with the appropriate
     164             :    alignment and footprint as a fd_alloc.  Caller is not joined on
     165             :    return.  Returns shmem on success and NULL on failure (shmem NULL,
     166             :    shmem misaligned, shmem is not backed by a wksp ... logs details).  A
     167             :    workspace can have multiple fd_alloc created for it.  They will
     168             :    dynamically share the underlying workspace along with any other
     169             :    non-fd_alloc usage but will otherwise act as completely separate
     170             :    non-conflicting arenas (useful for logical grouping and improved
     171             :    concurrency).  To help with various diagnostics, garbage collection
     172             :    and what not, all allocations to the underlying wksp are tagged with
     173             :    the given tag, positive.  Ideally, the tag used here should be
     174             :    distinct from all other tags used by this workspace. */
     175             : 
     176             : void *
     177             : fd_alloc_new( void * shmem,
     178             :               ulong  tag );
     179             : 
     180             : /* fd_alloc_join joins the caller to a fd_alloc.  shalloc points to the
     181             :    first byte of the memory region backing the alloc in the caller's
     182             :    address space.  Returns an opaque handle of the join on success
     183             :    (IMPORTANT! THIS IS NOT JUST A CAST OF SHALLOC) and NULL on failure
     184             :    (NULL shalloc, misaligned shalloc, bad magic, ... logs details).
     185             :    Every successful join should have a matching leave.  The lifetime of
     186             :    the join is until the matching leave or the thread group is
     187             :    terminated (joins are local to a thread group).
     188             : 
     189             :    cgroup_hint is a concurrency hint used to optimize parallel and
     190             :    persistent use cases. Ideally each thread (regardless of thread
     191             :    group) should join the allocator with a different cgroup_hint system
     192             :    wide (note that joins are practically free).  And if using a fd_alloc
     193             :    in a persistent way, logical streams of execution would ideally
     194             :    preserve the cgroup_hint address starts and stops of that stream for
     195             :    the most optimal affinity behaviors.  0 is fine in single threaded
     196             :    use cases and 0 and/or collisions are fine in more general cases
     197             :    though concurrent performance might be reduced due to additional
     198             :    contention between threads that share the same cgroup_hint.  If
     199             :    cgroup_hint is not in [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be
     200             :    wrapped to be in that range.
     201             : 
     202             :    TL;DR A cgroup_hint of 0 is often a practical choice single threaded.
     203             :    A cgroup_hint of fd_tile_idx() or just uniform random 64-bit value
     204             :    choice in more general situations. */
     205             : 
     206             : fd_alloc_t *
     207             : fd_alloc_join( void * shalloc,
     208             :                ulong  cgroup_hint );
     209             : 
     210             : /* fd_alloc_leave leaves an existing join.  Returns the underlying
     211             :    shalloc (IMPORTANT! THIS IS NOT A SIMPLE CAST OF JOIN) on success and
     212             :    NULL on failure.  Reasons for failure include join is NULL (logs
     213             :    details). */
     214             : 
     215             : void *
     216             : fd_alloc_leave( fd_alloc_t * join );
     217             : 
     218             : /* fd_alloc_delete unformats a wksp allocation used as a fd_alloc.
     219             :    Assumes nobody is or will be joined to the fd_alloc.  The caller
     220             :    further promises there are no allocations outstanding.  If there are
     221             :    still some outstanding allocations, it will try to clean up as many
     222             :    as it can find but it is not guaranteed to find all of them (those
     223             :    will continue to consume wksp space but could be theoretically be
     224             :    cleaned up in an application specific way by operating directly on
     225             :    the underlying workspace ... of course, if the application could do
     226             :    that, it probably such just clean up after itself before calling
     227             :    delete).  Returns shmem on success and NULL on failure (logs
     228             :    details).  Reasons for failure include shalloc is NULL, misaligned
     229             :    fd_alloc, bad magic, etc. */
     230             : 
     231             : void *
     232             : fd_alloc_delete( void * shalloc );
     233             : 
     234             : /* fd_alloc_join_cgroup_hint returns the cgroup_hint of the current
     235             :    join.  Assumes join is a current local join.  The return will be in
     236             :    [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX].
     237             : 
     238             :    fd_alloc_join_cgroup_hint_set returns join with the cgroup_hint
     239             :    updated to provided cgroup_hint.  If cgroup hint is not in
     240             :    [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be wrapped into this
     241             :    range.  Assumes join is a current local join.  The return value is
     242             :    not a new join. */
     243             : 
     244             : FD_FN_CONST static inline ulong
     245   142033005 : fd_alloc_join_cgroup_hint( fd_alloc_t * join ) {
     246   142033005 :   return ((ulong)join) & FD_ALLOC_JOIN_CGROUP_HINT_MAX;
     247   142033005 : }
     248             : 
     249             : FD_FN_CONST static inline fd_alloc_t *
     250             : fd_alloc_join_cgroup_hint_set( fd_alloc_t * join,
     251    38926167 :                                ulong        cgroup_hint ) {
     252    38926167 :   return (fd_alloc_t *)((((ulong)join) & (~FD_ALLOC_JOIN_CGROUP_HINT_MAX)) | (cgroup_hint & FD_ALLOC_JOIN_CGROUP_HINT_MAX));
     253    38926167 : }
     254             : 
     255             : /* fd_alloc_wksp returns a pointer to a local wksp join of the wksp
     256             :    backing the fd_alloc with the current local join.  Caller should not
     257             :    call fd_alloc_leave on the returned value.  Lifetime of the returned
     258             :    wksp handle is as long as the shalloc used on the fd_alloc_join is
     259             :    still mapped into the caller's address space.
     260             : 
     261             :    fd_alloc_tag returns the tag that will be used for allocations from
     262             :    this workspace. */
     263             : 
     264             : FD_FN_PURE fd_wksp_t * fd_alloc_wksp( fd_alloc_t * join ); // NULL indicates NULL join
     265             : FD_FN_PURE ulong       fd_alloc_tag ( fd_alloc_t * join ); // Positive, 0 indicates NULL join
     266             : 
     267             : /* fd_alloc_malloc_at_least allocates at least sz bytes with alignment
     268             :    of at least align from the wksp backing the fd_alloc.  join is a
     269             :    current local join to the fd_alloc.  align should be an integer power
     270             :    of 2 or 0.
     271             : 
     272             :    An align of 0 indicates to use FD_ALLOC_MALLOC_DEFAULT_ALIGN for the
     273             :    request alignment.  This will be large enough such that
     274             :    fd_alloc_malloc is conformant with C/C++ alignment specifications
     275             :    (i.e. can trivially wrap fd_alloc_malloc to use as a drop in
     276             :    replacement for malloc).
     277             : 
     278             :    Small values of align will NOT be rounded up to some minimum (e.g.
     279             :    allocating lots of 1 byte aligned short strings is fine and
     280             :    relatively space and time efficient ... the overhead is ~4 bytes per
     281             :    allocation).  fd_alloc is not particularly optimized when align>~sz
     282             :    and/or large alignments (>~4096B).  While large values for align are
     283             :    supported by fd_alloc_malloc, directly using fd_wksp_alloc is
     284             :    recommended in such cases.
     285             : 
     286             :    If an allocation is "large" (align + sz >~ 64KiB for the current
     287             :    implementation), it will be handled by fd_wksp_alloc under the hood.
     288             :    Otherwise, it will be handled by fd_alloc_malloc algorithms (which
     289             :    are ultimately backed by fd_wksp_alloc).  As such, if a small
     290             :    allocation is "new" (e.g. first allocation of a size around sz, an
     291             :    allocation that can't be packed near other existing allocations
     292             :    around that sz, etc), this might also fallback on fd_wksp_alloc.
     293             :    Typically though, after initial allocation and/or program warmup,
     294             :    fd_alloc_malloc calls will be a reasonably fast O(1) lockfree.
     295             : 
     296             :    Returns a pointer to the allocation in the local address space on
     297             :    success.  Note that this pointer will a wksp laddr.  As such, it can
     298             :    be converted to a gaddr, passed to other threads in other thread
     299             :    groups, and converted to a wksp laddr in their address spaces, freed
     300             :    via a join to the fd_alloc in that thread group, persisted beyond the
     301             :    lifetime of the calling thread, etc.
     302             : 
     303             :    Returns NULL on failure (silent to support HPC usage) or when sz is
     304             :    0.  Reasons for failure include NULL join, invalid align, sz overflow
     305             :    (sz+align>~2^64), no memory available for request (e.g. workspace has
     306             :    insufficient room or is too fragmented).
     307             : 
     308             :    On return, *max will contain the number actual number of bytes
     309             :    available at the returned gaddr.  On success, this will be at least
     310             :    sz and it is not guaranteed to be a multiple of align.  On failure,
     311             :    *max will be zero.
     312             : 
     313             :    fd_alloc_malloc is a simple wrapper around fd_alloc_malloc_at_least
     314             :    for use when applications do not care about the actual size of their
     315             :    allocation. */
     316             : 
     317             : void *
     318             : fd_alloc_malloc_at_least( fd_alloc_t * join,
     319             :                           ulong        align,
     320             :                           ulong        sz,
     321             :                           ulong *      max );
     322             : 
     323             : static inline void *
     324             : fd_alloc_malloc( fd_alloc_t * join,
     325             :                  ulong        align,
     326     1633644 :                  ulong        sz ) {
     327     1633644 :   ulong max[1];
     328     1633644 :   return fd_alloc_malloc_at_least( join, align, sz, max );
     329     1633644 : }
     330             : 
     331             : /* fd_alloc_free frees the outstanding allocation whose first byte is
     332             :    pointed to by laddr in the caller's local address space.  join is a
     333             :    current local join to the fd_alloc.  The caller promises laddr was
     334             :    allocated by the underlying fd_alloc (but not necessarily on the
     335             :    calling thread or even in this calling process or even by a thread /
     336             :    process that is still running).  Silent for HPC usage (NULL join and
     337             :    NULL laddr are a no-op).
     338             : 
     339             :    Like fd_alloc_malloc, if the allocation was large, this will be
     340             :    handled by fd_wksp_free under the hood, which is neither lockfree nor
     341             :    O(1).  If the allocation was small, this will typically be lockfree
     342             :    O(1).  It is possible that, if the amount of outstanding small
     343             :    allocations has reduced significantly, fd_alloc_free on a small
     344             :    allocation might trigger a fd_wksp_free to free up wksp space for
     345             :    other usage (including uses not through this fd_alloc).
     346             : 
     347             :    (It would be possible to implement this less efficiently in space and
     348             :    time such that join didn't need to be passed.  The current design has
     349             :    picked efficiency and consistency with other APIs though.)
     350             : 
     351             :    Note that this will implicitly optimize the freed memory to be
     352             :    preferentially reused by the join's concurrency group.  Thus the
     353             :    caller should have at least one join for each concurrency group to
     354             :    which it might want to return memory for reuse and then call free
     355             :    with the appropriate join. */
     356             : 
     357             : void
     358             : fd_alloc_free( fd_alloc_t * join,
     359             :                void *       laddr );
     360             : 
     361             : /* fd_alloc_compact frees all wksp allocations that are not required
     362             :    for any outstanding user mallocs (note that fd_alloc_free lazily
     363             :    returns unused memory from the underlying wksp to accelerate
     364             :    potential future allocations).  join is a current local join to the
     365             :    alloc.  This cannot fail from a user's POV but logs any wonkiness
     366             :    detected.
     367             : 
     368             :    fd_alloc_compact has the property that it minimizes the amount of
     369             :    wksp utilization for the set of outstanding user mallocs when there
     370             :    is no other concurrent alloc usage.  As such, if there is no
     371             :    concurrent alloc usage _and_ there are no outstanding mallocs, on
     372             :    return, all wksp allocations (except the user provided memory region
     373             :    that holds the state of the allocator) will be returned to the wksp.
     374             :    This can be then be used to reset the alloc and/or implement robust
     375             :    leak detection at program teardown.
     376             : 
     377             :    This function is safe to use even when there is other concurrent
     378             :    alloc usage.  It t is best effort in that case; it is not guaranteed
     379             :    that there was some point in time between call and return when the
     380             :    wksp utilization was minimized for the contemporaneous set of
     381             :    outstanding user mallocs.
     382             : 
     383             :    Also note that this function is not O(1) and the fd_alloc_free lazy
     384             :    return mechanism does not permit unbounded growth of unreturned free
     385             :    memory.  So this should be used sparingly at best (e.g. in teardown
     386             :    leak detection or rare non-critical path housekeeping). */
     387             : 
     388             : void
     389             : fd_alloc_compact( fd_alloc_t * join );
     390             : 
     391             : /* fd_alloc_is_empty returns 1 if the alloc has no outstanding mallocs
     392             :    and 0 otherwise.  join is a current local join to the alloc.  NULL
     393             :    join silently returns 0.
     394             : 
     395             :    Important safety tip!  This should only be run when there is no
     396             :    concurrent alloc usage.  It is not algorithmically fast.  This might
     397             :    temporarily lock the underlying wksp while running and might call
     398             :    fd_alloc_compact under the hood.  It assumes the user provided memory
     399             :    region holding the alloc state is contained within a region returned
     400             :    by a single fd_wksp_alloc call (it would be hard to create an alloc
     401             :    where that isn't the case).  It assumes alloc is the only user of the
     402             :    alloc's tag in the wksp.  As such this should be used carefully and
     403             :    sparingly (e.g. at program teardown for leak detection).
     404             : 
     405             :    It will "work" with concurrent alloc usage in that the return value
     406             :    will be in 0 or 1 and it will not corrupt the alloc or underlying
     407             :    wksp.  But the return value will not be well-defined (e.g. it is not
     408             :    guaranteed to correspond the state of the alloc at some point in time
     409             :    between when this was called and it when it returned). */
     410             : 
     411             : int
     412             : fd_alloc_is_empty( fd_alloc_t * join );
     413             : 
     414             : /* fd_alloc_max_expand computes a recommended value to use for max when
     415             :    needing to dynamically resize structures.  The below is very subtle
     416             :    and fixes a lot of pervasive errors with dynamic resizing
     417             :    implementations (either explicit or implicitly done under the hood).
     418             :    It doesn't fix the main error with dynamic resizing though.  The main
     419             :    error being deciding to use anything with dynamic resizing (outside
     420             :    of, maybe, initialization at program start).
     421             : 
     422             :    Consider an all too common case of an initially too small dynamically
     423             :    sized array that is getting elements appended to it one at a time.
     424             :    E.g. without proper error trapping, overflow handling and the like:
     425             : 
     426             :      foo_t * foo       = NULL;
     427             :      ulong   foo_max   = 0UL;
     428             :      ulong   foo_cnt   = 0UL;
     429             :      ulong   foo_delta = ... some reasonable increment ...;
     430             : 
     431             :      while( ... still appending ... ) {
     432             : 
     433             :        if( foo_cnt==foo_max ) { // Need to resize
     434             :          foo_max += foo_delta;
     435             :          foo = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
     436             :        }
     437             : 
     438             :        foo[ foo_cnt++ ] = ... next val to append ...;
     439             :      }
     440             : 
     441             :    This is terrible theoretically and practically and yet it looks like
     442             :    it does everything right.
     443             : 
     444             :    The theoretical issue is that, if the realloc can't be done in-place
     445             :    (which is more common than most realize ... depends on how the
     446             :    underlying realloc implementation details), the memory will have to
     447             :    be copied from the original location to the resized location with a
     448             :    typical cost of final_foo_max/2 -> O(final_foo_cnt).  Because max is
     449             :    increased by fixed absolute amount each resizing, there will be
     450             :    final_foo_cnt/foo_delta -> O(final_foo_cnt) such resizes.
     451             : 
     452             :    That is, we've accidentally written a method that has a slow
     453             :    O(final_foo_cnt^2) worst case even though it superficially looks like
     454             :    a fast O(final_foo_cnt) method.  Worse still, this behavior might
     455             :    appear suddenly in previously fine code if realloc implementation
     456             :    changes or, yet again worse, because a larger problem size was used
     457             :    in the wild than used in testing.
     458             : 
     459             :    The practical issue is realloc is painfully slow and it gets worse
     460             :    for large sizes because large sizes are usually handled by operating
     461             :    system calls (e.g. mmap or sbrk under the hood).  We've also now done
     462             :    O(final_foo_cnt) slow operating system calls in our already
     463             :    algorithmically slow O(final_foo_cnt^2) worst case algorithm that
     464             :    still superficially looks like a fast O(final_foo_cnt).  (And throw
     465             :    in the other issues with malloc described above about TLB and NUMA
     466             :    inefficiency, the gaslighting the kernel does "clearly the crash had
     467             :    nothing to do with the OOM killer shooting processes randomly in the
     468             :    head, your program probably just had a bug ... yeah ... that's the
     469             :    ticket" ... for good measure).
     470             : 
     471             :    We can get an algorithmic improvement if we change the above to
     472             :    increase max by a fixed relative amount each resize.  Since we are
     473             :    dealing with integers though, we should make sure that we always
     474             :    increase max by some minimal amount.  Instead of:
     475             : 
     476             :      foo_max += foo_delta;
     477             : 
     478             :    we can use something like:
     479             : 
     480             :      foo_max = fd_ulong_max( foo_max*gamma, foo_max + foo_delta );
     481             : 
     482             :    If gamma>1, asymptotically, we will only do O(lg cnt) resizes.
     483             :    Theoretically, we've gone from an O(final_foo_cnt^2) worst case
     484             :    method to an O(final_foo_cnt lg final_foo_cnt) worst case method.  It
     485             :    is still irritating that it looks superficially like a fast
     486             :    O(final_foo_cnt) method but this is amongst the many reasons why
     487             :    dynamic resizing is gross and wrong and to be avoided when possible.
     488             : 
     489             :    The larger gamma is, the smaller the leading coefficient is in the
     490             :    O(final_foo_cnt lg final_foo_cnt) and thus the better this
     491             :    approximates the fast O(final_foo_cnt) method that it superficially
     492             :    seems to be.  But using a very large gamma is clearly absurd.  There
     493             :    are obvious memory footprint limitations for large sizes and each
     494             :    resize would trigger an ever larger amount of OS work.  This raises
     495             :    the question:
     496             : 
     497             :    What is the optimal gamma?
     498             : 
     499             :    Suppose we have worst case realloc implementation (alloc new memory,
     500             :    copy, free old memory and, when no free fragment large enough is
     501             :    available, use sbrk like semantics to get memory from the O/S ...
     502             :    not uncommon as it is trivial to implement and often works "good
     503             :    enough" in lab settings).  It always works out-of-place and it always
     504             :    just appends new memory at the end of the heap when the heap runs out
     505             :    of space.  Then, while doing the above, asymptotically, we expect the
     506             :    heap to look something like:
     507             : 
     508             :      other allocs | M foo_t alloc | padding free | unmapped
     509             : 
     510             :    On the next resize, we'd request space for M gamma foo_t.  Since
     511             :    there are no free fragments large enough for this, realloc is going
     512             :    to have to map some space from the operating system, copy our memory
     513             :    into it and free up the original space for reuse.  Post resize, we
     514             :    expect the heap to look like:
     515             : 
     516             :      other allocs | M foo_t free | M gamma foo_t alloc | padding free | unmapped
     517             : 
     518             :    On the next resize, we'd request space for M gamma^2 foo_t.  This
     519             :    also can't fit within any free fragment above for gamma>1 (noting
     520             :    that, in this worst case realloc, we have to allocate the memory
     521             :    first and then copy and then free the old).  So we end up with:
     522             : 
     523             :      other allocs | M (1+gamma) foo_t free | M gamma^2 foo_t alloc | padding free | unmapped
     524             : 
     525             :    On the next resize, we'd request space for M gamma^3 foo_t.  If we
     526             :    have:
     527             : 
     528             :      gamma^3 < 1 + gamma
     529             : 
     530             :    we can fit this request in the hole left by the two previous resizes.
     531             :    This implies we need gamma<1.32471... where the magic number is the
     532             :    positive real root of:
     533             : 
     534             :      x^3 - x - 1  = 0
     535             : 
     536             :    This is the "silver ratio" in the sense that the positive real root
     537             :    of x^2 - x - 1 is the "golden ratio" of 1.61803...  (Note that the
     538             :    golden ratio would apply if we had a more sophisticated realloc under
     539             :    the hood that aliased the resized allocation over top the M foo_t
     540             :    free and the existing M gamma foo_t alloc and then moved the aliased
     541             :    memory.  Presumably such a sophisticated realloc would also just
     542             :    append to the end of the heap without any move or copy at all but
     543             :    that eventually leads to a question about how much overallocation and
     544             :    operating system overhead is acceptable on resize discussed further
     545             :    below).
     546             : 
     547             :    After a resize with something near but smaller than the silver ratio,
     548             :    we expect the heap to look like:
     549             : 
     550             :      other allocs | M gamma^3 foo_t alloc | padding free | unmapped
     551             : 
     552             :    which is back to where we started, except with a larger allocation.
     553             : 
     554             :    We don't want to be doing floating point math in methods like this.
     555             :    Noting that gamma = 1 + 1/4 + 1/16 = 1.3125 is very close to the
     556             :    silver yields the very practical:
     557             : 
     558             :      new_max = fd_ulong_max( max + (max>>2) + (max>>4), max + delta );
     559             : 
     560             :    This is friendly with even the worst case realloc behaviors under the
     561             :    hood.  It also works will in similar situations with linear storage
     562             :    media (e.g. disk storage).  The limit also means that the worst case
     563             :    overallocation for cases like the above at most ~32% and on average
     564             :    ~16%.  This is a comparable level of overallocation that already
     565             :    happens under the hood (e.g. on par with the level of waste that
     566             :    naturally happens in allocators for metadata and padding and much
     567             :    less waste than the golden ratio or larger growth rates if we
     568             :    dubiously trust that the realloc method under the hood).
     569             : 
     570             :    In cases where we might need to resize to even larger than this, we
     571             :    just resize to the caller's requested amount and keep our fingers
     572             :    crossed that the caller realized by this time dynamic resizing was a
     573             :    mistake and is allocating the correct size this time.
     574             : 
     575             :    Adding arithmetic overflow handling then yields the below.
     576             : 
     577             :    TL;DR  Example usage (ignoring size calculation overflow handling and
     578             :    allocation error trapping):
     579             : 
     580             :      ulong   foo_cnt   = 0UL;
     581             :      ulong   foo_max   = ... good estimate the actual amount needed;
     582             :      ulong   foo_delta = ... reasonable minimum resizing increment;
     583             :      foo_t * foo       = (foo_t *)malloc( foo_max*sizeof(foo_t) );
     584             : 
     585             :      while( ... still appending ... ) {
     586             : 
     587             :        if( FD_UNLIKELY( foo_cnt==foo_max ) ) {
     588             :          foo_max = fd_alloc_max_expand( foo_max, foo_delta, foo_cnt + foo_delta );
     589             :          foo     = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
     590             :        }
     591             : 
     592             :        foo[ foo_cnt++ ] = ... next val to append ...;
     593             : 
     594             :      }
     595             : 
     596             :      ... at this point
     597             :      ... - foo has foo_cnt elements initialized
     598             :      ... - foo has room for foo_max elements total
     599             :      ... - when the initial foo_max estimate was correct or oversized,
     600             :      ...   no resizing was done
     601             :      ... - when the initial foo_max was undersized, asymptotically,
     602             :      ...   foo_max is at most ~32% larger worst case (~16% larger
     603             :      ...   average case) than foo_cnt with at most O(lg foo_cnt)
     604             :      ...   reallocs needed to initialize foo.
     605             :      ... - the resizing test branch is highly predictable
     606             :      ... - the underlying heap shouldn't be too fragmented or
     607             :      ...   overallocated regardless of the allocator implementation
     608             :      ...   details. */
     609             : 
     610             : FD_FN_CONST static inline ulong       /* new_max, new_max>=max(needed,max), if max<ULONG_MAX, will be new_max>max */
     611             : fd_alloc_max_expand( ulong max,
     612             :                      ulong delta,     /* Assumed > 0 */
     613    76670844 :                      ulong needed ) {
     614    76670844 :   ulong t0 = max + delta;               t0 = fd_ulong_if( t0<max, ULONG_MAX, t0 ); /* Handle overflow */
     615    76670844 :   ulong t1 = max + (max>>2) + (max>>4); t1 = fd_ulong_if( t1<max, ULONG_MAX, t1 ); /* Handle overflow */
     616    76670844 :   return fd_ulong_max( fd_ulong_max( t0, t1 ), needed );
     617    76670844 : }
     618             : 
     619             : /* fd_alloc_vtable is the virtual function table implementing fd_valloc
     620             :    for fd_alloc. */
     621             : 
     622             : extern const fd_valloc_vtable_t fd_alloc_vtable;
     623             : 
     624             : /* fd_alloc_virtual returns an abstract handle to the fd_alloc join.
     625             :    Valid for lifetime of join. */
     626             : 
     627             : FD_FN_CONST static inline fd_valloc_t
     628           3 : fd_alloc_virtual( fd_alloc_t * alloc ) {
     629           3 :   fd_valloc_t valloc = { alloc, &fd_alloc_vtable };
     630           3 :   return valloc;
     631           3 : }
     632             : 
     633             : FD_PROTOTYPES_END
     634             : 
     635             : #endif /* HEADER_fd_src_util_alloc_fd_alloc_h */
     636             : 

Generated by: LCOV version 1.14