Line data Source code
1 : #ifndef HEADER_fd_src_util_alloc_fd_alloc_h
2 : #define HEADER_fd_src_util_alloc_fd_alloc_h
3 :
4 : /* fd_alloc is a high performance lockfree fast O(1) (typically)
5 : allocator.
6 :
7 : It is optimized for high concurrency use and small-ish clustered /
8 : multi-modal distributed allocation sizes. It is further optimized
9 : for single-threaded use cases and/or when malloc-free pairs have have
10 : good thread affinity (i.e. frees done by the same thread that did the
11 : corresponding malloc). It can also be used optimally in more complex
12 : threading use cases (e.g. malloc in one or more producer threads,
13 : free in one or more consumer threads). It behaves well with
14 : irregular sizes and exploits ultra fine grained alignment for good
15 : packing (e.g. reasonable low memory overhead packing of byte strings
16 : with irregular small-ish sizes).
17 :
18 : A fd_alloc stores its state in a wksp in a persistent way and backs
19 : its allocations by that same wksp. This avoids many of the severe
20 : performance and reliability issues of malloc
21 :
22 : Critically, it _doesn't_ _lie_ and it _doesn't_ _blow_ _up_.
23 :
24 : fd_alloc_malloc will not stall your program behind your back, calling
25 : the OS to grow or shrink the program's memory footprint during the
26 : call; it will never use more memory than has already be procured for
27 : the underlying wksp. And, if fd_alloc_malloc succeeds, the returned
28 : memory is real and is ready for use.
29 :
30 : Obligatory dynamic allocation rant *********************************
31 :
32 : That is, fd_alloc is not the absolute unforgivable garbage of
33 : Linux/libc malloc. malloc often just reserves page table entries and
34 : returns, irrespective of whether or not the request can be satisfied
35 : (on the apparent belief that the malloc call was a bluff and the user
36 : is probably a bad dev who doesn't bother with error trapping anyway),
37 : in hopes that that a later glacially slow page fault to the OS will
38 : actually reserve the memory.
39 :
40 : Which, even when it does work, it will by its very nature will be at
41 : the worst possible times (e.g. in the middle of incoming line rate
42 : network traffic bursts ... data structures try to grow to accommodate
43 : but slowing down throughput faster than they are growing at a time
44 : when keeping up is critical to surviving ... and then on a
45 : ridiculously awful normal page by normal page basis), exposing the
46 : caller to non-deterministic performance and reduced throughput.
47 :
48 : Unfortunately, getting overrun by DoS-like traffic patterns is the
49 : least of the worries. When Linux can't back one of the page by DRAM
50 : on a page fault (skipping over some additional TLB and NUMA
51 : dubiousness that goes on under the hood), it goes from glacial
52 : performance to continental drift levels of performance. It will try
53 : to honor the request by shuffling things to swap, exacerbating the
54 : above. Suddenly it is a feat to even keep up with a 1980s modem.
55 :
56 : But that's not the end of the horror. Because Linux thinks it cool
57 : to overcommit beyond physical limits for no discernible reason and
58 : gets flaky if you try to disable swap and/or overcommit, the page
59 : fault might not be able honor the commitment. Finding itself caught
60 : in a lie (it can't go back in time and rescind the success that
61 : malloc already returned to the unsuspecting developer), the Linux
62 : kernel goes full HAL-9000 and starts randomly killing things. A dead
63 : process can't complain about malloc lying to it after all. And,
64 : cherry on top, the victims of the oom killer are frequently not even
65 : the culprits.
66 :
67 : Sigh ... all completely unacceptable behaviors in any situation, much
68 : less mission critical ones.
69 :
70 : TL;DR Friends don't let friends malloc.
71 :
72 : If you truly need malloc-free semantics, use fd_alloc. This at least
73 : eliminates the most egregious horrors above. It can't help the
74 : intrinsic horrors though.
75 :
76 : (Though it is ingrained in CS teaching and languages to the extent
77 : there's rarely even recognition of the faintest possibility of the
78 : existence of alternatives, people rarely truly need malloc/free
79 : semantics. But, after they convince themselves they still do because
80 : of the brainwashing, they need to remind themselves that computers
81 : don't work remotely like malloc/free suggest and then should try to
82 : think about resource acquisition more fundamentally. And, after they
83 : still manage to talk themselves back into needing it because of the
84 : teaching and linguistic traps, repeat ... at least if they want to
85 : make something fast and robust. Even if they can prove dynamic
86 : allocation requests have an attainable worst level at all points in
87 : time, they still have to prove that heap fragmentation over time will
88 : never cause malloc to fail. Good luck with that.)
89 :
90 : The above rant applies to any paired dynamic memory strategies,
91 : including non-placement new, implicit copy constructors, dynamic
92 : resizing containers, etc. Real world computers aren't just funky
93 : implementations of infinite tape Turing machines. This make-believe
94 : that they are in code that interacts with the real world is a recipe
95 : for real world disaster.
96 :
97 : End of obligatory dynamic allocation rant **************************
98 :
99 : Since it is backed by a wksp, allocations have the same NUMA, TLB,
100 : IPC and persistence properties of the underlying wksp. This allows
101 : fd_alloc to go far beyond the capabilities of a typical allocator
102 : Allocations done by fd_alloc can be shared between processes (can
103 : even malloc in one process, translate the pointer into the address
104 : space of another process, and free it there, even after the first
105 : process has terminated), a process can be stopped and then other
106 : processes can still find the stopped process's allocations and use
107 : them / free them / etc.
108 :
109 : Regarding time efficiency and concurrency, large allocations are
110 : passed through to the underlying wksp allocator (which is neither
111 : O(1) and only "quasi"-lockfree in the sense described in fd_wksp.h).
112 : But the allocation strategies used under the hood (loosely inspired
113 : by Hoard-style lockfree allocators but with a lot of optimizations
114 : and tweaks for the above) are such that, in the common case of not
115 : needing to fall back to the underlying wksp allocator, the allocator
116 : is lockfree O(1).
117 :
118 : Regarding spatial efficiency, it is reasonably space efficient
119 : (overhead for a cstr-style allocation is ~4 bytes) and adapts over
120 : time to try to bound the amount of pre-allocation for small requests. */
121 :
122 : #include "../wksp/fd_wksp.h"
123 : #include "../valloc/fd_valloc.h"
124 :
125 : /* FD_ALLOC_{ALIGN,FOOTPRINT} give the required alignment and footprint
126 : needed for a wksp allocation to be suitable as a fd_alloc. ALIGN is
127 : an integer pointer of 2 and FOOTPRINT is an integer multiple of
128 : ALIGN. These are provided to facilitate compile time declarations.
129 : 4096 for ALIGN has been is picked to be normal-page like and match
130 : the minimum alignment of a fd_wksp_alloc. */
131 :
132 : #define FD_ALLOC_ALIGN (4096UL)
133 228 : #define FD_ALLOC_FOOTPRINT (20480UL)
134 :
135 : /* FD_ALLOC_MALLOC_ALIGN_DEFAULT gives the alignment that will be used
136 : when the user does not specify an alignment. This will be an integer
137 : power of 2 of at least 16 for C/C++ allocator alignment conformance.
138 : (16 instead of 8 on the grounds that 128-bit is a primitive type on
139 : platforms with FD_HAS_INT128.) */
140 :
141 132009114 : #define FD_ALLOC_MALLOC_ALIGN_DEFAULT (16UL)
142 :
143 : /* FD_ALLOC_JOIN_CGROUP_HINT_MAX is maximum value for a cgroup hint.
144 : This is an integer power of 2 minus 1 of at most FD_ALLOC_ALIGN. */
145 :
146 483729876 : #define FD_ALLOC_JOIN_CGROUP_HINT_MAX (15UL)
147 :
148 : /* A "fd_alloc_t *" is an opaque handle of an fd_alloc. */
149 :
150 : struct fd_alloc;
151 : typedef struct fd_alloc fd_alloc_t;
152 :
153 : FD_PROTOTYPES_BEGIN
154 :
155 : /* fd_alloc_{align,footprint} return FD_ALLOC_{ALIGN,FOOTPRINT}. */
156 :
157 : FD_FN_CONST ulong
158 : fd_alloc_align( void );
159 :
160 : FD_FN_CONST ulong
161 : fd_alloc_footprint( void );
162 :
163 : /* fd_alloc_new formats an unused wksp allocation with the appropriate
164 : alignment and footprint as a fd_alloc. Caller is not joined on
165 : return. Returns shmem on success and NULL on failure (shmem NULL,
166 : shmem misaligned, shmem is not backed by a wksp ... logs details). A
167 : workspace can have multiple fd_alloc created for it. They will
168 : dynamically share the underlying workspace along with any other
169 : non-fd_alloc usage but will otherwise act as completely separate
170 : non-conflicting arenas (useful for logical grouping and improved
171 : concurrency). To help with various diagnostics, garbage collection
172 : and what not, all allocations to the underlying wksp are tagged with
173 : the given tag, positive. Ideally, the tag used here should be
174 : distinct from all other tags used by this workspace. */
175 :
176 : void *
177 : fd_alloc_new( void * shmem,
178 : ulong tag );
179 :
180 : /* fd_alloc_join joins the caller to a fd_alloc. shalloc points to the
181 : first byte of the memory region backing the alloc in the caller's
182 : address space. Returns an opaque handle of the join on success
183 : (IMPORTANT! THIS IS NOT JUST A CAST OF SHALLOC) and NULL on failure
184 : (NULL shalloc, misaligned shalloc, bad magic, ... logs details).
185 : Every successful join should have a matching leave. The lifetime of
186 : the join is until the matching leave or the thread group is
187 : terminated (joins are local to a thread group).
188 :
189 : cgroup_hint is a concurrency hint used to optimize parallel and
190 : persistent use cases. Ideally each thread (regardless of thread
191 : group) should join the allocator with a different cgroup_hint system
192 : wide (note that joins are practically free). And if using a fd_alloc
193 : in a persistent way, logical streams of execution would ideally
194 : preserve the cgroup_hint address starts and stops of that stream for
195 : the most optimal affinity behaviors. 0 is fine in single threaded
196 : use cases and 0 and/or collisions are fine in more general cases
197 : though concurrent performance might be reduced due to additional
198 : contention between threads that share the same cgroup_hint. If
199 : cgroup_hint is not in [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be
200 : wrapped to be in that range.
201 :
202 : TL;DR A cgroup_hint of 0 is often a practical choice single threaded.
203 : A cgroup_hint of fd_tile_idx() or just uniform random 64-bit value
204 : choice in more general situations. */
205 :
206 : fd_alloc_t *
207 : fd_alloc_join( void * shalloc,
208 : ulong cgroup_hint );
209 :
210 : /* fd_alloc_leave leaves an existing join. Returns the underlying
211 : shalloc (IMPORTANT! THIS IS NOT A SIMPLE CAST OF JOIN) on success and
212 : NULL on failure. Reasons for failure include join is NULL (logs
213 : details). */
214 :
215 : void *
216 : fd_alloc_leave( fd_alloc_t * join );
217 :
218 : /* fd_alloc_delete unformats a wksp allocation used as a fd_alloc.
219 : Assumes nobody is or will be joined to the fd_alloc. The caller
220 : further promises there are no allocations outstanding. If there are
221 : still some outstanding allocations, it will try to clean up as many
222 : as it can find but it is not guaranteed to find all of them (those
223 : will continue to consume wksp space but could be theoretically be
224 : cleaned up in an application specific way by operating directly on
225 : the underlying workspace ... of course, if the application could do
226 : that, it probably such just clean up after itself before calling
227 : delete). Returns shmem on success and NULL on failure (logs
228 : details). Reasons for failure include shalloc is NULL, misaligned
229 : fd_alloc, bad magic, etc. */
230 :
231 : void *
232 : fd_alloc_delete( void * shalloc );
233 :
234 : /* fd_alloc_join_cgroup_hint returns the cgroup_hint of the current
235 : join. Assumes join is a current local join. The return will be in
236 : [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX].
237 :
238 : fd_alloc_join_cgroup_hint_set returns join with the cgroup_hint
239 : updated to provided cgroup_hint. If cgroup hint is not in
240 : [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be wrapped into this
241 : range. Assumes join is a current local join. The return value is
242 : not a new join. */
243 :
244 : FD_FN_CONST static inline ulong
245 142033005 : fd_alloc_join_cgroup_hint( fd_alloc_t * join ) {
246 142033005 : return ((ulong)join) & FD_ALLOC_JOIN_CGROUP_HINT_MAX;
247 142033005 : }
248 :
249 : FD_FN_CONST static inline fd_alloc_t *
250 : fd_alloc_join_cgroup_hint_set( fd_alloc_t * join,
251 38926167 : ulong cgroup_hint ) {
252 38926167 : return (fd_alloc_t *)((((ulong)join) & (~FD_ALLOC_JOIN_CGROUP_HINT_MAX)) | (cgroup_hint & FD_ALLOC_JOIN_CGROUP_HINT_MAX));
253 38926167 : }
254 :
255 : /* fd_alloc_wksp returns a pointer to a local wksp join of the wksp
256 : backing the fd_alloc with the current local join. Caller should not
257 : call fd_alloc_leave on the returned value. Lifetime of the returned
258 : wksp handle is as long as the shalloc used on the fd_alloc_join is
259 : still mapped into the caller's address space.
260 :
261 : fd_alloc_tag returns the tag that will be used for allocations from
262 : this workspace. */
263 :
264 : FD_FN_PURE fd_wksp_t * fd_alloc_wksp( fd_alloc_t * join ); // NULL indicates NULL join
265 : FD_FN_PURE ulong fd_alloc_tag ( fd_alloc_t * join ); // Positive, 0 indicates NULL join
266 :
267 : /* fd_alloc_malloc_at_least allocates at least sz bytes with alignment
268 : of at least align from the wksp backing the fd_alloc. join is a
269 : current local join to the fd_alloc. align should be an integer power
270 : of 2 or 0.
271 :
272 : An align of 0 indicates to use FD_ALLOC_MALLOC_DEFAULT_ALIGN for the
273 : request alignment. This will be large enough such that
274 : fd_alloc_malloc is conformant with C/C++ alignment specifications
275 : (i.e. can trivially wrap fd_alloc_malloc to use as a drop in
276 : replacement for malloc).
277 :
278 : Small values of align will NOT be rounded up to some minimum (e.g.
279 : allocating lots of 1 byte aligned short strings is fine and
280 : relatively space and time efficient ... the overhead is ~4 bytes per
281 : allocation). fd_alloc is not particularly optimized when align>~sz
282 : and/or large alignments (>~4096B). While large values for align are
283 : supported by fd_alloc_malloc, directly using fd_wksp_alloc is
284 : recommended in such cases.
285 :
286 : If an allocation is "large" (align + sz >~ 64KiB for the current
287 : implementation), it will be handled by fd_wksp_alloc under the hood.
288 : Otherwise, it will be handled by fd_alloc_malloc algorithms (which
289 : are ultimately backed by fd_wksp_alloc). As such, if a small
290 : allocation is "new" (e.g. first allocation of a size around sz, an
291 : allocation that can't be packed near other existing allocations
292 : around that sz, etc), this might also fallback on fd_wksp_alloc.
293 : Typically though, after initial allocation and/or program warmup,
294 : fd_alloc_malloc calls will be a reasonably fast O(1) lockfree.
295 :
296 : Returns a pointer to the allocation in the local address space on
297 : success. Note that this pointer will a wksp laddr. As such, it can
298 : be converted to a gaddr, passed to other threads in other thread
299 : groups, and converted to a wksp laddr in their address spaces, freed
300 : via a join to the fd_alloc in that thread group, persisted beyond the
301 : lifetime of the calling thread, etc.
302 :
303 : Returns NULL on failure (silent to support HPC usage) or when sz is
304 : 0. Reasons for failure include NULL join, invalid align, sz overflow
305 : (sz+align>~2^64), no memory available for request (e.g. workspace has
306 : insufficient room or is too fragmented).
307 :
308 : On return, *max will contain the number actual number of bytes
309 : available at the returned gaddr. On success, this will be at least
310 : sz and it is not guaranteed to be a multiple of align. On failure,
311 : *max will be zero.
312 :
313 : fd_alloc_malloc is a simple wrapper around fd_alloc_malloc_at_least
314 : for use when applications do not care about the actual size of their
315 : allocation. */
316 :
317 : void *
318 : fd_alloc_malloc_at_least( fd_alloc_t * join,
319 : ulong align,
320 : ulong sz,
321 : ulong * max );
322 :
323 : static inline void *
324 : fd_alloc_malloc( fd_alloc_t * join,
325 : ulong align,
326 1633644 : ulong sz ) {
327 1633644 : ulong max[1];
328 1633644 : return fd_alloc_malloc_at_least( join, align, sz, max );
329 1633644 : }
330 :
331 : /* fd_alloc_free frees the outstanding allocation whose first byte is
332 : pointed to by laddr in the caller's local address space. join is a
333 : current local join to the fd_alloc. The caller promises laddr was
334 : allocated by the underlying fd_alloc (but not necessarily on the
335 : calling thread or even in this calling process or even by a thread /
336 : process that is still running). Silent for HPC usage (NULL join and
337 : NULL laddr are a no-op).
338 :
339 : Like fd_alloc_malloc, if the allocation was large, this will be
340 : handled by fd_wksp_free under the hood, which is neither lockfree nor
341 : O(1). If the allocation was small, this will typically be lockfree
342 : O(1). It is possible that, if the amount of outstanding small
343 : allocations has reduced significantly, fd_alloc_free on a small
344 : allocation might trigger a fd_wksp_free to free up wksp space for
345 : other usage (including uses not through this fd_alloc).
346 :
347 : (It would be possible to implement this less efficiently in space and
348 : time such that join didn't need to be passed. The current design has
349 : picked efficiency and consistency with other APIs though.)
350 :
351 : Note that this will implicitly optimize the freed memory to be
352 : preferentially reused by the join's concurrency group. Thus the
353 : caller should have at least one join for each concurrency group to
354 : which it might want to return memory for reuse and then call free
355 : with the appropriate join. */
356 :
357 : void
358 : fd_alloc_free( fd_alloc_t * join,
359 : void * laddr );
360 :
361 : /* fd_alloc_compact frees all wksp allocations that are not required
362 : for any outstanding user mallocs (note that fd_alloc_free lazily
363 : returns unused memory from the underlying wksp to accelerate
364 : potential future allocations). join is a current local join to the
365 : alloc. This cannot fail from a user's POV but logs any wonkiness
366 : detected.
367 :
368 : fd_alloc_compact has the property that it minimizes the amount of
369 : wksp utilization for the set of outstanding user mallocs when there
370 : is no other concurrent alloc usage. As such, if there is no
371 : concurrent alloc usage _and_ there are no outstanding mallocs, on
372 : return, all wksp allocations (except the user provided memory region
373 : that holds the state of the allocator) will be returned to the wksp.
374 : This can be then be used to reset the alloc and/or implement robust
375 : leak detection at program teardown.
376 :
377 : This function is safe to use even when there is other concurrent
378 : alloc usage. It t is best effort in that case; it is not guaranteed
379 : that there was some point in time between call and return when the
380 : wksp utilization was minimized for the contemporaneous set of
381 : outstanding user mallocs.
382 :
383 : Also note that this function is not O(1) and the fd_alloc_free lazy
384 : return mechanism does not permit unbounded growth of unreturned free
385 : memory. So this should be used sparingly at best (e.g. in teardown
386 : leak detection or rare non-critical path housekeeping). */
387 :
388 : void
389 : fd_alloc_compact( fd_alloc_t * join );
390 :
391 : /* fd_alloc_is_empty returns 1 if the alloc has no outstanding mallocs
392 : and 0 otherwise. join is a current local join to the alloc. NULL
393 : join silently returns 0.
394 :
395 : Important safety tip! This should only be run when there is no
396 : concurrent alloc usage. It is not algorithmically fast. This might
397 : temporarily lock the underlying wksp while running and might call
398 : fd_alloc_compact under the hood. It assumes the user provided memory
399 : region holding the alloc state is contained within a region returned
400 : by a single fd_wksp_alloc call (it would be hard to create an alloc
401 : where that isn't the case). It assumes alloc is the only user of the
402 : alloc's tag in the wksp. As such this should be used carefully and
403 : sparingly (e.g. at program teardown for leak detection).
404 :
405 : It will "work" with concurrent alloc usage in that the return value
406 : will be in 0 or 1 and it will not corrupt the alloc or underlying
407 : wksp. But the return value will not be well-defined (e.g. it is not
408 : guaranteed to correspond the state of the alloc at some point in time
409 : between when this was called and it when it returned). */
410 :
411 : int
412 : fd_alloc_is_empty( fd_alloc_t * join );
413 :
414 : /* fd_alloc_max_expand computes a recommended value to use for max when
415 : needing to dynamically resize structures. The below is very subtle
416 : and fixes a lot of pervasive errors with dynamic resizing
417 : implementations (either explicit or implicitly done under the hood).
418 : It doesn't fix the main error with dynamic resizing though. The main
419 : error being deciding to use anything with dynamic resizing (outside
420 : of, maybe, initialization at program start).
421 :
422 : Consider an all too common case of an initially too small dynamically
423 : sized array that is getting elements appended to it one at a time.
424 : E.g. without proper error trapping, overflow handling and the like:
425 :
426 : foo_t * foo = NULL;
427 : ulong foo_max = 0UL;
428 : ulong foo_cnt = 0UL;
429 : ulong foo_delta = ... some reasonable increment ...;
430 :
431 : while( ... still appending ... ) {
432 :
433 : if( foo_cnt==foo_max ) { // Need to resize
434 : foo_max += foo_delta;
435 : foo = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
436 : }
437 :
438 : foo[ foo_cnt++ ] = ... next val to append ...;
439 : }
440 :
441 : This is terrible theoretically and practically and yet it looks like
442 : it does everything right.
443 :
444 : The theoretical issue is that, if the realloc can't be done in-place
445 : (which is more common than most realize ... depends on how the
446 : underlying realloc implementation details), the memory will have to
447 : be copied from the original location to the resized location with a
448 : typical cost of final_foo_max/2 -> O(final_foo_cnt). Because max is
449 : increased by fixed absolute amount each resizing, there will be
450 : final_foo_cnt/foo_delta -> O(final_foo_cnt) such resizes.
451 :
452 : That is, we've accidentally written a method that has a slow
453 : O(final_foo_cnt^2) worst case even though it superficially looks like
454 : a fast O(final_foo_cnt) method. Worse still, this behavior might
455 : appear suddenly in previously fine code if realloc implementation
456 : changes or, yet again worse, because a larger problem size was used
457 : in the wild than used in testing.
458 :
459 : The practical issue is realloc is painfully slow and it gets worse
460 : for large sizes because large sizes are usually handled by operating
461 : system calls (e.g. mmap or sbrk under the hood). We've also now done
462 : O(final_foo_cnt) slow operating system calls in our already
463 : algorithmically slow O(final_foo_cnt^2) worst case algorithm that
464 : still superficially looks like a fast O(final_foo_cnt). (And throw
465 : in the other issues with malloc described above about TLB and NUMA
466 : inefficiency, the gaslighting the kernel does "clearly the crash had
467 : nothing to do with the OOM killer shooting processes randomly in the
468 : head, your program probably just had a bug ... yeah ... that's the
469 : ticket" ... for good measure).
470 :
471 : We can get an algorithmic improvement if we change the above to
472 : increase max by a fixed relative amount each resize. Since we are
473 : dealing with integers though, we should make sure that we always
474 : increase max by some minimal amount. Instead of:
475 :
476 : foo_max += foo_delta;
477 :
478 : we can use something like:
479 :
480 : foo_max = fd_ulong_max( foo_max*gamma, foo_max + foo_delta );
481 :
482 : If gamma>1, asymptotically, we will only do O(lg cnt) resizes.
483 : Theoretically, we've gone from an O(final_foo_cnt^2) worst case
484 : method to an O(final_foo_cnt lg final_foo_cnt) worst case method. It
485 : is still irritating that it looks superficially like a fast
486 : O(final_foo_cnt) method but this is amongst the many reasons why
487 : dynamic resizing is gross and wrong and to be avoided when possible.
488 :
489 : The larger gamma is, the smaller the leading coefficient is in the
490 : O(final_foo_cnt lg final_foo_cnt) and thus the better this
491 : approximates the fast O(final_foo_cnt) method that it superficially
492 : seems to be. But using a very large gamma is clearly absurd. There
493 : are obvious memory footprint limitations for large sizes and each
494 : resize would trigger an ever larger amount of OS work. This raises
495 : the question:
496 :
497 : What is the optimal gamma?
498 :
499 : Suppose we have worst case realloc implementation (alloc new memory,
500 : copy, free old memory and, when no free fragment large enough is
501 : available, use sbrk like semantics to get memory from the O/S ...
502 : not uncommon as it is trivial to implement and often works "good
503 : enough" in lab settings). It always works out-of-place and it always
504 : just appends new memory at the end of the heap when the heap runs out
505 : of space. Then, while doing the above, asymptotically, we expect the
506 : heap to look something like:
507 :
508 : other allocs | M foo_t alloc | padding free | unmapped
509 :
510 : On the next resize, we'd request space for M gamma foo_t. Since
511 : there are no free fragments large enough for this, realloc is going
512 : to have to map some space from the operating system, copy our memory
513 : into it and free up the original space for reuse. Post resize, we
514 : expect the heap to look like:
515 :
516 : other allocs | M foo_t free | M gamma foo_t alloc | padding free | unmapped
517 :
518 : On the next resize, we'd request space for M gamma^2 foo_t. This
519 : also can't fit within any free fragment above for gamma>1 (noting
520 : that, in this worst case realloc, we have to allocate the memory
521 : first and then copy and then free the old). So we end up with:
522 :
523 : other allocs | M (1+gamma) foo_t free | M gamma^2 foo_t alloc | padding free | unmapped
524 :
525 : On the next resize, we'd request space for M gamma^3 foo_t. If we
526 : have:
527 :
528 : gamma^3 < 1 + gamma
529 :
530 : we can fit this request in the hole left by the two previous resizes.
531 : This implies we need gamma<1.32471... where the magic number is the
532 : positive real root of:
533 :
534 : x^3 - x - 1 = 0
535 :
536 : This is the "silver ratio" in the sense that the positive real root
537 : of x^2 - x - 1 is the "golden ratio" of 1.61803... (Note that the
538 : golden ratio would apply if we had a more sophisticated realloc under
539 : the hood that aliased the resized allocation over top the M foo_t
540 : free and the existing M gamma foo_t alloc and then moved the aliased
541 : memory. Presumably such a sophisticated realloc would also just
542 : append to the end of the heap without any move or copy at all but
543 : that eventually leads to a question about how much overallocation and
544 : operating system overhead is acceptable on resize discussed further
545 : below).
546 :
547 : After a resize with something near but smaller than the silver ratio,
548 : we expect the heap to look like:
549 :
550 : other allocs | M gamma^3 foo_t alloc | padding free | unmapped
551 :
552 : which is back to where we started, except with a larger allocation.
553 :
554 : We don't want to be doing floating point math in methods like this.
555 : Noting that gamma = 1 + 1/4 + 1/16 = 1.3125 is very close to the
556 : silver yields the very practical:
557 :
558 : new_max = fd_ulong_max( max + (max>>2) + (max>>4), max + delta );
559 :
560 : This is friendly with even the worst case realloc behaviors under the
561 : hood. It also works will in similar situations with linear storage
562 : media (e.g. disk storage). The limit also means that the worst case
563 : overallocation for cases like the above at most ~32% and on average
564 : ~16%. This is a comparable level of overallocation that already
565 : happens under the hood (e.g. on par with the level of waste that
566 : naturally happens in allocators for metadata and padding and much
567 : less waste than the golden ratio or larger growth rates if we
568 : dubiously trust that the realloc method under the hood).
569 :
570 : In cases where we might need to resize to even larger than this, we
571 : just resize to the caller's requested amount and keep our fingers
572 : crossed that the caller realized by this time dynamic resizing was a
573 : mistake and is allocating the correct size this time.
574 :
575 : Adding arithmetic overflow handling then yields the below.
576 :
577 : TL;DR Example usage (ignoring size calculation overflow handling and
578 : allocation error trapping):
579 :
580 : ulong foo_cnt = 0UL;
581 : ulong foo_max = ... good estimate the actual amount needed;
582 : ulong foo_delta = ... reasonable minimum resizing increment;
583 : foo_t * foo = (foo_t *)malloc( foo_max*sizeof(foo_t) );
584 :
585 : while( ... still appending ... ) {
586 :
587 : if( FD_UNLIKELY( foo_cnt==foo_max ) ) {
588 : foo_max = fd_alloc_max_expand( foo_max, foo_delta, foo_cnt + foo_delta );
589 : foo = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
590 : }
591 :
592 : foo[ foo_cnt++ ] = ... next val to append ...;
593 :
594 : }
595 :
596 : ... at this point
597 : ... - foo has foo_cnt elements initialized
598 : ... - foo has room for foo_max elements total
599 : ... - when the initial foo_max estimate was correct or oversized,
600 : ... no resizing was done
601 : ... - when the initial foo_max was undersized, asymptotically,
602 : ... foo_max is at most ~32% larger worst case (~16% larger
603 : ... average case) than foo_cnt with at most O(lg foo_cnt)
604 : ... reallocs needed to initialize foo.
605 : ... - the resizing test branch is highly predictable
606 : ... - the underlying heap shouldn't be too fragmented or
607 : ... overallocated regardless of the allocator implementation
608 : ... details. */
609 :
610 : FD_FN_CONST static inline ulong /* new_max, new_max>=max(needed,max), if max<ULONG_MAX, will be new_max>max */
611 : fd_alloc_max_expand( ulong max,
612 : ulong delta, /* Assumed > 0 */
613 76670844 : ulong needed ) {
614 76670844 : ulong t0 = max + delta; t0 = fd_ulong_if( t0<max, ULONG_MAX, t0 ); /* Handle overflow */
615 76670844 : ulong t1 = max + (max>>2) + (max>>4); t1 = fd_ulong_if( t1<max, ULONG_MAX, t1 ); /* Handle overflow */
616 76670844 : return fd_ulong_max( fd_ulong_max( t0, t1 ), needed );
617 76670844 : }
618 :
619 : /* fd_alloc_vtable is the virtual function table implementing fd_valloc
620 : for fd_alloc. */
621 :
622 : extern const fd_valloc_vtable_t fd_alloc_vtable;
623 :
624 : /* fd_alloc_virtual returns an abstract handle to the fd_alloc join.
625 : Valid for lifetime of join. */
626 :
627 : FD_FN_CONST static inline fd_valloc_t
628 3 : fd_alloc_virtual( fd_alloc_t * alloc ) {
629 3 : fd_valloc_t valloc = { alloc, &fd_alloc_vtable };
630 3 : return valloc;
631 3 : }
632 :
633 : FD_PROTOTYPES_END
634 :
635 : #endif /* HEADER_fd_src_util_alloc_fd_alloc_h */
636 :
|