#ifndef HEADER_fd_src_util_alloc_fd_alloc_h
#define HEADER_fd_src_util_alloc_fd_alloc_h

/* fd_alloc is a high performance, lockfree, (typically) fast O(1)
   allocator.

   It is optimized for high concurrency use and small-ish clustered /
   multi-modal distributed allocation sizes. It is further optimized
   for single-threaded use cases and/or when malloc-free pairs have
   good thread affinity (i.e. frees done by the same thread that did
   the corresponding malloc). It can also be used optimally in more
   complex threading use cases (e.g. malloc in one or more producer
   threads, free in one or more consumer threads). It behaves well with
   irregular sizes and exploits ultra fine grained alignment for good
   packing (e.g. reasonably low memory overhead packing of byte strings
   with irregular small-ish sizes).

   A fd_alloc stores its state in a wksp in a persistent way and backs
   its allocations by that same wksp. This avoids many of the severe
   performance and reliability issues of malloc.

   Critically, it _doesn't_ _lie_ and it _doesn't_ _blow_ _up_.

   fd_alloc_malloc will not stall your program behind your back,
   calling the OS to grow or shrink the program's memory footprint
   during the call; it will never use more memory than has already been
   procured for the underlying wksp. And, if fd_alloc_malloc succeeds,
   the returned memory is real and is ready for use.

   Obligatory dynamic allocation rant *********************************

   That is, fd_alloc is not the absolute unforgivable garbage of
   Linux/libc malloc. malloc often just reserves page table entries and
   returns, irrespective of whether or not the request can be satisfied
   (on the apparent belief that the malloc call was a bluff and the
   user is probably a bad dev who doesn't bother with error trapping
   anyway), in hopes that a later glacially slow page fault to the OS
   will actually reserve the memory.

   Which, even when it does work, will by its very nature be at the
   worst possible times (e.g. in the middle of incoming line rate
   network traffic bursts ... data structures try to grow to
   accommodate but slow down throughput faster than they are growing at
   a time when keeping up is critical to surviving ... and then on a
   ridiculously awful normal page by normal page basis), exposing the
   caller to non-deterministic performance and reduced throughput.

   Unfortunately, getting overrun by DoS-like traffic patterns is the
   least of the worries. When Linux can't back one of the pages with
   DRAM on a page fault (skipping over some additional TLB and NUMA
   dubiousness that goes on under the hood), it goes from glacial
   performance to continental drift levels of performance. It will try
   to honor the request by shuffling things to swap, exacerbating the
   above. Suddenly it is a feat to even keep up with a 1980s modem.

   But that's not the end of the horror. Because Linux thinks it is
   cool to overcommit beyond physical limits for no discernible reason
   and gets flaky if you try to disable swap and/or overcommit, the
   page fault might not be able to honor the commitment. Finding itself
   caught in a lie (it can't go back in time and rescind the success
   that malloc already returned to the unsuspecting developer), the
   Linux kernel goes full HAL-9000 and starts randomly killing things.
   A dead process can't complain about malloc lying to it after all.
   And, cherry on top, the victims of the oom killer are frequently not
   even the culprits.

   Sigh ... all completely unacceptable behaviors in any situation,
   much less mission critical ones.

   TL;DR Friends don't let friends malloc.

   If you truly need malloc-free semantics, use fd_alloc. This at least
   eliminates the most egregious horrors above. It can't help the
   intrinsic horrors though.

   (Though it is ingrained in CS teaching and languages to the extent
   there's rarely even recognition of the faintest possibility of the
   existence of alternatives, people rarely truly need malloc/free
   semantics. But, after they convince themselves they still do because
   of the brainwashing, they need to remind themselves that computers
   don't work remotely like malloc/free suggest and then should try to
   think about resource acquisition more fundamentally. And, after they
   still manage to talk themselves back into needing it because of the
   teaching and linguistic traps, repeat ... at least if they want to
   make something fast and robust. Even if they can prove their dynamic
   allocation requests have an attainable worst case level at all
   points in time, they still have to prove that heap fragmentation
   over time will never cause malloc to fail. Good luck with that.)

   The above rant applies to any paired dynamic memory strategies,
   including non-placement new, implicit copy constructors, dynamic
   resizing containers, etc. Real world computers aren't just funky
   implementations of infinite tape Turing machines. Pretending that
   they are in code that interacts with the real world is a recipe for
   real world disaster.

   End of obligatory dynamic allocation rant **************************

   Since it is backed by a wksp, allocations have the same NUMA, TLB,
   IPC and persistence properties of the underlying wksp. This allows
   fd_alloc to go far beyond the capabilities of a typical allocator.
   Allocations done by fd_alloc can be shared between processes (one
   can even malloc in one process, translate the pointer into the
   address space of another process, and free it there, even after the
   first process has terminated), a process can be stopped and then
   other processes can still find the stopped process's allocations and
   use them / free them / etc.

   Regarding time efficiency and concurrency, large allocations are
   passed through to the underlying wksp allocator (which is neither
   O(1) nor lockfree, only "quasi"-lockfree in the sense described in
   fd_wksp.h). But the allocation strategies used under the hood
   (loosely inspired by Hoard-style lockfree allocators but with a lot
   of optimizations and tweaks for the above) are such that, in the
   common case of not needing to fall back to the underlying wksp
   allocator, the allocator is lockfree O(1).

   Regarding spatial efficiency, it is reasonably space efficient
   (overhead for a cstr-style allocation is ~4 bytes) and adapts over
   time to try to bound the amount of pre-allocation for small
   requests. */

#include "../wksp/fd_wksp.h"

/* FD_ALLOC_{ALIGN,FOOTPRINT} give the required alignment and footprint
   needed for a wksp allocation to be suitable as a fd_alloc. ALIGN is
   an integer power of 2 and FOOTPRINT is an integer multiple of ALIGN.
   These are provided to facilitate compile time declarations. 4096 for
   ALIGN was picked to be normal-page like and to match the minimum
   alignment of a fd_wksp_alloc. */

#define FD_ALLOC_ALIGN     (4096UL)
#define FD_ALLOC_FOOTPRINT (20480UL)
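
/* For example, a minimal sketch of procuring a suitably sized and
   aligned region for a fd_alloc from a workspace (fd_wksp_alloc_laddr
   is the wksp allocator declared in fd_wksp.h; wksp and tag here stand
   in for an existing local wksp join and a caller chosen positive tag,
   i.e. they are assumptions of this sketch, not part of this API):

     void * shmem = fd_wksp_alloc_laddr( wksp, FD_ALLOC_ALIGN, FD_ALLOC_FOOTPRINT, tag );
*/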

/* FD_ALLOC_MALLOC_ALIGN_DEFAULT gives the alignment that will be used
   when the user does not specify an alignment. This will be an integer
   power of 2 of at least 16 for C/C++ allocator alignment conformance.
   (16 instead of 8 on the grounds that 128-bit is a primitive type on
   platforms with FD_HAS_INT128.) */

#define FD_ALLOC_MALLOC_ALIGN_DEFAULT (16UL)

/* FD_ALLOC_JOIN_CGROUP_HINT_MAX is the maximum value for a cgroup
   hint. This is an integer power of 2 minus 1 of at most
   FD_ALLOC_ALIGN. */

#define FD_ALLOC_JOIN_CGROUP_HINT_MAX (15UL)

/* A "fd_alloc_t *" is an opaque handle of an fd_alloc. */

struct fd_alloc;
typedef struct fd_alloc fd_alloc_t;

FD_PROTOTYPES_BEGIN

/* fd_alloc_{align,footprint} return FD_ALLOC_{ALIGN,FOOTPRINT}. */

FD_FN_CONST ulong
fd_alloc_align( void );

FD_FN_CONST ulong
fd_alloc_footprint( void );

/* fd_alloc_new formats an unused wksp allocation with the appropriate
   alignment and footprint as a fd_alloc. Caller is not joined on
   return. Returns shmem on success and NULL on failure (shmem NULL,
   shmem misaligned, shmem is not backed by a wksp ... logs details). A
   workspace can have multiple fd_alloc created for it. They will
   dynamically share the underlying workspace along with any other
   non-fd_alloc usage but will otherwise act as completely separate
   non-conflicting arenas (useful for logical grouping and improved
   concurrency). To help with various diagnostics, garbage collection
   and what not, all allocations to the underlying wksp are tagged with
   the given tag (positive). Ideally, the tag used here should be
   distinct from all other tags used by this workspace. */

void *
fd_alloc_new( void * shmem,
              ulong  tag );

/* fd_alloc_join joins the caller to a fd_alloc. shalloc points to the
   first byte of the memory region backing the alloc in the caller's
   address space. Returns an opaque handle of the join on success
   (IMPORTANT! THIS IS NOT JUST A CAST OF SHALLOC) and NULL on failure
   (NULL shalloc, misaligned shalloc, bad magic, ... logs details).
   Every successful join should have a matching leave. The lifetime of
   the join is until the matching leave or the thread group is
   terminated (joins are local to a thread group).

   cgroup_hint is a concurrency hint used to optimize parallel and
   persistent use cases. Ideally each thread (regardless of thread
   group) should join the allocator with a different cgroup_hint system
   wide (note that joins are practically free). And if using a fd_alloc
   in a persistent way, logical streams of execution would ideally
   preserve their cgroup_hint across the starts and stops of that
   stream for the most optimal affinity behaviors. 0 is fine in single
   threaded use cases and 0 and/or collisions are fine in more general
   cases though concurrent performance might be reduced due to
   additional contention between threads that share the same
   cgroup_hint. If cgroup_hint is not in
   [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be wrapped to be in that
   range.

   TL;DR A cgroup_hint of 0 is often a practical choice in single
   threaded use cases. A cgroup_hint of fd_tile_idx() or just a uniform
   random 64-bit value is often a practical choice in more general
   situations. */

fd_alloc_t *
fd_alloc_join( void * shalloc,
               ulong  cgroup_hint );

/* fd_alloc_leave leaves an existing join. Returns the underlying
   shalloc (IMPORTANT! THIS IS NOT A SIMPLE CAST OF JOIN) on success and
   NULL on failure. Reasons for failure include join is NULL (logs
   details). */

void *
fd_alloc_leave( fd_alloc_t * join );

/* fd_alloc_delete unformats a wksp allocation used as a fd_alloc.
   Assumes nobody is or will be joined to the fd_alloc. The caller
   further promises there are no allocations outstanding. If there are
   still some outstanding allocations, it will try to clean up as many
   as it can find but it is not guaranteed to find all of them (those
   will continue to consume wksp space but could theoretically be
   cleaned up in an application specific way by operating directly on
   the underlying workspace ... of course, if the application could do
   that, it should probably just clean up after itself before calling
   delete). Returns shalloc on success and NULL on failure (logs
   details). Reasons for failure include shalloc is NULL, misaligned
   shalloc, bad magic, etc. */

void *
fd_alloc_delete( void * shalloc );
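
/* For example, a minimal sketch of the expected lifecycle, assuming
   wksp is a current local join to a workspace and tag is a caller
   chosen positive tag (fd_wksp_alloc_laddr / fd_wksp_free_laddr are
   the wksp allocation helpers declared in fd_wksp.h, fd_tile_idx is
   the usual fd_util tile index helper; error trapping omitted for
   brevity):

     void *       shmem   = fd_wksp_alloc_laddr( wksp, fd_alloc_align(), fd_alloc_footprint(), tag );
     void *       shalloc = fd_alloc_new ( shmem, tag );
     fd_alloc_t * alloc   = fd_alloc_join( shalloc, fd_tile_idx() );

     ... use fd_alloc_malloc / fd_alloc_free with alloc here ...

     fd_wksp_free_laddr( fd_alloc_delete( fd_alloc_leave( alloc ) ) );
*/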

/* fd_alloc_join_cgroup_hint returns the cgroup_hint of the current
   join. Assumes join is a current local join. The return will be in
   [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX].

   fd_alloc_join_cgroup_hint_set returns join with the cgroup_hint
   updated to the provided cgroup_hint. If cgroup_hint is not in
   [0,FD_ALLOC_JOIN_CGROUP_HINT_MAX], it will be wrapped into this
   range. Assumes join is a current local join. The return value is
   not a new join. */

FD_FN_CONST static inline ulong
fd_alloc_join_cgroup_hint( fd_alloc_t * join ) {
  return ((ulong)join) & FD_ALLOC_JOIN_CGROUP_HINT_MAX;
}

FD_FN_CONST static inline fd_alloc_t *
fd_alloc_join_cgroup_hint_set( fd_alloc_t * join,
                               ulong        cgroup_hint ) {
  return (fd_alloc_t *)((((ulong)join) & (~FD_ALLOC_JOIN_CGROUP_HINT_MAX)) | (cgroup_hint & FD_ALLOC_JOIN_CGROUP_HINT_MAX));
}
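
/* For example, a caller that received its join from elsewhere might
   re-point it at its own concurrency group before doing a burst of
   mallocs and frees (a sketch; using fd_tile_idx() as the hint follows
   the TL;DR suggestion in fd_alloc_join above):

     alloc = fd_alloc_join_cgroup_hint_set( alloc, fd_tile_idx() );
*/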

/* fd_alloc_wksp returns a pointer to a local wksp join of the wksp
   backing the fd_alloc with the current local join. Caller should not
   call fd_wksp_leave on the returned value. Lifetime of the returned
   wksp handle is as long as the shalloc used on the fd_alloc_join is
   still mapped into the caller's address space.

   fd_alloc_tag returns the tag that will be used for allocations from
   this workspace. */

FD_FN_PURE fd_wksp_t * fd_alloc_wksp( fd_alloc_t * join ); // NULL indicates NULL join
FD_FN_PURE ulong       fd_alloc_tag ( fd_alloc_t * join ); // Positive, 0 indicates NULL join

/* fd_alloc_malloc_at_least allocates at least sz bytes with alignment
   of at least align from the wksp backing the fd_alloc. join is a
   current local join to the fd_alloc. align should be an integer power
   of 2 or 0.

   An align of 0 indicates to use FD_ALLOC_MALLOC_ALIGN_DEFAULT for the
   request alignment. This will be large enough such that
   fd_alloc_malloc is conformant with C/C++ alignment specifications
   (i.e. can trivially wrap fd_alloc_malloc to use as a drop in
   replacement for malloc).

   Small values of align will NOT be rounded up to some minimum (e.g.
   allocating lots of 1 byte aligned short strings is fine and
   relatively space and time efficient ... the overhead is ~4 bytes per
   allocation). fd_alloc is not particularly optimized when align>~sz
   and/or for large alignments (>~4096B). While large values for align
   are supported by fd_alloc_malloc, directly using fd_wksp_alloc is
   recommended in such cases.

   If an allocation is "large" (align + sz >~ 64KiB for the current
   implementation), it will be handled by fd_wksp_alloc under the hood.
   Otherwise, it will be handled by fd_alloc_malloc algorithms (which
   are ultimately backed by fd_wksp_alloc). As such, if a small
   allocation is "new" (e.g. first allocation of a size around sz, an
   allocation that can't be packed near other existing allocations
   around that sz, etc), this might also fall back on fd_wksp_alloc.
   Typically though, after initial allocation and/or program warmup,
   fd_alloc_malloc calls will be reasonably fast lockfree O(1).

   Returns a pointer to the allocation in the local address space on
   success. Note that this pointer will be a wksp laddr. As such, it
   can be converted to a gaddr, passed to other threads in other thread
   groups, and converted to a wksp laddr in their address spaces, freed
   via a join to the fd_alloc in that thread group, persisted beyond
   the lifetime of the calling thread, etc.

   Returns NULL on failure (silent to support HPC usage) or when sz is
   0. Reasons for failure include NULL join, invalid align, sz overflow
   (sz+align>~2^64), and no memory available for the request (e.g.
   workspace has insufficient room or is too fragmented).

   On return, *max will contain the actual number of bytes available at
   the returned laddr. On success, this will be at least sz and it is
   not guaranteed to be a multiple of align. On failure, *max will be
   zero.

   fd_alloc_malloc is a simple wrapper around fd_alloc_malloc_at_least
   for use when applications do not care about the actual size of their
   allocation. */

void *
fd_alloc_malloc_at_least( fd_alloc_t * join,
                          ulong        align,
                          ulong        sz,
                          ulong *      max );

static inline void *
fd_alloc_malloc( fd_alloc_t * join,
                 ulong        align,
                 ulong        sz ) {
  ulong max[1];
  return fd_alloc_malloc_at_least( join, align, sz, max );
}
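
/* For example, a minimal sketch of wrapping fd_alloc_malloc as a drop
   in malloc replacement as described above, assuming my_alloc is a
   current local join held in some application specific location (the
   names my_malloc / my_alloc are illustrative assumptions, not part of
   this API):

     static void *
     my_malloc( ulong sz ) {
       return fd_alloc_malloc( my_alloc, 0UL, sz ); // align 0 -> FD_ALLOC_MALLOC_ALIGN_DEFAULT
     }
*/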

/* fd_alloc_free frees the outstanding allocation whose first byte is
   pointed to by laddr in the caller's local address space. join is a
   current local join to the fd_alloc. The caller promises laddr was
   allocated by the underlying fd_alloc (but not necessarily on the
   calling thread or even in this calling process or even by a thread /
   process that is still running). Silent for HPC usage (NULL join and
   NULL laddr are a no-op).

   Like fd_alloc_malloc, if the allocation was large, this will be
   handled by fd_wksp_free under the hood, which is neither lockfree
   nor O(1). If the allocation was small, this will typically be
   lockfree O(1). It is possible that, if the amount of outstanding
   small allocations has reduced significantly, fd_alloc_free on a
   small allocation might trigger a fd_wksp_free to free up wksp space
   for other usage (including uses not through this fd_alloc).

   (It would be possible to implement this less efficiently in space
   and time such that join didn't need to be passed. The current design
   has picked efficiency and consistency with other APIs though.)

   Note that this will implicitly optimize the freed memory to be
   preferentially reused by the join's concurrency group. Thus the
   caller should have at least one join for each concurrency group to
   which it might want to return memory for reuse and then call free
   with the appropriate join. */

void
fd_alloc_free( fd_alloc_t * join,
               void *       laddr );
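
/* For example, a sketch of handing an allocation from one thread group
   to another and freeing it there, along the lines described for
   fd_alloc_malloc_at_least above. This assumes fd_wksp_gaddr /
   fd_wksp_laddr (declared in fd_wksp.h) for the address translation;
   msg_t and send_gaddr_to_consumer / recv_gaddr_from_producer are
   illustrative application specific stand-ins, not part of this API:

     // In the producer thread group (alloc is its local join)
     void * laddr = fd_alloc_malloc( alloc, alignof(msg_t), sizeof(msg_t) );
     send_gaddr_to_consumer( fd_wksp_gaddr( fd_alloc_wksp( alloc ), laddr ) );

     // In the consumer thread group (alloc is its own local join to the same fd_alloc)
     ulong   gaddr = recv_gaddr_from_producer();
     msg_t * msg   = (msg_t *)fd_wksp_laddr( fd_alloc_wksp( alloc ), gaddr );
     ... use msg ...
     fd_alloc_free( alloc, msg );
*/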

/* fd_alloc_compact frees all wksp allocations that are not required
   for any outstanding user mallocs (note that fd_alloc_free lazily
   returns unused memory from the underlying wksp to accelerate
   potential future allocations). join is a current local join to the
   alloc. This cannot fail from a user's POV but logs any wonkiness
   detected.

   fd_alloc_compact has the property that it minimizes the amount of
   wksp utilization for the set of outstanding user mallocs when there
   is no other concurrent alloc usage. As such, if there is no
   concurrent alloc usage _and_ there are no outstanding mallocs, on
   return, all wksp allocations (except the user provided memory region
   that holds the state of the allocator) will be returned to the wksp.
   This can then be used to reset the alloc and/or implement robust
   leak detection at program teardown.

   This function is safe to use even when there is other concurrent
   alloc usage. It is best effort in that case; it is not guaranteed
   that there was some point in time between call and return when the
   wksp utilization was minimized for the contemporaneous set of
   outstanding user mallocs.

   Also note that this function is not O(1) and the fd_alloc_free lazy
   return mechanism does not permit unbounded growth of unreturned free
   memory. So this should be used sparingly at best (e.g. in teardown
   leak detection or rare non-critical path housekeeping). */

void
fd_alloc_compact( fd_alloc_t * join );

/* fd_alloc_is_empty returns 1 if the alloc has no outstanding mallocs
   and 0 otherwise. join is a current local join to the alloc. NULL
   join silently returns 0.

   Important safety tip! This should only be run when there is no
   concurrent alloc usage. It is not algorithmically fast. This might
   temporarily lock the underlying wksp while running and might call
   fd_alloc_compact under the hood. It assumes the user provided memory
   region holding the alloc state is contained within a region returned
   by a single fd_wksp_alloc call (it would be hard to create an alloc
   where that isn't the case). It assumes alloc is the only user of the
   alloc's tag in the wksp. As such this should be used carefully and
   sparingly (e.g. at program teardown for leak detection).

   It will "work" with concurrent alloc usage in that the return value
   will be 0 or 1 and it will not corrupt the alloc or underlying wksp.
   But the return value will not be well-defined (e.g. it is not
   guaranteed to correspond to the state of the alloc at some point in
   time between when this was called and when it returned). */

int
fd_alloc_is_empty( fd_alloc_t * join );
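
/* For example, a sketch of the teardown leak detection suggested above
   (FD_UNLIKELY / FD_LOG_WARNING are the usual fd_util branch hint and
   logging macros; this assumes all application frees have already been
   issued and that there is no other concurrent usage of alloc):

     if( FD_UNLIKELY( !fd_alloc_is_empty( alloc ) ) )
       FD_LOG_WARNING(( "outstanding allocations detected at teardown (leak?)" ));
*/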

/* fd_alloc_max_expand computes a recommended value to use for max when
   needing to dynamically resize structures. The below is very subtle
   and fixes a lot of pervasive errors with dynamic resizing
   implementations (either explicit or implicitly done under the hood).
   It doesn't fix the main error with dynamic resizing though, the main
   error being deciding to use anything with dynamic resizing (outside
   of, maybe, initialization at program start).

   Consider an all too common case of an initially too small
   dynamically sized array that is getting elements appended to it one
   at a time. E.g. without proper error trapping, overflow handling and
   the like:

     foo_t * foo       = NULL;
     ulong   foo_max   = 0UL;
     ulong   foo_cnt   = 0UL;
     ulong   foo_delta = ... some reasonable increment ...;

     while( ... still appending ... ) {

       if( foo_cnt==foo_max ) { // Need to resize
         foo_max += foo_delta;
         foo = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
       }

       foo[ foo_cnt++ ] = ... next val to append ...;
     }

   This is terrible theoretically and practically and yet it looks like
   it does everything right.

   The theoretical issue is that, if the realloc can't be done in-place
   (which is more common than most realize ... it depends on the
   underlying realloc implementation details), the memory will have to
   be copied from the original location to the resized location with a
   typical cost of final_foo_max/2 -> O(final_foo_cnt). Because max is
   increased by a fixed absolute amount each resizing, there will be
   final_foo_cnt/foo_delta -> O(final_foo_cnt) such resizes.

   That is, we've accidentally written a method that has a slow
   O(final_foo_cnt^2) worst case even though it superficially looks
   like a fast O(final_foo_cnt) method. Worse still, this behavior
   might appear suddenly in previously fine code if the realloc
   implementation changes or, yet again worse, because a larger problem
   size was used in the wild than was used in testing.

   The practical issue is realloc is painfully slow and it gets worse
   for large sizes because large sizes are usually handled by operating
   system calls (e.g. mmap or sbrk under the hood). We've also now done
   O(final_foo_cnt) slow operating system calls in our already
   algorithmically slow O(final_foo_cnt^2) worst case algorithm that
   still superficially looks like a fast O(final_foo_cnt) method. (And
   throw in the other issues with malloc described above about TLB and
   NUMA inefficiency and the gaslighting the kernel does ... "clearly
   the crash had nothing to do with the OOM killer shooting processes
   randomly in the head, your program probably just had a bug ...
   yeah ... that's the ticket" ... for good measure.)

   We can get an algorithmic improvement if we change the above to
   increase max by a fixed relative amount each resize. Since we are
   dealing with integers though, we should make sure that we always
   increase max by some minimal amount. Instead of:

     foo_max += foo_delta;

   we can use something like:

     foo_max = fd_ulong_max( foo_max*gamma, foo_max + foo_delta );

   If gamma>1, asymptotically, we will only do O(lg cnt) resizes.
   Theoretically, we've gone from an O(final_foo_cnt^2) worst case
   method to an O(final_foo_cnt lg final_foo_cnt) worst case method. It
   is still irritating that it looks superficially like a fast
   O(final_foo_cnt) method but this is amongst the many reasons why
   dynamic resizing is gross and wrong and to be avoided when possible.

   The larger gamma is, the smaller the leading coefficient is in the
   O(final_foo_cnt lg final_foo_cnt) and thus the better this
   approximates the fast O(final_foo_cnt) method that it superficially
   seems to be. But using a very large gamma is clearly absurd. There
   are obvious memory footprint limitations for large sizes and each
   resize would trigger an ever larger amount of OS work. This raises
   the question:

   What is the optimal gamma?

   Suppose we have a worst case realloc implementation (alloc new
   memory, copy, free old memory and, when no free fragment large
   enough is available, use sbrk like semantics to get memory from the
   O/S ... not uncommon as it is trivial to implement and often works
   "good enough" in lab settings). It always works out-of-place and it
   always just appends new memory at the end of the heap when the heap
   runs out of space. Then, while doing the above, asymptotically, we
   expect the heap to look something like:

     other allocs | M foo_t alloc | padding free | unmapped

   On the next resize, we'd request space for M gamma foo_t. Since
   there are no free fragments large enough for this, realloc is going
   to have to map some space from the operating system, copy our memory
   into it and free up the original space for reuse. Post resize, we
   expect the heap to look like:

     other allocs | M foo_t free | M gamma foo_t alloc | padding free | unmapped

   On the next resize, we'd request space for M gamma^2 foo_t. This
   also can't fit within any free fragment above for gamma>1 (noting
   that, in this worst case realloc, we have to allocate the memory
   first and then copy and then free the old). So we end up with:

     other allocs | M (1+gamma) foo_t free | M gamma^2 foo_t alloc | padding free | unmapped

   On the next resize, we'd request space for M gamma^3 foo_t. If we
   have:

     gamma^3 < 1 + gamma

   we can fit this request in the hole left by the two previous
   resizes. This implies we need gamma<1.32471... where the magic
   number is the positive real root of:

     x^3 - x - 1 = 0

   This is the "silver ratio" in the sense that the positive real root
   of x^2 - x - 1 is the "golden ratio" of 1.61803... (Note that the
   golden ratio would apply if we had a more sophisticated realloc
   under the hood that aliased the resized allocation over top of the M
   foo_t free and the existing M gamma foo_t alloc and then moved the
   aliased memory. Presumably such a sophisticated realloc would also
   just append to the end of the heap without any move or copy at all,
   but that eventually leads to a question about how much
   overallocation and operating system overhead is acceptable on
   resize, discussed further below.)

   After a resize with something near but smaller than the silver
   ratio, we expect the heap to look like:

     other allocs | M gamma^3 foo_t alloc | padding free | unmapped

   which is back to where we started, except with a larger allocation.

   We don't want to be doing floating point math in methods like this.
   Noting that gamma = 1 + 1/4 + 1/16 = 1.3125 is very close to the
   silver ratio yields the very practical:

     new_max = fd_ulong_max( max + (max>>2) + (max>>4), max + delta );
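
   (As a quick sanity check on this choice: gamma = 1.3125 gives
   gamma^3 ~ 2.2610, which is indeed less than 1 + gamma = 2.3125, so
   the "fit in the hole left by the two previous resizes" condition
   above still holds.)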

   This is friendly with even the worst case realloc behaviors under
   the hood. It also works well in similar situations with linear
   storage media (e.g. disk storage). The limit also means that the
   worst case overallocation for cases like the above is at most ~32%
   and on average ~16%. This is a level of overallocation comparable to
   what already happens under the hood (e.g. on par with the level of
   waste that naturally happens in allocators for metadata and padding
   and much less waste than the golden ratio or larger growth rates
   would imply if we dubiously trust the realloc method under the
   hood).

   In cases where we might need to resize to even larger than this, we
   just resize to the caller's requested amount and keep our fingers
   crossed that the caller realized by this time dynamic resizing was a
   mistake and is allocating the correct size this time.

   Adding arithmetic overflow handling then yields the below.

   TL;DR Example usage (ignoring size calculation overflow handling and
   allocation error trapping):

     ulong   foo_cnt   = 0UL;
     ulong   foo_max   = ... good estimate of the actual amount needed ...;
     ulong   foo_delta = ... reasonable minimum resizing increment ...;
     foo_t * foo       = (foo_t *)malloc( foo_max*sizeof(foo_t) );

     while( ... still appending ... ) {

       if( FD_UNLIKELY( foo_cnt==foo_max ) ) {
         foo_max = fd_alloc_max_expand( foo_max, foo_delta, foo_cnt + foo_delta );
         foo = (foo_t *)realloc( foo, foo_max*sizeof(foo_t) );
       }

       foo[ foo_cnt++ ] = ... next val to append ...;

     }

     ... at this point
     ... - foo has foo_cnt elements initialized
     ... - foo has room for foo_max elements total
     ... - when the initial foo_max estimate was correct or oversized,
     ...   no resizing was done
     ... - when the initial foo_max was undersized, asymptotically,
     ...   foo_max is at most ~32% larger worst case (~16% larger
     ...   average case) than foo_cnt with at most O(lg foo_cnt)
     ...   reallocs needed to initialize foo
     ... - the resizing test branch is highly predictable
     ... - the underlying heap shouldn't be too fragmented or
     ...   overallocated regardless of the allocator implementation
     ...   details. */

FD_FN_CONST static inline ulong       /* new_max, new_max>=max(needed,max), if max<ULONG_MAX, will be new_max>max */
fd_alloc_max_expand( ulong max,
                     ulong delta,     /* Assumed > 0 */
                     ulong needed ) {
  ulong t0 = max + delta;               t0 = fd_ulong_if( t0<max, ULONG_MAX, t0 ); /* Handle overflow */
  ulong t1 = max + (max>>2) + (max>>4); t1 = fd_ulong_if( t1<max, ULONG_MAX, t1 ); /* Handle overflow */
  return fd_ulong_max( fd_ulong_max( t0, t1 ), needed );
}

/* fd_alloc_vtable is the virtual function table implementing fd_valloc
   for fd_alloc. */

extern const fd_valloc_vtable_t fd_alloc_vtable;

/* fd_alloc_virtual returns an abstract handle to the fd_alloc join.
   Valid for lifetime of join. */

FD_FN_CONST static inline fd_valloc_t
fd_alloc_virtual( fd_alloc_t * alloc ) {
  fd_valloc_t valloc = { alloc, &fd_alloc_vtable };
  return valloc;
}
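
/* For example, a sketch of passing the abstract handle to code written
   against the generic fd_valloc interface (this assumes the
   fd_valloc_malloc / fd_valloc_free wrappers that accompany
   fd_valloc_t; my_build_table is an illustrative application function,
   not part of this API):

     fd_valloc_t valloc = fd_alloc_virtual( alloc );
     my_build_table( valloc ); // internally allocates / frees via fd_valloc_malloc / fd_valloc_free
*/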

FD_PROTOTYPES_END

#endif /* HEADER_fd_src_util_alloc_fd_alloc_h */
|