Line data Source code
1 : #ifndef HEADER_fd_src_disco_store_fd_store_h
2 : #define HEADER_fd_src_disco_store_fd_store_h
3 :
4 : /* fd_store is a high-performance in-memory storage engine for shreds as
5 : they are received from the network.
6 :
7 : The elements in the store themselves are not shreds, but FEC set
8 : payloads. Briefly, FEC sets are groupings of shreds that encode the
9 : same data but provide redundancy and security. While Firedancer
10 : receives individual shreds over Turbine / Repair (the relevant Solana
11 : protocols), a lot of Firedancer's validation of those shreds can only
12 : be done at the FEC set boundary. Also, there are potential future
13 : protocol changes to encode all shreds in a FEC set so that they can
14 : only be decoded once the entire FEC set is received ("all-coding").
15 :
16 : Therefore, the FEC set becomes the logical unit for the store vs.
17 : individual shreds. Replay will only ever attempt replay of a FEC set
18 : so there is no need to store at the granularity of individual shreds.
19 : The store is essentially a mapping of merkle root->FEC set payload,
20 : in which the merkle root was produced by feeding every shred in a FEC
21 : set into a merkle tree. This uniquely identifies a FEC set, because
22 : if any of the shreds change, the merkle root changes.
23 :
24 : Shreds are coalesced and inserted into the store as bytes. The max
25 : bytes per FEC set is currently variable-length but capped at 63985.
26 : In the future this will be fixed to 31840 bytes, when FEC sets are
27 : enforced to always be 32 shreds.
28 :
29 : The shared memory used by a store instance is within a workspace such
30 : that it is also persistent and remotely inspectable. Store is
31 : designed to be used inter-process (allowing concurrent joins from
32 : multiple tiles), relocated in memory (via wksp operations), and
33 : accessed concurrently (managing conflicts with a lock).
34 :
35 : EQUIVOCATION
36 :
37 : There is a protocol violation called equivocation (also known as
38 : "duplicates") that results in two or more blocks for the same slot.
39 : The actual conflict occurs at the shred level: a leader produces two
40 : or more shreds for the same slot at the same index, with different
41 : data payloads. The result will be two different FEC sets for the
42 : same "logical" FEC set based on (slot, fec_set_idx).
43 :
44 : Unfortunately a naive keying scheme such as (slot, fec_set_idx) is
45 : insufficient as a result of equivocation. As mentioned earlier,
46 : Store instead uses the merkle root as the key for its FEC set
47 : elements, at the cost of some keying performance and complexity.
48 :
49 : ARCHITECTURE
50 :
51 : In the Firedancer topology, Shred tile writes to store and Replay
52 : tile reads from store. Replay tile only reads from the store after
53 : Repair tile (which is downstream of Shred) has notified Replay that a
54 : FEC set is ready. Shred's writes are append-only and Replay is
55 : responsible for publishing once it is done consuming (signaled by a
56 : new Tower root).
57 :
58 : Shred (writes) -> Repair (notif) -> Replay (reads, publishes)
59 :
60 : ORDERING
61 :
62 : In the above architecture, Repair delivers FEC sets to Replay in
63 : partial order. Any given fork will be delivered in-order, but
64 : concurrent forks can be delivered in arbitrary order. Another way to
65 : phrase this is a parent FEC set will always be delivered before the
66 : child (see fd_reasm).
67 :
68 : CONCURRENCY
69 :
70 : It is possible to design Store access in a way that enables parallel
71 : writes and minimizes lock contention between readers and writers.
72 : Store contains a fd_rwlock (read-write lock), but the name is a bit
73 : of a misnomer because writes can actually be concurrent and the only
74 : operation that will actually need the write lock is publish, which
75 : will be done by Replay. It is more appropriate to describe as an
76 : exclusive-shared access lock.
77 :
78 : In this design, writers (Shred tiles) hold the shared lock during
79 : their access. The reader (Replay tile) also holds the shared lock
80 : during its access, and given both Shred and Replay will be taking out
81 : shared locks, they will not contend.
82 :
83 : For parallel writes, the Store's hash function is carefully designed
84 : to partition the keyspace so that the same Shred tile always writes
85 : to the same map slots. This ensures map collisions always happen on
86 : the same Shred tile and cannot happen across tiles. Specifically, if
87 : two different FEC sets hash to the same slot, it is guaranteed that
88 : to be the same Shred tile processing both those FEC sets. This
89 : prevents a data race in which multiple Shred tiles (each with a
90 : handle to the shared lock) write to the same map slot.
91 :
92 : The hash function is defined as follows:
93 : ```
94 : #define MAP_KEY_HASH(key,seed) ((ulong)key->mr.ul[0]%seed + (key)->part*seed)
95 : ```
96 : where `key` is a key type that includes the merkle root (32 bytes)
97 : and the partition index (8 bytes) that is equivalent to the Shred
98 : tile index doing the insertion. seed, on initialization, is the
99 : number of chains/buckets in the map_chain divided by the number of
100 : partitions. In effect, seed is the size of each partition. For
101 : example, if the map_chain is sized to 1024, and there are 4 shred
102 : tiles, then the seed is 1024/4 = 256. Then the map key hash can
103 : bound the chain index of each partition as such: shred tile 0 will
104 : write to chains 0-255, shred tile 1 will write to chains 256-511,
105 : shred tile 2 will write to chains 512-767, and shred tile 3 will
106 : write to chains 768-1023, without overlap. The merkle root is a 32
107 : byte SHA-256 hash, so we can expect a fairly uniform distribution of
108 : hash values even after truncating to the first 8 bytes, without
109 : needing to introduce more randomness. Thus we can repurpose the
110 : `seed` argument to be the number of partitions.
111 :
112 : Essentially, this allows for limited single-producer single-consumer
113 : (SPSC) concurrency, where the producer is a given Shred tile and the
114 : consumer is Replay tile. The SPSC concurrency is limited in that the
115 : Store should 1. only be read by Replay after Repair has notified
116 : Replay it is time to read (ie. Shred has finished writing), and 2. be
117 : written to by Shred(s) in append-only fashion, so Shred never
118 : modifies or removes from the map. Store is backed by fd_map_chain,
119 : which is not thread-safe generally, but does support this particular
120 : SPSC concurrency model in cases where the consumer is guaranteed to
121 : be lagging the producer.
122 :
123 : Analyzing fd_map_chain in gory detail, in the case of a map collision
124 : where Replay tile is reading an element and Shred tile writes a new
125 : element to the same map slot, that new element is prepended to the
126 : hash chain within that slot (which modifies what the head of the
127 : chain points to as well as the now-previous head in the hash chain's
128 : `.next` field, but does not touch application data). With fencing
129 : enabled (MAP_INSERT_FENCE), it is guaranteed the consumer either
130 : reads the head before or after the update. If it reads before, that
131 : is safe, it would just check the key (if no match, iterate down the
132 : chain etc.) If it reads after, it is also safe because the new
133 : element is guaranteed to be before the old element in the chain, so
134 : it would just do one more iteration. Note the consumer should always
135 : use fd_store_query_const to ensure the underlying fd_map_chain is not
136 : modified during querying.
137 :
138 : The exception to the above is publishing. Publishing requires
139 : exclusive access because it involves removing from fd_map_chain,
140 : which is not safe for shared access. So the Replay tile should take
141 : out the exclusive lock. Publishing happens at most once per slot, so
142 : it is a relatively infrequent Store access compared to FEC queries
143 : and inserts (which is good because it is also the most expensive). */
144 :
145 : #include "../../flamenco/fd_rwlock.h"
146 : #include "../../flamenco/types/fd_types_custom.h"
147 : #include "../../util/hist/fd_histf.h"
148 :
149 : /* FD_STORE_USE_HANDHOLDING: Define this to non-zero at compile time
150 : to turn on additional runtime checks and logging. */
151 :
152 : #ifndef FD_STORE_USE_HANDHOLDING
153 : #define FD_STORE_USE_HANDHOLDING 1
154 : #endif
155 :
156 : /* FD_STORE_ALIGN specifies the alignment needed for store. ALIGN is
157 : double x86 cache line to mitigate various kinds of false sharing (eg.
158 : ACLPF adjacent cache line prefetch). */
159 :
160 : #define FD_STORE_ALIGN (128UL)
161 :
162 : /* FD_STORE_MAGIC is a magic number for detecting store corruption. */
163 :
164 9 : #define FD_STORE_MAGIC (0xf17eda2ce75702e0UL) /* firedancer store version 0 */
165 :
166 : /* FD_STORE_DATA_MAX defines a constant for the maximum size of a FEC
167 : set payload. The value is computed from the maximum number
168 : of shreds in a FEC set * the payload bytes per shred.
169 :
170 : 67 shreds per FEC set * 955 payloads per shred = 63985 bytes max. */
171 :
172 0 : #define FD_STORE_DATA_MAX (63985UL) /* TODO fixed-32 */
173 :
174 : /* fd_store_fec describes a store element (FEC set). The pointer fields
175 : implement a left-child, right-sibling n-ary tree. */
176 :
177 : struct __attribute__((packed)) fd_store_key {
178 : fd_hash_t mr;
179 : ulong part; /* partition index of the inserter */
180 : };
181 : typedef struct fd_store_key fd_store_key_t;
182 :
183 : struct __attribute__((aligned(FD_STORE_ALIGN))) fd_store_fec {
184 :
185 : /* Keys */
186 :
187 : fd_store_key_t key; /* map key, merkle root of the FEC set + a partition index */
188 : fd_hash_t cmr; /* parent's map key, chained merkle root of the FEC set */
189 :
190 : /* Pointers. These are internal to the store and callers should not
191 : interface with them directly. */
192 :
193 : ulong next; /* reserved for internal use by fd_pool, fd_map_chain */
194 : ulong parent; /* pool idx of the parent */
195 : ulong child; /* pool idx of the left-child */
196 : ulong sibling; /* pool idx of the right-sibling */
197 :
198 : /* Data */
199 :
200 : uint block_offs[ 32 ]; /* block_offs[ i ] is the total size of data shreds [0, i] */
201 : ulong data_sz; /* TODO fixed-32. sz of the FEC set payload, guaranteed < FD_STORE_DATA_MAX */
202 : uchar data[FD_STORE_DATA_MAX]; /* FEC set payload = coalesced data shreds (byte array) */
203 : };
204 : typedef struct fd_store_fec fd_store_fec_t;
205 :
206 : #define POOL_NAME fd_store_pool
207 507 : #define POOL_ELE_T fd_store_fec_t
208 : #include "../../util/tmpl/fd_pool_para.c"
209 :
210 : #define MAP_NAME fd_store_map
211 : #define MAP_ELE_T fd_store_fec_t
212 : #define MAP_KEY_T fd_store_key_t
213 51 : #define MAP_KEY key
214 540 : #define MAP_KEY_EQ(k0,k1) (!memcmp((k0),(k1), sizeof(fd_hash_t)))
215 477 : #define MAP_KEY_HASH(key,seed) ((ulong)key->mr.ul[0]%seed + (key)->part*seed) /* See documentation above for the hash function */
216 : #define MAP_INSERT_FENCE 1
217 : #include "../../util/tmpl/fd_map_chain.c"
218 :
219 : struct fd_store {
220 : ulong magic; /* ==FD_STORE_MAGIC */
221 : ulong fec_max; /* max number of FEC sets that can be stored */
222 : ulong part_cnt; /* number of partitions, also the number of writers */
223 : ulong root; /* pool idx of the root */
224 : ulong slot0; /* FIXME this hack is needed until the block_id is in the bank (manifest) */
225 : ulong store_gaddr; /* wksp gaddr of store in the backing wksp, non-zero gaddr */
226 : ulong map_gaddr; /* wksp gaddr of map of fd_store_key->fd_store_fec */
227 : ulong pool_mem_gaddr; /* wksp gaddr of shmem_t object in pool_para */
228 : ulong pool_ele_gaddr; /* wksp gaddr of first ele_t object in pool_para */
229 : fd_rwlock_t lock; /* rwlock for concurrent access */
230 : };
231 : typedef struct fd_store fd_store_t;
232 :
233 : FD_PROTOTYPES_BEGIN
234 :
235 : /* Constructors */
236 :
237 : /* fd_store_{align,footprint} return the required alignment and
238 : footprint of a memory region suitable for use as store with up to
239 : fec_max elements. fec_max is an integer power-of-two. */
240 :
241 : FD_FN_CONST static inline ulong
242 102 : fd_store_align( void ) {
243 102 : return alignof(fd_store_t);
244 102 : }
245 :
246 : FD_FN_CONST static inline ulong
247 21 : fd_store_footprint( ulong fec_max ) {
248 21 : if( FD_UNLIKELY( !fd_ulong_is_pow2( fec_max ) ) ) return 0UL;
249 21 : return FD_LAYOUT_FINI(
250 21 : FD_LAYOUT_APPEND(
251 21 : FD_LAYOUT_APPEND(
252 21 : FD_LAYOUT_APPEND(
253 21 : FD_LAYOUT_APPEND(
254 21 : FD_LAYOUT_INIT,
255 21 : alignof(fd_store_t), sizeof(fd_store_t) ),
256 21 : fd_store_map_align(), fd_store_map_footprint( fec_max ) ),
257 21 : fd_store_pool_align(), fd_store_pool_footprint() ),
258 21 : alignof(fd_store_fec_t), sizeof(fd_store_fec_t)*fec_max ),
259 21 : fd_store_align() );
260 21 : }
261 :
262 : /* fd_store_new formats an unused memory region for use as a store.
263 : mem is a non-NULL pointer to this region in the local address space
264 : with the required footprint and alignment. fec_max is an integer
265 : power-of-two. */
266 :
267 : void *
268 : fd_store_new( void * shmem, ulong fec_max, ulong part_cnt );
269 :
270 : /* fd_store_join joins the caller to the store. store points to the
271 : first byte of the memory region backing the store in the caller's
272 : address space.
273 :
274 : Returns a pointer in the local address space to store on success. */
275 :
276 : fd_store_t *
277 : fd_store_join( void * store );
278 :
279 : /* fd_store_leave leaves a current local join. Returns a pointer to the
280 : underlying shared memory region on success and NULL on failure (logs
281 : details). Reasons for failure include store is NULL. */
282 :
283 : void *
284 : fd_store_leave( fd_store_t const * store );
285 :
286 : /* fd_store_delete unformats a memory region used as a store.
287 : Assumes only the nobody is joined to the region. Returns a
288 : pointer to the underlying shared memory region or NULL if used
289 : obviously in error (e.g. store is obviously not a store ... logs
290 : details). The ownership of the memory region is transferred to the
291 : caller. */
292 :
293 : void *
294 : fd_store_delete( void * shstore );
295 :
296 : /* Accessors */
297 :
298 : /* fd_store_wksp returns the local join to the wksp backing the store.
299 : The lifetime of the returned pointer is at least as long as the
300 : lifetime of the local join. Assumes store is a current local
301 : join. */
302 :
303 : FD_FN_PURE static inline fd_wksp_t *
304 1734 : fd_store_wksp( fd_store_t const * store ) {
305 1734 : return (fd_wksp_t *)( ( (ulong)store ) - store->store_gaddr );
306 1734 : }
307 :
308 : /* fd_store_pool computes and returns a local join handle to the pool_para. */
309 666 : FD_FN_PURE static inline fd_store_pool_t fd_store_pool( fd_store_t const * store ) {
310 666 : return (fd_store_pool_t){ .pool = fd_wksp_laddr_fast( fd_store_wksp( store ), store->pool_mem_gaddr ),
311 666 : .ele = fd_wksp_laddr_fast( fd_store_wksp( store ), store->pool_ele_gaddr ),
312 666 : .ele_max = store->fec_max };
313 666 : }
314 :
315 : /* fd_store_{map,map_const,fec0,fec0_const,root,root_const} returns a
316 : pointer in the caller's address space to the corresponding store
317 : field. const versions for each are also provided. */
318 :
319 168 : FD_FN_PURE static inline fd_store_map_t * fd_store_map ( fd_store_t * store ) { return fd_wksp_laddr_fast( fd_store_wksp( store ), store->map_gaddr ); }
320 234 : FD_FN_PURE static inline fd_store_map_t const * fd_store_map_const ( fd_store_t const * store ) { return fd_wksp_laddr_fast( fd_store_wksp( store ), store->map_gaddr ); }
321 165 : FD_FN_PURE static inline fd_store_fec_t * fd_store_fec0 ( fd_store_t * store ) { fd_store_pool_t pool = fd_store_pool( store ); return pool.ele; }
322 234 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_fec0_const( fd_store_t const * store ) { fd_store_pool_t pool = fd_store_pool( store ); return pool.ele; }
323 15 : FD_FN_PURE static inline fd_store_fec_t * fd_store_root ( fd_store_t * store ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele ( &pool, store->root); }
324 3 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_root_const( fd_store_t const * store ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele_const( &pool, store->root); }
325 :
326 : /* fd_store_{parent,child,sibling} returns a pointer in the caller's
327 : address space to the corresponding {parent,left-child,right-sibling}
328 : of fec. Assumes store is a current local join and fec is a valid
329 : pointer to a pool element inside store. const versions for each are
330 : also provided. */
331 :
332 0 : FD_FN_PURE static inline fd_store_fec_t * fd_store_parent ( fd_store_t * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele ( &pool, fec->parent ); }
333 0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_parent_const ( fd_store_t const * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele_const( &pool, fec->parent ); }
334 0 : FD_FN_PURE static inline fd_store_fec_t * fd_store_child ( fd_store_t * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele ( &pool, fec->child ); }
335 0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_child_const ( fd_store_t const * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele_const( &pool, fec->child ); }
336 0 : FD_FN_PURE static inline fd_store_fec_t * fd_store_sibling ( fd_store_t * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele ( &pool, fec->sibling ); }
337 0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_sibling_const( fd_store_t const * store, fd_store_fec_t const * fec ) { fd_store_pool_t pool = fd_store_pool( store ); return fd_store_pool_ele_const( &pool, fec->sibling ); }
338 :
339 : /* fd_store_{shacq, shrel, exacq, exrel} acquires / releases the shared
340 : / exclusive lock. Callers should typically use the
341 : FD_STORE_SHARED_LOCK and FD_STORE_EXCLUSIVE_LOCK macros to acquire
342 : and release the lock instead of calling these functions directly. */
343 :
344 0 : static inline void fd_store_shacq( fd_store_t * store ) { fd_rwlock_read ( &store->lock ); }
345 0 : static inline void fd_store_shrel( fd_store_t * store ) { fd_rwlock_unread ( &store->lock ); }
346 0 : static inline void fd_store_exacq( fd_store_t * store ) { fd_rwlock_write ( &store->lock ); }
347 0 : static inline void fd_store_exrel( fd_store_t * store ) { fd_rwlock_unwrite( &store->lock ); }
348 :
349 : struct fd_store_lock_ctx {
350 : fd_store_t * store_;
351 : long * acq_start;
352 : long * acq_end;
353 : long * work_end;
354 : };
355 :
356 : static inline void
357 0 : fd_store_shared_lock_cleanup( struct fd_store_lock_ctx * ctx ) { *(ctx->work_end) = fd_tickcount(); fd_store_shrel( ctx->store_ ); }
358 :
359 0 : #define FD_STORE_SHARED_LOCK(store, shacq_start, shacq_end, shrel_end) do { \
360 0 : struct fd_store_lock_ctx lock_ctx __attribute__((cleanup(fd_store_shared_lock_cleanup))) = \
361 0 : { .store_ = (store), .work_end = &(shrel_end), .acq_start = &(shacq_start), .acq_end = &(shacq_end) }; \
362 0 : shacq_start = fd_tickcount(); \
363 0 : fd_store_shacq( lock_ctx.store_ ); \
364 0 : shacq_end = fd_tickcount(); \
365 0 : do
366 :
367 0 : #define FD_STORE_SHARED_LOCK_END while(0); } while(0)
368 :
369 : static inline void
370 0 : fd_store_exclusive_lock_cleanup( struct fd_store_lock_ctx * ctx ) { *(ctx->work_end) = fd_tickcount(); fd_store_exrel( ctx->store_ ); }
371 :
372 0 : #define FD_STORE_EXCLUSIVE_LOCK(store, exacq_start, exacq_end, exrel_end) do { \
373 0 : struct fd_store_lock_ctx lock_ctx __attribute__((cleanup(fd_store_exclusive_lock_cleanup))) = \
374 0 : { .store_ = (store), .work_end = &(exrel_end), .acq_start = &(exacq_start), .acq_end = &(exacq_end) }; \
375 0 : exacq_start = fd_tickcount(); \
376 0 : fd_store_exacq( lock_ctx.store_ ); \
377 0 : exacq_end = fd_tickcount(); \
378 0 : do
379 0 : #define FD_STORE_EXCLUSIVE_LOCK_END while(0); } while(0)
380 :
381 : struct fd_store_histf {
382 : fd_histf_t * histf;
383 : long ts;
384 : };
385 : typedef struct fd_store_histf fd_store_histf_t;
386 :
387 : static inline void
388 0 : fd_store_histf( fd_store_histf_t * ctx ) {
389 0 : fd_histf_sample( ctx->histf, (ulong)fd_long_max( fd_tickcount() - ctx->ts, 0UL ) );
390 0 : }
391 :
392 : #define FD_STORE_HISTF_BEGIN(metric) do { \
393 : fd_store_histf_t _ctx __attribute__((cleanup(fd_store_histf))) = { .histf = (metric), .ts = fd_tickcount() }; \
394 : do
395 :
396 : #define FD_STORE_HISTF_END while(0); } while(0)
397 :
398 : /* fd_store_{query,query_const} queries the FEC set keyed by merkle.
399 : Returns a pointer to the fd_store_fec_t if found, NULL otherwise.
400 :
401 : Both the const and non-const versions are concurrency safe; as in
402 : they avoid using the non-const map_ele_query that reorders the chain.
403 : fd_store_query gets around this by calling map idx_query_const, which
404 : does not reorder the chain, and then indexes directly into the pool
405 : and returns a non-const pointer to the element of interest.
406 :
407 : Assumes caller has acquired the shared lock via fd_store_shacq.
408 :
409 : IMPORTANT SAFETY TIP! Caller should only call fd_store_shrel when
410 : they no longer retain interest in the returned pointer. */
411 :
412 : FD_FN_PURE static inline fd_store_fec_t *
413 93 : fd_store_query( fd_store_t * store, fd_hash_t const * merkle_root ) {
414 93 : fd_store_key_t key = { .mr = *merkle_root, .part = UINT_MAX };
415 93 : fd_store_pool_t pool = fd_store_pool( store );
416 102 : for( uint i = 0; i < store->part_cnt; i++ ) {
417 102 : key.part = i;
418 102 : ulong idx = fd_store_map_idx_query_const( fd_store_map( store ), &key, ULONG_MAX, fd_store_fec0( store ) );
419 102 : if( idx != ULONG_MAX ) return fd_store_pool_ele( &pool, idx );
420 102 : }
421 0 : return NULL;
422 93 : }
423 :
424 : FD_FN_PURE static inline fd_store_fec_t const *
425 177 : fd_store_query_const( fd_store_t const * store, fd_hash_t * merkle_root ) {
426 177 : fd_store_key_t key = { .mr = *merkle_root, .part = UINT_MAX };
427 285 : for( uint i = 0; i < store->part_cnt; i++ ) {
428 234 : key.part = i;
429 234 : fd_store_fec_t const * fec = fd_store_map_ele_query_const( fd_store_map_const( store ), &key, NULL, fd_store_fec0_const( store ) );
430 234 : if( fec ) return fec;
431 234 : }
432 51 : return NULL;
433 177 : }
434 :
435 : /* Operations */
436 :
437 : /* fd_store_insert inserts a new FEC set keyed by merkle. Returns the
438 : newly inserted fd_store_fec_t. Each fd_store_fec_t can hold at most
439 : FD_STORE_DATA_MAX bytes of data, and caller is responsible for
440 : copying into the region.
441 :
442 : Assumes store is a current local join and has space for another
443 : element. Does additional checks when handholding is enabled and
444 : fails insertion (returning NULL) if checks fail. If this is the
445 : first element being inserted into store, the store root will be set
446 : to this newly inserted element.
447 :
448 : Assumes caller has acquired either the shared or exclusive lock via
449 : fd_store_shacq or fd_store_exacq. See top-level documentation for
450 : why this operation may only require a shared lock vs. exclusive.
451 :
452 : IMPORTANT SAFETY TIP! Caller should only call fd_store_shrel or
453 : fd_store_exrel when they no longer retain interest in the returned
454 : pointer. */
455 :
456 : fd_store_fec_t *
457 : fd_store_insert( fd_store_t * store,
458 : ulong part_idx,
459 : fd_hash_t * merkle_root );
460 :
461 : /* fd_store_link queries for and links the child keyed by merkle_root to
462 : parent keyed by chained_merkle_root. Returns a pointer to the child.
463 : Assumes merkle_root and chained_merkle_root are both non-NULL and key
464 : elements currently in the store.
465 :
466 : Assumes caller has acquired the shared lock via fd_store_shacq.
467 :
468 : IMPORTANT SAFETY TIP! Caller should only call fd_store_shrel when
469 : they no longer retain interest in the returned pointer. */
470 :
471 : fd_store_fec_t *
472 : fd_store_link( fd_store_t * store,
473 : fd_hash_t * merkle_root,
474 : fd_hash_t * chained_merkle_root );
475 :
476 : /* fd_store_publish publishes merkle_root as the new store root, pruning
477 : all elements across branches that do not descend from the new root.
478 : Returns a pointer to the new root. Assumes merkle_root is in the
479 : store and connected to the root (if handholding is enabled does
480 : additional checks and returns NULL on error). Note pruning can
481 : result in store elements greater than the new root slot being
482 : removed. These are elements that become orphaned as a result of the
483 : common ancestor with the new root being removed (the entire branch
484 : ie. fork is pruned).
485 :
486 : For example, in the tree (preorder) [0 1 2 4 3 5 6] publishing 2 will
487 : result in [0 1] being removed given they are ancestors of 2, and
488 : removing 1 will leave [3 5 6] orphaned and also removed.
489 :
490 : Assumes caller has acquired the exclusive lock via fd_store_exacq.
491 :
492 : IMPORTANT SAFETY TIP! Caller should only call fd_store_exrel when
493 : they no longer retain interest in the returned pointer. */
494 :
495 : fd_store_fec_t *
496 : fd_store_publish( fd_store_t * store,
497 : fd_hash_t const * merkle_root );
498 :
499 : /* fd_store_clear clears the store. All elements are removed from the
500 : map and released back into the pool. Does not zero-out fields.
501 :
502 : IMPORTANT SAFETY TIP! the store must be non-empty. */
503 :
504 : fd_store_t *
505 : fd_store_clear( fd_store_t * store );
506 :
507 : /* TODO fd_store_verify */
508 :
509 : /* fd_store_print pretty-prints a formatted store as a tree structure.
510 : Printing begins from the store root and each node is the FEC set key
511 : (merkle root hash). */
512 :
513 : int
514 : fd_store_verify( fd_store_t * store );
515 :
516 : void
517 : fd_store_print( fd_store_t const * store );
518 :
519 : FD_PROTOTYPES_END
520 :
521 : #endif /* HEADER_fd_src_disco_store_fd_store_h */
|