LCOV - code coverage report
Current view: top level - discof/store - fd_store.h (source / functions)
Test: cov.lcov
Date: 2025-08-01 05:13:22

             Hit   Total   Coverage
Lines:        31      38     81.6 %
Functions:    21      51     41.2 %

          Line data    Source code
       1             : #ifndef HEADER_fd_src_discof_repair_fd_store_h
       2             : #define HEADER_fd_src_discof_repair_fd_store_h
       3             : 
       4             : /* fd_store is a high-performance in-memory storage engine for shreds as
       5             :    they are received from the network.
       6             : 
       7             :    The elements in the store themselves are not shreds, but FEC set
       8             :    payloads.  Briefly, FEC sets are groupings of shreds that encode the
       9             :    same data but provide redundancy and security.  While Firedancer
      10             :    receives individual shreds over Turbine / Repair (the relevant Solana
      11             :    protocols), a lot of Firedancer's validation of those shreds can only
      12             :    be done at the FEC set boundary.  Also, there are potential future
      13             :    protocol changes to encode all shreds in a FEC set so that they can
      14             :    only be decoded once the entire FEC set is received ("all-coding").
      15             : 
      16             :    Therefore, the FEC set becomes the logical unit for the store vs.
      17             :    individual shreds.  Replay will only ever attempt replay of a FEC set
      18             :    so there is no need to store at the granularity of individual shreds.
      19             :    The store is essentially a mapping of merkle root->FEC set payload,
      20             :    in which the merkle root was produced by feeding every shred in a FEC
      21             :    set into a merkle tree.  This uniquely identifies a FEC set, because
      22             :    if any of the shreds change, the merkle root changes.
      23             : 
       24             :    Shreds are coalesced and inserted into the store as bytes.  A FEC
       25             :    set payload is currently variable-length, capped at 63985 bytes.
       26             :    In the future this will be fixed at 31840 bytes, when FEC sets are
       27             :    enforced to always be 32 shreds.
      28             : 
      29             :    The API is designed to be used inter-process (ie. concurrent joins
      30             :    from multiple tiles) and also supports concurrent access, but callers
      31             :    must interface with the store through a read-write lock (fd_rwlock).
      32             : 
      33             :    EQUIVOCATION
      34             : 
      35             :    There is a protocol violation called equivocation (also known as
      36             :    "duplicates") that results in two or more blocks for the same slot.
      37             :    The actual conflict occurs at the shred level: a leader produces two
      38             :    or more shreds for the same slot at the same index, with different
      39             :    data payloads.  The result will be two different FEC sets for the
      40             :    same "logical" FEC set based on (slot, fec_set_idx).
      41             : 
      42             :    Unfortunately a naive keying scheme such as (slot, fec_set_idx) is
      43             :    insufficient as a result of equivocation.  As mentioned earlier,
      44             :    Store instead uses the merkle root as the key for its FEC set
      45             :    elements, at the cost of some keying performance and complexity.
      46             : 
      47             :    ARCHITECTURE
      48             : 
      49             :    In the Firedancer topology, Shred tile writes to store and Replay
      50             :    tile reads from store.  Replay tile only reads from the store after
      51             :    Repair tile (which is downstream of Shred) has notified Replay that a
      52             :    FEC set is ready.  Shred's writes are append-only and Replay is
      53             :    responsible for publishing once it is done consuming (signaled by a
      54             :    new Tower root).
      55             : 
      56             :    Shred (writes) -> Repair (notif) -> Replay (reads, publishes)
      57             : 
      58             :    ORDERING
      59             : 
      60             :    In the above architecture, it is guaranteed that Repair will deliver
      61             :    FEC sets to Replay in replay order.  That is, the parent FEC set will
       62             :    always be delivered before the child (see fd_fec_chainer).  Note,
       63             :    however, that concurrent forks are delivered in the order they are
       64             :    received from the network, ie. in arbitrary order.
      65             : 
      66             :    CONCURRENCY
      67             : 
      68             :    It is possible to design Store access in a way that enables parallel
      69             :    writes and minimizes lock contention between readers and writers.  In
      70             :    this design, writers (Shred tiles) only hold the read lock during
      71             :    their access.  The reader (Replay tile) also holds the read lock
      72             :    during its access, but given both Shred and Replay will be taking out
      73             :    read locks, they will not contend.  "Write lock" is now a bit of a
      74             :    misnomer, because the only operation that will actually need the
      75             :    write lock is publish, which will be done by Replay.
      76             : 
      77             :    For parallel writes, the Store's hash function should be carefully
      78             :    constructed to mirror the Shred tiles' (writers') round-robin hashing
      79             :    to ensure the property that any slot collision in the underlying
      80             :    fd_map_chain will always occur on the same Shred tile.  So if two
      81             :    different hashes point to the same slot, it is guaranteed to be the
      82             :    same Shred tile processing both keys (and similarly, a hash collision
      83             :    would also be processed by the same tile).  This prevents a data race
      84             :    in which multiple Shred tiles attempt to write to the same map slot.
      85             : 
      86             :    For reducing lock contention, the Store should 1. only be read by
      87             :    Replay after Repair has notified Replay it is time to read, and 2. be
      88             :    written to by Shred tile(s) in append-only fashion, so Shred never
      89             :    modifies or removes what it has written.  Store is backed by a
      90             :    fd_map_chain, which is not thread-safe generally, but in the case of
      91             :    a slot collision where Replay tile is reading an element and Shred
      92             :    tile writes a new element to the same slot, that new element is
       93             :    always appended to the end of the hash chain within that slot (which
       94             :    modifies the `.next` field of the previously last element in the
       95             :    chain but does not touch application data).  So this can enable
       96             :    lock-free concurrent reads and writes.  Note Replay tile (reader)
       97             :    should always use fd_store_query_const to ensure the underlying
       98             :    fd_map_chain is not modified during querying.
      99             : 
     100             :    The exception to the above is publishing.  Publishing requires the
     101             :    write lock to ensure the tile doing the publishing (Replay tile) is
     102             :    the only thing accessing the store.  Publishing happens at most once
     103             :    per slot, so it is a relatively infrequent Store access compared to
     104             :    FEC queries and inserts. */
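
   For illustration, a minimal sketch of this lock discipline, assuming
   the API declared later in this header and hypothetical locals store,
   merkle_root, data, data_sz and new_root_merkle:

     /* Shred tile(s) (writers): append-only inserts take only the read lock */
     fd_rwlock_read( &store->lock );
     fd_store_insert( store, merkle_root, data, data_sz );
     fd_rwlock_unread( &store->lock );

     /* Replay tile (reader): const queries also take only the read lock,
        so Shred and Replay do not contend */
     fd_rwlock_read( &store->lock );
     fd_store_fec_t const * fec = fd_store_query_const( store, merkle_root );
     /* ... consume fec->data, fec->data_sz ... */
     fd_rwlock_unread( &store->lock );

     /* Replay tile (publisher): publish is the only operation that takes
        the write lock */
     fd_rwlock_write( &store->lock );
     fd_store_publish( store, new_root_merkle );
     fd_rwlock_unwrite( &store->lock );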
     105             : 
     106             : #include "../../flamenco/fd_rwlock.h"
     107             : #include "../../flamenco/types/fd_types_custom.h"
     108             : 
     109             : /* FD_STORE_USE_HANDHOLDING:  Define this to non-zero at compile time
     110             :    to turn on additional runtime checks and logging. */
     111             : 
     112             : #ifndef FD_STORE_USE_HANDHOLDING
     113             : #define FD_STORE_USE_HANDHOLDING 1
     114             : #endif
     115             : 
      116             : /* FD_STORE_ALIGN specifies the alignment needed for store.  ALIGN is
      117             :    double the x86 cache line size to mitigate various kinds of false
      118             :    sharing (eg. ACLPF adjacent cache line prefetch). */
     119             : 
     120             : #define FD_STORE_ALIGN (128UL)
     121             : 
     122             : /* FD_STORE_MAGIC is a magic number for detecting store corruption. */
     123             : 
     124           3 : #define FD_STORE_MAGIC (0xf17eda2ce75702e0UL) /* firedancer store version 0 */
     125             : 
     126             : /* FD_STORE_DATA_MAX defines a constant for the maximum size of a FEC
     127             :    set payload.  The value is computed from the maximum number
     128             :    of shreds in a FEC set * the payload bytes per shred.
     129             : 
      130             :    67 shreds per FEC set * 955 payload bytes per shred = 63985 bytes max. */
     131             : 
     132             : #define FD_STORE_DATA_MAX (63985UL) /* FIXME fixed-32 */
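
   The arithmetic above can be pinned at compile time.  A minimal sketch,
   assuming FD_STATIC_ASSERT from fd_util is available in this translation
   unit (the 67 and 955 figures are taken from the comment above rather
   than from named constants in this header):

     FD_STATIC_ASSERT( 67UL*955UL==FD_STORE_DATA_MAX, fd_store_data_max );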
     133             : 
     134             : /* fd_store_fec describes a store element (FEC set).  The pointer fields
     135             :    implement a left-child, right-sibling n-ary tree. */
     136             : 
     137             : struct __attribute__((aligned(128UL))) fd_store_fec {
     138             : 
     139             :   /* Keys */
     140             : 
     141             :   fd_hash_t key; /* map key, merkle root of the FEC set */
     142             :   fd_hash_t cmr; /* parent's map key, chained merkle root of the FEC set */
     143             : 
     144             :   /* Pointers */
     145             : 
     146             :   ulong next;    /* reserved for internal use by fd_pool, fd_map_chain, orphan list */
     147             :   ulong parent;  /* pool idx of the parent */
     148             :   ulong child;   /* pool idx of the left-child */
     149             :   ulong sibling; /* pool idx of the right-sibling */
     150             : 
     151             :   /* Data */
     152             : 
     153             :   ulong data_sz;                 /* FIXME fixed-32. sz of the FEC set payload, guaranteed < FD_STORE_DATA_MAX */
     154             :   uchar data[FD_STORE_DATA_MAX]; /* FEC set payload = coalesced data shreds (byte array) */
     155             : };
     156             : typedef struct fd_store_fec fd_store_fec_t;
     157             : 
     158             : #define POOL_NAME fd_store_pool
     159           6 : #define POOL_T    fd_store_fec_t
     160             : #include "../../util/tmpl/fd_pool.c"
     161             : 
     162             : #define MAP_NAME               fd_store_map
     163             : #define MAP_ELE_T              fd_store_fec_t
     164             : #define MAP_KEY_T              fd_hash_t
     165         111 : #define MAP_KEY_EQ(k0,k1)      (!memcmp((k0),(k1), sizeof(fd_hash_t)))
     166         153 : #define MAP_KEY_HASH(key,seed) (fd_hash((seed),(key),sizeof(fd_hash_t)))
     167             : #include "../../util/tmpl/fd_map_chain.c"
     168             : 
     169             : struct fd_store {
     170             :   ulong magic;       /* ==FD_STORE_MAGIC */
     171             :   ulong seed;        /* seed for various hashing function used under the hood, arbitrary */
     172             :   ulong root;        /* pool idx of the root */
     173             :   ulong store_gaddr; /* wksp gaddr of store in the backing wksp, non-zero gaddr */
     174             :   ulong pool_gaddr;  /* wksp gaddr of pool of store elements (fd_store_fec) */
      175             :   ulong map_gaddr;   /* wksp gaddr of map of merkle root->fd_store_fec */
     176             : 
     177             :   fd_rwlock_t lock; /* rwlock for concurrent access */
     178             : };
     179             : typedef struct fd_store fd_store_t;
     180             : 
     181             : FD_PROTOTYPES_BEGIN
     182             : 
     183             : /* Constructors */
     184             : 
     185             : /* fd_store_{align,footprint} return the required alignment and
     186             :    footprint of a memory region suitable for use as store with up to
     187             :    fec_max elements. */
     188             : 
     189             : FD_FN_CONST static inline ulong
     190          36 : fd_store_align( void ) {
     191          36 :   return alignof(fd_store_t);
     192          36 : }
     193             : 
     194             : FD_FN_CONST static inline ulong
     195           6 : fd_store_footprint( ulong fec_max ) {
     196           6 :   return FD_LAYOUT_FINI(
     197           6 :     FD_LAYOUT_APPEND(
     198           6 :     FD_LAYOUT_APPEND(
     199           6 :     FD_LAYOUT_APPEND(
     200           6 :     FD_LAYOUT_INIT,
     201           6 :       alignof(fd_store_t),   sizeof(fd_store_t)                 ),
     202           6 :       fd_store_pool_align(), fd_store_pool_footprint( fec_max ) ),
     203           6 :       fd_store_map_align(),  fd_store_map_footprint( fec_max )  ),
     204           6 :     fd_store_align() );
     205           6 : }
     206             : 
     207             : /* fd_store_new formats an unused memory region for use as a store.
      208             :    shmem is a non-NULL pointer to this region in the local address space
     209             :    with the required footprint and alignment. */
     210             : 
     211             : void *
     212             : fd_store_new( void * shmem, ulong fec_max, ulong seed );
     213             : 
     214             : /* fd_store_join joins the caller to the store.  store points to the
     215             :    first byte of the memory region backing the store in the caller's
     216             :    address space.
     217             : 
     218             :    Returns a pointer in the local address space to store on success. */
     219             : 
     220             : fd_store_t *
     221             : fd_store_join( void * store );
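
   For example, a store might be created and joined roughly as follows.
   This is a sketch under stated assumptions: wksp is an existing
   fd_wksp_t * join, fd_wksp_alloc_laddr comes from fd_util, and fec_max,
   seed and wksp_tag are placeholder values:

     ulong  fec_max  = 1024UL; /* placeholder: max FEC sets the store can hold */
     ulong  seed     = 42UL;   /* arbitrary hashing seed                       */
     ulong  wksp_tag = 1UL;    /* placeholder workspace allocation tag         */

     void *       shmem = fd_wksp_alloc_laddr( wksp, fd_store_align(), fd_store_footprint( fec_max ), wksp_tag );
     fd_store_t * store = fd_store_join( fd_store_new( shmem, fec_max, seed ) );
     if( FD_UNLIKELY( !store ) ) FD_LOG_ERR(( "failed to create and join store" ));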
     222             : 
     223             : /* fd_store_leave leaves a current local join.  Returns a pointer to the
     224             :    underlying shared memory region on success and NULL on failure (logs
     225             :    details).  Reasons for failure include store is NULL. */
     226             : 
     227             : void *
     228             : fd_store_leave( fd_store_t const * store );
     229             : 
     230             : /* fd_store_delete unformats a memory region used as a store.
      231             :    Assumes nobody is joined to the region.  Returns a
     232             :    pointer to the underlying shared memory region or NULL if used
     233             :    obviously in error (e.g. store is obviously not a store ... logs
     234             :    details).  The ownership of the memory region is transferred to the
     235             :    caller. */
     236             : 
     237             : void *
     238             : fd_store_delete( void * store );
     239             : 
     240             : /* Accessors */
     241             : 
     242             : /* fd_store_wksp returns the local join to the wksp backing the store.
     243             :    The lifetime of the returned pointer is at least as long as the
     244             :    lifetime of the local join.  Assumes store is a current local
     245             :    join. */
     246             : 
     247             : FD_FN_PURE static inline fd_wksp_t *
     248         342 : fd_store_wksp( fd_store_t const * store ) {
     249         342 :   return (fd_wksp_t *)( ( (ulong)store ) - store->store_gaddr );
     250         342 : }
     251             : 
      252             : /* fd_store_{pool,map,root} return a pointer in the caller's address
      253             :    space to the corresponding store field.  const versions for each are
      254             :    also provided. */
     255             : 
     256         141 : FD_FN_PURE static inline fd_store_fec_t       * fd_store_pool        ( fd_store_t       * store ) { return fd_wksp_laddr_fast     ( fd_store_wksp      ( store ), store->pool_gaddr ); }
     257          78 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_pool_const  ( fd_store_t const * store ) { return fd_wksp_laddr_fast     ( fd_store_wksp      ( store ), store->pool_gaddr ); }
     258          45 : FD_FN_PURE static inline fd_store_map_t       * fd_store_map         ( fd_store_t       * store ) { return fd_wksp_laddr_fast     ( fd_store_wksp      ( store ), store->map_gaddr  ); }
     259          78 : FD_FN_PURE static inline fd_store_map_t const * fd_store_map_const   ( fd_store_t const * store ) { return fd_wksp_laddr_fast     ( fd_store_wksp      ( store ), store->map_gaddr  ); }
     260           6 : FD_FN_PURE static inline fd_store_fec_t       * fd_store_root        ( fd_store_t       * store ) { return fd_store_pool_ele      ( fd_store_pool      ( store ), store->root       ); }
     261           0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_root_const  ( fd_store_t const * store ) { return fd_store_pool_ele_const( fd_store_pool_const( store ), store->root       ); }
     262             : 
      263             : /* fd_store_{parent,child,sibling} return a pointer in the caller's
     264             :    address space to the corresponding {parent,left-child,right-sibling}
     265             :    of fec.  Assumes store is a current local join and fec is a valid
     266             :    pointer to a pool element inside store.  const versions for each are
     267             :    also provided. */
     268             : 
     269          24 : FD_FN_PURE static inline fd_store_fec_t       * fd_store_parent       ( fd_store_t       * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele      ( fd_store_pool      ( store ), fec->parent  ); }
     270           0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_parent_const ( fd_store_t const * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele_const( fd_store_pool_const( store ), fec->parent  ); }
     271          18 : FD_FN_PURE static inline fd_store_fec_t       * fd_store_child        ( fd_store_t       * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele      ( fd_store_pool      ( store ), fec->child   ); }
     272           0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_child_const  ( fd_store_t const * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele_const( fd_store_pool_const( store ), fec->child   ); }
     273           3 : FD_FN_PURE static inline fd_store_fec_t       * fd_store_sibling      ( fd_store_t       * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele      ( fd_store_pool      ( store ), fec->sibling ); }
     274           0 : FD_FN_PURE static inline fd_store_fec_t const * fd_store_sibling_const( fd_store_t const * store, fd_store_fec_t const * fec ) { return fd_store_pool_ele_const( fd_store_pool_const( store ), fec->sibling ); }
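
   For illustration, the direct children of a FEC set can be walked via
   the left-child, right-sibling pointers as sketched below (assumes the
   caller holds the appropriate lock, and that the accessors return NULL
   at the pool's null index):

     for( fd_store_fec_t * child = fd_store_child( store, fec );
          child;
          child = fd_store_sibling( store, child ) ) {
       /* visit child, e.g. child->key identifies the child FEC set */
     }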
     275             : 
      276             : /* fd_store_{query,query_const} query the FEC set keyed by merkle_root.
      277             :    Each returns a pointer to the fd_store_fec_t if found, NULL otherwise.
     278             : 
     279             :    Assumes caller has already acquired a read lock via fd_rwlock_read.
     280             : 
     281             :    IMPORTANT SAFETY TIP!  Caller should only call fd_rwlock_unread when
     282             :    they no longer retain interest in the returned pointer. */
     283             : 
     284             : FD_FN_PURE static inline fd_store_fec_t *
     285           0 : fd_store_query( fd_store_t * store, fd_hash_t * merkle_root ) {
     286           0 :    return fd_store_map_ele_query( fd_store_map( store ), merkle_root, NULL, fd_store_pool( store ) );
     287           0 : }
     288             : 
     289             : FD_FN_PURE static inline fd_store_fec_t const *
     290          78 : fd_store_query_const( fd_store_t const * store, fd_hash_t * merkle_root ) {
     291             :    return fd_store_map_ele_query_const( fd_store_map_const( store ), merkle_root, NULL, fd_store_pool_const( store ) );
     292          78 : }
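
   A reader-side usage sketch following the safety tip above (merkle_root
   is assumed to be the key Repair delivered to Replay):

     fd_rwlock_read( &store->lock );
     fd_store_fec_t const * fec = fd_store_query_const( store, merkle_root );
     if( FD_LIKELY( fec ) ) {
       /* consume fec->data / fec->data_sz while the read lock is held */
     }
     fd_rwlock_unread( &store->lock ); /* only after interest in fec is dropped */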
     293             : 
     294             : /* Operations */
     295             : 
      296             : /* fd_store_insert inserts a new FEC set keyed by merkle_root.  Returns
      297             :    the newly inserted fd_store_fec_t.  Copies data_sz and at most
      298             :    FD_STORE_DATA_MAX bytes of data into the element's corresponding
      299             :    fields (if handholding is enabled, it will abort the caller with a
      300             :    descriptive error message if data_sz is too large).
     301             : 
     302             :    Assumes store is a current local join and has space for another
     303             :    element.  Does additional checks when handholding is enabled and
     304             :    fails insertion (returning NULL) if checks fail.  If this is the
     305             :    first element being inserted into store, the store root will be set
     306             :    to this newly inserted element.
     307             : 
     308             :    Assumes caller has already acquired the appropriate lock via
     309             :    fd_rwlock_read or fd_rwlock_write.  See top-level documentation for
     310             :    why this operation may only require a read lock and not a write.
     311             : 
     312             :    IMPORTANT SAFETY TIP!  Caller should only call fd_rwlock_unread when
     313             :    they no longer retain interest in the returned pointer. */
     314             : 
     315             : fd_store_fec_t *
     316             : fd_store_insert( fd_store_t * store,
     317             :                  fd_hash_t  * merkle_root,
     318             :                  uchar      * data,
     319             :                  ulong        data_sz /* FIXME fixed-32 */ );
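
   A minimal insertion sketch under the read lock (fec_payload and
   fec_payload_sz are hypothetical values produced by coalescing the data
   shreds of one FEC set):

     fd_rwlock_read( &store->lock );
     fd_store_fec_t * fec = fd_store_insert( store, merkle_root, fec_payload, fec_payload_sz );
     if( FD_UNLIKELY( !fec ) ) FD_LOG_WARNING(( "insert failed" )); /* possible when handholding checks fail */
     fd_rwlock_unread( &store->lock );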
     320             : 
     321             : /* fd_store_link queries for and links the child keyed by merkle_root to
     322             :    parent keyed by chained_merkle_root.  Returns a pointer to the child.
     323             :    Assumes merkle_root and chained_merkle_root are both non-NULL and key
     324             :    elements currently in the store.  Does not require the lock. */
     325             : 
     326             : fd_store_fec_t *
     327             : fd_store_link( fd_store_t * store,
     328             :                fd_hash_t  * merkle_root,
     329             :                fd_hash_t  * chained_merkle_root );
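
   For illustration, once both the parent and the child FEC sets have
   been inserted, the child is chained to its parent (hypothetical keys;
   per the comment above, no lock is required):

     fd_store_fec_t * child = fd_store_link( store, merkle_root, chained_merkle_root );
     /* child is now attached under the element keyed by chained_merkle_root */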
     330             : 
     331             : /* fd_store_publish publishes merkle_root as the new store root, pruning
     332             :    all elements across branches that do not descend from the new root.
     333             :    Returns a pointer to the new root.  Assumes merkle_root is in the
      334             :    store and connected to the root (if handholding is enabled, it does
      335             :    additional checks and returns NULL on error).  Note pruning can
      336             :    result in store elements from slots newer than the new root slot
      337             :    being removed.  These are elements that become orphaned when their
      338             :    common ancestor with the new root is removed (the entire branch
      339             :    ie. fork is pruned).
     340             : 
     341             :    For example, in the tree (preorder) [0 1 2 4 3 5 6] publishing 2 will
     342             :    result in [0 1] being removed given they are ancestors of 2, and
     343             :    removing 1 will leave [3 5 6] orphaned and also removed.
     344             : 
     345             :    Assumes caller has already acquired a write lock via fd_rwlock_write.
     346             : 
     347             :    IMPORTANT SAFETY TIP!  Caller should only call fd_rwlock_unwrite when
     348             :    they no longer retain interest in the returned pointer. */
     349             : 
     350             : fd_store_fec_t *
     351             : fd_store_publish( fd_store_t * store,
     352             :                   fd_hash_t  * merkle_root );
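
   A publishing sketch for the Replay tile (new_root_merkle is a
   hypothetical key assumed to be in the store and connected to the
   current root):

     fd_rwlock_write( &store->lock );
     fd_store_fec_t * new_root = fd_store_publish( store, new_root_merkle );
     /* every element not descending from new_root has been pruned */
     fd_rwlock_unwrite( &store->lock );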
     353             : 
     354             : FD_PROTOTYPES_END
     355             : 
     356             : #endif /* HEADER_fd_src_discof_repair_fd_store_h */

Generated by: LCOV version 1.14