#ifndef HEADER_fd_src_flamenco_runtime_txncache_h
#define HEADER_fd_src_flamenco_runtime_txncache_h

#include "../fd_flamenco_base.h"
#include "../types/fd_types.h"
#include <math.h>

/* A txn cache is a concurrent map for saving the result (status) of
   transactions that have executed. In addition to supporting fast
   concurrent insertion and query of transaction results, the txn
   cache supports serialization of its state into a standard bincode
   format for serving the state to other nodes via snapshot responses,
   and then restoring snapshots produced by other nodes.

   The txn cache is designed to do two operations fast,

     (a) Insertion. Insertion is done by both the leader pipeline and
         the replay stage, with an insertion for every transaction that
         is executed, potentially over 1M per second.

     (b) Query. The leader pipeline queries the status of transactions
         before executing them to ensure that it does not execute
         anything twice.

   Both of these operations are concurrent and lockless, assuming there
   are no other (non-insert/query) operations occurring on the txn cache.
   Most other operations lock the entire structure and will prevent both
   insertion and query from proceeding.

   The txn cache is both CPU and memory sensitive. A transaction result
   is 1 byte, and the stored transaction hashes are 20 bytes, so without
   any overhead, just storing 150 slots of transactions in a flat array
   would require

      524,288 * 150 * 21 ~ 1.5 GiB

   of memory. But we can't query, insert, and remove quickly from a
   flat array. We make a few trade-offs to achieve good performance
   without bloating memory completely. In particular:

    - The transactions to be queried are stored in a structure that
      looks like a

         hash_map<blockhash, hash_map<txnhash, vec<(slot, status)>>>

      The top level hash_map is a probed hash map, and the txnhash map
      is a chained hash map, where the items come from a pool of pages
      of transactions. We use pages of transactions so that removing a
      blockhash from the top level map is fast: we need to return at
      most 4,800 pages back to the pool rather than 78,643,200
      individual transactions.

      This adds additional memory overhead: a blockhash with only one
      transaction in it will still consume a full page (16,384
      transactions) of memory. Allocating a new transaction page to a
      chained map is rare (once every 16,384 inserts), so the cost
      amortizes to zero. Creating a blockhash happens once per
      blockhash, so it also amortizes to zero. The only operation we
      care about is then the simple insert case of a non-full
      transaction page into an existing blockhash. This can be done
      with two compare-and-swaps,

         // 1. Find the blockhash in the probed hash map.

         let by_blockhash = hash_map[ blockhash ];

         // 2. Request space for the transaction from the current page

         let page = by_blockhash.pages.back();
         let idx = page.used.compare_and_swap( current, current+1 );

         // 3. Write the transaction into the page and the map

         page.txns[ idx ] = txn;
         page.txns[ idx ].next = by_blockhash.txns[ txnhash ].head;
         by_blockhash.txns[ txnhash ].head.compare_and_swap( current, idx );

      Removal of a blockhash from this structure is simple because it
      does not need to be concurrent (the caller will only remove
      between executing slots, so there's no contention and it can take
      a full write lock). We take a write lock, restore the pages in
      the blockhash to the pool, and then mark the space in the
      hash_map as empty. This is fast since there are at most 4,800
      pages to restore and restoration is a simple memcpy.

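      As a concrete illustration (a sketch only, using GCC/Clang
      __atomic builtins and hypothetical type/field names rather than
      the actual private layout), the hot insert path above looks
      roughly like:

         typedef struct {
           uchar txnhash[ 20 ];
           ulong slot;
           uchar result;
           uint  next;       // next txn in this blockhash's chained map bucket
         } txn_entry_t;

         typedef struct {
           uint        used;             // entries consumed in this page
           txn_entry_t txns[ 16384 ];
         } txnpage_t;

         // Insert one transaction into an existing blockhash, given the
         // blockhash's current (last) page and the chained map bucket
         // head for txnhash.  Returns 0 if the page filled up
         // concurrently, in which case the (rare, amortized) new-page
         // path runs instead.
         static int
         insert_txn( txnpage_t *         page,
                     uint                page_base,    // pool-wide index of page->txns[ 0 ]
                     uint *              bucket_head,  // head index of the txnhash bucket
                     txn_entry_t const * txn ) {
           // 2. Reserve space in the page with the first compare-and-swap.
           uint used = __atomic_load_n( &page->used, __ATOMIC_RELAXED );
           do {
             if( used>=16384U ) return 0;
           } while( !__atomic_compare_exchange_n( &page->used, &used, used+1U,
                                                  0, __ATOMIC_RELAXED, __ATOMIC_RELAXED ) );

           // 3. Write the transaction, then publish it by pushing it
           //    onto the bucket with the second compare-and-swap
           //    (retried if another insert won the race for the head).
           page->txns[ used ] = *txn;
           uint head = __atomic_load_n( bucket_head, __ATOMIC_RELAXED );
           do {
             page->txns[ used ].next = head;
           } while( !__atomic_compare_exchange_n( bucket_head, &head, page_base+used,
                                                  0, __ATOMIC_RELEASE, __ATOMIC_RELAXED ) );
           return 1;
         }
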
    - Another structure is required to support serialization of
      snapshots from the cache. Serialization must produce a binary
      structure that encodes essentially:

         hash_map<slot, hash_map<blockhash, vec<(txnhash, txn_result)>>>

      for each slot that is rooted. Observe that we can't reuse the
      structure used for queries: it's not grouped by slot, so we would
      have to iterate all of the transactions to do this grouping,
      which is not feasible.

      We can't add the slot to that index either, since then query
      would not be fast. Queries are just against (blockhash, txnhash)
      pairs; they don't know which slot might have included the
      transaction.

      So we need a second index for this information. The index lines
      up with what we described above: the root hash map is probed, and
      the nested hash map is chained. The vector of results also uses a
      pool of pages.

      In this case the page pool is for a different reason: we know a
      sensible upper bound for the number of transactions that could be
      alive in the entire cache at any point in time, but we don't know
      quite how to bound it for a single slot or blockhash. Well, we
      do: the bound is 524,288 per (slot, blockhash) pair, but we would
      need to allocate all that memory up front. Instead, we bound the
      total number of slots and transactions per slot, assume the
      transactions must be distributed somehow within the blockhashes,
      and then give some overhead for unoccupied page entries.

      The insertion process for this structure is similar to the one
      above, except the vector is not a chained map but just a regular
      vector.

         // 1. Find the (slot, blockhash) in the probed hash map.

         let by_slot_blockhash = hash_map[ slot ][ blockhash ];

         // 2. Request space for the transaction from the current page

         let page = by_slot_blockhash.pages.back();
         let idx = page.used.fetch_add( 1 );

         // 3. Write the transaction into the page and the map

         page.txns[ idx ] = txn;
         by_slot_blockhash.txn_count += 1;

      The final increment is OK even with concurrent inserts because no
      reader will try to observe the txn_count until the slot is
      rooted, at which point nothing can still be inserted into it and
      the txn_count will be final. */
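
/* For illustration only: a sketch of the blockhash removal path described
   above, with hypothetical type/field names standing in for the private
   layout.  It runs with the full write lock held, so no atomics are needed:

     typedef struct {
       int   occupied;          // probed hash map slot in use?
       ulong pages_cnt;         // pages currently owned by this blockhash
       uint  pages[ 4800 ];     // pool indices of those pages
       // ... chained txn map, etc ...
     } blockhash_entry_t;

     static void
     remove_blockhash( blockhash_entry_t * entry,
                       uint *              free_pages,      // pool free list
                       ulong *             free_pages_cnt ) {
       // Return every page owned by the blockhash to the pool (at most
       // 4,800 entries, a memcpy-sized amount of work) ...
       for( ulong i=0UL; i<entry->pages_cnt; i++ )
         free_pages[ (*free_pages_cnt)++ ] = entry->pages[ i ];
       entry->pages_cnt = 0UL;

       // ... then release the probed hash map slot for reuse.
       entry->occupied = 0;
     } */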
#define FD_TXNCACHE_ALIGN (128UL)

#define FD_TXNCACHE_MAGIC (0xF17EDA2CE5CAC4E0) /* FIREDANCE SCACHE V0 */

/* The duration of history to keep around in the txn cache before aging
   it out. This must be at least 150, otherwise we could forget about
   transactions for blockhashes that are still valid to use, and let
   duplicate transactions make it into a produced block.

   Beyond 150, any value is valid. The value used here, 300, comes from
   Agave, and corresponds to roughly 2 minutes of history assuming there
   are no forks, with an extra 12.8 seconds of history for slots that
   are in progress but have not rooted yet. On a production validator
   without RPC support, the value should probably be configurable and
   always set to strictly 150. */

#define FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS (300UL)

/* A note on live slots ...

   The maximum number of live slots is the sum of the rooted and
   unrooted slots. The rooted slots are explicitly capped at 300 for
   this structure (implying we keep around 2 minutes of history for
   queries and snapshots).

   For the unrooted slots, we must root at least one slot in an epoch
   for an epoch transition to occur successfully to the next one, so
   assuming every slot is unconfirmed for some reason, the prior epoch
   was rooted at the first slot in the epoch, and the next epoch is
   rooted at the last slot, there could be

      432,000 + 432,000 - 31 = 863,969

   live slots on the validator. This is clearly impractical, as each
   bank consumes a lot of memory to store slot state, so the validator
   would crash long before this.

   For now we just pick a number: 2048, based on running on mainnet.
   We take into account that apart from the 450 recent blockhashes,
   there are also nonce accounts used as blockhashes by transactions,
   and they contribute significantly to entries being filled up.

   TODO: This number might be unbounded, more testing required with
   mainnet before deciding an actual number. */

#define FD_TXNCACHE_DEFAULT_MAX_LIVE_SLOTS (2048UL)

/* The Solana consensus protocol has an implied restriction on the number
   of transactions in a slot. A slot might have 50,000,000+ CUs, but a
   transaction requires at least around 1500 CUs, so there could be at
   most 33,333 transactions in a slot.

   For Firedancer, we respect this limit when running in production, but
   for development and performance tuning this limit is removed, and
   instead we will produce and accept at most 524,288 transactions per
   slot. This is chosen arbitrarily and works out to around ~1.3M TPS,
   however such a value does not exist in the consensus protocol. */

#define FD_TXNCACHE_DEFAULT_MAX_TRANSACTIONS_PER_SLOT (524288UL)

/* This number is not a strict bound, but is a reasonable maximum number
   of slots that can be constipated. As of the writing of this comment,
   the only use case for constipating the status cache is to generate a
   snapshot. We use constipation here because we want the root to stay
   frozen while we generate the full state of a node for a given rooted
   slot. This max size gives us roughly 1024 slots * 0.4 secs/slot / 60
   secs/min = ~6.8 minutes from when we root a slot to when the status
   cache is done getting serialized into the snapshot format. This
   SHOULD be enough time because serializing the status cache into a
   Solana snapshot is done on the order of seconds and is one of the
   first things that is done during snapshot creation. */

#define FD_TXNCACHE_DEFAULT_MAX_CONSTIPATED_SLOTS (1024UL)

struct fd_txncache_insert {
  uchar const * blockhash;
  uchar const * txnhash;
  ulong         slot;
  uchar const * result;
};

typedef struct fd_txncache_insert fd_txncache_insert_t;

struct fd_txncache_query {
  uchar const * blockhash;
  uchar const * txnhash;
};

typedef struct fd_txncache_query fd_txncache_query_t;

struct fd_txncache_snapshot_entry {
  ulong slot;
  uchar blockhash[ 32 ];
  uchar txnhash[ 20 ];
  ulong txn_idx;
  uchar result;
};

typedef struct fd_txncache_snapshot_entry fd_txncache_snapshot_entry_t;

/* Forward declare opaque handle */
struct fd_txncache_private;
typedef struct fd_txncache_private fd_txncache_t;

FD_PROTOTYPES_BEGIN

/* fd_txncache_{align,footprint} give the needed alignment and
   footprint of a memory region suitable to hold a txn cache.
   fd_txncache_align returns FD_TXNCACHE_ALIGN; the footprint depends
   on the maximums provided to fd_txncache_footprint.

   fd_txncache_new formats a memory region with suitable alignment and
   footprint for holding a txn cache. Assumes shmem points in the
   caller's address space to the first byte of the memory region owned
   by the caller to use. Returns shmem on success and NULL on failure
   (logs details). The memory region will be owned by the state on
   successful return. The caller is not joined on return.

   fd_txncache_join joins the caller to a txn cache. Assumes shtc points
   to the first byte of the memory region holding the state. Returns a
   local handle to the join on success (this is not necessarily a simple
   cast of the address) and NULL on failure (logs details).

   fd_txncache_leave leaves the caller's current local join to a txn
   cache. Returns a pointer to the memory region holding the state on
   success (this is not necessarily a simple cast of the address) and
   NULL on failure (logs details). The caller is not joined on
   successful return.

   fd_txncache_delete unformats a memory region that holds a txn cache.
   Assumes shtc points in the caller's address space to the first byte
   of the memory region holding the state and that nobody is joined.
   Returns a pointer to the memory region on success and NULL on failure
   (logs details). The caller has ownership of the memory region on
   successful return. */

FD_FN_CONST ulong
fd_txncache_align( void );

FD_FN_CONST ulong
fd_txncache_footprint( ulong max_rooted_slots,
                       ulong max_live_slots,
                       ulong max_txn_per_slot,
                       ulong max_constipated_slots );

/* fd_txncache_max_constipated_slots_est estimates how many slots will
   be rooted (and so need to be constipated) while the root is stalled
   for stall_duration_secs seconds, assuming the 400ms slot time used
   throughout this header. */

static inline ulong
fd_txncache_max_constipated_slots_est( ulong stall_duration_secs ) {
  double stall_slots_d = (double)stall_duration_secs / 0.4;
  ulong  stall_slots   = (ulong)ceil( stall_slots_d );
  return stall_slots;
}

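/* For example (illustrative only): a snapshot that stalls the root for
   a 300 second window needs a constipation budget of at least

     fd_txncache_max_constipated_slots_est( 300UL )   // == 750 slots

   which fits within FD_TXNCACHE_DEFAULT_MAX_CONSTIPATED_SLOTS (1024
   slots, roughly 6.8 minutes of stall). */
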
void *
fd_txncache_new( void * shmem,
                 ulong  max_rooted_slots,
                 ulong  max_live_slots,
                 ulong  max_txn_per_slot,
                 ulong  max_constipated_slots );

fd_txncache_t *
fd_txncache_join( void * shtc );

void *
fd_txncache_leave( fd_txncache_t * tc );

void *
fd_txncache_delete( void * shtc );

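/* A typical setup/teardown sequence (illustrative only; aligned_alloc is
   used here just for brevity, any allocation with fd_txncache_align()
   alignment and fd_txncache_footprint() size works):

     ulong footprint = fd_txncache_footprint( FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS,
                                              FD_TXNCACHE_DEFAULT_MAX_LIVE_SLOTS,
                                              FD_TXNCACHE_DEFAULT_MAX_TRANSACTIONS_PER_SLOT,
                                              FD_TXNCACHE_DEFAULT_MAX_CONSTIPATED_SLOTS );
     void * shmem = aligned_alloc( fd_txncache_align(), footprint );

     fd_txncache_t * tc = fd_txncache_join( fd_txncache_new( shmem,
                                                             FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS,
                                                             FD_TXNCACHE_DEFAULT_MAX_LIVE_SLOTS,
                                                             FD_TXNCACHE_DEFAULT_MAX_TRANSACTIONS_PER_SLOT,
                                                             FD_TXNCACHE_DEFAULT_MAX_CONSTIPATED_SLOTS ) );

     // ... insert, query, snapshot ...

     free( fd_txncache_delete( fd_txncache_leave( tc ) ) );   */
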
/* fd_txncache_register_root_slot registers a root slot in a txn cache.
   Only the provided limit of roots (typically 300) can exist in the txn
   cache at once, after which the oldest roots will be purged.

   Purging a root means it cannot be served in a snapshot response,
   although transactions in the root may or may not still be present.
   Transaction status is removed once all roots referencing the
   blockhash of the transaction are removed from the txn cache.

   This is neither cheap nor expensive: it will pause all insertion and
   query operations, but only momentarily until any old slots can be
   purged from the cache. */

void
fd_txncache_register_root_slot( fd_txncache_t * tc,
                                ulong           slot );

/* fd_txncache_register_constipated_slot is the "constipated" version of
   fd_txncache_register_root_slot. This means that older root slots will
   not get purged nor will the newer root slots actually be rooted. All
   the slots that are marked as constipated will be flushed down to the
   set of rooted slots when fd_txncache_flush_constipated_slots is
   called. */

void
fd_txncache_register_constipated_slot( fd_txncache_t * tc,
                                       ulong           slot );

void
fd_txncache_flush_constipated_slots( fd_txncache_t * tc );

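/* One plausible snapshot flow using constipation (illustrative only; the
   real orchestration lives in the caller, and the exact ordering may
   differ):

     fd_txncache_set_is_constipated( tc, 1 );              // freeze the rooted state
     // ... while frozen, park newly rooted slots instead of rooting them ...
     fd_txncache_register_constipated_slot( tc, slot );
     // ... serialize the frozen rooted state (see fd_txncache_snapshot below) ...
     // ... then release the parked slots as real roots and unfreeze ...
     fd_txncache_flush_constipated_slots( tc );
     fd_txncache_set_is_constipated( tc, 0 );   */
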
/* fd_txncache_root_slots returns the list of root slots currently
   tracked by the txn cache. There will be at most max_rooted_slots
   slots, which will be written into the provided out_slots. It is
   assumed tc points to a txn cache and out_slots has space for at least
   max_rooted_slots results. If there are fewer than max_rooted_slots
   slots in the cache, the front part of out_slots will be filled in,
   and all the remaining slots will be set to ULONG_MAX.

   This is a fast operation, but it will lock the whole structure and
   cause a temporary pause in insert and query operations. */

void
fd_txncache_root_slots( fd_txncache_t * tc,
                        ulong *         out_slots );

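/* Example use (illustrative only), assuming the txn cache was created
   with max_rooted_slots of FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS:

     ulong out_slots[ FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS ];
     fd_txncache_root_slots( tc, out_slots );
     for( ulong i=0UL; i<FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS; i++ ) {
       if( out_slots[ i ]==ULONG_MAX ) break;   // no more root slots
       // ... out_slots[ i ] is a currently tracked root slot ...
     } */
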
/* fd_txncache_snapshot writes the current state of a txn cache into a
   binary format suitable for serving to other nodes via snapshot
   responses. The write function is called in a streaming fashion with
   the binary data, the size of the data, and the ctx pointer provided
   to this function. The write function should return 0 on success and
   -1 on failure; this function propagates a write failure back to the
   caller immediately, so this function also returns 0 on success and
   -1 on failure.

   IMPORTANT! THIS ASSUMES THERE ARE NO CONCURRENT INSERTS OCCURRING ON
   THE TXN CACHE AT THE ROOT SLOTS DURING SNAPSHOTTING. OTHERWISE THE
   SNAPSHOT MIGHT NOT CONTAIN ALL OF THE DATA, ALTHOUGH IT WILL NOT
   CAUSE CORRUPTION. THIS IS ASSUMED OK BECAUSE YOU CANNOT MODIFY A
   ROOTED SLOT.

   This is a cheap operation and will not cause any pause in insertion
   or query operations. */

int
fd_txncache_snapshot( fd_txncache_t * tc,
                      void *          ctx,
                      int ( * write )( uchar const * data, ulong data_sz, void * ctx ) );

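/* Example write callback (illustrative only), streaming the serialized
   status cache into a stdio FILE passed via ctx:

     static int
     snapshot_write( uchar const * data,
                     ulong         data_sz,
                     void *        ctx ) {
       FILE * file = (FILE *)ctx;
       if( fwrite( data, 1UL, data_sz, file )!=data_sz ) return -1;
       return 0;
     }

     // ...
     if( fd_txncache_snapshot( tc, (void *)file, snapshot_write ) ) {
       // ... handle failed write ...
     } */
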
/* fd_txncache_insert_batch inserts a batch of transaction results into
   a txn cache. Assumes tc points to a txn cache, txns is a list of
   transaction results to be inserted, and txns_cnt is the count of the
   results list. The insert copies data from the results and does not
   have any lifetime interest in the results memory provided once this
   call returns.

   Returns 1 on success and 0 on failure. The only reason insertion can
   fail is because the txn cache is full, which should never happen in
   practice if the caller sizes the bounds correctly. This is mostly
   here to support testing and fuzzing.

   This is a cheap, high performance, concurrent operation and can occur
   at the same time as queries and arbitrary other insertions. */

int
fd_txncache_insert_batch( fd_txncache_t *              tc,
                          fd_txncache_insert_t const * txns,
                          ulong                        txns_cnt );

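/* Example single-element insert (illustrative only): blockhash points to
   the 32 byte recent blockhash the transaction referenced, txnhash to
   the 20 byte key slice stored for the transaction, and result to the
   1 byte status:

     fd_txncache_insert_t insert = {
       .blockhash = blockhash,
       .txnhash   = txnhash,
       .slot      = slot,
       .result    = &result,
     };

     if( FD_UNLIKELY( !fd_txncache_insert_batch( tc, &insert, 1UL ) ) )
       FD_LOG_ERR(( "txn cache full" ));   */
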
/* fd_txncache_query_batch queries a batch of transactions to determine
   if they exist in the txn cache or not. The queries have an ambiguous
   slot, but must match both the blockhash and txnhash. In addition, if
   the query_func is not NULL, the query_func will be called with the
   slot of the txn and the query_func_ctx that was provided, and the
   transaction will be considered present only if the query_func also
   returns 1.

   Assumes tc points to a txn cache, queries is a list of queries to be
   executed, and queries_cnt is the count of the queries list.
   out_results must be at least as large as queries_cnt and will be
   filled with 0 or 1 if the transaction is not present or present
   respectively.

   This is a cheap, high performance, concurrent operation and can occur
   at the same time as other queries and arbitrary insertions. */

void
fd_txncache_query_batch( fd_txncache_t *             tc,
                         fd_txncache_query_t const * queries,
                         ulong                       queries_cnt,
                         void *                      query_func_ctx,
                         int ( * query_func )( ulong slot, void * ctx ),
                         int *                       out_results );

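/* Example batch query (illustrative only): the optional query_func
   filters hits to slots on the fork being built on, with
   fork_ctx_t/fork_is_ancestor as hypothetical caller-side helpers:

     static int
     query_filter( ulong slot, void * ctx ) {
       fork_ctx_t * fork = (fork_ctx_t *)ctx;        // hypothetical caller type
       return fork_is_ancestor( fork, slot ) ? 1 : 0;
     }

     // ...
     fd_txncache_query_t query = { .blockhash = blockhash, .txnhash = txnhash };
     int present;
     fd_txncache_query_batch( tc, &query, 1UL, fork, query_filter, &present );
     if( present ) {
       // ... already executed on this fork, do not execute again ...
     } */
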
/* fd_txncache_set_txnhash_offset sets the correct offset value for the
   txn hash "key slice" in the blockcache and slotblockcache. This is
   used primarily for snapshot restore, since in Firedancer we always
   use an offset of 0. Returns an error if the cache entry isn't
   found. */

int
fd_txncache_set_txnhash_offset( fd_txncache_t * tc,
                                ulong           slot,
                                uchar           blockhash[ 32 ],
                                ulong           txnhash_offset );

/* fd_txncache_is_rooted_slot returns 1 if `slot` is rooted, 0 otherwise. */

int
fd_txncache_is_rooted_slot( fd_txncache_t * tc,
                            ulong           slot );

/* fd_txncache_get_entries is responsible for converting the rooted state
   of the status cache back into fd_bank_slot_deltas_t, which is the
   decoded format used by Agave. This is a helper method used to
   generate Agave-compatible snapshots. */

int
fd_txncache_get_entries( fd_txncache_t *         tc,
                         fd_bank_slot_deltas_t * bank_slot_deltas,
                         fd_spad_t *             spad );

/* fd_txncache_{get,set}_is_constipated are used to query and set whether
   the status cache is currently in a constipated state. */

int
fd_txncache_get_is_constipated( fd_txncache_t * tc );

int
fd_txncache_set_is_constipated( fd_txncache_t * tc, int is_constipated );

FD_PROTOTYPES_END

#endif /* HEADER_fd_src_flamenco_runtime_txncache_h */