LCOV - code coverage report
Current view: top level - flamenco/runtime - fd_txncache.h (source / functions) Hit Total Coverage
Test: cov.lcov Lines: 0 5 0.0 %
Date: 2024-11-13 11:58:15 Functions: 0 0 -

          Line data    Source code
       1             : #ifndef HEADER_fd_src_flamenco_runtime_txncache_h
       2             : #define HEADER_fd_src_flamenco_runtime_txncache_h
       3             : 
       4             : /* A txn cache is a concurrent map for saving the result (status) of
       5             :    transactions that have executed.  In addition to supporting fast
       6             :    concurrent insertion and query of transaction results, the txn
       7             :    cache supports serialization of its state into a standard bincode
       8             :    format for serving the state to other nodes via snapshot responses,
       9             :    and then restoring snapshots produced by other nodes.
      10             : 
      11             :    The txn cache is designed to do two operations fast,
      12             : 
      13             :      (a) Insertion.  Insertion is done by both the leader pipeline and
      14             :          the replay stage, with an insertion for every transaction that
      15             :          is executed, potentially over 1M per second.
      16             : 
      17             :      (b) Query.  The leader pipeline queries the status of transactions
      18             :          before executing them to ensure that it does not execute
      19             :          anything twice.
      20             : 
      21             :    Both of these operations are concurrent and lockless, assuming there
      22             :    are no other (non-insert/query) operations occurring on the txn cache.
      23             :    Most other operations lock the entire structure and will prevent both
      24             :    insertion and query from proceeding.
      25             : 
      26             :    The txn cache is both CPU and memory sensitive.  A transaction result
      27             :    is 1 byte, and the stored transaction hashes are 20 bytes, so
      28             :    without any overhead just storing 150 slots of transactions in a flat
      29             :    array would require
      30             : 
      31             :       524,288 * 150 * 21 ~ 1.5GiB
      32             : 
      33             :    Of memory.  But, we can't query, insert, and remove quickly from a
      34             :    flat array.  We make a few trade-offs to achieve good performance
      35             :    without bloating memory completely.  In particular:
      36             : 
      37             :      - The transactions to be queried are stored in a structure that
      38             :        looks like a
      39             : 
      40             :          hash_map<blockhash, hash_map<txnhash, vec<(slot, status)>>>
      41             : 
      42             :        The top level hash_map is a probed hash map, and the txnhash map
      43             :        is a chained hash map, where the items come from a pool of pages
      44             :        of transactions.  We use pages of transactions to support fast
      45             :        removal of a blockhash from the top level map: we need to return
      46             :        at most 4,800 pages back to the pool rather than 78,643,200
      47             :        individual transactions.
      48             : 
      49             :        This adds additional memory overhead: a blockhash with only one
      50             :        transaction in it will still consume a full page (16,384
      51             :        transactions) of memory.  Allocating a new transaction page to a
      52             :        chained map is rare (once every 16,384 inserts) so the cost
      53             :        amortizes to zero.  Creating a blockhash happens once per
      54             :        blockhash, so also amortizes to zero; the only operation we care
      55             :        about is then the simple insert case with a non-full transaction
      56             :        page into an existing blockhash.  This can be done with two
      57             :        compare-and-swaps,
      58             : 
      59             :          // 1. Find the blockhash in the probed hash map.
      60             : 
      61             :             let by_blockhash = hash_map[ blockhash ];
      62             : 
      63             :          // 2. Request space for the transaction from the current page
      64             : 
      65             :             let page = by_blockhash.pages.back();
      66             :             let idx = page.used.compare_and_swap( current, current+1 );
      67             : 
      68             :          // 3. Write the transaction into the page and the map
      69             : 
      70             :             page.txns[ idx ] = txn;
      71             :             page.txns[ idx ].next = by_blockhash.txns[ txnhash ].idx;
      72             :             by_blockhash.txns[ txnhash ].head.compare_and_swap( current, idx );
      73             : 
      74             :        Removal of a blockhash from this structure is simple because it
      75             :        does not need to be concurrent (the caller will only remove
      76             :        between executing slots, so there's no contention and it can take
      77             :        a full write lock).  We take a write lock, restore the pages in
      78             :        the blockhash to the pool, and then mark the space in the
      79             :        hash_map as empty.  This is fast since there are at most 4,800
      80             :        pages to restore and restoration is a simple memcpy.
      81             : 
      82             :      - Another structure is required to support serialization of
      83             :        snapshots from the cache.  Serialization must produce a binary
      84             :        structure that encodes essentially:
      85             : 
      86             :          hash_map<slot, hash_map<blockhash, vec<(txnhash, txn_result)>>>
      87             : 
      88             :        For each slot that is rooted.  Observe that we can't reuse the
       89             :        structure used for queries: since it's not grouped by slot, we
       90             :        would have to iterate all of the transactions to do this
       91             :        grouping, which is not feasible.
      92             : 
      93             :        We can't add the slot to that index, since then query would not
       94             :        be fast.  Queries are just against (blockhash, txnhash) pairs;
      95             :        they don't know which slot might have included it.
      96             : 
      97             :        So we need a second index for this information.  The index is
      98             :        going to line up with what we described above, the root hash map
      99             :        will be probed, and the nested hash map will be chained.  The
     100             :        vector of results will also use a pool of pages.
     101             : 
     102             :        In this case the page pool is for a different reason: we know a
     103             :        sensible upper bound for the number of transactions that could be
     104             :        alive in the entire cache at any point in time, but we don't know
     105             :        quite how to bound it for a single slot or blockhash.  Well, we
     106             :        do: the bound is 524,288 per (slot, blockhash) pair but we would
     107             :        need to allocate all that memory up front.  Instead, we bound the
     108             :        total number of slots and transactions per slot, assume the
     109             :        transactions must be distributed somehow within the blockhashes,
     110             :        and then give some overhead for unoccupied page entries.
     111             : 
     112             :        The insertion process for this structure is similar to the one
     113             :        above, except the vector is not a chained map but just a regular
     114             :        vector.
     115             : 
     116             :          // 1. Find the (slot, blockhash) in the probed hash map.
     117             : 
     118             :             let by_slot_blockhash = hash_map[ slot ][ blockhash ];
     119             : 
     120             :          // 2. Request space for the transaction from the current page
     121             : 
     122             :             let page = by_slot_blockhash.pages.back();
     123             :             let idx = page.used.fetch_add( 1 );
     124             : 
     125             :          // 3. Write the transaction into the page and the map
     126             : 
     127             :             page.txns[ idx ] = txn;
     128             :             by_slot_blockhash.txn_count += 1;
     129             : 
     130             :        The final increment is OK even with concurrent inserts because
     131             :        no reader will try to observe the txn_count until the slot is
     132             :        rooted, at which point no inserts can be in flight and the
     133             :        txn_count will be finalized. */
     134             : 
     135             : #include "../fd_flamenco_base.h"
     136             : 
     137           0 : #define FD_TXNCACHE_ALIGN (128UL)
     138             : 
     139           0 : #define FD_TXNCACHE_MAGIC (0xF17EDA2CE5CAC4E0) /* FIREDANCE SCACHE V0 */
     140             : 
     141             : /* The duration of history to keep around in the txn cache before aging
     142             :    it out.  This must be at least 150, otherwise we could forget about
     143             :    transactions for blockhashes that are still valid to use, and let
     144             :    duplicate transactions make it into a produced block.
     145             : 
     146             :    Beyond 150, any value is valid.  The value used here, 300, comes from
     147             :    Agave, and corresponds to roughly 2 minutes of history assuming there
     148             :    are no forks, with an extra 12.8 seconds of history for slots that
     149             :    are in progress but have not rooted yet.  On a production validator
     150             :    without RPC support, the value should probably be configurable and
     151             :    always set to strictly 150. */
     152             : 
     153           0 : #define FD_TXNCACHE_DEFAULT_MAX_ROOTED_SLOTS (300UL)
     154             : 
     155             : /* A note on live slots ...
     156             : 
     157             :    The maximum number of live slots is the sum of the rooted and
     158             :    unrooted slots.  The rooted slots are explicitly capped at 300 for
     159             :    this structure (implying we keep around 2 minutes of history for
     160             :    queries and snapshots).
     161             : 
     162             :    For the unrooted slots, we must root at least one slot in an epoch
     163             :    for the transition to the next epoch to occur successfully, so
     164             :    assuming every slot is unconfirmed for some reason, and the prior
     165             :    epoch was rooted at the first slot in the epoch, and the next epoch
     166             :    is rooted at the last slot, there could be
     167             : 
     168             :       432,000 + 432,000 - 31 = 863,969
     169             : 
     170             :    Live slots on the validator.  This is clearly impractical as each
     171             :    bank consumes a lot of memory to store slot state, so the
     172             :    validator would crash long before this.
     173             : 
     174             :    For now we just pick a number: 2048, based on running on mainnet.
     175             :    We take into account that apart from the 450 recent blockhashes,
     176             :    there are also nonce accounts used as blockhashes by transactions
     177             :    and they contribute significantly to entries being filled up.
     178             : 
     179             :    TODO: This number might be unbounded, more testing required with
     180             :    mainnet before deciding an actual number. */
     181             : 
     182           0 : #define FD_TXNCACHE_DEFAULT_MAX_LIVE_SLOTS (2048UL)
     183             : 
     184             : /* The Solana consensus protocol has an implied restriction on the number
     185             :    of transactions in a slot.  A slot might have at most 48,000,000 CUs,
     186             :    but a transaction requires at least around 1500 CUs, so there could
     187             :    be at most 32,000 transactions in a slot.
     188             : 
     189             :    For Firedancer, we respect this limit when running in production, but
     190             :    for development and performance tuning this limit is removed, and
     191             :    instead we will produce and accept at most 524,288 transactions per
     192             :    slot.  This is chosen arbitrarily and works out to around ~1.3M TPS,
     193             :    however such a value does not exist in the consensus protocol. */
     194             : 
     195           0 : #define FD_TXNCACHE_DEFAULT_MAX_TRANSACTIONS_PER_SLOT (524288UL)
     196             : 
     197             : struct fd_txncache_insert {
     198             :   uchar const * blockhash;
     199             :   uchar const * txnhash;
     200             :   ulong         slot;
     201             :   uchar const * result;
     202             : };
     203             : 
     204             : typedef struct fd_txncache_insert fd_txncache_insert_t;
     205             : 
     206             : struct fd_txncache_query {
     207             :   uchar const * blockhash;
     208             :   uchar const * txnhash;
     209             : };
     210             : 
     211             : typedef struct fd_txncache_query fd_txncache_query_t;
     212             : 
     213             : struct fd_txncache_snapshot_entry {
     214             :    ulong slot;
     215             :    uchar blockhash[ 32 ];
     216             :    uchar txnhash[ 20 ];
     217             :    ulong txn_idx;
     218             :    uchar result;
     219             : };
     220             : 
     221             : typedef struct fd_txncache_snapshot_entry fd_txncache_snapshot_entry_t;
     222             : 
     223             : /* Forward declare opaque handle */
     224             : struct fd_txncache_private;
     225             : typedef struct fd_txncache_private fd_txncache_t;
     226             : 
     227             : FD_PROTOTYPES_BEGIN
     228             : 
     229             : /* fd_txncache_{align,footprint} give the needed alignment and
     230             :    footprint of a memory region suitable to hold a txn cache.
     231             :    fd_txncache_{align,footprint} return the same value as
     232             :    FD_TXNCACHE_{ALIGN,FOOTPRINT}.
     233             : 
     234             :    fd_txncache_new formats a memory region with suitable alignment and
     235             :    footprint for holding a txn cache.  Assumes shmem points in the
     236             :    caller's address space to the first byte of the caller-owned memory
     237             :    region to use.  Returns shmem on success and NULL on failure (logs
     238             :    details).  The memory region will be owned by the state on successful
     239             :    return.  The caller is not joined on return.
     240             : 
     241             :    fd_txncache_join joins the caller to a txn cache. Assumes shtc points
     242             :    to the first byte of the memory region holding the state.  Returns a
     243             :    local handle to the join on success (this is not necessarily a simple
     244             :    cast of the address) and NULL on failure (logs details).
     245             : 
     246             :    fd_txncache_leave leaves the caller's current local join to a txn
     247             :    cache.  Returns a pointer to the memory region holding the state on
     248             :    success (this is not necessarily a simple cast of the address) and
     249             :    NULL on failure (logs details).  The caller is not joined on
     250             :    successful return.
     251             : 
     252             :    fd_txncache_delete unformats a memory region that holds a txn cache.
     253             :    Assumes shtc points in the caller's address space to the first byte
     254             :    of the region holding the state and that nobody is joined.  Returns a
     255             :    pointer to the memory region on success and NULL on failure (logs
     256             :    details).  The caller has ownership of the memory region on
     257             :    successful return. */
     258             : 
     259             : FD_FN_CONST ulong
     260             : fd_txncache_align( void );
     261             : 
     262             : FD_FN_CONST ulong
     263             : fd_txncache_footprint( ulong max_rooted_slots,
     264             :                        ulong max_live_slots,
     265             :                        ulong max_txn_per_slot );
     266             : 
     267             : void *
     268             : fd_txncache_new( void * shmem,
     269             :                  ulong  max_rooted_slots,
     270             :                  ulong  max_live_slots,
     271             :                  ulong  max_txn_per_slot );
     272             : 
     273             : fd_txncache_t *
     274             : fd_txncache_join( void * shtc );
     275             : 
     276             : void *
     277             : fd_txncache_leave( fd_txncache_t * tc );
     278             : 
     279             : void *
     280             : fd_txncache_delete( void * shtc );
     281             : 
     282             : /* fd_txncache_register_root_slot registers a root slot in a txn cache.  Only
     283             :    the provided limit of roots (typically 300) can exist in the txn
     284             :    cache at once, after which the oldest roots will be purged.
     285             : 
     286             :    Purging a root means it cannot be served in a snapshot response,
     287             :    although transactions in the root may or may not still be present.
     288             :    Transaction status is removed once all roots referencing the
     289             :    blockhash of the transaction are removed from the txn cache.
     290             : 
     291             :    This is neither cheap nor expensive: it will pause all insertion and
     292             :    query operations but only momentarily until any old slots can be
     293             :    purged from the cache. */
     294             : 
     295             : void
     296             : fd_txncache_register_root_slot( fd_txncache_t * tc,
     297             :                                 ulong           slot );
     298             : 
     299             : /* fd_txncache_root_slots returns the list of root slots currently
     300             :    tracked by the txn cache.  There will be at most max_root_slots
     301             :    slots, which will be written into the provided out_slots.  It is
     302             :    assumed tc points to a txn cache and out_slots has space for at least
     303             :    max_root_slots results.  If there are fewer than max_root_slots slots
     304             :    in the cache, the front part of out_slots will be filled in, and all
     305             :    the remaining slots will be set to ULONG_MAX.
     306             : 
     307             :    This is a fast operation, but it will lock the whole structure and
     308             :    cause a temporary pause in insert and query operations. */
     309             : 
     310             : void
     311             : fd_txncache_root_slots( fd_txncache_t * tc,
     312             :                         ulong *         out_slots );
     313             : 
     314             : /* fd_txncache_snapshot writes the current state of a txn cache into a
     315             :    binary format suitable for serving to other nodes via snapshot
     316             :    responses.  The write function is called in a streaming fashion with
     317             :    the binary data, the size of the data, and the ctx pointer provided
     318             :    to this function.  The write function should return 0 on success and
     319             :    -1 on failure; this function will propagate a write failure back to
     320             :    the caller immediately, so this function also returns 0 on success
     321             :    and -1 on failure.
     322             : 
     323             :    IMPORTANT!  THIS ASSUMES THERE ARE NO CONCURRENT INSERTS OCCURRING ON
     324             :    THE TXN CACHE AT THE ROOT SLOTS DURING SNAPSHOTTING.  OTHERWISE THE
     325             :    SNAPSHOT MIGHT NOT CONTAIN ALL OF THE DATA, ALTHOUGH IT WILL NOT
     326             :    CAUSE CORRUPTION.  THIS IS ASSUMED OK BECAUSE YOU CANNOT MODIFY A
     327             :    ROOTED SLOT.
     328             : 
     329             :    This is a cheap operation and will not cause any pause in insertion
     330             :    or query operations. */
     331             : 
     332             : int
     333             : fd_txncache_snapshot( fd_txncache_t * tc,
     334             :                       void *          ctx,
     335             :                       int ( * write )( uchar const * data, ulong data_sz, void * ctx ) );
     336             : 
     337             : /* fd_txncache_insert_batch inserts a batch of transaction results into
     338             :    a txn cache.  Assumes tc points to a txn cache, txns is a list of
     339             :    transaction results to be inserted, and txns_cnt is the count of the
     340             :    results list.  The insert copies data from the results and does not
     341             :    have any lifetime interest in the results memory provided once this
     342             :    call returns.
     343             : 
     344             :    Returns 1 on success and 0 on failure.  The only reason insertion can
     345             :    fail is that the txn cache is full, which should never happen in
     346             :    practice if the caller sizes the bounds correctly.  This is mostly
     347             :    here to support testing and fuzzing.
     348             : 
     349             :    This is a cheap, high performance, concurrent operation and can occur
     350             :    at the same time as queries and arbitrary other insertions. */
     351             : 
     352             : int
     353             : fd_txncache_insert_batch( fd_txncache_t *              tc,
     354             :                           fd_txncache_insert_t const * txns,
     355             :                           ulong                        txns_cnt );
     356             : 
     357             : /* fd_txncache_query_batch queries a batch of transactions to determine
     358             :    if they exist in the txn cache or not.  The queries have an ambiguous
     359             :    if they exist in the txn cache or not.  The queries do not specify a
     360             :    slot, but must match both the blockhash and txnhash.  In addition, if
     361             :    slot of the txn and the query_func_ctx that was provided, and the
     362             :    transaction will be considered present only if the query_func also
     363             :    returns 1.
     364             : 
     365             :    Assumes tc points to a txn cache, queries is a list of queries to be
     366             :    executed, and queries_cnt is the count of the queries list.
     367             :    out_results must be at least as large as queries_cnt and will be
     368             :    filled with 0 or 1 if the transaction is not present or present
     369             :    respectively.
     370             : 
     371             :    This is a cheap, high performance, concurrent operation and can occur
     372             :    at the same time as queries and arbitrary other insertions. */
     373             : 
     374             : void
     375             : fd_txncache_query_batch( fd_txncache_t *             tc,
     376             :                          fd_txncache_query_t const * queries,
     377             :                          ulong                       queries_cnt,
     378             :                          void *                      query_func_ctx,
     379             :                          int ( * query_func )( ulong slot, void * ctx ),
     380             :                          int *                       out_results );
     381             : 
     382             : /* fd_txncache_set_txnhash_offset sets the correct offset value for the
     383             :    txn hash "key slice" in the blockcache and slotblockcache.  This is used
     384             :    primarily for snapshot restore, since in Firedancer we always use an
     385             :    offset of 0.  Returns an error if the cache entry isn't found. */
     386             : int
     387             : fd_txncache_set_txnhash_offset( fd_txncache_t * tc,
     388             :                                 ulong slot,
     389             :                                 uchar blockhash[ 32 ],
     390             :                                 ulong txnhash_offset );
     391             : 
     392             : /* fd_txncache_is_rooted_slot returns 1 if `slot` is rooted, 0 otherwise. */
     393             : int
     394             : fd_txncache_is_rooted_slot( fd_txncache_t * tc,
     395             :                             ulong slot );
     396             : 
     397             : FD_PROTOTYPES_END
     398             : 
     399             : #endif /* HEADER_fd_src_flamenco_runtime_txncache_h */

Generated by: LCOV version 1.14