Line data Source code
1 : #ifndef HEADER_fd_src_flamenco_runtime_fd_txncache_h
2 : #define HEADER_fd_src_flamenco_runtime_fd_txncache_h
3 :
4 : /* A txn cache is a concurrent set storing the message hashes of
5 : transactions which have already executed. Note the structure is
6 : keyed by message hash, not signature, otherwise a double spend might
7 : be possible due to signature malleability.
8 :
9 : The txn cache is designed to do two operations fast,
10 :
11 : (a) Insertion. Insertion is done by both the leader pipeline and
12 : the replay stage, with an insertion for every transaction that
13 : is executed, potentially over 1M per second.
14 :
15 : (b) Query. The leader pipeline queries the status of transactions
16 : before executing them to ensure that it does not execute
17 : anything twice. The replay stage queries to make sure that
18 : blocks do not contain duplicate transactions.
19 :
20 : Both of these operations are concurrent and lockless, assuming there
21 : are no other (non-insert/query) operations occurring on the txn
22 : cache. Most other operations lock the entire structure and will
23 : prevent both insertion and query from proceeding, but are rare
24 : (once per slot) so it's OK.
25 :
26 : The txn cache is somewhat CPU and memory sensitive. To store message
27 : hashes requires 20 bytes (only the first 20 of the 32 bytes of the
28 : hash are used, since this is sufficient to avoid collisions).
29 : Without any other overhead, and 31 unrooted blocks with 41,019
30 : transactions each (the current maximum), the txn cache would need to
31 : store
32 :
33 : 31 * 41,019 * 20 = 0.025 GB
34 :
35 : Of memory. But, we can't query, insert, and remove quickly from a
36 : flat array. We make a few trade-offs to achieve good performance
37 : without bloating memory completely. In particular the transactions
38 : to be queried are stored in a structure that looks like a
39 :
40 : multi_map<blockhash, hash_map<txnhash, vec<fork_idx>>>
41 :
42 : Note that there can be multiple duplicate blockhashes in the map, so
43 : it is a multimap. This is to handle the extremely rare degenerate
44 : case where a leader equivocates a block, but keeps the transactions
45 : the same and just fiddles with the shred data or padding a little
46 : bit. This keeps blockhash the same, but creates a new block ID
47 : (fork). We cannot stick both forks in the same blockhash entry
48 : because later, when one fork is rooted, we need to be able to purge
49 : all transactions in the non-rooted equivocation, otherwise we would
50 : violate our own invariants around the max active banks. (Rooting one
51 : equivocation should fully evict transactions belonging to the other
52 : equivocations, which would be impossible if they were under one
53 : blockhash).
54 :
55 : Both hash maps are chained hash maps, and for the txnhash map items
56 : come from a pool of pages of transactions. We use pages of
57 : transactions to support fast removal of a blockhash from the top
58 : level map.
59 :
60 : This adds additional memory overhead, a blockhash with only one
61 : transaction in it will still consume a full page (16,384) of
62 : transactions of memory. Allocating a new transaction page to a
63 : chained map is rare (once every 16,384 inserts) so the cost amortizes
64 : to zero. Creating a blockhash happens once per blockhash, so also
65 : amortizes to zero, the only operation we care about is then the
66 : simple insert case with an unfull transaction page into an existing
67 : blockhash. This can be done with two compare-and-swaps,
68 :
69 : // 1. Find the blockhash in the probed hash map.
70 :
71 : let by_blockhash = multi_map[ blockhash ];
72 :
73 : // 2. Request space for the transaction from the current page
74 :
75 : let page = by_blockhash.pages.back();
76 : let idx = page.used.compare_and_swap( current, current+1 );
77 :
78 : // 3. Write the transaction into the page and the map
79 :
80 : page.txns[ idx ] = txn;
81 : page.txns[ idx ].next = by_blockhash.txns[ txnhash ].idx;
82 : by_blockhash.txns[ txnhash ].head.compare_and_swap( current, idx );
83 :
84 : Step 1 is fast assuming equivocation is rare, since there will only
85 : ever be one matching blockhash in the multi map.
86 :
87 : Removal of a blockhash from this structure is simple because it does
88 : not need to be concurrent (it is once per slot). We take a write
89 : lock, restore the pages in the blockhash to the pool, and then mark
90 : the space in the hash_map as empty. Page restoration is a simple
91 : memcpy.
92 :
93 : All of this would be complicated by nonce transactions, which can
94 : reference any blockhash, and so a transaction could appear in the
95 : txn cache under any blockhash. But when dealing with these we make
96 : a simple observation: double spend is already cryptographically
97 : prevented by the nonce mechanism itself, so we do not need to store
98 : or check them. Agave does store them for RPC related reasons (they
99 : want to be able to query the status of nonce transactions that have
100 : already executed). It is therefore an error to insert any nonce
101 : transactions into the txn cache, and the caller is responsible for
102 : ensuring this does not happen.
103 :
104 : There is one minor complication with this, which is that Agave
105 : snapshots serve out the status cache with nonce transactions already
106 : in it. This structure assumes these will not get inserted, so it is
107 : up to the snapshot loading code to filter out all blockhashes which
108 : are nonce blockhashes (not in the recent blockhashes sysvar). */
109 :
110 : #include "fd_txncache_shmem.h"
111 :
112 0 : #define FD_TXNCACHE_ALIGN (128UL)
113 :
114 : #define FD_TXNCACHE_MAGIC (0xF17EDA2CE5CAC4E0) /* FIREDANCE SCACHE V0 */
115 :
116 : struct fd_txncache_private;
117 : typedef struct fd_txncache_private fd_txncache_t;
118 :
119 : FD_PROTOTYPES_BEGIN
120 :
121 : /* fd_txncache_align gives the needed alignment of the caller-provided
122 : local join region (ljoin). This returns FD_TXNCACHE_ALIGN.
123 :
124 : fd_txncache_footprint gives the needed footprint of the local join
125 : region for a txn cache configured for max_live_slots live slots
126 :
127 : A txn cache uses two backing regions:
128 :
129 : (1) Shared memory region, formatted by fd_txncache_shmem_new and
130 : joined as an fd_txncache_shmem_t * (see
131 : fd_txncache_shmem_{align,footprint}).
132 :
133 : (2) Local join region, formatted by fd_txncache_new and joined via
134 : fd_txncache_join.
135 :
136 : fd_txncache_new formats the caller-provided local join region ljoin
137 : and attaches it to the shared region shmem. Assumes ljoin points to
138 : the first byte of a memory region owned by the caller with suitable
139 : alignment and footprint. Assumes shmem is a valid joined txn cache
140 : shared-memory region (typically from fd_txncache_shmem_join) created
141 : with a matching max_live_slots. Returns ljoin on success and NULL
142 : on failure (logs details). The caller is not joined on return.
143 :
144 : fd_txncache_join joins the caller to a txn cache. Assumes ljoin
145 : points to the first byte of the local join region holding the state.
146 : Returns a local handle to the join on success (this is not
147 : necessarily a simple cast of the address) and NULL on failure (logs
148 : details). */
149 :
150 : FD_FN_CONST ulong
151 : fd_txncache_align( void );
152 :
153 : FD_FN_CONST ulong
154 : fd_txncache_footprint( ulong max_live_slots );
155 :
156 : void *
157 : fd_txncache_new( void * ljoin,
158 : fd_txncache_shmem_t * shmem );
159 :
160 : fd_txncache_t *
161 : fd_txncache_join( void * ljoin );
162 :
163 : void
164 : fd_txncache_reset( fd_txncache_t * tc );
165 :
166 : /* fd_txncache_attach_child notifies the txncache that a new child bank
167 : has been created, off some parent. This must be called before any
168 : transaction executed on this child fork is inserted. The parent fork
169 : id must be a fork ID previously returned from attach child, which
170 : must be still valid (the current root bank or a child of it).
171 :
172 : It is assumed that there are less than max_live_slots number of
173 : unrooted (or the active root) banks active prior to calling,
174 : otherwise it is an error.
175 :
176 : Attaching a child takes a write lock on the entire structure, which
177 : is an expensive stall for any other concurrent operations, but it is
178 : OK because the operation is otherwise cheap and it is called rarely
179 : (once every 400ms or so). */
180 :
181 : fd_txncache_fork_id_t
182 : fd_txncache_attach_child( fd_txncache_t * tc,
183 : fd_txncache_fork_id_t parent_fork_id );
184 :
185 : void
186 : fd_txncache_attach_blockhash( fd_txncache_t * tc,
187 : fd_txncache_fork_id_t fork_id,
188 : uchar const * blockhash );
189 :
190 : void
191 : fd_txncache_finalize_fork( fd_txncache_t * tc,
192 : fd_txncache_fork_id_t fork_id,
193 : ulong txnhash_offset,
194 : uchar const * blockhash );
195 :
196 : /* fd_txncache_advance_root is called when the root slot of the chain
197 : has advanced, in which case old message hashes (referencing
198 : blockhashes that could no longer be valid) can be removed from the
199 : cache.
200 :
201 : Advancing the root takes a write lock on the structure, which is an
202 : expensive stall for any other concurrent operations, but it is OK
203 : because the operation is otherwise cheap and it is called rarely
204 : (once every 400ms or so). */
205 :
206 : void
207 : fd_txncache_advance_root( fd_txncache_t * tc,
208 : fd_txncache_fork_id_t fork_id );
209 :
210 : /* fd_txncache_insert inserts a transaction hash into the txn cache on
211 : the provided fork. The insert copies data from the hashes and does
212 : not have any lifetime interest in the hashes memory provided once
213 : this call returns.
214 :
215 : Insertion cannot fail, as it is assume the caller is respecting the
216 : invariants of the structure. If there is no space internally to
217 : insert another transaction, the program is aborted with an error.
218 :
219 : This is a cheap, high performance, concurrent operation and can occur
220 : at the same time as queries and arbitrary other insertions. */
221 :
222 : void
223 : fd_txncache_insert( fd_txncache_t * tc,
224 : fd_txncache_fork_id_t fork_id,
225 : uchar const * blockhash,
226 : uchar const * txnhash );
227 :
228 : /* fd_txncache_query queries the txncache to determine if the given
229 : txnhash (referencing the given blockhash) exists on the provided fork
230 : or not.
231 :
232 : Returns 1 if the transaction exists on the provided fork and 0 if it
233 : does not.
234 :
235 : This is a cheap, high performance, concurrent operation and can occur
236 : at the same time as queries and arbitrary other insertions. */
237 :
238 : int
239 : fd_txncache_query( fd_txncache_t * tc,
240 : fd_txncache_fork_id_t fork_id,
241 : uchar const * blockhash,
242 : uchar const * txnhash );
243 :
244 : FD_PROTOTYPES_END
245 :
246 : #endif /* HEADER_fd_src_flamenco_runtime_fd_txncache_h */
|