Line data Source code
1 : #ifndef HEADER_fd_src_flamenco_runtime_txncache_h
2 : #define HEADER_fd_src_flamenco_runtime_txncache_h
3 :
4 : #include "fd_txncache_shmem.h"
5 :
6 : /* A txn cache is a concurrent set storing the message hashes of
7 : transactions which have already executed. Note the structure is
8 : keyed by message hash, not signature, otherwiwe a double spend might
9 : be possible due to signature malleability.
10 :
11 : The txn cache is designed to do two operations fast,
12 :
13 : (a) Insertion. Insertion is done by both the leader pipeline and
14 : the replay stage, with an insertion for every transaction that
15 : is executed, potentially over 1M per second.
16 :
17 : (b) Query. The leader pipeline queries the status of transactions
18 : before executing them to ensure that it does not execute
19 : anything twice. The replay stage queries to make sure that
20 : blocks do not contain duplicate transactions.
21 :
22 : Both of these operations are concurrent and lockless, assuming there
23 : are no other (non-insert/query) operations occuring on the txn cache.
24 : Most other operations lock the entire structure and will prevent both
25 : insertion and query from proceeding, but are rare (once per slot) so
26 : it's OK.
27 :
28 : The txn cache is somewhat CPU and memory sensitive. To store message
29 : hashes requires 20 bytes (only the first 20 of the 32 bytes of the
30 : hash are used, since this is sufficient to avoid collisions).
31 : Without any other overhead, and 31 unrooted blocks with 41,019
32 : transactions each (the current maximum), the txn cache would need to
33 : store
34 :
35 : 31 * 41,019 * 20 = 0.025 GB
36 :
37 : Of memory. But, we can't query, insert, and remove quickly from a
38 : flat array. We make a few trade-offs to achieve good performance
39 : without bloating memory completely. In particular the transactions
40 : to be queried are stored in a structure that looks like a
41 :
42 : multi_map<blockhash, hash_map<txnhash, vec<fork_idx>>>
43 :
44 : Note that there can be multiple duplicate blockhashes in the map, so
45 : it is a multimap. This is to handle the extremely rare degenerate
46 : case where a leader equivocates a block, but keeps the transactions
47 : the same and just fiddles with the shred data or padding a little
48 : bit. This keeps blockhash the same, but creates a new block ID
49 : (fork). We cannot stick both forks in the same blockhash entry
50 : because later, when one fork is rooted, we need to be able to purge
51 : all transactions in the non-rooted equivocation, otherwise we would
52 : violate our own invariants around the max active banks. (Rooting one
53 : equivocation should fully evict transactions belonging to the other
54 : equivocations, which would be impossible if they were under one
55 : blockhash).
56 :
57 : Both hash maps are chained hash maps, and for the txnhash map items
58 : come from a pool of pages of transactions. We use pages of
59 : transactions to support fast removal of a blockhash from the top
60 : level map.
61 :
62 : This adds additional memory overhead, a blockhash with only one
63 : transaction in it will still consume a full page (16,384) of
64 : transactions of memory. Allocating a new transaction page to a
65 : chained map is rare (once every 16,384 inserts) so the cost amortizes
66 : to zero. Creating a blockhash happens once per blockhash, so also
67 : amortizes to zero, the only operation we care about is then the
68 : simple insert case with an unfull transaction page into an existing
69 : blockhash. This can be done with two compare-and-swaps,
70 :
71 : // 1. Find the blockhash in the probed hash map.
72 :
73 : let by_blockhash = multi_map[ blockhash ];
74 :
75 : // 2. Request space for the transaction from the current page
76 :
77 : let page = by_blockhash.pages.back();
78 : let idx = page.used.compare_and_swap( current, current+1 );
79 :
80 : // 3. Write the transaction into the page and the map
81 :
82 : page.txns[ idx ] = txn;
83 : page.txns[ idx ].next = by_blockhash.txns[ txnhash ].idx;
84 : by_blockhash.txns[ txnhash ].head.compare_and_swap( current, idx );
85 :
86 : Step 1 is fast assuming equivocation is rare, since there will only
87 : ever be one matching blockhash in the multi map.
88 :
89 : Removal of a blockhash from this structure is simple because it does
90 : not need to be concurrent (it is once per slot). We take a write
91 : lock, restore the pages in the blockhash to the pool, and then mark
92 : the space in the hash_map as empty. Page restoration is a simple
93 : memcpy.
94 :
95 : All of this would be complicated by nonce transactions, which can
96 : reference any blockhash, and so a transaction could appear in the
97 : txn cache under any blockhash. But when dealing with these we make
98 : a simple observation: double spend is already cryptographically
99 : prevented by the nonce mechanism itself, so we do not need to store
100 : or check them. Agave does store them for RPC related reasons (they
101 : want to be able to query the status of nonce transactions that have
102 : already executed). It is therefore an error to insert any nonce
103 : transactions into the txn cache, and the caller is responsible for
104 : ensuring this does not happen.
105 :
106 : There is one minor complication with this, which is that Agave
107 : snapshots serve out the status cache with nonce transactions already
108 : in it. This structure assumes these will not get inserted, so it is
109 : up to the snapshot loading code to filter out all blockhashes which
110 : are nonce blockhashes (not in the recent blockhashes sysvar). */
111 :
112 : #include "../../util/fd_util_base.h"
113 :
114 0 : #define FD_TXNCACHE_ALIGN (128UL)
115 :
116 : #define FD_TXNCACHE_MAGIC (0xF17EDA2CE5CAC4E0) /* FIREDANCE SCACHE V0 */
117 :
118 : struct fd_txncache_private;
119 : typedef struct fd_txncache_private fd_txncache_t;
120 :
121 : FD_PROTOTYPES_BEGIN
122 :
123 : /* fd_txncache_{align,footprint} give the needed alignment and
124 : footprint of a memory region suitable to hold a txn cache.
125 : fd_txncache_{align,footprint} return the same value as
126 : FD_TXNCACHE_{ALIGN,FOOTPRINT}.
127 :
128 : fd_txncache_new formats memory region with suitable alignment and
129 : footprint suitable for holding a txn cache. Assumes shmem points
130 : on the caller to the first byte of the memory region owned by the
131 : caller to use. Returns shmem on success and NULL on failure (logs
132 : details). The memory region will be owned by the state on successful
133 : return. The caller is not joined on return.
134 :
135 : Max live slots is global value for the validator, indicating the
136 : maximum number of forks which can be active at any one time. It
137 : should be the same as "max active banks" in the replay system. This
138 : number counts any unrooted banks, plus one for the most recent root
139 : bank.
140 :
141 : Max transactions per slot should be some consensus derived upper
142 : bound on the number of transactions that can appear in a slot. This
143 : value must be valid else it is a security issue, where an attacker
144 : can cause the txn cache to run out of space and abort the program.
145 :
146 : fd_txncache_join joins the caller to a txn cache. Assumes shtc points
147 : to the first byte of the memory region holding the state. Returns a
148 : local handle to the join on success (this is not necessarily a simple
149 : cast of the address) and NULL on failure (logs details). */
150 :
151 : FD_FN_CONST ulong
152 : fd_txncache_align( void );
153 :
154 : FD_FN_CONST ulong
155 : fd_txncache_footprint( ulong max_live_slots );
156 :
157 : void *
158 : fd_txncache_new( void * ljoin,
159 : fd_txncache_shmem_t * shmem );
160 :
161 : fd_txncache_t *
162 : fd_txncache_join( void * ljoin );
163 :
164 : void
165 : fd_txncache_reset( fd_txncache_t * tc );
166 :
167 : /* fd_txncache_attach_child notifies the txncache that a new child bank
168 : has been created, off some parent. This must be called before any
169 : transaction executed on this child fork is inserted. The parent fork
170 : id must be a fork ID previously returned from attach child, which
171 : must be still valid (the current root bank or a child of it).
172 :
173 : It is assumed that there are less than max_live_slots number of
174 : unrooted (or the active root) banks active prior to calling,
175 : otherwise it is an error.
176 :
177 : Attaching a child takes a write lock on the entire structure, which
178 : is an expensive stall for any other concurrent operations, but it is
179 : OK because the operation is otherwise cheap and it is called rarely
180 : (once every 400ms or so). */
181 :
182 : fd_txncache_fork_id_t
183 : fd_txncache_attach_child( fd_txncache_t * tc,
184 : fd_txncache_fork_id_t parent_fork_id );
185 :
186 : void
187 : fd_txncache_attach_blockhash( fd_txncache_t * tc,
188 : fd_txncache_fork_id_t fork_id,
189 : uchar const * blockhash );
190 :
191 : void
192 : fd_txncache_finalize_fork( fd_txncache_t * tc,
193 : fd_txncache_fork_id_t fork_id,
194 : ulong txnhash_offset,
195 : uchar const * blockhash );
196 :
197 : /* fd_txncache_advance_root is called when the root slot of the chain
198 : has advanced, in which case old message hashes (referncing
199 : blockhashes that could no longer be valid) can be removed from the
200 : cache.
201 :
202 : Advancing the root takes a write lock on the structure, which is an
203 : expensive stall for any other concurrent operations, but it is OK
204 : because the operation is otherwise cheap and it is called rarely
205 : (once every 400ms or so). */
206 :
207 : void
208 : fd_txncache_advance_root( fd_txncache_t * tc,
209 : fd_txncache_fork_id_t fork_id );
210 :
211 : /* fd_txncache_insert inserts a transaction hash into the txn cache on
212 : the provided fork. The insert copies data from the hashes and does
213 : not have any lifetime interest in the hashes memory provided once
214 : this call returns.
215 :
216 : Insertion cannot fail, as it is assume the caller is respecting the
217 : invariants of the structure. If there is no space internally to
218 : insert another transaction, the program is aborted with an error.
219 :
220 : This is a cheap, high performance, concurrent operation and can occur
221 : at the same time as queries and arbitrary other insertions. */
222 :
223 : void
224 : fd_txncache_insert( fd_txncache_t * tc,
225 : fd_txncache_fork_id_t fork_id,
226 : uchar const * blockhash,
227 : uchar const * txnhash );
228 :
229 : /* fd_txncache_query queries the txncache to determine if the given
230 : txnhash (referencing the given blockhash) exists on the provided fork
231 : or not.
232 :
233 : Returns 1 if the transaction exists on the provided fork and 0 if it
234 : does not.
235 :
236 : This is a cheap, high performance, concurrent operation and can occur
237 : at the same time as queries and arbitrary other insertions. */
238 :
239 : int
240 : fd_txncache_query( fd_txncache_t * tc,
241 : fd_txncache_fork_id_t fork_id,
242 : uchar const * blockhash,
243 : uchar const * txnhash );
244 :
245 : FD_PROTOTYPES_END
246 :
247 : #endif /* HEADER_fd_src_flamenco_runtime_txncache_h */
|