Line data Source code
1 : #ifndef HEADER_fd_src_flamenco_runtime_fd_txncache_h
2 : #define HEADER_fd_src_flamenco_runtime_fd_txncache_h
3 :
4 : /* A txn cache is a concurrent set storing the message hashes of
5 : transactions which have already executed. Note the structure is
6 : keyed by message hash, not signature, otherwiwe a double spend might
7 : be possible due to signature malleability.
8 :
9 : The txn cache is designed to do two operations fast,
10 :
11 : (a) Insertion. Insertion is done by both the leader pipeline and
12 : the replay stage, with an insertion for every transaction that
13 : is executed, potentially over 1M per second.
14 :
15 : (b) Query. The leader pipeline queries the status of transactions
16 : before executing them to ensure that it does not execute
17 : anything twice. The replay stage queries to make sure that
18 : blocks do not contain duplicate transactions.
19 :
20 : Both of these operations are concurrent and lockless, assuming there
21 : are no other (non-insert/query) operations occuring on the txn cache.
22 : Most other operations lock the entire structure and will prevent both
23 : insertion and query from proceeding, but are rare (once per slot) so
24 : it's OK.
25 :
26 : The txn cache is somewhat CPU and memory sensitive. To store message
27 : hashes requires 20 bytes (only the first 20 of the 32 bytes of the
28 : hash are used, since this is sufficient to avoid collisions).
29 : Without any other overhead, and 31 unrooted blocks with 41,019
30 : transactions each (the current maximum), the txn cache would need to
31 : store
32 :
33 : 31 * 41,019 * 20 = 0.025 GB
34 :
35 : Of memory. But, we can't query, insert, and remove quickly from a
36 : flat array. We make a few trade-offs to achieve good performance
37 : without bloating memory completely. In particular the transactions
38 : to be queried are stored in a structure that looks like a
39 :
40 : multi_map<blockhash, hash_map<txnhash, vec<fork_idx>>>
41 :
42 : Note that there can be multiple duplicate blockhashes in the map, so
43 : it is a multimap. This is to handle the extremely rare degenerate
44 : case where a leader equivocates a block, but keeps the transactions
45 : the same and just fiddles with the shred data or padding a little
46 : bit. This keeps blockhash the same, but creates a new block ID
47 : (fork). We cannot stick both forks in the same blockhash entry
48 : because later, when one fork is rooted, we need to be able to purge
49 : all transactions in the non-rooted equivocation, otherwise we would
50 : violate our own invariants around the max active banks. (Rooting one
51 : equivocation should fully evict transactions belonging to the other
52 : equivocations, which would be impossible if they were under one
53 : blockhash).
54 :
55 : Both hash maps are chained hash maps, and for the txnhash map items
56 : come from a pool of pages of transactions. We use pages of
57 : transactions to support fast removal of a blockhash from the top
58 : level map.
59 :
60 : This adds additional memory overhead, a blockhash with only one
61 : transaction in it will still consume a full page (16,384) of
62 : transactions of memory. Allocating a new transaction page to a
63 : chained map is rare (once every 16,384 inserts) so the cost amortizes
64 : to zero. Creating a blockhash happens once per blockhash, so also
65 : amortizes to zero, the only operation we care about is then the
66 : simple insert case with an unfull transaction page into an existing
67 : blockhash. This can be done with two compare-and-swaps,
68 :
69 : // 1. Find the blockhash in the probed hash map.
70 :
71 : let by_blockhash = multi_map[ blockhash ];
72 :
73 : // 2. Request space for the transaction from the current page
74 :
75 : let page = by_blockhash.pages.back();
76 : let idx = page.used.compare_and_swap( current, current+1 );
77 :
78 : // 3. Write the transaction into the page and the map
79 :
80 : page.txns[ idx ] = txn;
81 : page.txns[ idx ].next = by_blockhash.txns[ txnhash ].idx;
82 : by_blockhash.txns[ txnhash ].head.compare_and_swap( current, idx );
83 :
84 : Step 1 is fast assuming equivocation is rare, since there will only
85 : ever be one matching blockhash in the multi map.
86 :
87 : Removal of a blockhash from this structure is simple because it does
88 : not need to be concurrent (it is once per slot). We take a write
89 : lock, restore the pages in the blockhash to the pool, and then mark
90 : the space in the hash_map as empty. Page restoration is a simple
91 : memcpy.
92 :
93 : All of this would be complicated by nonce transactions, which can
94 : reference any blockhash, and so a transaction could appear in the
95 : txn cache under any blockhash. But when dealing with these we make
96 : a simple observation: double spend is already cryptographically
97 : prevented by the nonce mechanism itself, so we do not need to store
98 : or check them. Agave does store them for RPC related reasons (they
99 : want to be able to query the status of nonce transactions that have
100 : already executed). It is therefore an error to insert any nonce
101 : transactions into the txn cache, and the caller is responsible for
102 : ensuring this does not happen.
103 :
104 : There is one minor complication with this, which is that Agave
105 : snapshots serve out the status cache with nonce transactions already
106 : in it. This structure assumes these will not get inserted, so it is
107 : up to the snapshot loading code to filter out all blockhashes which
108 : are nonce blockhashes (not in the recent blockhashes sysvar). */
109 :
110 : #include "fd_txncache_shmem.h"
111 :
112 0 : #define FD_TXNCACHE_ALIGN (128UL)
113 :
114 : #define FD_TXNCACHE_MAGIC (0xF17EDA2CE5CAC4E0) /* FIREDANCE SCACHE V0 */
115 :
116 : struct fd_txncache_private;
117 : typedef struct fd_txncache_private fd_txncache_t;
118 :
119 : FD_PROTOTYPES_BEGIN
120 :
121 : /* fd_txncache_{align,footprint} give the needed alignment and
122 : footprint of a memory region suitable to hold a txn cache.
123 : fd_txncache_{align,footprint} return the same value as
124 : FD_TXNCACHE_{ALIGN,FOOTPRINT}.
125 :
126 : fd_txncache_new formats memory region with suitable alignment and
127 : footprint suitable for holding a txn cache. Assumes shmem points
128 : on the caller to the first byte of the memory region owned by the
129 : caller to use. Returns shmem on success and NULL on failure (logs
130 : details). The memory region will be owned by the state on successful
131 : return. The caller is not joined on return.
132 :
133 : Max live slots is global value for the validator, indicating the
134 : maximum number of forks which can be active at any one time. It
135 : should be the same as "max active banks" in the replay system. This
136 : number counts any unrooted banks, plus one for the most recent root
137 : bank.
138 :
139 : Max transactions per slot should be some consensus derived upper
140 : bound on the number of transactions that can appear in a slot. This
141 : value must be valid else it is a security issue, where an attacker
142 : can cause the txn cache to run out of space and abort the program.
143 :
144 : fd_txncache_join joins the caller to a txn cache. Assumes shtc points
145 : to the first byte of the memory region holding the state. Returns a
146 : local handle to the join on success (this is not necessarily a simple
147 : cast of the address) and NULL on failure (logs details). */
148 :
149 : FD_FN_CONST ulong
150 : fd_txncache_align( void );
151 :
152 : FD_FN_CONST ulong
153 : fd_txncache_footprint( ulong max_live_slots );
154 :
155 : void *
156 : fd_txncache_new( void * ljoin,
157 : fd_txncache_shmem_t * shmem );
158 :
159 : fd_txncache_t *
160 : fd_txncache_join( void * ljoin );
161 :
162 : void
163 : fd_txncache_reset( fd_txncache_t * tc );
164 :
165 : /* fd_txncache_attach_child notifies the txncache that a new child bank
166 : has been created, off some parent. This must be called before any
167 : transaction executed on this child fork is inserted. The parent fork
168 : id must be a fork ID previously returned from attach child, which
169 : must be still valid (the current root bank or a child of it).
170 :
171 : It is assumed that there are less than max_live_slots number of
172 : unrooted (or the active root) banks active prior to calling,
173 : otherwise it is an error.
174 :
175 : Attaching a child takes a write lock on the entire structure, which
176 : is an expensive stall for any other concurrent operations, but it is
177 : OK because the operation is otherwise cheap and it is called rarely
178 : (once every 400ms or so). */
179 :
180 : fd_txncache_fork_id_t
181 : fd_txncache_attach_child( fd_txncache_t * tc,
182 : fd_txncache_fork_id_t parent_fork_id );
183 :
184 : void
185 : fd_txncache_attach_blockhash( fd_txncache_t * tc,
186 : fd_txncache_fork_id_t fork_id,
187 : uchar const * blockhash );
188 :
189 : void
190 : fd_txncache_finalize_fork( fd_txncache_t * tc,
191 : fd_txncache_fork_id_t fork_id,
192 : ulong txnhash_offset,
193 : uchar const * blockhash );
194 :
195 : /* fd_txncache_advance_root is called when the root slot of the chain
196 : has advanced, in which case old message hashes (referncing
197 : blockhashes that could no longer be valid) can be removed from the
198 : cache.
199 :
200 : Advancing the root takes a write lock on the structure, which is an
201 : expensive stall for any other concurrent operations, but it is OK
202 : because the operation is otherwise cheap and it is called rarely
203 : (once every 400ms or so). */
204 :
205 : void
206 : fd_txncache_advance_root( fd_txncache_t * tc,
207 : fd_txncache_fork_id_t fork_id );
208 :
209 : /* fd_txncache_insert inserts a transaction hash into the txn cache on
210 : the provided fork. The insert copies data from the hashes and does
211 : not have any lifetime interest in the hashes memory provided once
212 : this call returns.
213 :
214 : Insertion cannot fail, as it is assume the caller is respecting the
215 : invariants of the structure. If there is no space internally to
216 : insert another transaction, the program is aborted with an error.
217 :
218 : This is a cheap, high performance, concurrent operation and can occur
219 : at the same time as queries and arbitrary other insertions. */
220 :
221 : void
222 : fd_txncache_insert( fd_txncache_t * tc,
223 : fd_txncache_fork_id_t fork_id,
224 : uchar const * blockhash,
225 : uchar const * txnhash );
226 :
227 : /* fd_txncache_query queries the txncache to determine if the given
228 : txnhash (referencing the given blockhash) exists on the provided fork
229 : or not.
230 :
231 : Returns 1 if the transaction exists on the provided fork and 0 if it
232 : does not.
233 :
234 : This is a cheap, high performance, concurrent operation and can occur
235 : at the same time as queries and arbitrary other insertions. */
236 :
237 : int
238 : fd_txncache_query( fd_txncache_t * tc,
239 : fd_txncache_fork_id_t fork_id,
240 : uchar const * blockhash,
241 : uchar const * txnhash );
242 :
243 : FD_PROTOTYPES_END
244 :
245 : #endif /* HEADER_fd_src_flamenco_runtime_fd_txncache_h */
|