#ifndef HEADER_fd_src_choreo_tower_fd_tower_h
#define HEADER_fd_src_choreo_tower_fd_tower_h

/* fd_tower presents an API for Solana's TowerBFT algorithm.

   What is TowerBFT? TowerBFT is an algorithm for converging a
   supermajority of stake in the validator cluster on the same fork.

        /-- 3-- 4 (A)
   1-- 2
        \-- 5 (B)

   Above is a diagram of a fork. The leader for slot 5 decided to build
   off slot 2, rather than slot 4. This can happen for various reasons,
   for example network propagation delay. We now have two possible forks
   labeled A and B. The consensus algorithm has to pick one of them.

   So, how does the consensus algorithm pick? As detailed in fd_ghost.h,
   we pick the fork based on the most stake from votes, called the
   "heaviest". Validators vote for blocks during replay, and
   simultaneously use other validators' votes to determine which block
   to vote for. This encourages convergence, because as one fork gathers
   more votes, more and more votes pile on, solidifying its position as
   the heaviest fork.

        /-- 3-- 4 (10%)
   1-- 2
        \-- 5 (9%)

   However, network propagation delay of votes can lead us to think one
   fork is heaviest, before observing new votes that indicate another
   fork is heavier. So our consensus algorithm also needs to support
   switching.

        /-- 3-- 4 (10%)
   1-- 2
        \-- 5 (15%)

   At the same time we don't want excessive switching. The more often
   validators switch, the more difficult it will be to achieve the
   pile-on effect I just described.

   Note that to switch forks, you need to rollback a given slot and its
   descendants on that fork. In the example above, to switch to 1, 2, 5,
   we need to rollback 3 and 4. The consensus algorithm makes it more
   costly the further you want to rollback a fork. This cost is a
   quantity called lockout (detailed below), which doubles for every
   additional slot you want to rollback.

   Eventually you have traversed far enough down a fork that the
   lockout is so great it is infeasible to imagine it ever rolling back
   in practice. So you can make that fork permanent or "commit" it. Once
   all validators do this, the blockchain now just has a single fork.

   Armed with some intuition, let's now begin defining some terminology.
   Here is a diagram of a validator's "vote tower":

   slot | confirmation count (conf)
   --------------------------------
      4 | 1
      3 | 2
      2 | 3
      1 | 4

   It is a stack structure in which each element is a vote. The vote
   slot column indicates which slots the validator has voted for,
   ordered from most to least recent.

   The confirmation count column indicates how many consecutive votes on
   the same fork have been pushed on top of that vote. You are
   confirming your own votes for a fork every time you vote on top of
   the same fork.

   Two related concepts to confirmation count are lockout and expiration
   slot. Lockout equals 2 to the power of confirmation count. Every time
   we "confirm" a vote by voting on top of it, we double the lockout.
   The expiration slot is the sum of the vote slot and the lockout, so
   it also increases when lockouts double. It represents the slot at
   which the vote will expire. When a vote expires, it is popped from
   the top of the tower. An important Tower rule is that a validator
   cannot vote for a different fork from a given vote slot, until
   reaching the expiration slot for that vote slot. To summarize, the
   further a validator wants to rollback their fork (or vote slots) the
   longer the validator needs to wait without voting (in slot time).

   Here is the same tower, fully expanded to include all the fields:

   slot | conf | lockout | expiration
   ----------------------------------
      4 |    1 |       2 |          6
      3 |    2 |       4 |          7
      2 |    3 |       8 |         10
      1 |    4 |      16 |         17

   Based on this tower, the validator is locked out from voting for any
   slot <= 6 that is on a different fork than slot 4. I'd like to
   emphasize that the expiration is with respect to the vote slot, and
   is _not_ related to the Proof-of-History slot or what the "current"
   slot is. So even if the current slot is now 7, the validator can't go
   back and vote for slot 5, if slot 5 were on a different fork than 4.
   The earliest valid vote slot this validator could submit for a
   different fork from 4 would be slot 7 or later.
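
   To make the arithmetic concrete, here is a minimal sketch of the
   lockout and expiration rules (illustrative only; hypothetical helper
   names, not part of the fd_tower API):

   ```c
   #include <assert.h>

   // One tower vote: the voted slot and its confirmation count.
   typedef struct { unsigned long slot; unsigned long conf; } vote_t;

   // Lockout doubles with every confirmation: 2^conf.
   static unsigned long lockout( vote_t v ) { return 1UL << v.conf; }

   // A vote expires at vote slot + lockout.
   static unsigned long expiration( vote_t v ) { return v.slot + lockout( v ); }

   // A vote for slot s on a different fork is allowed only once s is
   // strictly greater than the expiration slot of the conflicting vote.
   static int expired_by( vote_t v, unsigned long s ) { return s > expiration( v ); }
   ```

   For the fully-expanded tower above, expiration( {4,1} ) is 6,
   expiration( {2,3} ) is 10, and so on.
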

   Next let's look at how the tower makes state transitions. Here we
   have the previous example tower, with a before-and-after view with
   respect to a vote for slot 9:

   (before) slot | conf
            -----------
               4 | 1
               3 | 2
               2 | 3
               1 | 4

   (after)  slot | conf
            -----------
               9 | 1
               2 | 3
               1 | 4

   As you can see, we added a vote for slot 9 to the top of the tower.
   But we also removed the votes for slot 4 and slot 3. What happened?
   This is an example of vote expiry in action. When we voted for slot
   9, this exceeded the expirations of vote slots 4 and 3, which were 6
   and 7 respectively. This action of voting triggered the popping of
   the expired votes from the top of the tower.

   Next, we add a vote for slot 10:

   (before) slot | conf
            -----------
               9 | 1
               2 | 3
               1 | 4

   (after)  slot | conf
            -----------
              10 | 1
               9 | 2
               2 | 3
               1 | 4

   The next vote for slot 10 doesn't involve expirations, so we just add
   it to the top of the tower. This also demonstrates an important
   property of lockouts: the lockout for vote slot 9 doubled (ie. the
   confirmation count increased by 1) but the lockouts of vote slots 2
   and 1 remained unchanged.

   The reason for this is confirmation counts only increase when they
   are consecutive in the vote tower. Because 4 and 3 were expired
   previously by the vote for 9, that consecutive property was broken.
   In this case, the vote for slot 10 is only consecutive with slot 9,
   but not 2 and 1. Specifically, there is a gap in the before-tower at
   confirmation count 2.

   In the after-tower, all the votes are again consecutive (confirmation
   counts 1, 2, 3, 4 are all accounted for), so the next vote will
   result in all lockouts doubling as long as it doesn't result in more
   expirations.

   One other thing to point out about this vote for slot 10: even though
   10 >= the expiration slot of vote slot 2, which is 10, voting for 10
   did not expire the vote for 2. This is because expiration happens
   top-down and contiguously. Because vote slot 9 was not expired, we do
   not proceed with expiring 2.

   In the Tower rules, once a vote reaches a conf count of 32, it is
   considered rooted and it is popped from the bottom of the tower. Here
   is an example where 1 got rooted and therefore popped from the
   bottom:

   (before) slot | conf
            -----------
              50 | 1
             ... | ... (29 votes elided)
               1 | 31

   (after)  slot | conf
            -----------
              53 | 1
             ... | ... (29 votes elided)
               2 | 31

   Note that the vote slots are not necessarily consecutive. Here I
   elided the votes sandwiched between the newest and oldest votes for
   brevity.

   So the tower is really a double-ended queue rather than a stack.

   Rooting has implications beyond the Tower. It's what we use to prune
   our state. Every time tower makes a new root slot, we prune any old
   state that does not originate from that new root slot. Our blockstore
   will discard blocks below that root, our forks structure will discard
   stale banks, funk (which is our accounts database) will discard stale
   transactions (which in turn track modifications to accounts), and
   ghost (which is our fork select tree) will discard stale nodes
   tracking stake percentages. We call this operation publishing.
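
   The state transitions above (expiry, lockout doubling, and rooting)
   can be sketched in a few lines of C (illustrative only; hypothetical
   names, not the fd_tower API, which stores the votes in an fd_deque):

   ```c
   #include <assert.h>
   #include <limits.h>

   #define TOWER_MAX 32UL

   typedef struct { unsigned long slot; unsigned long conf; } vote_t;
   typedef struct { vote_t vote[ TOWER_MAX ]; unsigned long cnt; } tower_t; // bottom at index 0

   static unsigned long expiration( vote_t v ) { return v.slot + ( 1UL << v.conf ); }

   // Vote for `slot`: pop expired votes top-down, push the new vote
   // with conf 1, then increment the conf of every vote whose run of
   // votes above it is consecutive.  Returns the new root slot if the
   // bottom vote reaches max lockout (conf 32), else ULONG_MAX.
   static unsigned long
   tower_apply_vote( tower_t * t, unsigned long slot ) {
     while( t->cnt && slot > expiration( t->vote[ t->cnt-1UL ] ) ) t->cnt--;
     t->vote[ t->cnt++ ] = (vote_t){ slot, 1UL };
     for( unsigned long i=0UL; i<t->cnt; i++ )
       if( t->cnt - i > t->vote[ i ].conf ) t->vote[ i ].conf++;
     if( t->vote[ 0 ].conf==32UL ) { // rooted: pop from the bottom
       unsigned long root = t->vote[ 0 ].slot;
       for( unsigned long i=1UL; i<t->cnt; i++ ) t->vote[ i-1UL ] = t->vote[ i ];
       t->cnt--;
       return root;
     }
     return ULONG_MAX;
   }
   ```

   Replaying the examples above: applying 9 to the tower [1|4, 2|3,
   3|2, 4|1] expires 4 and 3, and a subsequent vote for 10 doubles only
   9's lockout.
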

   Next, let's go over three additional tower checks. These three checks
   further reinforce the consensus algorithm we established with
   intuition, in this case getting a supermajority (ie. 2/3) of stake to
   converge on a fork.

   The first is the threshold check. The threshold check makes sure at
   least 2/3 of stake has voted for the same fork as the vote at depth 8
   in our tower. Essentially, this guards our tower from getting too out
   of sync with the rest of the cluster. If we get too out of sync we
   can't vote for a long time, because we had to rollback a vote we had
   already confirmed many times and had a large lockout. This might
   otherwise happen as the result of a network partition where we can
   only communicate with a subset of stake.

   Next is the lockout check. We went into detail on this earlier when
   going through the lockout and expiration slot, and as before, the
   rule is we can only vote on a slot for a different fork from a
   previous vote, after that vote's expiration slot.

   Given this fork and tower from earlier:

        /-- 3-- 4
   1-- 2
        \-- 5

   slot | conf
   -----------
      4 | 1
      3 | 2
      2 | 3
      1 | 4

   You're locked out from voting for 5 because it's on a different fork
   from 4 and the expiration slot of your previous vote for 4 is 6.

   However, if we introduce a new slot 9:

        /-- 3-- 4
   1-- 2
        \-- 5-- 9

   slot | conf
   -----------
      9 | 1
      2 | 3
      1 | 4

   Here the new slot 9 descends from 5. Unlike 5, slot 9 exceeds vote
   slot 4's expiration slot of 6.

   After your lockout expires, the tower rules allow you to vote for
   descendants of the fork slot you wanted to switch to in the first
   place (here, 9 descending from 5). So we eventually switch to the
   fork we wanted, by voting for 9 and expiring 3 and 4.

   Importantly, notice that the fork slots and vote slots are not
   exactly 1-to-1. While conceptually our tower is voting for the fork
   1, 2, 5, 9, the vote for 5 is only implied. Our tower votes
   themselves still can't include 5 due to lockout.

   Finally, the switch check. The switch check is used to safeguard
   optimistic confirmation. Optimistic confirmation is when a slot gets
   2/3 of stake-weighted votes. This is then treated as a signal that
   the slot will eventually get rooted. However, to actually guarantee
   this we need a rule that prevents validators from arbitrarily
   switching forks (even when their vote lockout has expired). This rule
   is the switch check.

   The switch check is additional to the lockout check. Before switching
   forks, we need to make sure at least 38% of stake has voted for a
   different fork than our own. "Different fork" is defined by finding
   the greatest common ancestor of our last voted fork slot and the slot
   we want to switch to. Any forks descending from the greatest common
   ancestor (which I will subsequently call the GCA) that are not our
   own fork are counted towards the switch check stake.

   Here we visualize the switch check:

             /-- 7
        /-- 3-- 4
   1-- 2 -- 6
        \-- 5-- 9

   First, we find the GCA of 4 and 9, which is 2. Then we look at all
   the descendants of the GCA that do not share a fork with us, and make
   sure their stake sums to more than 38%.

   I'd like to highlight that 7 here is not counted towards the switch
   proof, even though it is on a different fork from 4. This is because
   it's on the same fork as 4 relative to the GCA.
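
   The GCA computation and switch stake sum can be sketched with a toy
   parent-array fork tree (illustrative only; made-up stake numbers and
   hypothetical names, not the fd_ghost API):

   ```c
   #include <assert.h>

   // Fork tree from the diagram above: (slot, parent, stake), where
   // stake is the fraction of stake whose latest vote is that slot.
   typedef struct { unsigned long slot, parent; double stake; } node_t;

   #define NODE_CNT 8UL
   static node_t const tree[ NODE_CNT ] = {
     { 1UL, 0UL, 0.00 }, { 2UL, 1UL, 0.00 }, { 3UL, 2UL, 0.02 },
     { 4UL, 3UL, 0.10 }, { 7UL, 3UL, 0.05 }, { 6UL, 2UL, 0.20 },
     { 5UL, 2UL, 0.04 }, { 9UL, 5UL, 0.15 } };

   static unsigned long parent_of( unsigned long s ) {
     for( unsigned long i=0UL; i<NODE_CNT; i++ ) if( tree[ i ].slot==s ) return tree[ i ].parent;
     return 0UL;
   }

   static int is_ancestor( unsigned long a, unsigned long s ) { // a==s counts
     while( s ) { if( s==a ) return 1; s = parent_of( s ); }
     return 0;
   }

   static unsigned long gca( unsigned long a, unsigned long b ) {
     while( !is_ancestor( a, b ) ) a = parent_of( a );
     return a;
   }

   // Sum the stake on descendants of the GCA, excluding the GCA child
   // subtree that contains our last vote slot (ie. our own fork).
   static double switch_stake( unsigned long vote_slot, unsigned long switch_slot ) {
     unsigned long g = gca( vote_slot, switch_slot );
     unsigned long c = vote_slot;                  // GCA child on our fork
     while( parent_of( c )!=g ) c = parent_of( c );
     double sum = 0.0;
     for( unsigned long i=0UL; i<NODE_CNT; i++ ) {
       unsigned long s = tree[ i ].slot;
       if( s!=g && is_ancestor( g, s ) && !is_ancestor( c, s ) ) sum += tree[ i ].stake;
     }
     return sum;
   }
   ```

   With these (made-up) stakes, switch_stake( 4, 9 ) counts 6, 5 and 9
   (0.39 total) and excludes 3, 4 and 7, so the 38% switch threshold is
   met.
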

   So that covers the checks. Next, there are two additional important
   concepts: "reset slot" and "vote slot". The reset slot is the slot
   you reset PoH to when it's your turn to be leader. Because you are
   responsible for producing a block, you need to decide which fork to
   build your block on. For example, if there are two competing slots 3
   and 4, you would decide whether to build 3 <- 5 or 4 <- 5. In general
   the reset slot is on the same fork as the vote slot, but not always.
   There is an important reason for this. Recall this fork graph from
   earlier:

        /-- 3-- 4 (10%)
   1-- 2
        \-- 5-- 6 (9%)

   In this diagram, 4 is the winner of fork choice. All future leaders
   now want to reset to slot 4. Naively, this makes sense because you
   maximize the chance of your block finalizing (and earning the
   rewards) if you greedily (in the algorithmic, and perhaps also
   literal sense) pick what's currently the heaviest.

   However, say most validators actually voted for fork 5, even though
   we currently observe fork 4 as the heavier. For whatever reason,
   those votes for 5 just didn't land (the leader for 6 missed the
   votes, network blip, etc.)

   All these validators that voted for 5 are now constrained by the
   switch check (38% of stake), and none of them can actually switch
   their vote to 4 (which only has 10%). But they're all continuing to
   build blocks on top of fork 4, which importantly implies that votes
   for 5 will not be able to propagate. This is because the validators
   that can't switch continue to refresh their votes for 5, but those
   votes never "land" because no one is building blocks on top of fork
   5 anymore (everyone is building on 4 because that's currently the
   heaviest).

   Therefore, it is important to reset to the same fork as your last
   vote slot, which is usually also the heaviest fork, but not always.

   Note that with both the vote slot and reset slot, the tower uses
   ghost to determine the last vote slot's ancestry. So what happens if
   the last vote slot isn't in the ghost? There are two separate cases
   in which this can happen that tower needs to handle:

   1. Our last vote slot > ghost root slot, but is not a descendant of
      the ghost root. This can happen if we get stuck on a minority fork
      with a long lockout. In the worst case, lockout duration is
      2^{threshold_check_depth} ie. 2^8 = 256 slots. In other words, we
      voted for and confirmed a minority fork 8 times in a row. We
      assume we won't vote past 8 times for the minority fork, because
      the threshold check would have stopped us (recall the threshold
      check requires 2/3 of stake to be on the same fork at depth 8
      before we can keep voting for that fork).

      While waiting for those 256 slots of lockout to expire, it is
      possible that in the meantime a supermajority (ie. >2/3) of the
      cluster actually roots another fork that is not ours. During
      regular execution, we would not publish ghost until we have an
      updated tower root. So as long as the validator stays running
      while it is locked out from the supermajority fork, it keeps track
      of its vote slot's ancestry.

      If the validator were to stop running while locked out though (eg.
      the operator needed to restart the box), the validator attempts to
      repair the ancestry of its last vote slot.

      In the worst case, if we cannot repair that ancestry, then we do
      not vote until replay reaches the expiration slot of that last
      vote slot. We can assume the votes > depth 8 in the tower do not
      violate lockout, because again the threshold check would have
      guarded it.

      TODO CURRENTLY THIS IS UNHANDLED. WHAT THE VALIDATOR DOES IF IT
      HAS LOST THE GHOST ANCESTRY IS IT WILL ERROR OUT.

   2. Our last vote slot < ghost root slot. In this case we simply
      cannot determine whether our last vote slot is on the same fork as
      our ghost root slot because we no longer have ancestry information
      before the ghost root slot. This can happen if the validator is
      not running for a long time, then started up again. It will have
      to use the snapshot slot for the beginning of the ghost ancestry,
      which could be well past the last vote slot in the tower.

      In this case, before the validator votes again, it makes sure that
      the last vote's confirmation count >= THRESHOLD_CHECK_DEPTH
      (stated differently, it makes sure the next time it votes it will
      expire at least the first THRESHOLD_CHECK_DEPTH votes in the
      tower), and then it assumes that the last vote slot is on the same
      fork as the ghost root slot.

      TODO VERIFY AGAVE BEHAVIOR IS THE SAME.

   Now let's switch gears from theory back to practice. What does it
   mean to send a vote?

   As a validator, you aren't sending individual tower votes. Rather,
   you are sending your entire updated tower to the cluster every time.
   Essentially, the validator is continuously syncing its local tower
   with the cluster. That tower state is then stored inside a vote
   account, like any other state on Solana.

   On the flip side, we also must stay in sync the other way, from
   cluster to local. If we have previously voted, we need to make sure
   our tower matches up with what the cluster has last seen. We know the
   most recent tower is in the last vote we sent, so we durably store
   every tower (by checkpointing it to disk) whenever we send a vote. In
   case this checkpointed tower is out-of-date, Funk (our accounts
   database) conveniently stores all the vote accounts including our
   own, so on bootstrap we simply load in our own vote account state to
   initialize our local view of the tower.

   Finally, a note on the difference between the Vote Program and
   TowerBFT. The Vote Program runs during transaction (block) execution.
   It checks that certain invariants about the tower inside a vote
   transaction are upheld (recall a validator sends their entire tower
   as part of a "vote"): otherwise, it fails the transaction. For
   example, it checks that every vote contains a tower in which the vote
   slots are strictly monotonically increasing. As a consequence, only
   valid towers are committed to the ledger. Another important detail of
   the Vote Program is that it only has a view of the current fork on
   which it is executing. Specifically, it can't observe what state is
   on other forks, like what a validator's tower looks like on fork A
   vs. fork B.

   The TowerBFT rules, on the other hand, run after transaction
   execution. Also unlike the Vote Program, the TowerBFT rules do not
   take the vote transactions as inputs: rather the inputs are the
   towers that have already been written to the ledger by the Vote
   Program. As described above, the Vote Program validates every tower,
   and in this way, the TowerBFT rules leverage the validation already
   done by the Vote Program to (mostly) assume each tower is valid.
   Every validator runs TowerBFT to update their own tower with votes
   based on the algorithm documented above. Importantly, TowerBFT has a
   view of all forks, and the validator makes a voting decision based on
   all forks. */

#include "../fd_choreo_base.h"
#include "../epoch/fd_epoch.h"
#include "../ghost/fd_ghost.h"
#include "../../disco/pack/fd_microblock.h"

/* FD_TOWER_PARANOID: Define this to non-zero at compile time
   to turn on additional runtime integrity checks. */

#ifndef FD_TOWER_PARANOID
#define FD_TOWER_PARANOID 1
#endif

#define FD_TOWER_VOTE_MAX (31UL)

struct fd_tower_vote {
  ulong slot; /* vote slot */
  ulong conf; /* confirmation count */
};
typedef struct fd_tower_vote fd_tower_vote_t;

#define DEQUE_NAME fd_tower_votes
#define DEQUE_T    fd_tower_vote_t
#define DEQUE_MAX  FD_TOWER_VOTE_MAX
#include "../../util/tmpl/fd_deque.c"
446 :
447 : /* fd_tower is a representation of a validator's "vote tower" (described
448 : in detail in the preamble at the top of this file). The votes in the
449 : tower are stored in an fd_deque ordered from lowest to highest vote
450 : slot (highest to lowest confirmation count) relative to the head and
451 : tail . There can be at most 31 votes in the tower. This invariant
452 : is upheld with every call to `fd_tower_vote`.
453 :
454 : The definition of `fd_tower_t` is a simple typedef alias for
455 : `fd_tower_vote_t` and is a transparent wrapper around the vote deque.
456 : Relatedly, the tower API takes a local pointer to the first vote in
457 : the deque (the result of `fd_deque_join`) as a parameter in all its
458 : function signatures. */
459 :
460 : typedef fd_tower_vote_t fd_tower_t;
461 :
462 : /* fd_tower_sync_serde describes a serialization / deserialization
463 : schema for a bincode-encoded TowerSync transaction. The serde is
464 : structured for zero-copy access ie. x-raying individual fields. */
465 :
466 : struct fd_tower_sync_serde /* CompactTowerSync */ {
467 : ulong const * root;
468 : struct /* short_vec */ {
469 : ushort lockouts_cnt; /* variable-length so copied (ShortU16) */
470 : struct /* Lockout */ {
471 : ulong offset; /* variable-length so copied (VarInt) */
472 : uchar const * confirmation_count;
473 : } lockouts[31];
474 : };
475 : fd_hash_t const * hash;
476 : struct /* Option<UnixTimestamp> */ {
477 : uchar const * timestamp_option;
478 : long const * timestamp;
479 : };
480 : fd_hash_t const * block_id;
481 : };
482 : typedef struct fd_tower_sync_serde fd_tower_sync_serde_t;
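
/* For reference, the ShortU16 length prefix noted above is Solana's
   short_vec encoding: a little-endian base-128 varint of at most three
   bytes. A sketch of the encoder (illustrative only; hypothetical
   helper, not part of this API):

   ```c
   #include <assert.h>

   // Encode a u16 as a ShortU16 (compact-u16): 7 bits per byte,
   // little-endian, high bit set on all but the final byte.
   static unsigned long
   short_u16_encode( unsigned short v, unsigned char out[ 3 ] ) {
     unsigned long sz = 0UL;
     for(;;) {
       unsigned char b = (unsigned char)( v & 0x7f );
       v = (unsigned short)( v >> 7 );
       if( !v ) { out[ sz++ ] = b; return sz; }
       out[ sz++ ] = (unsigned char)( b | 0x80 );
     }
   }
   ```

   For example, 127 encodes to one byte (0x7f) and 128 encodes to two
   bytes (0x80 0x01). */
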

/* fd_tower_serde describes a serialization / deserialization schema for
   an Agave-compatible bincode encoding of a tower. This corresponds
   exactly with the binary layout of a tower file that Agave interfaces
   with during boot, set-identity, and voting.

   The serde is structured for zero-copy access ie. x-raying individual
   fields. */

struct fd_tower_serde /* SavedTowerVersions::Current */ {
  uint const *             kind;
  fd_ed25519_sig_t const * signature;
  ulong const *            data_sz; /* serialized sz of data field below */
  struct /* Tower1_14_11 */ {
    fd_pubkey_t const * node_pubkey;
    ulong const *       threshold_depth;
    double const *      threshold_size;
    fd_voter_v2_serde_t vote_state;
    struct {
      uint const *          last_vote_kind;
      fd_tower_sync_serde_t last_vote;
    };
    struct /* BlockTimestamp */ {
      ulong const * slot;
      long const *  timestamp;
    } last_timestamp;
  } /* data */;
};
typedef struct fd_tower_serde fd_tower_serde_t;

/* fd_tower_sign_fn is the signing callback used for signing tower
   checkpoints after serialization. */

typedef void (fd_tower_sign_fn)( void const * ctx, uchar * sig, uchar const * ser, ulong ser_sz );

/* FD_TOWER_{ALIGN,FOOTPRINT} specify the alignment and footprint needed
   for a tower. ALIGN is double the x86 cache line to mitigate various
   kinds of false sharing (eg. ACLPF adjacent cache line prefetch).
   FOOTPRINT is the size of the fd_deque including the private header's
   start and end, and is an exact multiple of ALIGN. These are provided
   to facilitate compile time tower declarations. */

#define FD_TOWER_ALIGN     (128UL)
#define FD_TOWER_FOOTPRINT (512UL)
FD_STATIC_ASSERT( FD_TOWER_FOOTPRINT==sizeof(fd_tower_votes_private_t), FD_TOWER_FOOTPRINT );

/* fd_tower_{align,footprint} return the required alignment and
   footprint of a memory region suitable for use as a tower. align
   returns FD_TOWER_ALIGN. footprint returns FD_TOWER_FOOTPRINT. */

FD_FN_CONST static inline ulong
fd_tower_align( void ) {
  return FD_TOWER_ALIGN;
}

FD_FN_CONST static inline ulong
fd_tower_footprint( void ) {
  return FD_TOWER_FOOTPRINT;
}
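
/* For example, a compile-time tower memory declaration using these
   constants might look like the following (illustrative only; fd code
   would typically use its own alignment attribute macros rather than
   the raw GCC attribute, and the mirrored constants here are
   stand-ins):

   ```c
   #include <assert.h>
   #include <stdint.h>

   #define TOWER_ALIGN     (128UL) // mirrors FD_TOWER_ALIGN
   #define TOWER_FOOTPRINT (512UL) // mirrors FD_TOWER_FOOTPRINT

   // Static backing memory for one tower, 128-byte aligned.
   static unsigned char tower_mem[ TOWER_FOOTPRINT ]
     __attribute__((aligned(128)));
   ```
*/
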

/* fd_tower_new formats an unused memory region for use as a tower. mem
   is a non-NULL pointer to this region in the local address space with
   the required footprint and alignment. */

void *
fd_tower_new( void * mem );

/* fd_tower_join joins the caller to the tower. tower points to the
   first byte of the memory region backing the tower in the caller's
   address space.

   Returns a pointer in the local address space to tower on success. */

fd_tower_t *
fd_tower_join( void * tower );

/* fd_tower_leave leaves a current local join. Returns a pointer to the
   underlying shared memory region on success and NULL on failure (logs
   details). Reasons for failure include tower is NULL. */

void *
fd_tower_leave( fd_tower_t * tower );

/* fd_tower_delete unformats a memory region used as a tower. Assumes
   only the local process is joined to the region. Returns a pointer to
   the underlying shared memory region or NULL if used obviously in
   error (eg. tower is obviously not a tower ... logs details). The
   ownership of the memory region is transferred to the caller. */

void *
fd_tower_delete( void * tower );

/* fd_tower_lockout_check checks if we are locked out from voting for
   `slot`. Returns 1 if we can vote for `slot` without violating
   lockout, 0 otherwise. Assumes tower is non-empty.

   After voting for a slot n, we are locked out for 2^k slots, where k
   is the confirmation count of that vote. Once locked out, we cannot
   vote for a different fork until that previously-voted fork expires at
   slot n+2^k. This implies the earliest slot in which we can switch
   from the previously-voted fork is (n+2^k)+1. We use `ghost` to
   determine whether `slot` is on the same or different fork as previous
   vote slots.

   In the case of the tower, every vote has its own expiration slot
   depending on confirmations. The confirmation count is the max number
   of consecutive votes that have been pushed on top of the vote, and
   not necessarily its current height in the tower.

   For example, the following is a diagram of a tower pushing and
   popping with each vote:

   slot | confirmation count
   -----|-------------------
      4 | 1 <- vote
      3 | 2
      2 | 3
      1 | 4

   slot | confirmation count
   -----|-------------------
      9 | 1 <- vote
      2 | 3
      1 | 4

   slot | confirmation count
   -----|-------------------
     10 | 1 <- vote
      9 | 2
      2 | 3
      1 | 4

   slot | confirmation count
   -----|-------------------
     11 | 1 <- vote
     10 | 2
      9 | 3
      2 | 4
      1 | 5

   slot | confirmation count
   -----|-------------------
     18 | 1 <- vote
      2 | 4
      1 | 5

   In the final tower, note the gap in confirmation counts between slot
   18 and slot 2, even though slot 18 is directly above slot 2. */

int
fd_tower_lockout_check( fd_tower_t const * tower,
                        fd_ghost_t const * ghost,
                        ulong              slot,
                        fd_hash_t const *  block_id );

/* fd_tower_switch_check checks if we can switch to the fork of `slot`.
   Returns 1 if we can switch, 0 otherwise. Assumes tower is non-empty.

   There are two forks of interest: our last vote fork ("vote fork") and
   the fork we want to switch to ("switch fork"). The switch fork is the
   fork of `slot`.

   In order to switch, FD_TOWER_SWITCH_PCT of stake must have voted for
   a different descendant of the GCA of vote_fork and switch_fork, and
   also must be locked out from our last vote slot.

   Recall from the lockout check that a validator is locked out from
   voting for our last vote slot when their last vote slot is on a
   different fork, and that vote's expiration slot > our last vote slot.

   The following pseudocode describes the algorithm:

   ```
   find the greatest common ancestor (gca) of vote_fork and switch_fork
   for all validators v
     if v is locked out from voting for our latest vote slot
       add v's stake to switch stake
   return switch stake >= FD_TOWER_SWITCH_PCT
   ```

   The switch check is used to safeguard optimistic confirmation.
   Specifically: FD_TOWER_OPT_CONF_PCT + FD_TOWER_SWITCH_PCT >= 1. */

int
fd_tower_switch_check( fd_tower_t const * tower,
                       fd_epoch_t const * epoch,
                       fd_ghost_t const * ghost,
                       ulong              slot,
                       fd_hash_t const *  block_id );

/* fd_tower_threshold_check checks if we pass the threshold required to
   vote for `slot`. This is only relevant after voting for (and
   confirming) the same fork ie. the tower is FD_TOWER_THRESHOLD_DEPTH
   deep. Returns 1 if we pass the threshold check, 0 otherwise.

   The following pseudocode describes the algorithm:

   ```
   for all vote accounts in the current epoch

     simulate that the validator has voted for `slot`

     pop all votes expired by that simulated vote

     if the validator's latest tower vote after expiry >= our threshold
     slot ie. our vote from FD_TOWER_THRESHOLD_DEPTH back (after
     simulating a vote on our own tower the same way), then add
     validator's stake to threshold_stake.

   return threshold_stake >= FD_TOWER_THRESHOLD_RATIO
   ```

   The threshold check simulates voting for the current slot to expire
   stale votes. This is to prevent validators that haven't voted in a
   long time from counting towards the threshold stake. */

int
fd_tower_threshold_check( fd_tower_t const *   tower,
                          fd_epoch_t *         epoch,
                          fd_pubkey_t *        vote_keys,
                          fd_tower_t * const * vote_towers,
                          ulong                vote_cnt,
                          ulong                slot );
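
/* To make the simulation concrete, here is a toy sketch of the
   threshold check over the simplified tower representation from the
   preamble (illustrative only; hypothetical names and layout, not the
   fd_tower_t / fd_epoch_t types this declaration operates on):

   ```c
   #include <assert.h>

   #define TOWER_MAX       32UL
   #define THRESHOLD_DEPTH 8UL
   #define THRESHOLD_RATIO (2.0/3.0)

   typedef struct { unsigned long slot, conf; } vote_t;
   typedef struct { vote_t vote[ TOWER_MAX ]; unsigned long cnt; } tower_t; // bottom at index 0

   // Height of the tower after expiring votes by a simulated vote for
   // `slot` (expired votes are popped top-down).
   static unsigned long
   simulate_vote( tower_t const * t, unsigned long slot ) {
     unsigned long cnt = t->cnt;
     while( cnt && slot > t->vote[ cnt-1UL ].slot + ( 1UL << t->vote[ cnt-1UL ].conf ) ) cnt--;
     return cnt;
   }

   // Sum the stake of voters whose latest unexpired vote (after their
   // own simulated expiry) is >= our vote at THRESHOLD_DEPTH.
   static int
   threshold_check( tower_t const * our, tower_t const * voters, double const * stakes,
                    unsigned long voter_cnt, unsigned long slot ) {
     unsigned long cnt = simulate_vote( our, slot );
     if( cnt < THRESHOLD_DEPTH ) return 1; // tower too shallow: vacuously passes
     unsigned long threshold_slot = our->vote[ cnt - THRESHOLD_DEPTH ].slot;
     double stake = 0.0;
     for( unsigned long i=0UL; i<voter_cnt; i++ ) {
       unsigned long vcnt = simulate_vote( &voters[ i ], slot );
       if( vcnt && voters[ i ].vote[ vcnt-1UL ].slot >= threshold_slot ) stake += stakes[ i ];
     }
     return stake > THRESHOLD_RATIO;
   }
   ```
*/
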

/* fd_tower_reset_slot returns the slot to reset PoH to when building
   the next leader block. Assumes tower and ghost are both valid local
   joins and in-sync ie. every vote slot in tower corresponds to a node
   in ghost. There is always a reset slot (ie. this function will never
   return ULONG_MAX). In general, we reset to the leaf of our last vote
   fork (which is usually also the ghost head). See the implementation
   for detailed documentation of each reset case. */

ulong
fd_tower_reset_slot( fd_tower_t const * tower,
                     fd_epoch_t const * epoch,
                     fd_ghost_t const * ghost );

/* fd_tower_vote_slot returns the correct vote slot to pick given the
   ghost tree. Returns FD_SLOT_NULL if we cannot vote, because we are
   locked out, do not meet the switch threshold, or fail the threshold
   check.

   Specifically, these are the two scenarios in which we can vote:

   1. the ghost head is on the same fork as our last vote slot, and
      we pass the threshold check.
   2. the ghost head is on a different fork than our last vote slot,
      but we pass both the lockout and switch checks so we can
      switch to the ghost head's fork. */

ulong
fd_tower_vote_slot( fd_tower_t const *   tower,
                    fd_epoch_t *         epoch,
                    fd_pubkey_t *        voter_keys,
                    fd_tower_t * const * voter_towers,
                    ulong                voter_len,
                    fd_ghost_t const *   ghost );

/* fd_tower_simulate_vote simulates a vote on the vote tower for slot,
   returning the new height (cnt) after popping all the votes that would
   have been expired. Assumes tower is non-empty. */

ulong
fd_tower_simulate_vote( fd_tower_t const * tower, ulong slot );
754 :
755 : /* Operations */
756 :
757 : /* fd_tower_vote votes for slot. Assumes caller has already performed
758 : the relevant tower checks (lockout_check, etc.) to ensure it is valid
759 : to vote for `slot`. Returns a new root if this vote results in the
760 : lowest vote slot in the tower reaching max lockout. The lowest vote
761 : will also be popped from the tower.
762 :
763 :    Max lockout is equivalent to 1 << (FD_TOWER_VOTE_MAX + 1) (which
764 :    implies a confirmation count of FD_TOWER_VOTE_MAX + 1). As a
765 :    result, fd_tower_vote also maintains the invariant that the tower
766 :    contains at most FD_TOWER_VOTE_MAX votes: in addition to vote
767 :    expiry, there is always a pop before reaching FD_TOWER_VOTE_MAX + 1. */
768 :
769 : ulong
770 : fd_tower_vote( fd_tower_t * tower, ulong slot );
771 :
772 : /* Misc */
773 :
774 : /* fd_tower_to_tower_sync populates ser using the provided tower and
775 :    args to create a zero-copy view of a TowerSync vote transaction
776 :    payload ready for serialization. */
777 :
778 : fd_tower_sync_serde_t *
779 : fd_tower_to_tower_sync( fd_tower_t const * tower, ulong root, fd_hash_t * bank_hash, fd_hash_t * block_id, long ts, fd_tower_sync_serde_t * ser );
780 :
781 : /* fd_tower_serde populates ser using the provided tower and args to
782 :    create a zero-copy view of an Agave-compatible serialized Tower
783 :    ready for serialization. */
784 :
785 : fd_tower_serde_t *
786 : fd_tower_serde( fd_tower_t const * tower,
787 : ulong root,
788 : fd_tower_sync_serde_t * last_vote,
789 : uchar const pvtkey[static 32],
790 : uchar const pubkey[static 32],
791 : fd_tower_serde_t * ser );
792 :
793 : /* fd_tower_to_vote_txn writes tower into a fd_tower_sync_t vote
794 : instruction and serializes it into a Solana transaction. Assumes
795 : tower is a valid local join. */
796 :
797 : void
798 : fd_tower_to_vote_txn( fd_tower_t const * tower,
799 : ulong root,
800 : fd_lockout_offset_t * lockouts_scratch,
801 : fd_hash_t const * bank_hash,
802 : fd_hash_t const * recent_blockhash,
803 : fd_pubkey_t const * validator_identity,
804 : fd_pubkey_t const * vote_authority,
805 : fd_pubkey_t const * vote_acc,
806 : fd_txn_p_t * vote_txn );
807 :
808 : /* fd_tower_checkpt bincode-serializes tower into the provided file
809 :    descriptor. Returns 0 on success (the serialized tower was written
810 :    to fd), -1 if an error occurred during checkpointing. Assumes
811 :    tower is non-empty and a valid local join, root is the current
812 :    tower root, last_vote is the serde of the last vote sent, pubkey /
813 :    sign_fn is the validator identity public key and its signing
814 :    function, fd is a valid open file descriptor to write to, and buf
815 :    is a buffer of at least buf_max bytes to serialize to and write from.
816 :
817 : The binary layout of the file is compatible with Agave and specified
818 : in bincode. fd_tower_serde_t describes the schema. Firedancer can
819 : restore towers checkpointed by Agave (>=2.3.7) and vice versa.
820 :
821 : IMPORTANT SAFETY TIP! No other process should be writing to either
822 :    the memory pointed to by tower or the file referred to by fd while
823 : fd_tower_checkpt is in progress. */
824 :
825 : int
826 : fd_tower_checkpt( fd_tower_t const * tower,
827 : ulong root,
828 : fd_tower_sync_serde_t * last_vote,
829 : uchar const pubkey[static 32],
830 : fd_tower_sign_fn * sign_fn,
831 : int fd,
832 : uchar * buf,
833 : ulong buf_max );
834 :
835 : /* fd_tower_restore restores tower from the checkpoint file in fd.
836 : Returns 0 on success, -1 on error. Assumes tower is a valid local
837 : join of an empty tower, pubkey is the identity key of the validator
838 : associated with the tower, and fd is a valid open file descriptor to
839 : read from. On return, the state of the tower, tower root and tower
840 : timestamp as of the time the file was checkpointed will be restored
841 : into tower, root and ts respectively. Any errors encountered during
842 : restore are logged with as informative an error message as can be
843 : contextualized before returning -1.
844 :
845 : The binary layout of the file is compatible with Agave and specified
846 : in bincode. fd_tower_serde_t describes the schema. Firedancer can
847 : restore towers checkpointed by Agave (>=2.3.7) and vice versa.
848 :
849 : IMPORTANT SAFETY TIP! No other process should be writing to either
850 :    the memory pointed to by tower or the file referred to by fd while
851 : fd_tower_restore is in progress. */
852 :
853 : int
854 : fd_tower_restore( fd_tower_t * tower,
855 : ulong * root,
856 : long * ts,
857 : uchar const pubkey[static 32],
858 : int fd,
859 : uchar * buf,
860 : ulong buf_max,
861 : ulong * buf_sz );
862 :
863 : /* fd_tower_serialize serializes the provided ser into buf. Returns 0
864 :    on success, -1 if an error is encountered during serialization.
865 :    Assumes ser is populated with valid local pointers to the values to
866 :    serialize (NULL if a field should be skipped) and buf is at least
867 :    as large as the serialized size of ser. Populates buf_sz with the
868 :    number of bytes serialized.
869 :
870 : serde is a zero-copy view of the Agave-compatible bincode-encoding of
871 : a tower. See also the struct definition of fd_tower_serde_t. */
872 :
873 : int
874 : fd_tower_serialize( fd_tower_serde_t * ser,
875 : uchar * buf,
876 : ulong buf_max,
877 : ulong * buf_sz );
878 :
879 : /* fd_tower_deserialize deserializes buf into de. Returns 0 on
880 :    success, -1 if an error is encountered during deserialization. buf
881 :    must hold the bytes required to bincode-decode de; otherwise an
882 :    error is returned.
883 :
884 : serde is a zero-copy view of the Agave-compatible bincode-encoding of
885 : a tower. See also the struct definition of fd_tower_serde_t. */
886 :
887 : int
888 : fd_tower_deserialize( uchar * buf,
889 : ulong buf_sz,
890 : fd_tower_serde_t * de );
891 :
892 : /* fd_tower_verify checks tower is in a valid state. Valid iff:
893 :    - cnt <= FD_TOWER_VOTE_MAX
894 :    - from the oldest vote to the newest, vote slots are monotonically
895 :      increasing and confirmation counts are monotonically decreasing */
896 :
897 : int
898 : fd_tower_verify( fd_tower_t const * tower );
899 :
900 : /* fd_tower_on_duplicate checks whether the tower's last vote is on a
901 :    fork that contains an invalid (duplicate) ancestor. */
902 :
903 : int
904 : fd_tower_on_duplicate( fd_tower_t const * tower, fd_ghost_t const * ghost );
905 :
906 : /* fd_tower_print pretty-prints tower as a formatted table.
907 :
908 : Sample output:
909 :
910 : slot | confirmation count
911 : --------- | ------------------
912 : 279803918 | 1
913 : 279803917 | 2
914 : 279803916 | 3
915 : 279803915 | 4
916 : 279803914 | 5
917 : 279803913 | 6
918 : 279803912 | 7
919 : */
920 :
921 : void
922 : fd_tower_print( fd_tower_t const * tower, ulong root );
923 :
924 : /* fd_tower_from_vote_acc_data populates tower_out from the given vote
925 :    account data. The data is deserialized from the Solana vote
926 :    account representation and tower_out is populated with its votes.
927 :
928 :    Assumes tower_out is a valid local join and currently empty. */
929 : void
930 : fd_tower_from_vote_acc_data( uchar const * data,
931 : fd_tower_t * tower_out );
932 :
933 : #endif /* HEADER_fd_src_choreo_tower_fd_tower_h */
|