LCOV - code coverage report
Current view: top level - choreo/tower - fd_tower.h (source / functions)
Test: cov.lcov
Date: 2025-09-19 04:41:14
Lines: 4 / 11 (36.4 %)
Functions: 1 / 10 (10.0 %)

          Line data    Source code
       1             : #ifndef HEADER_fd_src_choreo_tower_fd_tower_h
       2             : #define HEADER_fd_src_choreo_tower_fd_tower_h
       3             : 
       4             : /* fd_tower presents an API for Solana's TowerBFT algorithm.
       5             : 
       6             :    What is TowerBFT? TowerBFT is an algorithm for converging a
       7             :    supermajority of stake in the validator cluster on the same fork.
       8             : 
       9             :         /-- 3-- 4 (A)
      10             :    1-- 2
      11             :         \-- 5     (B)
      12             : 
      13             :    Above is a diagram of a fork. The leader for slot 5 decided to build
      14             :    off slot 2, rather than slot 4. This can happen for various reasons,
      15             :    for example network propagation delay. We now have two possible forks
      16             :    labeled A and B. The consensus algorithm has to pick one of them.
      17             : 
      18             :    So, how does the consensus algorithm pick? As detailed in fd_ghost.h,
      19             :    we pick the fork based on the most stake from votes, called the
      20             :    "heaviest". Validators vote for blocks during replay, and
       21             :    simultaneously use other validators’ votes to determine which block
      22             :    to vote for. This encourages convergence, because as one fork gathers
      23             :    more votes, more and more votes pile-on, solidifying its position as
      24             :    the heaviest fork.
      25             : 
      26             :          /-- 3-- 4 (10%)
      27             :    1-- 2
      28             :          \-- 5     (9%)
      29             : 
      30             :    However, network propagation delay of votes can lead us to think one
      31             :    fork is heaviest, before observing new votes that indicate another
      32             :    fork is heavier. So our consensus algorithm also needs to support
      33             :    switching.
      34             : 
      35             :          /-- 3-- 4 (10%)
      36             :    1-- 2
      37             :          \-- 5     (15%)
      38             : 
      39             :    At the same time we don’t want excessive switching. The more often
      40             :    validators switch, the more difficult it will be to achieve that
      41             :    pile-on effect I just described.
      42             : 
       43             :    Note that to switch forks, you need to roll back a given slot and its
       44             :    descendants on that fork. In the example above, to switch to 1, 2, 5,
       45             :    we need to roll back 3 and 4. The consensus algorithm makes it more
       46             :    costly the further back you want to roll a fork. It does this via a
       47             :    quantity called lockout, which doubles for every additional slot you
       48             :    want to roll back.
      49             : 
       50             :    Eventually you have traversed far enough down a fork that the
       51             :    lockout is so great it is infeasible to imagine it ever rolling back
      52             :    in practice. So you can make that fork permanent or “commit” it. Once
      53             :    all validators do this, the blockchain now just has a single fork.
      54             : 
      55             :    Armed with some intuition, let’s now begin defining some terminology.
      56             :    Here is a diagram of a validator's "vote tower":
      57             : 
      58             :    slot | confirmation count (conf)
      59             :    --------------------------------
      60             :    4    | 1
      61             :    3    | 2
      62             :    2    | 3
      63             :    1    | 4
      64             : 
      65             :    It is a stack structure in which each element is a vote. The vote
      66             :    slot column indicates which slots the validator has voted for,
      67             :    ordered from most to least recent.
      68             : 
      69             :    The confirmation count column indicates how many consecutive votes on
      70             :    the same fork have been pushed on top of that vote. You are
      71             :    confirming your own votes for a fork every time you vote on top of
      72             :    the same fork.
      73             : 
      74             :    Two related concepts to confirmation count are lockout and expiration
      75             :    slot. Lockout equals 2 to the power of confirmation count. Every time
      76             :    we “confirm” a vote by voting on top of it, we double the lockout.
      77             :    The expiration slot is the sum of vote slot and lockout, so it also
      78             :    increases when lockouts double. It represents which slot the vote
      79             :    will expire. When a vote expires, it is popped from the top of the
      80             :    tower. An important Tower rule is that a validator cannot vote for a
      81             :    different fork from a given vote slot, until reaching the expiration
       82             :    slot for that vote slot. To summarize, the further back a validator
       83             :    wants to roll their fork (or vote slots), the longer the validator
       84             :    needs to wait without voting (in slot time).
      85             : 
      86             :    Here is the same tower, fully-expanded to include all the fields:
      87             : 
      88             :    slot | conf | lockout | expiration
      89             :    ----------------------------------
      90             :    4    | 1    | 2       | 6
      91             :    3    | 2    | 4       | 7
      92             :    2    | 3    | 8       | 10
      93             :    1    | 4    | 16      | 17
      94             : 
      95             :    Based on this tower, the validator is locked out from voting for any
      96             :    slot <= 6 that is on a different fork than slot 4. I’d like to
      97             :    emphasize that the expiration is with respect to the vote slot, and
      98             :    is _not_ related to the Proof-of-History slot or what the
      99             :    quote-unquote current slot is. So even if the current slot is now 7,
     100             :    the validator can’t go back and vote for slot 5, if slot 5 were on a
     101             :    different fork than 4. The earliest valid vote slot this validator
     102             :    could submit for a different fork from 4 would be slot 7 or later.
     103             : 
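The lockout and expiration arithmetic above is simple enough to state in C. A minimal sketch (the helper names here are illustrative, not part of the fd_tower API):

```c
#include <assert.h>

/* lockout = 2^confirmation_count */
static unsigned long
vote_lockout( unsigned long conf ) {
  return 1UL << conf;
}

/* expiration slot = vote slot + lockout; the vote expires (and may be
   popped from the top of the tower) once a later vote slot exceeds it */
static unsigned long
vote_expiration( unsigned long slot, unsigned long conf ) {
  return slot + vote_lockout( conf );
}
```

These reproduce the fully-expanded tower table above (eg. vote slot 4 with conf 1 expires at slot 6).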
     104             :    Next let’s look at how the tower makes state transitions. Here we
     105             :    have the previous example tower, with a before-and-after view with
     106             :    respect to a vote for slot 9:
     107             : 
     108             :    (before)  slot | conf
     109             :             -----------
     110             :              4    | 1
     111             :              3    | 2
     112             :              2    | 3
     113             :              1    | 4
     114             : 
     115             :    (after)  slot | conf
     116             :             -----------
     117             :             9    | 1
     118             :             2    | 3
     119             :             1    | 4
     120             : 
     121             :    As you can see, we added a vote for slot 9 to the top of the tower.
     122             :    But we also removed the votes for slot 4 and slot 3. What happened?
     123             :    This is an example of vote expiry in action. When we voted for slot
     124             :    9, this exceeded the expirations of vote slots 4 and 3, which were 6
     125             :    and 7 respectively. This action of voting triggered the popping of
     126             :    the expired votes from the top of the tower.
     127             : 
     128             :    Next, we add a vote for slot 10:
     129             : 
     130             :    (before)  slot | conf
     131             :             -----------
     132             :              9    | 1
     133             :              2    | 3
     134             :              1    | 4
     135             : 
     136             :    (after)  slot | conf
     137             :             -----------
     138             :              10   | 1
     139             :              9    | 2
     140             :              2    | 3
     141             :              1    | 4
     142             : 
     143             :    The next vote for slot 10 doesn’t involve expirations, so we just add
     144             :    it to the top of the tower. Also, here is an important property of
     145             :    lockouts. Note that the lockout for vote slot 9 doubled (ie. the
     146             :    confirmation count increased by 1) but the lockouts of vote slots 2
     147             :    and 1 remained unchanged.
     148             : 
      149             :    The reason for this is that confirmation counts only increase when
      150             :    they are consecutive in the vote tower. Because 4 and 3 were expired
     151             :    previously by the vote for 9, that consecutive property was broken.
     152             :    In this case, the vote for slot 10 is only consecutive with slot 9,
     153             :    but not 2 and 1. Specifically, there is a gap in the before-tower at
     154             :    confirmation count 2.
     155             : 
     156             :    In the after-tower, all the votes are again consecutive (confirmation
     157             :    counts 1, 2, 3, 4 are all accounted for), so the next vote will
     158             :    result in all lockouts doubling as long as it doesn’t result in more
     159             :    expirations.
     160             : 
     161             :    One other thing I’d like to point out about this vote for slot 10.
     162             :    Even though 10 >= the expiration slot of vote slot 2, which is
      163             :    10, voting for 10 did not expire the vote for 2. This is because
     164             :    expiration happens top-down and contiguously. Because vote slot 9 was
     165             :    not expired, we do not proceed with expiring 2.
     166             : 
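The transition rules walked through above (top-down contiguous expiry, push, and doubling of consecutive lockouts) can be modeled in a few lines. This is an illustrative array-based sketch, not the actual fd_tower implementation, but it reproduces the before/after towers shown above:

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned long slot; unsigned long conf; } vote_t;

static unsigned long
expiration( vote_t const * v ) {
  return v->slot + (1UL << v->conf);
}

/* votes[0] is the bottom (oldest) vote, votes[*cnt-1] the top (newest). */
static void
tower_push( vote_t * votes, size_t * cnt, unsigned long slot ) {
  /* Expire top-down and contiguously: stop at the first unexpired vote. */
  while( *cnt && slot > expiration( &votes[ *cnt - 1UL ] ) ) (*cnt)--;
  /* Push the new vote with confirmation count 1. */
  votes[ (*cnt)++ ] = (vote_t){ slot, 1UL };
  /* Double the lockout (ie. bump the conf) of every vote that is
     consecutive with the votes stacked on top of it. */
  for( size_t i = 0UL; i < *cnt; i++ )
    if( *cnt > i + votes[ i ].conf ) votes[ i ].conf++;
}
```

Pushing 9 onto the example tower pops the expired votes for 4 and 3, and the subsequent push of 10 doubles only 9's lockout, exactly as in the tables above.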
     167             :    In the Tower rules, once a vote reaches a conf count of 32, it is
     168             :    considered rooted and it is popped from the bottom of the tower. Here
     169             :    is an example where 1 got rooted and therefore popped from the bottom:
     170             : 
     171             :    (before)  slot | conf
     172             :             -----------
     173             :              50   | 1
     174             :              ...  | ... (29 votes elided)
     175             :              1    | 31
     176             : 
     177             :    (after)  slot | conf
     178             :             -----------
     179             :              53   | 1
     180             :              ...  | ... (29 votes elided)
     181             :              2    | 31
     182             : 
     183             :    So the tower is really a double-ended queue rather than a stack.
     184             : 
     185             :    Rooting has implications beyond the Tower. It's what we use to prune
     186             :    our state. Every time tower makes a new root slot, we prune any old
     187             :    state that does not originate from that new root slot. Our blockstore
     188             :    will discard blocks below that root, our forks structure will discard
     189             :    stale banks, funk (which is our accounts database) will discard stale
     190             :    transactions (which in turn track modifications to accounts), and
     191             :    ghost (which is our fork select tree) will discard stale nodes
     192             :    tracking stake percentages. We call this operation publishing.
     193             : 
     194             :    Note that the vote slots are not necessarily consecutive. Here I
     195             :    elided the votes sandwiched between the newest and oldest votes for
     196             :    brevity.
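Rooting can be sketched as a dequeue from the bottom of a full tower at push time. This is a simplification of the conf-reaches-32 rule described above; the names are illustrative and the array shift stands in for the deque's pop-from-head:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define TOWER_VOTE_MAX 31UL /* mirrors FD_TOWER_VOTE_MAX */

typedef struct { unsigned long slot; unsigned long conf; } vote_t;

/* Before a push that would grow a full tower past TOWER_VOTE_MAX votes,
   dequeue the bottom vote: it becomes the new root. Returns the new
   root slot, or ULONG_MAX if nothing was rooted. */
static unsigned long
tower_maybe_root( vote_t * votes, size_t * cnt ) {
  if( *cnt < TOWER_VOTE_MAX ) return ULONG_MAX;
  unsigned long root = votes[ 0 ].slot;
  for( size_t i = 1UL; i < *cnt; i++ ) votes[ i - 1UL ] = votes[ i ];
  (*cnt)--;
  return root;
}
```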
     197             : 
     198             :    Next, let’s go over three additional tower checks. These three checks
     199             :    further reinforce the consensus algorithm we established with
     200             :    intuition, in this case getting a supermajority (ie. 2/3) of stake to
     201             :    converge on a fork.
     202             : 
     203             :    The first is the threshold check. The threshold check makes sure at
     204             :    least 2/3 of stake has voted for the same fork as the vote at depth 8
     205             :    in our tower. Essentially, this guards our tower from getting too out
     206             :    of sync with the rest of the cluster. If we get too out of sync we
     207             :    can’t vote for a long time, because we had to rollback a vote we had
     208             :    already confirmed many times and had a large lockout. This might
     209             :    otherwise happen as the result of a network partition where we can
     210             :    only communicate with a subset of stake.
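Stripped of the stake tally (assumed to be computed elsewhere, eg. from observed votes on the fork), the threshold check reduces to an integer comparison against 2/3 of total stake. A hedged sketch with illustrative names:

```c
#include <assert.h>

#define THRESHOLD_CHECK_DEPTH 8UL

/* Pass if the tower is not yet THRESHOLD_CHECK_DEPTH deep, or if the
   stake voting for the same fork as our vote at that depth is at least
   2/3 of total stake. Cross-multiplication avoids integer rounding. */
static int
threshold_check( unsigned long tower_cnt,
                 unsigned long fork_stake,
                 unsigned long total_stake ) {
  if( tower_cnt < THRESHOLD_CHECK_DEPTH ) return 1;
  return 3UL*fork_stake >= 2UL*total_stake;
}
```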
     211             : 
      212             :    Next is the lockout check. We went into detail on this earlier when
     213             :    going through the lockout and expiration slot, and as before, the
     214             :    rule is we can only vote on a slot for a different fork from a
     215             :    previous vote, after that vote’s expiration slot.
     216             : 
     217             :    Given this fork and tower from earlier:
     218             : 
     219             :         /-- 3-- 4
     220             :    1-- 2
     221             :         \-- 5
     222             : 
     223             :    slot | conf
     224             :    -----------
     225             :    4    | 1
     226             :    3    | 2
     227             :    2    | 3
     228             :    1    | 4
     229             : 
     230             :   You’re locked out from voting for 5 because it’s on a different fork
     231             :   from 4 and the expiration slot of your previous vote for 4 is 6.
     232             : 
     233             :   However, if we introduce a new slot 9:
     234             : 
     235             :         /-- 3-- 4
     236             :   1-- 2
     237             :         \-- 5-- 9
     238             : 
     239             :   slot | conf
     240             :   -----------
     241             :   9    | 1
     242             :   2    | 3
     243             :   1    | 4
     244             : 
      245             :   Here the new slot 9 descends from 5 and, unlike 5, exceeds vote slot
      246             :   4’s expiration slot of 6.
     247             : 
     248             :   After your lockout expires, the tower rules allow you to vote for
     249             :   descendants of the fork slot you wanted to switch to in the first
     250             :   place (here, 9 descending from 5). So we eventually switch to the fork
     251             :   we wanted, by voting for 9 and expiring 3 and 4.
     252             : 
     253             :   Importantly, notice that the fork slots and vote slots are not exactly
     254             :   1-to-1. While conceptually our tower is voting for the fork 1, 2, 5,
     255             :   9, the vote for 5 is only implied. Our tower votes themselves still
     256             :   can’t include 5 due to lockout.
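The lockout check itself is a one-line rule once the fork relationship is known (determining whether two slots share a fork is assumed to come from ancestry information such as ghost). An illustrative sketch:

```c
#include <assert.h>

/* A vote for `slot` on a different fork than a prior vote (vote_slot,
   conf) is allowed only once slot exceeds that vote's expiration. */
static int
lockout_check( unsigned long vote_slot,
               unsigned long conf,
               unsigned long slot,
               int           same_fork ) {
  return same_fork || slot > vote_slot + (1UL << conf);
}
```

With the example tower above (top vote 4, conf 1, expiration 6), a vote for 5 on the other fork fails the check while 7 or later passes.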
     257             : 
     258             :   Finally, the switch check. The switch check is used to safeguard
     259             :   optimistic confirmation. Optimistic confirmation is when a slot gets
     260             :   2/3 of stake-weighted votes. This is then treated as a signal that the
     261             :   slot will eventually get rooted. However, to actually guarantee this
     262             :   we need a rule that prevents validators from arbitrarily switching
     263             :   forks (even when their vote lockout has expired). This rule is the
     264             :   switch check.
     265             : 
     266             :   The switch check is additional to the lockout check. Before switching
     267             :   forks, we need to make sure at least 38% of stake has voted for a
      268             :   different fork than our own. A "different fork" is defined by finding
     269             :   greatest common ancestor of our last voted fork slot and the slot we
     270             :   want to switch to. Any forks descending from the greatest common
     271             :   ancestor (which I will subsequently call the GCA) that are not our
     272             :   own fork are counted towards the switch check stake.
     273             : 
     274             :   Here we visualize the switch check:
     275             : 
     276             :              /-- 7
     277             :         /-- 3-- 4
      278             :   1-- 2---- 6
     279             :         \-- 5-- 9
     280             : 
     281             :   First, we find the GCA of 4 and 9 which is 2. Then we look at all the
     282             :   descendants of the GCA that do not share a fork with us, and make sure
     283             :   their stake sums to more than 38%.
     284             : 
     285             :   I’d like to highlight that 7 here is not counted towards the switch
     286             :   proof, even though it is on a different fork from 4. This is because
     287             :   it’s on the same fork relative to the GCA.
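Finding the GCA over the example graph can be sketched with a simple parent map (illustrative only; the real implementation derives ancestry from ghost, not a structure like this):

```c
#include <assert.h>

#define SLOT_CAP 16UL

/* parent[s] is the slot s builds on; the root is its own parent. Mark
   one slot's ancestry, then ascend from the other until we hit a
   marked slot: that is the greatest common ancestor. */
static unsigned long
gca( unsigned long const * parent, unsigned long a, unsigned long b ) {
  int on_path[ SLOT_CAP ] = { 0 };
  for( unsigned long s = a; ; s = parent[ s ] ) {
    on_path[ s ] = 1;
    if( s == parent[ s ] ) break; /* reached the root */
  }
  for( unsigned long s = b; ; s = parent[ s ] ) {
    if( on_path[ s ] ) return s;
  }
}
```

Note that gca( 4, 7 ) is 3, not 2, which is why 7's stake does not count toward a switch from 4 to 9: relative to the GCA of 4 and 9, slot 7 sits in our own subtree.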
     288             : 
     289             :   So that covers the checks. Next, there are two additional important
     290             :   concepts: "reset slot" and "vote slot". The reset slot is the slot you
     291             :   reset PoH to when it's your turn to be leader. Because you are
     292             :   responsible for producing a block, you need to decide which fork to
     293             :   build your block on. For example, if there are two competing slots 3
      294             :   and 4, you would decide whether to build 3 <- 5 or 4 <- 5. In general
      295             :   the reset slot is on the same fork as the vote slot, but not always.
     296             :   There is an important reason for this. Recall this fork graph from
     297             :   earlier:
     298             : 
     299             :         /-- 3-- 4 (10%)
     300             :    1-- 2
     301             :         \-- 5-- 6 (9%)
     302             : 
     303             :   In this diagram, 4 is the winner of fork choice. All future leaders
     304             :   now want to reset to slot 4. Naively, this makes sense because you
     305             :   maximize the chance of your block finalizing (and earning the rewards)
     306             :   if you greedily (in the algorithmic, and perhaps also literal sense)
     307             :   pick what's currently the heaviest.
     308             : 
     309             :   However, say most validators actually voted fork 5, even though we
     310             :   currently observe 3 as the heavier. For whatever reason, those votes
     311             :   for 5 just didn't land (the leader for 6 missed the votes, network
     312             :   blip, etc.)
     313             : 
     314             :   All these validators that voted for 5 are now constrained by the
     315             :   switch check (38% of stake), and none of them can actually switch
      316             :   their vote to 4 (which only has 10%). But all leaders naively keep
      317             :   building blocks on top of fork 4, which implies that votes for 5
      318             :   will not be able to propagate. This is because the validators
     319             :   that can't switch continue to refresh their votes for 5, but those
     320             :   votes never "land" because no one is building blocks on top of fork
     321             :   5 anymore (everyone is building on 4 because that's currently the
     322             :   heaviest).
     323             : 
     324             :   Therefore, it is important to reset to the same fork as your last vote
     325             :   slot, which is usually also the heaviest fork, but not always.
     326             : 
     327             :   Note that with both the vote slot and reset slot, the tower uses ghost
     328             :   to determine the last vote slot's ancestry. So what happens if the
     329             :   last vote slot isn't in the ghost? There are two separate cases in
     330             :   which this can happen that tower needs to handle:
     331             : 
     332             :   1. Our last vote slot > ghost root slot, but is not a descendant of
     333             :      the ghost root. This can happen if we get stuck on a minority fork
     334             :      with a long lockout. In the worst case, lockout duration is
     335             :      2^{threshold_check_depth} ie. 2^8 = 256 slots. In other words, we
     336             :      voted for and confirmed a minority fork 8 times in a row. We assume
     337             :      we won't vote past 8 times for the minority fork, because the
     338             :      threshold check would have stopped us (recall the threshold check
     339             :      requires 2/3 of stake to be on the same fork at depth 8 before we
     340             :      can keep voting for that fork).
     341             : 
     342             :      While waiting for those 256 slots of lockout to expire, it is
     343             :      possible that in the meantime a supermajority (ie. >2/3) of the
     344             :      cluster actually roots another fork that is not ours. During
     345             :      regular execution, we would not publish ghost until we have an
     346             :      updated tower root. So as long as the validator stays running while
     347             :      it is locked out from the supermajority fork, it keeps track of its
     348             :      vote slot's ancestry.
     349             : 
     350             :      If the validator were to stop running while locked out though (eg.
     351             :      operator needed to restart the box), the validator attempts to
     352             :      repair the ancestry of its last vote slot.
     353             : 
     354             :      In the worst case, if we cannot repair that ancestry, then we do
     355             :      not vote until replay reaches the expiration slot of that last vote
     356             :      slot. We can assume the votes > depth 8 in the tower do not violate
     357             :      lockout, because again the threshold check would have guarded it.
     358             : 
     359             :      TODO CURRENTLY THIS IS UNHANDLED. WHAT THE VALIDATOR DOES IF IT
     360             :      HAS LOST THE GHOST ANCESTRY IS IT WILL ERROR OUT.
     361             : 
     362             :   2. Our last vote slot < ghost root slot.  In this case we simply
     363             :      cannot determine whether our last vote slot is on the same fork as
     364             :      our ghost root slot because we no longer have ancestry information
     365             :      before the ghost root slot. This can happen if the validator is not
     366             :      running for a long time, then started up again. It will have to use
     367             :      the snapshot slot for the beginning of the ghost ancestry, which
     368             :      could be well past the last vote slot in the tower.
     369             : 
     370             :      In this case, before the validator votes again, it makes sure that
     371             :      the last vote's confirmation count >= THRESHOLD_CHECK_DEPTH (stated
     372             :      differently, it makes sure the next time it votes it will expire at
     373             :      least the first THRESHOLD_CHECK_DEPTH votes in the tower), and then
     374             :      it assumes that the last vote slot is on the same fork as the ghost
     375             :      root slot.
     376             : 
     377             :      TODO VERIFY AGAVE BEHAVIOR IS THE SAME.
     378             : 
     379             :   Now let’s switch gears from theory back to practice. What does it mean
     380             :   to send a vote?
     381             : 
     382             :   As a validator, you aren’t sending individual tower votes. Rather, you
     383             :   are sending your entire updated tower to the cluster every time.
     384             :   Essentially, the validator is continuously syncing their local tower
     385             :   with the cluster. That tower state is then stored inside a vote
     386             :   account, like any other state on Solana.
     387             : 
     388             :   On the flip side, we also must stay in sync the other way from cluster
     389             :   to local. If we have previously voted, we need to make sure our tower
     390             :   matches up with what the cluster has last seen. We know the most
     391             :   recent tower is in the last vote we sent, so we durably store every
      392             :   tower (by checkpointing it to disk) whenever we send a vote. In case
      393             :   this checkpointed tower is out-of-date, Funk, our accounts database,
      394             :   conveniently stores all the vote accounts including our own, so on
      395             :   bootstrap we simply load in our vote account state itself to
      396             :   initialize our own local view of the tower.
     397             : 
     398             :   Finally, a note on the difference between the Vote Program and
     399             :   TowerBFT. The Vote Program runs during transaction (block) execution.
     400             :   It checks that certain invariants about the tower inside a vote
     401             :   transaction are upheld (recall a validator sends their entire tower as
     402             :   part of a "vote"): otherwise, it fails the transaction. For example,
     403             :   it checks that every vote contains a tower in which the vote slots are
     404             :   strictly monotonically increasing. As a consequence, only valid towers
     405             :   are committed to the ledger. Another important detail of the Vote
     406             :   Program is that it only has a view of the current fork on which it is
     407             :   executing. Specifically, it can't observe what state is on other
     408             :   forks, like what a validator's tower looks like on fork A vs. fork B.
     409             : 
     410             :   The TowerBFT rules, on the other hand, run after transaction
     411             :   execution. Also unlike the Vote Program, the TowerBFT rules do not
     412             :   take the vote transactions as inputs: rather the inputs are the towers
     413             :   that have already been written to the ledger by the Vote Program. As
     414             :   described above, the Vote Program validates every tower, and in this
     415             :   way, the TowerBFT rules leverage the validation already done by the
     416             :   Vote Program to (mostly) assume each tower is valid. Every validator
     417             :   runs TowerBFT to update their own tower with votes based on the
     418             :   algorithm documented above. Importantly, TowerBFT has a view of all
     419             :   forks, and the validator makes a voting decision based on all forks.
     420             : */
     421             : 
     422             : #include "../fd_choreo_base.h"
     423             : #include "../epoch/fd_epoch.h"
     424             : #include "../ghost/fd_ghost.h"
     425             : #include "../../disco/pack/fd_microblock.h"
     426             : 
     427             : /* FD_TOWER_PARANOID:  Define this to non-zero at compile time
     428             :    to turn on additional runtime integrity checks. */
     429             : 
     430             : #ifndef FD_TOWER_PARANOID
     431             : #define FD_TOWER_PARANOID 1
     432             : #endif
     433             : 
     434           0 : #define FD_TOWER_VOTE_MAX (31UL)
     435             : 
     436             : struct fd_tower_vote {
     437             :   ulong slot; /* vote slot */
     438             :   ulong conf; /* confirmation count */
     439             : };
     440             : typedef struct fd_tower_vote fd_tower_vote_t;
     441             : 
     442             : #define DEQUE_NAME fd_tower_votes
     443           0 : #define DEQUE_T    fd_tower_vote_t
     444           0 : #define DEQUE_MAX  FD_TOWER_VOTE_MAX
     445             : #include "../../util/tmpl/fd_deque.c"
     446             : 
     447             : /* fd_tower is a representation of a validator's "vote tower" (described
     448             :    in detail in the preamble at the top of this file).  The votes in the
     449             :    tower are stored in an fd_deque ordered from lowest to highest vote
     450             :    slot (highest to lowest confirmation count) relative to the head and
     451             :    tail .  There can be at most 31 votes in the tower.  This invariant
      452             :    tail.  There can be at most 31 votes in the tower.  This invariant
     453             : 
     454             :    The definition of `fd_tower_t` is a simple typedef alias for
     455             :    `fd_tower_vote_t` and is a transparent wrapper around the vote deque.
     456             :    Relatedly, the tower API takes a local pointer to the first vote in
     457             :    the deque (the result of `fd_deque_join`) as a parameter in all its
     458             :    function signatures. */
     459             : 
     460             : typedef fd_tower_vote_t fd_tower_t;
     461             : 
     462             : /* fd_tower_sync_serde describes a serialization / deserialization
     463             :    schema for a bincode-encoded TowerSync transaction.  The serde is
     464             :    structured for zero-copy access ie. x-raying individual fields. */
     465             : 
     466             : struct fd_tower_sync_serde /* CompactTowerSync */ {
     467             :   ulong const * root;
     468             :   struct /* short_vec */ {
     469             :     ushort lockouts_cnt; /* variable-length so copied (ShortU16) */
     470             :     struct /* Lockout */ {
     471             :       ulong         offset; /* variable-length so copied (VarInt) */
     472             :       uchar const * confirmation_count;
     473             :     } lockouts[31];
     474             :   };
     475             :   fd_hash_t const * hash;
     476             :   struct /* Option<UnixTimestamp> */ {
     477             :     uchar const * timestamp_option;
     478             :     long  const * timestamp;
     479             :   };
     480             :   fd_hash_t const * block_id;
     481             : };
     482             : typedef struct fd_tower_sync_serde fd_tower_sync_serde_t;
     483             : 
     484             : /* fd_tower_serde describes a serialization / deserialization schema for
     485             :    an Agave-compatible bincode encoding of a tower.  This corresponds
     486             :    exactly with the binary layout of a tower file that Agave interfaces
     487             :    with during boot, set-identity, and voting.
     488             : 
     489             :    The serde is structured for zero-copy access ie. x-raying individual
     490             :    fields. */
     491             : 
     492             : struct fd_tower_serde /* SavedTowerVersions::Current */ {
     493             :   uint const *             kind;
     494             :   fd_ed25519_sig_t const * signature;
     495             :   ulong const *            data_sz; /* serialized sz of data field below */
     496             :   struct /* Tower1_14_11 */ {
     497             :     fd_pubkey_t const *   node_pubkey;
     498             :     ulong const *         threshold_depth;
     499             :     double const *        threshold_size;
     500             :     fd_voter_v2_serde_t   vote_state;
     501             :     struct {
     502             :       uint const *          last_vote_kind;
     503             :       fd_tower_sync_serde_t last_vote;
     504             :     };
     505             :     struct /* BlockTimestamp */ {
     506             :       ulong const * slot;
     507             :       long const *  timestamp;
     508             :     } last_timestamp;
     509             :   } /* data */;
     510             : };
     511             : typedef struct fd_tower_serde fd_tower_serde_t;
     512             : 
      513             : /* fd_tower_sign_fn is the callback used to sign serialized tower
      514             :    checkpoints. */
     515             : 
     516             : typedef void (fd_tower_sign_fn)( void const * ctx, uchar * sig, uchar const * ser, ulong ser_sz );
     517             : 
     518             : /* FD_TOWER_{ALIGN,FOOTPRINT} specify the alignment and footprint needed
     519             :    for tower.  ALIGN is double x86 cache line to mitigate various kinds
     520             :    of false sharing (eg. ACLPF adjacent cache line prefetch).  FOOTPRINT
      521             :    is the size of the fd_deque, including the private header's start
      522             :    and end fields, and is an exact multiple of ALIGN.  These are
      523             :    provided to facilitate compile time tower declarations. */
     524             : 
     525           6 : #define FD_TOWER_ALIGN     (128UL)
     526           0 : #define FD_TOWER_FOOTPRINT (512UL)
     527             : FD_STATIC_ASSERT( FD_TOWER_FOOTPRINT==sizeof(fd_tower_votes_private_t), FD_TOWER_FOOTPRINT );
     528             : 
     529             : /* fd_tower_{align,footprint} return the required alignment and
     530             :    footprint of a memory region suitable for use as a tower.  align
     531             :    returns FD_TOWER_ALIGN.  footprint returns FD_TOWER_FOOTPRINT. */
     532             : 
     533             : FD_FN_CONST static inline ulong
     534           6 : fd_tower_align( void ) {
     535           6 :    return FD_TOWER_ALIGN;
     536           6 : }
     537             : 
     538             : FD_FN_CONST static inline ulong
     539           0 : fd_tower_footprint( void ) {
     540           0 :    return FD_TOWER_FOOTPRINT;
     541           0 : }
     542             : 
     543             : /* fd_tower_new formats an unused memory region for use as a tower.  mem
     544             :    is a non-NULL pointer to this region in the local address space with
     545             :    the required footprint and alignment. */
     546             : 
     547             : void *
     548             : fd_tower_new( void * mem );
     549             : 
     550             : /* fd_tower_join joins the caller to the tower.  tower points to the
     551             :    first byte of the memory region backing the tower in the caller's
     552             :    address space.
     553             : 
     554             :    Returns a pointer in the local address space to tower on success. */
     555             : 
     556             : fd_tower_t *
     557             : fd_tower_join( void * tower );
     558             : 
     559             : /* fd_tower_leave leaves a current local join.  Returns a pointer to the
     560             :    underlying shared memory region on success and NULL on failure (logs
     561             :    details).  Reasons for failure include tower is NULL. */
     562             : 
     563             : void *
     564             : fd_tower_leave( fd_tower_t * tower );
     565             : 
     566             : /* fd_tower_delete unformats a memory region used as a tower.  Assumes
     567             :    only the local process is joined to the region.  Returns a pointer to
     568             :    the underlying shared memory region or NULL if used obviously in
     569             :    error (e.g. tower is obviously not a tower ...  logs details).  The
     570             :    ownership of the memory region is transferred to the caller. */
     571             : 
     572             : void *
     573             : fd_tower_delete( void * tower );
     574             : 
     575             : /* fd_tower_lockout_check checks if we are locked out from voting for
      576             :    `slot`.  Returns 1 if we can vote for `slot` without violating
     577             :    lockout, 0 otherwise.  Assumes tower is non-empty.
     578             : 
     579             :    After voting for a slot n, we are locked out for 2^k slots, where k
     580             :    is the confirmation count of that vote.  Once locked out, we cannot
     581             :    vote for a different fork until that previously-voted fork expires at
     582             :    slot n+2^k.  This implies the earliest slot in which we can switch
     583             :    from the previously-voted fork is (n+2^k)+1.  We use `ghost` to
     584             :    determine whether `slot` is on the same or different fork as previous
     585             :    vote slots.
     586             : 
      587             :    In the case of the tower, every vote has its own expiration slot
      588             :    depending on its confirmation count.  The confirmation count is one
      589             :    more than the max number of consecutive votes stacked on top of it,
      590             :    and not necessarily its current height in the tower.
     591             : 
     592             :    For example, the following is a diagram of a tower pushing and
     593             :    popping with each vote:
     594             : 
     595             : 
     596             :    slot | confirmation count
     597             :    -----|-------------------
     598             :    4    |  1 <- vote
     599             :    3    |  2
     600             :    2    |  3
     601             :    1    |  4
     602             : 
     603             : 
     604             :    slot | confirmation count
     605             :    -----|-------------------
     606             :    9    |  1 <- vote
     607             :    2    |  3
     608             :    1    |  4
     609             : 
     610             : 
     611             :    slot | confirmation count
     612             :    -----|-------------------
     613             :    10   |  1 <- vote
     614             :    9    |  2
     615             :    2    |  3
     616             :    1    |  4
     617             : 
     618             : 
     619             :    slot | confirmation count
     620             :    -----|-------------------
     621             :    11   |  1 <- vote
     622             :    10   |  2
     623             :    9    |  3
     624             :    2    |  4
     625             :    1    |  5
     626             : 
     627             : 
     628             :    slot | confirmation count
     629             :    -----|-------------------
     630             :    18   |  1 <- vote
     631             :    2    |  4
     632             :    1    |  5
     633             : 
     634             : 
     635             :    In the final tower, note the gap in confirmation counts between slot
     636             :    18 and slot 2, even though slot 18 is directly above slot 2. */
     637             : 
     638             : int
     639             : fd_tower_lockout_check( fd_tower_t const * tower,
     640             :                         fd_ghost_t const * ghost,
     641             :                         ulong              slot,
     642             :                         fd_hash_t const *  block_id );
     643             : 
     644             : /* fd_tower_switch_check checks if we can switch to the fork of `slot`.
     645             :    Returns 1 if we can switch, 0 otherwise.  Assumes tower is non-empty.
     646             : 
     647             :    There are two forks of interest: our last vote fork ("vote fork") and
     648             :    the fork we want to switch to ("switch fork"). The switch fork is the
     649             :    fork of `slot`.
     650             : 
     651             :    In order to switch, FD_TOWER_SWITCH_PCT of stake must have voted for
     652             :    a different descendant of the GCA of vote_fork and switch_fork, and
     653             :    also must be locked out from our last vote slot.
     654             : 
      655             :    Recall from the lockout check that a validator is locked out from
      656             :    voting for our last vote slot when its last vote slot is on a
      657             :    different fork and that vote's expiration slot > our last vote slot.
     658             : 
     659             :    The following pseudocode describes the algorithm:
     660             : 
     661             :    ```
     662             :    find the greatest common ancestor (gca) of vote_fork and switch_fork
     663             :    for all validators v
      664             :       if v is locked out from voting for our latest vote slot
     665             :          add v's stake to switch stake
     666             :    return switch stake >= FD_TOWER_SWITCH_PCT
     667             :    ```
     668             : 
     669             :    The switch check is used to safeguard optimistic confirmation.
     670             :    Specifically: FD_TOWER_OPT_CONF_PCT + FD_TOWER_SWITCH_PCT >= 1. */
     671             : 
     672             : int
     673             : fd_tower_switch_check( fd_tower_t const * tower,
     674             :                        fd_epoch_t const * epoch,
     675             :                        fd_ghost_t const * ghost,
     676             :                        ulong              slot,
     677             :                        fd_hash_t const *  block_id );
     678             : 
     679             : /* fd_tower_threshold_check checks if we pass the threshold required to
     680             :    vote for `slot`.  This is only relevant after voting for (and
     681             :    confirming) the same fork ie. the tower is FD_TOWER_THRESHOLD_DEPTH
     682             :    deep.  Returns 1 if we pass the threshold check, 0 otherwise.
     683             : 
      684             :    The following pseudocode describes the algorithm:
     685             : 
     686             :    ```
     687             :    for all vote accounts in the current epoch
     688             : 
     689             :       simulate that the validator has voted for `slot`
     690             : 
     691             :       pop all votes expired by that simulated vote
     692             : 
     693             :       if the validator's latest tower vote after expiry >= our threshold
     694             :       slot ie. our vote from FD_TOWER_THRESHOLD_DEPTH back (after
     695             :       simulating a vote on our own tower the same way), then add
     696             :       validator's stake to threshold_stake.
     697             : 
     698             :    return threshold_stake >= FD_TOWER_THRESHOLD_RATIO
     699             :    ```
     700             : 
     701             :    The threshold check simulates voting for the current slot to expire
     702             :    stale votes.  This is to prevent validators that haven't voted in a
     703             :    long time from counting towards the threshold stake. */
     704             : 
     705             : int
     706             : fd_tower_threshold_check( fd_tower_t const *   tower,
     707             :                           fd_epoch_t *         epoch,
     708             :                           fd_pubkey_t *        vote_keys,
     709             :                           fd_tower_t * const * vote_towers,
     710             :                           ulong                vote_cnt,
     711             :                           ulong                slot );
     712             : 
     713             : /* fd_tower_reset_slot returns the slot to reset PoH to when building
     714             :    the next leader block.  Assumes tower and ghost are both valid local
     715             :    joins and in-sync ie. every vote slot in tower corresponds to a node
     716             :    in ghost.  There is always a reset slot (ie. this function will never
     717             :    return ULONG_MAX).  In general, we reset to the leaf of our last vote
      718             :    fork (which is usually also the ghost head).  See the implementation
      719             :    for detailed documentation of each reset case. */
     721             : 
     722             : ulong
     723             : fd_tower_reset_slot( fd_tower_t const * tower,
     724             :                      fd_epoch_t const * epoch,
     725             :                      fd_ghost_t const * ghost );
     726             : 
     727             : /* fd_tower_vote_slot returns the correct vote slot to pick given the
     728             :    ghost tree.  Returns FD_SLOT_NULL if we cannot vote, because we are
     729             :    locked out, do not meet switch threshold, or fail the threshold
     730             :    check.
     731             : 
     732             :    Specifically, these are the two scenarios in which we can vote:
     733             : 
     734             :    1. the ghost head is on the same fork as our last vote slot, and
     735             :       we pass the threshold check.
     736             :    2. the ghost head is on a different fork than our last vote slot,
     737             :       but we pass both the lockout and switch checks so we can
     738             :       switch to the ghost head's fork. */
     739             : 
     740             : ulong
     741             : fd_tower_vote_slot( fd_tower_t const *   tower,
     742             :                     fd_epoch_t *         epoch,
     743             :                     fd_pubkey_t *        voter_keys,
     744             :                     fd_tower_t * const * voter_towers,
     745             :                     ulong                voter_len,
     746             :                     fd_ghost_t const *   ghost );
     747             : 
      748             : /* fd_tower_simulate_vote simulates voting for slot on the tower,
      749             :    returning the new height (cnt) after popping all the votes that the
      750             :    simulated vote would have expired.  Assumes tower is non-empty. */
     751             : 
     752             : ulong
     753             : fd_tower_simulate_vote( fd_tower_t const * tower, ulong slot );
     754             : 
     755             : /* Operations */
     756             : 
     757             : /* fd_tower_vote votes for slot.  Assumes caller has already performed
     758             :    the relevant tower checks (lockout_check, etc.) to ensure it is valid
     759             :    to vote for `slot`.  Returns a new root if this vote results in the
     760             :    lowest vote slot in the tower reaching max lockout.  The lowest vote
     761             :    will also be popped from the tower.
     762             : 
      763             :    Max lockout is equivalent to 1 << (FD_TOWER_VOTE_MAX + 1) (which
      764             :    implies a confirmation count of FD_TOWER_VOTE_MAX + 1).  As a result,
     765             :    fd_tower_vote also maintains the invariant that the tower contains at
     766             :    most FD_TOWER_VOTE_MAX votes, because (in addition to vote expiry)
     767             :    there will always be a pop before reaching FD_TOWER_VOTE_MAX + 1. */
     768             : 
     769             : ulong
     770             : fd_tower_vote( fd_tower_t * tower, ulong slot );
     771             : 
     772             : /* Misc */
     773             : 
      774             : /* fd_tower_to_tower_sync populates ser using the provided tower and args
     775             :    to create a zero-copy view of a TowerSync vote transaction payload
     776             :    ready for serialization. */
     777             : 
     778             : fd_tower_sync_serde_t *
     779             : fd_tower_to_tower_sync( fd_tower_t const * tower, ulong root, fd_hash_t * bank_hash, fd_hash_t * block_id, long ts, fd_tower_sync_serde_t * ser );
     780             : 
      781             : /* fd_tower_serde populates ser using the provided tower and args
     782             :    to create a zero-copy view of an Agave-compatible serialized Tower
     783             :    ready for serialization. */
     784             : 
     785             : fd_tower_serde_t *
     786             : fd_tower_serde( fd_tower_t const *      tower,
     787             :                 ulong                   root,
     788             :                 fd_tower_sync_serde_t * last_vote,
     789             :                 uchar const             pvtkey[static 32],
     790             :                 uchar const             pubkey[static 32],
     791             :                 fd_tower_serde_t *      ser );
     792             : 
     793             : /* fd_tower_to_vote_txn writes tower into a fd_tower_sync_t vote
     794             :    instruction and serializes it into a Solana transaction.  Assumes
     795             :    tower is a valid local join. */
     796             : 
     797             : void
     798             : fd_tower_to_vote_txn( fd_tower_t const *    tower,
     799             :                       ulong                 root,
     800             :                       fd_lockout_offset_t * lockouts_scratch,
     801             :                       fd_hash_t const *     bank_hash,
     802             :                       fd_hash_t const *     recent_blockhash,
     803             :                       fd_pubkey_t const *   validator_identity,
     804             :                       fd_pubkey_t const *   vote_authority,
     805             :                       fd_pubkey_t const *   vote_acc,
     806             :                       fd_txn_p_t *          vote_txn );
     807             : 
     808             : /* fd_tower_checkpt bincode-serializes tower into the provided file
      809             :    descriptor.  Returns 0 on success, meaning the serialized tower has
      810             :    been written to fd, and -1 on error.  Assumes tower is non-empty and
      811             :    a valid local join of the current tower, root is the current tower
      812             :    root, last_vote is the serde of the last vote sent, pubkey is the
      813             :    validator identity public key, sign_fn signs the serialized tower,
      814             :    fd is a valid open file descriptor to write to, and buf is a buffer
      815             :    of at least buf_max bytes to serialize to and write from.
     816             : 
     817             :    The binary layout of the file is compatible with Agave and specified
     818             :    in bincode.  fd_tower_serde_t describes the schema.  Firedancer can
     819             :    restore towers checkpointed by Agave (>=2.3.7) and vice versa.
     820             : 
     821             :    IMPORTANT SAFETY TIP! No other process should be writing to either
      822             :    the memory pointed to by tower or the file referred to by fd while
     823             :    fd_tower_checkpt is in progress. */
     824             : 
     825             : int
     826             : fd_tower_checkpt( fd_tower_t const *      tower,
     827             :                   ulong                   root,
     828             :                   fd_tower_sync_serde_t * last_vote,
     829             :                   uchar const             pubkey[static 32],
     830             :                   fd_tower_sign_fn *      sign_fn,
     831             :                   int                     fd,
     832             :                   uchar *                 buf,
     833             :                   ulong                   buf_max );
     834             : 
      835             : /* fd_tower_restore restores tower from the checkpoint file referred to
      836             :    by fd.  Returns 0 on success, -1 on error.  Assumes tower is a valid
      837             :    local join of an empty tower, pubkey is the identity key of the
      838             :    validator associated with the tower, and fd is a valid open file
      839             :    descriptor to read from.  On return, the tower state, root and
     840             :    timestamp as of the time the file was checkpointed will be restored
     841             :    into tower, root and ts respectively. Any errors encountered during
     842             :    restore are logged with as informative an error message as can be
     843             :    contextualized before returning -1.
     844             : 
     845             :    The binary layout of the file is compatible with Agave and specified
     846             :    in bincode.  fd_tower_serde_t describes the schema.  Firedancer can
     847             :    restore towers checkpointed by Agave (>=2.3.7) and vice versa.
     848             : 
     849             :    IMPORTANT SAFETY TIP! No other process should be writing to either
      850             :    the memory pointed to by tower or the file referred to by fd while
     851             :    fd_tower_restore is in progress. */
     852             : 
     853             : int
     854             : fd_tower_restore( fd_tower_t * tower,
     855             :                   ulong *      root,
     856             :                   long *       ts,
     857             :                   uchar const  pubkey[static 32],
     858             :                   int          fd,
     859             :                   uchar *      buf,
     860             :                   ulong        buf_max,
     861             :                   ulong *      buf_sz );
     862             : 
     863             : /* fd_tower_serialize serializes the provided serde into buf.  Returns 0
     864             :    on success, -1 if an error is encountered during serialization.
     865             :    Assumes serde is populated with valid local pointers to values to
     866             :    serialize (NULL if a field should be skipped) and buf is at least as
     867             :    large as the serialized size of ser.  Populates the number of bytes
     868             :    serialized in buf_sz.
     869             : 
     870             :    serde is a zero-copy view of the Agave-compatible bincode-encoding of
     871             :    a tower.  See also the struct definition of fd_tower_serde_t. */
     872             : 
     873             : int
     874             : fd_tower_serialize( fd_tower_serde_t * ser,
     875             :                     uchar *            buf,
     876             :                     ulong              buf_max,
     877             :                     ulong *            buf_sz );
     878             : 
     879             : /* fd_tower_deserialize deserializes buf into serde.  Returns 0 on
     880             :    success, -1 if an error is encountered during deserialization.  buf
      881             :    must contain at least as many bytes as bincode-decoding of de
      882             :    requires; otherwise an error is returned.
     883             : 
     884             :    serde is a zero-copy view of the Agave-compatible bincode-encoding of
     885             :    a tower.  See also the struct definition of fd_tower_serde_t. */
     886             : 
     887             : int
     888             : fd_tower_deserialize( uchar *            buf,
     889             :                       ulong              buf_sz,
     890             :                       fd_tower_serde_t * de );
     891             : 
      892             : /* fd_tower_verify checks tower is in a valid state.  Valid iff:
      893             :    - cnt <= FD_TOWER_VOTE_MAX
      894             :    - vote slots are monotonically increasing and confirmation counts
      895             :      monotonically decreasing from the lowest vote slot to the highest */
     896             : 
     897             : int
     898             : fd_tower_verify( fd_tower_t const * tower );
     899             : 
      900             : /* fd_tower_on_duplicate checks if the tower is on a fork with an
      901             :    invalid (duplicate) ancestor. */
     902             : 
     903             : int
     904             : fd_tower_on_duplicate( fd_tower_t const * tower, fd_ghost_t const * ghost );
     905             : 
     906             : /* fd_tower_print pretty-prints tower as a formatted table.
     907             : 
     908             :    Sample output:
     909             : 
     910             :         slot | confirmation count
     911             :    --------- | ------------------
     912             :    279803918 | 1
     913             :    279803917 | 2
     914             :    279803916 | 3
     915             :    279803915 | 4
     916             :    279803914 | 5
     917             :    279803913 | 6
     918             :    279803912 | 7
     919             : */
     920             : 
     921             : void
     922             : fd_tower_print( fd_tower_t const * tower, ulong root );
     923             : 
      924             : /* fd_tower_from_vote_acc_data populates tower_out with the votes in
      925             :    the given vote account data, deserializing data from the Solana vote
      926             :    account representation.
      927             : 
      928             :    Assumes tower_out is a valid local join and currently empty. */
     929             : void
     930             : fd_tower_from_vote_acc_data( uchar const * data,
     931             :                              fd_tower_t *  tower_out );
     932             : 
     933             : #endif /* HEADER_fd_src_choreo_tower_fd_tower_h */

Generated by: LCOV version 1.14