Line data Source code
1 : #define _GNU_SOURCE
2 :
3 : /* Let's say there was a computer, the "leader" computer, that acted as
4 : a bank. Users could send it messages saying they wanted to deposit
5 : money, or transfer it to someone else.
6 :
7 : That's how, for example, Bank of America works, but there are problems
8 : with it. One simple problem is: the bank can set your balance to
9 : zero if they don't like you.
10 :
11 : You could try to fix this by having the bank periodically publish the
12 : list of all account balances and transactions. If the customers add
13 : unforgeable signatures to their deposit slips and transfers, then
14 : the bank cannot zero a balance without it being obvious to everyone.
15 :
16 : There are still problems. The bank can't lie about your balance now or
17 : take your money, but it can just not accept deposits on your behalf
18 : by ignoring you.
19 :
20 : You could fix this by getting a few independent banks together, let's
21 : say Bank of America, Bank of England, and Westpac, and having them
22 : rotate who operates the leader computer periodically. If one bank
23 : ignores your deposits, you can just wait and send them to the next
24 : one.
25 :
26 : This is Solana.
27 :
28 : There are still problems, of course, but they are largely technical. How
29 : do the banks agree who is leader? How do you recover if a leader
30 : misbehaves? How do customers verify the transactions aren't forged?
31 : How do banks receive, publish, and verify each other's work quickly?
32 : These are the main technical innovations that enable Solana to work
33 : well.
34 :
35 : What about Proof of History?
36 :
37 : One particular niche problem is about the leader schedule. When the
38 : leader computer is moving from one bank to another, the new bank must
39 : wait for the old bank to say it's done and provide a final list of
40 : balances that it can start working off of. But: what if the computer
41 : at the old bank crashes and never says it's done?
42 :
43 : Does the new leader just take over at some point? What if the new
44 : leader is malicious, and says the past thousand leaders crashed, and
45 : there have been no transactions for days? How do you check?
46 :
47 : This is what Proof of History solves. Each bank in the network must
48 : constantly do a lot of busywork (compute hashes), even when it is not
49 : leader.
50 :
51 : If the prior thousand leaders crashed, and no transactions happened
52 : in an hour, the new leader would have to show they did about an hour
53 : of busywork for everyone else to believe them.
54 :
55 : A better name for this is proof of skipping. If a leader is skipping
56 : slots (building off of a slot that is not the direct parent), it must
57 : prove that it waited a good amount of time to do so.
58 :
59 : It's not a perfect solution. For one thing, some banks have really
60 : fast computers and can compute a lot of busywork in a short amount of
61 : time, allowing them to skip prior slot(s) anyway. But: there is a
62 : social component that prevents validators from skipping the prior
63 : leader slot. It is easy to detect when this happens and the network
64 : could respond by ignoring their votes or stake.
65 :
66 : You could come up with other schemes: for example, the network could
67 : just use wall clock time. If a new leader publishes a block without
68 : waiting 400 milliseconds for the prior slot to complete, then there
69 : is no "proof of skipping" and the nodes ignore the slot.
70 :
71 : These schemes have a problem in that they are not deterministic
72 : across the network (different computers have different clocks), and
73 : so they will cause frequent forks which are very expensive to
74 : resolve. Even though the proof of history scheme is not perfect,
75 : it is better than any alternative which is not deterministic.
76 :
77 : With all that background, we can now describe at a high level what
78 : this PoH tile actually does:
79 :
80 : (1) Whenever any other leader in the network finishes a slot, and
81 : the slot is determined to be the best one to build off of, this
82 : tile gets "reset" onto that block, the so called "reset slot".
83 :
84 : (2) The tile is constantly doing busy work, hash(hash(hash(...))) on
85 : top of the last reset slot, even when it is not leader.
86 :
87 : (3) When the tile becomes leader, it continues hashing from where it
88 : was. Typically, the prior leader finishes their slot, so the
89 : reset slot will be the parent one, and this tile only publishes
90 : hashes for its own slot. But if prior slots were skipped, then
91 : there might be a whole chain already waiting.
92 :
93 : That's pretty much it. When we are leader, in addition to doing
94 : busywork, we publish ticks and microblocks to the shred tile. A
95 : microblock is a non-empty group of transactions whose hashes are
96 : mixed into the chain, while a tick is a periodic stamp of the
97 : current hash, with no transactions (nothing mixed in). We need
98 : to send both to the shred tile, as ticks are important for other
99 : validators to verify in parallel.
100 :
101 : As well, the tile should never become leader for a slot that it has
102 : published anything for, otherwise it may create a duplicate block.
103 :
104 : Some particularly common misunderstandings:
105 :
106 : - PoH is critical to security.
107 :
108 : This largely isn't true. The target hash rate of the network is
109 : so slow (1 hash per 100 nanoseconds on mainnet-beta) that a malicious leader can
110 : easily catch up if they start from an old hash, and the only
111 : practical attack prevented is the proof of skipping. Most of the
112 : long range attacks in the Solana whitepaper are not relevant.
113 :
114 : - PoH keeps passage of time.
115 :
116 : This is also not true. The network keeps time for deciding who
117 : is leader by having each leader use its operating system clock
118 : to time 400 milliseconds and publish its block when this timer
119 : expires.
120 :
121 : If a leader just hashed as fast as they could, they could publish
122 : a block in tens of milliseconds, and the rest of the network
123 : would happily accept it. This is why the Solana "clock" as
124 : determined by PoH is not accurate and drifts over time.
125 :
126 : - PoH prevents transaction reordering by the leader.
127 :
128 : The leader can, in theory, wait until the very end of their
129 : leader slot to publish anything at all to the network. They can,
130 : in particular, hold all received transactions for 400
131 : milliseconds and then reorder and publish some right at the end
132 : to advantage certain transactions.
133 :
134 : You might be wondering... if all the PoH chain is helping us do is
135 : prove that slots were skipped correctly, why do we need to "mix in"
136 : transactions to the hash value? Or do anything at all for slots
137 : where we don't skip the prior slot?
138 :
139 : It's a good question, and the answer is that this behavior is not
140 : necessary. An ideal implementation of PoH would have no concept of ticks
141 : or mixins, and would not be part of the TPU pipeline at all.
142 : Instead, there would be a simple field "skip_proof" on the last
143 : shred we send for a slot, the hash(hash(...)) value. This field
144 : would only be filled in (and only verified by replayers) in cases
145 : where the slot actually skipped a parent.
146 :
147 : Then what is the "clock"? In Solana, time is constructed as follows:
148 :
149 : HASHES
150 :
151 : The base unit of time is a hash. Hereafter, any values whose
152 : units are in hashes are called a "hashcnt" to distinguish them
153 : from actual hashed values.
154 :
155 : Agave generally defines a constant duration for each tick
156 : (see below) and then varies the number of hashcnt per tick, but
157 : as we consider the hashcnt the base unit of time, Firedancer and
158 : this PoH implementation define everything in terms of hashcnt
159 : duration instead.
160 :
161 : In mainnet-beta, testnet, and devnet the hashcnt ticks over
162 : (increments) every 100 nanoseconds. The genesis specifies a
163 : hashcnt duration of 500 nanoseconds, but there
164 : are several features which increase the number of hashes per
165 : tick while keeping tick duration constant, which make the time
166 : per hashcnt lower. These features up to and including the
167 : `update_hashes_per_tick6` feature are activated on mainnet-beta,
168 : devnet, and testnet, and are described in the TICKS section
169 : below.
170 :
171 : Other chains and development environments might have a different
172 : hashcnt rate in the genesis, or they might not have activated
173 : the features which increase the rate yet, which we also support.
174 :
175 : In practice, although each validator follows a hashcnt rate of
176 : 100 nanoseconds, the overall observed hashcnt rate of the
177 : network is a little slower than once every 100 nanoseconds,
178 : mostly because there are gaps and clock synchronization issues
179 : during handoff between leaders. This is referred to as clock
180 : drift.
181 :
182 : TICKS
183 :
184 : The leader needs to periodically checkpoint the hash value
185 : associated with a given hashcnt so that they can publish it to
186 : other nodes for verification.
187 :
188 : On mainnet-beta, testnet, and devnet this occurs once every
189 : 62,500 hashcnts, or once every 6.25 milliseconds (62,500 hashcnts at 100 nanoseconds each).
190 : This value is determined at genesis time, and according to the
191 : features below, and could be different in development
192 : environments or on other chains which we support.
193 :
194 : Due to protocol limitations, when mixing in transactions to the
195 : proof-of-history chain, it cannot occur on a tick boundary (but
196 : can occur at any other hashcnt).
197 :
198 : Ticks exist mainly so that verification can happen in parallel.
199 : A verifier computer, rather than needing to do hash(hash(...))
200 : all in sequence to verify a proof-of-history chain, can do:
201 :
202 : Core 0: hash(hash(...))
203 : Core 1: hash(hash(...))
204 : Core 2: hash(hash(...))
205 : Core 3: hash(hash(...))
206 : ...
207 :
208 : with each core checking the hashes between one pair of tick boundaries.
209 :
210 : Solana sometimes calls the current tick the "tick height",
211 : although it makes more sense to think of it as a counter from
212 : zero: it is just the number of ticks since the genesis hash.
213 :
214 : There is a set of features which increase the number of hashcnts
215 : per tick. These are all deployed on mainnet-beta, devnet, and
216 : testnet.
217 :
218 : name: update_hashes_per_tick
219 : id: 3uFHb9oKdGfgZGJK9EHaAXN4USvnQtAFC13Fh5gGFS5B
220 : hashes per tick: 12,500
221 : hashcnt duration: 500 nanos
222 :
223 : name: update_hashes_per_tick2
224 : id: EWme9uFqfy1ikK1jhJs8fM5hxWnK336QJpbscNtizkTU
225 : hashes per tick: 17,500
226 : hashcnt duration: 357.142857143 nanos
227 :
228 : name: update_hashes_per_tick3
229 : id: 8C8MCtsab5SsfammbzvYz65HHauuUYdbY2DZ4sznH6h5
230 : hashes per tick: 27,500
231 : hashcnt duration: 227.272727273 nanos
232 :
233 : name: update_hashes_per_tick4
234 : id: 8We4E7DPwF2WfAN8tRTtWQNhi98B99Qpuj7JoZ3Aikgg
235 : hashes per tick: 47,500
236 : hashcnt duration: 131.578947368 nanos
237 :
238 : name: update_hashes_per_tick5
239 : id: BsKLKAn1WM4HVhPRDsjosmqSg2J8Tq5xP2s2daDS6Ni4
240 : hashes per tick: 57,500
241 : hashcnt duration: 108.695652174 nanos
242 :
243 : name: update_hashes_per_tick6
244 : id: FKu1qYwLQSiehz644H6Si65U5ZQ2cp9GxsyFUfYcuADv
245 : hashes per tick: 62,500
246 : hashcnt duration: 100 nanos
247 :
248 : In development environments, there is a way to configure the
249 : hashcnt per tick to be "none" during genesis, for a so-called
250 : "low power" tick producer. The idea is not to spin cores during
251 : development. This is equivalent to setting the hashcnt per tick
252 : to be 1, and increasing the hashcnt duration to the desired tick
253 : duration.
254 :
255 : SLOTS
256 :
257 : Each leader needs to be leader for a fixed amount of time, which
258 : is called a slot. During a slot, a leader has an opportunity to
259 : receive transactions and produce a block for the network,
260 : although they may miss ("skip") the slot if they are offline or
261 : not behaving.
262 :
263 : In mainnet-beta, testnet, and devnet a slot is 64 ticks, or
264 : 4,000,000 hashcnts, or approximately 400 milliseconds.
265 :
266 : Due to the way the leader schedule is constructed, each leader
267 : is always given at least four (4) consecutive slots in the
268 : schedule. This means when becoming leader you will be leader
269 : for at least 4 slots, or 1.6 seconds.
270 :
271 : It is rare, but it can happen that a leader gets more than 4
272 : consecutive slots (eg, 8, or 12), if they are lucky with the
273 : leader schedule generation.
274 :
275 : The number of ticks in a slot is fixed at genesis time, and
276 : could be different for development or other chains, which we
277 : support. There is nothing special about 4 leader slots in a
278 : row, and this might be changed in future, and the proof of
279 : history makes no assumptions that this is the case.
280 :
281 : EPOCHS
282 :
283 : Infrequently, the network needs to do certain housekeeping,
284 : mainly things like collecting rent and deciding on the leader
285 : schedule. The length of an epoch is fixed on mainnet-beta,
286 : devnet and testnet at 432,000 slots, or around 2 days (48 hours at 400 milliseconds per slot).
287 : This value is fixed at genesis time, and could be different for
288 : other chains including development, which we support. Typically
289 : in development, epochs are every 8,192 slots, or around 1 hour
290 : (54.61 minutes), although it depends on the number of ticks per
291 : slot and the target hashcnt rate of the genesis as well.
292 :
293 : In development, epochs need not be a fixed length either. There
294 : is a "warmup" option, where epochs start short and grow, which
295 : is useful for quickly warming up stake during development.
296 :
297 : The epoch is important because it is the only time the leader
298 : schedule is updated. The leader schedule is a list of which
299 : validator is leader for which slot, and is generated by a special
300 : algorithm that is deterministic and known to all nodes.
301 :
302 : The leader schedule is computed one epoch in advance, so that
303 : at slot T, we always know who will be leader up until the end
304 : of slot T+EPOCH_LENGTH. Specifically, the leader schedule for
305 : epoch N is computed during the epoch boundary crossing from
306 : N-2 to N-1. For mainnet-beta, the number of slots per epoch is
307 : fixed and will always be 432,000. */
308 :
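The timing relationships in the long comment above reduce to a few multiplications and divisions. Below is a minimal C sketch, assuming the three genesis-level inputs the comment names (tick duration, hashcnts per tick, ticks per slot). The struct and function names are hypothetical, not the tile's actual helpers, although they mirror the precomputed fields slot_duration_ns, hashcnt_duration_ns, and hashcnt_per_slot in fd_pohh_tile below.

```c
#include <assert.h>

/* Hypothetical bundle of genesis-level PoH parameters. */
typedef struct {
  unsigned long tick_duration_ns; /* wall-clock length of a tick, held constant */
  unsigned long hashcnt_per_tick; /* raised by the update_hashes_per_tick* features */
  unsigned long ticks_per_slot;   /* fixed at genesis */
} poh_params_t;

/* Time per hashcnt: the tick duration stays fixed, so adding hashes
   per tick shortens each hashcnt. */
static double
example_hashcnt_duration_ns( poh_params_t const * p ) {
  assert( p->hashcnt_per_tick>0UL );
  return (double)p->tick_duration_ns / (double)p->hashcnt_per_tick;
}

static unsigned long
example_hashcnt_per_slot( poh_params_t const * p ) {
  return p->hashcnt_per_tick * p->ticks_per_slot;
}

static unsigned long
example_slot_duration_ns( poh_params_t const * p ) {
  return p->tick_duration_ns * p->ticks_per_slot;
}

/* Nominal epoch length; observed epochs run longer due to clock drift. */
static unsigned long
example_epoch_duration_ns( poh_params_t const * p, unsigned long slots_per_epoch ) {
  return example_slot_duration_ns( p ) * slots_per_epoch;
}

/* The tick height is a plain counter since genesis, so the slot a tick
   belongs to is a single division. */
static unsigned long
example_tick_height_to_slot( poh_params_t const * p, unsigned long tick_height ) {
  assert( p->ticks_per_slot>0UL );
  return tick_height / p->ticks_per_slot;
}
```

With the mainnet-beta figures from the comment (62,500 hashcnts at 100 ns each, so 6,250,000 ns per tick, and 64 ticks per slot) this gives 100 ns per hashcnt, 4,000,000 hashcnts per slot, and 400 ms per slot. The "low power" development mode is the case hashcnt_per_tick = 1, where a hashcnt lasts a whole tick, and an 8,192-slot development epoch comes out to 3,276.8 seconds, the 54.61 minutes quoted above.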
309 : #include "../../disco/tiles.h"
310 : #include "../../disco/fd_txn_m.h"
311 : #include "../../disco/bundle/fd_bundle_crank.h"
312 : #include "../../disco/pack/fd_pack.h"
313 : #include "../../disco/pack/fd_pack_cost.h"
314 : #include "../../ballet/sha256/fd_sha256.h"
315 : #include "../../disco/metrics/fd_metrics.h"
316 : #include "../../util/pod/fd_pod.h"
317 : #include "../../disco/shred/fd_shredder.h"
318 : #include "../../disco/keyguard/fd_keyload.h"
319 : #include "../../disco/keyguard/fd_keyswitch.h"
320 : #include "../plugin/fd_plugin.h"
321 : #include "../../flamenco/leaders/fd_multi_epoch_leaders.h"
322 :
323 : #include <string.h>
324 :
325 : /* The maximum number of microblocks that pack is allowed to pack into a
326 : single slot. This is not consensus critical, and pack could, if we
327 : let it, produce as many microblocks as it wants, and the slot would
328 : still be valid.
329 :
330 : We have this here instead so that PoH can estimate slot completion,
331 : and keep the hashcnt up to date as pack progresses through packing
332 : the slot. If this upper bound were not enforced, PoH could tick to
333 : the last hash of the slot and have no hashes left to mixin incoming
334 : microblocks from pack, so this upper bound is a coordination
335 : mechanism so that PoH can progress hashcnts while the slot is active,
336 : and know that pack will not need those hashcnts later to do mixins. */
337 0 : #define MAX_MICROBLOCKS_PER_SLOT (131072UL)
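The top-of-file comment describes the chain as hash(hash(...)) with periodic ticks and microblocks mixed in, and the limit above exists so that hashcnts remain for those mixins. Below is a toy C model of that structure, under loud assumptions: a splitmix64-style 64-bit mixer stands in for SHA-256, a mixin is modeled as hashing the state XORed with a hashed microblock digest rather than the real mixin construction, and the rule that mixins cannot land on a tick boundary is not enforced. All names are hypothetical. What it does illustrate is that each entry records how many hashes it advanced plus the resulting state, so a verifier can check every entry starting from its predecessor's recorded hash, which is what lets verification spread across cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STAND-IN for SHA-256: the splitmix64 finalizer, a bijective 64-bit
   mixer.  Not cryptographic. */
static uint64_t
toy_hash( uint64_t x ) {
  x += 0x9E3779B97F4A7C15ULL;
  x  = ( x ^ ( x>>30 ) ) * 0xBF58476D1CE4E5B9ULL;
  x  = ( x ^ ( x>>27 ) ) * 0x94D049BB133111EBULL;
  return x ^ ( x>>31 );
}

/* STAND-IN mixin: fold a microblock digest into the state, then hash. */
static uint64_t
toy_mixin( uint64_t state, uint64_t digest ) {
  return toy_hash( state ^ toy_hash( digest ) );
}

typedef struct {
  uint64_t num_hashes; /* hashes since the previous entry (mixin included) */
  int      has_mixin;  /* 0 for a tick, 1 for a microblock */
  uint64_t digest;     /* microblock digest, ignored for ticks */
  uint64_t hash;       /* chain state recorded after this entry */
} toy_entry_t;

/* Recompute one entry's final state from the state before it. */
static uint64_t
toy_replay_entry( uint64_t start, toy_entry_t const * e ) {
  assert( e->num_hashes>=1UL );
  uint64_t plain = e->has_mixin ? e->num_hashes-1UL : e->num_hashes;
  uint64_t s     = start;
  for( uint64_t i=0UL; i<plain; i++ ) s = toy_hash( s );
  if( e->has_mixin ) s = toy_mixin( s, e->digest );
  return s;
}

/* Producer side: advance *state and record the entry. */
static toy_entry_t
toy_record( uint64_t * state, uint64_t num_hashes, int has_mixin, uint64_t digest ) {
  toy_entry_t e = { num_hashes, has_mixin, digest, 0UL };
  e.hash = toy_replay_entry( *state, &e );
  *state = e.hash;
  return e;
}

/* Verifier side.  Iteration i reads only entry i-1's recorded hash, so
   the iterations are independent and could run on separate cores. */
static int
toy_verify( uint64_t start, toy_entry_t const * entries, size_t cnt ) {
  for( size_t i=0UL; i<cnt; i++ ) {
    uint64_t prev = i ? entries[ i-1UL ].hash : start;
    if( toy_replay_entry( prev, &entries[ i ] )!=entries[ i ].hash ) return 0;
  }
  return 1;
}
```

Changing the digest recorded in any microblock entry makes the check for that entry fail, because the stand-in hash is a bijection and so distinct digests always yield distinct states here.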
338 :
339 : /* When we are hashing in the background in case a prior leader skips
340 : their slot, we need to store the result of each tick hash so we can
341 : publish them when we become leader. The network requires at least
342 : one leader slot to publish in each epoch for the leader schedule to
343 : generate, so in the worst case we might need two full epochs of slots
344 : to store the hashes. (Eg, if epoch T only had a published slot in
345 : position 0 and epoch T+1 only had a published slot right at the end).
346 :
347 : There is a tighter bound: the block data limit of mainnet-beta is
348 : currently FD_PACK_MAX_DATA_PER_BLOCK, or 27,332,342 bytes per slot.
349 : At 48 bytes per tick, it is not possible to publish a slot that skips
350 : 569,424 or more prior ticks. */
351 0 : #define MAX_SKIPPED_TICKS (1UL+(FD_PACK_MAX_DATA_PER_BLOCK/48UL))
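The bound in the comment above is plain integer arithmetic and can be checked directly. A small C sketch follows, assuming the 27,332,342-byte block limit and 48 bytes per tick quoted in the comment (both may change). The macro and function names are hypothetical stand-ins, not FD_PACK_MAX_DATA_PER_BLOCK or MAX_SKIPPED_TICKS themselves.

```c
#include <assert.h>

/* Values quoted in the comment above; assumptions, not live constants. */
#define EXAMPLE_MAX_DATA_PER_BLOCK (27332342UL)
#define EXAMPLE_BYTES_PER_TICK     (48UL)
#define EXAMPLE_HASH_SZ            (32UL)

/* Same shape as MAX_SKIPPED_TICKS: one more than the number of whole
   ticks whose entries fit in one block. */
static unsigned long
example_max_skipped_ticks( unsigned long max_data, unsigned long bytes_per_tick ) {
  assert( bytes_per_tick>0UL );
  return 1UL + max_data/bytes_per_tick;
}

/* Storage for one 32-byte hash per skipped tick, as in skipped_tick_hashes. */
static unsigned long
example_skipped_hash_bytes( unsigned long ticks ) {
  return ticks*EXAMPLE_HASH_SZ;
}

/* Whole slots covered by a run of ticks. */
static unsigned long
example_ticks_to_slots( unsigned long ticks, unsigned long ticks_per_slot ) {
  assert( ticks_per_slot>0UL );
  return ticks/ticks_per_slot;
}
```

For those inputs the bound is 569,424 ticks, roughly 8,897 slots at 64 ticks per slot, and the skipped_tick_hashes array sized by it occupies 18,221,568 bytes (about 17.4 MiB).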
352 :
353 0 : #define IN_KIND_BANK (0)
354 0 : #define IN_KIND_PACK (1)
355 0 : #define IN_KIND_EPOCH (2)
356 :
357 :
358 : struct fd_pohh_in {
359 : fd_wksp_t * mem;
360 : ulong chunk0;
361 : ulong wmark;
362 : };
363 :
364 : typedef struct fd_pohh_in fd_pohh_in_t;
365 :
366 : struct fd_pohh_out {
367 : ulong idx;
368 : fd_wksp_t * mem;
369 : ulong chunk0;
370 : ulong wmark;
371 : ulong chunk;
372 : };
373 :
374 : typedef struct fd_pohh_out fd_pohh_out_t;
375 :
376 : struct fd_pohh_tile {
377 : fd_stem_context_t * stem;
378 :
379 : /* Static configuration determined at genesis creation time. See
380 : long comment above for more information. */
381 : ulong tick_duration_ns;
382 : ulong hashcnt_per_tick;
383 : ulong ticks_per_slot;
384 :
385 : /* Derived from the above configuration, but we precompute it. */
386 : double slot_duration_ns;
387 : double hashcnt_duration_ns;
388 : ulong hashcnt_per_slot;
389 :
390 : /* The maximum number of real microblocks that the pack tile is
391 : allowed to publish in each slot.
392 :
393 : While we are leader, PoH internally treats this limit as having
394 : one extra phantom "microblock" reserved for the done_packing
395 : message, so that PoH does not finish the slot before pack
396 : confirms it is done. Pack itself is configured with the
397 : un-inflated limit and never publishes more than this many real
398 : microblocks per slot. */
399 : ulong max_microblocks_per_slot;
400 :
401 : /* Consensus-critical slot cost limits. */
402 : struct {
403 : ulong slot_max_cost;
404 : ulong slot_max_vote_cost;
405 : ulong slot_max_write_cost_per_acct;
406 : } limits;
407 :
408 : /* The current slot and hashcnt within that slot of the proof of
409 : history, including hashes we have been producing in the background
410 : while waiting for our next leader slot. */
411 : ulong slot;
412 : ulong hashcnt;
413 : ulong cus_used;
414 :
415 : /* When we send a microblock on to the shred tile, we need to tell
416 : it how many hashes there have been since the last microblock, so
417 : this tracks the hashcnt of the last published microblock.
418 :
419 : If we are skipping slots prior to our leader slot, the last_slot
420 : will be quite old, and potentially much larger than the number of
421 : hashcnts in one slot. */
422 : ulong last_slot;
423 : ulong last_hashcnt;
424 :
425 : /* If we have published a tick or a microblock for a particular slot
426 : to the shred tile, we should never become leader for that slot
427 : again, otherwise we could publish a duplicate block.
428 :
429 : This value tracks the max slot that we have published a tick or
430 : microblock for so we can prevent this. */
431 : ulong highwater_leader_slot;
432 :
433 : /* See how this field is used below. If we have sequential leader
434 : slots, we don't reset the expected slot end time between the two,
435 : to prevent clock drift. If we didn't do this, our 2nd slot would
436 : end 400ms + `time_for_replay_to_move_slot_and_reset_poh` after
437 : our 1st, rather than just strictly 400ms. */
438 : int lagged_consecutive_leader_start;
439 : ulong expect_sequential_leader_slot;
440 :
441 : /* There's a race condition. Say there are two banks, A and B. Bank A
442 : processes some transactions, then releases the account locks, and
443 : sends the microblock to PoH to be stamped. Pack now re-packs the
444 : same accounts with a new microblock and sends it to bank B, bank B
445 : executes and sends the microblock to PoH, and this all happens fast
446 : enough that PoH picks the 2nd microblock to stamp before the 1st. The
447 : accounts database changes now are misordered with respect to PoH so
448 : replay could fail.
449 :
450 : To prevent this race, we order all microblocks and only process
451 : them in PoH in the order they are produced by pack. This is a
452 : little bit over-strict, we just need to ensure that microblocks
453 : with conflicting accounts execute in order, but this is easiest to
454 : implement for now. */
455 : uint expect_pack_idx;
456 :
457 : /* Pack and bank tiles need a reference to the bank object with a
458 : slightly different lifetime than current_leader_bank, particularly
459 : when we switch forks in the middle of a leader slot. We need to
460 : make sure we don't free the last reference to the bank while the
461 : pack or bank tiles are still using it. The strange thing is that
462 : bank tiles have no concept of the current slot, but we know they're
463 : done with the bank object when pack's inter-slot bank draining
464 : process is complete. Pack notifies PoH by a frag with
465 : sig==ULONG_MAX on the pack_poh link when the banks are drained, and
466 : the PoH tile must then free the reference on behalf of pack.
467 :
468 : pack_leader_bank is non-NULL when the reference we're holding on
469 : behalf of the pack tile is acquired, and NULL when it is not
470 : acquired. */
471 : void const * pack_leader_bank;
472 :
473 : /* Store tile needs another reference to the bank object with its
474 : own lifetime requirements. We need to set the block_id of a
475 : slot to the merkle root of the last FEC set in the slot. We
476 : also want to make sure that we correctly release the reference
477 : in case we abandon a slot midway and never send a SLOT_COMPLETE.
478 : For the latter part, we track the slot we acquired the bank for
479 : so that we can signal to release it if we have moved on from
480 : that slot. Since poh does not have a direct link to store,
481 : we pass the bank pointer using the poh_shred -> shred_store
482 : route.*/
483 : void const * store_leader_bank;
484 : ulong store_leader_bank_slot;
485 :
486 : /* The PoH tile must never drop microblocks that get committed by the
487 : bank, so it needs to always be able to mixin a microblock hash.
488 : Mixing in requires incrementing the hashcnt, so we need to ensure
489 : at all times that there are enough hashcnts left in the slot to
490 : mixin whatever future microblocks pack might produce for it.
491 :
492 : This value tracks that. At any time, max_microblocks_per_slot
493 : - microblocks_lower_bound is an upper bound on the maximum number
494 : of microblocks that might still be received in this slot. */
495 : ulong microblocks_lower_bound;
496 :
497 : uchar __attribute__((aligned(32UL))) reset_hash[ 32 ];
498 : uchar __attribute__((aligned(32UL))) hash[ 32 ];
499 :
500 : /* When we are not leader, we need to save the hashes that were
501 : produced in case the prior leader skips. If they skip, we will
502 : replay these skipped hashes into our next leader bank so that
503 : the slot hashes sysvar can be updated correctly, and also publish
504 : them to peer nodes as part of our outgoing shreds. */
505 : uchar skipped_tick_hashes[ MAX_SKIPPED_TICKS ][ 32 ];
506 :
507 : /* The timestamp in nanoseconds of when the reset slot was received.
508 : This is the timestamp we are building on top of to determine when
509 : our next leader slot starts. */
510 : long reset_slot_start_ns;
511 :
512 : /* The timestamp in nanoseconds of when we got the bank for the
513 : current leader slot. */
514 : long leader_bank_start_ns;
515 :
516 : /* The slot number of the current reset slot. */
517 : ulong reset_slot;
518 :
519 : /* The slot number of our next leader slot, or ULONG_MAX if
520 : we have no known next leader slot. */
521 : ulong next_leader_slot;
522 :
523 : /* If an in progress frag should be skipped */
524 : int skip_frag;
525 :
526 : ulong max_active_descendant;
527 :
528 : /* If we currently are the leader according the clock AND we have
529 : received the leader bank for the slot from the replay stage,
530 : this value will be non-NULL.
531 :
532 : Note that we might be inside our leader slot, but not have a bank
533 : yet, in which case this will still be NULL.
534 :
535 : It will be NULL for a brief race period between consecutive leader
536 : slots, as we ping-pong back to replay stage waiting for a new bank.
537 :
538 : Agave refers to this as the "working bank". */
539 : void const * current_leader_bank;
540 :
541 : fd_sha256_t * sha256;
542 :
543 : fd_multi_epoch_leaders_t * mleaders;
544 :
545 : /* The last sequence number of an outgoing fragment to the shred tile,
546 : or ULONG_MAX if no such fragment. See fd_keyswitch.h for details
547 : of how this is used. */
548 : ulong shred_seq;
549 :
550 : int halted_switching_key;
551 :
552 : fd_keyswitch_t * keyswitch;
553 : fd_pubkey_t identity_key;
554 :
555 : /* We need a few pieces of information to compute the right addresses
556 : for bundle crank information that we need to send to pack. */
557 : struct {
558 : int enabled;
559 : fd_pubkey_t vote_account;
560 : fd_bundle_crank_gen_t gen[1];
561 : } bundle;
562 :
563 :
564 : /* The Agave client needs to be notified when the leader changes,
565 : so that it can resume the replay stage if it was suspended waiting. */
566 : void * signal_leader_change;
567 :
568 : /* These are temporarily set in during_frag so they can be used in
569 : after_frag once the frag has been validated as not overrun. */
570 : uchar _txns[ USHORT_MAX ];
571 : fd_microblock_trailer_t _microblock_trailer[ 1 ];
572 :
573 : int in_kind[ 64 ];
574 : fd_pohh_in_t in[ 64 ];
575 :
576 : fd_pohh_out_t shred_out[ 1 ];
577 : fd_pohh_out_t pack_out[ 1 ];
578 : fd_pohh_out_t plugin_out[ 1 ];
579 :
580 : fd_histf_t begin_leader_delay[ 1 ];
581 : fd_histf_t first_microblock_delay[ 1 ];
582 : fd_histf_t slot_done_delay[ 1 ];
583 : fd_histf_t bundle_init_delay[ 1 ];
584 :
585 : ulong features_activation_avail;
586 : fd_shred_features_activation_t features_activation[1];
587 :
588 : ulong parent_slot;
589 : uchar parent_block_id[ 32 ];
590 :
591 : uchar __attribute__((aligned(FD_MULTI_EPOCH_LEADERS_ALIGN))) mleaders_mem[ FD_MULTI_EPOCH_LEADERS_FOOTPRINT ];
592 : };
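The microblocks_lower_bound field and the phantom done_packing "microblock" described in the struct together cap how far PoH can hash ahead inside a leader slot. The C helper below is a hypothetical, simplified version of that cap. It assumes each microblock pack might still send costs exactly one hashcnt for its mixin, and it ignores the rule that mixins cannot fall on a tick boundary, so a real implementation would need to reserve somewhat more.

```c
#include <assert.h>

/* Hypothetical: highest hashcnt PoH may reach in a leader slot while
   still leaving one hashcnt for each microblock pack might send.
   Simplification: one hashcnt per mixin, tick-boundary rule ignored. */
static unsigned long
example_max_hashcnt( unsigned long hashcnt_per_slot,
                     unsigned long max_microblocks,   /* includes the phantom done_packing entry */
                     unsigned long microblocks_lower_bound ) {
  assert( microblocks_lower_bound<=max_microblocks );
  unsigned long remaining = max_microblocks - microblocks_lower_bound;
  /* If pack could still send at least as many microblocks as the slot
     has hashcnts, PoH may not advance at all. */
  if( remaining>=hashcnt_per_slot ) return 0UL;
  return hashcnt_per_slot - remaining;
}
```

With 4,000,000 hashcnts per slot and a limit of 131,072 real microblocks plus the phantom entry, this model lets PoH hash up to 3,868,927 at the start of the slot, up to 3,999,999 once all real microblocks are accounted for, and to the end of the slot only after the done_packing message arrives, which is the behaviour the phantom entry is there to guarantee.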
593 :
594 : typedef struct fd_pohh_tile fd_pohh_tile_t;
595 :
596 : /* The PoH recorder is implemented in Firedancer but for now needs to
597 : work with Agave, so we have a locking scheme for them to
598 : co-operate.
599 :
600 : This is because the PoH tile lives in the Agave memory address
601 : space and their version of concurrency is locking the PoH recorder
602 : and reading arbitrary fields.
603 :
604 : So we allow them to lock the PoH tile, although with a very bad (for
605 : them) locking scheme. By default, the tile has full and exclusive
606 : access to the data. If part of Agave wishes to read/write they
607 : can either,
608 :
609 : 1. Rewrite their concurrency to message passing based on mcache
610 : (preferred, but not feasible).
611 : 2. Signal to the tile they wish to acquire the lock, by setting
612 : fd_poh_waiting_lock to 1.
613 :
614 : During after_credit, the tile will check if the waiting lock is set
615 : to 1, and if so, set the returned lock to 1, indicating to the waiter
616 : that they may now proceed.
617 :
618 : When the waiter is done reading and writing, they restore the
619 : returned lock value back to zero, and the PoH tile continues with its
620 : day. */
621 :
622 : static fd_pohh_tile_t * fd_pohh_global_ctx;
623 :
624 : static volatile ulong fd_poh_waiting_lock __attribute__((aligned(128UL)));
625 : static volatile ulong fd_poh_returned_lock __attribute__((aligned(128UL)));
626 :
627 : /* Agave also needs to write to some mcaches, so we trampoline
628 : that via the PoH tile as well. */
629 :
630 : struct poh_link {
631 : fd_frag_meta_t * mcache;
632 : ulong depth;
633 : ulong tx_seq;
634 :
635 : void * mem;
636 : void * dcache;
637 : ulong chunk0;
638 : ulong wmark;
639 : ulong chunk;
640 :
641 : ulong cr_avail;
642 : ulong rx_cnt;
643 : ulong * rx_fseqs[ 32UL ];
644 : };
645 :
646 : typedef struct poh_link poh_link_t;
647 :
648 : static poh_link_t gossip_dedup;
649 : static poh_link_t stake_out;
650 : static poh_link_t crds_shred;
651 : static poh_link_t replay_resolh;
652 : static poh_link_t executed_txn;
653 :
654 : static poh_link_t replay_plugin;
655 : static poh_link_t gossip_plugin;
656 : static poh_link_t start_progress_plugin;
657 : static poh_link_t vote_listener_plugin;
658 : static poh_link_t validator_info_plugin;
659 :
660 : static void
661 0 : poh_link_wait_credit( poh_link_t * link ) {
662 0 : if( FD_LIKELY( link->cr_avail ) ) return;
663 :
664 0 : while( 1 ) {
665 0 : ulong cr_query = ULONG_MAX;
666 0 : for( ulong i=0UL; i<link->rx_cnt; i++ ) {
667 0 : ulong const * _rx_seq = link->rx_fseqs[ i ];
668 0 : ulong rx_seq = FD_VOLATILE_CONST( *_rx_seq );
669 0 : ulong rx_cr_query = (ulong)fd_long_max( (long)link->depth - fd_long_max( fd_seq_diff( link->tx_seq, rx_seq ), 0L ), 0L );
670 0 : cr_query = fd_ulong_min( rx_cr_query, cr_query );
671 0 : }
672 0 : if( FD_LIKELY( cr_query>0UL ) ) {
673 0 : link->cr_avail = cr_query;
674 0 : break;
675 0 : }
676 0 : FD_SPIN_PAUSE();
677 0 : }
678 0 : }
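poh_link_wait_credit computes flow-control credits: for each reliable consumer, the link depth minus the number of frags that consumer has not yet caught up on, taking the minimum over consumers. The same arithmetic is isolated below as a standalone C sketch. The helper names are hypothetical, and example_seq_diff is a local reimplementation of the wrapping signed difference the original gets from fd_seq_diff, not the fd_util function itself.

```c
#include <assert.h>
#include <limits.h>

/* Wrapping signed distance a-b between 64-bit sequence numbers
   (local stand-in for fd_seq_diff). */
static long
example_seq_diff( unsigned long a, unsigned long b ) {
  return (long)( a-b );
}

/* Credits against one consumer: depth minus frags in flight, clamped so
   a consumer that appears ahead gives the full depth and a consumer a
   full depth behind gives zero. */
static unsigned long
example_consumer_credits( unsigned long depth, unsigned long tx_seq, unsigned long rx_seq ) {
  long in_flight = example_seq_diff( tx_seq, rx_seq );
  if( in_flight<0L ) in_flight = 0L;
  long cr = (long)depth - in_flight;
  return cr>0L ? (unsigned long)cr : 0UL;
}

/* Producer credits: the slowest consumer wins.  With no reliable
   consumers the result is ULONG_MAX, matching the initial value of
   cr_query in poh_link_wait_credit. */
static unsigned long
example_link_credits( unsigned long depth, unsigned long tx_seq,
                      unsigned long const * rx_seqs, unsigned long rx_cnt ) {
  assert( rx_cnt==0UL || rx_seqs );
  unsigned long cr = ULONG_MAX;
  for( unsigned long i=0UL; i<rx_cnt; i++ ) {
    unsigned long c = example_consumer_credits( depth, tx_seq, rx_seqs[ i ] );
    if( c<cr ) cr = c;
  }
  return cr;
}
```

poh_link_wait_credit spins until this value is nonzero and caches it in cr_avail, and poh_link_publish spends one credit per frag, so the scan over consumer fseqs only runs again once the cached credits are exhausted.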
679 :
680 : static void
681 : poh_link_publish( poh_link_t * link,
682 : ulong sig,
683 : uchar const * data,
684 0 : ulong data_sz ) {
685 0 : while( FD_UNLIKELY( !FD_VOLATILE_CONST( link->mcache ) ) ) FD_SPIN_PAUSE();
686 0 : if( FD_UNLIKELY( !link->mem ) ) return; /* link not enabled, don't publish */
687 0 : poh_link_wait_credit( link );
688 :
689 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( link->mem, link->chunk );
690 0 : fd_memcpy( dst, data, data_sz );
691 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
692 0 : fd_mcache_publish( link->mcache, link->depth, link->tx_seq, sig, link->chunk, data_sz, 0UL, 0UL, tspub );
693 0 : link->chunk = fd_dcache_compact_next( link->chunk, data_sz, link->chunk0, link->wmark );
694 0 : link->cr_avail--;
695 0 : link->tx_seq++;
696 0 : }
697 :
698 : static void
699 : poh_link_init( poh_link_t * link,
700 : fd_topo_t const * topo,
701 : fd_topo_tile_t const * tile,
702 0 : ulong out_idx ) {
703 0 : fd_topo_link_t const * topo_link = &topo->links[ tile->out_link_id[ out_idx ] ];
704 0 : fd_topo_wksp_t const * wksp = &topo->workspaces[ topo->objs[ topo_link->dcache_obj_id ].wksp_id ];
705 :
706 0 : link->mem = wksp->wksp;
707 0 : link->depth = fd_mcache_depth( topo_link->mcache );
708 0 : link->tx_seq = 0UL;
709 0 : link->dcache = topo_link->dcache;
710 0 : link->chunk0 = fd_dcache_compact_chunk0( wksp->wksp, topo_link->dcache );
711 0 : link->wmark = fd_dcache_compact_wmark ( wksp->wksp, topo_link->dcache, topo_link->mtu );
712 0 : link->chunk = link->chunk0;
713 0 : link->cr_avail = 0UL;
714 0 : link->rx_cnt = 0UL;
715 0 : for( ulong i=0UL; i<topo->tile_cnt; i++ ) {
716 0 : fd_topo_tile_t const * _tile = &topo->tiles[ i ];
717 0 : for( ulong j=0UL; j<_tile->in_cnt; j++ ) {
718 0 : if( _tile->in_link_id[ j ]==topo_link->id && _tile->in_link_reliable[ j ] ) {
719 0 : FD_TEST( link->rx_cnt<32UL );
720 0 : link->rx_fseqs[ link->rx_cnt++ ] = _tile->in_link_fseq[ j ];
721 0 : break;
722 0 : }
723 0 : }
724 0 : }
725 0 : FD_COMPILER_MFENCE();
726 0 : link->mcache = topo_link->mcache;
727 0 : FD_COMPILER_MFENCE();
728 0 : FD_TEST( link->mcache );
729 0 : }
730 :
731 : /* To help show correctness, functions that might be called from
732 : Rust, either directly or indirectly, have this fake "attribute"
733 : CALLED_FROM_RUST, which is actually nothing. Calls from Rust
734 : typically execute on threads that did not call fd_boot, so they do not
735 : have the typical FD_TL variables. In particular, they cannot use
736 : normal metrics, and their log messages don't have full context.
737 : Additionally, Rust functions marked CALLED_FROM_RUST cannot call back
738 : into a C fd_ext function without causing a deadlock (although the
739 : other Rust fd_ext functions have a similar problem).
740 :
741 : To prevent annotation from polluting the whole codebase, calls to
742 : functions outside this file are manually checked and marked as being
743 : safe at each call rather than annotated. */
744 : #define CALLED_FROM_RUST
745 :
746 : static CALLED_FROM_RUST fd_pohh_tile_t *
747 0 : fd_ext_poh_write_lock( void ) {
748 0 : for(;;) {
749 : /* Acquire the waiter lock to make sure we are the first writer in the queue. */
750 0 : if( FD_LIKELY( !FD_ATOMIC_CAS( &fd_poh_waiting_lock, 0UL, 1UL) ) ) break;
751 0 : FD_SPIN_PAUSE();
752 0 : }
753 0 : FD_COMPILER_MFENCE();
754 0 : for(;;) {
755 : /* Now wait for the tile to tell us we can proceed. */
756 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
757 0 : FD_SPIN_PAUSE();
758 0 : }
759 0 : FD_COMPILER_MFENCE();
760 0 : return fd_pohh_global_ctx;
761 0 : }
762 :
763 : static CALLED_FROM_RUST void
764 0 : fd_ext_poh_write_unlock( void ) {
765 0 : FD_COMPILER_MFENCE();
766 0 : FD_VOLATILE( fd_poh_returned_lock ) = 0UL;
767 0 : }
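fd_ext_poh_write_lock implements a two-flag handoff rather than a mutex: a Rust-side caller first wins fd_poh_waiting_lock with a CAS, then spins until the tile sets fd_poh_returned_lock, and fd_ext_poh_write_unlock hands control back by clearing fd_poh_returned_lock. The tile side of the protocol is not part of this excerpt, so tile_poll below is an assumption about how it could respond: grant access, wait for the state to be handed back, then release the waiter lock. The sketch uses C11 stdatomic.h and POSIX threads in place of FD_ATOMIC_CAS and FD_VOLATILE; every name in it is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_ulong  waiting_lock   = 0UL; /* held by the writer first in line      */
static atomic_ulong  returned_lock  = 0UL; /* set by the tile to grant access       */
static atomic_int    stop           = 0;
static unsigned long shared_counter = 0UL; /* stands in for the tile's private state */

static void
writer_lock( void ) {
  for(;;) { /* become the first writer in the queue */
    unsigned long expected = 0UL;
    if( atomic_compare_exchange_weak( &waiting_lock, &expected, 1UL ) ) break;
    sched_yield();
  }
  while( !atomic_load( &returned_lock ) ) sched_yield(); /* wait for the tile's grant */
}

static void
writer_unlock( void ) {
  atomic_store( &returned_lock, 0UL ); /* hand the state back to the tile */
}

/* One iteration of the (assumed) tile loop: park at a safe point if a
   writer is waiting, then let the next writer queue up. */
static void
tile_poll( void ) {
  if( atomic_load( &waiting_lock ) ) {
    atomic_store( &returned_lock, 1UL );
    while( atomic_load( &returned_lock ) ) sched_yield();
    atomic_store( &waiting_lock, 0UL );
  }
}

static void *
tile_main( void * arg ) {
  (void)arg;
  while( !atomic_load( &stop ) ) tile_poll();
  return NULL;
}

static void *
writer_main( void * arg ) {
  int iters = *(int *)arg;
  for( int i=0; i<iters; i++ ) {
    writer_lock();
    shared_counter++; /* exclusive: the tile is parked and no other writer holds waiting_lock */
    writer_unlock();
  }
  return NULL;
}

/* Run writer_cnt writers against one tile thread and return the final count. */
static unsigned long
run_demo( int writer_cnt, int iters ) {
  pthread_t tile, writers[ 8 ];
  assert( writer_cnt<=8 );
  shared_counter = 0UL;
  atomic_store( &stop, 0 );
  pthread_create( &tile, NULL, tile_main, NULL );
  for( int i=0; i<writer_cnt; i++ ) pthread_create( &writers[ i ], NULL, writer_main, &iters );
  for( int i=0; i<writer_cnt; i++ ) pthread_join( writers[ i ], NULL );
  atomic_store( &stop, 1 );
  pthread_join( tile, NULL );
  return shared_counter;
}
```

The design lets the tile decide where it can safely be interrupted, so Rust callers never observe its state mid-update, at the cost of writers spinning until the tile next polls.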
768 :
769 : /* The PoH tile needs to interact with the Agave address space to
770 : do certain operations that Firedancer hasn't reimplemented yet, namely
771 : transaction execution. We have Agave export some wrapper
772 : functions that we call into during regular tile execution. These do
773 : not need any locking, since they are called serially from the single
774 : PoH tile. */
775 :
776 : extern CALLED_FROM_RUST void fd_ext_bank_acquire( void const * bank );
777 : extern CALLED_FROM_RUST void fd_ext_bank_release( void const * bank );
778 : extern CALLED_FROM_RUST void fd_ext_poh_signal_leader_change( void * sender );
779 : extern void fd_ext_poh_register_tick( void const * bank, uchar const * hash );
780 :
781 : /* fd_ext_poh_initialize is called by Agave on startup to
782 : initialize the PoH tile with some static configuration, and the
783 : initial reset slot and hash which it retrieves from a snapshot.
784 :
785 : This function is called by an arbitrary Agave thread, but
786 : it blocks booting of the PoH tile. The tile will spin until it
787 : determines that this initialization has happened.
788 :
789 : signal_leader_change is an opaque Rust object that is used to
790 : tell the replay stage that the leader has changed. It is a
791 : Box::into_raw(Arc::increment_strong(crossbeam::Sender)), so it
792 : has infinite lifetime unless this C code releases the refcnt.
793 :
794 : It can be used with `fd_ext_poh_signal_leader_change` which
795 : will just issue a nonblocking send on the channel. */
796 :
797 : CALLED_FROM_RUST void
798 : fd_ext_poh_initialize( ulong tick_duration_ns, /* See clock comments above, will be 6.25 milliseconds (400ms/64) for mainnet-beta. */
799 : ulong hashcnt_per_tick, /* See clock comments above, will be 62,500 for mainnet-beta. */
800 : ulong ticks_per_slot, /* See clock comments above, will almost always be 64. */
801 : ulong tick_height, /* The counter (height) of the tick to start hashing on top of. */
802 : uchar const * last_entry_hash, /* Points to start of a 32 byte region of memory, the hash itself at the tick height. */
803 0 : void * signal_leader_change /* See comment above. */ ) {
804 0 : FD_COMPILER_MFENCE();
805 0 : for(;;) {
806 : /* Make sure the ctx is initialized before trying to take the lock. */
807 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_pohh_global_ctx ) ) ) break;
808 0 : FD_SPIN_PAUSE();
809 0 : }
810 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
811 :
812 0 : ctx->slot = tick_height/ticks_per_slot;
813 0 : ctx->hashcnt = 0UL;
814 0 : ctx->cus_used = 0UL;
815 0 : ctx->last_slot = ctx->slot;
816 0 : ctx->last_hashcnt = 0UL;
817 0 : ctx->reset_slot = ctx->slot;
818 0 : ctx->reset_slot_start_ns = fd_log_wallclock(); /* safe to call from Rust */
819 :
820 0 : memcpy( ctx->reset_hash, last_entry_hash, 32UL );
821 0 : memcpy( ctx->hash, last_entry_hash, 32UL );
822 :
823 0 : ctx->signal_leader_change = signal_leader_change;
824 :
825 : /* Static configuration about the clock. */
826 0 : ctx->tick_duration_ns = tick_duration_ns;
827 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
828 0 : ctx->ticks_per_slot = ticks_per_slot;
829 :
830 : /* Recompute derived information about the clock. */
831 0 : ctx->slot_duration_ns = (double)ticks_per_slot*(double)tick_duration_ns;
832 0 : ctx->hashcnt_duration_ns = (double)tick_duration_ns/(double)hashcnt_per_tick;
833 0 : ctx->hashcnt_per_slot = ticks_per_slot*hashcnt_per_tick;
834 :
835 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
836 : /* Low power producer, maximum of one microblock per tick in the slot */
837 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
838 0 : } else {
839 : /* See the long comment in after_credit for this limit */
840 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
841 0 : }
842 :
843 0 : fd_ext_poh_write_unlock();
844 0 : }
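fd_ext_poh_initialize derives the slot duration, per-hash duration, hashes per slot, and a microblock cap from the three static clock parameters. The sketch below repeats that arithmetic as a standalone function. clock_cfg_t and derive_clock are hypothetical names, and the max_microblocks_cap parameter stands in for MAX_MICROBLOCKS_PER_SLOT, whose value is defined elsewhere and is not assumed here.

```c
#include <assert.h>

/* Hypothetical struct holding the derived clock fields. */
typedef struct {
  double        slot_duration_ns;
  double        hashcnt_duration_ns;
  unsigned long hashcnt_per_slot;
  unsigned long max_microblocks_per_slot;
} clock_cfg_t;

static clock_cfg_t
derive_clock( unsigned long tick_duration_ns,
              unsigned long hashcnt_per_tick,
              unsigned long ticks_per_slot,
              unsigned long max_microblocks_cap ) { /* stands in for MAX_MICROBLOCKS_PER_SLOT */
  assert( hashcnt_per_tick>0UL );
  clock_cfg_t c;
  c.slot_duration_ns    = (double)ticks_per_slot*(double)tick_duration_ns;
  c.hashcnt_duration_ns = (double)tick_duration_ns/(double)hashcnt_per_tick;
  c.hashcnt_per_slot    = ticks_per_slot*hashcnt_per_tick;
  if( hashcnt_per_tick==1UL ) {
    /* Low power producer: at most one microblock per tick. */
    c.max_microblocks_per_slot = ticks_per_slot;
  } else {
    /* At most one microblock per non-tick hash, capped. */
    unsigned long limit = ticks_per_slot*(hashcnt_per_tick-1UL);
    c.max_microblocks_per_slot = limit<max_microblocks_cap ? limit : max_microblocks_cap;
  }
  return c;
}
```

With the mainnet-beta values of 6,250,000 ns per tick, 62,500 hashes per tick, and 64 ticks per slot, this gives a 400,000,000 ns slot, 100 ns per hash, and 4,000,000 hashes per slot.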
845 :
846 : /* fd_ext_poh_acquire_bank gets the current leader bank if there is one
847 : currently active. PoH might think we are leader without having a
848 : leader bank if the replay stage has not yet noticed we are leader.
849 :
850 : The bank that is returned is owned by the caller, and must be converted
851 : to an Arc<Bank> by calling Arc::from_raw() on it. PoH increments the
852 : reference count before returning the bank, so that it can also keep
853 : its internal copy.
854 :
855 : If there is no leader bank, NULL is returned. In this case, the
856 : caller should not call `Arc::from_raw()`. */
857 :
858 : CALLED_FROM_RUST void const *
859 0 : fd_ext_poh_acquire_leader_bank( void ) {
860 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
861 0 : void const * bank = NULL;
862 0 : if( FD_LIKELY( ctx->current_leader_bank ) ) {
863 : /* Clone refcount before we release the lock. */
864 0 : fd_ext_bank_acquire( ctx->current_leader_bank );
865 0 : bank = ctx->current_leader_bank;
866 0 : }
867 0 : fd_ext_poh_write_unlock();
868 0 : return bank;
869 0 : }
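fd_ext_poh_acquire_leader_bank increments the bank's reference count while still holding the write lock, so the reference it returns stays valid even if the tile releases current_leader_bank right after the unlock. The sketch below shows the same clone-under-lock pattern with a pthread mutex and an atomic counter. bank_t, bank_acquire, bank_release, and acquire_current_bank are hypothetical stand-ins for the Arc<Bank> handling on the Rust side.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for an Arc<Bank>. */
typedef struct { atomic_ulong refcnt; } bank_t;

static pthread_mutex_t lock         = PTHREAD_MUTEX_INITIALIZER;
static bank_t *        current_bank = NULL; /* guarded by lock; owns one reference */

static void
bank_acquire( bank_t * b ) {
  atomic_fetch_add( &b->refcnt, 1UL );
}

/* Returns 1 if this call dropped the last reference. */
static int
bank_release( bank_t * b ) {
  return atomic_fetch_sub( &b->refcnt, 1UL )==1UL;
}

/* Clone the reference while the lock is held; cloning after unlocking
   would race with a concurrent release of current_bank. Returns NULL if
   there is no bank, in which case the caller owns nothing. */
static bank_t *
acquire_current_bank( void ) {
  bank_t * b = NULL;
  pthread_mutex_lock( &lock );
  if( current_bank ) {
    bank_acquire( current_bank );
    b = current_bank;
  }
  pthread_mutex_unlock( &lock );
  return b;
}
```

The caller then owns exactly one reference and must release it, which corresponds to the Arc::from_raw() requirement described in the comment above.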
870 :
871 : /* fd_ext_poh_reset_slot returns the slot height one above the last good
872 : (unskipped) slot we are building on top of. This is always a good
873 : known value, and will not be ULONG_MAX. */
874 :
875 : CALLED_FROM_RUST ulong
876 0 : fd_ext_poh_reset_slot( void ) {
877 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
878 0 : ulong reset_slot = ctx->reset_slot;
879 0 : fd_ext_poh_write_unlock();
880 0 : return reset_slot;
881 0 : }
882 :
883 : CALLED_FROM_RUST void
884 0 : fd_ext_poh_update_active_descendant( ulong max_active_descendant ) {
885 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
886 0 : ctx->max_active_descendant = max_active_descendant;
887 0 : fd_ext_poh_write_unlock();
888 0 : }
889 :
890 : /* fd_ext_poh_reached_leader_slot returns 1 if we have reached a slot
891 : where we are leader. This is used by the replay stage to determine
892 : if it should create a new leader bank descendant of the prior reset
893 : slot block.
894 :
895 : Sometimes, even when we reach our slot we do not return 1, as we are
896 : giving a grace period to the prior leader to finish publishing their
897 : block.
898 :
899 : out_leader_slot is the slot height of the leader slot we reached, and
900 : reset_slot is the slot height of the last good (unskipped) slot we
901 : are building on top of. */
902 :
903 : CALLED_FROM_RUST int
904 : fd_ext_poh_reached_leader_slot( ulong * out_leader_slot,
905 0 : ulong * out_reset_slot ) {
906 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
907 :
908 0 : *out_leader_slot = ctx->next_leader_slot;
909 0 : *out_reset_slot = ctx->reset_slot;
910 :
911 0 : if( FD_UNLIKELY( ctx->next_leader_slot==ULONG_MAX ||
912 0 : ctx->slot<ctx->next_leader_slot ) ) {
913 : /* Didn't reach our leader slot yet. */
914 0 : fd_ext_poh_write_unlock();
915 0 : return 0;
916 0 : }
917 :
918 0 : if( FD_UNLIKELY( ctx->halted_switching_key ) ) {
919 : /* Reached our leader slot, but the leader pipeline is halted
920 : because we are switching identity key. */
921 0 : fd_ext_poh_write_unlock();
922 0 : return 0;
923 0 : }
924 :
925 0 : if( FD_LIKELY( ctx->reset_slot==ctx->next_leader_slot ) ) {
926 : /* We were reset onto our leader slot, because the prior leader
927 : completed theirs, so we should start immediately, no need for a
928 : grace period. */
929 0 : fd_ext_poh_write_unlock();
930 0 : return 1;
931 0 : }
932 :
933 0 : fd_pubkey_t const * reset_leader = fd_multi_epoch_leaders_get_leader_for_slot( ctx->mleaders, ctx->reset_slot );
934 0 : if( FD_UNLIKELY( reset_leader && fd_memeq( reset_leader, ctx->identity_key.uc, 32UL ) ) ) {
935 : /* Surprisingly, in some rare cases where we're skipping ourselves,
936 : the following can occur:
937 : Reset onto n-1
938 : Tick into slot n, become leader for slot n, skipping slot n-1
939 : Prior leader starts publishing slot n-1
940 : max_active_descendant is set to n
941 : Switch forks, abandon slot n, reset onto slot n
942 : In this case, next_leader_slot is n+1 because we can't become
943 : leader again for slot n. We don't want to give ourselves any
944 : grace time though; we want to start n+1 as soon as the hashing
945 : is ready. */
946 0 : fd_ext_poh_write_unlock();
947 0 : return 1;
948 0 : }
949 :
950 0 : long now_ns = fd_log_wallclock();
951 0 : long expected_start_time_ns = ctx->reset_slot_start_ns + (long)((double)(ctx->next_leader_slot-ctx->reset_slot)*ctx->slot_duration_ns);
952 :
953 : /* Now we're faced with the question of how much grace to give the
954 : prior leader before trying to skip them. If they are still in the
955 : process of publishing their slot, delay ours to let them finish ...
956 : unless they are so delayed that we risk getting skipped by the
957 : leader following us. 1.2 seconds is a reasonable default here,
958 : although any value between 0 and 1.6 seconds could be considered
959 : reasonable. If they haven't started their last block, but we're
960 : reset on their second to last block, we'll give them an extra
961 : 400ms. This is arbitrary and chosen due to intuition. */
962 :
963 0 : long start_time_with_grace_ns = expected_start_time_ns;
964 :
965 0 : if( FD_UNLIKELY( ctx->max_active_descendant>=ctx->next_leader_slot ) ) {
966 : /* If the max_active_descendant is >= next_leader_slot, we waited
967 : too long and a leader after us started publishing to try and skip
968 : us. Just start our leader slot immediately, we might win ... */
969 0 : start_time_with_grace_ns = now_ns;
970 0 : } else if( FD_LIKELY( ctx->max_active_descendant>=ctx->reset_slot ) ) {
971 : /* If one of the leaders between the reset slot and our leader
972 : slot is in the process of publishing (they have a descendant
973 : bank that is in progress of being replayed), then keep waiting.
974 : We probably wouldn't get a leader slot out before they
975 : finished. */
976 0 : start_time_with_grace_ns += (long)(3.0*ctx->slot_duration_ns);
977 0 : } else if( FD_LIKELY( ctx->next_leader_slot==ctx->reset_slot+1UL ) ) {
978 : /* We finished replaying the slot two before ours, which means the
979 : prior leader is probably online, but they haven't started
980 : publishing the slot immediately prior to ours. Give the prior
981 : leader a little more time. */
982 0 : start_time_with_grace_ns += (long)(1.0*ctx->slot_duration_ns);
983 0 : }
984 :
985 :
986 0 : if( FD_UNLIKELY( now_ns<start_time_with_grace_ns ) ) {
987 0 : fd_ext_poh_write_unlock();
988 0 : return 0;
989 0 : }
990 :
991 0 : fd_ext_poh_write_unlock();
992 0 : return 1;
993 0 : }
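The grace-period branch of fd_ext_poh_reached_leader_slot can be read as a pure function of the clock and three slot numbers. The sketch below models only that branch, assuming the ULONG_MAX, halted-key-switch, and self-skip checks have already been handled; should_start_leader_slot is a hypothetical name, and all times are in nanoseconds.

```c
#include <assert.h>

/* Returns 1 if we should start our leader slot now. */
static int
should_start_leader_slot( long          now_ns,
                          long          reset_slot_start_ns,
                          double        slot_duration_ns,
                          unsigned long reset_slot,
                          unsigned long next_leader_slot,
                          unsigned long max_active_descendant ) {
  assert( next_leader_slot>=reset_slot );
  if( reset_slot==next_leader_slot ) return 1; /* prior leader finished: no grace */

  long start_ns = reset_slot_start_ns +
                  (long)((double)(next_leader_slot-reset_slot)*slot_duration_ns);
  if( max_active_descendant>=next_leader_slot ) {
    start_ns = now_ns;                        /* a later leader is already skipping us */
  } else if( max_active_descendant>=reset_slot ) {
    start_ns += (long)(3.0*slot_duration_ns); /* an intervening leader is mid-publish */
  } else if( next_leader_slot==reset_slot+1UL ) {
    start_ns += (long)(1.0*slot_duration_ns); /* prior leader has not started yet */
  }
  return now_ns>=start_ns;
}
```

With 400 ms slots, a reset slot of 100, and a leader slot of 102, we start at 800 ms if no intervening block is in progress, but wait until 2,000 ms if slot 101 is still being replayed.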
994 :
995 : CALLED_FROM_RUST static inline void
996 : publish_plugin_slot_start( fd_pohh_tile_t * ctx,
997 : ulong slot,
998 0 : ulong parent_slot ) {
999 0 : if( FD_UNLIKELY( !ctx->plugin_out->mem ) ) return;
1000 :
1001 0 : fd_plugin_msg_slot_start_t * slot_start = (fd_plugin_msg_slot_start_t *)fd_chunk_to_laddr( ctx->plugin_out->mem, ctx->plugin_out->chunk );
1002 0 : *slot_start = (fd_plugin_msg_slot_start_t){ .slot = slot, .parent_slot = parent_slot };
1003 0 : fd_stem_publish( ctx->stem, ctx->plugin_out->idx, FD_PLUGIN_MSG_SLOT_START, ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_start_t), 0UL, 0UL, 0UL );
1004 0 : ctx->plugin_out->chunk = fd_dcache_compact_next( ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_start_t), ctx->plugin_out->chunk0, ctx->plugin_out->wmark );
1005 0 : }
1006 :
1007 : CALLED_FROM_RUST static inline void
1008 : publish_plugin_slot_end( fd_pohh_tile_t * ctx,
1009 : ulong slot,
1010 0 : ulong cus_used ) {
1011 0 : if( FD_UNLIKELY( !ctx->plugin_out->mem ) ) return;
1012 :
1013 0 : fd_plugin_msg_slot_end_t * slot_end = (fd_plugin_msg_slot_end_t *)fd_chunk_to_laddr( ctx->plugin_out->mem, ctx->plugin_out->chunk );
1014 0 : *slot_end = (fd_plugin_msg_slot_end_t){ .slot = slot, .cus_used = cus_used };
1015 0 : fd_stem_publish( ctx->stem, ctx->plugin_out->idx, FD_PLUGIN_MSG_SLOT_END, ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_end_t), 0UL, 0UL, 0UL );
1016 0 : ctx->plugin_out->chunk = fd_dcache_compact_next( ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_end_t), ctx->plugin_out->chunk0, ctx->plugin_out->wmark );
1017 0 : }
1018 :
1019 : extern int
1020 : fd_ext_bank_load_account( void const * bank,
1021 : int fixed_root,
1022 : uchar const * addr,
1023 : uchar * owner,
1024 : uchar * data,
1025 : ulong * data_sz );
1026 :
1027 : CALLED_FROM_RUST static void
1028 : publish_became_leader( fd_pohh_tile_t * ctx,
1029 : ulong slot,
1030 0 : ulong epoch ) {
1031 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
1032 0 : fd_histf_sample( ctx->begin_leader_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)/tick_per_ns) );
1033 :
1034 0 : if( FD_UNLIKELY( ctx->lagged_consecutive_leader_start || ctx->reset_slot!=slot ) ) {
1035 : /* If we are mirroring Agave behavior, the wall clock gets reset
1036 : here so we don't count time spent waiting for a bank to freeze
1037 : or replay stage to actually start the slot towards our 400ms.
1038 :
1039 : See extended comments in the config file on this option.
1040 :
1041 : We must also reset the wall clock if we skipped a slot. */
1042 0 : ctx->reset_slot_start_ns = fd_log_wallclock() - (long)((double)(slot-ctx->reset_slot)*ctx->slot_duration_ns);
1043 0 : }
1044 :
1045 0 : fd_bundle_crank_tip_payment_config_t config[1] = { 0 };
1046 0 : fd_acct_addr_t tip_receiver_owner[1] = { 0 };
1047 :
1048 0 : if( FD_UNLIKELY( ctx->bundle.enabled ) ) {
1049 0 : long bundle_time = -fd_tickcount();
1050 0 : fd_acct_addr_t tip_payment_config[1];
1051 0 : fd_acct_addr_t tip_receiver[1];
1052 0 : fd_bundle_crank_get_addresses( ctx->bundle.gen, epoch, tip_payment_config, tip_receiver );
1053 :
1054 0 : fd_acct_addr_t _dummy[1];
1055 0 : uchar dummy[1];
1056 :
1057 0 : void const * bank = ctx->current_leader_bank;
1058 :
1059 : /* Calling Rust from a C function that is CALLED_FROM_RUST risks
1060 : deadlock. In this case, I checked the load_account function and
1061 : ensured it never calls any C functions that acquire the lock. */
1062 0 : ulong sz1 = sizeof(config), sz2 = 1UL;
1063 0 : int found1 = fd_ext_bank_load_account( bank, 0, tip_payment_config->b, _dummy->b, (uchar *)config, &sz1 );
1064 0 : int found2 = fd_ext_bank_load_account( bank, 0, tip_receiver->b, tip_receiver_owner->b, dummy, &sz2 );
1065 : /* The bundle crank code detects whether the accounts were found by
1066 : whether they have non-zero values (since found and uninitialized
1067 : should be treated the same), so we actually don't really care
1068 : about the value of found{1,2}. */
1069 0 : (void)found1; (void)found2;
1070 0 : bundle_time += fd_tickcount();
1071 0 : fd_histf_sample( ctx->bundle_init_delay, (ulong)bundle_time );
1072 0 : }
1073 :
1074 0 : long slot_start_ns = ctx->reset_slot_start_ns + (long)((double)(slot-ctx->reset_slot)*ctx->slot_duration_ns);
1075 :
1076 : /* No need to check flow control, there are always credits available
1077 : when we are leader; we will not "become" leader again until we are
1078 : done, so there is at most one frag in flight at a time. */
1079 :
1080 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->pack_out->mem, ctx->pack_out->chunk );
1081 :
1082 0 : fd_became_leader_t * leader = (fd_became_leader_t *)dst;
1083 0 : leader->slot_start_ns = slot_start_ns;
1084 0 : leader->slot_end_ns = (long)((double)slot_start_ns + ctx->slot_duration_ns);
1085 0 : leader->bank = ctx->current_leader_bank;
1086 0 : leader->max_microblocks_in_slot = ctx->max_microblocks_per_slot;
1087 0 : leader->ticks_per_slot = ctx->ticks_per_slot;
1088 0 : leader->total_skipped_ticks = ctx->ticks_per_slot*(slot-ctx->reset_slot);
1089 0 : leader->epoch = epoch;
1090 0 : leader->bundle->config[0] = config[0];
1091 0 : leader->slot = slot;
1092 :
1093 0 : leader->limits.slot_max_cost = ctx->limits.slot_max_cost;
1094 0 : leader->limits.slot_max_vote_cost = ctx->limits.slot_max_vote_cost;
1095 0 : leader->limits.slot_max_write_cost_per_acct = ctx->limits.slot_max_write_cost_per_acct;
1096 :
1097 0 : memcpy( leader->bundle->last_blockhash, ctx->reset_hash, 32UL );
1098 0 : memcpy( leader->bundle->tip_receiver_owner, tip_receiver_owner, 32UL );
1099 :
1100 0 : if( FD_UNLIKELY( leader->ticks_per_slot+leader->total_skipped_ticks>=MAX_SKIPPED_TICKS ) )
1101 0 : FD_LOG_ERR(( "Too many skipped ticks %lu for slot %lu, chain must halt", leader->ticks_per_slot+leader->total_skipped_ticks, slot ));
1102 :
1103 : /* increment refcount once for pack's reference to the current leader bank
1104 : and once for store's reference. */
1105 0 : if( FD_UNLIKELY( ctx->current_leader_bank ) ) {
1106 0 : ctx->pack_leader_bank = ctx->current_leader_bank;
1107 0 : fd_ext_bank_acquire( ctx->pack_leader_bank );
1108 :
1109 0 : FD_TEST( ctx->store_leader_bank_slot==ULONG_MAX );
1110 0 : ctx->store_leader_bank = ctx->current_leader_bank;
1111 0 : ctx->store_leader_bank_slot = slot;
1112 0 : fd_ext_bank_acquire( ctx->store_leader_bank );
1113 0 : }
1114 :
1115 0 : ulong pack_sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_BECAME_LEADER, 0UL );
1116 0 : fd_stem_publish( ctx->stem, ctx->pack_out->idx, pack_sig, ctx->pack_out->chunk, sizeof(fd_became_leader_t), 0UL, 0UL, fd_frag_meta_ts_comp( fd_tickcount() ) );
1117 0 : ctx->pack_out->chunk = fd_dcache_compact_next( ctx->pack_out->chunk, sizeof(fd_became_leader_t), ctx->pack_out->chunk0, ctx->pack_out->wmark );
1118 :
1119 : /* We acquired another reference to the leader bank earlier, which the
1120 : store tile will use to set the block_id for the slot we produce. We send this through
1121 : the existing poh_shred and shred_store links. */
1122 0 : void const ** msg = (void const **)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
1123 0 : *msg = ctx->current_leader_bank;
1124 :
1125 0 : ulong shred_sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_LEADER_BANK, 0UL );
1126 0 : fd_stem_publish( ctx->stem, ctx->shred_out->idx, shred_sig, ctx->shred_out->chunk, sizeof(void const *), 0UL, 0UL, fd_frag_meta_ts_comp( fd_tickcount() ) );
1127 0 : ctx->shred_seq = ctx->stem->seqs[ ctx->shred_out->idx ];
1128 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sizeof(void const *), ctx->shred_out->chunk0, ctx->shred_out->wmark );
1129 0 : }
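publish_became_leader tells pack when the slot starts and ends and how many ticks were skipped since the reset slot, and halts if that count gets too large. The sketch below isolates that arithmetic. leader_window_t and leader_window are hypothetical names, max_skipped_ticks stands in for MAX_SKIPPED_TICKS (value not assumed), and the sketch returns -1 where the real code calls FD_LOG_ERR.

```c
#include <assert.h>

/* Hypothetical struct holding the window sent to pack. */
typedef struct {
  long          slot_start_ns;
  long          slot_end_ns;
  unsigned long total_skipped_ticks;
} leader_window_t;

/* Returns 0 and fills *out, or -1 if too many ticks would be skipped. */
static int
leader_window( unsigned long     slot,
               unsigned long     reset_slot,
               long              reset_slot_start_ns,
               double            slot_duration_ns,
               unsigned long     ticks_per_slot,
               unsigned long     max_skipped_ticks, /* stands in for MAX_SKIPPED_TICKS */
               leader_window_t * out ) {
  assert( slot>=reset_slot );
  /* Our slot starts a whole number of slot durations after the reset slot. */
  out->slot_start_ns       = reset_slot_start_ns + (long)((double)(slot-reset_slot)*slot_duration_ns);
  out->slot_end_ns         = (long)((double)out->slot_start_ns + slot_duration_ns);
  out->total_skipped_ticks = ticks_per_slot*(slot-reset_slot);
  if( ticks_per_slot+out->total_skipped_ticks>=max_skipped_ticks ) return -1;
  return 0;
}
```

For example, becoming leader in slot 103 after a reset onto slot 100 with 64 ticks per slot means 192 skipped ticks and a slot starting 1.2 s after the reset slot began.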
1130 :
1131 : /* The PoH tile knows when it should become leader by waiting for its
1132 : leader slot (with the operating system clock). This function lets
1133 : the replay stage tell it which leader bank to use once it becomes
1134 : leader. See the notes in the long comment above for more on how
1135 : this works. */
1136 :
1137 : CALLED_FROM_RUST void
1138 : fd_ext_poh_begin_leader( void const * bank,
1139 : ulong slot,
1140 : ulong epoch,
1141 : ulong hashcnt_per_tick,
1142 : ulong cus_block_limit,
1143 : ulong cus_vote_cost_limit,
1144 0 : ulong cus_account_cost_limit ) {
1145 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
1146 :
1147 0 : FD_TEST( !ctx->current_leader_bank );
1148 :
1149 0 : if( FD_UNLIKELY( slot!=ctx->slot ) ) FD_LOG_ERR(( "Trying to begin leader slot %lu but we are now on slot %lu", slot, ctx->slot ));
1150 0 : if( FD_UNLIKELY( slot!=ctx->next_leader_slot ) ) FD_LOG_ERR(( "Trying to begin leader slot %lu but next leader slot is %lu", slot, ctx->next_leader_slot ));
1151 :
1152 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick!=hashcnt_per_tick ) ) {
1153 0 : FD_LOG_WARNING(( "hashes per tick changed from %lu to %lu", ctx->hashcnt_per_tick, hashcnt_per_tick ));
1154 :
1155 : /* Recompute derived information about the clock. */
1156 0 : ctx->hashcnt_duration_ns = (double)ctx->tick_duration_ns/(double)hashcnt_per_tick;
1157 0 : ctx->hashcnt_per_slot = ctx->ticks_per_slot*hashcnt_per_tick;
1158 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
1159 :
1160 : /* Discard any ticks we might have done in the interim. They will
1161 : have the wrong number of hashes per tick. We can just catch back
1162 : up quickly if not too many slots were skipped and hopefully
1163 : publish on time. Note that tick production and verification of
1164 : skipped slots is done for the eventual bank that publishes a
1165 : slot, for example:
1166 :
1167 : Reset Slot: 998
1168 : Epoch Transition Slot: 1000
1169 : Leader Slot: 1002
1170 :
1171 : In this case, if a feature changing the hashcnt_per_tick is
1172 : activated in slot 1000, and we are publishing empty ticks for
1173 : slots 998, 999, 1000, and 1001, they should all have the new
1174 : hashes_per_tick number of hashes, rather than the older one, or
1175 : some combination. */
1176 :
1177 0 : FD_TEST( ctx->last_slot==ctx->reset_slot );
1178 0 : FD_TEST( !ctx->last_hashcnt );
1179 0 : ctx->slot = ctx->reset_slot;
1180 0 : ctx->hashcnt = 0UL;
1181 0 : }
1182 :
1183 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
1184 : /* Low power producer, maximum of one microblock per tick in the slot */
1185 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
1186 0 : } else {
1187 : /* See the long comment in after_credit for this limit */
1188 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
1189 0 : }
1190 :
1191 0 : ctx->current_leader_bank = bank;
1192 0 : ctx->microblocks_lower_bound = 0UL;
1193 0 : ctx->cus_used = 0UL;
1194 :
1195 0 : ctx->limits.slot_max_cost = cus_block_limit;
1196 0 : ctx->limits.slot_max_vote_cost = cus_vote_cost_limit;
1197 0 : ctx->limits.slot_max_write_cost_per_acct = cus_account_cost_limit;
1198 :
1199 : /* clamp and warn if we are underutilizing CUs */
1200 0 : if( FD_UNLIKELY( ctx->limits.slot_max_cost > FD_PACK_MAX_COST_PER_BLOCK_UPPER_BOUND ) ) {
1201 0 : FD_LOG_WARNING(( "Underutilizing protocol slot CU limit. protocol_limit=%lu validator_limit=%lu", ctx->limits.slot_max_cost, FD_PACK_MAX_COST_PER_BLOCK_UPPER_BOUND ));
1202 0 : ctx->limits.slot_max_cost = FD_PACK_MAX_COST_PER_BLOCK_UPPER_BOUND;
1203 0 : }
1204 0 : if( FD_UNLIKELY( ctx->limits.slot_max_vote_cost > FD_PACK_MAX_VOTE_COST_PER_BLOCK_UPPER_BOUND ) ) {
1205 0 : FD_LOG_WARNING(( "Underutilizing protocol vote CU limit. protocol_limit=%lu validator_limit=%lu", ctx->limits.slot_max_vote_cost, FD_PACK_MAX_VOTE_COST_PER_BLOCK_UPPER_BOUND ));
1206 0 : ctx->limits.slot_max_vote_cost = FD_PACK_MAX_VOTE_COST_PER_BLOCK_UPPER_BOUND;
1207 0 : }
1208 0 : if( FD_UNLIKELY( ctx->limits.slot_max_write_cost_per_acct > FD_PACK_MAX_WRITE_COST_PER_ACCT_UPPER_BOUND ) ) {
1209 0 : FD_LOG_WARNING(( "Underutilizing protocol write CU limit. protocol_limit=%lu validator_limit=%lu", ctx->limits.slot_max_write_cost_per_acct, FD_PACK_MAX_WRITE_COST_PER_ACCT_UPPER_BOUND ));
1210 0 : ctx->limits.slot_max_write_cost_per_acct = FD_PACK_MAX_WRITE_COST_PER_ACCT_UPPER_BOUND;
1211 0 : }
1212 :
1213 : /* We are about to start publishing to the shred tile for this slot
1214 : so update the highwater mark so we never republish in this slot
1215 : again. Also check that the leader slot is greater than the
1216 : highwater, which should have been ensured earlier. */
1217 :
1218 0 : FD_TEST( ctx->highwater_leader_slot==ULONG_MAX || slot>=ctx->highwater_leader_slot );
1219 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), slot );
1220 :
1221 0 : publish_became_leader( ctx, slot, epoch );
1222 :
1223 : /* PoH ends the slot once it "ticks" through all of the hashes, but
1224 : we only want that to happen if we received a done packing message
1225 : from pack, so we always reserve an empty microblock at the end so
1226 : the tick advance will not end the slot without being told.
1227 :
1228 : This should be after publish_became_leader so that pack receives
1229 : the original (un-inflated) max_microblocks_per_slot. */
1230 0 : ctx->max_microblocks_per_slot += 1UL;
1231 :
1232 0 : FD_LOG_INFO(( "fd_ext_poh_begin_leader(slot=%lu, highwater_leader_slot=%lu, last_slot=%lu, last_hashcnt=%lu)", slot, ctx->highwater_leader_slot, ctx->last_slot, ctx->last_hashcnt ));
1233 :
1234 0 : fd_ext_poh_write_unlock();
1235 0 : }
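fd_ext_poh_begin_leader clamps each protocol CU limit to the validator's upper bound and advances highwater_leader_slot, treating ULONG_MAX as "nothing published yet". The sketch below isolates those two operations. clamp_limit and bump_highwater are hypothetical helpers, and the numbers in the usage below are arbitrary rather than real protocol limits.

```c
#include <assert.h>
#include <limits.h>

/* Use the protocol limit unless it exceeds what this validator supports. */
static unsigned long
clamp_limit( unsigned long protocol_limit,
             unsigned long upper_bound ) {
  return protocol_limit>upper_bound ? upper_bound : protocol_limit;
}

/* Monotonically advance the highwater mark; ULONG_MAX is a sentinel for
   "never published" and is treated as 0 when taking the max. */
static unsigned long
bump_highwater( unsigned long highwater,
                unsigned long slot ) {
  unsigned long hw = highwater==ULONG_MAX ? 0UL : highwater;
  return hw>slot ? hw : slot;
}
```

Because the highwater only moves forward, a slot we have stopped being leader in can never be published into again, which is the invariant no_longer_leader and fd_ext_poh_reset rely on.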
1236 :
1237 : /* Determine the next slot in the leader schedule in which we are
1238 : leader, including the current slot. If we are not leader in what
1239 : remains of the current and next epoch, return ULONG_MAX. */
1240 :
1241 : static inline CALLED_FROM_RUST ulong
1242 0 : next_leader_slot( fd_pohh_tile_t * ctx ) {
1243 : /* If we have published anything in a particular slot, then we
1244 : should never become leader for that slot again. */
1245 0 : ulong min_leader_slot = fd_ulong_max( ctx->slot, fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ) );
1246 0 : return fd_multi_epoch_leaders_get_next_slot( ctx->mleaders, min_leader_slot, &ctx->identity_key );
1247 0 : }
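next_leader_slot searches the leader schedule starting from the later of the current slot and the highwater mark. The real lookup goes through fd_multi_epoch_leaders_get_next_slot across the current and next epoch. The sketch below replaces that with a hypothetical flat array of 32-byte leader pubkeys covering a contiguous slot range; next_slot_for and min_leader_slot are illustrative names.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* leaders holds slot_cnt consecutive 32-byte pubkeys; entry i is the
   leader for slot first_slot+i. Returns ULONG_MAX if identity does not
   lead any slot at or after min_slot within the schedule. */
static unsigned long
next_slot_for( unsigned char const * leaders,
               unsigned long         first_slot,
               unsigned long         slot_cnt,
               unsigned long         min_slot,
               unsigned char const   identity[ 32 ] ) {
  unsigned long start = min_slot>first_slot ? min_slot : first_slot;
  for( unsigned long s=start; s<first_slot+slot_cnt; s++ ) {
    if( !memcmp( leaders + 32UL*(s-first_slot), identity, 32UL ) ) return s;
  }
  return ULONG_MAX;
}

/* Never become leader again in a slot at or below what we already published. */
static unsigned long
min_leader_slot( unsigned long slot,
                 unsigned long highwater ) {
  unsigned long hw = highwater==ULONG_MAX ? 0UL : highwater;
  return slot>hw ? slot : hw;
}
```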
1248 :
1249 : extern int
1250 : fd_ext_admin_rpc_set_identity( uchar const * identity_keypair,
1251 : int require_tower );
1252 :
1253 : static inline int FD_FN_SENSITIVE
1254 : maybe_change_identity( fd_pohh_tile_t * ctx,
1255 0 : int definitely_not_leader ) {
1256 0 : if( FD_UNLIKELY( ctx->halted_switching_key && fd_keyswitch_state_query( ctx->keyswitch )==FD_KEYSWITCH_STATE_UNHALT_PENDING ) ) {
1257 0 : ctx->halted_switching_key = 0;
1258 0 : fd_keyswitch_state( ctx->keyswitch, FD_KEYSWITCH_STATE_COMPLETED );
1259 0 : return 1;
1260 0 : }
1261 :
1262 : /* Cannot change identity while in the middle of a leader slot, else
1263 : poh state machine would become corrupt. */
1264 :
1265 0 : int is_leader = !definitely_not_leader && ctx->next_leader_slot!=ULONG_MAX && ctx->slot>=ctx->next_leader_slot;
1266 0 : if( FD_UNLIKELY( is_leader ) ) return 0;
1267 :
1268 0 : if( FD_UNLIKELY( fd_keyswitch_state_query( ctx->keyswitch )==FD_KEYSWITCH_STATE_SWITCH_PENDING ) ) {
1269 0 : int failed = fd_ext_admin_rpc_set_identity( ctx->keyswitch->bytes, fd_keyswitch_param_query( ctx->keyswitch )==1 );
1270 0 : fd_memzero_explicit( ctx->keyswitch->bytes, 32UL );
1271 0 : FD_COMPILER_MFENCE();
1272 0 : if( FD_UNLIKELY( failed==-1 ) ) {
1273 0 : fd_keyswitch_state( ctx->keyswitch, FD_KEYSWITCH_STATE_FAILED );
1274 0 : return 0;
1275 0 : }
1276 :
1277 0 : memcpy( ctx->identity_key.uc, ctx->keyswitch->bytes+32UL, 32UL );
1278 :
1279 : /* When we switch key, we might have ticked part way through a slot
1280 : that we are now leader in. This violates the contract of the
1281 : tile, that when we become leader, we have not ticked in that slot
1282 : at all. To see why this would be bad, consider the case where we
1283 : have ticked almost to the end, and there isn't enough space left
1284 : to reserve the minimum amount of microblocks needed by pack.
1285 :
1286 : To resolve this, we just reset PoH back to the reset slot, and
1287 : let it try to catch back up quickly. This is OK since the network
1288 : rarely skips. */
1289 0 : ctx->slot = ctx->reset_slot;
1290 0 : ctx->hashcnt = 0UL;
1291 0 : memcpy( ctx->hash, ctx->reset_hash, 32UL );
1292 :
1293 0 : ctx->halted_switching_key = 1;
1294 0 : ctx->keyswitch->result = ctx->shred_seq;
1295 0 : fd_keyswitch_state( ctx->keyswitch, FD_KEYSWITCH_STATE_COMPLETED );
1296 0 : }
1297 :
1298 0 : return 0;
1299 0 : }
1300 :
1301 : static CALLED_FROM_RUST void
1302 0 : no_longer_leader( fd_pohh_tile_t * ctx ) {
1303 : /* If we acquired a bank for the store tile and never produced its
1304 : block_complete entry, the store tile never gets the chance to
1305 : drop the refcount, so we drop it directly here. */
1306 0 : if( FD_UNLIKELY( ctx->store_leader_bank_slot!=ULONG_MAX ) ) {
1307 0 : fd_ext_bank_release( ctx->store_leader_bank );
1308 0 : ctx->store_leader_bank = NULL;
1309 0 : ctx->store_leader_bank_slot = ULONG_MAX;
1310 0 : }
1311 :
1312 0 : if( FD_UNLIKELY( ctx->current_leader_bank ) ) fd_ext_bank_release( ctx->current_leader_bank );
1313 : /* If we stop being leader in a slot, we can never become leader in
1314 : that slot again, and all in-flight microblocks for that slot
1315 : should be dropped. */
1316 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), ctx->slot );
1317 0 : ctx->current_leader_bank = NULL;
1318 0 : int identity_changed = maybe_change_identity( ctx, 1 );
1319 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1320 0 : if( FD_UNLIKELY( identity_changed ) ) {
1321 0 : FD_LOG_INFO(( "fd_poh_identity_changed(next_leader_slot=%lu)", ctx->next_leader_slot ));
1322 0 : }
1323 :
1324 0 : FD_COMPILER_MFENCE();
1325 0 : fd_ext_poh_signal_leader_change( ctx->signal_leader_change );
1326 0 : FD_LOG_INFO(( "no_longer_leader(next_leader_slot=%lu)", ctx->next_leader_slot ));
1327 0 : }
1328 :
1329 : /* fd_ext_poh_reset is called by the Agave client when a slot on
1330 : the active fork has finished a block and we need to reset our PoH to
1331 : be ticking on top of the block it produced. */
1332 :
1333 : CALLED_FROM_RUST void
1334 : fd_ext_poh_reset( ulong completed_bank_slot, /* The slot that successfully produced a block */
1335 : uchar const * reset_blockhash, /* The hash of the last tick in the produced block */
1336 : ulong hashcnt_per_tick, /* The hashcnt per tick of the bank that completed */
1337 : uchar const * parent_block_id, /* The block id of the parent block */
1338 0 : ulong const * features_activation /* The activation slot of shred-tile features */ ) {
1339 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
1340 :
1341 0 : ulong slot_before_reset = ctx->slot;
1342 0 : int leader_before_reset = ctx->slot>=ctx->next_leader_slot;
1343 0 : if( FD_UNLIKELY( leader_before_reset && ctx->current_leader_bank ) ) {
1344 : /* If we were in the middle of a leader slot that we notified pack
1345 : to start packing for, we can never publish into that slot again,
1346 : so mark all in-flight microblocks to be dropped. */
1347 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), 1UL+ctx->slot );
1348 0 : }
1349 :
1350 0 : ctx->leader_bank_start_ns = fd_log_wallclock(); /* safe to call from Rust */
1351 0 : if( FD_UNLIKELY( ctx->expect_sequential_leader_slot==(completed_bank_slot+1UL) ) ) {
1352 : /* If we are being reset onto a slot, it means some block was fully
1353 : processed, so we reset to build on top of it. Typically we want
1354 : to update the reset_slot_start_ns to the current time, because
1355 : the network will give the next leader 400ms to publish,
1356 : regardless of how long the prior leader took.
1357 :
1358 : But: if we were leader in the prior slot, and the block was our
1359 : own, we can do better. We know that the next slot should start
1360 : exactly 400ms after the prior one started, so we can use that as
1361 : the reset slot start time instead. */
1362 0 : ctx->reset_slot_start_ns = ctx->reset_slot_start_ns + (long)((double)((completed_bank_slot+1UL)-ctx->reset_slot)*ctx->slot_duration_ns);
1363 0 : } else {
1364 0 : ctx->reset_slot_start_ns = ctx->leader_bank_start_ns;
1365 0 : }
1366 0 : ctx->expect_sequential_leader_slot = ULONG_MAX;
1367 :
1368 0 : memcpy( ctx->reset_hash, reset_blockhash, 32UL );
1369 0 : memcpy( ctx->hash, reset_blockhash, 32UL );
1370 0 : if( FD_LIKELY( parent_block_id!=NULL ) ) {
1371 0 : ctx->parent_slot = completed_bank_slot;
1372 0 : memcpy( ctx->parent_block_id, parent_block_id, 32UL );
1373 0 : }
1374 0 : ctx->slot = completed_bank_slot+1UL;
1375 0 : ctx->hashcnt = 0UL;
1376 0 : ctx->last_slot = ctx->slot;
1377 0 : ctx->last_hashcnt = 0UL;
1378 0 : ctx->reset_slot = ctx->slot;
1379 :
1380 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick!=hashcnt_per_tick ) ) {
1381 0 : FD_LOG_WARNING(( "hashes per tick changed from %lu to %lu", ctx->hashcnt_per_tick, hashcnt_per_tick ));
1382 :
1383 : /* Recompute derived information about the clock. */
1384 0 : ctx->hashcnt_duration_ns = (double)ctx->tick_duration_ns/(double)hashcnt_per_tick;
1385 0 : ctx->hashcnt_per_slot = ctx->ticks_per_slot*hashcnt_per_tick;
1386 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
1387 0 : }
1388 :
1389 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
1390 : /* Low power producer, maximum of one microblock per tick in the slot */
1391 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
1392 0 : } else {
1393 : /* See the long comment in after_credit for this limit */
1394 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
1395 0 : }
1396 :
1397 : /* When we reset, we need to allow PoH to tick freely again rather
1398 : than being constrained. If we are leader after the reset, this
1399 : is OK because we won't tick until we get a bank, and the lower
1400 : bound will be reset with the value from the bank. */
1401 0 : ctx->microblocks_lower_bound = ctx->max_microblocks_per_slot;
1402 :
1403 0 : if( FD_UNLIKELY( leader_before_reset ) ) {
1404 : /* No longer have a leader bank if we are reset. Replay stage will
1405 : call back again to give us a new one if we should become leader
1406 : for the reset slot.
1407 :
1408 : The order is important here: ctx->hashcnt must be updated before
1409 : calling no_longer_leader. */
1410 0 : no_longer_leader( ctx );
1411 0 : }
1412 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1413 0 : FD_LOG_INFO(( "fd_ext_poh_reset(slot=%lu,next_leader_slot=%lu)", ctx->reset_slot, ctx->next_leader_slot ));
1414 :
1415 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
1416 : /* We are leader after the reset... two cases: */
1417 0 : if( FD_LIKELY( ctx->slot==slot_before_reset ) ) {
1418 : /* 1. We are reset onto the same slot we are already leader on.
1419 : This is a common case when we have two leader slots in a
1420 : row; replay stage will reset us to our own slot. No need to
1421 : do anything here, we already sent a SLOT_START. */
1422 0 : FD_TEST( leader_before_reset );
1423 0 : } else {
1424 : /* 2. We are reset onto a different slot. If we were leader
1425 : before, we should first end that slot, then begin the new
1426 : one if we are newly leader now. */
1427 0 : if( FD_LIKELY( leader_before_reset ) ) publish_plugin_slot_end( ctx, slot_before_reset, ctx->cus_used );
1428 0 : else publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->reset_slot );
1429 0 : }
1430 0 : } else {
1431 0 : if( FD_UNLIKELY( leader_before_reset ) ) publish_plugin_slot_end( ctx, slot_before_reset, ctx->cus_used );
1432 0 : }
1433 :
1434 : /* There is a subset of FD_SHRED_FEATURES_ACTIVATION_... slots that
1435 : the shred tile needs to be aware of. Since their computation
1436 : requires the bank, we are forced (so far) to receive them here
1437 : from the Rust side, before forwarding them to the shred tile as
1438 : POH_PKT_TYPE_FEAT_ACT_SLOT. This is not elegant, and it should
1439 : be revised in the future (TODO), but it provides a "temporary"
1440 : working solution to handle features activation. */
1441 0 : if( FD_UNLIKELY( !fd_memeq( ctx->features_activation->slots, features_activation, sizeof(fd_shred_features_activation_t) ) ) ) {
1442 0 : fd_memcpy( ctx->features_activation->slots, features_activation, sizeof(fd_shred_features_activation_t) );
1443 0 : ctx->features_activation_avail = 1UL;
1444 0 : }
1445 :
1446 0 : fd_ext_poh_write_unlock();
1447 0 : }
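The two derived quantities that fd_ext_poh_reset recomputes can be checked in isolation. The following is a minimal C sketch, not the tile's code: the function names are hypothetical, and SKETCH_MAX_MICROBLOCKS_PER_SLOT is an assumed stand-in for MAX_MICROBLOCKS_PER_SLOT, whose real value is defined elsewhere in the tile. It shows the sequential-leader case, where the next slot's start time is extrapolated from the previous reset in whole slot durations rather than read from the wallclock, and the per-slot microblock limit derived from the tick geometry.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for MAX_MICROBLOCKS_PER_SLOT.  The real value
   is defined elsewhere in the tile; this one is assumed for illustration. */
#define SKETCH_MAX_MICROBLOCKS_PER_SLOT (32768UL)

/* Start time of the slot after completed_slot.  If the reset lands on
   the slot we expected to lead next (because the completed block was
   our own), extrapolate from the previous reset start time by whole
   slot durations; otherwise restart the clock at now_ns. */
static long
sketch_reset_slot_start_ns( long          prev_reset_start_ns,
                            unsigned long prev_reset_slot,
                            unsigned long completed_slot,
                            unsigned long expect_sequential_slot,
                            double        slot_duration_ns,
                            long          now_ns ) {
  if( expect_sequential_slot==completed_slot+1UL ) {
    unsigned long slots_elapsed = (completed_slot+1UL) - prev_reset_slot;
    return prev_reset_start_ns + (long)((double)slots_elapsed*slot_duration_ns);
  }
  return now_ns;
}

/* Per-slot microblock limit.  In low power mode (one hash per tick)
   there is at most one microblock per tick; otherwise each tick span
   leaves hashcnt_per_tick-1 non-tick hashes for mixins, capped by the
   global limit. */
static unsigned long
sketch_max_microblocks_per_slot( unsigned long ticks_per_slot,
                                 unsigned long hashcnt_per_tick ) {
  if( hashcnt_per_tick==1UL ) return ticks_per_slot;
  unsigned long by_geometry = ticks_per_slot*(hashcnt_per_tick-1UL);
  return by_geometry<SKETCH_MAX_MICROBLOCKS_PER_SLOT ? by_geometry : SKETCH_MAX_MICROBLOCKS_PER_SLOT;
}
```

For example, with 64 ticks per slot and 62,500 hashes per tick the geometric bound is 64*62,499 = 3,999,936, so the global cap is what binds unless it is larger than that.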
1448 :
1449 : /* Since it can't easily return an Option<Pubkey>, return 1 for Some and
1450 : 0 for None. */
1451 : CALLED_FROM_RUST int
1452 : fd_ext_poh_get_leader_after_n_slots( ulong n,
1453 0 : uchar out_pubkey[ static 32 ] ) {
1454 0 : fd_pohh_tile_t * ctx = fd_ext_poh_write_lock();
1455 0 : ulong slot = ctx->slot + n;
1456 0 : fd_pubkey_t const * leader = fd_multi_epoch_leaders_get_leader_for_slot( ctx->mleaders, slot );
1457 :
1458 0 : int copied = 0;
1459 0 : if( FD_LIKELY( leader ) ) {
1460 0 : memcpy( out_pubkey, leader, 32UL );
1461 0 : copied = 1;
1462 0 : }
1463 0 : fd_ext_poh_write_unlock();
1464 0 : return copied;
1465 0 : }
1466 :
1467 : FD_FN_CONST static inline ulong
1468 0 : scratch_align( void ) {
1469 0 : return 128UL;
1470 0 : }
1471 :
1472 : FD_FN_PURE static inline ulong
1473 0 : scratch_footprint( fd_topo_tile_t const * tile ) {
1474 0 : (void)tile;
1475 0 : ulong l = FD_LAYOUT_INIT;
1476 0 : l = FD_LAYOUT_APPEND( l, alignof( fd_pohh_tile_t ), sizeof( fd_pohh_tile_t ) );
1477 0 : l = FD_LAYOUT_APPEND( l, FD_SHA256_ALIGN, FD_SHA256_FOOTPRINT );
1478 0 : return FD_LAYOUT_FINI( l, scratch_align() );
1479 0 : }
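scratch_footprint lays out the tile context followed by a SHA-256 hasher. A sketch of that offset arithmetic, assuming FD_LAYOUT_INIT starts at zero, FD_LAYOUT_APPEND aligns the running offset up before adding the region size, and FD_LAYOUT_FINI pads the total to the final alignment; the helper names and the sizes in the example are hypothetical:

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned long
sketch_align_up( unsigned long x, unsigned long a ) {
  return (x + a - 1UL) & ~(a - 1UL);
}

/* Assumed semantics of FD_LAYOUT_APPEND: align the running offset for
   the next region, then reserve sz bytes for it. */
static unsigned long
sketch_layout_append( unsigned long l, unsigned long align, unsigned long sz ) {
  return sketch_align_up( l, align ) + sz;
}

/* Assumed semantics of FD_LAYOUT_FINI: pad the total to the scratch
   alignment. */
static unsigned long
sketch_layout_fini( unsigned long l, unsigned long align ) {
  return sketch_align_up( l, align );
}

/* Footprint of a context struct followed by one hasher region.  The
   sizes are parameters here; the real ones come from sizeof and
   FD_SHA256_FOOTPRINT. */
static unsigned long
sketch_scratch_footprint( unsigned long ctx_align, unsigned long ctx_sz,
                          unsigned long sha_align, unsigned long sha_sz ) {
  unsigned long l = 0UL;                              /* FD_LAYOUT_INIT */
  l = sketch_layout_append( l, ctx_align, ctx_sz );
  l = sketch_layout_append( l, sha_align, sha_sz );
  return sketch_layout_fini( l, 128UL );              /* scratch_align() */
}
```

With a 100-byte context and a 200-byte hasher aligned to 128, the hasher starts at offset 128, the running total is 328, and the final footprint rounds up to 384.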
1480 :
1481 : static void
1482 : publish_tick( fd_pohh_tile_t * ctx,
1483 : fd_stem_context_t * stem,
1484 : uchar hash[ static 32 ],
1485 0 : int is_skipped ) {
1486 0 : ulong hashcnt = ctx->hashcnt_per_tick*(1UL+(ctx->last_hashcnt/ctx->hashcnt_per_tick));
1487 :
1488 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
1489 :
1490 0 : FD_TEST( ctx->last_slot>=ctx->reset_slot );
1491 0 : fd_entry_batch_meta_t * meta = (fd_entry_batch_meta_t *)dst;
1492 0 : if( FD_UNLIKELY( is_skipped ) ) {
1493 : /* We are publishing ticks for a skipped slot, the reference tick
1494 : and block complete flags should always be zero. */
1495 0 : meta->reference_tick = 0UL;
1496 0 : meta->block_complete = 0;
1497 0 : } else {
1498 0 : meta->reference_tick = hashcnt/ctx->hashcnt_per_tick;
1499 0 : meta->block_complete = hashcnt==ctx->hashcnt_per_slot;
1500 0 : }
1501 :
1502 0 : ulong slot = fd_ulong_if( meta->block_complete, ctx->slot-1UL, ctx->slot );
1503 0 : meta->parent_offset = 1UL+slot-ctx->reset_slot;
1504 :
1505 : /* From poh_reset we received the block_id for ctx->parent_slot.
1506 : Now we're telling shred tile to build on parent: (slot-meta->parent_offset).
1507 : The block_id that we're passing is valid iff the two are the same,
1508 : i.e. ctx->parent_slot == (slot-meta->parent_offset). */
1509 0 : meta->parent_block_id_valid = ctx->parent_slot == (slot-meta->parent_offset);
1510 0 : if( FD_LIKELY( meta->parent_block_id_valid ) ) {
1511 0 : fd_memcpy( meta->parent_block_id, ctx->parent_block_id, 32UL );
1512 0 : }
1513 :
1514 0 : FD_TEST( hashcnt>ctx->last_hashcnt );
1515 0 : ulong hash_delta = hashcnt-ctx->last_hashcnt;
1516 :
1517 0 : dst += sizeof(fd_entry_batch_meta_t);
1518 0 : fd_entry_batch_header_t * tick = (fd_entry_batch_header_t *)dst;
1519 0 : tick->hashcnt_delta = hash_delta;
1520 0 : fd_memcpy( tick->hash, hash, 32UL );
1521 0 : tick->txn_cnt = 0UL;
1522 :
1523 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
1524 0 : ulong sz = sizeof(fd_entry_batch_meta_t)+sizeof(fd_entry_batch_header_t);
1525 0 : ulong sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_MICROBLOCK, 0UL );
1526 0 : fd_stem_publish( stem, ctx->shred_out->idx, sig, ctx->shred_out->chunk, sz, 0UL, 0UL, tspub );
1527 0 : ctx->shred_seq = stem->seqs[ ctx->shred_out->idx ];
1528 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sz, ctx->shred_out->chunk0, ctx->shred_out->wmark );
1529 :
1530 0 : if( FD_UNLIKELY( hashcnt==ctx->hashcnt_per_slot ) ) {
1531 0 : ctx->last_slot++;
1532 0 : ctx->last_hashcnt = 0UL;
1533 0 : } else {
1534 0 : ctx->last_hashcnt = hashcnt;
1535 0 : }
1536 :
1537 : /* The store tile will release the bank refcount when an FEC set with
1538 : SLOT_COMPLETE arrives, so we drop our local tracking and suppress
1539 : the END type frag in no_longer_leader. */
1540 0 : if( FD_UNLIKELY( meta->block_complete && ctx->store_leader_bank_slot!=ULONG_MAX ) ) {
1541 0 : FD_TEST( ctx->store_leader_bank_slot==slot );
1542 0 : ctx->store_leader_bank = NULL;
1543 0 : ctx->store_leader_bank_slot = ULONG_MAX;
1544 0 : }
1545 0 : }
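The entry metadata that publish_tick fills in is pure arithmetic on the tick geometry, so it can be sketched as a standalone function. This is an illustrative restatement with hypothetical names (sketch_tick_meta_t, sketch_tick_meta), not the tile's code; it omits the parent block id and the dcache publish.

```c
#include <assert.h>

typedef struct {
  unsigned long slot;            /* slot the tick is attributed to */
  unsigned long hashcnt;         /* hashcnt of the tick within that slot */
  unsigned long reference_tick;
  int           block_complete;
  unsigned long parent_offset;
  unsigned long hashcnt_delta;   /* hashes since the previous published entry */
} sketch_tick_meta_t;

static sketch_tick_meta_t
sketch_tick_meta( unsigned long cur_slot,        /* ctx->slot */
                  unsigned long last_hashcnt,    /* ctx->last_hashcnt */
                  unsigned long reset_slot,      /* ctx->reset_slot */
                  unsigned long hashcnt_per_tick,
                  unsigned long ticks_per_slot,
                  int           is_skipped ) {
  sketch_tick_meta_t m;
  unsigned long hashcnt_per_slot = hashcnt_per_tick*ticks_per_slot;

  /* Next tick boundary strictly after the last published entry. */
  m.hashcnt = hashcnt_per_tick*(1UL + last_hashcnt/hashcnt_per_tick);

  if( is_skipped ) {
    /* Replayed ticks of a skipped slot never carry these flags. */
    m.reference_tick = 0UL;
    m.block_complete = 0;
  } else {
    m.reference_tick = m.hashcnt/hashcnt_per_tick;
    m.block_complete = m.hashcnt==hashcnt_per_slot;
  }

  /* The final tick of a slot is published after ctx->slot has already
     advanced, so it belongs to the previous slot. */
  m.slot          = m.block_complete ? cur_slot-1UL : cur_slot;
  m.parent_offset = 1UL + m.slot - reset_slot;
  m.hashcnt_delta = m.hashcnt - last_hashcnt;
  return m;
}
```

With T=4 and N=2, the final tick of slot 4 is published once the tile has already moved on to slot 5, which is why block_complete pulls the attributed slot back by one.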
1546 :
1547 : static inline void
1548 : publish_features_activation( fd_pohh_tile_t * ctx,
1549 0 : fd_stem_context_t * stem ) {
1550 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
1551 0 : fd_shred_features_activation_t * act_data = (fd_shred_features_activation_t *)dst;
1552 0 : fd_memcpy( act_data, ctx->features_activation, sizeof(fd_shred_features_activation_t) );
1553 :
1554 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
1555 0 : ulong sz = sizeof(fd_shred_features_activation_t);
1556 0 : ulong sig = fd_disco_poh_sig( ctx->slot, POH_PKT_TYPE_FEAT_ACT_SLOT, 0UL );
1557 0 : fd_stem_publish( stem, ctx->shred_out->idx, sig, ctx->shred_out->chunk, sz, 0UL, 0UL, tspub );
1558 0 : ctx->shred_seq = stem->seqs[ ctx->shred_out->idx ];
1559 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sz, ctx->shred_out->chunk0, ctx->shred_out->wmark );
1560 0 : }
1561 :
1562 : static inline void
1563 : after_credit( fd_pohh_tile_t * ctx,
1564 : fd_stem_context_t * stem,
1565 : int * opt_poll_in,
1566 0 : int * charge_busy ) {
1567 0 : ctx->stem = stem;
1568 :
1569 0 : FD_COMPILER_MFENCE();
1570 0 : if( FD_UNLIKELY( fd_poh_waiting_lock ) ) {
1571 0 : FD_VOLATILE( fd_poh_returned_lock ) = 1UL;
1572 0 : FD_COMPILER_MFENCE();
1573 0 : for(;;) {
1574 0 : if( FD_UNLIKELY( !FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
1575 0 : FD_SPIN_PAUSE();
1576 0 : }
1577 0 : FD_COMPILER_MFENCE();
1578 0 : FD_VOLATILE( fd_poh_waiting_lock ) = 0UL;
1579 0 : *opt_poll_in = 0;
1580 0 : *charge_busy = 1;
1581 0 : return;
1582 0 : }
1583 0 : FD_COMPILER_MFENCE();
1584 :
1585 0 : if( FD_UNLIKELY( ctx->features_activation_avail ) ) {
1586 : /* If we have received an update on features_activation, then
1587 : forward them to the shred tile. In principle, this should
1588 : happen at most once per slot. */
1589 0 : publish_features_activation( ctx, stem );
1590 0 : ctx->features_activation_avail = 0UL;
1591 0 : }
1592 :
1593 0 : int is_leader = ctx->next_leader_slot!=ULONG_MAX && ctx->slot>=ctx->next_leader_slot;
1594 0 : if( FD_UNLIKELY( is_leader && !ctx->current_leader_bank ) ) {
1595 : /* If we are the leader, but we didn't yet learn what the leader
1596 : bank object is from the replay stage, do not do any hashing.
1597 :
1598 : This is not ideal, but greatly simplifies the control flow. */
1599 0 : return;
1600 0 : }
1601 :
1602 : /* If we have skipped ticks pending because we skipped some slots to
1603 : become leader, register them now one at a time. */
1604 0 : if( FD_UNLIKELY( is_leader && ctx->last_slot<ctx->slot ) ) {
1605 0 : ulong publish_hashcnt = ctx->last_hashcnt+ctx->hashcnt_per_tick;
1606 0 : ulong tick_idx = (ctx->last_slot*ctx->ticks_per_slot+publish_hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1607 :
1608 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->skipped_tick_hashes[ tick_idx ] );
1609 0 : publish_tick( ctx, stem, ctx->skipped_tick_hashes[ tick_idx ], 1 );
1610 :
1611 : /* If we are catching up now and publishing a bunch of skipped
1612 : ticks, we do not want to process any incoming microblocks until
1613 : all the skipped ticks have been published out; otherwise we would
1614 : intersperse skipped tick messages with microblocks. */
1615 0 : *opt_poll_in = 0;
1616 0 : *charge_busy = 1;
1617 0 : return;
1618 0 : }
1619 :
1620 0 : int low_power_mode = ctx->hashcnt_per_tick==1UL;
1621 :
1622 : /* If we are the leader, always leave enough capacity in the slot so
1623 : that we can mix in any potential microblocks still coming from the
1624 : pack tile for this slot. */
1625 0 : ulong max_remaining_microblocks = ctx->max_microblocks_per_slot - ctx->microblocks_lower_bound;
1626 :
1627 : /* With hashcnt_per_tick hashes per tick, we actually get
1628 : hashcnt_per_tick-1 chances to mix in a microblock. For each tick
1629 : span that we need to reserve, we also need to reserve the hashcnt
1630 : for the tick, hence the +
1631 : max_remaining_microblocks/(hashcnt_per_tick-1) rounded up.
1632 :
1633 : However, if hashcnt_per_tick is 1 because we're in low power mode,
1634 : this should probably just be max_remaining_microblocks. */
1635 0 : ulong max_remaining_ticks_or_microblocks = max_remaining_microblocks;
1636 0 : if( FD_LIKELY( !low_power_mode ) ) max_remaining_ticks_or_microblocks += (max_remaining_microblocks+ctx->hashcnt_per_tick-2UL)/(ctx->hashcnt_per_tick-1UL);
1637 :
1638 0 : ulong restricted_hashcnt = fd_ulong_if( ctx->hashcnt_per_slot>=max_remaining_ticks_or_microblocks, ctx->hashcnt_per_slot-max_remaining_ticks_or_microblocks, 0UL );
1639 :
1640 0 : ulong min_hashcnt = ctx->hashcnt;
1641 :
1642 0 : if( FD_LIKELY( !low_power_mode ) ) {
1643 : /* Recall that there are two kinds of events that will get published
1644 : to the shredder:
1645 :
1646 : (a) Ticks. These occur every 62,500 (hashcnt_per_tick) hashcnts,
1647 : and there will be 64 (ticks_per_slot) of them in each slot.
1648 :
1649 : Ticks must not have any transactions mixed into the hash.
1650 : This is not strictly needed in theory, but is required by the
1651 : current consensus protocol. They get published here in
1652 : after_credit.
1653 :
1654 : (b) Microblocks. These can occur at any other hashcnt, as long
1655 : as it is not a tick. Microblocks cannot be empty, and must
1656 : have at least one transaction mixed in. These get
1657 : published in after_frag.
1658 :
1659 : If hashcnt_per_tick is 1, then we are in low power mode and the
1660 : following does not apply, since we can mix in transactions at any
1661 : time.
1662 :
1663 : In the normal, non-low-power mode, though, we have to be careful
1664 : to make sure that we do not publish microblocks on tick
1665 : boundaries. To do that, we need to obey two rules:
1666 : (i) after_credit must not leave hashcnt one before a tick
1667 : boundary
1668 : (ii) if after_credit begins one before a tick boundary, it must
1669 : advance hashcnt and publish the tick
1670 :
1671 : There's some interplay between min_hashcnt and restricted_hashcnt
1672 : here, and we need to show that there's always a value of
1673 : target_hashcnt we can pick such that
1674 : min_hashcnt <= target_hashcnt <= restricted_hashcnt.
1675 : We'll prove this by induction for current_slot==0 and
1676 : is_leader==true, since all other slots should be the same.
1677 :
1678 : Let m_j and r_j be the min_hashcnt and restricted_hashcnt
1679 : (respectively) for the jth call to after_credit in a slot. We
1680 : want to show that for all values of j, it's possible to pick a
1681 : value h_j, the value of target_hashcnt for the jth call to
1682 : after_credit (which is also the value of hashcnt after
1683 : after_credit has completed) such that m_j<=h_j<=r_j.
1684 :
1685 : Additionally, let T be hashcnt_per_tick and N be ticks_per_slot.
1686 :
1687 : Starting with the base case, j==0: m_0=0, and
1688 : r_0 = N*T - max_microblocks_per_slot
1689 : - ceil(max_microblocks_per_slot/(T-1)).
1690 :
1691 : This is monotonically decreasing in max_microblocks_per_slot, so it
1692 : achieves its minimum when max_microblocks_per_slot is its
1693 : maximum.
1694 : r_0 >= N*T - N*(T-1) - ceil( (N*(T-1))/(T-1))
1695 : = N*T - N*(T-1)-N = 0.
1696 : Thus, m_0 <= r_0, as desired.
1697 :
1698 :
1699 :
1700 : Then, for the inductive step, assume there exists h_j such that
1701 : m_j<=h_j<=r_j, and we want to show that there exists h_{j+1},
1702 : which is the same as showing m_{j+1}<=r_{j+1}.
1703 :
1704 : Let a_j be 1 if we had a microblock immediately following the jth
1705 : call to after_credit, and 0 otherwise. Then hashcnt at the start
1706 : of the (j+1)th call to after_credit is h_j+a_j.
1707 : Also, set b_{j+1}=1 if we are in the case covered by rule (ii)
1708 : above during the (j+1)th call to after_credit, i.e. if
1709 : (h_j+a_j)%T==T-1. Thus, m_{j+1} = h_j + a_j + b_{j+1}.
1710 :
1711 : If we received an additional microblock, then
1712 : max_remaining_microblocks goes down by 1, and
1713 : max_remaining_ticks_or_microblocks goes down by either 1 or 2,
1714 : which means restricted_hashcnt goes up by either 1 or 2. In
1715 : particular, it goes up by 2 if the new value of
1716 : max_remaining_microblocks (at the start of the (j+1)th call to
1717 : after_credit) is congruent to 0 mod T-1. Let b'_{j+1} be 1 if
1718 : this condition is met and 0 otherwise. If we receive a
1719 : done_packing message, restricted_hashcnt can go up by more, but
1720 : we can ignore that case, since it is less restrictive.
1721 : Thus, r_{j+1}=r_j+a_j+b'_{j+1}.
1722 :
1723 : If h_j < r_j (strictly less), then h_j+a_j < r_j+a_j. And thus,
1724 : since b_{j+1}<=b'_{j+1}+1, just by virtue of them both being
1725 : binary,
1726 : h_j + a_j + b_{j+1} < r_j + a_j + b'_{j+1} + 1,
1727 : which is the same (for integers) as
1728 : h_j + a_j + b_{j+1} <= r_j + a_j + b'_{j+1},
1729 : m_{j+1} <= r_{j+1}
1730 :
1731 : On the other hand, if h_j==r_j, this is easy unless b_{j+1}==1,
1732 : which can also only happen if a_j==1. Then (h_j+a_j)%T==T-1,
1733 : which means there's an integer k such that
1734 :
1735 : h_j+a_j==(ticks_per_slot-k)*T-1
1736 : h_j ==ticks_per_slot*T - k*(T-1)-1 - k-1
1737 : ==ticks_per_slot*T - (k*(T-1)+1) - ceil( (k*(T-1)+1)/(T-1) )
1738 :
1739 : Since h_j==r_j in this case, and
1740 : r_j==(ticks_per_slot*T) - max_remaining_microblocks_j - ceil(max_remaining_microblocks_j/(T-1)),
1741 : we can see that the value of max_remaining_microblocks at the
1742 : start of the jth call to after_credit is k*(T-1)+1. Again, since
1743 : a_j==1, then the value of max_remaining_microblocks at the start
1744 : of the j+1th call to after_credit decreases by 1 to k*(T-1),
1745 : which means b'_{j+1}=1.
1746 :
1747 : Thus, h_j + a_j + b_{j+1} == r_j + a_j + b'_{j+1}, so, in
1748 : particular, h_{j+1}<=r_{j+1} as desired. */
1749 0 : min_hashcnt += (ulong)(min_hashcnt%ctx->hashcnt_per_tick == (ctx->hashcnt_per_tick-1UL)); /* add b_{j+1}, enforcing rule (ii) */
1750 0 : }
1751 : /* Now figure out how many hashes are needed to "catch up" the hash
1752 : count to the current system clock, and clamp it to the allowed
1753 : range. */
1754 0 : long now = fd_log_wallclock();
1755 0 : ulong target_hashcnt;
1756 0 : if( FD_LIKELY( !is_leader ) ) {
1757 0 : target_hashcnt = (ulong)((double)(now - ctx->reset_slot_start_ns) / ctx->hashcnt_duration_ns) - (ctx->slot-ctx->reset_slot)*ctx->hashcnt_per_slot;
1758 0 : } else {
1759 : /* We might have gotten very behind on hashes, but if we are leader
1760 : we want to catch up gradually over the remainder of our leader
1761 : slot, not all at once right now. This helps keep the tile from
1762 : being oversubscribed and taking a long time to process incoming
1763 : microblocks. */
1764 0 : long expected_slot_start_ns = ctx->reset_slot_start_ns + (long)((double)(ctx->slot-ctx->reset_slot)*ctx->slot_duration_ns);
1765 0 : double actual_slot_duration_ns = ctx->slot_duration_ns<(double)(ctx->leader_bank_start_ns - expected_slot_start_ns) ? 0.0 : ctx->slot_duration_ns - (double)(ctx->leader_bank_start_ns - expected_slot_start_ns);
1766 0 : double actual_hashcnt_duration_ns = actual_slot_duration_ns / (double)ctx->hashcnt_per_slot;
1767 0 : target_hashcnt = actual_hashcnt_duration_ns==0.0 ? restricted_hashcnt : (ulong)((double)(now - ctx->leader_bank_start_ns) / actual_hashcnt_duration_ns);
1768 0 : }
1769 : /* Clamp to [min_hashcnt, restricted_hashcnt] as above */
1770 0 : target_hashcnt = fd_ulong_max( fd_ulong_min( target_hashcnt, restricted_hashcnt ), min_hashcnt );
1771 :
1772 : /* The above proof showed that it was always possible to pick a value
1773 : of target_hashcnt, but we still have a lot of freedom in how to
1774 : pick it. It simplifies the code a lot if we don't keep going after
1775 : a tick in this function. In particular, we want to publish at most
1776 : 1 tick in this call, since otherwise we could consume infinite
1777 : credits to publish here. The credits are set so that we should
1778 : only ever publish one tick during this loop. Also, all the extra
1779 : stuff (leader transitions, publishing ticks, etc.) we have to do
1780 : happens at tick boundaries, so this lets us consolidate all those
1781 : cases.
1782 :
1783 : Mathematically, since the current value of hashcnt is h_j+a_j, the
1784 : next tick (advancing a full tick if we're currently at a tick) is
1785 : t_{j+1} = T*(floor( (h_j+a_j)/T )+1). We need to show that if we set
1786 : h'_{j+1} = min( h_{j+1}, t_{j+1} ), it is still valid.
1787 :
1788 : First, h'_{j+1} <= h_{j+1} <= r_{j+1}, so we're okay in that
1789 : direction.
1790 :
1791 : Next, observe that t_{j+1}>=h_j + a_j + 1, and recall that b_{j+1}
1792 : is 0 or 1. So then,
1793 : t_{j+1} >= h_j+a_j+b_{j+1} = m_{j+1}.
1794 :
1795 : We know h_{j+1} >= m_{j+1} from before, so then h'_{j+1} >=
1796 : m_{j+1}, as desired. */
1797 :
1798 0 : ulong next_tick_hashcnt = ctx->hashcnt_per_tick * (1UL+(ctx->hashcnt/ctx->hashcnt_per_tick));
1799 0 : target_hashcnt = fd_ulong_min( target_hashcnt, next_tick_hashcnt );
1800 :
1801 : /* We still need to enforce rule (i). We know that min_hashcnt%T !=
1802 : T-1 because of rule (ii). That means that if target_hashcnt%T ==
1803 : T-1 at this point, target_hashcnt > min_hashcnt (notice the
1804 : strict), so target_hashcnt-1 >= min_hashcnt and is thus still a
1805 : valid choice for target_hashcnt. */
1806 0 : target_hashcnt -= (ulong)( (!low_power_mode) & ((target_hashcnt%ctx->hashcnt_per_tick)==(ctx->hashcnt_per_tick-1UL)) );
1807 :
1808 0 : FD_TEST( target_hashcnt >= ctx->hashcnt );
1809 0 : FD_TEST( target_hashcnt >= min_hashcnt );
1810 0 : FD_TEST( target_hashcnt <= restricted_hashcnt );
1811 :
1812 0 : if( FD_UNLIKELY( ctx->hashcnt==target_hashcnt ) ) return; /* Nothing to do, don't publish a tick twice */
1813 :
1814 0 : *charge_busy = 1;
1815 :
1816 0 : if( FD_LIKELY( ctx->hashcnt<target_hashcnt ) ) {
1817 0 : fd_sha256_hash_32_repeated( ctx->hash, ctx->hash, target_hashcnt-ctx->hashcnt );
1818 0 : ctx->hashcnt = target_hashcnt;
1819 0 : }
1820 :
1821 0 : if( FD_UNLIKELY( ctx->hashcnt==ctx->hashcnt_per_slot ) ) {
1822 0 : ctx->slot++;
1823 0 : ctx->hashcnt = 0UL;
1824 0 : }
1825 :
1826 0 : if( FD_UNLIKELY( !is_leader && !(ctx->hashcnt%ctx->hashcnt_per_tick ) ) ) {
1827 : /* We finished a tick while not leader... save the current hash so
1828 : it can be played back into the bank when we become the leader. */
1829 0 : ulong tick_idx = (ctx->slot*ctx->ticks_per_slot+ctx->hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1830 0 : fd_memcpy( ctx->skipped_tick_hashes[ tick_idx ], ctx->hash, 32UL );
1831 :
1832 0 : ulong initial_tick_idx = (ctx->last_slot*ctx->ticks_per_slot+ctx->last_hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1833 0 : if( FD_UNLIKELY( tick_idx==initial_tick_idx ) ) FD_LOG_ERR(( "Too many skipped ticks from slot %lu to slot %lu, chain must halt", ctx->last_slot, ctx->slot ));
1834 0 : }
1835 :
1836 0 : if( FD_UNLIKELY( is_leader && !(ctx->hashcnt%ctx->hashcnt_per_tick) ) ) {
1837 : /* We ticked while leader... tell the leader bank. */
1838 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->hash );
1839 :
1840 : /* And send an empty microblock (a tick) to the shred tile. */
1841 0 : publish_tick( ctx, stem, ctx->hash, 0 );
1842 0 : }
1843 :
1844 0 : if( FD_UNLIKELY( !is_leader && ctx->slot>=ctx->next_leader_slot ) ) {
1845 : /* We ticked while not leader and are now leader... transition
1846 : the state machine. */
1847 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->reset_slot );
1848 0 : FD_LOG_INFO(( "fd_poh_ticked_into_leader(slot=%lu, reset_slot=%lu)", ctx->next_leader_slot, ctx->reset_slot ));
1849 0 : }
1850 :
1851 0 : if( FD_UNLIKELY( is_leader && ctx->slot>ctx->next_leader_slot ) ) {
1852 : /* We ticked while leader and are no longer leader... transition
1853 : the state machine. */
1854 0 : FD_TEST( !max_remaining_microblocks );
1855 0 : publish_plugin_slot_end( ctx, ctx->next_leader_slot, ctx->cus_used );
1856 0 : FD_LOG_INFO(( "fd_poh_ticked_outof_leader(slot=%lu)", ctx->next_leader_slot ));
1857 :
1858 0 : no_longer_leader( ctx );
1859 0 : ctx->expect_sequential_leader_slot = ctx->slot;
1860 :
1861 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
1862 0 : fd_histf_sample( ctx->slot_done_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)*tick_per_ns) );
1863 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1864 :
1865 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
1866 : /* We finished a leader slot, and are immediately leader for the
1867 : following slot... transition. */
1868 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->next_leader_slot-1UL );
1869 0 : FD_LOG_INFO(( "fd_poh_ticked_into_leader(slot=%lu, reset_slot=%lu)", ctx->next_leader_slot, ctx->next_leader_slot-1UL ));
1870 0 : }
1871 0 : }
1872 0 : }
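The clamping at the end of after_credit, from restricted_hashcnt through rules (i) and (ii), can be isolated into a pure function to experiment with the invariants proved above. This sketch uses hypothetical names (sketch_target_hashcnt, sk_min, sk_max), takes the clock-derived target as an input, and omits leader pacing, tick publishing, and slot transitions.

```c
#include <assert.h>

static unsigned long sk_min( unsigned long a, unsigned long b ) { return a<b ? a : b; }
static unsigned long sk_max( unsigned long a, unsigned long b ) { return a>b ? a : b; }

/* Pick the hashcnt after_credit advances to.  T = hashcnt_per_tick,
   N = ticks_per_slot, desired = the clock-derived target. */
static unsigned long
sketch_target_hashcnt( unsigned long hashcnt,
                       unsigned long desired,
                       unsigned long T,
                       unsigned long N,
                       unsigned long max_remaining_microblocks ) {
  int           low_power        = T==1UL;
  unsigned long hashcnt_per_slot = N*T;

  /* Reserve one hash per possible microblock, plus one tick hash per
     T-1 microblocks (rounded up) outside low power mode. */
  unsigned long reserve = max_remaining_microblocks;
  if( !low_power ) reserve += (max_remaining_microblocks+T-2UL)/(T-1UL);
  unsigned long restricted = hashcnt_per_slot>=reserve ? hashcnt_per_slot-reserve : 0UL;

  /* Rule (ii): if we start one hash before a tick, we must take the tick. */
  unsigned long min_hashcnt = hashcnt;
  if( !low_power ) min_hashcnt += (unsigned long)( min_hashcnt%T==T-1UL );

  unsigned long target = sk_max( sk_min( desired, restricted ), min_hashcnt );

  /* Never advance past the next tick boundary in a single call. */
  target = sk_min( target, T*(1UL + hashcnt/T) );

  /* Rule (i): never stop one hash before a tick boundary. */
  if( !low_power && target%T==T-1UL ) target--;
  return target;
}
```

Keeping this step free of side effects makes it easy to check, for small T and N, that the result never stops one hash short of a tick outside low power mode and never advances past the next tick boundary.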
1873 :
1874 : static inline void
1875 0 : during_housekeeping( fd_pohh_tile_t * ctx ) {
1876 0 : if( FD_UNLIKELY( maybe_change_identity( ctx, 0 ) ) ) {
1877 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1878 0 : FD_LOG_INFO(( "fd_poh_identity_changed(next_leader_slot=%lu)", ctx->next_leader_slot ));
1879 :
1880 : /* Signal replay to check if we are leader again, in case it's stuck
1881 : because everything already replayed. */
1882 0 : FD_COMPILER_MFENCE();
1883 0 : fd_ext_poh_signal_leader_change( ctx->signal_leader_change );
1884 0 : }
1885 0 : }
1886 :
1887 : static inline void
1888 0 : metrics_write( fd_pohh_tile_t * ctx ) {
1889 0 : FD_MHIST_COPY( POHH, BEGIN_LEADER_DELAY_SECONDS, ctx->begin_leader_delay );
1890 0 : FD_MHIST_COPY( POHH, FIRST_MICROBLOCK_DELAY_SECONDS, ctx->first_microblock_delay );
1891 0 : FD_MHIST_COPY( POHH, SLOT_DONE_DELAY_SECONDS, ctx->slot_done_delay );
1892 0 : FD_MHIST_COPY( POHH, BUNDLE_INITIALIZE_DELAY_SECONDS, ctx->bundle_init_delay );
1893 0 : }
1894 :
1895 : static int
1896 : before_frag( fd_pohh_tile_t * ctx,
1897 : ulong in_idx,
1898 : ulong seq,
1899 0 : ulong sig ) {
1900 0 : (void)seq;
1901 :
1902 0 : if( FD_LIKELY( ctx->in_kind[ in_idx ]!=IN_KIND_BANK && ctx->in_kind[ in_idx ]!=IN_KIND_PACK ) ) return 0;
1903 :
1904 0 : if( FD_UNLIKELY( sig==FD_PACK_MSG_DONE_DRAINING ) ) {
1905 : /* Banks are drained, release pack's ownership of the current bank */
1906 0 : if( FD_UNLIKELY( ctx->pack_leader_bank ) ) fd_ext_bank_release( ctx->pack_leader_bank );
1907 0 : ctx->pack_leader_bank = NULL;
1908 0 : return 1; /* discard */
1909 0 : }
1910 :
1911 : /* Firedancer publishes dynamic microblock bound updates over the
1912 : pack_poh link. Frankendancer does not use them. */
1913 0 : if( FD_UNLIKELY( sig==FD_PACK_MSG_REDUCE_MB_BOUND ) ) return 1; /* discard */
1914 :
1915 0 : uint pack_idx = (uint)fd_disco_execle_sig_pack_idx( sig );
1916 0 : FD_TEST( ((int)(pack_idx-ctx->expect_pack_idx))>=0L );
1917 0 : if( FD_UNLIKELY( pack_idx!=ctx->expect_pack_idx ) ) return -1;
1918 0 : ctx->expect_pack_idx++;
1919 :
1920 0 : return 0;
1921 0 : }
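before_frag enforces in-order consumption of bank frags by pack index. A sketch of that check with hypothetical names, showing why the difference is taken in unsigned arithmetic and then reinterpreted as signed: the "not older than expected" test stays correct when the 32-bit counter wraps. Treating the -1 return as a request to leave the frag in place and retry it later is an assumption about the stem's contract, not something this listing states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes mirroring before_frag's 0 and -1. */
enum { SKETCH_ACCEPT = 0, SKETCH_RETRY = -1 };

/* Accept the frag only if it carries the next expected pack index.
   The subtraction wraps modulo 2^32; reinterpreting it as int32_t
   (implementation-defined, two's complement on common targets) keeps
   the "not from the past" check valid across wraparound.  A frag from
   the past would indicate a bug, so it aborts like FD_TEST. */
static int
sketch_check_pack_idx( uint32_t * expect_pack_idx, uint32_t pack_idx ) {
  int32_t delta = (int32_t)(pack_idx - *expect_pack_idx);
  assert( delta>=0 );
  if( pack_idx!=*expect_pack_idx ) return SKETCH_RETRY; /* not our turn yet */
  (*expect_pack_idx)++;
  return SKETCH_ACCEPT;
}
```

Starting from 0xFFFFFFFF, the counter accepts that index, wraps to 0, asks for a retry on index 1, and then accepts index 0.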
1922 :
1923 : static inline void
1924 : during_frag( fd_pohh_tile_t * ctx,
1925 : ulong in_idx,
1926 : ulong seq FD_PARAM_UNUSED,
1927 : ulong sig,
1928 : ulong chunk,
1929 : ulong sz,
1930 0 : ulong ctl FD_PARAM_UNUSED ) {
1931 0 : ctx->skip_frag = 0;
1932 :
1933 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_EPOCH ) ) {
1934 0 : if( FD_UNLIKELY( chunk<ctx->in[ in_idx ].chunk0 || chunk>ctx->in[ in_idx ].wmark ) )
1935 0 : FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz,
1936 0 : ctx->in[ in_idx ].chunk0, ctx->in[ in_idx ].wmark ));
1937 :
1938 0 : uchar const * dcache_entry = fd_chunk_to_laddr_const( ctx->in[ in_idx ].mem, chunk );
1939 0 : fd_multi_epoch_leaders_stake_msg_init( ctx->mleaders, fd_type_pun_const( dcache_entry ) );
1940 0 : return;
1941 0 : }
1942 :
1943 0 : ulong slot;
1944 0 : switch( ctx->in_kind[ in_idx ] ) {
1945 0 : case IN_KIND_BANK:
1946 0 : case IN_KIND_PACK: {
1947 0 : slot = fd_disco_execle_sig_slot( sig );
1948 0 : break;
1949 0 : }
1950 0 : default:
1951 0 : FD_LOG_ERR(( "unexpected in_kind %d", ctx->in_kind[ in_idx ] ));
1952 0 : }
1953 :
1954 : /* The following sequence is possible...
1955 :
1956 : 1. We become leader in slot 10
1957 : 2. While leader, we switch to a fork that is on slot 8, where
1958 : we are leader
1959 : 3. We get the in-flight microblocks for slot 10
1960 :
1961 : These in-flight microblocks need to be dropped, so we check
1962 : against the high water mark (highwater_leader_slot) rather than
1963 : the current hashcnt here when determining what to drop.
1964 :
1965 : We know that if the slot is lower than the high water mark, it's from a stale
1966 : leader slot, because we will not become leader for the same slot twice
1967 : even if we are reset back in time (to prevent duplicate blocks). */
1968 0 : int is_frag_for_prior_leader_slot = slot<ctx->highwater_leader_slot;
1969 :
1970 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_PACK ) ) {
1971 : /* We now know the real amount of microblocks published, so set an
1972 : exact bound for once we receive them. */
1973 0 : ctx->skip_frag = 1;
1974 0 : if( FD_UNLIKELY( is_frag_for_prior_leader_slot ) ) return;
1975 0 : fd_done_packing_t const * done_packing = fd_chunk_to_laddr( ctx->in[ in_idx ].mem, chunk );
1976 :
1977 0 : FD_TEST( ctx->microblocks_lower_bound<=ctx->max_microblocks_per_slot );
1978 0 : FD_TEST( done_packing->microblocks_in_slot<=ctx->max_microblocks_per_slot-1UL );
1979 0 : FD_LOG_INFO(( "done_packing(slot=%lu,seen_microblocks=%lu,microblocks_in_slot=%lu)",
1980 0 : ctx->slot,
1981 0 : ctx->microblocks_lower_bound,
1982 0 : done_packing->microblocks_in_slot ));
1983 :
1984 0 : ctx->microblocks_lower_bound += 1UL /* done_packing as a phantom "microblock"*/
1985 0 : + (ctx->max_microblocks_per_slot-1UL) /* the canonical microblock limit */
1986 0 : - done_packing->microblocks_in_slot /* the actual microblock count */;
1987 0 : return;
1988 0 : } else {
1989 0 : if( FD_UNLIKELY( chunk<ctx->in[ in_idx ].chunk0 || chunk>ctx->in[ in_idx ].wmark || sz>USHORT_MAX ) )
1990 0 : FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz, ctx->in[ in_idx ].chunk0, ctx->in[ in_idx ].wmark ));
1991 :
1992 0 : uchar * src = (uchar *)fd_chunk_to_laddr( ctx->in[ in_idx ].mem, chunk );
1993 :
1994 0 : fd_memcpy( ctx->_txns, src, sz-sizeof(fd_microblock_trailer_t) );
1995 0 : fd_memcpy( ctx->_microblock_trailer, src+sz-sizeof(fd_microblock_trailer_t), sizeof(fd_microblock_trailer_t) );
1996 :
1997 0 : ctx->skip_frag = is_frag_for_prior_leader_slot;
1998 0 : }
1999 0 : }
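The done_packing adjustment in during_frag is easiest to see with small numbers. A sketch with a hypothetical function name, assuming, as the log line's seen_microblocks label suggests, that microblocks_lower_bound is otherwise incremented once per microblock received:

```c
#include <assert.h>

/* New lower bound after done_packing.  done_packing counts as one
   phantom microblock and also credits every microblock slot that pack
   promised not to use, so once the real microblocks have all arrived
   the bound equals max_microblocks_per_slot. */
static unsigned long
sketch_on_done_packing( unsigned long lower_bound,
                        unsigned long max_microblocks_per_slot,
                        unsigned long microblocks_in_slot ) {
  assert( lower_bound<=max_microblocks_per_slot );
  assert( microblocks_in_slot<=max_microblocks_per_slot-1UL );
  return lower_bound + 1UL                          /* phantom microblock */
                     + (max_microblocks_per_slot-1UL) /* canonical limit */
                     - microblocks_in_slot;           /* actual count */
}
```

With a limit of 10, three microblocks already seen, and pack reporting five in the slot, the bound jumps to 8; the two outstanding microblocks then bring it to exactly 10, so max_remaining_microblocks reaches zero, which is what the FD_TEST at the end of a leader slot in after_credit expects.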
2000 :
2001 : static void
2002 : publish_microblock( fd_pohh_tile_t * ctx,
2003 : fd_stem_context_t * stem,
2004 : ulong slot,
2005 : ulong hashcnt_delta,
2006 0 : ulong txn_cnt ) {
2007 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
2008 0 : FD_TEST( slot>=ctx->reset_slot );
2009 0 : fd_entry_batch_meta_t * meta = (fd_entry_batch_meta_t *)dst;
2010 0 : meta->parent_offset = 1UL+slot-ctx->reset_slot;
2011 0 : meta->reference_tick = (ctx->hashcnt/ctx->hashcnt_per_tick) % ctx->ticks_per_slot;
2012 0 : meta->block_complete = !ctx->hashcnt;
2013 :
2014 : /* Refer to publish_tick() for details on meta->parent_block_id_valid. */
2015 0 : meta->parent_block_id_valid = ctx->parent_slot == (slot-meta->parent_offset);
2016 0 : if( FD_LIKELY( meta->parent_block_id_valid ) ) {
2017 0 : fd_memcpy( meta->parent_block_id, ctx->parent_block_id, 32UL );
2018 0 : }
2019 :
2020 0 : dst += sizeof(fd_entry_batch_meta_t);
2021 0 : fd_entry_batch_header_t * header = (fd_entry_batch_header_t *)dst;
2022 0 : header->hashcnt_delta = hashcnt_delta;
2023 0 : fd_memcpy( header->hash, ctx->hash, 32UL );
2024 :
2025 0 : dst += sizeof(fd_entry_batch_header_t);
2026 0 : ulong payload_sz = 0UL;
2027 0 : ulong included_txn_cnt = 0UL;
2028 0 : for( ulong i=0UL; i<txn_cnt; i++ ) {
2029 0 : fd_txn_p_t * txn = (fd_txn_p_t *)(ctx->_txns + i*sizeof(fd_txn_p_t));
2030 0 : if( FD_UNLIKELY( !(txn->flags & FD_TXN_P_FLAGS_EXECUTE_SUCCESS) ) ) continue;
2031 :
2032 0 : fd_memcpy( dst, txn->payload, txn->payload_sz );
2033 0 : payload_sz += txn->payload_sz;
2034 0 : dst += txn->payload_sz;
2035 0 : included_txn_cnt++;
2036 0 : }
2037 0 : header->txn_cnt = included_txn_cnt;
2038 :
2039 : /* We always have credits to publish here, because the burst value
2040 : (STEM_BURST) covers every frag one iteration can publish; at most we
2041 : publish_tick() once and then publish_became_leader() once, which
2042 : still leaves at least one credit to publish the microblock. */
2043 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
2044 0 : ulong sz = sizeof(fd_entry_batch_meta_t)+sizeof(fd_entry_batch_header_t)+payload_sz;
2045 0 : ulong new_sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_MICROBLOCK, 0UL );
2046 0 : fd_stem_publish( stem, ctx->shred_out->idx, new_sig, ctx->shred_out->chunk, sz, 0UL, 0UL, tspub );
2047 0 : ctx->shred_seq = stem->seqs[ ctx->shred_out->idx ];
2048 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sz, ctx->shred_out->chunk0, ctx->shred_out->wmark );
2049 :
2050 : /* This can only happen in low power mode. Refer to the comment in
2051 : publish_tick() for more details. */
2052 0 : if( FD_UNLIKELY( meta->block_complete && ctx->store_leader_bank_slot!=ULONG_MAX ) ) {
2053 0 : FD_TEST( ctx->store_leader_bank_slot==slot );
2054 0 : ctx->store_leader_bank = NULL;
2055 0 : ctx->store_leader_bank_slot = ULONG_MAX;
2056 0 : }
2057 0 : }
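publish_microblock() fills the entry batch metadata from the PoH position alone:

- Since `parent_offset = 1 + slot - reset_slot`, the expression `slot - parent_offset` used for `parent_block_id_valid` always equals `reset_slot - 1`.
- `reference_tick` is the tick index within the slot.
- `block_complete` is set when `hashcnt` has just wrapped to zero.

The sketch below reproduces only those three computations. It uses a simplified, hypothetical struct in place of `fd_entry_batch_meta_t`.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long ulong;

/* Hypothetical, simplified stand-in for fd_entry_batch_meta_t. */
typedef struct {
  ulong parent_offset;
  ulong reference_tick;
  bool  block_complete;
} batch_meta_t;

static batch_meta_t
compute_meta( ulong slot,
              ulong reset_slot,
              ulong hashcnt,          /* hashes so far in the current slot */
              ulong hashcnt_per_tick,
              ulong ticks_per_slot ) {
  assert( slot>=reset_slot );
  batch_meta_t m;
  m.parent_offset  = 1UL + slot - reset_slot;              /* slot-parent_offset == reset_slot-1 */
  m.reference_tick = (hashcnt/hashcnt_per_tick) % ticks_per_slot;
  m.block_complete = !hashcnt;                             /* hashcnt wrapped: slot finished */
  return m;
}
```

For example, with 12500 hashes per tick and 64 ticks per slot, a microblock published at hashcnt 25000 carries reference_tick 2.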
2058 :
2059 : static inline void
2060 : after_frag( fd_pohh_tile_t * ctx,
2061 : ulong in_idx,
2062 : ulong seq,
2063 : ulong sig,
2064 : ulong sz,
2065 : ulong tsorig,
2066 : ulong tspub,
2067 0 : fd_stem_context_t * stem ) {
2068 0 : (void)in_idx;
2069 0 : (void)seq;
2070 0 : (void)tsorig;
2071 0 : (void)tspub;
2072 :
2073 0 : if( FD_UNLIKELY( ctx->skip_frag ) ) return;
2074 :
2075 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_EPOCH ) ) {
2076 0 : fd_multi_epoch_leaders_stake_msg_fini( ctx->mleaders );
2077 : /* It might seem like we do not need to do state transitions in and
2078 : out of being the leader here, since leader schedule updates are
2079 : always one epoch in advance (whether we are leader or not would
2080 : never change for the currently executing slot), but this is not
2081 : true for new ledgers when the validator first boots. We will
2082 : likely be the leader in slot 1, and get notified of the leader
2083 : schedule for that slot while we are still in it.
2084 :
2085 : For safety we just handle both transitions, in and out, although
2086 : the only one possible should be into leader. */
2087 0 : ulong next_leader_slot_after_frag = next_leader_slot( ctx );
2088 :
2089 0 : int currently_leader = ctx->slot>=ctx->next_leader_slot;
2090 0 : int leader_after_frag = ctx->slot>=next_leader_slot_after_frag;
2091 :
2092 0 : FD_LOG_INFO(( "stake_update(before_leader=%lu,after_leader=%lu)",
2093 0 : ctx->next_leader_slot,
2094 0 : next_leader_slot_after_frag ));
2095 :
2096 0 : ctx->next_leader_slot = next_leader_slot_after_frag;
2097 0 : if( FD_UNLIKELY( currently_leader && !leader_after_frag ) ) {
2098 : /* This should never happen; if it did, we would need a state
2099 : transition out of being leader. */
2100 0 : FD_LOG_ERR(( "stake update caused us to no longer be leader in an active slot" ));
2101 0 : }
2102 :
2103 : /* No state transition is needed if we become leader, since that
2104 : is picked up by the regular tick loop; we only notify plugins. */
2105 0 : if( FD_UNLIKELY( !currently_leader && leader_after_frag ) ) {
2106 0 : publish_plugin_slot_start( ctx, next_leader_slot_after_frag, ctx->reset_slot );
2107 0 : }
2108 :
2109 0 : return;
2110 0 : }
2111 :
2112 0 : if( FD_UNLIKELY( !ctx->microblocks_lower_bound ) ) {
2113 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
2114 0 : fd_histf_sample( ctx->first_microblock_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)*tick_per_ns) ); /* ns -> ticks */
2115 0 : }
2116 :
2117 0 : ulong target_slot = fd_disco_execle_sig_slot( sig );
2118 :
2119 0 : if( FD_UNLIKELY( target_slot!=ctx->next_leader_slot || target_slot!=ctx->slot ) ) {
2120 0 : FD_LOG_ERR(( "packed too early or late target_slot=%lu, current_slot=%lu. highwater_leader_slot=%lu",
2121 0 : target_slot, ctx->slot, ctx->highwater_leader_slot ));
2122 0 : }
2123 :
2124 0 : FD_TEST( ctx->current_leader_bank );
2125 0 : FD_TEST( ctx->microblocks_lower_bound<ctx->max_microblocks_per_slot );
2126 0 : ctx->microblocks_lower_bound += 1UL;
2127 :
2128 0 : ulong txn_cnt = (sz-sizeof(fd_microblock_trailer_t))/sizeof(fd_txn_p_t);
2129 0 : fd_txn_p_t * txns = (fd_txn_p_t *)(ctx->_txns);
2130 0 : ulong executed_txn_cnt = 0UL;
2131 0 : ulong cus_used = 0UL;
2132 0 : for( ulong i=0UL; i<txn_cnt; i++ ) {
2133 : /* It's important that we check if a transaction is included in the
2134 : block with FD_TXN_P_FLAGS_EXECUTE_SUCCESS since
2135 : actual_consumed_cus may have a nonzero value for excluded
2136 : transactions used for monitoring purposes */
2137 0 : if( FD_LIKELY( txns[ i ].flags & FD_TXN_P_FLAGS_EXECUTE_SUCCESS ) ) {
2138 0 : executed_txn_cnt++;
2139 0 : cus_used += txns[ i ].execle_cu.actual_consumed_cus;
2140 0 : }
2141 0 : }
2142 :
2143 : /* We don't publish transactions that fail to execute. If all the
2144 : transactions failed to execute, the microblock would be empty,
2145 : causing Agave to think it's a tick and complain. Instead, we just
2146 : skip the microblock and don't hash or update the hashcnt. */
2147 0 : if( FD_UNLIKELY( !executed_txn_cnt ) ) return;
2148 :
2149 0 : uchar data[ 64 ];
2150 0 : fd_memcpy( data, ctx->hash, 32UL );
2151 0 : fd_memcpy( data+32UL, ctx->_microblock_trailer->hash, 32UL );
2152 0 : fd_sha256_hash( data, 64UL, ctx->hash );
2153 :
2154 0 : ctx->hashcnt++;
2155 0 : FD_TEST( ctx->hashcnt>ctx->last_hashcnt );
2156 0 : ulong hashcnt_delta = ctx->hashcnt - ctx->last_hashcnt;
2157 :
2158 : /* The hashing loop in after_credit() will never leave us exactly one away from
2159 : crossing a tick boundary, so this increment will never cause the
2160 : current tick (or the slot) to change, except in low power mode
2161 : for development, in which case we do need to register the tick
2162 : with the leader bank. We don't need to publish the tick since
2163 : sending the microblock below is the publishing action. */
2164 0 : if( FD_UNLIKELY( !(ctx->hashcnt%ctx->hashcnt_per_slot ) ) ) {
2165 0 : ctx->slot++;
2166 0 : ctx->hashcnt = 0UL;
2167 0 : }
2168 :
2169 0 : ctx->last_slot = ctx->slot;
2170 0 : ctx->last_hashcnt = ctx->hashcnt;
2171 :
2172 0 : ctx->cus_used += cus_used;
2173 :
2174 0 : if( FD_UNLIKELY( !(ctx->hashcnt%ctx->hashcnt_per_tick ) ) ) {
2175 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->hash );
2176 0 : }
2177 :
2178 0 : publish_microblock( ctx, stem, target_slot, hashcnt_delta, txn_cnt );
2179 :
2180 0 : if( FD_UNLIKELY( !(ctx->hashcnt%ctx->hashcnt_per_tick ) ) ) {
2181 0 : if( FD_UNLIKELY( ctx->slot>ctx->next_leader_slot ) ) {
2182 : /* We ticked while leader and are no longer leader... transition
2183 : the state machine. */
2184 0 : publish_plugin_slot_end( ctx, ctx->next_leader_slot, ctx->cus_used );
2185 :
2186 0 : no_longer_leader( ctx );
2187 :
2188 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
2189 : /* We finished a leader slot, and are immediately leader for the
2190 : following slot... transition. */
2191 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->next_leader_slot-1UL );
2192 0 : }
2193 0 : }
2194 0 : }
2195 0 : }
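After mixing a microblock's hash into the PoH chain, after_frag() does three things:

- It advances `hashcnt` by exactly one.
- It rolls over to the next slot when `hashcnt` lands on a multiple of `hashcnt_per_slot`.
- It registers a tick with the leader bank when the new `hashcnt` is a multiple of `hashcnt_per_tick`.

The sketch below covers only that counter bookkeeping and leaves out the SHA-256 mixin itself. It assumes `hashcnt_per_slot` is a whole number of ticks. The struct and function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long ulong;

/* Hypothetical subset of the tile state touched by one mixin. */
typedef struct {
  ulong slot;
  ulong hashcnt;          /* hashes into the current slot */
  ulong hashcnt_per_tick;
  ulong hashcnt_per_slot; /* assumed hashcnt_per_tick * ticks_per_slot */
} poh_counters_t;

/* Account for one mixin hash. Returns true if the new position is on a
   tick boundary, where the tile calls fd_ext_poh_register_tick(). */
static bool
poh_mixin_advance( poh_counters_t * c ) {
  assert( !(c->hashcnt_per_slot % c->hashcnt_per_tick) );
  c->hashcnt++;
  if( !(c->hashcnt % c->hashcnt_per_slot) ) { /* crossed into the next slot */
    c->slot++;
    c->hashcnt = 0UL;
  }
  return !(c->hashcnt % c->hashcnt_per_tick);
}
```

A slot rollover always lands on a tick boundary too, because the new `hashcnt` of zero is a multiple of `hashcnt_per_tick`.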
2196 :
2197 : static void
2198 : privileged_init( fd_topo_t const * topo,
2199 0 : fd_topo_tile_t const * tile ) {
2200 0 : void * scratch = fd_topo_obj_laddr( topo, tile->tile_obj_id );
2201 :
2202 0 : FD_SCRATCH_ALLOC_INIT( l, scratch );
2203 0 : fd_pohh_tile_t * ctx = FD_SCRATCH_ALLOC_APPEND( l, alignof( fd_pohh_tile_t ), sizeof( fd_pohh_tile_t ) );
2204 :
2205 0 : if( FD_UNLIKELY( !strcmp( tile->pohh.identity_key_path, "" ) ) )
2206 0 : FD_LOG_ERR(( "identity_key_path not set" ));
2207 :
2208 0 : const uchar * identity_key = fd_keyload_load( tile->pohh.identity_key_path, /* pubkey only: */ 1 );
2209 0 : fd_memcpy( ctx->identity_key.uc, identity_key, 32UL );
2210 :
2211 0 : ctx->bundle.enabled = tile->pohh.bundle.enabled;
2212 0 : if( FD_UNLIKELY( !tile->pohh.bundle.vote_account_path[0] ) ) {
2213 0 : ctx->bundle.enabled = 0;
2214 0 : }
2215 0 : if( FD_UNLIKELY( ctx->bundle.enabled ) ) {
2216 0 : if( FD_UNLIKELY( !fd_base58_decode_32( tile->pohh.bundle.vote_account_path, ctx->bundle.vote_account.uc ) ) ) {
2217 0 : const uchar * vote_key = fd_keyload_load( tile->pohh.bundle.vote_account_path, /* pubkey only: */ 1 );
2218 0 : fd_memcpy( ctx->bundle.vote_account.uc, vote_key, 32UL );
2219 0 : }
2220 0 : }
2221 0 : }
2222 :
2223 : /* The Agave client needs to communicate to the shred tile what
2224 : the shred version is on boot, but the shred tile does not live in the
2225 : same address space, so the PoH tile passes the value through
2226 : a shared memory ulong. */
2227 :
2228 : static volatile ulong * fd_shred_version;
2229 :
2230 : void
2231 0 : fd_ext_shred_set_shred_version( ulong shred_version ) {
2232 0 : while( FD_UNLIKELY( !FD_VOLATILE_CONST( fd_shred_version ) ) ) FD_SPIN_PAUSE(); /* the pointer itself is not volatile */
2233 0 : *fd_shred_version = shred_version;
2234 0 : }
2235 :
2236 : void
2237 : fd_ext_poh_publish_gossip_vote( uchar * data,
2238 : ulong data_len,
2239 : uint source_ipv4,
2240 0 : uchar * pubkey ) {
2241 0 : (void)pubkey;
2242 0 : uchar txn_with_header[ FD_TPU_RAW_MTU ];
2243 0 : fd_txn_m_t * txnm = (fd_txn_m_t *)txn_with_header;
2244 0 : *txnm = (fd_txn_m_t) { 0UL };
2245 0 : txnm->payload_sz = (ushort)data_len;
2246 0 : txnm->source_ipv4 = source_ipv4;
2247 0 : txnm->source_tpu = FD_TXN_M_TPU_SOURCE_GOSSIP;
2248 0 : fd_memcpy( txn_with_header+sizeof(fd_txn_m_t), data, data_len );
2249 0 : poh_link_publish( &gossip_dedup, 1UL, txn_with_header, fd_txn_m_realized_footprint( txnm, 0, 0 ) );
2250 0 : }
2251 :
2252 : void
2253 : fd_ext_poh_publish_leader_schedule( uchar * data,
2254 0 : ulong data_len ) {
2255 0 : poh_link_publish( &stake_out, 2UL, data, data_len );
2256 0 : }
2257 :
2258 : void
2259 : fd_ext_poh_publish_cluster_info( uchar * data,
2260 0 : ulong data_len ) {
2261 0 : poh_link_publish( &crds_shred, 2UL, data, data_len );
2262 0 : }
2263 :
2264 : void
2265 0 : fd_ext_poh_publish_executed_txn( uchar const * data ) {
2266 0 : static int lock = 0;
2267 :
2268 : /* We need a lock because the link publisher is not safe for concurrent
2269 : use, and replay calls this from a thread pool. */
2270 0 : for(;;) {
2271 0 : if( FD_LIKELY( FD_ATOMIC_CAS( &lock, 0, 1 )==0 ) ) break;
2272 0 : FD_SPIN_PAUSE();
2273 0 : }
2274 :
2275 0 : FD_COMPILER_MFENCE();
2276 0 : poh_link_publish( &executed_txn, 0UL, data, 64UL );
2277 0 : FD_COMPILER_MFENCE();
2278 :
2279 0 : FD_VOLATILE(lock) = 0;
2280 0 : }
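fd_ext_poh_publish_executed_txn() serializes replay threads with a compare-and-swap spinlock. It uses FD_ATOMIC_CAS to acquire, then a compiler fence and a volatile store of zero to release.

The same pattern can be written portably with C11 `<stdatomic.h>`, which makes the acquire and release ordering explicit. This is a sketch of the technique, not the tile's code; `txn_lock_t` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type; the tile itself uses a plain static int. */
typedef struct { atomic_int held; } txn_lock_t;

static bool
txn_lock_try_acquire( txn_lock_t * l ) {
  int expected = 0;
  /* Acquire ordering: writes made by the previous holder are visible
     once this CAS succeeds. */
  return atomic_compare_exchange_strong_explicit( &l->held, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed );
}

static void
txn_lock_acquire( txn_lock_t * l ) {
  while( !txn_lock_try_acquire( l ) ) {
    /* Wait with plain loads so waiters do not keep bouncing the cache
       line with failed CAS attempts. */
    while( atomic_load_explicit( &l->held, memory_order_relaxed ) ) { }
  }
}

static void
txn_lock_release( txn_lock_t * l ) {
  assert( atomic_load_explicit( &l->held, memory_order_relaxed )==1 );
  /* Release ordering: writes inside the critical section become visible
     before the lock reads as free. */
  atomic_store_explicit( &l->held, 0, memory_order_release );
}
```

In the tile, the critical section is the single poh_link_publish() call on the executed_txn link.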
2281 :
2282 : void
2283 : fd_ext_plugin_publish_replay_stage( ulong sig,
2284 : uchar * data,
2285 0 : ulong data_len ) {
2286 0 : poh_link_publish( &replay_plugin, sig, data, data_len );
2287 0 : }
2288 :
2289 : void
2290 : fd_ext_plugin_publish_genesis_hash( ulong sig,
2291 : uchar * data,
2292 0 : ulong data_len ) {
2293 0 : poh_link_publish( &replay_plugin, sig, data, data_len );
2294 0 : }
2295 :
2296 : void
2297 : fd_ext_plugin_publish_start_progress( ulong sig,
2298 : uchar * data,
2299 0 : ulong data_len ) {
2300 0 : poh_link_publish( &start_progress_plugin, sig, data, data_len );
2301 0 : }
2302 :
2303 : void
2304 : fd_ext_plugin_publish_vote_listener( ulong sig,
2305 : uchar * data,
2306 0 : ulong data_len ) {
2307 0 : poh_link_publish( &vote_listener_plugin, sig, data, data_len );
2308 0 : }
2309 :
2310 : void
2311 : fd_ext_plugin_publish_validator_info( ulong sig,
2312 : uchar * data,
2313 0 : ulong data_len ) {
2314 0 : poh_link_publish( &validator_info_plugin, sig, data, data_len );
2315 0 : }
2316 :
2317 : void
2318 : fd_ext_plugin_publish_periodic( ulong sig,
2319 : uchar * data,
2320 0 : ulong data_len ) {
2321 0 : poh_link_publish( &gossip_plugin, sig, data, data_len );
2322 0 : }
2323 :
2324 : void
2325 : fd_ext_resolv_publish_root_bank( uchar * data,
2326 0 : ulong data_len ) {
2327 0 : poh_link_publish( &replay_resolh, 0UL, data, data_len );
2328 0 : }
2329 :
2330 : void
2331 : fd_ext_resolv_publish_completed_blockhash( uchar * data,
2332 0 : ulong data_len ) {
2333 0 : poh_link_publish( &replay_resolh, 1UL, data, data_len );
2334 0 : }
2335 :
2336 : static inline fd_pohh_out_t
2337 : out1( fd_topo_t const * topo,
2338 : fd_topo_tile_t const * tile,
2339 0 : char const * name ) {
2340 0 : ulong idx = ULONG_MAX;
2341 :
2342 0 : for( ulong i=0UL; i<tile->out_cnt; i++ ) {
2343 0 : fd_topo_link_t const * link = &topo->links[ tile->out_link_id[ i ] ];
2344 0 : if( !strcmp( link->name, name ) ) {
2345 0 : if( FD_UNLIKELY( idx!=ULONG_MAX ) ) FD_LOG_ERR(( "tile %s:%lu had multiple output links named %s but expected one", tile->name, tile->kind_id, name ));
2346 0 : idx = i;
2347 0 : }
2348 0 : }
2349 :
2350 0 : if( FD_UNLIKELY( idx==ULONG_MAX ) ) FD_LOG_ERR(( "tile %s:%lu had no output link named %s", tile->name, tile->kind_id, name ));
2351 :
2352 0 : void * mem = topo->workspaces[ topo->objs[ topo->links[ tile->out_link_id[ idx ] ].dcache_obj_id ].wksp_id ].wksp;
2353 0 : ulong chunk0 = fd_dcache_compact_chunk0( mem, topo->links[ tile->out_link_id[ idx ] ].dcache );
2354 0 : ulong wmark = fd_dcache_compact_wmark ( mem, topo->links[ tile->out_link_id[ idx ] ].dcache, topo->links[ tile->out_link_id[ idx ] ].mtu );
2355 :
2356 0 : return (fd_pohh_out_t){ .idx = idx, .mem = mem, .chunk0 = chunk0, .wmark = wmark, .chunk = chunk0 };
2357 0 : }
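out1() resolves an output link by name and insists the match is unique: zero matches and two or more matches are both fatal. The lookup reduces to the hypothetical helper below. It returns sentinel values instead of calling FD_LOG_ERR, so the two failure cases can be told apart.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

typedef unsigned long ulong;

#define LINK_NOT_FOUND (ULONG_MAX)      /* out1() uses ULONG_MAX the same way */
#define LINK_AMBIGUOUS (ULONG_MAX-1UL)  /* hypothetical; out1() just aborts */

/* Hypothetical helper: index of the single entry of names[0..cnt) equal
   to want, LINK_NOT_FOUND if none match, LINK_AMBIGUOUS if several do. */
static ulong
find_unique_link( char const * const * names,
                  ulong                cnt,
                  char const *         want ) {
  assert( want );
  ulong idx = LINK_NOT_FOUND;
  for( ulong i=0UL; i<cnt; i++ ) {
    if( strcmp( names[ i ], want ) ) continue;
    if( idx!=LINK_NOT_FOUND ) return LINK_AMBIGUOUS; /* second match */
    idx = i;
  }
  return idx;
}
```

Scanning the whole list, rather than returning on the first hit, is what lets a duplicated link name be reported instead of silently picking one.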
2358 :
2359 : static void
2360 : unprivileged_init( fd_topo_t const * topo,
2361 0 : fd_topo_tile_t const * tile ) {
2362 0 : void * scratch = fd_topo_obj_laddr( topo, tile->tile_obj_id );
2363 :
2364 0 : FD_SCRATCH_ALLOC_INIT( l, scratch );
2365 0 : fd_pohh_tile_t * ctx = FD_SCRATCH_ALLOC_APPEND( l, alignof( fd_pohh_tile_t ), sizeof( fd_pohh_tile_t ) );
2366 0 : void * sha256 = FD_SCRATCH_ALLOC_APPEND( l, FD_SHA256_ALIGN, FD_SHA256_FOOTPRINT );
2367 :
2368 0 : #define NONNULL( x ) (__extension__({ \
2369 0 : __typeof__((x)) __x = (x); \
2370 0 : if( FD_UNLIKELY( !__x ) ) FD_LOG_ERR(( #x " was unexpectedly NULL" )); \
2371 0 : __x; }))
2372 :
2373 0 : ctx->mleaders = NONNULL( fd_multi_epoch_leaders_join( fd_multi_epoch_leaders_new( ctx->mleaders_mem ) ) );
2374 0 : ctx->sha256 = NONNULL( fd_sha256_join( fd_sha256_new( sha256 ) ) );
2375 0 : ctx->current_leader_bank = NULL;
2376 0 : ctx->pack_leader_bank = NULL;
2377 0 : ctx->store_leader_bank = NULL;
2378 0 : ctx->store_leader_bank_slot = ULONG_MAX;
2379 0 : ctx->signal_leader_change = NULL;
2380 :
2381 0 : ctx->shred_seq = ULONG_MAX;
2382 0 : ctx->halted_switching_key = 0;
2383 0 : ctx->keyswitch = fd_keyswitch_join( fd_topo_obj_laddr( topo, tile->id_keyswitch_obj_id ) );
2384 0 : FD_TEST( ctx->keyswitch );
2385 :
2386 0 : ctx->slot = 0UL;
2387 0 : ctx->hashcnt = 0UL;
2388 0 : ctx->last_hashcnt = 0UL;
2389 0 : ctx->highwater_leader_slot = ULONG_MAX;
2390 0 : ctx->next_leader_slot = ULONG_MAX;
2391 0 : ctx->reset_slot = ULONG_MAX;
2392 :
2393 0 : ctx->lagged_consecutive_leader_start = tile->pohh.lagged_consecutive_leader_start;
2394 0 : ctx->expect_sequential_leader_slot = ULONG_MAX;
2395 :
2396 0 : ctx->expect_pack_idx = 0U;
2397 0 : ctx->microblocks_lower_bound = 0UL;
2398 :
2399 0 : ctx->max_active_descendant = 0UL;
2400 :
2401 0 : if( FD_UNLIKELY( ctx->bundle.enabled ) ) {
2402 0 : NONNULL( fd_bundle_crank_gen_init( ctx->bundle.gen, (fd_acct_addr_t const *)tile->pohh.bundle.tip_distribution_program_addr,
2403 0 : (fd_acct_addr_t const *)tile->pohh.bundle.tip_payment_program_addr,
2404 0 : (fd_acct_addr_t const *)ctx->bundle.vote_account.uc,
2405 0 : (fd_acct_addr_t const *)ctx->bundle.vote_account.uc, "NAN", 0UL ) ); /* last three arguments are deliberately bogus placeholders */
2406 0 : }
2407 :
2408 0 : ulong pohh_shred_obj_id = fd_pod_query_ulong( topo->props, "pohh_shred", ULONG_MAX );
2409 0 : FD_TEST( pohh_shred_obj_id!=ULONG_MAX );
2410 :
2411 0 : fd_shred_version = fd_fseq_join( fd_topo_obj_laddr( topo, pohh_shred_obj_id ) );
2412 0 : FD_TEST( fd_shred_version );
2413 :
2414 0 : poh_link_init( &gossip_dedup, topo, tile, out1( topo, tile, "gossip_dedup" ).idx );
2415 0 : poh_link_init( &stake_out, topo, tile, out1( topo, tile, "stake_out" ).idx );
2416 0 : poh_link_init( &crds_shred, topo, tile, out1( topo, tile, "crds_shred" ).idx );
2417 0 : poh_link_init( &replay_resolh, topo, tile, out1( topo, tile, "replay_resol" ).idx );
2418 0 : poh_link_init( &executed_txn, topo, tile, out1( topo, tile, "executed_txn" ).idx );
2419 :
2420 0 : if( FD_LIKELY( tile->pohh.plugins_enabled ) ) {
2421 0 : poh_link_init( &replay_plugin, topo, tile, out1( topo, tile, "replay_plugi" ).idx );
2422 0 : poh_link_init( &gossip_plugin, topo, tile, out1( topo, tile, "gossip_plugi" ).idx );
2423 0 : poh_link_init( &start_progress_plugin, topo, tile, out1( topo, tile, "startp_plugi" ).idx );
2424 0 : poh_link_init( &vote_listener_plugin, topo, tile, out1( topo, tile, "votel_plugin" ).idx );
2425 0 : poh_link_init( &validator_info_plugin, topo, tile, out1( topo, tile, "valcfg_plugi" ).idx );
2426 0 : } else {
2427 : /* Mark these mcaches as "available" so the system boots, but the
2428 : link memory is not set, so nothing will actually get published via
2429 : these links. */
2430 0 : FD_COMPILER_MFENCE();
2431 0 : replay_plugin.mcache = (fd_frag_meta_t*)1;
2432 0 : gossip_plugin.mcache = (fd_frag_meta_t*)1;
2433 0 : start_progress_plugin.mcache = (fd_frag_meta_t*)1;
2434 0 : vote_listener_plugin.mcache = (fd_frag_meta_t*)1;
2435 0 : validator_info_plugin.mcache = (fd_frag_meta_t*)1;
2436 0 : FD_COMPILER_MFENCE();
2437 0 : }
2438 :
2439 0 : FD_LOG_INFO(( "PoH waiting to be initialized by Agave client... %lu %lu", fd_poh_waiting_lock, fd_poh_returned_lock ));
2440 0 : FD_VOLATILE( fd_pohh_global_ctx ) = ctx;
2441 0 : FD_COMPILER_MFENCE();
2442 0 : for(;;) {
2443 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_poh_waiting_lock ) ) ) break;
2444 0 : FD_SPIN_PAUSE();
2445 0 : }
2446 0 : FD_VOLATILE( fd_poh_waiting_lock ) = 0UL;
2447 0 : FD_VOLATILE( fd_poh_returned_lock ) = 1UL;
2448 0 : FD_COMPILER_MFENCE();
2449 0 : for(;;) {
2450 0 : if( FD_UNLIKELY( !FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
2451 0 : FD_SPIN_PAUSE();
2452 0 : }
2453 0 : FD_COMPILER_MFENCE();
2454 :
2455 0 : if( FD_UNLIKELY( ctx->reset_slot==ULONG_MAX ) ) FD_LOG_ERR(( "PoH was not initialized by Agave client" ));
2456 :
2457 0 : fd_histf_join( fd_histf_new( ctx->begin_leader_delay, FD_MHIST_SECONDS_MIN( POHH, BEGIN_LEADER_DELAY_SECONDS ),
2458 0 : FD_MHIST_SECONDS_MAX( POHH, BEGIN_LEADER_DELAY_SECONDS ) ) );
2459 0 : fd_histf_join( fd_histf_new( ctx->first_microblock_delay, FD_MHIST_SECONDS_MIN( POHH, FIRST_MICROBLOCK_DELAY_SECONDS ),
2460 0 : FD_MHIST_SECONDS_MAX( POHH, FIRST_MICROBLOCK_DELAY_SECONDS ) ) );
2461 0 : fd_histf_join( fd_histf_new( ctx->slot_done_delay, FD_MHIST_SECONDS_MIN( POHH, SLOT_DONE_DELAY_SECONDS ),
2462 0 : FD_MHIST_SECONDS_MAX( POHH, SLOT_DONE_DELAY_SECONDS ) ) );
2463 :
2464 0 : fd_histf_join( fd_histf_new( ctx->bundle_init_delay, FD_MHIST_SECONDS_MIN( POHH, BUNDLE_INITIALIZE_DELAY_SECONDS ),
2465 0 : FD_MHIST_SECONDS_MAX( POHH, BUNDLE_INITIALIZE_DELAY_SECONDS ) ) );
2466 :
2467 0 : for( ulong i=0UL; i<tile->in_cnt; i++ ) {
2468 0 : fd_topo_link_t const * link = &topo->links[ tile->in_link_id[ i ] ];
2469 0 : fd_topo_wksp_t const * link_wksp = &topo->workspaces[ topo->objs[ link->dcache_obj_id ].wksp_id ];
2470 :
2471 0 : ctx->in[ i ].mem = link_wksp->wksp;
2472 0 : ctx->in[ i ].chunk0 = fd_dcache_compact_chunk0( ctx->in[ i ].mem, link->dcache );
2473 0 : ctx->in[ i ].wmark = fd_dcache_compact_wmark ( ctx->in[ i ].mem, link->dcache, link->mtu );
2474 :
2475 0 : if( !strcmp( link->name, "stake_out" ) ) {
2476 0 : ctx->in_kind[ i ] = IN_KIND_EPOCH;
2477 0 : } else if( !strcmp( link->name, "pack_pohh" ) ) {
2478 0 : ctx->in_kind[ i ] = IN_KIND_PACK;
2479 0 : } else if( !strcmp( link->name, "bank_pohh" ) ) {
2480 0 : ctx->in_kind[ i ] = IN_KIND_BANK;
2481 0 : } else {
2482 0 : FD_LOG_ERR(( "unexpected input link name %s", link->name ));
2483 0 : }
2484 0 : }
2485 :
2486 0 : *ctx->shred_out = out1( topo, tile, "pohh_shred" );
2487 0 : *ctx->pack_out = out1( topo, tile, "pohh_pack" );
2488 0 : ctx->plugin_out->mem = NULL;
2489 0 : if( FD_LIKELY( tile->pohh.plugins_enabled ) ) {
2490 0 : *ctx->plugin_out = out1( topo, tile, "pohh_plugin" );
2491 0 : }
2492 :
2493 0 : ctx->features_activation_avail = 0UL;
2494 0 : for( ulong i=0UL; i<FD_SHRED_FEATURES_ACTIVATION_SLOT_CNT; i++ )
2495 0 : ctx->features_activation->slots[i] = FD_SHRED_FEATURES_ACTIVATION_SLOT_DISABLED;
2496 :
2497 0 : ulong scratch_top = FD_SCRATCH_ALLOC_FINI( l, scratch_align() );
2498 0 : if( FD_UNLIKELY( scratch_top > (ulong)scratch + scratch_footprint( tile ) ) )
2499 0 : FD_LOG_ERR(( "scratch overflow %lu %lu %lu", scratch_top - (ulong)scratch - scratch_footprint( tile ), scratch_top, (ulong)scratch + scratch_footprint( tile ) ));
2500 0 : }
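unprivileged_init() carves its objects out of the tile's scratch region with FD_SCRATCH_ALLOC_APPEND, rounding each offset up to the object's alignment. It then finishes the layout with FD_SCRATCH_ALLOC_FINI against scratch_align() and compares the result with scratch_footprint().

Below is a minimal bump-allocator sketch of that arithmetic. The helpers are hypothetical, work on offsets rather than real pointers, and assume power-of-two alignments.

```c
#include <assert.h>

typedef unsigned long ulong;

/* Round x up to a multiple of align (align must be a power of two). */
static ulong
align_up( ulong x, ulong align ) {
  assert( align && !(align & (align-1UL)) );
  return (x + align - 1UL) & ~(align - 1UL);
}

/* Hypothetical bump allocator mirroring the FD_SCRATCH_ALLOC_* pattern:
   returns the offset of the new object and advances *top past it. */
static ulong
scratch_append( ulong * top, ulong align, ulong sz ) {
  ulong off = align_up( *top, align );
  *top = off + sz;
  return off;
}
```

For example, appending a 1000-byte object and then a 200-byte object, both 128-byte aligned, places the second at offset 1024. The top ends at 1224, and aligning that to 128 gives 1280. The overflow check then amounts to that aligned top not exceeding the footprint.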
2501 :
2502 : /* One tick, one microblock, one plugin slot end, one plugin slot start,
2503 : one leader update, one features activation, and one leader bank
2504 : handoff. */
2505 0 : #define STEM_BURST (7UL)
2506 :
2507 : /* See explanation in fd_pack */
2508 0 : #define STEM_LAZY (128L*3000L)
2509 :
2510 0 : #define STEM_CALLBACK_CONTEXT_TYPE fd_pohh_tile_t
2511 0 : #define STEM_CALLBACK_CONTEXT_ALIGN alignof(fd_pohh_tile_t)
2512 :
2513 0 : #define STEM_CALLBACK_DURING_HOUSEKEEPING during_housekeeping
2514 0 : #define STEM_CALLBACK_METRICS_WRITE metrics_write
2515 0 : #define STEM_CALLBACK_AFTER_CREDIT after_credit
2516 0 : #define STEM_CALLBACK_BEFORE_FRAG before_frag
2517 0 : #define STEM_CALLBACK_DURING_FRAG during_frag
2518 0 : #define STEM_CALLBACK_AFTER_FRAG after_frag
2519 :
2520 : #include "../../disco/stem/fd_stem.c"
2521 :
2522 : fd_topo_run_tile_t fd_tile_pohh = {
2523 : .name = "pohh",
2524 : .populate_allowed_seccomp = NULL,
2525 : .populate_allowed_fds = NULL,
2526 : .scratch_align = scratch_align,
2527 : .scratch_footprint = scratch_footprint,
2528 : .privileged_init = privileged_init,
2529 : .unprivileged_init = unprivileged_init,
2530 : .run = stem_run,
2531 : };