Line data Source code
1 : #include "../../../../disco/tiles.h"
2 :
3 : #include "../../../../disco/plugin/fd_plugin.h"
4 :
5 : /* Let's say there was a computer, the "leader" computer, that acted as
6 : a bank. Users could send it messages saying they wanted to deposit
7 : money, or transfer it to someone else.
8 :
9 : That's how, for example, Bank of America works but there are problems
10 : with it. One simple problem is: the bank can set your balance to
11 : zero if they don't like you.
12 :
13 : You could try to fix this by having the bank periodically publish the
14 : list of all account balances and transactions. If the customers add
15 : unforgeable signatures to their deposit slips and transfers, then
16 : the bank cannot zero a balance without it being obvious to everyone.
17 :
18 : There's still problems. The bank can't lie about your balance now or
19 : take your money, but it can just not accept deposits on your behalf
20 : by ignoring you.
21 :
22 : You could fix this by getting a few independent banks together, lets
23 : say Bank of America, Bank of England, and Westpac, and having them
24 : rotate who operates the leader computer periodically. If one bank
25 : ignores your deposits, you can just wait and send them to the next
26 : one.
27 :
28 : This is Solana.
29 :
30 : There's still problems of course but they are largely technical. How
31 : do the banks agree who is leader? How do you recover if a leader
32 : misbehaves? How do customers verify the transactions aren't forged?
33 : How do banks receive and publish and verify each others work quickly?
34 : These are the main technical innovations that enable Solana to work
35 : well.
36 :
37 : What about Proof of History?
38 :
39 : One particular niche problem is about the leader schedule. When the
40 : leader computer is moving from one bank to another, the new bank must
41 : wait for the old bank to say it's done and provide a final list of
42 : balances that it can start working off of. But: what if the computer
43 : at the old bank crashes and never says its done?
44 :
45 : Does the new leader just take over at some point? What if the new
46 : leader is malicious, and says the past thousand leaders crashed, and
47 : there have been no transactions for days? How do you check?
48 :
49 : This is what Proof of History solves. Each bank in the network must
50 : constantly do a lot of busywork (compute hashes), even when it is not
51 : leader.
52 :
53 : If the prior thousand leaders crashed, and no transactions happened
54 : in an hour, the new leader would have to show they did about an hour
55 : of busywork for everyone else to believe them.
56 :
57 : A better name for this is proof of skipping. If a leader is skipping
58 : slots (building off of a slot that is not the direct parent), it must
59 : prove that it waited a good amount of time to do so.
60 :
61 : It's not a perfect solution. For one thing, some banks have really
62 : fast computers and can compute a lot of busywork in a short amount of
63 : time, allowing them to skip prior slot(s) anyway. But: there is a
64 : social component that prevents validators from skipping the prior
65 : leader slot. It is easy to detect when this happens and the network
66 : could respond by ignoring their votes or stake.
67 :
68 : You could come up with other schemes: for example, the network could
69 : just use wall clock time. If a new leader publishes a block without
70 : waiting 400 milliseconds for the prior slot to complete, then there
71 : is no "proof of skipping" and the nodes ignore the slot.
72 :
73 : These schemes have a problem in that they are not deterministic
74 : across the network (different computers have different clocks), and
75 : so they will cause frequent forks which are very expensive to
76 : resolve. Even though the proof of history scheme is not perfect,
77 : it is better than any alternative which is not deterministic.
78 :
79 : With all that background, we can now describe at a high level what
80 : this PoH tile actually does,
81 :
82 : (1) Whenever any other leader in the network finishes a slot, and
83 : the slot is determined to be the best one to build off of, this
84 : tile gets "reset" onto that block, the so called "reset slot".
85 :
86 : (2) The tile is constantly doing busy work, hash(hash(hash(...))) on
87 : top of the last reset slot, even when it is not leader.
88 :
89 : (3) When the tile becomes leader, it continues hashing from where it
90 : was. Typically, the prior leader finishes their slot, so the
91 : reset slot will be the parent one, and this tile only publishes
92 : hashes for its own slot. But if prior slots were skipped, then
93 : there might be a whole chain already waiting.
94 :
95 : That's pretty much it. When we are leader, in addition to doing
96 : busywork, we publish ticks and microblocks to the shred tile. A
97 : microblock is a non-empty group of transactions whose hashes are
98 : mixed-in to the chain, while a tick is a periodic stamp of the
99 : current hash, with no transactions (nothing mixed in). We need
100 : to send both to the shred tile, as ticks are important for other
101 : validators to verify in parallel.
102 :
103 : As well, the tile should never become leader for a slot that it has
104 : published anything for, otherwise it may create a duplicate block.
105 :
106 : Some particularly common misunderstandings:
107 :
108 : - PoH is critical to security.
109 :
110 : This largely isn't true. The target hash rate of the network is
111 : so slow (1 hash per 500 nanoseconds) that a malicious leader can
112 : easily catch up if they start from an old hash, and the only
113 : practical attack prevented is the proof of skipping. Most of the
114 : long range attacks in the Solana whitepaper are not relevant.
115 :
116 : - PoH keeps passage of time.
117 :
118 : This is also not true. The way the network keeps time so it can
119 : decide who is leader is that, each leader uses their operating
120 : system clock to time 400 milliseconds and publishes their block
121 : when this timer expires.
122 :
123 : If a leader just hashed as fast as they could, they could publish
124 : a block in tens of milliseconds, and the rest of the network
125 : would happily accept it. This is why the Solana "clock" as
126 : determined by PoH is not accurate and drifts over time.
127 :
128 : - PoH prevents transaction reordering by the leader.
129 :
130 : The leader can, in theory, wait until the very end of their
131 : leader slot to publish anything at all to the network. They can,
132 : in particular, hold all received transactions for 400
133 : milliseconds and then reorder and publish some right at the end
134 : to advantage certain transactions.
135 :
136 : You might be wondering... if all the PoH chain is helping us do is
137 : prove that slots were skipped correctly, why do we need to "mix in"
138 : transactions to the hash value? Or do anything at all for slots
139 : where we don't skip the prior slot?
140 :
141 : It's a good question, and the answer is that this behavior is not
142 : necessary. An ideal implementation of PoH have no concept of ticks
143 : or mixins, and would not be part of the TPU pipeline at all.
144 : Instead, there would be a simple field "skip_proof" on the last
145 : shred we send for a slot, the hash(hash(...)) value. This field
146 : would only be filled in (and only verified by replayers) in cases
147 : where the slot actually skipped a parent.
148 :
149 : Then what is the "clock? In Solana, time is constructed as follows:
150 :
151 : HASHES
152 :
153 : The base unit of time is a hash. Hereafter, any values whose
154 : units are in hashes are called a "hashcnt" to distinguish them
155 : from actual hashed values.
156 :
157 : Agave generally defines a constant duration for each tick
158 : (see below) and then varies the number of hashcnt per tick, but
159 : as we consider the hashcnt the base unit of time, Firedancer and
160 : this PoH implementation defines everything in terms of hashcnt
161 : duration instead.
162 :
163 : In mainnet-beta, testnet, and devnet the hashcnt ticks over
164 : (increments) every 100 nanoseconds. The hashcnt rate is
165 : specified as 500 nanoseconds according to the genesis, but there
166 : are several features which increase the number of hashes per
167 : tick while keeping tick duration constant, which make the time
168 : per hashcnt lower. These features up to and including the
169 : `update_hashes_per_tick6` feature are activated on mainnet-beta,
170 : devnet, and testnet, and are described in the TICKS section
171 : below.
172 :
173 : Other chains and development environments might have a different
174 : hashcnt rate in the genesis, or they might not have activated
175 : the features which increase the rate yet, which we also support.
176 :
177 : In practice, although each validator follows a hashcnt rate of
178 : 100 nanoseconds, the overall observed hashcnt rate of the
179 : network is a little slower than once every 100 nanoseconds,
180 : mostly because there are gaps and clock synchronization issues
181 : during handoff between leaders. This is referred to as clock
182 : drift.
183 :
184 : TICKS
185 :
186 : The leader needs to periodically checkpoint the hash value
187 : associated with a given hashcnt so that they can publish it to
188 : other nodes for verification.
189 :
190 : On mainnet-beta, testnet, and devnet this occurs once every
191 : 62,500 hashcnts, or approximately once every 6.4 microseconds.
192 : This value is determined at genesis time, and according to the
193 : features below, and could be different in development
194 : environments or on other chains which we support.
195 :
196 : Due to protocol limitations, when mixing in transactions to the
197 : proof-of-history chain, it cannot occur on a tick boundary (but
198 : can occur at any other hashcnt).
199 :
200 : Ticks exist mainly so that verification can happen in parallel.
201 : A verifier computer, rather than needing to do hash(hash(...))
202 : all in sequence to verify a proof-of-history chain, can do,
203 :
204 : Core 0: hash(hash(...))
205 : Core 1: hash(hash(...))
206 : Core 2: hash(hash(...))
207 : Core 3: hash(hash(...))
208 : ...
209 :
210 : Between each pair of tick boundaries.
211 :
212 : Solana sometimes calls the current tick the "tick height",
213 : although it makes more sense to think of it as a counter from
214 : zero, it's just the number of ticks since the genesis hash.
215 :
216 : There is a set of features which increase the number of hashcnts
217 : per tick. These are all deployed on mainnet-beta, devnet, and
218 : testnet.
219 :
220 : name: update_hashes_per_tick
221 : id: 3uFHb9oKdGfgZGJK9EHaAXN4USvnQtAFC13Fh5gGFS5B
222 : hashes per tick: 12,500
223 : hashcnt duration: 500 nanos
224 :
225 : name: update_hashes_per_tick2
226 : id: EWme9uFqfy1ikK1jhJs8fM5hxWnK336QJpbscNtizkTU
227 : hashes per tick: 17,500
228 : hashcnt duration: 357.142857143 nanos
229 :
230 : name: update_hashes_per_tick3
231 : id: 8C8MCtsab5SsfammbzvYz65HHauuUYdbY2DZ4sznH6h5
232 : hashes per tick: 27,500
233 : hashcnt duration: 227.272727273 nanos
234 :
235 : name: update_hashes_per_tick4
236 : id: 8We4E7DPwF2WfAN8tRTtWQNhi98B99Qpuj7JoZ3Aikgg
237 : hashes per tick: 47,500
238 : hashcnt duration: 131.578947368 nanos
239 :
240 : name: update_hashes_per_tick5
241 : id: BsKLKAn1WM4HVhPRDsjosmqSg2J8Tq5xP2s2daDS6Ni4
242 : hashes per tick: 57,500
243 : hashcnt duration: 108.695652174 nanos
244 :
245 : name: update_hashes_per_tick6
246 : id: FKu1qYwLQSiehz644H6Si65U5ZQ2cp9GxsyFUfYcuADv
247 : hashes per tick: 62,500
248 : hashcnt duration: 100 nanos
249 :
250 : In development environments, there is a way to configure the
251 : hashcnt per tick to be "none" during genesis, for a so-called
252 : "low power" tick producer. The idea is not to spin cores during
253 : development. This is equivalent to setting the hashcnt per tick
254 : to be 1, and increasing the hashcnt duration to the desired tick
255 : duration.
256 :
257 : SLOTS
258 :
259 : Each leader needs to be leader for a fixed amount of time, which
260 : is called a slot. During a slot, a leader has an opportunity to
261 : receive transactions and produce a block for the network,
262 : although they may miss ("skip") the slot if they are offline or
263 : not behaving.
264 :
265 : In mainnet-beta, testnet, and devnet a slot is 64 ticks, or
266 : 4,000,000 hashcnts, or approximately 400 milliseconds.
267 :
268 : Due to the way the leader schedule is constructed, each leader
269 : is always given at least four (4) consecutive slots in the
270 : schedule. This means when becoming leader you will be leader
271 : for at least 4 slots, or 1.6 seconds.
272 :
273 : It is rare, although can happen that a leader gets more than 4
274 : consecutive slots (eg, 8, or 12), if they are lucky with the
275 : leader schedule generation.
276 :
277 : The number of ticks in a slot is fixed at genesis time, and
278 : could be different for development or other chains, which we
279 : support. There is nothing special about 4 leader slots in a
280 : row, and this might be changed in future, and the proof of
281 : history makes no assumptions that this is the case.
282 :
283 : EPOCHS
284 :
285 : Infrequently, the network needs to do certain housekeeping,
286 : mainly things like collecting rent and deciding on the leader
287 : schedule. The length of an epoch is fixed on mainnet-beta,
288 : devnet and testnet at 420,000 slots, or around ~2 (1.94) days.
289 : This value is fixed at genesis time, and could be different for
290 : other chains including development, which we support. Typically
291 : in development, epochs are every 8,192 slots, or around ~1 hour
292 : (54.61 minutes), although it depends on the number of ticks per
293 : slot and the target hashcnt rate of the genesis as well.
294 :
295 : In development, epochs need not be a fixed length either. There
296 : is a "warmup" option, where epochs start short and grow, which
297 : is useful for quickly warming up stake during development.
298 :
299 : The epoch is important because it is the only time the leader
300 : schedule is updated. The leader schedule is a list of which
301 : leader is leader for which slot, and is generated by a special
302 : algorithm that is deterministic and known to all nodes.
303 :
304 : The leader schedule is computed one epoch in advance, so that
305 : at slot T, we always know who will be leader up until the end
306 : of slot T+EPOCH_LENGTH. Specifically, the leader schedule for
307 : epoch N is computed during the epoch boundary crossing from
308 : N-2 to N-1. For mainnet-beta, the slots per epoch is fixed and
309 : will always be 420,000. */
310 :
311 : #include "../../../../ballet/pack/fd_pack.h"
312 : #include "../../../../ballet/sha256/fd_sha256.h"
313 : #include "../../../../disco/metrics/fd_metrics.h"
314 : #include "../../../../disco/topo/fd_pod_format.h"
315 : #include "../../../../disco/shred/fd_shredder.h"
316 : #include "../../../../disco/shred/fd_stake_ci.h"
317 : #include "../../../../disco/bank/fd_bank_abi.h"
318 : #include "../../../../disco/keyguard/fd_keyload.h"
319 : #include "../../../../disco/metrics/generated/fd_metrics_poh.h"
320 : #include "../../../../flamenco/leaders/fd_leaders.h"
321 :
322 : /* When we are becoming leader, and we think the prior leader might have
323 : skipped their slot, we give them a grace period to finish. In the
324 : Agave client this is called grace ticks. This is a courtesy to
325 : maintain network health, and is not strictly necessary. It is
326 : actually advantageous to us as new leader to take over right away and
327 : give no grace period, since we could generate more fees.
328 :
329 : Here we define the grace period to be two slots, which is taken from
330 : Agave directly. */
331 : #define GRACE_SLOTS (2UL)
332 :
333 : /* The maximum number of microblocks that pack is allowed to pack into a
334 : single slot. This is not consensus critical, and pack could, if we
335 : let it, produce as many microblocks as it wants, and the slot would
336 : still be valid.
337 :
338 : We have this here instead so that PoH can estimate slot completion,
339 : and keep the hashcnt up to date as pack progresses through packing
340 : the slot. If this upper bound was not enforced, PoH could tick to
341 : the last hash of the slot and have no hashes left to mixin incoming
342 : microblocks from pack, so this upper bound is a coordination
343 : mechanism so that PoH can progress hashcnts while the slot is active,
344 : and know that pack will not need those hashcnts later to do mixins. */
345 0 : #define MAX_MICROBLOCKS_PER_SLOT (32768UL)
346 :
347 : /* When we are hashing in the background in case a prior leader skips
348 : their slot, we need to store the result of each tick hash so we can
349 : publish them when we become leader. The network requires at least
350 : one leader slot to publish in each epoch for the leader schedule to
351 : generate, so in the worst case we might need two full epochs of slots
352 : to store the hashes. (Eg, if epoch T only had a published slot in
353 : position 0 and epoch T+1 only had a published slot right at the end).
354 :
355 : There is a tighter bound: the block data limit of mainnet-beta is
356 : currently FD_PACK_MAX_DATA_PER_BLOCK, or 27,332,342 bytes per slot.
357 : At 48 bytes per tick, it is not possible to publish a slot that skips
358 : 569,424 or more prior slots. */
359 0 : #define MAX_SKIPPED_TICKS (1UL+(FD_PACK_MAX_DATA_PER_BLOCK/48UL))
360 :
361 0 : #define IN_KIND_BANK (0)
362 0 : #define IN_KIND_PACK (1)
363 0 : #define IN_KIND_STAKE (2)
364 :
365 :
366 : typedef struct {
367 : fd_wksp_t * mem;
368 : ulong chunk0;
369 : ulong wmark;
370 : } fd_poh_in_ctx_t;
371 :
372 : typedef struct {
373 : ulong idx;
374 : fd_wksp_t * mem;
375 : ulong chunk0;
376 : ulong wmark;
377 : ulong chunk;
378 : } fd_poh_out_ctx_t;
379 :
380 : typedef struct {
381 : fd_stem_context_t * stem;
382 :
383 : /* Static configuration determined at genesis creation time. See
384 : long comment above for more information. */
385 : ulong tick_duration_ns;
386 : ulong hashcnt_per_tick;
387 : ulong ticks_per_slot;
388 :
389 : /* Derived from the above configuration, but we precompute it. */
390 : double slot_duration_ns;
391 : double hashcnt_duration_ns;
392 : ulong hashcnt_per_slot;
393 : /* Constant, fixed at initialization. The maximum number of
394 : microblocks that the pack tile can publish in each slot. */
395 : ulong max_microblocks_per_slot;
396 :
397 : /* The current slot and hashcnt within that slot of the proof of
398 : history, including hashes we have been producing in the background
399 : while waiting for our next leader slot. */
400 : ulong slot;
401 : ulong hashcnt;
402 : ulong cus_used;
403 :
404 : /* When we send a microblock on to the shred tile, we need to tell
405 : it how many hashes there have been since the last microblock, so
406 : this tracks the hashcnt of the last published microblock.
407 :
408 : If we are skipping slots prior to our leader slot, the last_slot
409 : will be quite old, and potentially much larger than the number of
410 : hashcnts in one slot. */
411 : ulong last_slot;
412 : ulong last_hashcnt;
413 :
414 : /* If we have published a tick or a microblock for a particular slot
415 : to the shred tile, we should never become leader for that slot
416 : again, otherwise we could publish a duplicate block.
417 :
418 : This value tracks the max slot that we have published a tick or
419 : microblock for so we can prevent this. */
420 : ulong highwater_leader_slot;
421 :
422 : /* See how this field is used below. If we have sequential leader
423 : slots, we don't reset the expected slot end time between the two,
424 : to prevent clock drift. If we didn't do this, our 2nd slot would
425 : end 400ms + `time_for_replay_to_move_slot_and_reset_poh` after
426 : our 1st, rather than just strictly 400ms. */
427 : ulong expect_sequential_leader_slot;
428 :
429 : /* There's a race condition ... let's say two banks A and B, bank A
430 : processes some transactions, then releases the account locks, and
431 : sends the microblock to PoH to be stamped. Pack now re-packs the
432 : same accounts with a new microblock, sends to bank B, bank B
433 : executes and sends the microblock to PoH, and this all happens fast
434 : enough that PoH picks the 2nd block to stamp before the 1st. The
435 : accounts database changes now are misordered with respect to PoH so
436 : replay could fail.
437 :
438 : To prevent this race, we order all microblocks and only process
439 : them in PoH in the order they are produced by pack. This is a
440 : little bit over-strict, we just need to ensure that microblocks
441 : with conflicting accounts execute in order, but this is easiest to
442 : implement for now. */
443 : ulong expect_microblock_idx;
444 :
445 : /* The PoH tile must never drop microblocks that get committed by the
446 : bank, so it needs to always be able to mixin a microblock hash.
447 : Mixing in requires incrementing the hashcnt, so we need to ensure
448 : at all times that there is enough hascnts left in the slot to
449 : mixin whatever future microblocks pack might produce for it.
450 :
451 : This value tracks that. At any time, max_microblocks_per_slot
452 : - microblocks_lower_bound is an upper bound on the maximum number
453 : of microblocks that might still be received in this slot. */
454 : ulong microblocks_lower_bound;
455 :
456 : uchar __attribute__((aligned(32UL))) hash[ 32 ];
457 :
458 : /* When we are not leader, we need to save the hashes that were
459 : produced in case the prior leader skips. If they skip, we will
460 : replay these skipped hashes into our next leader bank so that
461 : the slot hashes sysvar can be updated correctly, and also publish
462 : them to peer nodes as part of our outgoing shreds. */
463 : uchar skipped_tick_hashes[ MAX_SKIPPED_TICKS ][ 32 ];
464 :
465 : /* The timestamp in nanoseconds of when the reset slot was received.
466 : This is the timestamp we are building on top of to determine when
467 : our next leader slot starts. */
468 : long reset_slot_start_ns;
469 :
470 : /* The timestamp in nanoseconds of when we got the bank for the
471 : current leader slot. */
472 : long leader_bank_start_ns;
473 :
474 : /* The hashcnt corresponding to the start of the current reset slot. */
475 : ulong reset_slot;
476 :
477 : /* The hashcnt at which our next leader slot begins, or ULONG max if
478 : we have no known next leader slot. */
479 : ulong next_leader_slot;
480 :
481 : /* If an in progress frag should be skipped */
482 : int skip_frag;
483 :
484 : /* If we currently are the leader according the clock AND we have
485 : received the leader bank for the slot from the replay stage,
486 : this value will be non-NULL.
487 :
488 : Note that we might be inside our leader slot, but not have a bank
489 : yet, in which case this will still be NULL.
490 :
491 : It will be NULL for a brief race period between consecutive leader
492 : slots, as we ping-pong back to replay stage waiting for a new bank.
493 :
494 : Agave refers to this as the "working bank". */
495 : void const * current_leader_bank;
496 :
497 : fd_sha256_t * sha256;
498 :
499 : fd_stake_ci_t * stake_ci;
500 :
501 : fd_pubkey_t identity_key;
502 :
503 : /* The Agave client needs to be notified when the leader changes,
504 : so that they can resume the replay stage if it was suspended waiting. */
505 : void * signal_leader_change;
506 :
507 : /* These are temporarily set in during_frag so they can be used in
508 : after_frag once the frag has been validated as not overrun. */
509 : uchar _txns[ USHORT_MAX ];
510 : fd_microblock_trailer_t _microblock_trailer[ 1 ];
511 :
512 : int in_kind[ 64 ];
513 : fd_poh_in_ctx_t in[ 64 ];
514 :
515 : fd_poh_out_ctx_t shred_out[ 1 ];
516 : fd_poh_out_ctx_t pack_out[ 1 ];
517 : fd_poh_out_ctx_t plugin_out[ 1 ];
518 :
519 : fd_histf_t begin_leader_delay[ 1 ];
520 : fd_histf_t first_microblock_delay[ 1 ];
521 : fd_histf_t slot_done_delay[ 1 ];
522 : } fd_poh_ctx_t;
523 :
524 : /* The PoH recorder is implemented in Firedancer but for now needs to
525 : work with Agave, so we have a locking scheme for them to
526 : co-operate.
527 :
528 : This is because the PoH tile lives in the Agave memory address
529 : space and their version of concurrency is locking the PoH recorder
530 : and reading arbitrary fields.
531 :
532 : So we allow them to lock the PoH tile, although with a very bad (for
533 : them) locking scheme. By default, the tile has full and exclusive
534 : access to the data. If part of Agave wishes to read/write they
535 : can either,
536 :
537 : 1. Rewrite their concurrency to message passing based on mcache
538 : (preferred, but not feasible).
539 : 2. Signal to the tile they wish to acquire the lock, by setting
540 : fd_poh_waiting_lock to 1.
541 :
542 : During before credit, the tile will check if there is the waiting
543 : lock is set to 1, and if so, set the returned lock to 1, indicating
544 : to the waiter that they may now proceed.
545 :
546 : When the waiter is done reading and writing, they restore the
547 : returned lock value back to zero, and the POH tile continues with its
548 : day. */
549 :
550 : static fd_poh_ctx_t * fd_poh_global_ctx;
551 :
552 : static volatile ulong fd_poh_waiting_lock __attribute__((aligned(128UL)));
553 : static volatile ulong fd_poh_returned_lock __attribute__((aligned(128UL)));
554 :
555 : /* Agave also needs to write to some mcaches, so we trampoline
556 : that via. the PoH tile as well. */
557 :
558 : struct poh_link {
559 : fd_frag_meta_t * mcache;
560 : ulong depth;
561 : ulong tx_seq;
562 :
563 : void * mem;
564 : void * dcache;
565 : ulong chunk0;
566 : ulong wmark;
567 : ulong chunk;
568 :
569 : ulong cr_avail;
570 : ulong rx_cnt;
571 : ulong * rx_fseqs[ 32UL ];
572 : };
573 :
574 : typedef struct poh_link poh_link_t;
575 :
576 : poh_link_t gossip_dedup;
577 : poh_link_t stake_out;
578 : poh_link_t crds_shred;
579 : poh_link_t replay_resolv;
580 :
581 : poh_link_t replay_plugin;
582 : poh_link_t gossip_plugin;
583 : poh_link_t start_progress_plugin;
584 : poh_link_t vote_listener_plugin;
585 :
586 : static void
587 0 : poh_link_wait_credit( poh_link_t * link ) {
588 0 : if( FD_LIKELY( link->cr_avail ) ) return;
589 :
590 0 : while( 1 ) {
591 0 : ulong cr_query = ULONG_MAX;
592 0 : for( ulong i=0UL; i<link->rx_cnt; i++ ) {
593 0 : ulong const * _rx_seq = link->rx_fseqs[ i ];
594 0 : ulong rx_seq = FD_VOLATILE_CONST( *_rx_seq );
595 0 : ulong rx_cr_query = (ulong)fd_long_max( (long)link->depth - fd_long_max( fd_seq_diff( link->tx_seq, rx_seq ), 0L ), 0L );
596 0 : cr_query = fd_ulong_min( rx_cr_query, cr_query );
597 0 : }
598 0 : if( FD_LIKELY( cr_query>0UL ) ) {
599 0 : link->cr_avail = cr_query;
600 0 : break;
601 0 : }
602 0 : FD_SPIN_PAUSE();
603 0 : }
604 0 : }
605 :
606 : static void
607 : poh_link_publish( poh_link_t * link,
608 : ulong sig,
609 : uchar const * data,
610 0 : ulong data_sz ) {
611 0 : while( FD_UNLIKELY( !FD_VOLATILE_CONST( link->mcache ) ) ) FD_SPIN_PAUSE();
612 0 : if( FD_UNLIKELY( !link->mem ) ) return; /* link not enabled, don't publish */
613 0 : poh_link_wait_credit( link );
614 :
615 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( link->mem, link->chunk );
616 0 : fd_memcpy( dst, data, data_sz );
617 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
618 0 : fd_mcache_publish( link->mcache, link->depth, link->tx_seq, sig, link->chunk, data_sz, 0UL, 0UL, tspub );
619 0 : link->chunk = fd_dcache_compact_next( link->chunk, data_sz, link->chunk0, link->wmark );
620 0 : link->cr_avail--;
621 0 : link->tx_seq++;
622 0 : }
623 :
624 : static void
625 : poh_link_init( poh_link_t * link,
626 : fd_topo_t * topo,
627 : fd_topo_tile_t * tile,
628 0 : ulong out_idx ) {
629 0 : fd_topo_link_t * topo_link = &topo->links[ tile->out_link_id[ out_idx ] ];
630 0 : fd_topo_wksp_t * wksp = &topo->workspaces[ topo->objs[ topo_link->dcache_obj_id ].wksp_id ];
631 :
632 0 : link->mem = wksp->wksp;
633 0 : link->depth = fd_mcache_depth( topo_link->mcache );
634 0 : link->tx_seq = 0UL;
635 0 : link->dcache = topo_link->dcache;
636 0 : link->chunk0 = fd_dcache_compact_chunk0( wksp->wksp, topo_link->dcache );
637 0 : link->wmark = fd_dcache_compact_wmark ( wksp->wksp, topo_link->dcache, topo_link->mtu );
638 0 : link->chunk = link->chunk0;
639 0 : link->cr_avail = 0UL;
640 0 : link->rx_cnt = 0UL;
641 0 : for( ulong i=0UL; i<topo->tile_cnt; i++ ) {
642 0 : fd_topo_tile_t * _tile = &topo->tiles[ i ];
643 0 : for( ulong j=0UL; j<_tile->in_cnt; j++ ) {
644 0 : if( _tile->in_link_id[ j ]==topo_link->id && _tile->in_link_reliable[ j ] ) {
645 0 : FD_TEST( link->rx_cnt<32UL );
646 0 : link->rx_fseqs[ link->rx_cnt++ ] = _tile->in_link_fseq[ j ];
647 0 : break;
648 0 : }
649 0 : }
650 0 : }
651 0 : FD_COMPILER_MFENCE();
652 0 : link->mcache = topo_link->mcache;
653 0 : FD_COMPILER_MFENCE();
654 0 : FD_TEST( link->mcache );
655 0 : }
656 :
657 : /* To help show correctness, functions that might be called from
658 : Rust, either directly or indirectly, have this fake "attribute"
659 : CALLED_FROM_RUST, which is actually nothing. Calls from Rust
660 : typically execute on threads did not call fd_boot, so they do not
661 : have the typical FD_TL variables. In particular, they cannot use
662 : normal metrics, and their log messages don't have full context.
663 : Additionally, Rust functions marked CALLED_FROM_RUST cannot call back
664 : into a C fd_ext function without causing a deadlock (although the
665 : other Rust fd_ext functions have a similar problem).
666 :
667 : To prevent annotation from polluting the whole codebase, calls to
668 : functions outside this file are manually checked and marked as being
669 : safe at each call rather than annotated. */
670 : #define CALLED_FROM_RUST
671 :
672 : static CALLED_FROM_RUST fd_poh_ctx_t *
673 0 : fd_ext_poh_write_lock( void ) {
674 0 : for(;;) {
675 : /* Acquire the waiter lock to make sure we are the first writer in the queue. */
676 0 : if( FD_LIKELY( !FD_ATOMIC_CAS( &fd_poh_waiting_lock, 0UL, 1UL) ) ) break;
677 0 : FD_SPIN_PAUSE();
678 0 : }
679 0 : FD_COMPILER_MFENCE();
680 0 : for(;;) {
681 : /* Now wait for the tile to tell us we can proceed. */
682 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
683 0 : FD_SPIN_PAUSE();
684 0 : }
685 0 : FD_COMPILER_MFENCE();
686 0 : return fd_poh_global_ctx;
687 0 : }
688 :
689 : static CALLED_FROM_RUST void
690 0 : fd_ext_poh_write_unlock( void ) {
691 0 : FD_COMPILER_MFENCE();
692 0 : FD_VOLATILE( fd_poh_returned_lock ) = 0UL;
693 0 : }
694 :
695 : /* The PoH tile needs to interact with the Agave address space to
696 : do certain operations that Firedancer hasn't reimplemented yet, a.k.a
697 : transaction execution. We have Agave export some wrapper
698 : functions that we call into during regular tile execution. These do
699 : not need any locking, since they are called serially from the single
700 : PoH tile. */
701 :
702 : extern CALLED_FROM_RUST void fd_ext_bank_acquire( void const * bank );
703 : extern CALLED_FROM_RUST void fd_ext_bank_release( void const * bank );
704 : extern CALLED_FROM_RUST void fd_ext_poh_signal_leader_change( void * sender );
705 : extern void fd_ext_poh_register_tick( void const * bank, uchar const * hash );
706 :
707 : /* fd_ext_poh_initialize is called by Agave on startup to
708 : initialize the PoH tile with some static configuration, and the
709 : initial reset slot and hash which it retrieves from a snapshot.
710 :
711 : This function is called by some random Agave thread, but
712 : it blocks booting of the PoH tile. The tile will spin until it
713 : determines that this initialization has happened.
714 :
715 : signal_leader_change is an opaque Rust object that is used to
716 : tell the replay stage that the leader has changed. It is a
717 : Box::into_raw(Arc::increment_strong(crossbeam::Sender)), so it
718 : has infinite lifetime unless this C code releases the refcnt.
719 :
720 : It can be used with `fd_ext_poh_signal_leader_change` which
721 : will just issue a nonblocking send on the channel. */
722 :
723 : CALLED_FROM_RUST void
724 : fd_ext_poh_initialize( ulong tick_duration_ns, /* See clock comments above, will be 6.4 microseconds for mainnet-beta. */
725 : ulong hashcnt_per_tick, /* See clock comments above, will be 62,500 for mainnet-beta. */
726 : ulong ticks_per_slot, /* See clock comments above, will almost always be 64. */
727 : ulong tick_height, /* The counter (height) of the tick to start hashing on top of. */
728 : uchar const * last_entry_hash, /* Points to start of a 32 byte region of memory, the hash itself at the tick height. */
729 0 : void * signal_leader_change /* See comment above. */ ) {
730 0 : FD_COMPILER_MFENCE();
731 0 : for(;;) {
732 : /* Make sure the ctx is initialized before trying to take the lock. */
733 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_poh_global_ctx ) ) ) break;
734 0 : FD_SPIN_PAUSE();
735 0 : }
736 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
737 :
738 0 : ctx->slot = tick_height/ticks_per_slot;
739 0 : ctx->hashcnt = 0UL;
740 0 : ctx->cus_used = 0UL;
741 0 : ctx->last_slot = ctx->slot;
742 0 : ctx->last_hashcnt = 0UL;
743 0 : ctx->reset_slot = ctx->slot;
744 0 : ctx->reset_slot_start_ns = fd_log_wallclock(); /* safe to call from Rust */
745 :
746 0 : memcpy( ctx->hash, last_entry_hash, 32UL );
747 :
748 0 : ctx->signal_leader_change = signal_leader_change;
749 :
750 : /* Static configuration about the clock. */
751 0 : ctx->tick_duration_ns = tick_duration_ns;
752 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
753 0 : ctx->ticks_per_slot = ticks_per_slot;
754 :
755 : /* Recompute derived information about the clock. */
756 0 : ctx->slot_duration_ns = (double)ticks_per_slot*(double)tick_duration_ns;
757 0 : ctx->hashcnt_duration_ns = (double)tick_duration_ns/(double)hashcnt_per_tick;
758 0 : ctx->hashcnt_per_slot = ticks_per_slot*hashcnt_per_tick;
759 :
760 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
761 : /* Low power producer, maximum of one microblock per tick in the slot */
762 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
763 0 : } else {
764 : /* See the long comment in after_credit for this limit */
765 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
766 0 : }
767 :
768 0 : fd_ext_poh_write_unlock();
769 0 : }
770 :
771 : /* fd_ext_poh_acquire_bank gets the current leader bank if there is one
772 : currently active. PoH might think we are leader without having a
773 : leader bank if the replay stage has not yet noticed we are leader.
774 :
775 : The bank that is returned is owned the caller, and must be converted
776 : to an Arc<Bank> by calling Arc::from_raw() on it. PoH increments the
777 : reference count before returning the bank, so that it can also keep
778 : its internal copy.
779 :
780 : If there is no leader bank, NULL is returned. In this case, the
781 : caller should not call `Arc::from_raw()`. */
782 :
783 : CALLED_FROM_RUST void const *
784 0 : fd_ext_poh_acquire_leader_bank( void ) {
785 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
786 0 : void const * bank = NULL;
787 0 : if( FD_LIKELY( ctx->current_leader_bank ) ) {
788 : /* Clone refcount before we release the lock. */
789 0 : fd_ext_bank_acquire( ctx->current_leader_bank );
790 0 : bank = ctx->current_leader_bank;
791 0 : }
792 0 : fd_ext_poh_write_unlock();
793 0 : return bank;
794 0 : }
795 :
796 : /* fd_ext_poh_reset_slot returns the slot height one above the last good
797 : (unskipped) slot we are building on top of. This is always a good
798 : known value, and will not be ULONG_MAX. */
799 :
800 : CALLED_FROM_RUST ulong
801 0 : fd_ext_poh_reset_slot( void ) {
802 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
803 0 : ulong reset_slot = ctx->reset_slot;
804 0 : fd_ext_poh_write_unlock();
805 0 : return reset_slot;
806 0 : }
807 :
808 : /* fd_ext_poh_reached_leader_slot returns 1 if we have reached a slot
809 : where we are leader. This is used by the replay stage to determine
810 : if it should create a new leader bank descendant of the prior reset
811 : slot block.
812 :
813 : Sometimes, even when we reach our slot we do not return 1, as we are
814 : giving a grace period to the prior leader to finish publishing their
815 : block.
816 :
817 : out_leader_slot is the slot height of the leader slot we reached, and
818 : reset_slot is the slot height of the last good (unskipped) slot we
819 : are building on top of. */
820 :
821 : CALLED_FROM_RUST int
822 : fd_ext_poh_reached_leader_slot( ulong * out_leader_slot,
823 0 : ulong * out_reset_slot ) {
824 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
825 :
826 0 : *out_leader_slot = ctx->next_leader_slot;
827 0 : *out_reset_slot = ctx->reset_slot;
828 :
829 0 : if( FD_UNLIKELY( ctx->next_leader_slot==ULONG_MAX ||
830 0 : ctx->slot<ctx->next_leader_slot ) ) {
831 : /* Didn't reach our leader slot yet. */
832 0 : fd_ext_poh_write_unlock();
833 0 : return 0;
834 0 : }
835 :
836 0 : if( FD_LIKELY( ctx->reset_slot==ctx->next_leader_slot ) ) {
837 : /* We were reset onto our leader slot, because the prior leader
838 : completed theirs, so we should start immediately, no need for a
839 : grace period. */
840 0 : fd_ext_poh_write_unlock();
841 0 : return 1;
842 0 : }
843 :
844 0 : if( FD_LIKELY( ctx->next_leader_slot>=1UL ) ) {
845 0 : fd_epoch_leaders_t * leaders = fd_stake_ci_get_lsched_for_slot( ctx->stake_ci, ctx->next_leader_slot-1UL ); /* Safe to call from Rust */
846 0 : if( FD_LIKELY( leaders ) ) {
847 0 : fd_pubkey_t const * leader = fd_epoch_leaders_get( leaders, ctx->next_leader_slot-1UL ); /* Safe to call from Rust */
848 0 : if( FD_LIKELY( leader ) ) {
849 0 : if( FD_UNLIKELY( !memcmp( leader->uc, ctx->identity_key.uc, 32UL ) ) ) {
850 : /* We were the leader in the previous slot, so also no need for
851 : a grace period. We wouldn't get here if we were still
852 : processing the prior slot so begin new one immediately. */
853 0 : fd_ext_poh_write_unlock();
854 0 : return 1;
855 0 : }
856 0 : }
857 0 : }
858 0 : }
859 :
860 0 : if( FD_UNLIKELY( ctx->next_leader_slot-ctx->reset_slot>=4UL ) ) {
861 : /* The prior leader has not completed any slot successfully during
862 : their 4 leader slots, so they are probably inactive and no need
863 : to give a grace period. */
864 0 : fd_ext_poh_write_unlock();
865 0 : return 1;
866 0 : }
867 :
868 0 : if( FD_LIKELY( ctx->slot-ctx->next_leader_slot<GRACE_SLOTS ) ) {
869 : /* The prior leader hasn't finished their last slot, and they are
870 : likely still publishing, and within their grace period of two
871 : slots so we will keep waiting. */
872 0 : fd_ext_poh_write_unlock();
873 0 : return 0;
874 0 : }
875 :
876 0 : fd_ext_poh_write_unlock();
877 0 : return 1;
878 0 : }
879 :
880 : CALLED_FROM_RUST static inline void
881 : publish_plugin_slot_start( fd_poh_ctx_t * ctx,
882 : ulong slot,
883 0 : ulong parent_slot ) {
884 0 : if( FD_UNLIKELY( !ctx->plugin_out->mem ) ) return;
885 :
886 0 : fd_plugin_msg_slot_start_t * slot_start = (fd_plugin_msg_slot_start_t *)fd_chunk_to_laddr( ctx->plugin_out->mem, ctx->plugin_out->chunk );
887 0 : *slot_start = (fd_plugin_msg_slot_start_t){ .slot = slot, .parent_slot = parent_slot };
888 0 : fd_stem_publish( ctx->stem, ctx->plugin_out->idx, FD_PLUGIN_MSG_SLOT_START, ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_start_t), 0UL, 0UL, 0UL );
889 0 : ctx->plugin_out->chunk = fd_dcache_compact_next( ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_start_t), ctx->plugin_out->chunk0, ctx->plugin_out->wmark );
890 0 : }
891 :
892 : CALLED_FROM_RUST static inline void
893 : publish_plugin_slot_end( fd_poh_ctx_t * ctx,
894 : ulong slot,
895 0 : ulong cus_used ) {
896 0 : if( FD_UNLIKELY( !ctx->plugin_out->mem ) ) return;
897 :
898 0 : fd_plugin_msg_slot_end_t * slot_end = (fd_plugin_msg_slot_end_t *)fd_chunk_to_laddr( ctx->plugin_out->mem, ctx->plugin_out->chunk );
899 0 : *slot_end = (fd_plugin_msg_slot_end_t){ .slot = slot, .cus_used = cus_used };
900 0 : fd_stem_publish( ctx->stem, ctx->plugin_out->idx, FD_PLUGIN_MSG_SLOT_END, ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_end_t), 0UL, 0UL, 0UL );
901 0 : ctx->plugin_out->chunk = fd_dcache_compact_next( ctx->plugin_out->chunk, sizeof(fd_plugin_msg_slot_end_t), ctx->plugin_out->chunk0, ctx->plugin_out->wmark );
902 0 : }
903 :
904 : CALLED_FROM_RUST static void
905 : publish_became_leader( fd_poh_ctx_t * ctx,
906 0 : ulong slot ) {
907 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
908 0 : fd_histf_sample( ctx->begin_leader_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)/tick_per_ns) );
909 :
910 0 : long slot_start_ns = ctx->reset_slot_start_ns + (long)((double)(slot-ctx->reset_slot)*ctx->slot_duration_ns);
911 :
912 : /* No need to check flow control, there are always credits became when we
913 : are leader, we will not "become" leader again until we are done, so at
914 : most one frag in flight at a time. */
915 :
916 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->pack_out->mem, ctx->pack_out->chunk );
917 :
918 0 : fd_became_leader_t * leader = (fd_became_leader_t *)dst;
919 0 : leader->slot_start_ns = slot_start_ns;
920 0 : leader->slot_end_ns = (long)((double)slot_start_ns + ctx->slot_duration_ns);
921 0 : leader->bank = ctx->current_leader_bank;
922 0 : leader->max_microblocks_in_slot = ctx->max_microblocks_per_slot;
923 0 : leader->ticks_per_slot = ctx->ticks_per_slot;
924 0 : leader->total_skipped_ticks = ctx->ticks_per_slot*(slot-ctx->reset_slot);
925 :
926 0 : if( FD_UNLIKELY( leader->ticks_per_slot+leader->total_skipped_ticks>=MAX_SKIPPED_TICKS ) )
927 0 : FD_LOG_ERR(( "Too many skipped ticks %lu for slot %lu, chain must halt", leader->ticks_per_slot+leader->total_skipped_ticks, slot ));
928 :
929 0 : ulong sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_BECAME_LEADER, 0UL );
930 0 : fd_stem_publish( ctx->stem, ctx->pack_out->idx, sig, ctx->pack_out->chunk, sizeof(fd_became_leader_t), 0UL, 0UL, 0UL );
931 0 : ctx->pack_out->chunk = fd_dcache_compact_next( ctx->pack_out->chunk, sizeof(fd_became_leader_t), ctx->pack_out->chunk0, ctx->pack_out->wmark );
932 0 : }
933 :
934 : /* The PoH tile knows when it should become leader by waiting for its
935 : leader slot (with the operating system clock). This function is so
936 : that when it becomes the leader, it can be told what the leader bank
937 : is by the replay stage. See the notes in the long comment above for
938 : more on how this works. */
939 :
940 : CALLED_FROM_RUST void
941 : fd_ext_poh_begin_leader( void const * bank,
942 : ulong slot,
943 0 : ulong hashcnt_per_tick ) {
944 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
945 :
946 0 : FD_TEST( !ctx->current_leader_bank );
947 :
948 0 : if( FD_UNLIKELY( slot!=ctx->slot ) ) FD_LOG_ERR(( "Trying to begin leader slot %lu but we are now on slot %lu", slot, ctx->slot ));
949 0 : if( FD_UNLIKELY( slot!=ctx->next_leader_slot ) ) FD_LOG_ERR(( "Trying to begin leader slot %lu but next leader slot is %lu", slot, ctx->next_leader_slot ));
950 :
951 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick!=hashcnt_per_tick ) ) {
952 0 : FD_LOG_WARNING(( "hashes per tick changed from %lu to %lu", ctx->hashcnt_per_tick, hashcnt_per_tick ));
953 :
954 : /* Recompute derived information about the clock. */
955 0 : ctx->hashcnt_duration_ns = (double)ctx->tick_duration_ns/(double)hashcnt_per_tick;
956 0 : ctx->hashcnt_per_slot = ctx->ticks_per_slot*hashcnt_per_tick;
957 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
958 :
959 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
960 : /* Low power producer, maximum of one microblock per tick in the slot */
961 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
962 0 : } else {
963 : /* See the long comment in after_credit for this limit */
964 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
965 0 : }
966 :
967 : /* Discard any ticks we might have done in the interim. They will
968 : have the wrong number of hashes per tick. We can just catch back
969 : up quickly if not too many slots were skipped and hopefully
970 : publish on time. Note that tick production and verification of
971 : skipped slots is done for the eventual bank that publishes a
972 : slot, for example:
973 :
974 : Reset Slot: 998
975 : Epoch Transition Slot: 1000
976 : Leader Slot: 1002
977 :
978 : In this case, if a feature changing the hashcnt_per_tick is
979 : activated in slot 1000, and we are publishing empty ticks for
980 : slots 998, 999, 1000, and 1001, they should all have the new
981 : hashes_per_tick number of hashes, rather than the older one, or
982 : some combination. */
983 :
984 0 : FD_TEST( ctx->last_slot==ctx->reset_slot );
985 0 : FD_TEST( !ctx->last_hashcnt );
986 0 : ctx->slot = ctx->reset_slot;
987 0 : ctx->hashcnt = 0UL;
988 0 : }
989 :
990 0 : ctx->current_leader_bank = bank;
991 0 : ctx->microblocks_lower_bound = 0UL;
992 0 : ctx->cus_used = 0UL;
993 0 : ctx->expect_microblock_idx = 0UL;
994 :
995 : /* We are about to start publishing to the shred tile for this slot
996 : so update the highwater mark so we never republish in this slot
997 : again. Also check that the leader slot is greater than the
998 : highwater, which should have been ensured earlier. */
999 :
1000 0 : FD_TEST( ctx->highwater_leader_slot==ULONG_MAX || slot>=ctx->highwater_leader_slot );
1001 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), slot );
1002 :
1003 0 : publish_became_leader( ctx, slot );
1004 0 : FD_LOG_INFO(( "fd_ext_poh_begin_leader(slot=%lu, highwater_leader_slot=%lu, last_slot=%lu, last_hashcnt=%lu)", slot, ctx->highwater_leader_slot, ctx->last_slot, ctx->last_hashcnt ));
1005 :
1006 0 : fd_ext_poh_write_unlock();
1007 0 : }
1008 :
1009 : /* Determine what the next slot is in the leader schedule is that we are
1010 : leader. Includes the current slot. If we are not leader in what
1011 : remains of the current and next epoch, return ULONG_MAX. */
1012 :
1013 : static inline CALLED_FROM_RUST ulong
1014 0 : next_leader_slot( fd_poh_ctx_t * ctx ) {
1015 : /* If we have published anything in a particular slot, then we
1016 : should never become leader for that slot again. */
1017 0 : ulong min_leader_slot = fd_ulong_max( ctx->slot, fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ) );
1018 :
1019 0 : for(;;) {
1020 0 : fd_epoch_leaders_t * leaders = fd_stake_ci_get_lsched_for_slot( ctx->stake_ci, min_leader_slot ); /* Safe to call from Rust */
1021 0 : if( FD_UNLIKELY( !leaders ) ) break;
1022 :
1023 0 : while( min_leader_slot<(leaders->slot0+leaders->slot_cnt) ) {
1024 0 : fd_pubkey_t const * leader = fd_epoch_leaders_get( leaders, min_leader_slot ); /* Safe to call from Rust */
1025 0 : if( FD_UNLIKELY( !memcmp( leader->key, ctx->identity_key.key, 32UL ) ) ) return min_leader_slot;
1026 0 : min_leader_slot++;
1027 0 : }
1028 0 : }
1029 :
1030 0 : return ULONG_MAX;
1031 0 : }
1032 :
1033 : static CALLED_FROM_RUST void
1034 0 : no_longer_leader( fd_poh_ctx_t * ctx ) {
1035 0 : if( FD_UNLIKELY( ctx->current_leader_bank ) ) fd_ext_bank_release( ctx->current_leader_bank );
1036 : /* If we stop being leader in a slot, we can never become leader in
1037 : that slot again, and all in-flight microblocks for that slot
1038 : should be dropped. */
1039 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), ctx->slot );
1040 0 : ctx->current_leader_bank = NULL;
1041 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1042 :
1043 0 : FD_COMPILER_MFENCE();
1044 0 : fd_ext_poh_signal_leader_change( ctx->signal_leader_change );
1045 0 : FD_LOG_INFO(( "no_longer_leader(next_leader_slot=%lu)", ctx->next_leader_slot ));
1046 0 : }
1047 :
1048 : /* fd_ext_poh_reset is called by the Agave client when a slot on
1049 : the active fork has finished a block and we need to reset our PoH to
1050 : be ticking on top of the block it produced. */
1051 :
1052 : CALLED_FROM_RUST void
1053 : fd_ext_poh_reset( ulong completed_bank_slot, /* The slot that successfully produced a block */
1054 : uchar const * reset_blockhash, /* The hash of the last tick in the produced block */
1055 0 : ulong hashcnt_per_tick /* The hashcnt per tick of the bank that completed */ ) {
1056 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
1057 :
1058 0 : ulong slot_before_reset = ctx->slot;
1059 0 : int leader_before_reset = ctx->slot>=ctx->next_leader_slot;
1060 0 : if( FD_UNLIKELY( leader_before_reset && ctx->current_leader_bank ) ) {
1061 : /* If we were in the middle of a leader slot that we notified pack
1062 : pack to start packing for we can never publish into that slot
1063 : again, mark all in-flight microblocks to be dropped. */
1064 0 : ctx->highwater_leader_slot = fd_ulong_max( fd_ulong_if( ctx->highwater_leader_slot==ULONG_MAX, 0UL, ctx->highwater_leader_slot ), 1UL+ctx->slot );
1065 0 : }
1066 :
1067 0 : ctx->leader_bank_start_ns = fd_log_wallclock(); /* safe to call from Rust */
1068 0 : if( FD_UNLIKELY( ctx->expect_sequential_leader_slot==(completed_bank_slot+1UL) ) ) {
1069 : /* If we are being reset onto a slot, it means some block was fully
1070 : processed, so we reset to build on top of it. Typically we want
1071 : to update the reset_slot_start_ns to the current time, because
1072 : the network will give the next leader 400ms to publish,
1073 : regardless of how long the prior leader took.
1074 :
1075 : But: if we were leader in the prior slot, and the block was our
1076 : own we can do better. We know that the next slot should start
1077 : exactly 400ms after the prior one started, so we can use that as
1078 : the reset slot start time instead. */
1079 0 : ctx->reset_slot_start_ns = ctx->reset_slot_start_ns + (long)((double)((completed_bank_slot+1UL)-ctx->reset_slot)*ctx->slot_duration_ns);
1080 0 : } else {
1081 0 : ctx->reset_slot_start_ns = ctx->leader_bank_start_ns;
1082 0 : }
1083 0 : ctx->expect_sequential_leader_slot = ULONG_MAX;
1084 :
1085 0 : memcpy( ctx->hash, reset_blockhash, 32UL );
1086 0 : ctx->slot = completed_bank_slot+1UL;
1087 0 : ctx->hashcnt = 0UL;
1088 0 : ctx->last_slot = ctx->slot;
1089 0 : ctx->last_hashcnt = 0UL;
1090 0 : ctx->reset_slot = ctx->slot;
1091 :
1092 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick!=hashcnt_per_tick ) ) {
1093 0 : FD_LOG_WARNING(( "hashes per tick changed from %lu to %lu", ctx->hashcnt_per_tick, hashcnt_per_tick ));
1094 :
1095 : /* Recompute derived information about the clock. */
1096 0 : ctx->hashcnt_duration_ns = (double)ctx->tick_duration_ns/(double)hashcnt_per_tick;
1097 0 : ctx->hashcnt_per_slot = ctx->ticks_per_slot*hashcnt_per_tick;
1098 0 : ctx->hashcnt_per_tick = hashcnt_per_tick;
1099 :
1100 0 : if( FD_UNLIKELY( ctx->hashcnt_per_tick==1UL ) ) {
1101 : /* Low power producer, maximum of one microblock per tick in the slot */
1102 0 : ctx->max_microblocks_per_slot = ctx->ticks_per_slot;
1103 0 : } else {
1104 : /* See the long comment in after_credit for this limit */
1105 0 : ctx->max_microblocks_per_slot = fd_ulong_min( MAX_MICROBLOCKS_PER_SLOT, ctx->ticks_per_slot*(ctx->hashcnt_per_tick-1UL) );
1106 0 : }
1107 0 : }
1108 :
1109 0 : if( FD_UNLIKELY( leader_before_reset ) ) {
1110 : /* No longer have a leader bank if we are reset. Replay stage will
1111 : call back again to give us a new one if we should become leader
1112 : for the reset slot.
1113 :
1114 : The order is important here, ctx->hashcnt must be updated before
1115 : calling no_longer_leader. */
1116 0 : no_longer_leader( ctx );
1117 0 : }
1118 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1119 0 : FD_LOG_INFO(( "fd_ext_poh_reset(slot=%lu,next_leader_slot=%lu)", ctx->reset_slot, ctx->next_leader_slot ));
1120 :
1121 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
1122 : /* We are leader after the reset... two cases: */
1123 0 : if( FD_LIKELY( ctx->slot==slot_before_reset ) ) {
1124 : /* 1. We are reset onto the same slot we are already leader on.
1125 : This is a common case when we have two leader slots in a
1126 : row, replay stage will reset us to our own slot. No need to
1127 : do anything here, we already sent a SLOT_START. */
1128 0 : FD_TEST( leader_before_reset );
1129 0 : } else {
1130 : /* 2. We are reset onto a different slot. If we were leader
1131 : before, we should first end that slot, then begin the new
1132 : one if we are newly leader now. */
1133 0 : if( FD_LIKELY( leader_before_reset ) ) publish_plugin_slot_end( ctx, slot_before_reset, ctx->cus_used );
1134 0 : else publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->reset_slot );
1135 0 : }
1136 0 : } else {
1137 0 : if( FD_UNLIKELY( leader_before_reset ) ) publish_plugin_slot_end( ctx, slot_before_reset, ctx->cus_used );
1138 0 : }
1139 :
1140 0 : fd_ext_poh_write_unlock();
1141 0 : }
1142 :
1143 : /* Since it can't easily return an Option<Pubkey>, return 1 for Some and
1144 : 0 for None. */
1145 : CALLED_FROM_RUST int
1146 : fd_ext_poh_get_leader_after_n_slots( ulong n,
1147 0 : uchar out_pubkey[ static 32 ] ) {
1148 0 : fd_poh_ctx_t * ctx = fd_ext_poh_write_lock();
1149 0 : ulong slot = ctx->slot + n;
1150 0 : fd_epoch_leaders_t * leaders = fd_stake_ci_get_lsched_for_slot( ctx->stake_ci, slot ); /* Safe to call from Rust */
1151 :
1152 0 : int copied = 0;
1153 0 : if( FD_LIKELY( leaders ) ) {
1154 0 : fd_pubkey_t const * leader = fd_epoch_leaders_get( leaders, slot ); /* Safe to call from Rust */
1155 0 : if( FD_LIKELY( leader ) ) {
1156 0 : memcpy( out_pubkey, leader, 32UL );
1157 0 : copied = 1;
1158 0 : }
1159 0 : }
1160 0 : fd_ext_poh_write_unlock();
1161 0 : return copied;
1162 0 : }
1163 :
1164 : FD_FN_CONST static inline ulong
1165 3 : scratch_align( void ) {
1166 3 : return 128UL;
1167 3 : }
1168 :
1169 : FD_FN_PURE static inline ulong
1170 3 : scratch_footprint( fd_topo_tile_t const * tile ) {
1171 3 : (void)tile;
1172 3 : ulong l = FD_LAYOUT_INIT;
1173 3 : l = FD_LAYOUT_APPEND( l, alignof( fd_poh_ctx_t ), sizeof( fd_poh_ctx_t ) );
1174 3 : l = FD_LAYOUT_APPEND( l, fd_stake_ci_align(), fd_stake_ci_footprint() );
1175 3 : l = FD_LAYOUT_APPEND( l, FD_SHA256_ALIGN, FD_SHA256_FOOTPRINT );
1176 3 : return FD_LAYOUT_FINI( l, scratch_align() );
1177 3 : }
1178 :
1179 : static void
1180 : publish_tick( fd_poh_ctx_t * ctx,
1181 : fd_stem_context_t * stem,
1182 : uchar hash[ static 32 ],
1183 0 : int is_skipped ) {
1184 0 : ulong hashcnt = ctx->hashcnt_per_tick*(1UL+(ctx->last_hashcnt/ctx->hashcnt_per_tick));
1185 :
1186 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
1187 :
1188 0 : FD_TEST( ctx->last_slot>=ctx->reset_slot );
1189 0 : fd_entry_batch_meta_t * meta = (fd_entry_batch_meta_t *)dst;
1190 0 : if( FD_UNLIKELY( is_skipped ) ) {
1191 : /* We are publishing ticks for a skipped slot, the reference tick
1192 : and block complete flags should always be zero. */
1193 0 : meta->reference_tick = 0UL;
1194 0 : meta->block_complete = 0;
1195 0 : } else {
1196 0 : meta->reference_tick = hashcnt/ctx->hashcnt_per_tick;
1197 0 : meta->block_complete = hashcnt==ctx->hashcnt_per_slot;
1198 0 : }
1199 :
1200 0 : ulong slot = fd_ulong_if( meta->block_complete, ctx->slot-1UL, ctx->slot );
1201 0 : meta->parent_offset = 1UL+slot-ctx->reset_slot;
1202 :
1203 0 : FD_TEST( hashcnt>ctx->last_hashcnt );
1204 0 : ulong hash_delta = hashcnt-ctx->last_hashcnt;
1205 :
1206 0 : dst += sizeof(fd_entry_batch_meta_t);
1207 0 : fd_entry_batch_header_t * tick = (fd_entry_batch_header_t *)dst;
1208 0 : tick->hashcnt_delta = hash_delta;
1209 0 : fd_memcpy( tick->hash, hash, 32UL );
1210 0 : tick->txn_cnt = 0UL;
1211 :
1212 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
1213 0 : ulong sz = sizeof(fd_entry_batch_meta_t)+sizeof(fd_entry_batch_header_t);
1214 0 : ulong sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_MICROBLOCK, 0UL );
1215 0 : fd_stem_publish( stem, ctx->shred_out->idx, sig, ctx->shred_out->chunk, sz, 0UL, 0UL, tspub );
1216 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sz, ctx->shred_out->chunk0, ctx->shred_out->wmark );
1217 :
1218 0 : if( FD_UNLIKELY( hashcnt==ctx->hashcnt_per_slot ) ) {
1219 0 : ctx->last_slot++;
1220 0 : ctx->last_hashcnt = 0UL;
1221 0 : } else {
1222 0 : ctx->last_hashcnt = hashcnt;
1223 0 : }
1224 0 : }
1225 :
1226 : static inline void
1227 : after_credit( fd_poh_ctx_t * ctx,
1228 : fd_stem_context_t * stem,
1229 : int * opt_poll_in,
1230 0 : int * charge_busy ) {
1231 0 : ctx->stem = stem;
1232 :
1233 0 : FD_COMPILER_MFENCE();
1234 0 : if( FD_UNLIKELY( fd_poh_waiting_lock ) ) {
1235 0 : FD_VOLATILE( fd_poh_returned_lock ) = 1UL;
1236 0 : FD_COMPILER_MFENCE();
1237 0 : for(;;) {
1238 0 : if( FD_UNLIKELY( !FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
1239 0 : FD_SPIN_PAUSE();
1240 0 : }
1241 0 : FD_COMPILER_MFENCE();
1242 0 : FD_VOLATILE( fd_poh_waiting_lock ) = 0UL;
1243 0 : *opt_poll_in = 0;
1244 0 : *charge_busy = 1;
1245 0 : return;
1246 0 : }
1247 0 : FD_COMPILER_MFENCE();
1248 :
1249 0 : int is_leader = ctx->next_leader_slot!=ULONG_MAX && ctx->slot>=ctx->next_leader_slot;
1250 0 : if( FD_UNLIKELY( is_leader && !ctx->current_leader_bank ) ) {
1251 : /* If we are the leader, but we didn't yet learn what the leader
1252 : bank object is from the replay stage, do not do any hashing.
1253 :
1254 : This is not ideal, but greatly simplifies the control flow. */
1255 0 : return;
1256 0 : }
1257 :
1258 : /* If we have skipped ticks pending because we skipped some slots to
1259 : become leader, register them now one at a time. */
1260 0 : if( FD_UNLIKELY( is_leader && ctx->last_slot<ctx->slot ) ) {
1261 0 : ulong publish_hashcnt = ctx->last_hashcnt+ctx->hashcnt_per_tick;
1262 0 : ulong tick_idx = (ctx->last_slot*ctx->ticks_per_slot+publish_hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1263 :
1264 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->skipped_tick_hashes[ tick_idx ] );
1265 0 : publish_tick( ctx, stem, ctx->skipped_tick_hashes[ tick_idx ], 1 );
1266 :
1267 : /* If we are catching up now and publishing a bunch of skipped
1268 : ticks, we do not want to process any incoming microblocks until
1269 : all the skipped ticks have been published out; otherwise we would
1270 : intersperse skipped tick messages with microblocks. */
1271 0 : *opt_poll_in = 0;
1272 0 : *charge_busy = 1;
1273 0 : return;
1274 0 : }
1275 :
1276 0 : int low_power_mode = ctx->hashcnt_per_tick==1UL;
1277 :
1278 : /* If we are the leader, always leave enough capacity in the slot so
1279 : that we can mixin any potential microblocks still coming from the
1280 : pack tile for this slot. */
1281 0 : ulong max_remaining_microblocks = ctx->max_microblocks_per_slot - ctx->microblocks_lower_bound;
1282 : /* With hashcnt_per_tick hashes per tick, we actually get
1283 : hashcnt_per_tick-1 chances to mixin a microblock. For each tick
1284 : span that we need to reserve, we also need to reserve the hashcnt
1285 : for the tick, hence the +
1286 : max_remaining_microblocks/(hashcnt_per_tick-1) rounded up.
1287 :
1288 : However, if hashcnt_per_tick is 1 because we're in low power mode,
1289 : this should probably just be max_remaining_microblocks. */
1290 0 : ulong max_remaining_ticks_or_microblocks = max_remaining_microblocks;
1291 0 : if( FD_LIKELY( !low_power_mode ) ) max_remaining_ticks_or_microblocks += (max_remaining_microblocks+ctx->hashcnt_per_tick-2UL)/(ctx->hashcnt_per_tick-1UL);
1292 :
1293 0 : ulong restricted_hashcnt = fd_ulong_if( ctx->hashcnt_per_slot>=max_remaining_ticks_or_microblocks, ctx->hashcnt_per_slot-max_remaining_ticks_or_microblocks, 0UL );
1294 :
1295 0 : ulong min_hashcnt = ctx->hashcnt;
1296 :
1297 0 : if( FD_LIKELY( !low_power_mode ) ) {
1298 : /* Recall that there are two kinds of events that will get published
1299 : to the shredder,
1300 :
1301 : (a) Ticks. These occur every 62,500 (hashcnt_per_tick) hashcnts,
1302 : and there will be 64 (ticks_per_slot) of them in each slot.
1303 :
1304 : Ticks must not have any transactions mixed into the hash.
1305 : This is not strictly needed in theory, but is required by the
1306 : current consensus protocol. They get published here in
1307 : after_credit.
1308 :
1309 : (b) Microblocks. These can occur at any other hashcnt, as long
1310 : as it is not a tick. Microblocks cannot be empty, and must
1311 : have at least one transactions mixed in. These get
1312 : published in after_frag.
1313 :
1314 : If hashcnt_per_tick is 1, then we are in low power mode and the
1315 : following does not apply, since we can mix in transactions at any
1316 : time.
1317 :
1318 : In the normal, non-low-power mode, though, we have to be careful
1319 : to make sure that we do not publish microblocks on tick
1320 : boundaries. To do that, we need to obey two rules:
1321 : (i) after_credit must not leave hashcnt one before a tick
1322 : boundary
1323 : (ii) if after_credit begins one before a tick boundary, it must
1324 : advance hashcnt and publish the tick
1325 :
1326 : There's some interplay between min_hashcnt and restricted_hashcnt
1327 : here, and we need to show that there's always a value of
1328 : target_hashcnt we can pick such that
1329 : min_hashcnt <= target_hashcnt <= restricted_hashcnt.
1330 : We'll prove this by induction for current_slot==0 and
1331 : is_leader==true, since all other slots should be the same.
1332 :
1333 : Let m_j and r_j be the min_hashcnt and restricted_hashcnt
1334 : (respectively) for the jth call to after_credit in a slot. We
1335 : want to show that for all values of j, it's possible to pick a
1336 : value h_j, the value of target_hashcnt for the jth call to
1337 : after_credit (which is also the value of hashcnt after
1338 : after_credit has completed) such that m_j<=h_j<=r_j.
1339 :
1340 : Additionally, let T be hashcnt_per_tick and N be ticks_per_slot.
1341 :
1342 : Starting with the base case, j==0. m_j=0, and
1343 : r_0 = N*T - max_microblocks_per_slot
1344 : - ceil(max_microblocks_per_slot/(T-1)).
1345 :
1346 : This is monotonic decreasing in max_microblocks_per_slot, so it
1347 : achieves its minimum when max_microblocks_per_slot is its
1348 : maximum.
1349 : r_0 >= N*T - N*(T-1) - ceil( (N*(T-1))/(T-1))
1350 : = N*T - N*(T-1)-N = 0.
1351 : Thus, m_0 <= r_0, as desired.
1352 :
1353 :
1354 :
1355 : Then, for the inductive step, assume there exists h_j such that
1356 : m_j<=h_j<=r_j, and we want to show that there exists h_{j+1},
1357 : which is the same as showing m_{j+1}<=r_{j+1}.
1358 :
1359 : Let a_j be 1 if we had a microblock immediately following the jth
1360 : call to after_credit, and 0 otherwise. Then hashcnt at the start
1361 : of the (j+1)th call to after_frag is h_j+a_j.
1362 : Also, set b_{j+1}=1 if we are in the case covered by rule (ii)
1363 : above during the (j+1)th call to after_credit, i.e. if
1364 : (h_j+a_j)%T==T-1. Thus, m_{j+1} = h_j + a_j + b_{j+1}.
1365 :
1366 : If we received an additional microblock, then
1367 : max_remaining_microblocks goes down by 1, and
1368 : max_remaining_ticks_or_microblocks goes down by either 1 or 2,
1369 : which means restricted_hashcnt goes up by either 1 or 2. In
1370 : particular, it goes up by 2 if the new value of
1371 : max_remaining_microblocks (at the start of the (j+1)th call to
1372 : after_credit) is congruent to 0 mod T-1. Let b'_{j+1} be 1 if
1373 : this condition is met and 0 otherwise. If we receive a
1374 : done_packing message, restricted_hashcnt can go up by more, but
1375 : we can ignore that case, since it is less restrictive.
1376 : Thus, r_{j+1}=r_j+a_j+b'_{j+1}.
1377 :
1378 : If h_j < r_j (strictly less), then h_j+a_j < r_j+a_j. And thus,
1379 : since b_{j+1}<=b'_{j+1}+1, just by virtue of them both being
1380 : binary,
1381 : h_j + a_j + b_{j+1} < r_j + a_j + b'_{j+1} + 1,
1382 : which is the same (for integers) as
1383 : h_j + a_j + b_{j+1} <= r_j + a_j + b'_{j+1},
1384 : m_{j+1} <= r_{j+1}
1385 :
1386 : On the other hand, if h_j==r_j, this is easy unless b_{j+1}==1,
1387 : which can also only happen if a_j==1. Then (h_j+a_j)%T==T-1,
1388 : which means there's an integer k such that
1389 :
1390 : h_j+a_j==(ticks_per_slot-k)*T-1
1391 : h_j ==ticks_per_slot*T - k*(T-1)-1 - k-1
1392 : ==ticks_per_slot*T - (k*(T-1)+1) - ceil( (k*(T-1)+1)/(T-1) )
1393 :
1394 : Since h_j==r_j in this case, and
1395 : r_j==(ticks_per_slot*T) - max_remaining_microblocks_j - ceil(max_remaining_microblocks_j/(T-1)),
1396 : we can see that the value of max_remaining_microblocks at the
1397 : start of the jth call to after_credit is k*(T-1)+1. Again, since
1398 : a_j==1, then the value of max_remaining_microblocks at the start
1399 : of the j+1th call to after_credit decreases by 1 to k*(T-1),
1400 : which means b'_{j+1}=1.
1401 :
1402 : Thus, h_j + a_j + b_{j+1} == r_j + a_j + b'_{j+1}, so, in
1403 : particular, h_{j+1}<=r_{j+1} as desired. */
1404 0 : min_hashcnt += (ulong)(min_hashcnt%ctx->hashcnt_per_tick == (ctx->hashcnt_per_tick-1UL)); /* add b_{j+1}, enforcing rule (ii) */
1405 0 : }
1406 : /* Now figure out how many hashes are needed to "catch up" the hash
1407 : count to the current system clock, and clamp it to the allowed
1408 : range. */
1409 0 : long now = fd_log_wallclock();
1410 0 : ulong target_hashcnt;
1411 0 : if( FD_LIKELY( !is_leader ) ) {
1412 0 : target_hashcnt = (ulong)((double)(now - ctx->reset_slot_start_ns) / ctx->hashcnt_duration_ns) - (ctx->slot-ctx->reset_slot)*ctx->hashcnt_per_slot;
1413 0 : } else {
1414 : /* We might have gotten very behind on hashes, but if we are leader
1415 : we want to catch up gradually over the remainder of our leader
1416 : slot, not all at once right now. This helps keep the tile from
1417 : being oversubscribed and taking a long time to process incoming
1418 : microblocks. */
1419 0 : long expected_slot_start_ns = ctx->reset_slot_start_ns + (long)((double)(ctx->slot-ctx->reset_slot)*ctx->slot_duration_ns);
1420 0 : double actual_slot_duration_ns = ctx->slot_duration_ns<(double)(ctx->leader_bank_start_ns - expected_slot_start_ns) ? 0.0 : ctx->slot_duration_ns - (double)(ctx->leader_bank_start_ns - expected_slot_start_ns);
1421 0 : double actual_hashcnt_duration_ns = actual_slot_duration_ns / (double)ctx->hashcnt_per_slot;
1422 0 : target_hashcnt = fd_ulong_if( actual_hashcnt_duration_ns==0.0, restricted_hashcnt, (ulong)((double)(now - ctx->leader_bank_start_ns) / actual_hashcnt_duration_ns) );
1423 0 : }
1424 : /* Clamp to [min_hashcnt, restricted_hashcnt] as above */
1425 0 : target_hashcnt = fd_ulong_max( fd_ulong_min( target_hashcnt, restricted_hashcnt ), min_hashcnt );
1426 :
1427 : /* The above proof showed that it was always possible to pick a value
1428 : of target_hashcnt, but we still have a lot of freedom in how to
1429 : pick it. It simplifies the code a lot if we don't keep going after
1430 : a tick in this function. In particular, we want to publish at most
1431 : 1 tick in this call, since otherwise we could consume infinite
1432 : credits to publish here. The credits are set so that we should
1433 : only ever publish one tick during this loop. Also, all the extra
1434 : stuff (leader transitions, publishing ticks, etc.) we have to do
1435 : happens at tick boundaries, so this lets us consolidate all those
1436 : cases.
1437 :
1438 : Mathematically, since the current value of hashcnt is h_j+a_j, the
1439 : next tick (advancing a full tick if we're currently at a tick) is
1440 : t_{j+1} = T*(floor( (h_j+a_j)/T )+1). We need to show that if we set
1441 : h'_{j+1} = min( h_{j+1}, t_{j+1} ), it is still valid.
1442 :
1443 : First, h'_{j+1} <= h_{j+1} <= r_{j+1}, so we're okay in that
1444 : direction.
1445 :
1446 : Next, observe that t_{j+1}>=h_j + a_j + 1, and recall that b_{j+1}
1447 : is 0 or 1. So then,
1448 : t_{j+1} >= h_j+a_j+b_{j+1} = m_{j+1}.
1449 :
1450 : We know h_{j+1) >= m_{j+1} from before, so then h'_{j+1} >=
1451 : m_{j+1}, as desired. */
1452 :
1453 0 : ulong next_tick_hashcnt = ctx->hashcnt_per_tick * (1UL+(ctx->hashcnt/ctx->hashcnt_per_tick));
1454 0 : target_hashcnt = fd_ulong_min( target_hashcnt, next_tick_hashcnt );
1455 :
1456 : /* We still need to enforce rule (i). We know that min_hashcnt%T !=
1457 : T-1 because of rule (ii). That means that if target_hashcnt%T ==
1458 : T-1 at this point, target_hashcnt > min_hashcnt (notice the
1459 : strict), so target_hashcnt-1 >= min_hashcnt and is thus still a
1460 : valid choice for target_hashcnt. */
1461 0 : target_hashcnt -= (ulong)( (!low_power_mode) & ((target_hashcnt%ctx->hashcnt_per_tick)==(ctx->hashcnt_per_tick-1UL)) );
1462 :
1463 0 : FD_TEST( target_hashcnt >= ctx->hashcnt );
1464 0 : FD_TEST( target_hashcnt >= min_hashcnt );
1465 0 : FD_TEST( target_hashcnt <= restricted_hashcnt );
1466 :
1467 0 : if( FD_UNLIKELY( ctx->hashcnt==target_hashcnt ) ) return; /* Nothing to do, don't publish a tick twice */
1468 :
1469 0 : *charge_busy = 1;
1470 :
1471 0 : while( ctx->hashcnt<target_hashcnt ) {
1472 0 : fd_sha256_hash( ctx->hash, 32UL, ctx->hash );
1473 0 : ctx->hashcnt++;
1474 0 : }
1475 :
1476 0 : if( FD_UNLIKELY( ctx->hashcnt==ctx->hashcnt_per_slot ) ) {
1477 0 : ctx->slot++;
1478 0 : ctx->hashcnt = 0UL;
1479 0 : }
1480 :
1481 0 : if( FD_UNLIKELY( !is_leader && !(ctx->hashcnt%ctx->hashcnt_per_tick ) ) ) {
1482 : /* We finished a tick while not leader... save the current hash so
1483 : it can be played back into the bank when we become the leader. */
1484 0 : ulong tick_idx = (ctx->slot*ctx->ticks_per_slot+ctx->hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1485 0 : fd_memcpy( ctx->skipped_tick_hashes[ tick_idx ], ctx->hash, 32UL );
1486 :
1487 0 : ulong initial_tick_idx = (ctx->last_slot*ctx->ticks_per_slot+ctx->last_hashcnt/ctx->hashcnt_per_tick)%MAX_SKIPPED_TICKS;
1488 0 : if( FD_UNLIKELY( tick_idx==initial_tick_idx ) ) FD_LOG_ERR(( "Too many skipped ticks from slot %lu to slot %lu, chain must halt", ctx->last_slot, ctx->slot ));
1489 0 : }
1490 :
1491 0 : if( FD_UNLIKELY( is_leader && !(ctx->hashcnt%ctx->hashcnt_per_tick) ) ) {
1492 : /* We ticked while leader... tell the leader bank. */
1493 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->hash );
1494 :
1495 : /* And send an empty microblock (a tick) to the shred tile. */
1496 0 : publish_tick( ctx, stem, ctx->hash, 0 );
1497 0 : }
1498 :
1499 0 : if( FD_UNLIKELY( !is_leader && ctx->slot>=ctx->next_leader_slot ) ) {
1500 : /* We ticked while not leader and are now leader... transition
1501 : the state machine. */
1502 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->reset_slot );
1503 0 : }
1504 :
1505 0 : if( FD_UNLIKELY( is_leader && ctx->slot>ctx->next_leader_slot ) ) {
1506 : /* We ticked while leader and are no longer leader... transition
1507 : the state machine. */
1508 0 : FD_TEST( !max_remaining_microblocks );
1509 0 : publish_plugin_slot_end( ctx, ctx->next_leader_slot, ctx->cus_used );
1510 :
1511 0 : no_longer_leader( ctx );
1512 0 : ctx->expect_sequential_leader_slot = ctx->slot;
1513 :
1514 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
1515 0 : fd_histf_sample( ctx->slot_done_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)/tick_per_ns) );
1516 0 : ctx->next_leader_slot = next_leader_slot( ctx );
1517 :
1518 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
1519 : /* We finished a leader slot, and are immediately leader for the
1520 : following slot... transition. */
1521 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->next_leader_slot-1UL );
1522 0 : }
1523 0 : }
1524 0 : }
1525 :
1526 : static inline void
1527 0 : metrics_write( fd_poh_ctx_t * ctx ) {
1528 0 : FD_MHIST_COPY( POH_TILE, BEGIN_LEADER_DELAY_SECONDS, ctx->begin_leader_delay );
1529 0 : FD_MHIST_COPY( POH_TILE, FIRST_MICROBLOCK_DELAY_SECONDS, ctx->first_microblock_delay );
1530 0 : FD_MHIST_COPY( POH_TILE, SLOT_DONE_DELAY_SECONDS, ctx->slot_done_delay );
1531 0 : }
1532 :
1533 : static int
1534 : before_frag( fd_poh_ctx_t * ctx,
1535 : ulong in_idx,
1536 : ulong seq,
1537 0 : ulong sig ) {
1538 0 : (void)in_idx;
1539 0 : (void)seq;
1540 :
1541 0 : if( FD_LIKELY( ctx->in_kind[ in_idx ]==IN_KIND_BANK ) ) {
1542 0 : ulong microblock_idx = fd_disco_bank_sig_microblock_idx( sig );
1543 0 : FD_TEST( microblock_idx>=ctx->expect_microblock_idx );
1544 :
1545 : /* Return the fragment to stem so we can process it later, if it's
1546 : not next in the sequence. */
1547 0 : if( FD_UNLIKELY( microblock_idx>ctx->expect_microblock_idx ) ) return -1;
1548 :
1549 0 : ctx->expect_microblock_idx++;
1550 0 : }
1551 :
1552 0 : return 0;
1553 0 : }
1554 :
1555 : static inline void
1556 : during_frag( fd_poh_ctx_t * ctx,
1557 : ulong in_idx,
1558 : ulong seq,
1559 : ulong sig,
1560 : ulong chunk,
1561 0 : ulong sz ) {
1562 0 : (void)seq;
1563 0 : (void)sig;
1564 :
1565 0 : ctx->skip_frag = 0;
1566 :
1567 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_STAKE ) ) {
1568 0 : if( FD_UNLIKELY( chunk<ctx->in[ in_idx ].chunk0 || chunk>ctx->in[ in_idx ].wmark ) )
1569 0 : FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz,
1570 0 : ctx->in[ in_idx ].chunk0, ctx->in[ in_idx ].wmark ));
1571 :
1572 0 : uchar const * dcache_entry = fd_chunk_to_laddr_const( ctx->in[ in_idx ].mem, chunk );
1573 0 : fd_stake_ci_stake_msg_init( ctx->stake_ci, dcache_entry );
1574 0 : return;
1575 0 : }
1576 :
1577 0 : ulong pkt_type;
1578 0 : ulong slot;
1579 0 : switch( ctx->in_kind[ in_idx ] ) {
1580 0 : case IN_KIND_BANK: {
1581 0 : pkt_type = POH_PKT_TYPE_MICROBLOCK;
1582 0 : slot = fd_disco_bank_sig_slot( sig );
1583 0 : break;
1584 0 : }
1585 0 : case IN_KIND_PACK: {
1586 0 : pkt_type = fd_disco_poh_sig_pkt_type( sig );
1587 0 : slot = fd_disco_poh_sig_slot( sig );
1588 0 : break;
1589 0 : }
1590 0 : default:
1591 0 : FD_LOG_ERR(( "unexpected in_kind %d", ctx->in_kind[ in_idx ] ));
1592 0 : }
1593 :
1594 0 : int is_frag_for_prior_leader_slot = 0;
1595 0 : if( FD_LIKELY( pkt_type==POH_PKT_TYPE_DONE_PACKING || pkt_type==POH_PKT_TYPE_MICROBLOCK ) ) {
1596 : /* The following sequence is possible...
1597 :
1598 : 1. We become leader in slot 10
1599 : 2. While leader, we switch to a fork that is on slot 8, where
1600 : we are leader
1601 : 3. We get the in-flight microblocks for slot 10
1602 :
1603 : These in-flight microblocks need to be dropped, so we check
1604 : against the high water mark (highwater_leader_slot) rather than
1605 : the current hashcnt here when determining what to drop.
1606 :
1607 : We know if the slot is lower than the high water mark it's from a stale
1608 : leader slot, because we will not become leader for the same slot twice
1609 : even if we are reset back in time (to prevent duplicate blocks). */
1610 0 : is_frag_for_prior_leader_slot = slot<ctx->highwater_leader_slot;
1611 0 : }
1612 :
1613 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_PACK ) ) {
1614 : /* We now know the real amount of microblocks published, so set an
1615 : exact bound for once we receive them. */
1616 0 : ctx->skip_frag = 1;
1617 0 : if( pkt_type==POH_PKT_TYPE_DONE_PACKING ) {
1618 0 : if( FD_UNLIKELY( is_frag_for_prior_leader_slot ) ) return;
1619 :
1620 0 : FD_TEST( ctx->microblocks_lower_bound<=ctx->max_microblocks_per_slot );
1621 0 : fd_done_packing_t const * done_packing = fd_chunk_to_laddr( ctx->in[ in_idx ].mem, chunk );
1622 0 : FD_LOG_INFO(( "done_packing(slot=%lu,seen_microblocks=%lu,microblocks_in_slot=%lu)",
1623 0 : ctx->slot,
1624 0 : ctx->microblocks_lower_bound,
1625 0 : done_packing->microblocks_in_slot ));
1626 0 : ctx->microblocks_lower_bound += ctx->max_microblocks_per_slot - done_packing->microblocks_in_slot;
1627 0 : }
1628 0 : return;
1629 0 : } else {
1630 0 : if( FD_UNLIKELY( chunk<ctx->in[ in_idx ].chunk0 || chunk>ctx->in[ in_idx ].wmark || sz>USHORT_MAX ) )
1631 0 : FD_LOG_ERR(( "chunk %lu %lu corrupt, not in range [%lu,%lu]", chunk, sz, ctx->in[ in_idx ].chunk0, ctx->in[ in_idx ].wmark ));
1632 :
1633 0 : uchar * src = (uchar *)fd_chunk_to_laddr( ctx->in[ in_idx ].mem, chunk );
1634 :
1635 0 : fd_memcpy( ctx->_txns, src, sz-sizeof(fd_microblock_trailer_t) );
1636 0 : fd_memcpy( ctx->_microblock_trailer, src+sz-sizeof(fd_microblock_trailer_t), sizeof(fd_microblock_trailer_t) );
1637 :
1638 0 : ctx->skip_frag = is_frag_for_prior_leader_slot;
1639 0 : }
1640 0 : }
1641 :
1642 : static void
1643 : publish_microblock( fd_poh_ctx_t * ctx,
1644 : fd_stem_context_t * stem,
1645 : ulong slot,
1646 : ulong hashcnt_delta,
1647 0 : ulong txn_cnt ) {
1648 0 : uchar * dst = (uchar *)fd_chunk_to_laddr( ctx->shred_out->mem, ctx->shred_out->chunk );
1649 0 : FD_TEST( slot>=ctx->reset_slot );
1650 0 : fd_entry_batch_meta_t * meta = (fd_entry_batch_meta_t *)dst;
1651 0 : meta->parent_offset = 1UL+slot-ctx->reset_slot;
1652 0 : meta->reference_tick = (ctx->hashcnt/ctx->hashcnt_per_tick) % ctx->ticks_per_slot;
1653 0 : meta->block_complete = !ctx->hashcnt;
1654 :
1655 0 : dst += sizeof(fd_entry_batch_meta_t);
1656 0 : fd_entry_batch_header_t * header = (fd_entry_batch_header_t *)dst;
1657 0 : header->hashcnt_delta = hashcnt_delta;
1658 0 : fd_memcpy( header->hash, ctx->hash, 32UL );
1659 :
1660 0 : dst += sizeof(fd_entry_batch_header_t);
1661 0 : ulong payload_sz = 0UL;
1662 0 : ulong included_txn_cnt = 0UL;
1663 0 : for( ulong i=0UL; i<txn_cnt; i++ ) {
1664 0 : fd_txn_p_t * txn = (fd_txn_p_t *)(ctx->_txns + i*sizeof(fd_txn_p_t));
1665 0 : if( FD_UNLIKELY( !(txn->flags & FD_TXN_P_FLAGS_EXECUTE_SUCCESS) ) ) continue;
1666 :
1667 0 : fd_memcpy( dst, txn->payload, txn->payload_sz );
1668 0 : payload_sz += txn->payload_sz;
1669 0 : dst += txn->payload_sz;
1670 0 : included_txn_cnt++;
1671 0 : }
1672 0 : header->txn_cnt = included_txn_cnt;
1673 :
1674 : /* We always have credits to publish here, because we have a burst
1675 : value of 3 credits, and at most we will publish_tick() once and
1676 : then publish_became_leader() once, leaving one credit here to
1677 : publish the microblock. */
1678 0 : ulong tspub = (ulong)fd_frag_meta_ts_comp( fd_tickcount() );
1679 0 : ulong sz = sizeof(fd_entry_batch_meta_t)+sizeof(fd_entry_batch_header_t)+payload_sz;
1680 0 : ulong new_sig = fd_disco_poh_sig( slot, POH_PKT_TYPE_MICROBLOCK, 0UL );
1681 0 : fd_stem_publish( stem, ctx->shred_out->idx, new_sig, ctx->shred_out->chunk, sz, 0UL, 0UL, tspub );
1682 0 : ctx->shred_out->chunk = fd_dcache_compact_next( ctx->shred_out->chunk, sz, ctx->shred_out->chunk0, ctx->shred_out->wmark );
1683 0 : }
1684 :
1685 : static inline void
1686 : after_frag( fd_poh_ctx_t * ctx,
1687 : ulong in_idx,
1688 : ulong seq,
1689 : ulong sig,
1690 : ulong chunk,
1691 : ulong sz,
1692 : ulong tsorig,
1693 0 : fd_stem_context_t * stem ) {
1694 0 : (void)in_idx;
1695 0 : (void)seq;
1696 0 : (void)chunk;
1697 0 : (void)tsorig;
1698 :
1699 0 : if( FD_UNLIKELY( ctx->skip_frag ) ) return;
1700 :
1701 0 : if( FD_UNLIKELY( ctx->in_kind[ in_idx ]==IN_KIND_STAKE ) ) {
1702 0 : fd_stake_ci_stake_msg_fini( ctx->stake_ci );
1703 : /* It might seem like we do not need to do state transitions in and
1704 : out of being the leader here, since leader schedule updates are
1705 : always one epoch in advance (whether we are leader or not would
1706 : never change for the currently executing slot) but this is not
1707 : true for new ledgers when the validator first boots. We will
1708 : likely be the leader in slot 1, and get notified of the leader
1709 : schedule for that slot while we are still in it.
1710 :
1711 : For safety we just handle both transitions, in and out, although
1712 : the only one possible should be into leader. */
1713 0 : ulong next_leader_slot_after_frag = next_leader_slot( ctx );
1714 :
1715 0 : int currently_leader = ctx->slot>=ctx->next_leader_slot;
1716 0 : int leader_after_frag = ctx->slot>=next_leader_slot_after_frag;
1717 :
1718 0 : FD_LOG_INFO(( "stake_update(before_leader=%lu,after_leader=%lu)",
1719 0 : ctx->next_leader_slot,
1720 0 : next_leader_slot_after_frag ));
1721 :
1722 0 : ctx->next_leader_slot = next_leader_slot_after_frag;
1723 0 : if( FD_UNLIKELY( currently_leader && !leader_after_frag ) ) {
1724 : /* Shouldn't ever happen, otherwise we need to do a state
1725 : transition out of being leader. */
1726 0 : FD_LOG_ERR(( "stake update caused us to no longer be leader in an active slot" ));
1727 0 : }
1728 :
1729 : /* Nothing to do if we transition into being leader, since it
1730 : will just get picked up by the regular tick loop. */
1731 0 : if( FD_UNLIKELY( !currently_leader && leader_after_frag ) ) {
1732 0 : publish_plugin_slot_start( ctx, next_leader_slot_after_frag, ctx->reset_slot );
1733 0 : }
1734 :
1735 0 : return;
1736 0 : }
1737 :
1738 0 : if( FD_UNLIKELY( !ctx->microblocks_lower_bound ) ) {
1739 0 : double tick_per_ns = fd_tempo_tick_per_ns( NULL );
1740 0 : fd_histf_sample( ctx->first_microblock_delay, (ulong)((double)(fd_log_wallclock()-ctx->reset_slot_start_ns)/tick_per_ns) );
1741 0 : }
1742 :
1743 0 : ulong target_slot = fd_disco_bank_sig_slot( sig );
1744 :
1745 0 : if( FD_UNLIKELY( target_slot!=ctx->next_leader_slot || target_slot!=ctx->slot ) ) {
1746 0 : FD_LOG_ERR(( "packed too early or late target_slot=%lu, current_slot=%lu. highwater_leader_slot=%lu",
1747 0 : target_slot, ctx->slot, ctx->highwater_leader_slot ));
1748 0 : }
1749 :
1750 0 : FD_TEST( ctx->current_leader_bank );
1751 0 : FD_TEST( ctx->microblocks_lower_bound<ctx->max_microblocks_per_slot );
1752 0 : ctx->microblocks_lower_bound += 1UL;
1753 :
1754 0 : ulong txn_cnt = (sz-sizeof(fd_microblock_trailer_t))/sizeof(fd_txn_p_t);
1755 0 : fd_txn_p_t * txns = (fd_txn_p_t *)(ctx->_txns);
1756 0 : ulong executed_txn_cnt = 0UL;
1757 0 : ulong cus_used = 0UL;
1758 0 : for( ulong i=0UL; i<txn_cnt; i++ ) {
1759 0 : if( FD_LIKELY( txns[ i ].flags & FD_TXN_P_FLAGS_EXECUTE_SUCCESS ) ) {
1760 0 : executed_txn_cnt++;
1761 0 : cus_used += txns[ i ].bank_cu.actual_consumed_cus;
1762 0 : }
1763 0 : }
1764 :
1765 : /* We don't publish transactions that fail to execute. If all the
1766 : transactions failed to execute, the microblock would be empty,
1767 : causing agave to think it's a tick and complain. Instead, we just
1768 : skip the microblock and don't hash or update the hashcnt. */
1769 0 : if( FD_UNLIKELY( !executed_txn_cnt ) ) return;
1770 :
1771 0 : uchar data[ 64 ];
1772 0 : fd_memcpy( data, ctx->hash, 32UL );
1773 0 : fd_memcpy( data+32UL, ctx->_microblock_trailer->hash, 32UL );
1774 0 : fd_sha256_hash( data, 64UL, ctx->hash );
1775 :
1776 0 : ctx->hashcnt++;
1777 0 : FD_TEST( ctx->hashcnt>ctx->last_hashcnt );
1778 0 : ulong hashcnt_delta = ctx->hashcnt - ctx->last_hashcnt;
1779 :
1780 : /* The hashing loop above will never leave us exactly one away from
1781 : crossing a tick boundary, so this increment will never cause the
1782 : current tick (or the slot) to change, except in low power mode
1783 : for development, in which case we do need to register the tick
1784 : with the leader bank. We don't need to publish the tick since
1785 : sending the microblock below is the publishing action. */
1786 0 : if( FD_UNLIKELY( !(ctx->hashcnt%ctx->hashcnt_per_slot ) ) ) {
1787 0 : ctx->slot++;
1788 0 : ctx->hashcnt = 0UL;
1789 0 : }
1790 :
1791 0 : ctx->last_slot = ctx->slot;
1792 0 : ctx->last_hashcnt = ctx->hashcnt;
1793 :
1794 0 : ctx->cus_used += cus_used;
1795 :
1796 0 : if( FD_UNLIKELY( !(ctx->hashcnt%ctx->hashcnt_per_tick ) ) ) {
1797 0 : fd_ext_poh_register_tick( ctx->current_leader_bank, ctx->hash );
1798 0 : if( FD_UNLIKELY( ctx->slot>ctx->next_leader_slot ) ) {
1799 : /* We ticked while leader and are no longer leader... transition
1800 : the state machine. */
1801 0 : publish_plugin_slot_end( ctx, ctx->next_leader_slot, ctx->cus_used );
1802 :
1803 0 : no_longer_leader( ctx );
1804 :
1805 0 : if( FD_UNLIKELY( ctx->slot>=ctx->next_leader_slot ) ) {
1806 : /* We finished a leader slot, and are immediately leader for the
1807 : following slot... transition. */
1808 0 : publish_plugin_slot_start( ctx, ctx->next_leader_slot, ctx->next_leader_slot-1UL );
1809 0 : }
1810 0 : }
1811 0 : }
1812 :
1813 0 : publish_microblock( ctx, stem, target_slot, hashcnt_delta, txn_cnt );
1814 0 : }
1815 :
1816 : static void
1817 : privileged_init( fd_topo_t * topo,
1818 0 : fd_topo_tile_t * tile ) {
1819 0 : void * scratch = fd_topo_obj_laddr( topo, tile->tile_obj_id );
1820 :
1821 0 : FD_SCRATCH_ALLOC_INIT( l, scratch );
1822 0 : fd_poh_ctx_t * ctx = FD_SCRATCH_ALLOC_APPEND( l, alignof( fd_poh_ctx_t ), sizeof( fd_poh_ctx_t ) );
1823 :
1824 0 : if( FD_UNLIKELY( !strcmp( tile->poh.identity_key_path, "" ) ) )
1825 0 : FD_LOG_ERR(( "identity_key_path not set" ));
1826 :
1827 0 : const uchar * identity_key = fd_keyload_load( tile->poh.identity_key_path, /* pubkey only: */ 1 );
1828 0 : fd_memcpy( ctx->identity_key.uc, identity_key, 32UL );
1829 0 : }
1830 :
1831 : /* The Agave client needs to communicate to the shred tile what
1832 : the shred version is on boot, but shred tile does not live in the
1833 : same address space, so have the PoH tile pass the value through
1834 : via. a shared memory ulong. */
1835 :
1836 : static volatile ulong * fd_shred_version;
1837 :
1838 : void
1839 0 : fd_ext_shred_set_shred_version( ulong shred_version ) {
1840 0 : while( FD_UNLIKELY( !fd_shred_version ) ) FD_SPIN_PAUSE();
1841 0 : *fd_shred_version = shred_version;
1842 0 : }
1843 :
1844 : void
1845 : fd_ext_poh_publish_gossip_vote( uchar * data,
1846 0 : ulong data_len ) {
1847 0 : poh_link_publish( &gossip_dedup, 1UL, data, data_len );
1848 0 : }
1849 :
1850 : void
1851 : fd_ext_poh_publish_leader_schedule( uchar * data,
1852 0 : ulong data_len ) {
1853 0 : poh_link_publish( &stake_out, 2UL, data, data_len );
1854 0 : }
1855 :
1856 : void
1857 : fd_ext_poh_publish_cluster_info( uchar * data,
1858 0 : ulong data_len ) {
1859 0 : poh_link_publish( &crds_shred, 2UL, data, data_len );
1860 0 : }
1861 :
1862 : void
1863 : fd_ext_plugin_publish_replay_stage( ulong sig,
1864 : uchar * data,
1865 0 : ulong data_len ) {
1866 0 : poh_link_publish( &replay_plugin, sig, data, data_len );
1867 0 : }
1868 :
1869 : void
1870 : fd_ext_plugin_publish_start_progress( ulong sig,
1871 : uchar * data,
1872 0 : ulong data_len ) {
1873 0 : poh_link_publish( &start_progress_plugin, sig, data, data_len );
1874 0 : }
1875 :
1876 : void
1877 : fd_ext_plugin_publish_vote_listener( ulong sig,
1878 : uchar * data,
1879 0 : ulong data_len ) {
1880 0 : poh_link_publish( &vote_listener_plugin, sig, data, data_len );
1881 0 : }
1882 :
1883 : void
1884 : fd_ext_plugin_publish_periodic( ulong sig,
1885 : uchar * data,
1886 0 : ulong data_len ) {
1887 0 : poh_link_publish( &gossip_plugin, sig, data, data_len );
1888 0 : }
1889 :
1890 : void
1891 : fd_ext_resolv_publish_root_bank( uchar * data,
1892 0 : ulong data_len ) {
1893 0 : poh_link_publish( &replay_resolv, 0UL, data, data_len );
1894 0 : }
1895 :
1896 : void
1897 : fd_ext_resolv_publish_completed_blockhash( uchar * data,
1898 0 : ulong data_len ) {
1899 0 : poh_link_publish( &replay_resolv, 1UL, data, data_len );
1900 0 : }
1901 :
1902 : static inline fd_poh_out_ctx_t
1903 : out1( fd_topo_t const * topo,
1904 : fd_topo_tile_t const * tile,
1905 0 : char const * name ) {
1906 0 : ulong idx = ULONG_MAX;
1907 :
1908 0 : for( ulong i=0UL; i<tile->out_cnt; i++ ) {
1909 0 : fd_topo_link_t const * link = &topo->links[ tile->out_link_id[ i ] ];
1910 0 : if( !strcmp( link->name, name ) ) {
1911 0 : if( FD_UNLIKELY( idx!=ULONG_MAX ) ) FD_LOG_ERR(( "tile %s:%lu had multiple output links named %s but expected one", tile->name, tile->kind_id, name ));
1912 0 : idx = i;
1913 0 : }
1914 0 : }
1915 :
1916 0 : if( FD_UNLIKELY( idx==ULONG_MAX ) ) FD_LOG_ERR(( "tile %s:%lu had no output link named %s", tile->name, tile->kind_id, name ));
1917 :
1918 0 : void * mem = topo->workspaces[ topo->objs[ topo->links[ tile->out_link_id[ idx ] ].dcache_obj_id ].wksp_id ].wksp;
1919 0 : ulong chunk0 = fd_dcache_compact_chunk0( mem, topo->links[ tile->out_link_id[ idx ] ].dcache );
1920 0 : ulong wmark = fd_dcache_compact_wmark ( mem, topo->links[ tile->out_link_id[ idx ] ].dcache, topo->links[ tile->out_link_id[ idx ] ].mtu );
1921 :
1922 0 : return (fd_poh_out_ctx_t){ .idx = idx, .mem = mem, .chunk0 = chunk0, .wmark = wmark, .chunk = chunk0 };
1923 0 : }
1924 :
1925 : static void
1926 : unprivileged_init( fd_topo_t * topo,
1927 0 : fd_topo_tile_t * tile ) {
1928 0 : void * scratch = fd_topo_obj_laddr( topo, tile->tile_obj_id );
1929 :
1930 0 : FD_SCRATCH_ALLOC_INIT( l, scratch );
1931 0 : fd_poh_ctx_t * ctx = FD_SCRATCH_ALLOC_APPEND( l, alignof( fd_poh_ctx_t ), sizeof( fd_poh_ctx_t ) );
1932 0 : void * stake_ci = FD_SCRATCH_ALLOC_APPEND( l, fd_stake_ci_align(), fd_stake_ci_footprint() );
1933 0 : void * sha256 = FD_SCRATCH_ALLOC_APPEND( l, FD_SHA256_ALIGN, FD_SHA256_FOOTPRINT );
1934 :
1935 0 : #define NONNULL( x ) (__extension__({ \
1936 0 : __typeof__((x)) __x = (x); \
1937 0 : if( FD_UNLIKELY( !__x ) ) FD_LOG_ERR(( #x " was unexpectedly NULL" )); \
1938 0 : __x; }))
1939 :
1940 0 : ctx->stake_ci = NONNULL( fd_stake_ci_join( fd_stake_ci_new( stake_ci, &ctx->identity_key ) ) );
1941 0 : ctx->sha256 = NONNULL( fd_sha256_join( fd_sha256_new( sha256 ) ) );
1942 0 : ctx->current_leader_bank = NULL;
1943 0 : ctx->signal_leader_change = NULL;
1944 :
1945 0 : ctx->slot = 0UL;
1946 0 : ctx->hashcnt = 0UL;
1947 0 : ctx->last_hashcnt = 0UL;
1948 0 : ctx->highwater_leader_slot = ULONG_MAX;
1949 0 : ctx->next_leader_slot = ULONG_MAX;
1950 0 : ctx->reset_slot = ULONG_MAX;
1951 :
1952 0 : ctx->expect_sequential_leader_slot = ULONG_MAX;
1953 :
1954 0 : ctx->microblocks_lower_bound = 0UL;
1955 :
1956 :
1957 0 : ulong poh_shred_obj_id = fd_pod_query_ulong( topo->props, "poh_shred", ULONG_MAX );
1958 0 : FD_TEST( poh_shred_obj_id!=ULONG_MAX );
1959 :
1960 0 : fd_shred_version = fd_fseq_join( fd_topo_obj_laddr( topo, poh_shred_obj_id ) );
1961 0 : FD_TEST( fd_shred_version );
1962 :
1963 0 : poh_link_init( &gossip_dedup, topo, tile, out1( topo, tile, "gossip_dedup" ).idx );
1964 0 : poh_link_init( &stake_out, topo, tile, out1( topo, tile, "stake_out" ).idx );
1965 0 : poh_link_init( &crds_shred, topo, tile, out1( topo, tile, "crds_shred" ).idx );
1966 0 : poh_link_init( &replay_resolv, topo, tile, out1( topo, tile, "replay_resol" ).idx );
1967 :
1968 0 : if( FD_LIKELY( tile->poh.plugins_enabled ) ) {
1969 0 : poh_link_init( &replay_plugin, topo, tile, out1( topo, tile, "replay_plugi" ).idx );
1970 0 : poh_link_init( &gossip_plugin, topo, tile, out1( topo, tile, "gossip_plugi" ).idx );
1971 0 : poh_link_init( &start_progress_plugin, topo, tile, out1( topo, tile, "startp_plugi" ).idx );
1972 0 : poh_link_init( &vote_listener_plugin, topo, tile, out1( topo, tile, "votel_plugin" ).idx );
1973 0 : } else {
1974 : /* Mark these mcaches as "available", so the system boots, but the
1975 : memory is not set so nothing will actually get published via.
1976 : the links. */
1977 0 : FD_COMPILER_MFENCE();
1978 0 : replay_plugin.mcache = (fd_frag_meta_t*)1;
1979 0 : gossip_plugin.mcache = (fd_frag_meta_t*)1;
1980 0 : start_progress_plugin.mcache = (fd_frag_meta_t*)1;
1981 0 : vote_listener_plugin.mcache = (fd_frag_meta_t*)1;
1982 0 : FD_COMPILER_MFENCE();
1983 0 : }
1984 :
1985 0 : FD_LOG_INFO(( "PoH waiting to be initialized by Agave client... %lu %lu", fd_poh_waiting_lock, fd_poh_returned_lock ));
1986 0 : FD_VOLATILE( fd_poh_global_ctx ) = ctx;
1987 0 : FD_COMPILER_MFENCE();
1988 0 : for(;;) {
1989 0 : if( FD_LIKELY( FD_VOLATILE_CONST( fd_poh_waiting_lock ) ) ) break;
1990 0 : FD_SPIN_PAUSE();
1991 0 : }
1992 0 : FD_VOLATILE( fd_poh_waiting_lock ) = 0UL;
1993 0 : FD_VOLATILE( fd_poh_returned_lock ) = 1UL;
1994 0 : FD_COMPILER_MFENCE();
1995 0 : for(;;) {
1996 0 : if( FD_UNLIKELY( !FD_VOLATILE_CONST( fd_poh_returned_lock ) ) ) break;
1997 0 : FD_SPIN_PAUSE();
1998 0 : }
1999 0 : FD_COMPILER_MFENCE();
2000 :
2001 0 : if( FD_UNLIKELY( ctx->reset_slot==ULONG_MAX ) ) FD_LOG_ERR(( "PoH was not initialized by Agave client" ));
2002 :
2003 0 : fd_histf_join( fd_histf_new( ctx->begin_leader_delay, FD_MHIST_SECONDS_MIN( POH_TILE, BEGIN_LEADER_DELAY_SECONDS ),
2004 0 : FD_MHIST_SECONDS_MAX( POH_TILE, BEGIN_LEADER_DELAY_SECONDS ) ) );
2005 0 : fd_histf_join( fd_histf_new( ctx->first_microblock_delay, FD_MHIST_SECONDS_MIN( POH_TILE, FIRST_MICROBLOCK_DELAY_SECONDS ),
2006 0 : FD_MHIST_SECONDS_MAX( POH_TILE, FIRST_MICROBLOCK_DELAY_SECONDS ) ) );
2007 0 : fd_histf_join( fd_histf_new( ctx->slot_done_delay, FD_MHIST_SECONDS_MIN( POH_TILE, SLOT_DONE_DELAY_SECONDS ),
2008 0 : FD_MHIST_SECONDS_MAX( POH_TILE, SLOT_DONE_DELAY_SECONDS ) ) );
2009 :
2010 0 : for( ulong i=0UL; i<tile->in_cnt; i++ ) {
2011 0 : fd_topo_link_t * link = &topo->links[ tile->in_link_id[ i ] ];
2012 0 : fd_topo_wksp_t * link_wksp = &topo->workspaces[ topo->objs[ link->dcache_obj_id ].wksp_id ];
2013 :
2014 0 : ctx->in[ i ].mem = link_wksp->wksp;
2015 0 : ctx->in[ i ].chunk0 = fd_dcache_compact_chunk0( ctx->in[ i ].mem, link->dcache );
2016 0 : ctx->in[ i ].wmark = fd_dcache_compact_wmark ( ctx->in[ i ].mem, link->dcache, link->mtu );
2017 :
2018 0 : if( FD_UNLIKELY( !strcmp( link->name, "stake_out" ) ) ) {
2019 0 : ctx->in_kind[ i ] = IN_KIND_STAKE;
2020 0 : } else if( FD_UNLIKELY( !strcmp( link->name, "pack_bank" ) ) ) {
2021 0 : ctx->in_kind[ i ] = IN_KIND_PACK;
2022 0 : } else if( FD_LIKELY( !strcmp( link->name, "bank_poh" ) ) ) {
2023 0 : ctx->in_kind[ i ] = IN_KIND_BANK;
2024 0 : } else {
2025 0 : FD_LOG_ERR(( "unexpected input link name %s", link->name ));
2026 0 : }
2027 0 : }
2028 :
2029 0 : *ctx->shred_out = out1( topo, tile, "poh_shred" );
2030 0 : *ctx->pack_out = out1( topo, tile, "poh_pack" );
2031 0 : ctx->plugin_out->mem = NULL;
2032 0 : if( FD_LIKELY( tile->poh.plugins_enabled ) ) {
2033 0 : *ctx->plugin_out = out1( topo, tile, "poh_plugin" );
2034 0 : }
2035 :
2036 0 : ulong scratch_top = FD_SCRATCH_ALLOC_FINI( l, 1UL );
2037 0 : if( FD_UNLIKELY( scratch_top > (ulong)scratch + scratch_footprint( tile ) ) )
2038 0 : FD_LOG_ERR(( "scratch overflow %lu %lu %lu", scratch_top - (ulong)scratch - scratch_footprint( tile ), scratch_top, (ulong)scratch + scratch_footprint( tile ) ));
2039 0 : }
2040 :
2041 : /* One tick, one microblock, one plugin slot end, one plugin slot start,
2042 : and one leader update. */
2043 0 : #define STEM_BURST (5UL)
2044 :
2045 : /* See explanation in fd_pack */
2046 0 : #define STEM_LAZY (128L*3000L)
2047 :
2048 0 : #define STEM_CALLBACK_CONTEXT_TYPE fd_poh_ctx_t
2049 0 : #define STEM_CALLBACK_CONTEXT_ALIGN alignof(fd_poh_ctx_t)
2050 :
2051 0 : #define STEM_CALLBACK_METRICS_WRITE metrics_write
2052 0 : #define STEM_CALLBACK_AFTER_CREDIT after_credit
2053 0 : #define STEM_CALLBACK_BEFORE_FRAG before_frag
2054 0 : #define STEM_CALLBACK_DURING_FRAG during_frag
2055 0 : #define STEM_CALLBACK_AFTER_FRAG after_frag
2056 :
2057 : #include "../../../../disco/stem/fd_stem.c"
2058 :
2059 : fd_topo_run_tile_t fd_tile_poh = {
2060 : .name = "poh",
2061 : .populate_allowed_seccomp = NULL,
2062 : .populate_allowed_fds = NULL,
2063 : .scratch_align = scratch_align,
2064 : .scratch_footprint = scratch_footprint,
2065 : .privileged_init = privileged_init,
2066 : .unprivileged_init = unprivileged_init,
2067 : .run = stem_run,
2068 : };
|