#ifndef HEADER_fd_src_discof_poh_h
#define HEADER_fd_src_discof_poh_h

/* Let's say there was a computer, the "leader" computer, that acted as
   a bank. Users could send it messages saying they wanted to deposit
   money, or transfer it to someone else.

   That's how, for example, Bank of America works, but there are
   problems with it. One simple problem is: the bank can set your
   balance to zero if they don't like you.

   You could try to fix this by having the bank periodically publish
   the list of all account balances and transactions. If the customers
   add unforgeable signatures to their deposit slips and transfers,
   then the bank cannot zero a balance without it being obvious to
   everyone.

   There are still problems. The bank can't lie about your balance now
   or take your money, but it can simply not accept deposits on your
   behalf by ignoring you.

   You could fix this by getting a few independent banks together,
   let's say Bank of America, Bank of England, and Westpac, and having
   them rotate who operates the leader computer periodically. If one
   bank ignores your deposits, you can just wait and send them to the
   next one.

   This is Solana.

   There are still problems of course, but they are largely technical.
   How do the banks agree who is leader? How do you recover if a leader
   misbehaves? How do customers verify the transactions aren't forged?
   How do banks receive, publish, and verify each other's work quickly?
   These are the main technical innovations that enable Solana to work
   well.

   What about Proof of History?

   One particular niche problem is about the leader schedule. When the
   leader computer is moving from one bank to another, the new bank
   must wait for the old bank to say it's done and provide a final list
   of balances that it can start working off of. But: what if the
   computer at the old bank crashes and never says it's done?

   Does the new leader just take over at some point? What if the new
   leader is malicious, and says the past thousand leaders crashed, and
   there have been no transactions for days? How do you check?

   This is what Proof of History solves. Each bank in the network must
   constantly do a lot of busywork (compute hashes), even when it is
   not leader.

   If the prior thousand leaders crashed, and no transactions happened
   in an hour, the new leader would have to show they did about an hour
   of busywork for everyone else to believe them.

   A better name for this is proof of skipping. If a leader is skipping
   slots (building off of a slot that is not the direct parent), it
   must prove that it waited a good amount of time to do so.
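
   To make that concrete, here is roughly what a skip proof costs with
   the mainnet-beta parameters described under TICKS and SLOTS below
   (an illustrative helper only; neither the name nor the function
   exists in the implementation):

     // Hashes a leader must additionally publish to build off of a
     // block that is skipped_slots older than the direct parent.
     static inline ulong
     skip_proof_hashcnt( ulong skipped_slots ) {
       return skipped_slots*64UL*62500UL; // 4,000,000 hashes per skipped slot
     }

   At roughly 100 nanoseconds per hash, each skipped slot is about 400
   milliseconds of extra hashing.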

   It's not a perfect solution. For one thing, some banks have really
   fast computers and can compute a lot of busywork in a short amount
   of time, allowing them to skip prior slot(s) anyway. But: there is a
   social component that prevents validators from skipping the prior
   leader slot. It is easy to detect when this happens and the network
   could respond by ignoring their votes or stake.

   You could come up with other schemes: for example, the network could
   just use wall clock time. If a new leader publishes a block without
   waiting 400 milliseconds for the prior slot to complete, then there
   is no "proof of skipping" and the nodes ignore the slot.

   These schemes have a problem in that they are not deterministic
   across the network (different computers have different clocks), and
   so they will cause frequent forks, which are very expensive to
   resolve. Even though the proof of history scheme is not perfect, it
   is better than any alternative which is not deterministic.

   With all that background, we can now describe at a high level what
   this PoH tile actually does:

    (1) Whenever any other leader in the network finishes a slot, and
        the slot is determined to be the best one to build off of, this
        tile gets "reset" onto that block, the so-called "reset slot".

    (2) The tile is constantly doing busywork, hash(hash(hash(...))),
        on top of the last reset slot, even when it is not leader.

    (3) When the tile becomes leader, it continues hashing from where
        it was. Typically, the prior leader finishes their slot, so the
        reset slot will be the parent one, and this tile only publishes
        hashes for its own slot. But if prior slots were skipped, then
        there might be a whole chain already waiting.

   That's pretty much it. When we are leader, in addition to doing
   busywork, we publish ticks and microblocks to the shred tile. A
   microblock is a non-empty group of transactions whose hashes are
   mixed into the chain, while a tick is a periodic stamp of the
   current hash, with no transactions (nothing mixed in). We need to
   send both to the shred tile, as ticks are important for other
   validators to verify in parallel.
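
   Concretely, the chain itself is built from just two operations.
   Roughly (an illustrative sketch using the fd_sha256 API included
   below; this is not the tile's actual code path):

     // Tick-style step: extend the chain with nothing mixed in.
     static inline void
     poh_tick_step( uchar hash[ 32 ] ) {
       uchar in[ 32 ];
       fd_memcpy( in, hash, 32UL );
       fd_sha256_hash( in, 32UL, hash );
     }

     // Mixin step: fold a microblock's hash into the chain.
     static inline void
     poh_mixin_step( uchar hash[ 32 ], uchar const mixin[ 32 ] ) {
       uchar in[ 64 ];
       fd_memcpy( in,      hash,  32UL );
       fd_memcpy( in+32UL, mixin, 32UL );
       fd_sha256_hash( in, 64UL, hash );
     }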

   As well, the tile should never become leader for a slot that it has
   already published anything for; otherwise, it may create a duplicate
   block.

   Some particularly common misunderstandings:

    - PoH is critical to security.

      This largely isn't true. The target hash rate of the network is
      so slow (1 hash per 500 nanoseconds) that a malicious leader can
      easily catch up if they start from an old hash, and the only
      practical attack prevented is the proof of skipping. Most of the
      long range attacks in the Solana whitepaper are not relevant.

    - PoH keeps passage of time.

      This is also not true. The way the network keeps time so it can
      decide who is leader is that each leader uses its operating
      system clock to time 400 milliseconds and publishes its block
      when this timer expires.

      If a leader just hashed as fast as they could, they could publish
      a block in tens of milliseconds, and the rest of the network
      would happily accept it. This is why the Solana "clock" as
      determined by PoH is not accurate and drifts over time.

    - PoH prevents transaction reordering by the leader.

      The leader can, in theory, wait until the very end of their
      leader slot to publish anything at all to the network. They can,
      in particular, hold all received transactions for 400
      milliseconds and then reorder and publish some right at the end
      to advantage certain transactions.

   You might be wondering... if all the PoH chain is helping us do is
   prove that slots were skipped correctly, why do we need to "mix in"
   transactions to the hash value? Or do anything at all for slots
   where we don't skip the prior slot?

   It's a good question, and the answer is that this behavior is not
   necessary. An ideal implementation of PoH would have no concept of
   ticks or mixins, and would not be part of the TPU pipeline at all.
   Instead, there would be a simple field "skip_proof" on the last
   shred we send for a slot, the hash(hash(...)) value. This field
   would only be filled in (and only verified by replayers) in cases
   where the slot actually skipped a parent.

   Then what is the "clock"? In Solana, time is constructed as follows:

   HASHES

   The base unit of time is a hash. Hereafter, any values whose units
   are in hashes are called a "hashcnt" to distinguish them from actual
   hashed values.

   Agave generally defines a constant duration for each tick (see
   below) and then varies the number of hashcnts per tick, but as we
   consider the hashcnt the base unit of time, Firedancer and this PoH
   implementation define everything in terms of hashcnt duration
   instead.

   In mainnet-beta, testnet, and devnet the hashcnt ticks over
   (increments) every 100 nanoseconds. The hashcnt rate is specified as
   500 nanoseconds according to the genesis, but there are several
   features which increase the number of hashes per tick while keeping
   tick duration constant, which make the time per hashcnt lower. These
   features, up to and including the `update_hashes_per_tick6` feature,
   are activated on mainnet-beta, devnet, and testnet, and are
   described in the TICKS section below.

   Other chains and development environments might have a different
   hashcnt rate in the genesis, or they might not have activated the
   features which increase the rate yet, which we also support.

   In practice, although each validator follows a hashcnt rate of 100
   nanoseconds, the overall observed hashcnt rate of the network is a
   little slower than once every 100 nanoseconds, mostly because there
   are gaps and clock synchronization issues during handoff between
   leaders. This is referred to as clock drift.

   TICKS

   The leader needs to periodically checkpoint the hash value
   associated with a given hashcnt so that they can publish it to other
   nodes for verification.

   On mainnet-beta, testnet, and devnet this occurs once every 62,500
   hashcnts, or approximately once every 6.25 milliseconds. This value
   is determined at genesis time and by the features described below,
   and could be different in development environments or on other
   chains, which we also support.

   Due to protocol limitations, mixing transactions into the
   proof-of-history chain cannot occur on a tick boundary (but can
   occur at any other hashcnt).

   Ticks exist mainly so that verification can happen in parallel. A
   verifier computer, rather than needing to do hash(hash(...)) all in
   sequence to verify a proof-of-history chain, can do,

     Core 0: hash(hash(...))
     Core 1: hash(hash(...))
     Core 2: hash(hash(...))
     Core 3: hash(hash(...))
     ...

   between each pair of tick boundaries.
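
   A sketch of the work each core then does for its assigned span (an
   illustrative helper, not the implementation's verifier, and it
   ignores mixins for simplicity):

     // Check that `expected` is reached from `prev_tick_hash` in
     // exactly `hashcnt_per_tick` hashes.
     static inline int
     verify_tick_span( uchar const prev_tick_hash[ 32 ],
                       uchar const expected      [ 32 ],
                       ulong       hashcnt_per_tick ) {
       uchar hash[ 32 ];
       fd_memcpy( hash, prev_tick_hash, 32UL );
       for( ulong i=0UL; i<hashcnt_per_tick; i++ ) {
         uchar in[ 32 ];
         fd_memcpy( in, hash, 32UL );
         fd_sha256_hash( in, 32UL, hash );
       }
       return !memcmp( hash, expected, 32UL );
     }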

   Solana sometimes calls the current tick the "tick height", although
   it makes more sense to think of it as a counter from zero: it's just
   the number of ticks since the genesis hash.

   There is a set of features which increase the number of hashcnts per
   tick. These are all deployed on mainnet-beta, devnet, and testnet.

     name:             update_hashes_per_tick
     id:               3uFHb9oKdGfgZGJK9EHaAXN4USvnQtAFC13Fh5gGFS5B
     hashes per tick:  12,500
     hashcnt duration: 500 nanos

     name:             update_hashes_per_tick2
     id:               EWme9uFqfy1ikK1jhJs8fM5hxWnK336QJpbscNtizkTU
     hashes per tick:  17,500
     hashcnt duration: 357.142857143 nanos

     name:             update_hashes_per_tick3
     id:               8C8MCtsab5SsfammbzvYz65HHauuUYdbY2DZ4sznH6h5
     hashes per tick:  27,500
     hashcnt duration: 227.272727273 nanos

     name:             update_hashes_per_tick4
     id:               8We4E7DPwF2WfAN8tRTtWQNhi98B99Qpuj7JoZ3Aikgg
     hashes per tick:  47,500
     hashcnt duration: 131.578947368 nanos

     name:             update_hashes_per_tick5
     id:               BsKLKAn1WM4HVhPRDsjosmqSg2J8Tq5xP2s2daDS6Ni4
     hashes per tick:  57,500
     hashcnt duration: 108.695652174 nanos

     name:             update_hashes_per_tick6
     id:               FKu1qYwLQSiehz644H6Si65U5ZQ2cp9GxsyFUfYcuADv
     hashes per tick:  62,500
     hashcnt duration: 100 nanos

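   The hashcnt durations above are just the (constant) tick duration
   divided by the hashes per tick. A quick sanity check of the table,
   using the genesis values described in the HASHES section (arithmetic
   only, nothing here is defined by the implementation):

     double tick_duration_ns    = 12500.0*500.0;            // 6.25 ms per tick
     // after update_hashes_per_tick6: 62,500 hashes per tick
     double hashcnt_duration_ns = tick_duration_ns/62500.0; // 100 ns per hash
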
   In development environments, there is a way to configure the hashcnt
   per tick to be "none" during genesis, for a so-called "low power"
   tick producer. The idea is not to spin cores during development.
   This is equivalent to setting the hashcnt per tick to be 1, and
   increasing the hashcnt duration to the desired tick duration.

   SLOTS

   Each leader needs to be leader for a fixed amount of time, which is
   called a slot. During a slot, a leader has an opportunity to receive
   transactions and produce a block for the network, although they may
   miss ("skip") the slot if they are offline or not behaving.

   In mainnet-beta, testnet, and devnet a slot is 64 ticks, or
   4,000,000 hashcnts, or approximately 400 milliseconds.

   Due to the way the leader schedule is constructed, each leader is
   always given at least four (4) consecutive slots in the schedule.
   This means when becoming leader you will be leader for at least 4
   slots, or 1.6 seconds.

   It is rare, although it can happen, that a leader gets more than 4
   consecutive slots (e.g. 8 or 12) if they are lucky with the leader
   schedule generation.

   The number of ticks in a slot is fixed at genesis time, and could be
   different for development or other chains, which we support. There
   is nothing special about 4 leader slots in a row: this might be
   changed in the future, and the proof of history makes no assumption
   that it is the case.

   EPOCHS

   Infrequently, the network needs to do certain housekeeping, mainly
   things like collecting rent and deciding on the leader schedule. The
   length of an epoch is fixed on mainnet-beta, devnet, and testnet at
   420,000 slots, or around 2 (1.94) days. This value is fixed at
   genesis time, and could be different for other chains including
   development, which we support. Typically in development, epochs are
   every 8,192 slots, or around 1 hour (54.61 minutes), although this
   depends on the number of ticks per slot and the target hashcnt rate
   of the genesis as well.

   In development, epochs need not be a fixed length either. There is a
   "warmup" option, where epochs start short and grow, which is useful
   for quickly warming up stake during development.

   The epoch is important because it is the only time the leader
   schedule is updated. The leader schedule is a list of which leader
   is leader for which slot, and is generated by a special algorithm
   that is deterministic and known to all nodes.

   The leader schedule is computed one epoch in advance, so that at
   slot T, we always know who will be leader up until the end of slot
   T+EPOCH_LENGTH. Specifically, the leader schedule for epoch N is
   computed during the epoch boundary crossing from N-2 to N-1. For
   mainnet-beta, the slots per epoch is fixed and will always be
   420,000. */

#include "../../disco/pack/fd_pack.h"
#include "../../disco/stem/fd_stem.h"
#include "../../util/fd_util_base.h"
#include "../../ballet/sha256/fd_sha256.h"

/* FD_POH_{ALIGN,FOOTPRINT} describe the alignment and footprint needed
   for a memory region to hold a fd_poh_t. ALIGN is a positive integer
   power of 2. FOOTPRINT is a multiple of ALIGN. These are provided to
   facilitate compile time declarations. */

#define FD_POH_ALIGN     (128UL)
#define FD_POH_FOOTPRINT (128UL)
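
/* For example, a region suitable for fd_poh_new can be declared at
   compile time like so (illustrative only, the name is arbitrary):

     static uchar poh_mem[ FD_POH_FOOTPRINT ] __attribute__((aligned(FD_POH_ALIGN))); */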

#define FD_POH_MAGIC (0xF17EDA2CE580A000) /* FIREDANCE POH V0 */

/* The maximum number of microblocks that pack is allowed to pack into
   a single slot. This is not consensus critical, and pack could, if we
   let it, produce as many microblocks as it wants, and the slot would
   still be valid.

   We have this here instead so that PoH can estimate slot completion,
   and keep the hashcnt up to date as pack progresses through packing
   the slot. If this upper bound were not enforced, PoH could tick to
   the last hash of the slot and have no hashes left to mix in incoming
   microblocks from pack, so this upper bound is a coordination
   mechanism: it lets PoH progress hashcnts while the slot is active,
   knowing that pack will not need those hashcnts later to do mixins. */
#define MAX_MICROBLOCKS_PER_SLOT (32768UL)

/* When we are hashing in the background in case a prior leader skips
   their slot, we need to store the result of each tick hash so we can
   publish them when we become leader. The network requires at least
   one leader slot to be published in each epoch for the leader
   schedule to be generated, so in the worst case we might need two
   full epochs of slots to store the hashes. (E.g. if epoch T only had
   a published slot in position 0 and epoch T+1 only had a published
   slot right at the end.)

   There is a tighter bound: the block data limit of mainnet-beta is
   currently FD_PACK_MAX_DATA_PER_BLOCK, or 27,332,342 bytes per slot.
   At 48 bytes per tick, it is not possible to publish a slot that
   skips 569,424 or more prior slots. */
#define MAX_SKIPPED_TICKS (1UL+(FD_PACK_MAX_DATA_PER_BLOCK/48UL))

struct fd_poh_leader_slot_ended {
  int   completed;
  ulong slot;
  uchar blockhash[ 32UL ];
};

typedef struct fd_poh_leader_slot_ended fd_poh_leader_slot_ended_t;

struct fd_poh_out_private {
  ulong       idx;
  fd_wksp_t * mem;
  ulong       chunk0;
  ulong       wmark;
  ulong       chunk;
};

typedef struct fd_poh_out_private fd_poh_out_t;

struct __attribute__((aligned(FD_POH_ALIGN))) fd_poh_private {
  int state;

  /* Static configuration determined at genesis creation time. See
     long comment above for more information. */
  ulong tick_duration_ns;
  ulong hashcnt_per_tick;
  ulong ticks_per_slot;

  /* Derived from the above configuration, but we precompute it. */
  double slot_duration_ns;
  double hashcnt_duration_ns;
  ulong  hashcnt_per_slot;
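
  /* Illustratively (the precise computation is done by the
     implementation, not specified in this header):

       slot_duration_ns    ~ (double)( ticks_per_slot*tick_duration_ns )
       hashcnt_duration_ns ~ (double)tick_duration_ns/(double)hashcnt_per_tick
       hashcnt_per_slot    = ticks_per_slot*hashcnt_per_tick */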

  /* The maximum number of microblocks that the pack tile can publish
     in each slot. */
  ulong max_microblocks_per_slot;

  /* The block id of the completed block. */
  uchar completed_block_id[ 32UL ];

  /* The slot we were reset on (what we are building on top of). */
  ulong reset_slot;
  long  reset_slot_start_ns;

  /* The current slot and hashcnt within that slot of the proof of
     history, including hashes we have been producing in the background
     while waiting for our next leader slot. */
  ulong slot;
  ulong hashcnt;

  ulong next_leader_slot;
  long  leader_slot_start_ns;

  /* When we send a microblock on to the shred tile, we need to tell it
     how many hashes there have been since the last microblock, so this
     tracks the hashcnt of the last published microblock.

     If we are skipping slots prior to our leader slot, the last_slot
     will be quite old, and the gap potentially much larger than the
     number of hashcnts in one slot. */
  ulong last_slot;
  ulong last_hashcnt;

  /* Whether we have received the slot done message from pack yet. We
     are not allowed to fully finish hashing the block until this
     happens, so that we know which slot the slot_done message is
     arriving for. */
  int slot_done;

  /* The PoH tile must never drop microblocks that get committed by the
     bank, so it needs to always be able to mix in a microblock hash.
     Mixing in requires incrementing the hashcnt, so we need to ensure
     at all times that there are enough hashcnts left in the slot to
     mix in whatever future microblocks pack might produce for it.

     This value tracks that. At any time, max_microblocks_per_slot
     - microblocks_lower_bound is an upper bound on the maximum number
     of microblocks that might still be received in this slot. */
  ulong microblocks_lower_bound;

  uchar __attribute__((aligned(32UL))) reset_hash[ 32 ];
  uchar __attribute__((aligned(32UL))) hash[ 32 ];

  /* When we are not leader, we need to save the hashes that were
     produced in case the prior leader skips. If they skip, we will
     replay these skipped hashes into our next leader bank so that the
     slot hashes sysvar can be updated correctly, and also publish them
     to peer nodes as part of our outgoing shreds. */
  uchar skipped_tick_hashes[ MAX_SKIPPED_TICKS ][ 32 ];

  fd_sha256_t * sha256;

  fd_poh_out_t shred_out[ 1 ];
  fd_poh_out_t replay_out[ 1 ];

  ulong magic;
};

typedef struct fd_poh_private fd_poh_t;

FD_PROTOTYPES_BEGIN

FD_FN_CONST ulong
fd_poh_align( void );

FD_FN_CONST ulong
fd_poh_footprint( void );

void *
fd_poh_new( void * shmem );

fd_poh_t *
fd_poh_join( void *         shpoh,
             fd_poh_out_t * shred_out,
             fd_poh_out_t * replay_out );

void
fd_poh_reset( fd_poh_t *          poh,
              fd_stem_context_t * stem,
              ulong               hashcnt_per_tick,
              ulong               ticks_per_slot,
              ulong               tick_duration_ns,
              ulong               completed_slot,
              uchar const *       completed_blockhash,
              ulong               next_leader_slot,
              ulong               max_microblocks_in_slot,
              uchar const *       completed_block_id );

int
fd_poh_have_leader_bank( fd_poh_t const * poh );

void
fd_poh_begin_leader( fd_poh_t * poh,
                     ulong      slot,
                     ulong      hashcnt_per_tick,
                     ulong      ticks_per_slot,
                     ulong      tick_duration_ns,
                     ulong      max_microblocks_in_slot );

void
fd_poh_done_packing( fd_poh_t * poh,
                     ulong      microblocks_in_slot );

void
fd_poh_advance( fd_poh_t *          poh,
                fd_stem_context_t * stem,
                int *               opt_poll_in,
                int *               charge_busy );

void
fd_poh1_mixin( fd_poh_t *          poh,
               fd_stem_context_t * stem,
               ulong               slot,
               uchar const *       hash,
               ulong               txn_cnt,
               fd_txn_p_t const *  txns );
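
/* A rough sketch of how a caller might tie these APIs together, based
   on the comment at the top of this file. The surrounding objects
   (stem, the out links, the parameter values) and the exact calling
   context are illustrative only; the real call sites live in the PoH
   tile itself.

     fd_poh_t * poh = fd_poh_join( fd_poh_new( poh_mem ), shred_out, replay_out );

     // On notification that the network completed a block we want to
     // build on, re-anchor PoH onto it.
     fd_poh_reset( poh, stem, hashcnt_per_tick, ticks_per_slot,
                   tick_duration_ns, completed_slot, completed_blockhash,
                   next_leader_slot, max_microblocks_in_slot,
                   completed_block_id );

     // In the run loop, keep hashing in the background (and publish
     // ticks / microblocks while leader).
     fd_poh_advance( poh, stem, &opt_poll_in, &charge_busy );

     // When our leader slot begins ...
     fd_poh_begin_leader( poh, slot, hashcnt_per_tick, ticks_per_slot,
                          tick_duration_ns, max_microblocks_in_slot );

     // ... mix in each microblock committed by the bank ...
     fd_poh1_mixin( poh, stem, slot, mixin_hash, txn_cnt, txns );

     // ... and stop reserving hashcnts once pack reports it is done.
     fd_poh_done_packing( poh, microblocks_in_slot ); */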

FD_PROTOTYPES_END

#endif /* HEADER_fd_src_discof_poh_h */