Log Sequence Number. See Log Sequence Numbers (LSN).
A log sequence number points to a record in the log. It consists of two parts:
- hi(), a.k.a., file(). This is a number that matches a log partition file, e.g., "log.<file>"
- lo(), a.k.a., rba(). This is a byte offset into the log partition. It is either the first byte of a log record or the first byte after the last log record in the file (where the next log record could be written).
All state is stored in a single 64-bit value. Reading or setting this value is atomic on 64-bit platforms (though updates still need protection).
- Warning
- This is NOT atomic on 32-bit platforms.
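As a rough illustration of the layout described above, the following sketch packs hi() and lo() into one 64-bit word using a 32-32 split. The class name lsn_sketch, the constants, and the raw() accessor are hypothetical and are not taken from the real lsn_t.

```cpp
#include <cassert>
#include <cstdint>

// Assumed split: 32 bits of partition number, 32 bits of byte offset.
constexpr unsigned kOffsetBits = 32;
constexpr uint64_t kOffsetMask = (uint64_t{1} << kOffsetBits) - 1;

// Hypothetical stand-in for lsn_t; all state lives in one 64-bit value.
class lsn_sketch {
public:
    lsn_sketch(uint32_t file, uint64_t rba)
        : data_((uint64_t{file} << kOffsetBits) | rba) {
        assert(rba <= kOffsetMask);  // offset must fit in the low bits
    }

    // Partition number: matches the log partition file "log.<file>".
    uint32_t hi() const { return static_cast<uint32_t>(data_ >> kOffsetBits); }
    uint32_t file() const { return hi(); }

    // Byte offset into the partition.
    uint64_t lo() const { return data_ & kOffsetMask; }
    uint64_t rba() const { return lo(); }

    // The packed representation.
    uint64_t raw() const { return data_; }

    // Because the partition number sits in the high bits, comparing the
    // packed words orders by partition first and by offset second.
    friend bool operator<(const lsn_sketch& a, const lsn_sketch& b) {
        return a.data_ < b.data_;
    }
    friend bool operator==(const lsn_sketch& a, const lsn_sketch& b) {
        return a.data_ == b.data_;
    }

private:
    uint64_t data_;
};
```

Keeping the partition number in the high bits is what lets a single integer comparison order two values, at least until partition numbers wrap.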
Because all state fits in 64 bits, there is a trade-off between maximum supported log partition size and number of partitions. Two reasonable choices are:
- 16-bit partition numbers, up to 256TB per partition
- 32-bit partition numbers, up to 4GB per partition
48-bit offsets allow larger partitions, but are (slightly) more expensive, and the 16-bit partition numbers they leave are likely to wrap sooner. 32-bit offsets are still pretty big, and the chance of wrapping is much smaller (though a production system could theoretically hit the limit, since the count persists as long as the database exists). For now we go with the 32-32 split.
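The arithmetic behind the two choices can be checked with a short sketch. The helper names below are hypothetical and only restate the trade-off: every bit given to the offset is a bit taken from the partition number.

```cpp
#include <cassert>
#include <cstdint>

// Largest partition size, in bytes, for a given offset width.
inline uint64_t max_partition_bytes(unsigned offset_bits) {
    assert(offset_bits > 0 && offset_bits < 64);
    return uint64_t{1} << offset_bits;
}

// Number of distinct partition numbers left in the remaining bits.
inline uint64_t partition_count(unsigned offset_bits) {
    assert(offset_bits > 0 && offset_bits < 64);
    return uint64_t{1} << (64 - offset_bits);
}
```

With 48 offset bits this gives 256TB partitions and 65536 partition numbers. With 32 offset bits it gives 4GB partitions and 2^32 partition numbers.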
lsn_t no longer cares whether the disk can handle the full range it supports. If you support 48-bit partition sizes and the disk can only handle 32-bit offsets, the largest file will just happen to be smaller than lsn_t technically supports.
lsn_t does not cater to unaligned accesses. Log writes, in particular, are expected to be 8-byte aligned. The extra wasted bytes just aren't worth the performance hit of allowing misalignment.
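A minimal sketch of what 8-byte alignment implies for record placement, assuming each record is padded up to the next multiple of 8 bytes. The helper names are hypothetical and the padding rule is an assumption, not something lsn_t itself enforces.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kLogAlign = 8;  // assumed alignment of log writes

// Round a length up to the next multiple of kLogAlign.
inline uint64_t align_up(uint64_t n) {
    return (n + kLogAlign - 1) & ~(kLogAlign - 1);
}

// True if an offset is a legal starting point for a log record.
inline bool is_aligned(uint64_t rba) {
    return (rba & (kLogAlign - 1)) == 0;
}

// Offset where the next record would start, given an aligned record
// at rba that occupies record_len bytes before padding.
inline uint64_t next_record_rba(uint64_t rba, uint64_t record_len) {
    assert(is_aligned(rba));
    return rba + align_up(record_len);
}
```

For example, a 13-byte record written at offset 64 would waste 3 bytes of padding, and the next record would start at offset 80.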
- Note
- Once the database runs long enough we will run out of partition numbers (only 64K are possible with 16-bit partition numbers). Fortunately, this is a log, so lsn_t values don't last forever. Eventually things become durable and the log partition file gets reclaimed (deleted). As long as the first partition is gone before the last one fills, we can simply wrap and change the sense of lsn_t comparisons.
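The text does not say how the sense of comparisons would change after a wrap. One common way to do it is serial-number-style comparison, sketched below for 16-bit partition numbers. This is a swapped-in technique, not the documented lsn_t behavior. It assumes the live partitions never span 32768 or more consecutive numbers, which is stricter than the condition stated above. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True if partition a logically precedes partition b when partition
// numbers wrap modulo 65536. Assumes live partitions span fewer than
// 32768 consecutive numbers.
inline bool partition_precedes(uint16_t a, uint16_t b) {
    // Forward distance from a to b, reduced modulo 2^16.
    const uint16_t forward = static_cast<uint16_t>(b - a);
    return forward != 0 && forward < 0x8000;
}

// Hypothetical unpacked LSN with a 16-bit partition number.
struct wrapping_lsn {
    uint16_t file;  // partition number, wraps around
    uint64_t rba;   // byte offset within the partition
};

// Order by partition (wrap-aware) first, then by offset.
inline bool lsn_precedes(const wrapping_lsn& x, const wrapping_lsn& y) {
    if (x.file != y.file) {
        return partition_precedes(x.file, y.file);
    }
    return x.rba < y.rba;
}
```

Under this scheme partition 65535 precedes partition 0, so a record written just after the wrap still compares as newer than one written just before it.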