Implementation of a log archiver using asynchronous reader and writer threads.
The log archiver runs as a background daemon whose execution is controlled by an ArchiverControl object. Once a log archiver thread is created and forked, it waits for an activation to start working. The caller thread must invoke the activate() method to perform this activation.
Log archiving works in activation cycles: the archiver first waits for an activation and then consumes the recovery log up to a given LSN value (see activate(bool, lsn_t)). This cycle repeats in an infinite loop until shutdown() is invoked. A shutdown does not interrupt the current cycle; instead, the archiver finishes consuming the log up to the LSN given in the last successful activation and only then exits. The destructor also invokes shutdown() if it has not been called yet.
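The activation protocol described above can be sketched with a mutex and a condition variable. This is a minimal illustration under stated assumptions, not the actual ArchiverControl code: the class name ActivationControl, the method waitForActivation(), the lsn_t alias, and the single-argument activate() are all hypothetical (the real activate(bool, lsn_t) takes an additional flag that is not modeled here), and the rule that an activation only succeeds when it advances the target LSN is likewise an assumption.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

using lsn_t = std::uint64_t;  // assumption: an LSN modeled as a plain integer

// Hypothetical stand-in for the control object shared by caller and archiver.
class ActivationControl {
public:
    // Caller side: ask the archiver to consume the log up to endLSN.
    // Assumption: fails after shutdown or if endLSN does not advance the target.
    bool activate(lsn_t endLSN) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (shutdown_ || endLSN <= endLSN_) return false;
        endLSN_ = endLSN;
        activated_ = true;
        cond_.notify_one();
        return true;
    }

    // Archiver side: block until there is work or a shutdown was requested.
    // Returns true and sets target if a cycle should run, false to exit.
    bool waitForActivation(lsn_t& target) {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return activated_ || shutdown_; });
        if (!activated_) return false;  // shutdown with no pending activation
        activated_ = false;
        target = endLSN_;               // a pending activation is still honored
        return true;
    }

    // Request termination; a pending or running cycle is not interrupted.
    void shutdown() {
        std::lock_guard<std::mutex> guard(mutex_);
        shutdown_ = true;
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    lsn_t endLSN_ = 0;
    bool activated_ = false;
    bool shutdown_ = false;
};
```

In this sketch the archiver thread would loop on waitForActivation(), run one cycle per returned target, and leave the loop on the first false result, which mirrors the rule that shutdown lets the last successful activation complete.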
The class LogArchiver itself serves merely as an orchestrator of its components, which are:
- LogArchiver::LogConsumer, which encapsulates a reader thread and parses individual log records from the recovery log.
- LogArchiver::ArchiverHeap, which performs run generation by sorting the input stream given by the log consumer.
- LogArchiver::BlockAssembly, which consumes the sorted output from the heap, builds indexed blocks of log records (used for instant restore), and passes them to the asynchronous writer thread.
- LogArchiver::ArchiveDirectory, which represents the set of sorted runs that compose the log archive itself. It manages filesystem operations to read from and write to the log archive, controls access to the archive index, and provides scanning facilities used by restore.
One activation cycle consists of consuming all log records from the log consumer, which must first be opened with the given "end LSN". Each log record is inserted into the heap until the heap becomes full. Log records are then removed from the heap (usually in bulk, e.g., one block at a time) and passed to the block assembly component. The cycle finishes once all log records up to the given LSN have been inserted into the heap, which does not necessarily mean that the persistent log archive contains all those log records. The only way to enforce that is to perform a shutdown. This design keeps the heap as full as possible at all times, which generates runs whose size is (i) as large as possible and (ii) independent of the activation behavior.
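The consequence that a finished cycle may leave records buffered in the heap can be illustrated with a small model. This is a sketch under simplifying assumptions, not the LogArchiver implementation: CycleModel and its methods are hypothetical names, log records are reduced to integer keys, vector positions stand in for LSNs, capacity is counted in records rather than bytes, and run boundaries are ignored entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of the activation cycle: a bounded heap that spills
// one block of records to the "archive" only when it is full.
class CycleModel {
public:
    CycleModel(std::size_t heapCapacity, std::size_t blockSize)
        : heapCapacity_(heapCapacity), blockSize_(blockSize) {
        assert(heapCapacity_ > 0 && blockSize_ > 0);
    }

    // One cycle: consume log[consumed() .. endPos) into the heap.
    void runCycle(const std::vector<std::uint32_t>& log, std::size_t endPos) {
        endPos = std::min(endPos, log.size());
        while (next_ < endPos) {
            if (heap_.size() >= heapCapacity_) spill(blockSize_);
            heap_.insert(log[next_++]);
        }
        // The heap is deliberately NOT emptied here.
    }

    // Only a shutdown forces every buffered record into the archive.
    void shutdown() { spill(heap_.size()); }

    std::size_t consumed() const { return next_; }
    std::size_t buffered() const { return heap_.size(); }
    std::size_t archived() const { return archived_.size(); }

private:
    // Move up to n of the smallest keys from the heap to the archive.
    void spill(std::size_t n) {
        n = std::min(n, heap_.size());
        for (std::size_t i = 0; i < n; ++i) {
            archived_.push_back(*heap_.begin());
            heap_.erase(heap_.begin());
        }
    }

    std::size_t heapCapacity_;
    std::size_t blockSize_;
    std::size_t next_ = 0;                 // next log position to consume
    std::multiset<std::uint32_t> heap_;    // ordered container used as a heap
    std::vector<std::uint32_t> archived_;  // stand-in for the persistent archive
};
```

With a capacity of 4 and a block size of 2, a cycle over six records ends with all six consumed but only two archived; the remaining four reach the archive only after shutdown().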
In the typical operation mode, a LogArchiver instance is constructed using the sm_options provided by the user. For tests and external experiments, it can also be constructed by passing instances of the four components listed above.
A note on processing older log partitions (TODO): Before the archiver was implemented, the log manager would delete a partition once it was removed from the list of 8 open partitions. The compiler flag KEEP_LOG_PARTITIONS was used to omit the delete operation, leaving the complete history of the database in the log directory. However, if log archiving is enabled, the archiver should take over the responsibility of deleting old log partitions. Currently, if the flag is not set and the archiver cannot keep up with the growth of the log, partitions may be deleted before they are archived and are then lost from the archive.
See also
- LogArchiver::LogConsumer
- LogArchiver::ArchiverHeap
- LogArchiver::BlockAssembly
- LogArchiver::ArchiveDirectory
Author
- Caetano Sauer
void LogArchiver::replacement() [private]
Replacement part of the replacement-selection algorithm. Fetches log records from the read buffer into the sort workspace and adds a corresponding entry to the heap. When the workspace is full, invokes selection until there is space available for the current log record.
Unlike standard replacement selection, runs are limited to the size of the workspace, in order to maintain a simple non-overlapping mapping between regions of the input file (i.e., the recovery log) and the runs. To achieve that, we change the logic that assigns run numbers to incoming records:
a) Standard RS: if the incoming key is larger than the last record written, assign it to the current run, otherwise to the next run.
b) Log-archiving RS: keep track of the run number currently being written, always assigning incoming records to a greater run. Once all records from the current run are removed from the heap, increment the counter. To start, initial input records are assigned to run 1 until the workspace is full, after which incoming records are assigned to run 2.
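The run-assignment rule of variant (b) can be sketched as follows. This is an illustrative model only, not the code of LogArchiver::replacement(): the names Rec, HeapEntry, and RunGenerator are hypothetical, the workspace capacity is counted in records rather than bytes, and the sort key (page ID, then LSN) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a log record: a sort key and its LSN.
struct Rec {
    std::uint32_t pageId;
    std::uint64_t lsn;
};

struct HeapEntry {
    std::uint32_t run;  // run number assigned on insertion
    Rec rec;
};

// Orders by (run, pageId, lsn); used with "greater" to get a min-heap.
struct EntryGreater {
    bool operator()(const HeapEntry& a, const HeapEntry& b) const {
        if (a.run != b.run) return a.run > b.run;
        if (a.rec.pageId != b.rec.pageId) return a.rec.pageId > b.rec.pageId;
        return a.rec.lsn > b.rec.lsn;
    }
};

class RunGenerator {
public:
    explicit RunGenerator(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Replacement: make room by selection if needed, then insert the record
    // into a run strictly greater than the one currently being written.
    void insert(const Rec& r) {
        while (heap_.size() >= capacity_) selectOne();
        const std::uint32_t run = writingRun_ + 1;
        heap_.push(HeapEntry{run, r});
    }

    // Drain everything that is still buffered (e.g., at shutdown).
    void flush() {
        while (!heap_.empty()) selectOne();
    }

    // Emitted (run, record) pairs in output order.
    const std::vector<std::pair<std::uint32_t, Rec>>& output() const {
        return out_;
    }

private:
    // Selection: emit the smallest entry; if it belongs to a later run,
    // the previous run is exhausted, so the run counter advances.
    void selectOne() {
        HeapEntry e = heap_.top();
        heap_.pop();
        if (e.run > writingRun_) writingRun_ = e.run;
        out_.emplace_back(e.run, e.rec);
    }

    std::size_t capacity_;
    std::uint32_t writingRun_ = 0;  // 0: nothing written yet, inserts go to run 1
    std::priority_queue<HeapEntry, std::vector<HeapEntry>, EntryGreater> heap_;
    std::vector<std::pair<std::uint32_t, Rec>> out_;
};
```

Because an incoming record never joins the run being written, each run in this sketch covers one contiguous stretch of the input of at most capacity records, which is the non-overlapping mapping between log regions and runs that the text describes. With a capacity of 3 and input page IDs 5, 1, 3, 2, 9, 4, 7, the output is run 1 = (1, 3, 5), run 2 = (2, 4, 9), run 3 = (7).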