Architecture · 22 min read

Write-Ahead Logging (WAL): The Foundation of Database Durability

Master Write-Ahead Logging from fundamentals to implementation. Learn how databases guarantee durability through WAL, understand crash recovery, checkpoint strategies, and see how modern databases like NexaDB implement WAL for reliability.

Krish
December 6, 2025

What is Write-Ahead Logging (WAL)?

Write-Ahead Logging (WAL) is a technique for ensuring data durability in databases. The principle is simple but powerful: before modifying any data, first write the intended changes to a persistent log.

The Golden Rule of WAL:
┌─────────────────────────────────────────────────────────────┐
│  "Never modify the data page before logging the change."   │
└─────────────────────────────────────────────────────────────┘

This guarantees that if the system crashes at any point, the database can recover to a consistent state by replaying the log.

Why WAL Matters

Consider what happens without WAL:

Without WAL - Disaster Scenario:

1. User: UPDATE balance = balance - 100 WHERE user = 'Alice'
2. User: UPDATE balance = balance + 100 WHERE user = 'Bob'
3. Database writes Alice's new balance to disk ✓
4. 💥 CRASH before writing Bob's balance
5. On restart: Alice lost $100, Bob gained nothing!
   → Money disappeared from the system

With WAL:

With WAL - Safe Recovery:

1. User: UPDATE balance = balance - 100 WHERE user = 'Alice'
2. User: UPDATE balance = balance + 100 WHERE user = 'Bob'
3. Database writes BOTH changes to WAL ✓
4. Database writes Alice's balance to disk ✓
5. 💥 CRASH before writing Bob's balance
6. On restart: Replay WAL → Apply Bob's update
   → Transaction fully committed, system consistent!

The ACID Connection

WAL is the implementation mechanism behind the D in ACID (Durability):

ACID Property    What It Means                   How WAL Helps
─────────────    ─────────────────────────────   ───────────────────────────────
Atomicity        All or nothing                  Undo log enables rollback
Consistency      Valid state only                Recovery replays committed txns
Isolation        Transactions don't interfere    MVCC uses WAL records
Durability       Committed = permanent           WAL persists before data

How WAL Works: Deep Dive

The Write Path

┌──────────────────────────────────────────────────────────────┐
│                     WRITE OPERATION                          │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ 1. Generate WAL Record                                       │
│    - Transaction ID                                          │
│    - Operation type (INSERT, UPDATE, DELETE)                 │
│    - Before image (for UNDO)                                 │
│    - After image (for REDO)                                  │
│    - Timestamp / LSN (Log Sequence Number)                   │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ 2. Append to WAL Buffer (in memory)                          │
│    - Fast sequential write                                   │
│    - Batched for efficiency                                  │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ 3. Flush WAL to Disk (fsync)                                 │
│    - On commit                                               │
│    - Or when buffer is full                                  │
│    - Or at configured intervals                              │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ 4. Apply Changes to Data Pages (in buffer pool)             │
│    - May be deferred (lazy)                                  │
│    - Pages marked "dirty"                                    │
└──────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌──────────────────────────────────────────────────────────────┐
│ 5. Background: Flush dirty pages to data files               │
│    - Checkpoint process                                      │
│    - Order doesn't matter (WAL protects us)                  │
└──────────────────────────────────────────────────────────────┘
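
The ordering is the whole point: the log record reaches disk before the data page changes. Here is a minimal sketch of that commit path, assuming hypothetical wal and buffer_pool objects and field names that mirror the record structure described next:

import time

def execute_update(wal, buffer_pool, txn_id, page_id, offset, old, new):
    """Hypothetical write path: log first, then touch the page."""
    # Steps 1-3: build the WAL record and make it durable before any page change
    lsn = wal.append_record(
        transaction_id=txn_id, operation='UPDATE',
        page_id=page_id, offset=offset,
        before_image=old, after_image=new,
        timestamp=int(time.time()),
    )
    wal.fsync()  # the change now survives a crash

    # Step 4: only now modify the in-memory copy of the page
    page = buffer_pool.get_page(page_id)
    page.data[offset:offset + len(new)] = new   # assumes page.data is a bytearray
    page.lsn = lsn        # PageLSN: last WAL record applied to this page
    page.dirty = True     # Step 5: a later checkpoint flushes dirty pages
    return lsn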

WAL Record Structure

A typical WAL record contains:

from dataclasses import dataclass

@dataclass
class WALRecord:
    lsn: int              # Log Sequence Number (monotonically increasing)
    transaction_id: int   # Which transaction made this change
    operation: str        # INSERT, UPDATE, DELETE, COMMIT, ABORT
    table_id: int         # Which table was modified
    page_id: int          # Which page was modified
    offset: int           # Offset within the page
    before_image: bytes   # Old data (for UNDO)
    after_image: bytes    # New data (for REDO)
    timestamp: int        # When this happened
    checksum: int         # Data integrity verification

Log Sequence Numbers (LSN)

The LSN is the backbone of WAL ordering:

import threading

class LSNGenerator:
    """
    Generates monotonically increasing Log Sequence Numbers.
    LSN = unique identifier for each WAL record.
    """
    def __init__(self):
        self._current = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        with self._lock:
            self._current += 1
            return self._current

Each data page tracks its PageLSN - the LSN of the most recent WAL record that modified it:

Page Header:
┌───────────────────────────────────────┐
│ PageLSN: 1000                         │  ← Last WAL record that touched this page
│ Checksum: 0xABCD1234                  │
│ Free Space: 2048 bytes                │
├───────────────────────────────────────┤
│ Data...                               │
└───────────────────────────────────────┘

WAL: [LSN 998] [LSN 999] [LSN 1000] [LSN 1001] ...
                              ↑
                    This record modified the page
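
During recovery the PageLSN is what makes redo idempotent: a WAL record is reapplied only if the page has not already absorbed it. A small sketch of that comparison, with the page fields and the apply_change helper assumed for illustration:

def maybe_redo(page, record, apply_change):
    """Reapply a WAL record only if the page predates it."""
    if page.lsn < record.lsn:        # page has not seen this change yet
        apply_change(page, record)   # caller-provided: writes the after-image
        page.lsn = record.lsn        # advance the PageLSN
        return True
    return False                     # change was already on the page before the crash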

Crash Recovery: ARIES Algorithm

Most databases use the ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) recovery protocol:

The Three Phases

┌─────────────────────────────────────────────────────────────┐
│                   ARIES RECOVERY                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 1: ANALYSIS                                          │
│  ─────────────────                                          │
│  • Scan WAL from last checkpoint                            │
│  • Build transaction table (active txns at crash)           │
│  • Build dirty page table (pages not flushed)               │
│  • Determine REDO start point                               │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 2: REDO                                              │
│  ─────────────                                              │
│  • Replay WAL from REDO start point                         │
│  • Reapply ALL changes (even if page on disk)               │
│  • Use LSN comparison to skip already-applied changes       │
│  • Brings database to exact crash state                     │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Phase 3: UNDO                                              │
│  ──────────                                                 │
│  • Roll back uncommitted transactions                       │
│  • Walk backward through WAL                                │
│  • Apply before-images to undo changes                      │
│  • Write compensation log records (CLRs)                    │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Implementation

class ARIESRecovery:
    def recover(self):
        # Phase 1: Analysis
        active_txns, dirty_pages, redo_lsn = self.analysis_phase()

        # Phase 2: Redo
        self.redo_phase(redo_lsn)

        # Phase 3: Undo
        self.undo_phase(active_txns)

        print("Recovery complete. Database is consistent.")

    def analysis_phase(self):
        """Scan WAL to understand crash state."""
        active_txns = {}    # txn_id -> last_lsn
        dirty_pages = {}    # page_id -> rec_lsn (first LSN that dirtied it)

        # Start from last checkpoint
        checkpoint_lsn = self.read_checkpoint()

        for record in self.wal.scan_from(checkpoint_lsn):
            if record.operation == 'BEGIN':
                active_txns[record.transaction_id] = record.lsn
            elif record.operation in ('COMMIT', 'ABORT'):
                # Txn finished; it may have started before the checkpoint
                active_txns.pop(record.transaction_id, None)
            else:
                # Data modification
                active_txns[record.transaction_id] = record.lsn
                if record.page_id not in dirty_pages:
                    dirty_pages[record.page_id] = record.lsn

        # REDO starts from min of all rec_lsn values
        redo_lsn = min(dirty_pages.values()) if dirty_pages else checkpoint_lsn

        return active_txns, dirty_pages, redo_lsn

    def redo_phase(self, start_lsn):
        """Replay WAL to restore crash state."""
        for record in self.wal.scan_from(start_lsn):
            if record.operation in ('INSERT', 'UPDATE', 'DELETE'):
                page = self.buffer_pool.get_page(record.page_id)

                # Only redo if page hasn't seen this change
                if page.lsn < record.lsn:
                    self.apply_redo(page, record)
                    page.lsn = record.lsn

    def undo_phase(self, active_txns):
        """Roll back uncommitted transactions."""
        # Undo newest-first: keep the LSNs in descending order
        to_undo = sorted(active_txns.values(), reverse=True)

        while to_undo:
            lsn = to_undo.pop(0)
            record = self.wal.read(lsn)

            if record.operation in ('INSERT', 'UPDATE', 'DELETE'):
                # Apply the undo
                page = self.buffer_pool.get_page(record.page_id)
                self.apply_undo(page, record)

                # Write a Compensation Log Record so a crash during
                # recovery does not undo the same change twice
                self.write_clr(record)

            # Follow this transaction's undo chain
            if record.prev_lsn:
                to_undo.append(record.prev_lsn)
                to_undo.sort(reverse=True)  # stay in descending LSN order

Checkpointing: Limiting Recovery Time

Without checkpoints, recovery would replay the entire WAL from the beginning. Checkpoints limit how far back we need to go.

Types of Checkpoints

1. Consistent Checkpoint (Quiesce)

Steps:
1. Stop accepting new transactions
2. Wait for all active transactions to complete
3. Flush all dirty pages to disk
4. Write checkpoint record to WAL
5. Resume operations

Pros: Simple recovery
Cons: Database unavailable during checkpoint (can be minutes!)

2. Fuzzy Checkpoint (Non-blocking)

Steps:
1. Write BEGIN_CHECKPOINT to WAL
2. Record active transactions and dirty pages
3. Continue normal operations (don't block!)
4. Asynchronously flush dirty pages
5. Write END_CHECKPOINT to WAL

Pros: No blocking
Cons: More complex recovery

NexaDB's Checkpoint Strategy

import time

class CheckpointManager:
    def __init__(self, wal, buffer_pool, config):
        self.wal = wal
        self.buffer_pool = buffer_pool
        self.checkpoint_interval = config.get('checkpoint_interval', 300)  # 5 min
        self.max_wal_size = config.get('max_wal_size', 1024 * 1024 * 1024)  # 1GB
        self.last_checkpoint = time.time()  # referenced by should_checkpoint()

    def should_checkpoint(self):
        """Determine if checkpoint is needed."""
        # Time-based
        if time.time() - self.last_checkpoint > self.checkpoint_interval:
            return True

        # WAL size-based (prevent unbounded growth)
        if self.wal.size() > self.max_wal_size:
            return True

        return False

    def fuzzy_checkpoint(self):
        """Perform non-blocking fuzzy checkpoint."""
        # Record start
        begin_lsn = self.wal.append(CheckpointBegin(
            active_txns=self.get_active_transactions(),
            dirty_pages=self.buffer_pool.get_dirty_pages()
        ))

        # Flush dirty pages in background
        for page_id in self.buffer_pool.get_dirty_pages():
            self.buffer_pool.flush_page(page_id)
            yield  # Allow other operations to proceed

        # Record end
        self.wal.append(CheckpointEnd(begin_lsn=begin_lsn))

        # Update master checkpoint record and the checkpoint timestamp
        self.update_checkpoint_pointer(begin_lsn)
        self.last_checkpoint = time.time()

        # Truncate old WAL segments
        self.wal.truncate_before(begin_lsn)
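
Since fuzzy_checkpoint is written as a generator (each yield pauses between page flushes), something has to drive it. One way a background thread might do that, sketched with an assumed checkpoint_loop helper that is not part of the class above:

import time
import threading

def checkpoint_loop(manager, stop_event):
    """Drive the generator-style fuzzy checkpoint from a background thread."""
    while not stop_event.is_set():
        if manager.should_checkpoint():
            for _ in manager.fuzzy_checkpoint():
                time.sleep(0)        # each step hands the CPU back to foreground work
        stop_event.wait(1.0)         # re-check roughly once per second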

WAL Durability Levels

Different applications need different durability guarantees:

Level 1: Full Durability (fsync on commit)

def commit_full_durability(self, txn_id):
    """Guaranteed durability - data survives any crash."""
    record = CommitRecord(txn_id=txn_id, timestamp=time.time())
    self.wal.append(record)
    self.wal.fsync()  # Force to disk
    return True

# Latency: ~1-10ms per commit (disk dependent)
# Guarantees: Committed transaction NEVER lost
# Use for: Financial transactions, critical data

Level 2: Group Commit (batched fsync)

import threading

class GroupCommit:
    """Batch multiple commits into a single fsync."""

    def __init__(self, wal, max_delay_ms=10, max_batch_size=100):
        self.wal = wal
        self.pending = []
        self.max_delay = max_delay_ms / 1000
        self.max_batch = max_batch_size
        self.lock = threading.Lock()
        self.commit_event = threading.Event()

    def commit(self, txn_id):
        with self.lock:
            record = CommitRecord(txn_id=txn_id)
            self.wal.append(record)
            self.pending.append(txn_id)
            event = self.commit_event  # the event for this batch

            if len(self.pending) >= self.max_batch:
                self._flush()
            else:
                # Flush after the batching window even if the batch stays small
                threading.Timer(self.max_delay, self._maybe_flush).start()

        event.wait()
        return True

    def _maybe_flush(self):
        """Timer callback: flush whatever has accumulated."""
        with self.lock:
            self._flush()

    def _flush(self):
        """Single fsync for the whole batch. Caller must hold self.lock."""
        if self.pending:
            self.wal.fsync()
            self.pending.clear()
            self.commit_event.set()                # wake every commit in this batch
            self.commit_event = threading.Event()  # start a fresh batch

# Latency: ~1-10ms amortized across batch
# Throughput: 10-100x higher than individual fsync
# Use for: High-throughput applications with slight latency tolerance

Level 3: Async Durability (periodic fsync)

def commit_async(self, txn_id):
    """Fast commit, eventual durability."""
    record = CommitRecord(txn_id=txn_id, timestamp=time.time())
    self.wal.append(record)  # In-memory only
    return True

# Background thread:
def background_sync(self):
    while True:
        time.sleep(0.01)  # 10ms
        self.wal.fsync()

# Latency: <0.1ms per commit
# Risk: Up to 10ms of data loss on crash
# Use for: Caching layers, non-critical data, metrics

NexaDB Configuration

# nexadb.yaml
wal:
  sync_mode: "group"      # full | group | async
  group_commit_delay: 10  # ms
  group_commit_size: 100  # max transactions per batch
  checkpoint_interval: 300  # seconds
  max_wal_size: 1GB
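
Note that YAML parses 1GB as the plain string "1GB", so whatever loads this file has to normalize sizes itself. A hedged sketch of such a loader (PyYAML assumed; this is not NexaDB's actual configuration code):

import yaml  # pip install pyyaml

_UNITS = {'KB': 1024, 'MB': 1024**2, 'GB': 1024**3}

def parse_size(value):
    """Turn '1GB' / '256MB' / 1048576 into a byte count."""
    if isinstance(value, int):
        return value
    number, unit = value[:-2], value[-2:].upper()
    return int(float(number) * _UNITS[unit])

def load_wal_config(path='nexadb.yaml'):
    with open(path) as f:
        cfg = yaml.safe_load(f)['wal']
    return {
        'sync_mode': cfg.get('sync_mode', 'group'),
        'group_commit_delay': cfg.get('group_commit_delay', 10) / 1000,  # ms -> s
        'group_commit_size': cfg.get('group_commit_size', 100),
        'checkpoint_interval': cfg.get('checkpoint_interval', 300),
        'max_wal_size': parse_size(cfg.get('max_wal_size', '1GB')),
    }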

WAL in Different Databases

PostgreSQL: Full-Featured WAL

PostgreSQL WAL Features:
├── Full ARIES recovery
├── Streaming replication (replicas read WAL)
├── Point-in-Time Recovery (PITR)
├── WAL archiving
├── Logical decoding (change data capture)
└── Configurable sync levels

MySQL/InnoDB: Redo Log + Doublewrite

InnoDB adds "doublewrite buffer" for torn page protection:

1. Write pages to doublewrite buffer (sequential)
2. fsync doublewrite buffer
3. Write pages to actual locations
4. On recovery: if page is torn, restore from doublewrite
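
The recovery-side logic behind step 4 is conceptually simple: if a data page fails its checksum, fall back to the copy in the doublewrite area, which was fsynced first. A simplified, generic sketch of that idea, not InnoDB's actual code:

def recover_page(page_id, read_data_page, read_doublewrite_copy, checksum_ok):
    """Torn-page recovery in a doublewrite scheme (conceptual)."""
    page = read_data_page(page_id)
    if checksum_ok(page):
        return page                          # normal case: page was written fully
    # The data file copy is torn; the doublewrite copy was synced first,
    # so at least one intact copy of the page exists.
    copy = read_doublewrite_copy(page_id)
    if copy is not None and checksum_ok(copy):
        return copy
    raise IOError(f"page {page_id}: no intact copy found")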

SQLite: WAL Mode

-- Enable WAL mode
PRAGMA journal_mode=WAL;

-- SQLite WAL benefits:
-- • Readers don't block writers
-- • Writers don't block readers
-- • Faster for many workloads
-- • Automatic checkpointing
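
The same switch from Python's built-in sqlite3 module, which is a quick way to watch the -wal and -shm files appear next to the database:

import sqlite3

conn = sqlite3.connect('app.db')
mode = conn.execute('PRAGMA journal_mode=WAL;').fetchone()[0]
print(mode)  # prints 'wal' once the mode switch succeeds

# A manual checkpoint moves WAL contents back into the main database file
conn.execute('PRAGMA wal_checkpoint(TRUNCATE);')
conn.close()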

LSM-Tree Databases (LevelDB, RocksDB, NexaDB)

In LSM-trees, WAL protects the MemTable:

Write → WAL → MemTable → (flush) → SSTable

If crash before flush:
  → Replay WAL to rebuild MemTable
  → Continue from there

NexaDB optimization: Parallel WAL writing during MemTable flush
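
Rebuilding the MemTable after a crash is just a replay of the log into an in-memory map. A minimal sketch using the scan_from method and operation codes of the WriteAheadLog implementation shown later in this article (1 = INSERT/PUT, 3 = DELETE); a real engine would also respect commit markers:

def rebuild_memtable(wal, start_lsn=0):
    """Replay the WAL into a fresh in-memory table after a crash."""
    memtable = {}
    for record in wal.scan_from(start_lsn):
        if record.operation == 1:            # PUT / INSERT
            memtable[record.key] = record.value
        elif record.operation == 3:          # DELETE
            memtable.pop(record.key, None)
    return memtable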

Implementation: Building a WAL

Here's a compact but complete WAL implementation that shows the key pieces: segmented append-only files, checksummed records, and crash-tolerant scanning:

import struct
import os
import zlib
from dataclasses import dataclass
from typing import Iterator

@dataclass
class WALRecord:
    lsn: int
    transaction_id: int
    operation: int  # 1=INSERT, 2=UPDATE, 3=DELETE, 4=COMMIT, 5=ABORT
    key: bytes
    value: bytes
    checksum: int

    HEADER_SIZE = 28  # lsn(8) + txn_id(8) + op(4) + key_len(4) + val_len(4); a 4-byte checksum trails the record

    def serialize(self) -> bytes:
        key_len = len(self.key)
        val_len = len(self.value)

        header = struct.pack(
            '<QQIII',
            self.lsn,
            self.transaction_id,
            self.operation,
            key_len,
            val_len
        )

        data = header + self.key + self.value
        checksum = self._calculate_checksum(data)

        return data + struct.pack('<I', checksum)

    @classmethod
    def deserialize(cls, data: bytes) -> 'WALRecord':
        lsn, txn_id, op, key_len, val_len = struct.unpack('<QQIII', data[:28])

        offset = 28
        key = data[offset:offset + key_len]
        offset += key_len
        value = data[offset:offset + val_len]
        offset += val_len
        checksum = struct.unpack('<I', data[offset:offset + 4])[0]

        # Verify checksum
        expected = cls._calculate_checksum(data[:offset])
        if checksum != expected:
            raise ValueError(f"WAL record corrupted: checksum mismatch")

        return cls(lsn, txn_id, op, key, value, checksum)

    @staticmethod
    def _calculate_checksum(data: bytes) -> int:
        # CRC32 is cheap and matches the "CRC checksums" feature described below
        return zlib.crc32(data) & 0xFFFFFFFF


class WriteAheadLog:
    """
    A durable, append-only log for crash recovery.

    Features:
    - Append-only writes (no seeking, fast sequential I/O)
    - CRC checksums for corruption detection
    - Segment-based storage for easier management
    - Configurable sync modes
    """

    SEGMENT_SIZE = 64 * 1024 * 1024  # 64MB segments

    def __init__(self, directory: str, sync_mode: str = 'group'):
        self.directory = directory
        self.sync_mode = sync_mode
        self.current_segment = None
        self.current_lsn = 0
        self.segments = []

        os.makedirs(directory, exist_ok=True)
        self._recover_segments()
        self._open_current_segment()

    def _recover_segments(self):
        """Find and recover existing WAL segments."""
        for filename in sorted(os.listdir(self.directory)):
            if filename.startswith('wal_') and filename.endswith('.log'):
                segment_id = int(filename[4:-4])
                self.segments.append(segment_id)

                # Find highest LSN
                for record in self._scan_segment(segment_id):
                    self.current_lsn = max(self.current_lsn, record.lsn)

    def _open_current_segment(self):
        """Open or create the current segment file."""
        if not self.segments:
            self.segments.append(0)

        segment_id = self.segments[-1]
        path = os.path.join(self.directory, f'wal_{segment_id:010d}.log')

        self.current_segment = open(path, 'ab+')

        # Check if we need a new segment
        if self.current_segment.tell() >= self.SEGMENT_SIZE:
            self._rotate_segment()

    def _rotate_segment(self):
        """Create a new segment file."""
        if self.current_segment:
            self.current_segment.close()

        new_segment_id = self.segments[-1] + 1
        self.segments.append(new_segment_id)

        path = os.path.join(self.directory, f'wal_{new_segment_id:010d}.log')
        self.current_segment = open(path, 'ab+')

    def append(self, transaction_id: int, operation: int,
               key: bytes, value: bytes = b'') -> int:
        """
        Append a record to the WAL.

        Returns the LSN of the new record.
        """
        self.current_lsn += 1
        lsn = self.current_lsn

        record = WALRecord(
            lsn=lsn,
            transaction_id=transaction_id,
            operation=operation,
            key=key,
            value=value,
            checksum=0
        )

        data = record.serialize()

        # Write length prefix + data
        self.current_segment.write(struct.pack('<I', len(data)))
        self.current_segment.write(data)

        # Sync based on mode
        if self.sync_mode == 'full':
            self.fsync()

        # Rotate if needed
        if self.current_segment.tell() >= self.SEGMENT_SIZE:
            self._rotate_segment()

        return lsn

    def fsync(self):
        """Force WAL to disk."""
        self.current_segment.flush()
        os.fsync(self.current_segment.fileno())

    def scan_from(self, start_lsn: int) -> Iterator[WALRecord]:
        """Scan WAL records starting from a given LSN."""
        for segment_id in self.segments:
            for record in self._scan_segment(segment_id):
                if record.lsn >= start_lsn:
                    yield record

    def _scan_segment(self, segment_id: int) -> Iterator[WALRecord]:
        """Scan all records in a segment."""
        path = os.path.join(self.directory, f'wal_{segment_id:010d}.log')

        if not os.path.exists(path):
            return

        with open(path, 'rb') as f:
            while True:
                length_data = f.read(4)
                if len(length_data) < 4:
                    break

                length = struct.unpack('<I', length_data)[0]
                record_data = f.read(length)

                if len(record_data) < length:
                    break  # Incomplete record (crash during write)

                try:
                    yield WALRecord.deserialize(record_data)
                except ValueError:
                    break  # Corrupted record

    def truncate_before(self, lsn: int):
        """Remove WAL segments that are fully before the given LSN."""
        segments_to_remove = []

        for segment_id in self.segments[:-1]:  # Never remove current
            max_lsn = 0
            for record in self._scan_segment(segment_id):
                max_lsn = max(max_lsn, record.lsn)

            if max_lsn < lsn:
                segments_to_remove.append(segment_id)

        for segment_id in segments_to_remove:
            path = os.path.join(self.directory, f'wal_{segment_id:010d}.log')
            os.remove(path)
            self.segments.remove(segment_id)

    def close(self):
        """Close the WAL."""
        if self.current_segment:
            self.fsync()
            self.current_segment.close()

Performance Considerations

1. Sequential vs Random I/O

WAL writes are sequential:
┌─────────────────────────────────────────────────┐
│ HDD: Sequential Write = 100-200 MB/s            │
│ HDD: Random Write = 0.5-2 MB/s                  │
│ Improvement: 50-400x faster!                    │
├─────────────────────────────────────────────────┤
│ SSD: Sequential Write = 500-3000 MB/s           │
│ SSD: Random Write = 50-500 MB/s                 │
│ Improvement: 5-10x faster                       │
└─────────────────────────────────────────────────┘

2. Fsync Costs

# Benchmark: fsync latency by storage type
results = {
    'RAM Disk': '0.01ms',
    'NVMe SSD': '0.05-0.1ms',
    'SATA SSD': '0.1-0.5ms',
    'HDD': '2-10ms',
    'Network Storage': '1-100ms'
}

# Takeaway: Group commit is essential for HDDs
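
These figures are easy to sanity-check on your own hardware; a rough micro-benchmark of fsync latency for whatever disk backs a given directory (results vary widely by filesystem and device):

import os, time, tempfile

def fsync_latency_ms(directory='.', iterations=100):
    """Average time to write 4 KB and fsync it, in milliseconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, b'\0' * 4096)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
        return elapsed / iterations * 1000
    finally:
        os.close(fd)
        os.remove(path)

print(f"avg fsync: {fsync_latency_ms():.2f} ms")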

3. WAL Size Management

Problem: WAL grows forever without checkpoints

Solution: Regular checkpointing + WAL truncation

NexaDB approach:
├── Checkpoint every 5 minutes
├── Checkpoint when WAL > 1GB
├── Keep minimum 2 segments for safety
└── Async page flushing during checkpoint

Common Issues and Solutions

Issue 1: WAL Growing Too Large

# Increase checkpoint frequency
wal:
  checkpoint_interval: 60  # Every minute
  max_wal_size: 256MB      # Trigger checkpoint at 256MB

Issue 2: Slow Commits (fsync bottleneck)

# Enable group commit
wal:
  sync_mode: group
  group_commit_delay: 5    # 5ms batching window
  group_commit_size: 200   # Max 200 txns per batch

Issue 3: Recovery Taking Too Long

# More frequent checkpoints = faster recovery
wal:
  checkpoint_interval: 60   # Checkpoint every minute
  # Recovery will replay max 1 minute of WAL

Issue 4: Torn Page Writes

# On power failure, a page write might be partial
# Solution: Full page images in WAL after checkpoint

def write_with_fpi(self, page_id, changes):
    page = self.buffer.get(page_id)

    # If first modification since checkpoint, log full page
    if page.lsn <= self.last_checkpoint_lsn:
        self.wal.append_full_page_image(page_id, page.data)

    # Apply changes and log them
    for change in changes:
        self.apply_change(page, change)
        self.wal.append(change)

NexaDB's WAL Implementation

NexaDB uses a high-performance WAL optimized for LSM-Tree storage:

import time
from concurrent.futures import ThreadPoolExecutor

class NexaDBWAL:
    """
    NexaDB's production WAL implementation.

    Features:
    - Lock-free concurrent appends
    - Automatic group commit
    - Parallel recovery
    - Compression (optional)
    """

    def __init__(self, config):
        self.directory = config['wal_directory']
        self.sync_mode = config.get('sync_mode', 'group')
        self.segment_size = config.get('segment_size', 64 * 1024 * 1024)

        # Group commit settings
        self.group_delay = config.get('group_commit_delay', 0.01)
        self.group_size = config.get('group_commit_size', 100)

        # Performance stats
        self.stats = {
            'writes': 0,
            'syncs': 0,
            'bytes_written': 0
        }

    def write(self, operation):
        """
        Append operation to WAL.

        Thread-safe, uses lock-free techniques where possible.
        """
        lsn = self._allocate_lsn()
        record = self._serialize(lsn, operation)

        self._append_to_buffer(record)
        self.stats['writes'] += 1
        self.stats['bytes_written'] += len(record)

        if self.sync_mode == 'full':
            self._sync_now()

        return lsn

    def recover(self):
        """
        Replay WAL after crash.

        Uses parallel recovery when multiple segments exist.
        """
        print("Starting WAL recovery...")
        start = time.time()

        # Parallel scan of segments
        with ThreadPoolExecutor() as executor:
            records = list(executor.map(self._scan_segment, self.segments))

        # Merge and sort by LSN
        all_records = sorted(
            [r for segment in records for r in segment],
            key=lambda r: r.lsn
        )

        # Replay
        for record in all_records:
            self._apply_record(record)

        elapsed = time.time() - start
        print(f"WAL recovery complete: {len(all_records)} records in {elapsed:.2f}s")

# Usage in NexaDB:
# Every write goes through WAL first
def put(self, key, value):
    # 1. Log to WAL
    lsn = self.wal.write(PutOperation(key, value))

    # 2. Apply to MemTable
    self.memtable.put(key, value, lsn)

    # 3. Return success (data is durable in WAL)
    return True

Conclusion

Write-Ahead Logging is the unsung hero of database reliability. By understanding WAL, you gain insight into:

  • Why commits feel instant (memory write + sequential log)
  • How databases survive crashes (ARIES recovery)
  • Why checkpoints matter (they bound recovery time)
  • Performance tradeoffs (durability vs latency)

Key takeaways:

  1. WAL before data: Never modify data pages before logging
  2. LSN ordering: Every change has a unique sequence number
  3. Checkpoints: Limit how much WAL needs replaying
  4. Group commit: Essential for high throughput
  5. Recovery: Analysis → Redo → Undo

NexaDB implements a modern WAL with group commit, parallel recovery, and tight integration with its LSM-Tree storage engine.


Learn more about NexaDB's internals: LSM-Tree Storage Engine | Getting Started

Questions about WAL or database durability? Join our Discord community or explore NexaDB's source code.
