
Data Synchronization Process

This document details the logic behind the data synchronization mechanism used by the txlog build command.

The "Build" Concept

Unlike real-time monitoring agents that hook into system events, Txlog uses a "build" or "sync" model. This means it processes historical data in batches. This approach was chosen for:

  1. Reliability: It captures transactions that occurred while the agent wasn't running.
  2. Simplicity: It avoids complex kernel hooks or background daemons.

Synchronization Logic

The synchronization process follows these steps:

1. Host Identification

The agent identifies the machine using:

  • Machine ID: Derived from /etc/machine-id. This is the primary unique key.
  • Hostname: Used as a human-readable label.
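A minimal sketch of this step in Go (the function name is illustrative, not Txlog's actual API):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hostIdentity gathers the two identifiers described above:
// the machine ID from /etc/machine-id (primary key) and the
// hostname (human-readable label).
func hostIdentity() (machineID, hostname string, err error) {
	raw, err := os.ReadFile("/etc/machine-id")
	if err != nil {
		return "", "", err
	}
	machineID = strings.TrimSpace(string(raw))

	hostname, err = os.Hostname()
	return machineID, hostname, err
}

func main() {
	id, host, err := hostIdentity()
	if err != nil {
		fmt.Println("could not identify host:", err)
		return
	}
	fmt.Printf("machine-id=%s hostname=%s\n", id, host)
}
```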

2. State Reconciliation

Before sending any data, the agent performs a "diff" operation:

```go
// Simplified reconciliation logic (illustrative names, Go syntax)
localTransactions := getDNFHistory()
serverTransactions := api.GetSavedIDs(machineID)

// IDs present locally but not yet stored on the server
missingTransactions := difference(localTransactions, serverTransactions)
```

This ensures that:

  • We don't re-send data the server already holds.
  • We catch up on data if the agent hasn't run for a long time.
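The diff above amounts to a set difference. A self-contained sketch, modeling transaction IDs as integers (dnf history numbers its transactions with integer IDs):

```go
package main

import "fmt"

// difference returns the IDs present in local that the server
// does not have yet. A map gives O(1) membership checks.
func difference(local, server []int) []int {
	seen := make(map[int]bool, len(server))
	for _, id := range server {
		seen[id] = true
	}
	var missing []int
	for _, id := range local {
		if !seen[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	local := []int{1, 2, 3, 4, 5} // from local dnf history
	server := []int{1, 2, 3}      // already stored server-side
	fmt.Println(difference(local, server)) // prints [4 5]
}
```

Because the diff is computed from IDs alone, a long-offline agent simply produces a larger `missing` slice on its next run; no state is kept between runs.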

3. Atomic Transaction Uploads

Each missing transaction is uploaded individually. A transaction includes:

  • Metadata (ID, User, Time, Command Line).
  • A list of all altered packages (Install, Update, Remove).
  • Scriptlet output (logs generated by RPM scripts).

This atomicity ensures that a transaction record on the server is always complete. We never store partial transactions.