
Data Synchronization Process

This document details the logic behind the data synchronization mechanism used by the txlog build command.

The "Build" Concept

Unlike real-time monitoring agents that hook into system events, Txlog uses a "build" or "sync" model. This means it processes historical data in batches. This approach was chosen for:

  1. Reliability: It captures transactions that occurred while the agent wasn't running.
  2. Simplicity: It avoids complex kernel hooks or background daemons.

Synchronization Logic

The synchronization process follows these steps:

1. Host Identification

The agent identifies the machine using:

  • Machine ID: Derived from /etc/machine-id. This is the primary unique key.
  • Hostname: Used as a human-readable label.
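A minimal sketch of this step in Go (the function name is illustrative, not Txlog's actual API):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hostIdentity gathers the two identifiers described above:
// the machine ID from /etc/machine-id (primary key) and the
// hostname (human-readable label).
func hostIdentity() (machineID, hostname string, err error) {
	raw, err := os.ReadFile("/etc/machine-id")
	if err != nil {
		return "", "", err
	}
	machineID = strings.TrimSpace(string(raw))

	hostname, err = os.Hostname()
	return machineID, hostname, err
}

func main() {
	id, host, err := hostIdentity()
	if err != nil {
		fmt.Println("could not identify host:", err)
		return
	}
	fmt.Printf("machine-id=%s hostname=%s\n", id, host)
}
```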

2. State Reconciliation

Before sending any data, the agent performs a "diff" operation:

```go
// Simplified reconciliation logic (illustrative names, Go syntax)
localTransactions := getDNFHistory()
serverTransactions := api.GetSavedIDs(machineID)

// IDs present locally but not yet stored on the server
missingTransactions := difference(localTransactions, serverTransactions)
```

This ensures that:

  • We don't re-send data the server already holds.
  • We catch up on data if the agent hasn't run for a long time.
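The diff above amounts to a set difference. A self-contained sketch, modeling transaction IDs as integers (dnf history numbers its transactions with integer IDs):

```go
package main

import "fmt"

// difference returns the IDs present in local that the server
// does not have yet. A map gives O(1) membership checks.
func difference(local, server []int) []int {
	seen := make(map[int]bool, len(server))
	for _, id := range server {
		seen[id] = true
	}
	var missing []int
	for _, id := range local {
		if !seen[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	local := []int{1, 2, 3, 4, 5} // from local dnf history
	server := []int{1, 2, 3}      // already stored server-side
	fmt.Println(difference(local, server)) // prints [4 5]
}
```

Because the diff is computed from IDs alone, a long-offline agent simply produces a larger `missing` slice on its next run; no state is kept between runs.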

3. Atomic Transaction Uploads

Each missing transaction is uploaded individually. A transaction includes:

  • Metadata (ID, User, Time, Command Line).
  • A list of all altered packages (Install, Update, Remove).
  • Scriptlet output (logs generated by RPM scripts).

This atomicity ensures that a transaction record on the server is always complete. We never store partial transactions.