Analytical Data Lake Architecture

PlaneConnection stores operational data in two distinct layers designed for different purposes. The transactional layer handles the rapid reads and writes of daily operations — trip creation, safety report submission, maintenance logging. The analytical layer is optimised for a different class of question: what has been happening across hundreds of records over months and years? Understanding why these layers exist, how data flows between them, and who benefits from each layer helps you make the most of PlaneConnection’s reporting and benchmarking capabilities.

Who should read this: Safety managers, accountable executives, Directors of Operations, and compliance officers who rely on trend analysis, benchmarking, or SmartScore. Administrators configuring data export integrations should also read this page. For day-to-day operational reporting, see Run Reports and Use Analytics.

Why Operational Databases Are Not Built for Analysis

PlaneConnection’s operational database is a globally distributed transactional database optimised for operational workloads. It is excellent at answering questions like “what is the current status of trip PC-2847?” or “which maintenance items are due in the next 30 days?” It handles thousands of concurrent reads and writes with low latency.

Analytical questions are structurally different. “How has our engine exceedance rate trended over the last 18 months?” or “how does our corrective action closure time compare to peer operators of similar fleet size?” require scanning large volumes of historical records, joining across many tables, and aggregating results. Running these queries directly against the operational database would compete for resources with live operations and would also be slow: transactional databases are not designed for columnar analytical scans.

The solution is a separate analytical layer that receives a copy of operational data in a format optimised for analytical queries.
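The difference between the two kinds of question can be illustrated with a toy example. This is only a sketch of row-oriented versus column-oriented storage in general; the trip records and field names are invented for illustration and do not reflect PlaneConnection's actual schema.

```python
# Row-oriented layout: each record is stored together, which suits point lookups.
rows = {
    "PC-2846": {"status": "completed", "exceedances": 0},
    "PC-2847": {"status": "scheduled", "exceedances": 1},
    "PC-2848": {"status": "completed", "exceedances": 2},
}

# Column-oriented layout: each field is stored together, which suits aggregation.
columns = {
    "trip_id": ["PC-2846", "PC-2847", "PC-2848"],
    "status": ["completed", "scheduled", "completed"],
    "exceedances": [0, 1, 2],
}

# Operational question: one key, one record.
current_status = rows["PC-2847"]["status"]

# Analytical question: every record, but only one field of each.
total_exceedances = sum(columns["exceedances"])
```

The aggregate touches a single list in the columnar layout, whereas the row layout would require visiting every record and discarding most of its fields.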

The Dual-Write Pattern

When data is created or updated in PlaneConnection’s operational layer, a dual-write process also records a copy in the analytical data lake. The two writes are coordinated so that the analytical layer is never more than a few minutes behind the operational layer.

┌──────────────────┐     ┌───────────────────────────────────┐
│  Operational     │────►│  Analytical Data Lake             │
│  Layer           │     │  Apache Iceberg tables            │
│  Transactional   │     │  Columnar, partitioned, indexed   │
│  Low-latency     │     │  Historical, append-optimised     │
└──────────────────┘     └───────────────────────────────────┘
        │                              │
   Live ops queries           Trend, benchmark, and
   (current state)            aggregate queries

The dual-write approach means that the analytical layer is not derived from periodic batch exports. Safety reports, audit trail entries, and maintenance records flow to the data lake continuously, making trend analysis current rather than days or weeks stale.
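The pattern described above can be sketched in a few lines. This is a minimal illustration rather than PlaneConnection's implementation: the `DualWriter` class, the `safety_reports` table, and the in-memory list standing in for the data lake are all hypothetical, and the coordination between the two writes is reduced to doing one after the other.

```python
import json
import sqlite3
from datetime import datetime, timezone


class DualWriter:
    """Hypothetical sketch: update the operational store and append a copy to the lake."""

    def __init__(self):
        # In-memory SQLite stands in for the operational database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE safety_reports (id TEXT PRIMARY KEY, severity TEXT)"
        )
        # A plain list stands in for the append-optimised analytical data lake.
        self.lake = []

    def save_report(self, report_id, severity):
        # 1. Operational write: keep only the current state of the record.
        self.db.execute(
            "INSERT OR REPLACE INTO safety_reports (id, severity) VALUES (?, ?)",
            (report_id, severity),
        )
        self.db.commit()
        # 2. Analytical write: append a timestamped copy, never overwrite.
        self.lake.append(json.dumps({
            "id": report_id,
            "severity": severity,
            "recorded_at": datetime.now(timezone.utc).isoformat(),
        }))


writer = DualWriter()
writer.save_report("SR-1", "low")
writer.save_report("SR-1", "high")  # an update: one operational row, two lake entries
```

After the update, the operational table holds a single row with the latest severity, while the lake keeps both versions for later trend analysis.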

What Flows to the Data Lake

Not all data is relevant for analytical workloads. The following data categories flow to the analytical layer:

Safety reports and investigations — submission timestamps, report types, severity classifications, resolution times, and closure outcomes.
Audit trail entries — who changed what and when, across all modules.
Maintenance records — inspection types, completion times, deferred items, and work order durations.
Trip and crew data — departure/arrival times, duty periods, delay codes, and FRAT scores (de-identified where configured).
SPI measurements — safety performance indicator readings and threshold crossings over time.
SmartScore inputs — the underlying measurements that feed composite safety scoring (see SmartScore Privacy for data handling details).

Personally identifiable information (PII) is handled according to the workspace’s privacy configuration. Analytical queries on de-identified data sets do not expose individual crew member or passenger information.
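One common way to de-identify records before analysis is to replace identifying fields with keyed-hash tokens. The sketch below assumes that approach purely for illustration; the source does not state which technique PlaneConnection uses, and the key, the `PII_FIELDS` set, and the field names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-workspace secret; a real system would load it from a secret store.
WORKSPACE_KEY = b"example-workspace-secret"

# Hypothetical field names treated as PII in this sketch.
PII_FIELDS = {"crew_id", "passenger_name"}


def pseudonymise(value, key=WORKSPACE_KEY):
    """Return a stable token for a PII value using a keyed SHA-256 hash."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def de_identify(record):
    """Copy a record, replacing PII fields with pseudonymous tokens."""
    return {
        field: pseudonymise(str(value)) if field in PII_FIELDS else value
        for field, value in record.items()
    }


trip = {"trip_id": "PC-2847", "crew_id": "C-104", "frat_score": 12}
safe_trip = de_identify(trip)
```

Because the same input always yields the same token, trends per crew member can still be counted without the analytical data set containing the original identifier.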

Apache Iceberg: What It Provides

The analytical layer uses Apache Iceberg, an open table format designed for large-scale analytical workloads on object storage. PlaneConnection stores Iceberg tables on durable object storage, providing cost-effective storage without egress fees. Iceberg provides four capabilities that matter for aviation analytics:

Schema Evolution

Aviation data requirements change as regulations evolve and operators add new SMS elements. Iceberg allows columns to be added, renamed, or removed from analytical tables as metadata-only changes, without a full data reload. Queries that do not reference a renamed or removed column keep working, and rows written before a column was added return null for it. When PlaneConnection adds a new safety report field, the analytical layer picks it up automatically.
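Iceberg makes this possible by tracking each column with a unique field ID rather than by name. The sketch below imitates that idea in plain Python; it is a simplified model, not Iceberg's file format, and the column names are hypothetical.

```python
# Each data file stores values keyed by field ID, not by column name.
old_file = [{1: "SR-1", 2: "low"}]                # written before field 3 existed
new_file = [{1: "SR-2", 2: "high", 3: "runway"}]  # written after field 3 was added

# Current table schema: field ID -> column name.
# Field 2 is assumed to have been renamed at some point; only this mapping changed.
schema = {1: "report_id", 2: "severity_class", 3: "location_type"}


def read_rows(files, schema):
    """Project every stored row onto the current schema."""
    rows = []
    for data_file in files:
        for stored in data_file:
            # Field IDs missing from an older file read as None.
            rows.append({name: stored.get(field_id) for field_id, name in schema.items()})
    return rows


rows = read_rows([old_file, new_file], schema)
```

The older file is never rewritten: the rename is applied at read time, and the column added later simply comes back empty for rows that predate it.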

Partitioning

Iceberg tables are partitioned by time and workspace. A query for “all safety reports from workspace X in Q3 2025” reads only the partitions that cover that workspace and date range rather than scanning the entire table. This keeps trend queries fast even as data volumes grow over years.
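The effect of partition pruning can be shown with a toy file listing. This sketch assumes monthly partitions and invented file names; Iceberg's real pruning works from table metadata and is considerably more detailed than this.

```python
from datetime import date

# Hypothetical layout: one group of data files per (workspace, month) partition.
partitions = {
    ("ws-x", "2025-06"): ["f1.parquet"],
    ("ws-x", "2025-07"): ["f2.parquet"],
    ("ws-x", "2025-08"): ["f3.parquet", "f4.parquet"],
    ("ws-x", "2025-09"): ["f5.parquet"],
    ("ws-x", "2025-10"): ["f6.parquet"],
    ("ws-y", "2025-08"): ["f7.parquet"],
}


def files_to_scan(workspace, start, end):
    """Return only the files whose partition can contain matching rows."""
    first, last = start.strftime("%Y-%m"), end.strftime("%Y-%m")
    selected = []
    for (ws, month), files in sorted(partitions.items()):
        # YYYY-MM strings sort chronologically, so string comparison is enough.
        if ws == workspace and first <= month <= last:
            selected.extend(files)
    return selected


q3_files = files_to_scan("ws-x", date(2025, 7, 1), date(2025, 9, 30))
```

Of the seven files in the table, the Q3 query for workspace X opens four; files for other months and other workspaces are never read.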

Time Travel

Iceberg records every change to a table as a snapshot, which preserves the history of table state. You can query the data lake as it existed at an earlier point in time. This is useful for reconstructing the state of records at the time of an audit, verifying what data was available before a safety decision was made, or satisfying a regulatory request for historical data.

Time travel queries are available via the PlaneConnection data export API and the SmartScore audit log. Contact support if you need to reconstruct historical data for a specific audit date.
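Conceptually, an "as of" query resolves a requested date to the latest table snapshot committed at or before it. The sketch below shows that lookup in isolation; the snapshot log and IDs are invented, and this is not the PlaneConnection data export API.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical snapshot log: (commit time, snapshot ID), sorted by commit time.
snapshots = [
    (datetime(2025, 1, 10), 101),
    (datetime(2025, 3, 2), 102),
    (datetime(2025, 6, 20), 103),
]


def snapshot_as_of(when):
    """Return the ID of the latest snapshot committed at or before `when`."""
    commit_times = [committed for committed, _ in snapshots]
    index = bisect_right(commit_times, when)
    if index == 0:
        raise LookupError("no snapshot exists at or before the requested time")
    return snapshots[index - 1][1]


# Which version of the table was current on an audit date of 15 April 2025?
audit_snapshot = snapshot_as_of(datetime(2025, 4, 15))
```

A request dated before the first snapshot raises an error instead of silently returning newer data, which is the safer behaviour for audit reconstruction.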

Open Format

Iceberg is an open standard with broad ecosystem support. Data stored in PlaneConnection’s analytical layer can be read by standard tools — Apache Spark, DuckDB, Snowflake, AWS Athena, and others — without exporting to a proprietary format. Operators who need to integrate PlaneConnection data with their own data warehouse have a straightforward path.

Who Benefits from the Analytical Layer

Safety Managers

Safety managers use the analytical layer for historical trend analysis that operational queries cannot provide efficiently. Month-over-month safety report rates, SPI trend charts, investigation closure time distributions, and hazard recurrence analysis all draw from the data lake. The Use Analytics guide covers these features in detail.
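As a small illustration of the kind of aggregation behind a month-over-month report rate, the following sketch counts reports per calendar month and computes the change between consecutive months. The submission dates are invented sample data.

```python
from collections import Counter

# Hypothetical submission dates (ISO format) for one workspace's safety reports.
submitted = [
    "2025-01-14", "2025-01-30",
    "2025-02-03", "2025-02-11", "2025-02-27",
    "2025-03-09",
]

# Count reports per calendar month (the first seven characters are YYYY-MM).
per_month = Counter(day[:7] for day in submitted)

# Month-over-month change in report count.
months = sorted(per_month)
change = {
    month: per_month[month] - per_month[previous]
    for previous, month in zip(months, months[1:])
}
```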

Accountable Executives

The executive dashboard aggregates data lake metrics into compliance-oriented views: SMS program effectiveness scores, corrective action completion rates, and safety culture assessment trends over time. These views are designed to support the management review requirements of FAA 14 CFR Part 5.

Cross-Fleet Benchmarking

Because PlaneConnection serves multiple operators, the analytical layer enables opt-in cross-fleet benchmarking within peer groups. An operator can see how their SPI measurements, investigation closure times, or training compliance rates compare to anonymised aggregates from similar operators (matched by fleet size, operation type, and geography). This benchmarking is a core input to the SmartScore model.

Cross-fleet benchmarking uses only anonymised, aggregated data. No individual operator’s raw records are visible to other operators. Participation in benchmarking is opt-in and configured under Settings > SmartScore. See SmartScore Privacy for full details.
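A peer comparison of this kind can be sketched as an aggregate that is only published when the peer group is large enough. The minimum group size used here (`MIN_PEERS`) is an assumption made for the example, not a documented PlaneConnection setting, and the closure times are invented.

```python
from statistics import median

# Assumed minimum peer-group size below which no aggregate is published.
MIN_PEERS = 5


def peer_benchmark(own_value, peer_values, min_peers=MIN_PEERS):
    """Compare one operator's value with an aggregate of its peer group."""
    if len(peer_values) < min_peers:
        return None  # too few peers to publish an aggregate
    peer_median = median(peer_values)
    return {
        "peer_median": peer_median,
        "difference": own_value - peer_median,
        "peer_count": len(peer_values),
    }


# Hypothetical corrective action closure times, in days.
result = peer_benchmark(12.0, [9.0, 14.0, 11.0, 20.0, 16.0])
```

The operator sees only the median and the size of the group, never the individual peer values that produced it.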

Insurers via SmartScore API

Aviation insurers who have contracted SmartScore API access receive de-identified aggregate scores derived from the analytical layer. The API returns composite safety metrics, not raw records. Individual flight data, passenger information, and crew details are never exposed via this interface. See the SmartScore Methodology reference for the specific metrics included.

Audit Trails and the Data Lake

Audit trail entries receive special treatment in the data lake. Every write to the operational database generates an immutable audit record that is appended to the analytical layer. These records are written to Iceberg as appends only and are never updated or deleted afterwards, which makes them the analytical-layer counterpart of the record integrity hash chain in the operational layer.

This means that even if an operational record were modified and the modification were not immediately visible to an auditor viewing the live application, the audit trail in the data lake preserves the complete sequence of states the record has passed through. For aviation operators, this provides a secondary integrity layer that complements the cryptographic hash chains described in the Record Integrity explanation.
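The idea of reconstructing every state of a record from an append-only trail can be sketched as follows. The log structure, record IDs, and user names are hypothetical and only illustrate the replay principle.

```python
# Hypothetical append-only audit log: entries are only ever added, never edited.
audit_log = []


def record_change(record_id, changed_by, changes):
    """Append one audit entry describing a change to a record."""
    audit_log.append(
        {"record_id": record_id, "changed_by": changed_by, "changes": dict(changes)}
    )


def state_history(record_id):
    """Replay the log to rebuild every state the record has passed through."""
    state, history = {}, []
    for entry in audit_log:
        if entry["record_id"] == record_id:
            # Build a new dict each time so earlier states are left untouched.
            state = {**state, **entry["changes"]}
            history.append(state)
    return history


record_change("WO-17", "tech.a", {"status": "open", "deferred": False})
record_change("WO-17", "tech.b", {"deferred": True})
record_change("WO-17", "insp.c", {"status": "closed"})
history = state_history("WO-17")
```

The live application would show only the final state of the work order, while the replayed history also shows that it was deferred before being closed.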

Related

Record Integrity

Cryptographic tamper-evidence for maintenance records — the operational-layer counterpart to analytical-layer audit trails.

SmartScore Privacy

How de-identification and consent work for data contributed to cross-fleet benchmarking.

Use Analytics

How to access trend analysis, SPI dashboards, and benchmarking in the platform.

Export Data

Exporting raw data and Iceberg-format exports for external analytics tools.
