How a Fintech Analytics Provider Scaled to 27 Credit Unions on a Single ETL Core

By Noel Benjamin

Our client is an AI-powered analytics and recommendation platform serving banks and credit unions in the United States. The hard part was not deciding what insights mattered. The hard part was getting dependable data into the system in the first place. The US credit union market serves roughly 145 million members (NCUA 2025 Annual Report), and analytics only becomes useful when the ETL platform can handle messy files, inconsistent schemas, and recurring workloads without constant manual rescue.

This customer story is intentionally specific. It is not a broad take on AI in data integration. Nor is it another example of financial process automation. It is about the ETL platform itself: the reusable foundation High Peak Software built so our client could run hourly automated data pipelines reliably across dozens of financial institution clients, feeding the analytics engine that powers personalized product recommendations, account holder value scoring, campaign creation, and dashboards.

Client Overview and Challenge

Our client operates in a demanding corner of fintech ETL. Their core product helps financial institutions transform raw member data into actionable intelligence. But their credit union and bank clients generate vast quantities of raw data spanning three key domains: member information, account and product holdings, and transaction histories. Getting that data from dozens of different institutions into a single, unified, analytics-ready format was the critical bottleneck.

Schema Heterogeneity Was the Real Obstacle

The biggest technical challenge was schema heterogeneity across core banking systems. Every credit union exports data from a different core banking system. Column names, delimiters, date formats, and encodings all differ per client. The same field, for example member birth date, arrives differently from each source. One client sends a full date string, another a birth year integer, and a third a timestamp. That meant our client did not just need file ingestion. It needed an ETL platform for credit union analytics that could absorb variation without turning every new source into a custom engineering effort.
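
To make that variation concrete, here is a minimal sketch of the kind of per-field normalization such a platform has to perform. The function name, the accepted string formats, and the year-versus-timestamp disambiguation rule are illustrative assumptions, not our client's actual code.

```python
from datetime import date, datetime, timezone

def normalize_birth_date(raw):
    """Illustrative only: coerce the birth-date variants described above
    (full date string, bare year, epoch timestamp) into a single date."""
    if raw is None or raw == "":
        return None
    if isinstance(raw, (int, float)):
        if 1900 <= raw <= 2100:          # a bare birth year
            return date(int(raw), 1, 1)
        return datetime.fromtimestamp(raw, tz=timezone.utc).date()  # an epoch timestamp
    # Try the string layouts we have seen, one client at a time.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"):
        try:
            return datetime.strptime(str(raw).strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized birth date value: {raw!r}")
```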

Such schema heterogeneity is a familiar pattern across financial services, where legacy infrastructure is difficult to modernize and disjointed integrations increase complexity. For our client, that complexity showed up in the most practical place possible: getting member data from heterogeneous source systems into a stable analytics workflow.

Data Quality, Operational Blind Spots, and No Reusable Foundation

Our client also had to contend with source files that routinely contained malformed CSV lines, missing mandatory columns, duplicate primary keys, and unmapped product categories, all of which needed to be caught and reported without silently corrupting the analytics layer. Long-running processing tasks needed to complete predictably, sometimes taking several hours per client. And there was zero operational visibility: no alerting when pipeline runs succeeded or failed, and no audit trail of what was processed. Broken runs were discovered reactively, often hours or days later.
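
As a rough illustration of the fail-soft behavior that was needed, the sketch below collects bad rows into an error report instead of aborting the run. The column names, file layout, and function name are assumptions for illustration only.

```python
import csv

def read_member_file(path, error_path, required=("member_id", "birth_date")):
    """Fail-soft ingestion sketch: collect bad rows into an error CSV, keep the good ones."""
    good, errors, seen_ids = [], [], set()
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.DictReader(f), start=2):  # start=2: header is line 1
            missing = [col for col in required if not row.get(col)]
            if missing:
                errors.append({"line": line_no, "reason": f"missing {missing}", **row})
                continue
            if row["member_id"] in seen_ids:
                errors.append({"line": line_no, "reason": "duplicate primary key", **row})
                continue
            seen_ids.add(row["member_id"])
            good.append(row)
    if errors:
        # Report the rejects for review instead of aborting the whole run.
        fieldnames = sorted({key for err in errors for key in err})
        with open(error_path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(errors)
    return good, errors
```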

Critically, there was no reusable foundation. Adding a new credit union client required bespoke engineering each time. There was no abstraction that could be extended without touching existing logic, making growth unsustainable. For our client, the issue was not just raw ingestion. It was dependable data pipeline automation for hourly analytics workloads at scale.

The Solution High Peak Built

High Peak built a production-grade, plugin-based ETL framework and a fully automated cloud-native orchestration layer. The core stack used Python 3.11, Pydantic v2 for domain model validation, Docker, Amazon ECS on AWS Fargate, and Kestra for orchestration. It also relied on Amazon S3, PostgreSQL, AWS Systems Manager Parameter Store for secrets management, CloudWatch for logging, and Microsoft Teams for notifications. The goal was straightforward. The team needed to create a reusable platform that could handle source variation, scheduled execution, and production observability without relying on brittle, source-by-source fixes.
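
To give a sense of how Pydantic v2 fits into this stack, here is a minimal sketch of what one unified domain entity might look like. The field names and validation rules are assumptions, not the client's actual schema.

```python
from datetime import date
from pydantic import BaseModel, Field, field_validator

class Member(BaseModel):
    """Illustrative unified entity; field names are assumptions, not the client's schema."""
    member_id: str = Field(min_length=1)
    birth_date: date | None = None
    email: str | None = None
    is_open: bool = True

    @field_validator("email")
    @classmethod
    def normalize_email(cls, value):
        # Light normalization; genuinely invalid values surface as ValidationErrors,
        # which the validate stage collects into error reports rather than crashing on.
        return value.strip().lower() if value else None
```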

A Five-Stage Pipeline with Plugin Architecture

The central design choice was a plugin-based model with an object-oriented architecture, instead of a purely config-driven mapping engine or a growing collection of bespoke scripts. High Peak evaluated all three approaches and selected the plugin model. The ingest surface was not uniform enough for simple mapping rules alone, and it changed often enough that isolated scripts would have compounded the maintenance burden.

The core pipeline runs every record through five stages (a structural sketch in code follows this list):

  • Extract: download and read source files from S3, with a safety check that files are at least 30 minutes old to prevent ingesting mid-upload files.
  • Transform: map client-specific schemas to our client’s unified data model across seven domain entities: Member, MemberProductAccount, Product, ProductCategory, Transaction, Application, and SurveyData.
  • Validate: accumulate bad lines and invalid values into error CSVs via a Collector Registry, without aborting the run.
  • Load: bulk-load validated data into PostgreSQL.
  • PostProcess: execute dataset-level hooks, including open-member filtering, application record upserts, and vehicle detail mapping.
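
The sketch below shows the structural idea of that five-stage flow, including the file-age safety check in the extract stage. Class and method names are hypothetical and do not mirror the real framework's interfaces.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta, timezone

import boto3

class EtlPipeline(ABC):
    """Hypothetical skeleton of the five-stage flow; names are illustrative only."""

    MIN_FILE_AGE = timedelta(minutes=30)  # guard against ingesting files still being uploaded

    def run(self, client_id: str) -> None:
        ctx = {"client_id": client_id, "records": [], "errors": []}
        for stage in (self.extract, self.transform, self.validate, self.load, self.post_process):
            ctx = stage(ctx)  # each stage enriches the shared context and hands it on
        print(f"{client_id}: {len(ctx['records'])} records loaded, {len(ctx['errors'])} rejected")

    def file_is_settled(self, bucket: str, key: str) -> bool:
        """Extract-stage safety check: only pick up S3 objects older than 30 minutes."""
        head = boto3.client("s3").head_object(Bucket=bucket, Key=key)
        return datetime.now(timezone.utc) - head["LastModified"] >= self.MIN_FILE_AGE

    @abstractmethod
    def extract(self, ctx: dict) -> dict: ...       # download settled source files from S3
    @abstractmethod
    def transform(self, ctx: dict) -> dict: ...     # map the client schema to the unified model
    @abstractmethod
    def validate(self, ctx: dict) -> dict: ...      # accumulate bad rows into error CSVs
    @abstractmethod
    def load(self, ctx: dict) -> dict: ...          # bulk-insert validated data into PostgreSQL
    @abstractmethod
    def post_process(self, ctx: dict) -> dict: ...  # dataset-level hooks (filters, upserts)
```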

Each new credit union client is onboarded by providing just three files: a preprocessor class (handling client-specific deduplication, custom field mapping, and product classification), a transformation map, and a constants file. The core pipeline code requires zero changes. This is what made the solution feel like a platform rather than a patchwork.
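
In practice, the three extension files might look something like the sketch below. The file layout, column names, and class name are hypothetical, purely to illustrate how client-specific behavior stays isolated from the core.

```python
# preprocessor.py (hypothetical layout) -- client-specific cleanup lives here
class ExampleCuPreprocessor:
    def preprocess(self, rows: list[dict]) -> list[dict]:
        seen: set[str] = set()
        deduped = []
        for row in rows:
            if row["MBR_NUM"] in seen:  # this client re-exports members across files
                continue
            seen.add(row["MBR_NUM"])
            deduped.append(row)
        return deduped

# transformation_map.py -- client column names mapped to the unified model
TRANSFORMATION_MAP = {
    "MBR_NUM": "member_id",
    "DOB_YR": "birth_date",
    "PROD_CD": "product_code",
}

# constants.py -- per-client settings the core pipeline reads at startup
CLIENT_ID = "example_cu"
S3_PREFIX = "exports/example_cu/"
```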

Cloud-Native Orchestration for Hourly Automated Workloads

Docker standardized packaging and deployment. Amazon ECS on AWS Fargate gave the platform a managed container runtime so workloads could run containers without managing the underlying infrastructure. Each ETL run is an independent, isolated container invoked with a client identifier.

High Peak built the automation layer using Kestra (pinned to version 1.1.13 for output-passing stability), running as an ECS Fargate service. The primary ETL flow runs on an hourly cron, automatically checking S3 for each active client and firing an ECS task per eligible client, with up to three clients processed concurrently per cycle. The execution model is asynchronous and fire-and-forget: the orchestrator launches the ECS task, writes the task ARN to the database, and exits immediately, so the hourly trigger completes in under a minute regardless of downstream task duration.
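
Kestra launches the ECS tasks natively from its flow definitions; the Python sketch below only illustrates the equivalent fire-and-forget logic with boto3 so the execution model is concrete. The cluster, task definition, and helper names are placeholders.

```python
import boto3

def record_run_start(client_id: str, task_arn: str) -> None:
    # Hypothetical stand-in for the run_status insert the orchestrator performs.
    print(f"run_status: {client_id} -> {task_arn}")

def launch_etl_task(client_id: str) -> str:
    """Fire-and-forget: start the ECS task, record its ARN, and return without waiting."""
    ecs = boto3.client("ecs")
    response = ecs.run_task(
        cluster="etl-cluster",            # placeholder names, not the real infrastructure
        launchType="FARGATE",
        taskDefinition="etl-pipeline",
        networkConfiguration={
            "awsvpcConfiguration": {"subnets": ["subnet-xxxx"], "assignPublicIp": "DISABLED"}
        },
        overrides={
            "containerOverrides": [
                {"name": "etl", "command": ["python", "-m", "etl.run", "--client", client_id]}
            ]
        },
    )
    task_arn = response["tasks"][0]["taskArn"]
    record_run_start(client_id, task_arn)
    return task_arn  # no polling: downstream task duration never blocks the hourly trigger
```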

Ad hoc flows support manual one-off runs with synchronous execution, automatic CloudWatch log retrieval (the last 80 lines on failure), and collision-safe locking to prevent double-runs. Self-healing mechanisms detect orphaned run records caused by network blips during ECS launch and automatically reset them. The system fetches all secrets at runtime from AWS Systems Manager Parameter Store. Credential rotation requires only a Parameter Store update with no redeployment.
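
Runtime secret retrieval from Parameter Store amounts to very little code, roughly like the sketch below; the parameter name is illustrative.

```python
import boto3

def get_secret(name: str) -> str:
    """Fetch a decrypted SecureString parameter at runtime; rotation needs no redeploy."""
    ssm = boto3.client("ssm")
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

# Example (parameter name is illustrative):
# db_password = get_secret("/etl/postgres/password")
```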

Operational Visibility Was Built Into the Platform

One of the most important issues in the brief was the lack of operational visibility. High Peak addressed it directly with Microsoft Teams Adaptive Card notifications sent via the Graph API, including user @mentions so the right engineer is alerted immediately on success or failure. Engineers also get full visibility through the Kestra UI, with per-execution logs and CloudWatch links, so they can diagnose issues without needing AWS console access.
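
A simplified sketch of that notification path is shown below: it builds an Adaptive Card with a user mention and posts it. The actual integration authenticates against the Microsoft Graph API, which is omitted here; the endpoint, user identifier, and card fields are placeholders.

```python
import requests

def notify_run_result(client_id: str, status: str, engineer: str, post_url: str) -> None:
    """Build an Adaptive Card with an @mention and post it; auth and endpoint are omitted."""
    card = {
        "type": "AdaptiveCard",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "weight": "Bolder", "text": f"ETL run for {client_id}: {status}"},
            {"type": "TextBlock", "text": f"<at>{engineer}</at> please review the run log."},
        ],
        "msteams": {
            "entities": [
                {"type": "mention", "text": f"<at>{engineer}</at>",
                 "mentioned": {"id": engineer, "name": engineer}}
            ]
        },
    }
    payload = {
        "type": "message",
        "attachments": [
            {"contentType": "application/vnd.microsoft.card.adaptive", "content": card}
        ],
    }
    requests.post(post_url, json=payload, timeout=10)
```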

That visibility also supports the business case. Financial institutions are pushing toward strong data foundations and more personalized offers, but recommendation quality depends on the reliability of the pipelines underneath. For our client, the ETL layer had to be dependable before the analytics layer could be trusted.

Results

High Peak delivered a reusable ETL platform that matched our client’s actual constraints and scaled well beyond initial expectations.

  • 27 financial institution clients fully integrated on a single shared pipeline core.
  • Multiple core banking platform formats handled without changes to the framework; only client-specific extension files added per platform variant.
  • Hourly automated ETL runs with up to 3 clients processed concurrently per cycle, with zero manual intervention required for standard scheduled runs.
  • New client onboarding reduced from a bespoke engineering sprint to a structured 3-file extension with no risk to existing clients. After the framework stabilized around client 8, every subsequent client was onboarded purely by extending the plugin model.
  • Four independent audit trails per execution: orchestration UI record, CloudWatch log stream, run_status database entry, and Teams notification.
  • Dramatically reduced debugging time through in-UI log surfacing: the last 80 lines of container output stream directly into the orchestration task log on failure.
  • Automatic failure recovery preventing permanent stuck states that would otherwise require manual database intervention.

If you want to compare this with adjacent work, see our customer stories on financial data insight generation and financial risk management and scoring systems. Those engagements focus on different business outcomes. This one is about the ETL platform underneath the analytics stack.

Key Takeaways

  • Credit union analytics gets bottlenecked by data ingestion long before it gets limited by model quality.
  • When source schemas vary widely, a plugin-based architecture with a shared pipeline core and client-specific extension points can be a better fit than config-only mapping or one-off scripts.
  • Production ETL needs observability from day one, especially for long-running scheduled workloads across many clients.
  • Cloud-native orchestration is valuable when it simplifies operations without sacrificing execution control.
  • A reusable ETL foundation creates leverage. Once the framework stabilizes, new client onboarding becomes a structured 3-file extension rather than bespoke engineering.

FAQ

What made this a platform instead of a collection of scripts?

The difference was the reusable framework. High Peak did not solve one file format at a time with disconnected logic. It built a plugin-based ETL foundation with a five-stage pipeline core (Extract, Transform, Validate, Load, PostProcess) that runs identically for every client. Client-specific behavior is isolated in three extension files, and no client onboarding requires changes to the core codebase.

Why was a plugin-based OOP architecture the right fit?

The underlying problem was not uniform enough for a config-only approach. It was also not stable enough for bespoke scripts to stay maintainable. High Peak evaluated three approaches: configuration-driven mapping, fully bespoke scripts, and the plugin model. The plugin model gave our client a consistent core with room for source-specific behavior. It proved scalable through 27 client integrations without core code changes.

How did the platform handle heterogeneous credit union data sources?

High Peak designed it around schema variability from the start. A composable transformation system handles type casting, date parsing, value mapping, and boolean coercion per client, with seven validated domain models (Member, MemberProductAccount, Product, ProductCategory, Transaction, Application, SurveyData) enforced through Pydantic v2. The system accumulates and reports bad data rather than silently dropping it.
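
As a hedged sketch of what "composable" can mean here, the snippet below chains small per-column transforms. The helper names, date format, and column names are assumptions rather than the platform's actual API.

```python
from datetime import datetime
from functools import reduce
from typing import Callable

# Hypothetical composable per-column transforms; names and formats are illustrative.
def strip_text(value):
    return value.strip() if isinstance(value, str) else value

def to_bool(value):
    return str(value).strip().lower() in {"y", "yes", "true", "1"}

def parse_mdy(value):
    return datetime.strptime(value, "%m/%d/%Y").date()

def compose(*steps: Callable) -> Callable:
    """Chain transforms left to right: compose(strip_text, parse_mdy)("01/15/1980")."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

# One client's per-column chain (column names are illustrative):
FIELD_TRANSFORMS = {
    "birth_date": compose(strip_text, parse_mdy),
    "is_open": compose(strip_text, to_bool),
}
```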

Why use Docker, ECS Fargate, and Kestra for this ETL workload?

The workload involved recurring, long-running tasks across many clients that needed predictable, isolated execution. Containerization standardized deployment. ECS Fargate eliminated cluster management. And Kestra provided YAML-first orchestration with native ECS task-launching support, hourly scheduling, ad hoc debugging flows, and self-healing mechanisms for operational reliability.

What business use case did the ETL platform support?

The system processes member, account, product, transaction, and application data on an hourly cadence to feed our client’s analytics engine, which generates personalized product recommendations, individual account holder value scoring, campaign creation tools, and analytics dashboards for credit unions and banks.

Ready to Achieve Similar Results?

If your team is dealing with mismatched schemas, fragile file pipelines, data quality issues, or scheduled workloads nobody can fully see, the answer is usually not another patch. It is a better foundation. Let’s connect and build an ETL platform that supports production analytics at scale.