- 28 Aug 2024
Mambu Extract
- Updated On 28 Aug 2024
Mambu Extract is a data replication solution that syncs data from Mambu to the customer's data store in near real time. It uses AWS Database Migration Service (AWS DMS) for replication, in its serverless option, to reduce operational costs. The architecture captures changes in the database, transforms them into Parquet files, and uploads them to the customer's S3 bucket, which gives customers full control over their data for downstream integrations. This article covers file and S3 bucket configurations, data upload operations, scalability based on load, monitoring via Grafana Cloud, and high availability with disaster recovery measures. Mambu recommends extracting only specific tables to minimize data transfer size and exposure.
Mambu Extract is a near real-time data replication solution between Mambu and the customer, designed for high throughput and reliability. Mambu Extract outperforms other methods of data extraction, such as database backups via API, the Stitch ETL connector, and streaming data, by incrementally syncing only the latest data to the customer data store. These syncs happen on a regular cadence so that the target data store stays as close as possible to the live data on the Mambu banking platform.
Mambu Extract is currently only available for customers with dedicated environments on AWS.
Architecture
Mambu Extract’s architecture is based on AWS Database Migration Service (AWS DMS), specifically DMS Serverless. AWS DMS supports ongoing replication, also known as change data capture (CDC), as well as database schema conversion. The serverless option reduces operational overhead and costs while natively providing high availability.
Mambu Extract uses the customer's production RDS MySQL master database as its source and replicates data to an AWS S3 bucket on the customer side as the target. This setup gives the customer full control over the data and over further downstream integrations, for example a data warehouse or data lake, without any dependency on Mambu.
The architectural flow consists of the following steps:
- The customer sends a request to the Mambu Core Banking Engine APIs.
- The Core Banking Engine commits the change to the database (RDS MySQL).
- AWS DMS captures the change in the database, transforms it into a Parquet file, and uploads the file to the customer's S3 bucket.
- The customer downloads the Parquet file.
File and S3 bucket configurations
S3 bucket folder structure
S3 object storage is the target for the data extraction and must be configured in the customer’s AWS account. Mambu Extract and AWS DMS create the folder structure in the bucket automatically, as follows:
<bucket name>
    mambu-extract-data
        <tenant-name-one>
            <table-name-one>
            …
            <table-name-n>
        …
        <tenant-name-n>
S3 bucket structure example - initial load
- Table name: “sequence table”
- File: LOAD00000001.parquet → This file is created during initial data load.
- Folder: 2024/ → This folder is created during CDC (Change Data Capture).
S3 bucket structure example - Change Data Capture
- Table name: “sequence table”
- Folder: 2024 (automatically created by DMS, represents “YEAR” when file was extracted)
- Folder: 08 (automatically created by DMS, represents “MONTH” when file was extracted)
- Folder: 01 (automatically created by DMS, represents “DAY” when file was extracted)
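The folder layout above can be turned into S3 key prefixes on the consumer side. The following Python sketch is only an illustration under assumptions: the function names are hypothetical, the tenant and table names are placeholders, and the layout is taken to be exactly mambu-extract-data/<tenant>/<table>/ with YEAR/MONTH/DAY folders for CDC files, as described in this section.

```python
from datetime import date

ROOT_PREFIX = "mambu-extract-data"  # top-level folder from the structure above


def table_prefix(tenant: str, table: str) -> str:
    """Prefix that holds the full-load files (LOAD*.parquet) for one table."""
    return f"{ROOT_PREFIX}/{tenant}/{table}/"


def cdc_prefix(tenant: str, table: str, day: date) -> str:
    """Prefix that holds the CDC files written for one table on one day."""
    return f"{table_prefix(tenant, table)}{day:%Y/%m/%d}/"


def is_full_load_file(key: str) -> bool:
    """True for initial-load objects such as .../LOAD00000001.parquet."""
    name = key.rsplit("/", 1)[-1]
    return name.startswith("LOAD") and name.endswith(".parquet")
```

For example, cdc_prefix("tenant-one", "loanaccount", date(2024, 8, 1)) yields mambu-extract-data/tenant-one/loanaccount/2024/08/01/, which could be passed as the prefix when listing objects with an S3 client.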
Parquet files
Parquet files use the following configuration:
{
    "CsvRowDelimiter": "\n",
    "CsvDelimiter": ",",
    "CompressionType": "NONE",
    "DataFormat": "parquet",
    "ParquetVersion": "parquet-2-0",
    "TimestampColumnName": "timeMigration"
}
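Because TimestampColumnName is set, each extracted row should carry a timeMigration column. One way a consumer might use it is to keep only the most recent version of each record when several files contain changes to the same row. The Python sketch below is a minimal illustration, not part of Mambu Extract. It assumes that the rows have already been loaded into dictionaries by a Parquet reader of your choice, that id is a hypothetical key column name, and that the timestamp values compare correctly. It does not handle deleted rows.

```python
def latest_rows(rows, key_column="id", ts_column="timeMigration"):
    """Keep the newest version of each record, judged by the timestamp column.

    key_column="id" is a hypothetical name; use the real key of the table.
    """
    latest = {}
    for row in rows:
        key = row[key_column]
        current = latest.get(key)
        # Replace the stored row when this one is at least as recent.
        if current is None or row[ts_column] >= current[ts_column]:
            latest[key] = row
    return list(latest.values())
```

The rows can be passed in any order, since every row is compared against the newest version seen so far for the same key.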
Operations
Data upload
Upload frequency is configurable per environment. By default, each Parquet file written during the full load phase (the initial full database replication) is roughly 650 MB.
During the CDC phase, the minimum Parquet file size is set to 32 MB and the batch interval to 1 minute. While CDC is ongoing, transactions are batched together (cached), and a Parquet file is written as soon as either condition is met: the batch reaches 32 MB or 1 minute has elapsed, whichever comes first.
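The "whichever comes first" rule can be expressed as a single condition. The following Python sketch only illustrates the logic described above. It is not how AWS DMS is implemented, the names are hypothetical, and it assumes that 32 MB means 32 × 1024 × 1024 bytes.

```python
MIN_FILE_SIZE_BYTES = 32 * 1024 * 1024  # 32 MB size threshold (assumed binary MB)
MAX_BATCH_INTERVAL_S = 60               # 1 minute interval threshold


def should_write_file(buffered_bytes: int, seconds_since_last_write: float) -> bool:
    """Return True once either threshold is reached, whichever comes first."""
    return (
        buffered_bytes >= MIN_FILE_SIZE_BYTES
        or seconds_since_last_write >= MAX_BATCH_INTERVAL_S
    )
```

Under this rule, a quiet period still produces a small file every minute, while a busy period produces files of about 32 MB more often than once a minute.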
Parquet files containing transactions are stored in the folder of the corresponding table, under the date on which the row changes happened.
Scalability
Mambu Extract uses AWS DMS Serverless, which scales automatically based on the load, that is, the volume of data changes in the source data store (RDS MySQL). To avoid unpredictable or unbounded scale-outs, the range of DMS Capacity Units (DCUs) is pre-configured, for example to 4-16 DCUs. This configuration is based on the typical load of the customer environment. The DMS load is continuously monitored, and the scalability range is modified when needed to keep data replication latency low.
Replication lag can be controlled by increasing the scalability range. To change the scalability range, the DMS service must be disabled, the configurations applied, and the DMS service re-enabled. You must set a downtime window for Mambu Extract to perform this action.
Monitoring and alerting
Mambu Extract is monitored using the Mambu observability platform via Grafana Cloud. Critical metrics are exported to Grafana and alerts are pre-configured to notify your product and SRE teams.
To ensure replication lag stays minimal (up to 5 minutes), monitor the following metrics:
- CDC source latency
- CDC target latency
Under normal conditions, replication lag is measured in seconds. Higher lag can be expected only for short periods, mostly during spikes in which large data sets are modified.
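A check against the 5 minute limit could look like the following Python sketch. It is an illustration only: the function name and message format are hypothetical, the latency values are assumed to be in seconds and already fetched from your monitoring system, and the pre-configured Grafana alerts may use different rules.

```python
from typing import List

MAX_LAG_SECONDS = 5 * 60  # "up to 5 minutes" from the text


def lag_alerts(source_latency_s: float, target_latency_s: float,
               threshold_s: float = MAX_LAG_SECONDS) -> List[str]:
    """Return one message per latency measure that exceeds the threshold."""
    measures = {
        "CDC source latency": source_latency_s,
        "CDC target latency": target_latency_s,
    }
    return [
        f"{name} is {value:.0f}s, above the {threshold_s:.0f}s threshold"
        for name, value in measures.items()
        if value > threshold_s
    ]
```

An empty list means both measures are within the limit. Because short spikes are expected, a real alert would probably require the threshold to be exceeded for several consecutive readings.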
High availability and disaster recovery
To keep the Mambu Extract service highly available, we configure AWS DMS with a Multi-AZ configuration. If a single Availability Zone fails, the DMS service automatically switches to the next available Availability Zone.
Customers must configure an S3 bucket in the production (primary) region and another in the disaster recovery region. S3 is a regional AWS service, so a separate bucket must be created in each region.
Mambu table list
Mambu Extract allows you to extract all tables. However, to minimize the size of transferred data and the exposure of raw data, Mambu recommends that customers use the following standard list of tables:
Table names:
- loanaccount
- loantransaction
- gljournalentry
- customfield
- customfieldvalue
- client
- customfieldset
- savingsaccount
- savingstransaction
- loanproduct
- repayment
- transactiondetails
- transactionchannel
- activity
- lineofcredit
For more details about data structures, please refer to the following page: Data Dictionary.
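In AWS DMS, a subset of tables is generally expressed as selection rules in a table-mapping document. The Python sketch below builds such a document for the standard table list. It is an illustration under assumptions: Mambu manages the actual DMS configuration, the rule names are hypothetical, and the "%" schema wildcard stands in for the real schema name, which this article does not state.

```python
import json

STANDARD_TABLES = [
    "loanaccount", "loantransaction", "gljournalentry", "customfield",
    "customfieldvalue", "client", "customfieldset", "savingsaccount",
    "savingstransaction", "loanproduct", "repayment", "transactiondetails",
    "transactionchannel", "activity", "lineofcredit",
]


def selection_rules(tables, schema="%"):
    """Build one DMS 'include' selection rule per table.

    schema="%" is a wildcard placeholder; the real schema name is an assumption.
    """
    rules = [
        {
            "rule-type": "selection",
            "rule-id": str(index),             # must be unique per rule
            "rule-name": f"include-{table}",   # hypothetical naming scheme
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        }
        for index, table in enumerate(tables, start=1)
    ]
    return {"rules": rules}


mapping_json = json.dumps(selection_rules(STANDARD_TABLES), indent=2)
```

Tables that are not matched by an include rule are left out of the replication, which is what keeps the transferred data and the exposure of raw data small.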