How Infinitium reduced fraud detection time by 95% with Amazon ECS and AWS Fargate on AWS Graviton

This post was created in collaboration with the Infinitium Engineering Team.

Introduction

Infinitium (a Euronet Company) is a leading digital payments company in Southeast Asia, specializing in secure online payment solutions and risk management services. With a strong presence across the Asia Pacific region, Infinitium offers cutting-edge technologies such as 3D Secure (3DS) authentication, fraud detection systems, and next-generation payment gateway services. The company’s solutions are used by a network of 60 banks and serve approximately 180 million card users globally. Infinitium’s focus on cloud-based, highly configurable solutions for multi-payment methods and omni-channel transactions positions it as a key player in the rapidly evolving digital payment landscape. This makes it particularly relevant for businesses using cloud technologies such as Amazon Web Services (AWS) cloud for their payment infrastructure.

Key challenges with on-premises environment

Infinitium deployed their payment Fraud Detection Service v1 on-premises, but they encountered several significant challenges that hindered their ability to detect fraudulent activity efficiently and effectively. Scaling the on-premises infrastructure to accommodate increasing transaction volumes and data proved both difficult and costly. Furthermore, the need for constant management and maintenance of hardware and software consumed significant time, effort, and specialized skills. The on-premises systems also struggled to deliver the necessary performance for real-time fraud detection, resulting in slower processing times. In addition, maintaining high security and compliance standards demanded constant vigilance and considerable resources. Lastly, providing low-latency services to a global customer base was particularly challenging with the on-premises infrastructure.

Solution overview

To address the various challenges faced by Infinitium’s on-premises fraud detection system, we redesigned the solution on the AWS cloud. The new architecture uses a range of AWS services, including but not limited to Amazon Elastic Container Service (Amazon ECS), Amazon ElastiCache, Amazon Simple Queue Service (Amazon SQS), and AWS Lambda. Together, these services have resolved the issues of scalability, maintenance, performance, security, and global latency. The following components were instrumental in achieving these improvements:

Figure 1. AWS cloud-based fraud detection system architecture of Infinitium

Using Amazon ECS and AWS Fargate for scalable and managed infrastructure

To overcome the difficulties of scaling and managing on-premises infrastructure, Infinitium moved to Amazon ECS on AWS Fargate. Amazon ECS Service Auto Scaling adds and removes compute capacity as transaction volume changes, and Fargate removes the overhead of managing servers. The resulting containerized environment adjusts dynamically to demand so that performance holds up during peak transaction periods. It also reduced management overhead by eliminating constant hardware and software maintenance, freeing up resources to focus on core business operations. Furthermore, AWS Graviton processors delivered a 20% improvement in price-performance.

To run their Fraud Detection Service (FDS 2.0) efficiently on AWS Graviton processors within Amazon ECS and Fargate, Infinitium made several technical modifications so that the Java application was compatible with Arm64 processors. They began by updating their continuous integration and continuous delivery (CI/CD) pipelines to use Amazon Corretto, a production-ready, Arm64-optimized distribution of OpenJDK that is compatible with AWS Graviton. They also set up the process so that every Docker base image in use was Arm64-compatible, and rebuilt custom images specifically for AWS Graviton. Because the application relied on various third-party Java libraries, Infinitium conducted a comprehensive audit to identify and replace dependencies lacking Arm64 support, making the necessary code modifications for those that weren’t compatible.
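
The dependency audit described above can be approximated with a small script. The following is a minimal sketch, not Infinitium's actual tooling: it assumes that a JAR's entry names are available as a list and that bundled native libraries carry an architecture hint in their path. The token list and both function names are illustrative assumptions.

```python
# Path fragments that commonly hint at the target CPU of a bundled native
# library. This token list is an illustrative assumption, not exhaustive.
ARCH_TOKENS = {
    "arm64": ("aarch64", "arm64", "aarch_64"),
    "x86_64": ("x86_64", "amd64", "x86-64"),
}
NATIVE_SUFFIXES = (".so", ".dylib", ".dll", ".jnilib")


def native_architectures(entry_names):
    """Return the set of CPU architectures seen among native-library entries."""
    found = set()
    for name in entry_names:
        lowered = name.lower()
        if not lowered.endswith(NATIVE_SUFFIXES):
            continue  # pure-Java entries run on any architecture
        matched = False
        for arch, tokens in ARCH_TOKENS.items():
            if any(token in lowered for token in tokens):
                found.add(arch)
                matched = True
        if not matched:
            found.add("unknown")  # native code with no architecture hint
    return found


def needs_arm64_review(entry_names):
    """True when an archive ships native code but no recognizable Arm64 build."""
    archs = native_architectures(entry_names)
    return bool(archs) and "arm64" not in archs
```

For a real archive, the entry names could come from `zipfile.ZipFile(path).namelist()` in the Python standard library. A flagged dependency is only a candidate for manual review, since path naming conventions vary between libraries.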

Given the serverless nature of Amazon ECS on Fargate, Infinitium tuned their task definitions to take advantage of AWS Graviton, in particular by adjusting memory and CPU allocations to match the efficiency and performance characteristics of the Arm architecture. They conducted extensive load, stress, and regression testing within the Fargate environment to confirm that the application both functioned correctly and performed well in production. The CI/CD pipelines were configured to produce multi-architecture Docker images supporting both x86 and Arm64, with automated testing on both platforms to check for consistent performance.
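
As an illustration of what targeting Graviton looks like in a task definition, the following sketch builds the request body as a plain Python dictionary. The field names follow the Amazon ECS task definition format. The family name, image URI, and sizes are placeholders rather than Infinitium's values, and fields such as the execution role are omitted for brevity.

```python
import json


def build_task_definition(family, image, cpu, memory, architecture="ARM64"):
    """Build a Fargate task definition body that targets a given CPU architecture."""
    if architecture not in ("ARM64", "X86_64"):
        raise ValueError(f"unsupported architecture: {architecture}")
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",      # required network mode for Fargate tasks
        "cpu": str(cpu),              # task-level CPU units, passed as a string
        "memory": str(memory),        # task-level memory in MiB, passed as a string
        "runtimePlatform": {
            "cpuArchitecture": architecture,   # ARM64 selects Graviton
            "operatingSystemFamily": "LINUX",
        },
        "containerDefinitions": [
            {"name": family, "image": image, "essential": True}
        ],
    }


# Placeholder values for illustration only.
task_def = build_task_definition(
    "fds-scoring", "example.invalid/fds-scoring:latest", cpu=1024, memory=2048
)
print(json.dumps(task_def["runtimePlatform"]))
```

The image referenced by such a task definition must include an Arm64 variant, which is why the multi-architecture build step matters.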

Converting to asynchronous execution using Amazon SQS and AWS Lambda

Previously, Infinitium’s application operated as a monolithic, synchronous system, where the components were tightly coupled and ran sequentially within a single codebase. This design meant that any process, regardless of its complexity or duration, had to wait for preceding tasks to complete before it could proceed, which created bottlenecks in the system. As a result, real-time processing was significantly hindered because the application had to process each transaction in a linear, step-by-step manner, leading to delays and reduced efficiency, especially under high transaction volumes. This lack of flexibility and parallelism made it difficult to meet the demands of real-time fraud detection, where rapid response times are critical.

To address these limitations, Infinitium re-architected the application using Amazon SQS and Lambda, thus increasing transaction throughput by 5x with a single instance, while reducing service failure rates by 70%. By decoupling processing tasks through asynchronous execution, they enabled tasks to be processed independently, which reduced bottlenecks and improved overall system responsiveness. This transition also facilitated real-time processing, which allowed transactions to be handled more swiftly and enhanced the system’s ability to detect fraudulent activities in real time. The combination of these AWS services not only resolved the inherent issues of the monolithic architecture but also empowered Infinitium to achieve the high performance and scalability necessary for effective fraud detection in a dynamic environment.
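
To make the decoupling concrete, here is a minimal sketch of a Lambda function that consumes a batch of Amazon SQS messages and reports only the failed ones for retry. The event and response shapes follow the Lambda integration with SQS. The scoring function and its threshold are hypothetical stand-ins, not Infinitium's fraud logic.

```python
import json


def score_transaction(txn):
    """Hypothetical placeholder rule: send large amounts to review."""
    return "review" if txn["amount"] > 10_000 else "allow"


def handler(event, context=None):
    """Process each SQS record independently and collect per-message failures."""
    failures = []
    for record in event.get("Records", []):
        try:
            txn = json.loads(record["body"])
            decision = score_transaction(txn)
            print(record["messageId"], decision)
        except (KeyError, TypeError, ValueError):
            # Only this message is retried; the rest of the batch is not.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning `batchItemFailures` only takes effect when the event source mapping has `ReportBatchItemFailures` enabled. Without it, a single bad message would cause the whole batch to be retried.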

Enhancing data retrieval performance using Amazon ElastiCache

One major bottleneck in the previous system was the centralized database, which slowed down fraud detection queries. In the monolithic architecture, the application components relied on a single, centralized database to store and retrieve data. This setup created a heavy load on the database, particularly when handling large volumes of transactions and complex queries in real-time. As multiple services and processes competed for access to the same database, contention for resources increased, leading to slower query response times. This centralized approach also meant that any delay in database processing directly impacted the entire system, which caused bottlenecks that hindered the application’s ability to quickly analyze transactions and identify fraudulent activities.

To overcome these limitations, Infinitium implemented ElastiCache, moving critical fraud detection rules and policies to a high-performance caching layer. This shift drastically reduced the time needed to check fraud detection rules, providing near-instantaneous query responses with microsecond latency. The distributed caching layer optimized data access by being designed to provide ready availability of frequently accessed data, such as authentication data, temporary device data, and transaction data. This not only further boosted system performance but also allowed for the efficient sharing of session data across microservices. As a result, the implementation of ElastiCache played a crucial role in alleviating the performance bottlenecks associated with the centralized database, which led to a more responsive and scalable fraud detection system.
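
The caching approach described above is commonly implemented as a cache-aside lookup. The sketch below shows the pattern with an in-memory stand-in for the cache, so it is not the ElastiCache client API. The key format, time-to-live, and loader function are assumptions for illustration.

```python
import time


class TTLCache:
    """In-memory stand-in for a remote cache; not the ElastiCache API."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, self._clock() + ttl_seconds)


def get_rules(merchant_id, cache, load_from_db, ttl_seconds=60):
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    key = f"rules:{merchant_id}"
    rules = cache.get(key)
    if rules is None:
        rules = load_from_db(merchant_id)
        cache.set(key, rules, ttl_seconds)
    return rules
```

With this pattern the database is queried only on a miss or after expiry, which is what takes read pressure off a centralized database.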

Using purpose-built database services for optimal data management

To meet its diverse and specific data storage and retrieval requirements, Infinitium adopted a suite of purpose-built database services, each chosen for its particular strengths. Amazon Neptune was employed to handle the persistence of transaction and device nodes, efficiently managing and querying complex relationships and patterns crucial for detecting fraudulent activities. This graph database was particularly effective in modeling and analyzing the intricate connections between different entities, such as transactions and devices, which are essential for uncovering sophisticated fraud schemes.
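
As a simple illustration of the kind of relationship a graph model exposes, the following sketch finds devices that have been used with several different cards. It uses an in-memory structure instead of Amazon Neptune, where the same question would be expressed as a graph query. The edge format and the threshold are assumptions.

```python
from collections import defaultdict


def cards_sharing_devices(edges, min_cards=2):
    """Group cards by device and keep devices linked to at least min_cards cards.

    edges: iterable of (card_id, device_id) pairs taken from observed transactions.
    """
    cards_by_device = defaultdict(set)
    for card_id, device_id in edges:
        cards_by_device[device_id].add(card_id)
    return {
        device: sorted(cards)
        for device, cards in cards_by_device.items()
        if len(cards) >= min_cards
    }


edges = [("card-1", "dev-A"), ("card-2", "dev-A"), ("card-3", "dev-B"), ("card-1", "dev-A")]
shared = cards_sharing_devices(edges)
```

A device shared by many unrelated cards is one example of a pattern that is awkward to find with row-oriented queries but natural to express as a graph traversal.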

Furthermore, Amazon DocumentDB (with MongoDB compatibility) was chosen for managing the rule engines and policies that drive fraud detection. Its document-oriented approach allowed for faster data retrieval and streamlined the process of making changes to rules and policies without the complexity typically associated with SQL queries. This made it easier to manage and update the system’s logic in response to emerging threats and evolving fraud patterns. Also, Amazon Relational Database Service (Amazon RDS) for PostgreSQL was used to support specific dependencies within the fraud detection system, such as organization and user access, permissions, and other lookup-related data. Amazon RDS was chosen for its robustness, stability, and compatibility, because it was designed to provide reliable and efficient operation of these critical aspects of the system.
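
The appeal of a document model for rules is that each rule can be stored and edited as one self-contained document. The sketch below evaluates rule documents held as Python dictionaries. The schema (`priority`, `enabled`, `conditions`, `action`) and the operator names are hypothetical, not Infinitium's rule format.

```python
import operator

# Supported comparison operators; the names are illustrative assumptions.
OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "eq": operator.eq,
    "in": lambda value, allowed: value in allowed,
}


def rule_matches(rule, txn):
    """A rule matches when every one of its conditions holds for the transaction."""
    return all(
        cond["field"] in txn
        and OPERATORS[cond["op"]](txn[cond["field"]], cond["value"])
        for cond in rule["conditions"]
    )


def evaluate(rules, txn):
    """Return the action of the first enabled matching rule, else 'allow'."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule.get("enabled", True) and rule_matches(rule, txn):
            return rule["action"]
    return "allow"


rules = [
    {"name": "high-amount-listed-country", "priority": 1, "action": "challenge",
     "conditions": [{"field": "amount", "op": "gt", "value": 5000},
                    {"field": "country", "op": "in", "value": ["XX", "YY"]}]},
    {"name": "blocked-device", "priority": 0, "enabled": False, "action": "deny",
     "conditions": [{"field": "device_id", "op": "eq", "value": "dev-bad"}]},
]
```

In this form, changing a threshold or disabling a rule is an update to a single document and needs no schema migration.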

Deep dive into Amazon ECS architecture for optimal performance

Infinitium’s migration to AWS cloud focused on maximizing the performance and scalability of their FDS 2.0. In this section, we review how the application was architected for optimal performance:

Container configuration

To provide efficient resource usage and high performance in the Fargate environment, Infinitium designed its application in line with best practices for serverless container architecture. Fargate automatically manages compute resources, streamlining both deployment and scaling.

For resource allocation, Infinitium fine-tuned container configurations by implementing automated load testing within the CI/CD pipeline. This allowed them to analyze performance metrics and adjust memory and CPU allocations to meet processing demands without over-provisioning. By specifying precise resource requirements for each container, they balanced performance against cost. This approach is crucial in the Fargate environment, where cost is tied directly to the vCPU and memory allocated to each task.
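
Right-sizing from load-test results can be sketched as picking the smallest Fargate task size that covers observed peak usage plus some headroom. The size table below reflects the Fargate CPU and memory combinations as understood at the time of writing and should be checked against current documentation. The 20% headroom factor is an assumption.

```python
import math

# Task-level CPU units mapped to allowed memory values in MiB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
    8192: list(range(16384, 61440 + 1, 4096)),
    16384: list(range(32768, 122880 + 1, 8192)),
}


def right_size(peak_cpu_units, peak_memory_mib, headroom=1.2):
    """Return the smallest (cpu, memory) pair covering peak usage times headroom."""
    need_cpu = math.ceil(peak_cpu_units * headroom)
    need_mem = math.ceil(peak_memory_mib * headroom)
    for cpu in sorted(FARGATE_SIZES):
        if cpu < need_cpu:
            continue
        for mem in FARGATE_SIZES[cpu]:
            if mem >= need_mem:
                return cpu, mem
    raise ValueError("no Fargate task size covers the requested resources")
```

For example, a service that peaks at 700 CPU units and 1,500 MiB under load would be placed on a 1024-unit, 2048 MiB task with these settings.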

Auto scaling in Fargate provided Infinitium with the ability to effortlessly handle fluctuations in transaction volumes. As demand increased, Fargate automatically launched more tasks across multiple Availability Zones (AZs), designed to help the system handle peak traffic periods. This automated scaling process improved fault tolerance and high availability, all without needing manual intervention from the team.
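
The scaling behavior can be approximated with a proportional calculation, which is the core idea behind target tracking. This is a simplified sketch: the managed scaling policy also applies alarms and cooldown periods that are not modeled here, and the metric, target, and bounds are assumed values.

```python
import math


def desired_task_count(current_tasks, metric_value, target_value, min_tasks, max_tasks):
    """Scale the task count in proportion to how far the metric is from its target."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    proportional = math.ceil(current_tasks * metric_value / target_value)
    # Clamp to the configured minimum and maximum capacity.
    return max(min_tasks, min(max_tasks, proportional))
```

With four tasks running at 90% average CPU against a 60% target, the calculation asks for six tasks. The minimum and maximum bounds keep the service from scaling to zero or growing without limit.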

In terms of task scheduling, while Fargate doesn’t offer direct control over strategies such as binpacking or custom placement, it inherently distributes tasks across multiple AZs. This automatic task spreading enhances both reliability and fault tolerance, which is designed to help Infinitium’s microservices remain highly available without the need for detailed custom configuration.

Deployment strategy

For FDS 2.0, a rolling update deployment strategy was employed to help facilitate seamless application updates without service interruptions. Rolling updates involve gradually replacing old versions of the application with new ones, allowing the service to remain operational throughout the update process. A key part of this strategy is the use of min/max percent parameters, which control the speed and safety of deployments. The minimum healthy percent parameter keeps a certain percentage of the application operational during the update, while the maximum percent controls how many new tasks can be launched above the desired count to facilitate faster updates.

This approach allows for a smooth transition between application versions, which is designed to prevent downtime while the service is updated. By controlling the deployment speed with these parameters, Infinitium optimized the process to minimize the risk of disruptions while helping to maintain high availability during updates.
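
The effect of the two parameters is easiest to see as arithmetic on the desired task count. The sketch below computes the resulting bounds, with the lower bound rounded up and the upper bound rounded down as Amazon ECS does. The example percentages are assumptions, not Infinitium's settings.

```python
import math


def rolling_update_bounds(desired_count, minimum_healthy_percent, maximum_percent):
    """Return (minimum running tasks, maximum total tasks) during a rolling update."""
    min_running = math.ceil(desired_count * minimum_healthy_percent / 100)
    max_total = math.floor(desired_count * maximum_percent / 100)
    return min_running, max_total
```

For a service with four desired tasks, a minimum healthy percent of 100 and a maximum percent of 200 means four tasks always stay running while up to four new ones start alongside them. Settings of 50 and 150 let the deployment stop two old tasks early and cap the total at six.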

Service connectivity

Infinitium’s FDS 2.0 comprises numerous microservices that need efficient communication. To facilitate this, Infinitium implemented Amazon ECS Service Connect, which provides a managed solution for service-to-service communication by integrating service discovery and a service mesh within Amazon ECS. This allows services to be referenced within namespaces independently of VPC DNS configurations, while also offering standardized metrics and logs for monitoring.

Service Connect is designed to facilitate secure, low-latency communication between the microservices over an internal network, helping to avoid the latency and security issues associated with internet-based communication. It automatically manages IP address assignments, which enables seamless service discovery and communication between microservices. As new instances are added or removed, Service Connect dynamically updates DNS records and load balancers, maintaining consistent connectivity and reliable communication throughout the system.

The configuration for Service Connect used by Infinitium included the following:

Figure 2. FDS 2.0 Amazon ECS Service Connect architecture.
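
For readers who have not seen one, a Service Connect configuration on an Amazon ECS service has roughly the following shape. The field names follow the `serviceConnectConfiguration` structure of the Amazon ECS API, but the namespace, port name, and DNS name are placeholders and are not taken from Infinitium's setup.

```python
def service_connect_config(namespace, port_name, dns_name, port):
    """Build a Service Connect block that exposes one named port to other services."""
    return {
        "enabled": True,
        "namespace": namespace,
        "services": [
            {
                # Must match a named port mapping in the task definition.
                "portName": port_name,
                "discoveryName": dns_name,
                # The name and port that client services use to reach this one.
                "clientAliases": [{"port": port, "dnsName": dns_name}],
            }
        ],
    }


# Placeholder values for illustration only.
config = service_connect_config("fds.internal", "http", "rules-engine", 8080)
```

Other services in the same namespace could then reach this one at `rules-engine:8080` without managing their own service discovery.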

Achievements of architecture optimization

The re-architecture of Infinitium’s FDS 2.0 on AWS cloud has resulted in significant improvements across multiple dimensions, demonstrating the transformative power of cloud-native solutions. By migrating to Amazon ECS and Fargate, Infinitium has achieved exceptional performance gains. The processing time for fraud detection was reduced from eight seconds to less than 400 milliseconds. This drastic reduction enables real-time fraud detection and response, which is critical in maintaining the integrity and security of financial transactions. The following are the performance testing results for the service.

| Label | #Samples | KO | Error % | Average (ms) | Min (ms) | Max (ms) | 90th pct (ms) | 95th pct (ms) | 99th pct (ms) | Throughput | Received (KB/sec) | Sent (KB/sec) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Total | 30090 | 0 | 0% | 497.3 | 73 | 11281 | 803 | 926 | 1234.99 | 100.34 | 223.54 | 711.58 |
| Transaction Without Profiling Request | 30090 | 0 | 0% | 497.3 | 73 | 11281 | 803 | 926 | 1234.99 | 100.34 | 223.54 | 711.58 |
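
The 95% figure in the title follows from the before and after processing times quoted in this section: eight seconds previously and an upper bound of 400 milliseconds afterwards. A short check of that arithmetic:

```python
def percent_reduction(before_ms, after_ms):
    """Percentage by which a duration shrank relative to its starting value."""
    return (before_ms - after_ms) / before_ms * 100


reduction = percent_reduction(8000, 400)
print(f"{reduction:.0f}% reduction")
```

Because 400 milliseconds is quoted as an upper bound, 95% is the minimum reduction implied by those two numbers.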

Optimizing container configurations and using AWS Graviton-based compute significantly improved cost-efficiency, resulting in a 20% reduction in costs while maintaining the same level of performance. AWS Graviton processors delivered better performance at a lower cost, which contributed to these overall savings.

The enhanced performance and reliability of FDS 2.0 have had a direct positive impact on customer satisfaction. Banks and financial institutions using Infinitium’s service can now offer their customers a faster, more secure, and seamless transaction experience. This improvement has strengthened customer trust and loyalty, positioning Infinitium as a reliable partner in financial security.

Infinitium’s migration to AWS cloud not only resolved the challenges associated with their on-premises infrastructure but also unlocked new levels of operational efficiency and scalability. The successful deployment of FDS 2.0 showcases the advantages of adopting AWS cloud-native services, providing a scalable, secure, and high-performance solution that meets the demands of the modern digital financial landscape.

Conclusion

Infinitium’s strategic migration to AWS cloud and the re-architecture of their FDS 2.0 have yielded substantial improvements in performance, scalability, cost-efficiency, security, and reliability. By using AWS Fargate with AWS Graviton, they reduced fraud detection processing time from eight seconds to less than 400 milliseconds, thus enabling real-time detection. The adoption of a microservices-based architecture with robust auto scaling and efficient service connectivity was designed to provide seamless and secure communication within their system. This transition not only resolved previous on-premises challenges but also enhanced customer satisfaction by providing a faster, more secure transaction experience, positioning Infinitium as a leader in financial fraud detection.
