
Blueprint of a transaction monitoring solution on top of AWS and a custom ML algorithm for money laundering detection

In the last post, I presented a possible solution on top of Microsoft Azure for real-time analytics on bank accounts, built from out-of-the-box services combined with a custom Machine Learning algorithm. In this post, we take a look at a similar approach using AWS services.
One of the key factors we took into consideration when deciding which AWS services to use was that they should be out-of-the-box services requiring minimal configuration.

Business scenario
Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor in real time the bank activities happening across subsidiaries and identify any suspect transactions or accounts.

Business problem
The bank lacks a solution that can aggregate all the audit data from the account activity and run a custom Machine Learning system on top of it. They have already done an internal audit across different subsidiaries and found a way to remove customer-specific information so that the data can be collected and analysed in one location.

Machine Learning Algorithm
The bank has already developed a custom ML algorithm that can detect and flag any account or transaction that looks suspect. The algorithm detects suspect transactions so reliably that, in specific situations, you decide to take automatic actions, such as suspending access to the account or delaying the transaction for a specific time interval for further investigation.

Technical challenges
The first challenge is to collect all the account activities in a central location, as a stream of data. The current number of activities per minute that need to be ingested is between 40,000 and 150,000. In the future, the growth rate is estimated at 15-20% per year, mainly because of the online platforms.
The second challenge is to build a solution that can apply the ML algorithm on a stream of data without being required to build a new data centre. The board approved a test period of 18 months, but they cannot commit for 5 years until they see that the solution works as promised.
The third challenge is to provide a more user-friendly dashboard that allows the security analytics team to interact with the system and drill down into the data more easily. The current dashboards cannot be customised much, and the team would like to be able to write queries in a more human-friendly way.
The last and biggest challenge is to aggregate the streams of data into a single data source that can be processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp.

AWS approach
Inside AWS, many services can be used in different combinations to solve a business problem. There are two important factors that we need to take into account at this moment.
The first one is related to the Machine Learning algorithm. Because it is a custom one, we need to be able to define and run the bank's specific algorithm. An AWS service like SageMaker can be used successfully to run a custom ML algorithm.
The second item is related to our operation and development costs. We should use, as much as possible, off-the-shelf services where the configuration and customisation steps are kept to a minimum.

AWS Blueprint
Inside each subsidiary, we already have a system that stores audit and log data in persistent storage. In parallel with this current system, we will add another component that can receive all the audit and log information that is relevant for us, pre-filter it and push it inside AWS.

The information that is relevant for us is pushed into AWS, landing inside an AWS Kinesis Data Stream. We use a single AWS Kinesis Data Stream because there is no out-of-the-box service that would allow us to merge multiple data streams. Another option would be to have a data stream for each subsidiary landing inside its own AWS Kinesis Data Stream and use AWS Lambda functions to forward the data to a main one, but it does not make sense in the current context.
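
Below is a minimal sketch, in Python with boto3, of how the subsidiary-side component could pre-filter an audit record and push it into the shared Kinesis Data Stream. The stream name, region and record fields are assumptions for illustration. Given the stated 40,000-150,000 activities per minute (roughly 700-2,500 per second), a handful of shards should be enough, assuming the standard Kinesis limit of about 1,000 records per second per shard.

import json

import boto3

# Hypothetical stream name and region; the real values depend on the bank's setup.
STREAM_NAME = "bank-activity-stream"
kinesis = boto3.client("kinesis", region_name="eu-west-1")

def push_activity(activity: dict) -> None:
    # Pre-filtering: keep only the fields that are relevant and already
    # stripped of customer-specific information.
    record = {
        "subsidiary": activity["subsidiary"],
        "account_hash": activity["account_hash"],
        "amount": activity["amount"],
        "currency": activity["currency"],
        "timestamp": activity["timestamp"],
    }
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        # Partitioning by subsidiary spreads the load across shards while
        # keeping the records of one subsidiary ordered.
        PartitionKey=record["subsidiary"],
    )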

From the data stream, the transaction data is persisted inside AWS S3. AWS S3 is used for this purpose because, at this moment in time, AWS SageMaker can consume content only from AWS S3. AWS SageMaker enables us, with almost zero infrastructure and configuration effort, to run our custom Machine Learning algorithm and analyse the existing transaction data.
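
As a hedged sketch, the custom algorithm could be packaged as a Docker image and started as a SageMaker training job that reads the transaction data from S3. The image URI, IAM role, bucket names and instance type below are placeholders, not values from the original solution.

import boto3

sagemaker = boto3.client("sagemaker", region_name="eu-west-1")

# Placeholder values for the ECR image that wraps the custom algorithm,
# the IAM role that SageMaker assumes, and the S3 locations.
TRAINING_IMAGE = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/aml-detector:latest"
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

sagemaker.create_training_job(
    TrainingJobName="aml-detection-run-001",
    AlgorithmSpecification={
        "TrainingImage": TRAINING_IMAGE,
        "TrainingInputMode": "File",
    },
    RoleArn=ROLE_ARN,
    InputDataConfig=[
        {
            "ChannelName": "transactions",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://bank-transactions-landing/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    # Whatever the algorithm writes as output ends up in this bucket,
    # which is the one we configure event notifications on below.
    OutputDataConfig={"S3OutputPath": "s3://bank-aml-alerts/"},
    ResourceConfig={
        "InstanceType": "ml.m4.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)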

The output of running this algorithm is pushed to a bucket inside AWS S3. To be able to get notifications when new alerts are available, we have event notifications configured on top of the bucket, which generate notifications that are pushed to AWS SQS. At this phase, we can choose between AWS SQS and AWS SNS based on our needs and preferences.
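
The wiring between the alerts bucket and the queue could look like the sketch below; the bucket and queue names are illustrative, and the SQS queue policy must already allow the S3 service to send messages to it.

import boto3

s3 = boto3.client("s3")

ALERTS_BUCKET = "bank-aml-alerts"
ALERT_QUEUE_ARN = "arn:aws:sqs:eu-west-1:123456789012:aml-alert-notifications"

s3.put_bucket_notification_configuration(
    Bucket=ALERTS_BUCKET,
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": ALERT_QUEUE_ARN,
                # Fire a notification whenever the ML job writes a new alert object.
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)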
From AWS SQS, we can get the notifications and persist them inside AWS DynamoDB for later analysis. The data persistence inside AWS DynamoDB is done using an AWS Lambda function that takes the message and persists it.
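
A minimal sketch of such a Lambda handler, assuming a hypothetical DynamoDB table named aml-alerts keyed by alert_id; the real table design and payload format would follow the bank's data model.

import boto3

dynamodb = boto3.resource("dynamodb")
alerts_table = dynamodb.Table("aml-alerts")

def handler(event, context):
    # Triggered by AWS SQS: persist each alert notification for later analysis.
    for record in event["Records"]:
        alerts_table.put_item(
            Item={
                "alert_id": record["messageId"],
                # Store the raw notification body; the dashboard can parse it later.
                "payload": record["body"],
                "received_at": record["attributes"]["SentTimestamp"],
            }
        )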
For cases when we can take automatic actions, AWS SQS can trigger another AWS Lambda function that makes a direct call to the subsidiary's data centre to trigger the action. Another approach would be to have the subsidiary subscribed to an AWS SQS queue or AWS SNS topic and notify it that automatic actions can be taken.
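
For the automatic actions, a sketch of the second Lambda function is shown below. The endpoint exposed by the subsidiary's data centre, the payload fields and the action names are assumptions; in practice the call would go over a secured, authenticated channel.

import json
import urllib.request

# Hypothetical endpoint exposed by the subsidiary's data centre.
SUBSIDIARY_ACTION_URL = "https://subsidiary.example.com/api/aml-actions"

def handler(event, context):
    # Triggered by AWS SQS when an alert allows an automatic action.
    for record in event["Records"]:
        alert = json.loads(record["body"])
        action = {
            "account_hash": alert["account_hash"],
            # For example "suspend_access" or "delay_transaction".
            "action": alert["recommended_action"],
        }
        request = urllib.request.Request(
            SUBSIDIARY_ACTION_URL,
            data=json.dumps(action).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        urllib.request.urlopen(request, timeout=10)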

The dashboard used by the security analysts can be hosted on AWS EC2 or AWS ECS. It is just a web application where different information is presented to the user. For reporting capabilities, AWS QuickSight can be used. Custom reports can be generated easily on top of the existing data, and the security analysts can drill down into the data with just a click.
To create a more interactive user experience, where the end user can interact with and navigate through the existing data, AWS Lex can be used. It enables us to create a powerful bot that the end user can ask questions in natural language and use to navigate the existing data.

As we can see, building such a solution on top of AWS can be done in different ways. Besides the one presented above, we could take another approach using Kafka and Elasticsearch. Keep in mind that our main focus was to define a blueprint of a solution where the customisation, configuration and operation effort is minimal.
