Blueprint of a transaction monitoring solution on top of AWS with a custom ML algorithm for money laundering detection
In the last post, I presented a possible solution on top of Microsoft Azure that can be used for real-time analytics on bank accounts, using out-of-the-box services and a custom Machine Learning algorithm. In this post, we take a look at a similar approach using AWS services.
One of the key factors that we took into consideration when deciding which AWS services to use was that they should be out-of-the-box services that require minimal configuration.
If you have already read the post where I presented the Azure solution, you can jump to the “AWS Approach” section.
Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor, in real time, the bank activities that are happening across subsidiaries and identify any suspect transactions or accounts.
The bank lacks a solution that can aggregate all the audit data from the account activity and run a custom Machine Learning system on top of it. They have already done an internal audit across the different subsidiaries and found a way to remove customer-specific information, so that the data can be collected and analysed in one location.
Machine Learning Algorithm
The bank has already developed a custom ML algorithm that can detect and mark any accounts or transactions that look suspect. The solution detects suspect transactions so well that, in specific situations, you decide to take automatic actions, like suspending access to the account or delaying the transaction for a specific time interval for further investigation.
The first challenge is to collect all the account activities in a central location, as a stream of data. The current number of activities that need to be ingested is between 40,000 and 150,000 per minute. In the future, the growth rate is estimated to be 15-20% per year, mainly because of the online platforms.
The second challenge is to build a solution that can apply the ML algorithm on a stream of data without having to build a new data centre. The board approved a test period of 18 months, but they cannot commit for 5 years until they see that the solution works as promised.
The third challenge is to provide a more user-friendly dashboard that would allow the security analytics team to interact with the system and drill down into the data more easily. The current solution has dashboards that offer little room for customisation, and the team would like to be able to write queries in a more human-friendly way.
The last big challenge is to aggregate the streams of data into a single data source that is processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp.
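To make the last challenge more concrete, here is a minimal Python sketch of merging per-region streams based on the timestamp. It assumes that each regional stream is already ordered by time and that timestamps are comparable across regions (for example, all in UTC). The Activity type and its fields are hypothetical and only serve the illustration.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Activity(NamedTuple):
    """Hypothetical shape of one account activity."""
    timestamp: float  # seconds since epoch, assumed to be UTC
    region: str
    payload: str


def merge_by_timestamp(*streams: Iterable[Activity]) -> Iterator[Activity]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda activity: activity.timestamp)


if __name__ == "__main__":
    europe = [Activity(1.0, "eu", "login"), Activity(4.0, "eu", "transfer")]
    asia = [Activity(2.0, "asia", "transfer"), Activity(3.0, "asia", "logout")]
    for activity in merge_by_timestamp(europe, asia):
        print(activity)
```

The merge is lazy, so it never needs to hold a full stream in memory. In a real system, late or out-of-order events would need extra handling that this sketch does not cover.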
AWS Approach
Inside AWS, many services can be used in different combinations to resolve a business problem. There are two important factors that we need to take into account at this moment.
The first one is related to the Machine Learning algorithm. Because the bank has a custom one, we need to be able to define and run their specific algorithm. An AWS service like SageMaker can be used to run a custom ML algorithm.
The second one is related to our operation and development costs. We should use off-the-shelf services as much as possible, where the configuration and customisation steps are kept to a minimum.
Inside each subsidiary, we already have a system that stores audit and log data inside persistent storage. In parallel with this current system, we will add another component that receives all the audit and log information that is relevant for us, pre-filters it and pushes it to AWS.
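A minimal sketch of such a component could look like the Python code below. The set of relevant event types, the "type" and "account_id" fields and the function names are assumptions made for this example. The client argument is expected to behave like a boto3 Kinesis client, whose put_records call accepts at most 500 records per request.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical set of event types considered relevant for monitoring.
RELEVANT_TYPES = {"transfer", "withdrawal", "deposit"}
MAX_BATCH = 500  # PutRecords accepts at most 500 records per call


def is_relevant(event: Dict[str, Any]) -> bool:
    """Pre-filter: keep only the events the central system cares about."""
    return event.get("type") in RELEVANT_TYPES


def to_record(event: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an event to the record shape expected by PutRecords."""
    return {
        "Data": json.dumps(event).encode("utf-8"),
        # Assumed anonymised account identifier, used to spread the load.
        "PartitionKey": str(event["account_id"]),
    }


def batches(records: Iterable[Dict[str, Any]], size: int = MAX_BATCH) -> Iterator[List[Dict[str, Any]]]:
    """Group records into lists of at most `size` items."""
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def push_events(client: Any, stream_name: str, events: Iterable[Dict[str, Any]]) -> int:
    """Send relevant events; return how many records the service rejected."""
    failed = 0
    relevant = (to_record(event) for event in events if is_relevant(event))
    for batch in batches(relevant):
        response = client.put_records(StreamName=stream_name, Records=batch)
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 installed, the client would be created with boto3.client("kinesis"). A production version would also retry the records reported as failed, which this sketch only counts.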
The information that is relevant for us is pushed to AWS, landing inside AWS Kinesis Data Streams. We use a single data stream because there is no out-of-the-box service that would allow us to merge multiple data streams. Another option would be to have one data stream per subsidiary and use AWS Lambda to forward the data to a main one, but it does not make sense in the current context.
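To get a feeling for how a single stream would have to be sized, the Python sketch below does a back-of-the-envelope estimate based on the volumes mentioned earlier (up to 150,000 activities per minute, growing 15-20% per year). It only considers the limit of 1,000 records per second per shard for writes. The 1 MB per second per shard limit is ignored because the size of an activity record is not known, so treat the result as a lower bound.

```python
import math

# Figures taken from the scenario described earlier in the post.
PEAK_EVENTS_PER_MINUTE = 150_000
GROWTH_RATES = (0.15, 0.20)

# Per-shard write limit of Kinesis Data Streams, in records per second.
SHARD_RECORDS_PER_SECOND = 1_000


def projected_peak_per_second(years: int, growth_rate: float) -> float:
    """Peak events per second after compounding the yearly growth."""
    return PEAK_EVENTS_PER_MINUTE * (1 + growth_rate) ** years / 60


def shards_needed(events_per_second: float) -> int:
    """Minimum number of shards, based on the record-count limit only."""
    return max(1, math.ceil(events_per_second / SHARD_RECORDS_PER_SECOND))


if __name__ == "__main__":
    for years in (0, 5):
        for rate in GROWTH_RATES:
            eps = projected_peak_per_second(years, rate)
            print(f"year {years}, growth {rate:.0%}: "
                  f"{eps:,.0f} events/s, {shards_needed(eps)} shard(s)")
```

At today's peak this gives 2,500 events per second, which fits in 3 shards. After 5 years at 20% growth, the estimate is 7 shards.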
The data from the stream is then stored inside AWS S3. AWS S3 is used for this purpose because, at this moment in time, AWS SageMaker can consume content only from AWS S3. AWS SageMaker enables us to run our custom Machine Learning algorithm and analyse existing transaction data with almost zero infrastructure and configuration effort.
The output of running this algorithm is pushed to a bucket inside AWS S3. To get notified when new alerts are available, we configure event notifications on the bucket, and the notifications are pushed to AWS SQS. At this phase, we have the option to use AWS SQS or AWS SNS, based on our needs and preferences.
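As an illustration, the event notification on the bucket could be described with a configuration like the one built by the Python sketch below. The function name, the "alerts/" prefix and the queue ARN are hypothetical. The dictionary follows the structure accepted by the boto3 S3 call put_bucket_notification_configuration.

```python
from typing import Any, Dict


def alert_notification_config(queue_arn: str, prefix: str = "alerts/") -> Dict[str, Any]:
    """Notify the given SQS queue whenever an object is created under `prefix`."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [{"Name": "prefix", "Value": prefix}]
                    }
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical ARN, only for illustration.
    config = alert_notification_config("arn:aws:sqs:eu-west-1:123456789012:alerts")
    print(config)
```

With boto3, the configuration would be applied with boto3.client("s3").put_bucket_notification_configuration(Bucket=..., NotificationConfiguration=config). The access policy of the queue also has to allow S3 to send messages to it.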
From AWS SQS, we can get the notifications and persist them inside AWS DynamoDB for later analysis. The data persistence inside AWS DynamoDB is done using an AWS Lambda function that takes each message and persists it.
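A minimal sketch of the logic inside such a function is shown below in Python. It assumes that S3 sends its notifications directly to the queue, so each SQS message body contains the S3 event as JSON. The item attributes (alert_id, bucket, key, event_time) are hypothetical, and the table argument is expected to behave like a boto3 DynamoDB Table resource.

```python
import json
from typing import Any, Dict, Iterator
from urllib.parse import unquote_plus


def alerts_from_sqs_event(event: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield one item per S3 object referenced in the SQS messages."""
    for message in event.get("Records", []):
        body = json.loads(message["body"])
        # Bodies without "Records" (for example the test event that S3
        # sends when the notification is configured) are skipped.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys are URL-encoded inside S3 event notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            yield {
                "alert_id": f"{bucket}/{key}",
                "bucket": bucket,
                "key": key,
                "event_time": record["eventTime"],
            }


def persist_alerts(event: Dict[str, Any], table: Any) -> int:
    """Store every alert notification in the table; return how many."""
    count = 0
    for item in alerts_from_sqs_event(event):
        table.put_item(Item=item)
        count += 1
    return count
```

Inside a real Lambda function, the handler would call persist_alerts(event, table), where table comes from boto3.resource("dynamodb").Table(...) with the name of the alerts table.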
For cases when we can take automatic actions, AWS SQS can trigger another AWS Lambda function that makes a direct call to the subsidiary data centre to trigger the action. Another approach would be to have the subsidiary subscribe to AWS SQS or AWS SNS and receive a notification that an automatic action can be taken.
The dashboard that is used by the security analysts can be hosted on AWS EC2 or AWS ECS. It is just a web application where different information is provided to the user. For reporting capabilities, AWS QuickSight can be used. Custom reports can be generated easily on top of existing data, and the security analyst can drill down inside the data with just a click.
To create a more interactive user experience, where the end user can interact with and navigate inside existing data, we can use AWS Lex. It enables us to create a powerful bot that the end user can ask questions in natural language in order to navigate inside existing data.
As we can see, building such a solution on top of AWS can be done in different ways. Besides the one presented above, we could have another approach using Kafka and Elasticsearch. Keep in mind that our main focus was to define a blueprint of a solution where the customisation, configuration and operation effort is minimal.