Blueprint of a transaction monitoring solution on top of AWS and a custom ML algorithm for money laundering detection
In the last post, I presented a possible solution on top of
Microsoft Azure that can be used for real-time analytics on bank accounts,
built from out-of-the-box services combined with a custom Machine Learning
algorithm. In this post, we take a look at a similar approach using AWS services.
One of the key factors that we took into consideration when
we decided which AWS services to use was that they should involve minimal
configuration and be out-of-the-box services.
If you already read the post where I presented the Azure solution, you can jump directly to the "AWS approach" section.
Business scenario
Imagine that you are working for a bank that has subsidiaries
in multiple regions around the world. You want to develop a system that can
monitor in real time the bank activities that are happening across subsidiaries
and identify any suspect transactions or accounts.
Business problem
The bank lacks a solution that can aggregate all the audit
data from the account activity and run a custom Machine Learning system on top
of it. They already did an internal audit across different subsidiaries and
found a way to remove customer-specific information so that the data can be
collected and analysed in one location.
Machine Learning Algorithm
The bank already developed a custom ML algorithm that can
detect and mark any account or transaction that looks suspect. The solution
detects suspect transactions so well that, in specific situations, automatic
actions can already be taken, like suspending access to the account or delaying
the transaction for a specific time interval for further investigation.
Technical challenges
The first challenge is to collect all the account
activities in a central location, as a stream of data. The current number of
activities per minute that needs to be ingested is between 40,000 and 150,000.
In the future, the growth rate is estimated at 15-20% per year, mainly
because of the online platforms.
The second challenge is to build a solution that can apply
the ML algorithm on a stream of data without being required to build a new data
centre. The board approved a test period of 18 months, but they cannot commit
for 5 years until they see that the solution is working as promised.
The third challenge is to provide a more
user-friendly dashboard that would allow the security analytics team to
interact with the system more easily and to drill down into the data.
The current solution has dashboards that cannot be customised much,
and the team would like to be able to write queries in a more human-friendly way.
The last and biggest challenge is to aggregate the streams of
data into only one data source that can be processed by the ML algorithm. Each
country or region produces its own data stream, which needs to be merged with
the others based on the timestamp.
AWS approach
Inside AWS, many services can be used in different
combinations to solve a business problem. There are two important factors
that we need to take into account at this moment.
The first one is related to the Machine Learning algorithm.
Having a custom one, we need to be able to define and run the bank's specific
algorithm. An AWS service like SageMaker can be used with success to run a
custom ML algorithm.
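To make the idea more concrete, here is a minimal sketch of how a custom algorithm, packaged as a Docker image, could be launched as a SageMaker training job through boto3. All the names below (image URI, role ARN, bucket, instance size) are placeholder assumptions, not values from a real setup.

    import boto3

    sagemaker = boto3.client("sagemaker", region_name="eu-west-1")

    sagemaker.create_training_job(
        TrainingJobName="aml-detection-run-001",
        AlgorithmSpecification={
            # Hypothetical ECR image that wraps the bank's custom ML algorithm.
            "TrainingImage": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/aml-algo:latest",
            "TrainingInputMode": "File",
        },
        RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        InputDataConfig=[{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://aml-transactions/landing/",
            }},
        }],
        OutputDataConfig={"S3OutputPath": "s3://aml-transactions/model/"},
        ResourceConfig={
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )

SageMaker pulls the image, mounts the S3 data and runs the container, so the bank's team maintains only the algorithm code, not the infrastructure around it.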
The second item is related to our operation and development
costs. We should use, as much as possible, services that are off the shelf,
where the configuration and customisation steps are kept to a minimum.
AWS Blueprint
Inside each subsidiary, we already have a system that stores
audit and log data inside persistent storage. In parallel with this current
system, we will add another component that can receive all the audit and log
information that is relevant for us, pre-filter it and push it inside AWS.
The information that is relevant for us is pushed inside
AWS, landing inside an AWS Kinesis Data Stream. We use one instance of AWS Kinesis Data Stream because there is no out-of-the-box service that would allow us to merge multiple data streams. Another option would be to have a data stream for each subsidiary that lands inside its own AWS Kinesis Data Stream and use AWS Lambda functions to forward the data to a main one, but it does not make sense in the current context.
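A minimal sketch of what the subsidiary-side push could look like with boto3, assuming a hypothetical stream name ("bank-activity-stream") and a payload that carries an account id:

    import json
    import boto3

    kinesis = boto3.client("kinesis", region_name="eu-west-1")

    def push_activities(activities):
        """Pushes a batch of pre-filtered audit records into the stream."""
        records = [
            {
                "Data": json.dumps(activity).encode("utf-8"),
                # Partitioning by account id spreads the load across shards.
                "PartitionKey": activity["accountId"],
            }
            for activity in activities
        ]
        # put_records accepts at most 500 records per call.
        response = kinesis.put_records(
            StreamName="bank-activity-stream", Records=records
        )
        return response["FailedRecordCount"]

At 40,000 to 150,000 activities per minute, the number of shards on the stream would need to be sized accordingly.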
AWS S3 is used to persist the stream data because, at this moment in
time, AWS SageMaker can consume content only from AWS S3. A delivery mechanism
such as AWS Kinesis Data Firehose can land the content of the stream into S3
buckets. AWS SageMaker enables us, with almost zero infrastructure and
configuration effort, to run our custom Machine Learning algorithm and analyse
existing transaction data.
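A minimal sketch of such a delivery, assuming Kinesis Data Firehose drains the stream into an S3 bucket; all ARNs, names and buffer settings below are placeholders:

    import boto3

    firehose = boto3.client("firehose", region_name="eu-west-1")

    firehose.create_delivery_stream(
        DeliveryStreamName="bank-activity-to-s3",
        DeliveryStreamType="KinesisStreamAsSource",
        KinesisStreamSourceConfiguration={
            "KinesisStreamARN": "arn:aws:kinesis:eu-west-1:123456789012:stream/bank-activity-stream",
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseReadRole",
        },
        ExtendedS3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/FirehoseWriteRole",
            "BucketARN": "arn:aws:s3:::aml-transactions",
            "Prefix": "landing/",
            # Buffer hints control how often files land in S3.
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 60},
        },
    )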
The output of running this algorithm is pushed to a bucket
inside AWS S3. To be able to get notifications when new alerts are available, we
have event notifications configured on top of the bucket, which generate
notifications that are pushed to AWS SQS. At this phase, we have the option to
use AWS SQS or AWS SNS, based on our needs and preferences.
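A minimal sketch of wiring the bucket notifications to a queue, assuming hypothetical bucket and queue names and a queue policy that already allows S3 to send messages:

    import boto3

    s3 = boto3.client("s3")

    s3.put_bucket_notification_configuration(
        Bucket="aml-alerts",
        NotificationConfiguration={
            "QueueConfigurations": [{
                "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:aml-alert-queue",
                # Fire whenever the ML output writes a new alert object.
                "Events": ["s3:ObjectCreated:*"],
            }]
        },
    )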
From AWS SQS, we can get the notifications and persist them inside
AWS DynamoDB for later analysis. The data persistence inside AWS DynamoDB is done using an AWS Lambda function that takes the message and persists it.
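A minimal sketch of such a Lambda function, assuming a hypothetical table named "aml-alerts" and an alert message that carries "alertId", "accountId" and "timestamp" fields:

    import json
    import boto3

    table = boto3.resource("dynamodb").Table("aml-alerts")

    def handler(event, context):
        """Triggered by SQS; persists each alert message into DynamoDB."""
        for record in event["Records"]:
            alert = json.loads(record["body"])
            table.put_item(Item={
                "alertId": alert["alertId"],
                "accountId": alert["accountId"],
                "raisedAt": alert["timestamp"],
                # Keep the raw message for later analysis.
                "payload": record["body"],
            })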
For cases when we can take automatic actions, AWS SQS can
trigger another AWS Lambda function that makes a direct call to the
subsidiary's data centre to trigger an action. Another approach would be to
have the subsidiary registered to an AWS SQS queue or an AWS SNS topic and
notified that automatic actions can be taken.
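A minimal sketch of the direct-call variant, assuming a hypothetical HTTPS endpoint exposed by the subsidiary and an alert message that carries a recommended action:

    import json
    import urllib.request

    # Hypothetical endpoint exposed by the subsidiary's data centre.
    SUBSIDIARY_URL = "https://subsidiary.example.com/api/actions"

    def handler(event, context):
        """Triggered by SQS for alerts where automatic actions are allowed."""
        for record in event["Records"]:
            alert = json.loads(record["body"])
            action = {
                "accountId": alert["accountId"],
                # e.g. suspend access to the account or delay the transaction.
                "action": alert["recommendedAction"],
            }
            request = urllib.request.Request(
                SUBSIDIARY_URL,
                data=json.dumps(action).encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request)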
The dashboard that is used by the security analyst can be
hosted inside AWS EC2 or AWS ECS. It is just a web application where
different information is presented to the user. For reporting capabilities, AWS
QuickSight can be used. Custom reports can be generated easily on top of
existing data, and the security analyst can drill down into the data with
just a click.
To be able to create a more interactive user experience, where
the end user can interact with and navigate inside existing data, we can use AWS Lex. It enables us to create a powerful bot that the end user can ask questions in natural language and navigate inside existing data with.
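A minimal sketch of how the dashboard could talk to such a bot through the Lex runtime API, assuming a hypothetical bot named "AmlAssistant" with a "prod" alias:

    import boto3

    lex = boto3.client("lex-runtime", region_name="eu-west-1")

    def ask(question, user_id):
        """Sends a natural-language question from the dashboard to the bot."""
        response = lex.post_text(
            botName="AmlAssistant",
            botAlias="prod",
            userId=user_id,
            inputText=question,
        )
        return response["message"]

    # Example: ask("Show suspect transactions from the last hour", "analyst-7")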
As we can see, building such a solution on top of AWS can be
done in different ways. Besides the one presented above, we could have another
approach using Kafka and Elasticsearch. Keep in mind that our main focus was to
define a blueprint of a solution where the customisation, configuration and
operation part is minimal.