Blueprint of a transaction monitoring solution on top of Azure and a custom ML algorithm for money laundering detection
In this post, we talk about how we can use the cloud to do real-time analytics of bank account activities using our own custom Machine Learning algorithm.
Business scenario
Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor, in real time, the bank activities happening across subsidiaries and identify any suspicious transactions or accounts.
Machine Learning Algorithm
The bank has already developed a custom ML algorithm that can detect and flag any account or transaction that looks suspicious. The algorithm performs so well that, in specific situations, you decide to take automatic actions, like suspending access to the account or delaying the transaction for a specific time interval pending further investigation.
Business problem
The bank lacks a solution that can aggregate all the audit data from the account activity and run the custom Machine Learning system on top of it. An internal audit across the different subsidiaries has already identified a way to remove customer-specific information, so that the data can be collected and analysed in one location.
Technical challenges
The first challenge is to collect all the account activities in a central location, as a stream of data. The current number of activities that needs to be ingested is between 40,000 and 150,000 per minute. In the future, the growth rate is estimated at 15-20% per year, mainly driven by the online platforms.
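To get a feel for what those numbers mean for capacity planning, here is a minimal sketch that compounds the stated 15-20% yearly growth over a 5-year horizon (the horizon is taken from the commitment period discussed below; the function name is our own):

```python
def projected_rate(current_per_minute: int, yearly_growth: float, years: int) -> int:
    """Compound the per-minute ingestion rate by the yearly growth factor."""
    return round(current_per_minute * (1 + yearly_growth) ** years)

peak_now = 150_000  # current peak activities per minute, from the text

low = projected_rate(peak_now, 0.15, 5)   # 15% yearly growth over 5 years
high = projected_rate(peak_now, 0.20, 5)  # 20% yearly growth over 5 years
print(low, high)  # 301704 373248
```

In other words, the platform should be sized for roughly 2x to 2.5x today's peak within the trial-to-commitment window, which is exactly the kind of elasticity that makes a cloud approach attractive here.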
The second challenge is to build a solution that can apply the ML algorithm on a stream of data without requiring a new data centre. The board approved a test period of 18 months, but it will not commit for 5 years until it sees that the solution works as promised.
The third challenge is to provide a more user-friendly dashboard that would allow the security analyst team to interact with the system and drill down into the data more easily. The current solution has dashboards that cannot be customised much, and the team would like to be able to write queries in a more human-friendly way.
The last and biggest challenge is to aggregate the streams of data into a single data source to be processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp.
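Conceptually, the timestamp-based merge is the classic k-way merge of sorted streams. A minimal sketch using Python's `heapq.merge`, assuming each regional stream already arrives ordered by timestamp (the field names are illustrative):

```python
import heapq

# Each subsidiary emits activities sorted by timestamp; heapq.merge combines
# the already-sorted streams into one chronological stream without loading
# everything into memory.
stream_eu = [
    {"ts": "2019-01-01T10:00:00Z", "region": "EU", "account": "A1"},
    {"ts": "2019-01-01T10:00:05Z", "region": "EU", "account": "A2"},
]
stream_us = [
    {"ts": "2019-01-01T10:00:02Z", "region": "US", "account": "B1"},
]

merged = list(heapq.merge(stream_eu, stream_us, key=lambda a: a["ts"]))
print([a["region"] for a in merged])  # ['EU', 'US', 'EU']
```

In the Azure blueprint below, this merge responsibility sits in the managed services rather than in our own code, but the ordering guarantee it provides is the same.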
Cloud approach
An approach for such a solution would be to use Microsoft Azure or AWS and build a platform on top of the cloud. There are no upfront costs, and both offer many SaaS services that enable fast implementation. The most significant advantage in this context is that the bank has already received the green light from an external audit company to push content outside subsidiary regions, as long as the customer identification information is removed.
Solution overview on top of Microsoft Azure
In the next section, let's take a look at a solution built on top of Microsoft Azure, where we try to identify the key services that enable us to build a platform that scales easily, requires a low initial investment, and keeps operation costs as low as possible.
Azure Blueprint
Inside each subsidiary, we already have a local system that can provide us with a stream of audit and log information. This is our entry point for collecting the account activities. Inside each subsidiary datacenter, a custom component needs to be installed that will:
1. Collect relevant activities
2. Remove customer identification information
3. Push the stream of data inside Microsoft Azure
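Step 2 is what keeps the platform compliant with the audit conditions, so it is worth sketching. A minimal example of the anonymization pass, assuming illustrative field names: direct identifiers are dropped, and the account number is replaced with a stable salted pseudonym so that activities from the same account can still be correlated downstream:

```python
import hashlib

# Hypothetical field names for this sketch; the real schema would come from
# the subsidiary's audit/log system.
IDENTIFYING_FIELDS = {"customer_name", "address", "national_id"}

def anonymize(activity: dict, salt: str = "per-deployment-secret") -> dict:
    """Strip direct identifiers and pseudonymize the account id."""
    cleaned = {k: v for k, v in activity.items() if k not in IDENTIFYING_FIELDS}
    # Same account + same salt -> same token, so correlation still works,
    # but the real account number never leaves the subsidiary.
    token = hashlib.sha256((salt + activity["account_id"]).encode()).hexdigest()[:16]
    cleaned["account_id"] = token
    return cleaned

raw = {"account_id": "RO-12345", "customer_name": "J. Doe",
       "national_id": "XX99", "amount": 1200.0, "ts": "2019-01-01T10:00:00Z"}
print(anonymize(raw))
```

The salt should be a per-deployment secret kept inside the bank; without it, the pseudonyms cannot be reversed by dictionary attacks on known account numbers.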
Because we have multiple subsidiaries that need to push content to the platform, the best approach is to use a dedicated Azure Event Hub for each of them. Each subsidiary has its own instance of Azure Event Hub where activities are pushed.
All the activities are collected in a central Azure Event Hub by an instance of Azure Event Grid. Azure Event Grid is capable of merging the streams of data and connecting them to external references stored inside Azure Storage. This gives us the flexibility to do pre-processing or transformations in the future and to connect other data sources without changing the platform architecture.
The main instance of Azure Event Hub is connected to Azure Stream Analytics, which becomes our central location where the real magic happens. Azure Stream Analytics allows us to connect to an Azure Machine Learning solution and analyse the activity stream using our custom algorithm hosted inside it.
The combination of Azure ML and Azure Stream Analytics enables us to have a system that can apply our ML algorithm on top of a stream of data without requiring any custom configuration or development. This is the most significant advantage Microsoft Azure offers our platform and is the differentiating factor.
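To make the data flow concrete, here is a conceptual stand-in for what this combination does for us: score each event in the stream and attach a confidence that the activity is suspect. The scoring rule below is invented purely for illustration; in the real platform the model lives in Azure ML and is invoked by Stream Analytics, not by our own code:

```python
# Invented toy scoring rule, standing in for the bank's custom ML model.
def score(activity: dict) -> float:
    """Return a confidence in [0, 1] that the activity is suspect."""
    return 0.99 if activity["amount"] > 10_000 else 0.01

def scored_stream(activities):
    """Lazily attach a suspect confidence to each event in the stream."""
    for a in activities:
        yield {**a, "suspect_confidence": score(a)}

events = [{"account": "A1", "amount": 50.0},
          {"account": "A2", "amount": 25_000.0}]
print([e["suspect_confidence"] for e in scored_stream(events)])  # [0.01, 0.99]
```

The generator shape matters: scoring happens per event as the stream flows, which is exactly the model Stream Analytics imposes, rather than batch scoring over data at rest.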
The number of activities that are marked as suspect is pretty low, under 0.05%. We need to take into account that some of them will land in a repository where the security analyst team will review them, while others will need to trigger specific actions.
A good candidate is Azure Service Bus, which would allow us to hook up multiple consumers. Consumers that trigger automated actions can filter the output from Azure Stream Analytics and accept only suspicious activities where the confidence is above a specific threshold, so that automatic actions can be taken.
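The routing rule itself is simple enough to sketch. The threshold value and field names below are invented examples; in practice the threshold would come from the bank's risk policy and each path would be a Service Bus subscription filter rather than Python code:

```python
# Invented example threshold; set by risk policy in a real deployment.
AUTO_ACTION_THRESHOLD = 0.95

def route(activity: dict) -> str:
    """Send high-confidence suspects to automation, the rest to analysts."""
    if activity["suspect_confidence"] >= AUTO_ACTION_THRESHOLD:
        return "auto-action"
    return "analyst-review"

flagged = [
    {"account": "A7", "suspect_confidence": 0.97},
    {"account": "B2", "suspect_confidence": 0.61},
]
print([route(a) for a in flagged])  # ['auto-action', 'analyst-review']
```

Keeping the split as a declarative rule on the message confidence means new consumers can be added later with their own thresholds, without touching the producers.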
Information from Azure Service Bus is pushed into two different systems using Azure Functions. The first is a dedicated subscription consumed by an Azure Function that takes automatic actions. This function makes calls to a REST API exposed by each subsidiary that is used to suspend accounts or delay transactions.
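As a sketch of what that function would assemble before making the call, here is the request construction step in isolation. The endpoint shape, URL, and field names are assumptions for illustration, not a documented contract with the subsidiaries; the actual HTTP send is omitted:

```python
import json

# Hypothetical per-subsidiary API base URLs; illustrative only.
SUBSIDIARY_API = {"EU": "https://eu.example-bank.internal/api/accounts"}

def suspension_request(activity: dict) -> tuple[str, bytes]:
    """Build the (url, body) pair for a suspend-account call."""
    url = f"{SUBSIDIARY_API[activity['region']]}/{activity['account_id']}/suspend"
    body = json.dumps({
        "reason": "ml-suspect-activity",
        "confidence": activity["suspect_confidence"],
    }).encode()
    return url, body

url, body = suspension_request(
    {"region": "EU", "account_id": "a1b2c3", "suspect_confidence": 0.97})
print(url)
```

Separating request construction from transmission keeps the logic testable, and it is also where retry and idempotency concerns would attach when the subsidiary API is temporarily unreachable.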
The second Azure Function consumes content from another Azure Service Bus subscription and pushes it to a repository such as Azure Cosmos DB, which is used by Power BI to generate custom reports and a monitoring dashboard for the security analysts. On top of this, a bot developed with Azure Bot Service is used by the security analyst team to query the storage and extract insights related to different items.
As we can see, from the infrastructure and services perspective, the proposed solution uses services offered in SaaS and PaaS models. With this approach, the initial investment is kept as low as possible, and most of the issues related to scalability, operations and similar activities are solved out of the box.