
Blueprint of a transaction monitoring solution on top of Azure and a custom ML algorithm for money laundering detection



Business scenario
Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor, in real time, the banking activities happening across subsidiaries and identify any suspect transactions or accounts.

Machine Learning Algorithm
The bank has already developed a custom ML algorithm that can detect and flag any account or transaction that looks suspect. The algorithm detects suspect transactions so reliably that, in specific situations, you decide to take automatic actions, such as suspending access to the account or delaying the transaction for a given time interval pending further investigation.

Business problem
The bank lacks a solution that can aggregate all the audit data from account activity and run the custom Machine Learning system on top of it. An internal audit across the different subsidiaries has already identified a way to remove customer-specific information so that the data can be collected and analysed in one location.

Technical challenges
The first challenge is to collect all the account activities in a central location, as a stream of data. The current volume that needs to be ingested is between 40,000 and 150,000 activities per minute. The growth rate is estimated at 15-20% per year, driven mainly by the online platforms.
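To get a feel for what these numbers mean for the ingestion layer, here is a quick back-of-envelope calculation; the 1 KB average payload per activity is an assumption made for illustration, not a figure provided by the bank.

using System;

// Rough capacity estimate for the ingestion layer; numbers are illustrative.
const double peakActivitiesPerMinute = 150_000;  // upper bound reported today
const double avgPayloadKb = 1.0;                 // assumed average event size
const double yearlyGrowth = 0.20;                // upper bound of the 15-20% estimate
const int years = 5;

double peakPerSecond = peakActivitiesPerMinute / 60;             // ~2,500 events/s
double peakMbPerSecond = peakPerSecond * avgPayloadKb / 1024;    // ~2.4 MB/s
double peakInFiveYears = peakPerSecond * Math.Pow(1 + yearlyGrowth, years);

Console.WriteLine($"Peak today: {peakPerSecond:N0} events/s (~{peakMbPerSecond:N1} MB/s)");
Console.WriteLine($"Peak in {years} years: {peakInFiveYears:N0} events/s");

Even after five years of 20% growth, the peak stays in the range of a few thousand events per second and a few MB/s, which is well within what a managed event-streaming service can absorb without building a new data centre.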
The second challenge is to build a solution that can apply the ML algorithm to a stream of data without requiring a new data centre. The board approved a test period of 18 months, but it cannot commit for 5 years until it sees that the solution works as promised.
The third challenge is to provide a more user-friendly dashboard that would allow the security analysts team to interact with the system and drill down into the data more easily. The current dashboards offer little room for customisation, and the team would like to write queries in a more human-friendly way.
The last big challenge is to aggregate the streams of data into a single data source that can be processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp.

Cloud approach
An approach for such a solution would be to use Microsoft Azure or AWS and build the platform on top of the cloud. There are no upfront costs, and both offer a lot of SaaS services that enable fast implementation. The most significant advantage in this context is that the bank has already received the green light from an external audit company, allowing it to push content outside the subsidiary regions as long as the customer identification information is removed.

Solution overview on top of Microsoft Azure
In the next section, let’s take a look at a solution built on top of Microsoft Azure, where we try to identify the key services that enable us to build a platform that scales easily, requires a low initial investment and keeps the operational costs as low as possible.
Azure Blueprint
Inside each subsidiary, we already have a local system that can provide us with a stream of audit and log information. This is our entry point for collecting the account activities. Inside each subsidiary datacenter, a custom component needs to be installed that will:
     1. Collect relevant activities
     2. Remove customer identification information
     3. Push the stream of data to Microsoft Azure

Because we have multiple subsidiaries that need to push content to the platform, the best approach is to use a dedicated Azure Event Hub for each of them. Each subsidiary has its own instance of Azure Event Hub where activities are pushed.
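As a minimal sketch of the custom component running inside each subsidiary, the fragment below anonymises an activity and pushes it to that subsidiary's Event Hub using the Azure.Messaging.EventHubs SDK. The AccountActivity shape, the field names and the SHA-256 pseudonymisation are assumptions made for illustration; the real component would follow the bank's own data model and anonymisation rules.

using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs;
using Azure.Messaging.EventHubs.Producer;

// Illustrative event shape; the real schema is defined by the bank.
public record AccountActivity(string AccountId, string CustomerName, decimal Amount,
                              string Currency, DateTimeOffset Timestamp, string Subsidiary);

public static class ActivityPublisher
{
    // Replaces customer identification data with a one-way hash so that the event
    // can leave the subsidiary region (assumed anonymisation strategy).
    private static AccountActivity Anonymise(AccountActivity activity)
    {
        string pseudonym = Convert.ToHexString(
            SHA256.HashData(Encoding.UTF8.GetBytes(activity.AccountId)));
        return activity with { AccountId = pseudonym, CustomerName = string.Empty };
    }

    // Pushes a batch of anonymised activities to the subsidiary's own Event Hub.
    public static async Task PublishAsync(string connectionString, string eventHubName,
                                          AccountActivity[] activities)
    {
        await using var producer = new EventHubProducerClient(connectionString, eventHubName);
        using EventDataBatch batch = await producer.CreateBatchAsync();

        foreach (var activity in activities)
        {
            byte[] payload = JsonSerializer.SerializeToUtf8Bytes(Anonymise(activity));
            if (!batch.TryAdd(new EventData(payload)))
                throw new InvalidOperationException("Event too large for the current batch.");
        }

        await producer.SendAsync(batch);
    }
}

The important detail is that the anonymisation happens inside the subsidiary datacenter, so only pseudonymised data ever crosses the regional boundary.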
All the activities are collected into a central Azure Event Hub through an instance of Azure Event Grid. Azure Event Grid is capable of merging the streams of data and connecting them to external references stored inside Azure Storage. This gives us the flexibility to do pre-processing or transformation in the future and to connect other data sources without changing the platform architecture.

The main instance of Azure Event Hub is connected to Azure Stream Analytics, which becomes the central location where the real magic happens. Azure Stream Analytics allows us to connect to an Azure Machine Learning solution and analyse the activity stream using our custom algorithm hosted inside it.
The combination of Azure ML and Azure Stream Analytics enables us to have a system that applies our ML algorithm on top of a stream of data without requiring custom configuration or development. This is the most significant advantage Microsoft Azure offers to our platform and is the differentiating factor.

The share of activities that are marked as suspect is quite low, under 0.05%. We need to take into account that a part of them will land in a repository where the security analysts team will review them, while others will need to trigger specific actions.
A good candidate is Azure Service Bus, which would allow us to hook up multiple consumers. Consumers that trigger automated actions can filter the output from Azure Stream Analytics and accept only the suspicious activities where the confidence is over a specific threshold, so that automatic actions can be taken.
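One way to implement this routing, sketched below with the Azure.Messaging.ServiceBus administration client, is a topic with two subscriptions: one that receives every suspect activity for the analysts and one that, through a SQL filter, receives only the detections above a confidence threshold for automatic actions. The topic name, subscription names, Confidence property and the 0.9 threshold are assumptions made for illustration.

using System.Threading.Tasks;
using Azure.Messaging.ServiceBus.Administration;

public static class SuspectActivityTopology
{
    // Creates the topic and the two subscriptions that receive suspect activities
    // from Azure Stream Analytics (names and threshold are illustrative).
    public static async Task CreateAsync(string serviceBusConnectionString)
    {
        var admin = new ServiceBusAdministrationClient(serviceBusConnectionString);

        await admin.CreateTopicAsync("suspect-activities");

        // Every suspect activity goes to the analysts' review subscription.
        await admin.CreateSubscriptionAsync("suspect-activities", "analyst-review");

        // Only high-confidence detections reach the automatic-action subscription.
        await admin.CreateSubscriptionAsync(
            new CreateSubscriptionOptions("suspect-activities", "automatic-actions"),
            new CreateRuleOptions("HighConfidence", new SqlRuleFilter("Confidence >= 0.9")));
    }
}

Because the filter is applied at the subscription level, the automatic-action consumer never sees low-confidence detections, which keeps that code path simple and easy to audit.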
Information from Azure Service Bus is pushed into two different systems using Azure Functions. The first is a dedicated subscription consumed by an Azure Function that takes the automatic actions. This function calls a REST API exposed by each subsidiary, which is used to suspend accounts or delay transactions.
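A minimal sketch of that first function, using the in-process Azure Functions model with a Service Bus trigger, could look like the one below. The message shape, the way the subsidiary endpoint is derived and the /accounts/{id}/suspend route are assumptions; each subsidiary would expose its own API contract, and the base addresses would come from configuration.

using System.Net.Http;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

// Illustrative message shape produced by the scoring pipeline.
public record SuspectActivity(string Subsidiary, string AccountId, double Confidence);

public static class AutomaticActionFunction
{
    private static readonly HttpClient Http = new HttpClient();

    [FunctionName("TakeAutomaticAction")]
    public static async Task Run(
        [ServiceBusTrigger("suspect-activities", "automatic-actions",
            Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        var activity = JsonSerializer.Deserialize<SuspectActivity>(message);

        // Hypothetical per-subsidiary REST API used to suspend the flagged account.
        var response = await Http.PostAsync(
            $"https://{activity.Subsidiary}.internal.bank/api/accounts/{activity.AccountId}/suspend",
            content: null);
        response.EnsureSuccessStatusCode();

        log.LogInformation("Suspended account {AccountId} with confidence {Confidence}",
            activity.AccountId, activity.Confidence);
    }
}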
The second Azure Function consumes content from another Azure Service Bus subscription and pushes it to a repository like Azure Cosmos DB, which is used by Power BI to generate custom reports and a monitoring dashboard for the security analysts. On top of this, a bot developed with Azure Bot Service is used by the security analysts team to query the storage and extract insights related to different items.
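The second function could follow the same pattern, reading from the analysts' subscription and writing the document into Cosmos DB so that Power BI and the monitoring dashboard can query it. The sketch below uses the Cosmos DB .NET SDK directly; the database and container names, the document shape and the partitioning by subsidiary are assumptions made for illustration.

using System;
using System.Text.Json;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Microsoft.Azure.WebJobs;

// Illustrative message shape produced by the scoring pipeline.
public record SuspectActivity(string Subsidiary, string AccountId, double Confidence);

// Illustrative document stored for the analysts and the Power BI reports.
public record SuspectActivityDocument(string id, string Subsidiary, string AccountId,
                                      double Confidence, DateTimeOffset ReceivedAt);

public static class StoreForReviewFunction
{
    private static readonly CosmosClient Cosmos =
        new CosmosClient(Environment.GetEnvironmentVariable("CosmosConnection"));

    [FunctionName("StoreSuspectActivity")]
    public static async Task Run(
        [ServiceBusTrigger("suspect-activities", "analyst-review",
            Connection = "ServiceBusConnection")] string message)
    {
        var activity = JsonSerializer.Deserialize<SuspectActivity>(message);

        // Assumed container partitioned by /Subsidiary inside a "monitoring" database.
        Container container = Cosmos.GetContainer("monitoring", "suspect-activities");

        var document = new SuspectActivityDocument(
            id: Guid.NewGuid().ToString(),
            Subsidiary: activity.Subsidiary,
            AccountId: activity.AccountId,
            Confidence: activity.Confidence,
            ReceivedAt: DateTimeOffset.UtcNow);

        await container.CreateItemAsync(document, new PartitionKey(document.Subsidiary));
    }
}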

Conclusion
From the infrastructure and services perspective, the proposed solution uses services offered in a SaaS and PaaS model. With this approach, the initial investment is kept as low as possible, and most of the issues related to scalability, operations and similar activities are solved out of the box.
