
Blueprint of a transaction monitoring solution for money laundering detection, on top of AWS and a custom ML algorithm

In the last post, I presented a possible solution on top of Microsoft Azure for real-time analytics on bank accounts, built from out-of-the-box services combined with a custom Machine Learning algorithm. In this post, we take a look at a similar approach using AWS services.
One of the key factors we took into consideration when deciding which AWS services to use was minimal configuration and a preference for out-of-the-box services.

Business scenario
Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor, in real time, the bank activities happening across subsidiaries and identify any suspect transactions or accounts.

Business problem
The bank lacks a solution that can aggregate all the audit data from account activity and run a custom Machine Learning system on top of it. An internal audit across the different subsidiaries has already established how customer-specific information can be removed, so that the data can be collected and analysed in one location.

Machine Learning Algorithm
The bank has already developed a custom ML algorithm that can detect and flag any account or transaction that looks suspect. The solution detects suspect transactions so well that, in specific situations, automatic actions are taken, such as suspending access to the account or delaying the transaction for a given time interval pending further investigation.

Technical challenges
The first challenge is to collect all the account activities in a central location, as a stream of data. The current number of activities that needs to be ingested is between 40,000 and 150,000 per minute. In the future, the growth rate is estimated at 15-20% per year, mainly because of the online platforms.
The second challenge is to build a solution that can apply the ML algorithm to a stream of data without requiring a new data centre. The board approved a test period of 18 months, but they cannot commit for 5 years until they see that the solution works as promised.
The third challenge is to provide a more user-friendly dashboard that allows the security analytics team to interact with the system and drill down into the data more easily. The dashboards of the current solution cannot be customised much, and the team would like to be able to write queries in a more human-friendly way.
The last and biggest challenge is to aggregate the streams of data into a single data source that can be processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp.
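To make the last challenge concrete, here is a minimal sketch of merging per-region streams by timestamp in plain Python, assuming each subsidiary's stream is already sorted (true for append-only audit logs). The record fields are illustrative, not from the bank's actual schema.

```python
import heapq

def merge_streams(*streams):
    """Merge several per-region activity streams into one, ordered by
    the 'timestamp' field of each record. Each input stream must
    already be sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda record: record["timestamp"]))

# Illustrative records from two hypothetical subsidiaries.
emea = [{"timestamp": 1, "region": "EMEA"}, {"timestamp": 4, "region": "EMEA"}]
apac = [{"timestamp": 2, "region": "APAC"}, {"timestamp": 3, "region": "APAC"}]

merged = merge_streams(emea, apac)
```

`heapq.merge` is lazy and never materialises the full inputs, which matters at 150,000 events per minute; in the blueprint below this merge is avoided entirely by writing everything into one stream.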

AWS approach
Inside AWS, many services can be used in different combinations to solve a business problem. There are two important factors that we need to take into account at this moment.
The first one is related to the Machine Learning algorithm. Having a custom one, we need to be able to define and run this specific algorithm ourselves. An AWS service like SageMaker can be used successfully to run custom ML algorithms.
The second item is related to our operation and development costs. We should use off-the-shelf services as much as possible, where the configuration and customisation effort is kept to a minimum.

AWS Blueprint
Inside each subsidiary, we already have a system that stores audit and log data in persistent storage. In parallel with this current system, we will add another component that receives all the audit and log information that is relevant for us, pre-filters it and pushes it inside AWS.
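The pre-filtering step could look like the sketch below: keep only the fields the central pipeline needs and drop customer-identifying information before anything leaves the subsidiary. The event types and field names are assumptions for illustration, not the bank's real schema.

```python
# Fields the central ML pipeline is assumed to need; everything else,
# including customer-identifying data, is dropped at the subsidiary.
RELEVANT_FIELDS = {"transaction_id", "timestamp", "amount", "currency",
                   "account_hash", "subsidiary", "channel"}

# Event types assumed to be relevant for transaction monitoring.
RELEVANT_TYPES = {"transfer", "withdrawal", "deposit"}

def prefilter(event):
    """Return an anonymised copy of the audit event, or None if the
    event type is not relevant for transaction monitoring."""
    if event.get("type") not in RELEVANT_TYPES:
        return None
    return {k: v for k, v in event.items() if k in RELEVANT_FIELDS}
```

Running the filter at the edge also reduces the ingestion volume, which directly lowers the Kinesis shard count needed later.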

The information that is relevant for us is pushed inside AWS, landing in an AWS Kinesis Data Stream. We use a single instance of AWS Kinesis Data Stream because there is no out-of-the-box service that would allow us to merge multiple data streams. Another option would be a separate data stream per subsidiary, with AWS Lambda forwarding the data into a main stream, but it does not make sense in the current context.
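Producing into the single stream could be sketched as below. The stream name is a placeholder; partitioning by subsidiary is one possible choice that keeps each region's records ordered within their shard.

```python
import json

STREAM_NAME = "bank-activity"  # illustrative stream name

def build_record(event):
    """Shape one pre-filtered audit event into a Kinesis PutRecord
    request. Partitioning by subsidiary keeps each region's records
    ordered within their shard."""
    return {
        "StreamName": STREAM_NAME,
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": event["subsidiary"],
    }

def push_event(event):
    import boto3  # AWS SDK for Python; needs credentials configured
    boto3.client("kinesis").put_record(**build_record(event))
```

In practice `put_records` (the batch variant) would be preferred at 40,000+ events per minute, to stay within per-call limits and reduce request overhead.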

AWS S3 is used as the landing zone for the stream data because, at this moment in time, AWS SageMaker can consume content only from AWS S3. AWS SageMaker enables us, with almost zero infrastructure and configuration effort, to run our custom Machine Learning algorithm and analyse the existing transaction data.
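Running the bank's own algorithm on SageMaker means packaging it as a container and pointing a training job at the S3 data. The sketch below builds a `CreateTrainingJob` request; the image URI, role ARN, bucket names and instance sizing are all placeholders, not real resources.

```python
def training_job_request(job_name, image_uri, role_arn,
                         input_bucket, output_bucket):
    """Build a SageMaker CreateTrainingJob request for a custom
    algorithm packaged as a Docker image. All names are placeholders."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # the bank's custom ML container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{input_bucket}/transactions/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{output_bucket}/alerts/"},
        "ResourceConfig": {"InstanceType": "ml.m5.xlarge",
                           "InstanceCount": 1,
                           "VolumeSizeInGB": 50},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

def start_training(request):
    import boto3  # AWS SDK for Python; needs credentials configured
    boto3.client("sagemaker").create_training_job(**request)
```

Because SageMaker provisions and tears down the instances per job, this fits the board's 18-month trial: there is no long-lived infrastructure to commit to.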

The output of running this algorithm is pushed to a bucket inside AWS S3. To get notified when new alerts are available, we configure event notifications on top of the bucket, which generate notifications that are pushed to AWS SQS. At this phase, we have the option to use AWS SQS or AWS SNS, based on our needs and preferences.
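The bucket notification could be wired up as below: every object created under an `alerts/` prefix produces an event on an SQS queue. The queue ARN (including the account id) and the prefix are placeholders.

```python
# Placeholder queue ARN; in a real deployment this comes from the
# SQS queue you created, and the queue policy must allow S3 to send.
NOTIFICATION_CONFIG = {
    "QueueConfigurations": [{
        "QueueArn": "arn:aws:sqs:eu-west-1:123456789012:alert-queue",
        "Events": ["s3:ObjectCreated:*"],
        "Filter": {"Key": {"FilterRules": [
            {"Name": "prefix", "Value": "alerts/"},
        ]}},
    }]
}

def configure_bucket(bucket_name):
    import boto3  # AWS SDK for Python; needs credentials configured
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=NOTIFICATION_CONFIG,
    )
```

Swapping SQS for SNS here is just a matter of using `TopicConfigurations` with a topic ARN instead.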
From AWS SQS, we can get the notifications and persist them inside AWS DynamoDB for later analysis. The persistence into AWS DynamoDB is done using an AWS Lambda function that takes each message and persists it.
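That Lambda function could be as small as the sketch below. The table name is a placeholder, and the `table` parameter is injectable so the logic can be exercised without AWS.

```python
import json

def handler(event, context, table=None):
    """AWS Lambda entry point: persist each alert notification arriving
    from SQS into DynamoDB. `table` is injectable for local testing;
    by default it resolves the real (placeholder-named) table."""
    if table is None:
        import boto3  # AWS SDK for Python
        table = boto3.resource("dynamodb").Table("ml-alerts")  # placeholder
    for record in event["Records"]:          # SQS batch event shape
        alert = json.loads(record["body"])
        table.put_item(Item=alert)
    return {"persisted": len(event["Records"])}
```

SQS delivers messages to Lambda in batches, so one invocation may persist several alerts; DynamoDB's on-demand capacity mode fits the spiky arrival pattern without capacity planning.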
For cases where we can take automatic actions, AWS SQS can trigger another AWS Lambda function that makes a direct call to the subsidiary data centre to trigger an action. Another approach would be to have the subsidiary subscribed to an AWS SQS queue or AWS SNS topic and receive the notification that automatic actions can be taken.
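The decision logic inside that Lambda might look like the sketch below, mapping the two automatic actions mentioned earlier (suspend access, delay the transaction) to the model's output. The score field and thresholds are assumptions for illustration, not values from the article.

```python
def decide_action(alert):
    """Map an ML alert to an automatic action. The 0.0-1.0 suspicion
    score and the 0.95 / 0.80 thresholds are illustrative assumptions."""
    score = alert["suspicion_score"]
    if score >= 0.95:
        return {"action": "suspend_account",
                "account": alert["account_hash"]}
    if score >= 0.80:
        return {"action": "delay_transaction",
                "transaction": alert["transaction_id"],
                "hold_hours": 24}
    return {"action": "notify_analyst"}
```

Keeping the thresholds in configuration rather than code would let the security team tune them during the 18-month trial without redeploying the function.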

The dashboard used by the security analysts can be hosted inside AWS EC2 or AWS ECS. It is just a web application where different information is presented to the user. For reporting capabilities, AWS QuickSight can be used. Custom reports can be generated easily on top of the existing data, and the security analyst can drill down into the data with just a click.
To create a more interactive user experience, where the end user can interact with and navigate the existing data, we can use AWS Lex. It enables us to create a powerful bot that the end user can ask questions in natural language in order to navigate the existing data.
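From the dashboard backend, querying a Lex V2 bot could be sketched as below. The bot id, alias id and locale are placeholders; the helper that flattens the response is pure so it can be tested without AWS.

```python
def bot_messages(response):
    """Extract the plain-text messages from a Lex RecognizeText response."""
    return [m["content"] for m in response.get("messages", [])]

def ask_bot(text, session_id):
    import boto3  # AWS SDK for Python; needs credentials configured
    lex = boto3.client("lexv2-runtime")
    response = lex.recognize_text(
        botId="BOTID12345",        # placeholder bot id
        botAliasId="ALIAS12345",   # placeholder alias id
        localeId="en_US",
        sessionId=session_id,
        text=text,
    )
    return bot_messages(response)
```

The session id lets Lex keep conversational context, so an analyst can ask a follow-up like "only the EMEA ones" after an initial question.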

As we can see, building such a solution on top of AWS can be done in different ways. Besides the approach presented above, we could take another one using Kafka and Elasticsearch. Keep in mind that our main focus was to define a blueprint of a solution where the customisation, configuration and operation effort is minimal.
