The previous posts covered blueprints of a money laundering detection solution for Microsoft Azure and AWS. You can check the posts below if you want to find out more about the proposed solutions:
- Blueprint of a transaction monitoring solution on top of AWS and custom ML algorithm for money laundering
- Blueprint of a transaction monitoring solution on top of Azure and custom ML algorithm for money laundering
- Comparison of a money laundering solution build on top of AWS and Azure
The blueprints were designed with two key goals in mind:
- 1. Keep the operation and management costs to a minimum
- 2. Keep the configuration and development costs to a minimum by using out-of-the-box cloud services as much as possible
In this post, we compare the services used for each layer and try to identify the key differentiators between them.
For data ingestion, we used Azure Event Hub and AWS Kinesis Data Streams. Both services can ingest an event stream over HTTPS. Azure Event Hub has the advantage of supporting the AMQP protocol, which gives us a stateful connection between the bank data centres and our platform. On the other hand, with Amazon Kinesis we can push a payload of up to 1 MB, whereas in Azure Event Hub the maximum size of a single request is limited to 256 KB.
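The payload limit has a direct effect on how the bank side has to batch its transactions. Below is a minimal sketch, assuming the transactions are sent as compact JSON arrays and that the limits are 1 MB and 256 KB; the constants, the sample data and the function name are illustrative assumptions, not part of either SDK.

```python
import json

# Assumed limits (1 MB and 256 KB); check the current service quotas
# before relying on them.
KINESIS_MAX_BYTES = 1024 * 1024
EVENT_HUB_MAX_BYTES = 256 * 1024


def split_into_payloads(transactions, max_bytes):
    """Greedily pack transactions into JSON arrays of at most max_bytes each."""
    payloads, current, size = [], [], 2  # 2 bytes for the enclosing "[" and "]"
    for tx in transactions:
        item = json.dumps(tx, separators=(",", ":")).encode("utf-8")
        if len(item) + 2 > max_bytes:
            raise ValueError("a single transaction exceeds the payload limit")
        extra = len(item) + (1 if current else 0)  # +1 for the comma separator
        if current and size + extra > max_bytes:
            # The next item would not fit: close the current payload.
            payloads.append(b"[" + b",".join(current) + b"]")
            current, size, extra = [], 2, len(item)
        current.append(item)
        size += extra
    if current:
        payloads.append(b"[" + b",".join(current) + b"]")
    return payloads


# Hypothetical sample data: 1,000 transactions of roughly 1 KB each.
transactions = [{"id": i, "note": "x" * 1000} for i in range(1000)]
kinesis_payloads = split_into_payloads(transactions, KINESIS_MAX_BYTES)
event_hub_payloads = split_into_payloads(transactions, EVENT_HUB_MAX_BYTES)
print(len(kinesis_payloads), len(event_hub_payloads))
```

With the smaller limit, the same data set is split into more requests, which means more round trips from the bank data centre.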
Audit and log events are kept inside Azure Event Hub for a maximum of 7 days, compared with AWS Kinesis, where the default retention is 1 day (a longer retention period can be configured at extra cost). This can be an essential factor when our backend system is throttled and we need to define non-functional requirements (NFRs) for the recovery time.
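To see why the retention window matters for the recovery-time NFR, here is a rough back-of-the-envelope sketch. It assumes constant ingest and processing rates and uses the 1-day and 7-day windows mentioned above; the outage length and the rates are hypothetical numbers.

```python
def data_is_lost(outage_hours, retention_hours):
    """Events older than the retention window expire before they can be read."""
    return outage_hours > retention_hours


def catch_up_hours(outage_hours, ingest_per_sec, drain_per_sec):
    """Time needed to drain the backlog while new events keep arriving."""
    if drain_per_sec <= ingest_per_sec:
        raise ValueError("the consumer must be faster than the producers to catch up")
    backlog = outage_hours * 3600 * ingest_per_sec
    return backlog / (drain_per_sec - ingest_per_sec) / 3600


outage = 30  # hypothetical backend outage, in hours
for name, retention in (("1-day retention", 24), ("7-day retention", 168)):
    print(name, "-> data lost:", data_is_lost(outage, retention))
print("catch-up time (h):", catch_up_hours(outage, 1000, 1500))
```

Under these assumptions, the oldest unread event is never older than the outage itself, so the outage we can tolerate is bounded by the retention window, and the catch-up time depends only on how much faster the backend can drain than the banks produce.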
AWS Kinesis Data Streams has no direct equivalent of consumer groups: multiple applications can read the same stream, but each one has to track its own position, and by default they share the read throughput of each shard. Azure Event Hub provides us with the concept of consumer groups, allowing us to extend the platform and hook other listeners to the stream of audit and logs.
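The consumer group idea can be illustrated with a small in-memory model. This is only a conceptual sketch, not the Event Hubs SDK: the `Stream` class and the group names are hypothetical, and partitions and checkpoint storage are left out.

```python
class Stream:
    """In-memory append-only log with one read offset per consumer group."""

    def __init__(self):
        self._events = []
        self._offsets = {}

    def append(self, event):
        self._events.append(event)

    def read(self, group, max_events=10):
        """Return the next events for this group and advance only its offset."""
        start = self._offsets.get(group, 0)
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


stream = Stream()
for i in range(3):
    stream.append({"transaction_id": i})

# Two independent listeners see the full stream without affecting each other.
scoring = stream.read("ml-scoring")
archive = stream.read("audit-archive")
print(len(scoring), len(archive))
```

Because every group keeps its own offset, a new listener can be added later without touching the existing ones.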
The monitoring capabilities of AWS Kinesis are fully integrated with AWS CloudWatch, which enables us to monitor the instances. Azure Event Hub has only basic monitoring capabilities, but they are well integrated with Azure Monitor.
To aggregate audit streams from multiple locations, we would use AWS Kinesis Data Firehose or Azure Event Grid. At the time of writing, Azure offers Azure Event Grid, which can be used successfully to aggregate content from multiple Azure Event Hubs, combine the data streams with content from other sources and push the output to another instance of Azure Event Hub.
This service enables us to aggregate all the transactions into one stream of data and to filter the data. It is common for logs or audit data not to arrive in the right format from all the data centres, especially when the system lifecycle is independent in each subsidiary. This can be handled easily with a filter that pushes the invalid content to another stream, where data transformation can be executed.
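The filtering step can be sketched in a few lines. The schema below (`transaction_id`, `amount`, `currency`) is a hypothetical example, and the two lists stand in for the main stream and the stream that feeds the data transformation step.

```python
# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "transaction_id": str,
    "amount": (int, float),
    "currency": str,
}


def is_valid(record):
    """A record is valid when every required field exists with the right type."""
    return isinstance(record, dict) and all(
        isinstance(record.get(field), expected)
        for field, expected in REQUIRED_FIELDS.items()
    )


def route(records):
    """Split records into the main stream and the stream that needs repair."""
    main_stream, repair_stream = [], []
    for record in records:
        (main_stream if is_valid(record) else repair_stream).append(record)
    return main_stream, repair_stream


records = [
    {"transaction_id": "t-1", "amount": 120.5, "currency": "EUR"},
    {"transaction_id": "t-2", "amount": "12,50", "currency": "EUR"},  # amount sent as text
    {"transaction_id": "t-3", "amount": 99},                          # currency missing
]
main_stream, repair_stream = route(records)
print(len(main_stream), len(repair_stream))
```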
AWS Kinesis does not support merging multiple Kinesis streams. A classical approach using AWS Lambda is available, but it requires substantial computation power and adds implementation cost compared with a solution built on top of Azure Event Grid. Because of this, and to avoid placing AWS Lambda between multiple instances of AWS Kinesis Data Streams, the approach built on top of AWS relies on a single instance of AWS Kinesis Data Streams.
Custom ML algorithm
Inside Azure, we can combine Azure Stream Analytics and the Azure Machine Learning service in one solution: the stream of data is processed inside Azure Stream Analytics, which applies a custom machine learning solution hosted and running inside Azure Machine Learning. There is no need to persist the data stream in storage before running the analytics part.
For the machine learning part of the AWS blueprint, we can use SageMaker, which enables us to run our custom ML algorithm. Before feeding data to SageMaker, we have to push the transaction information from AWS Kinesis to AWS S3.
At the time of writing, SageMaker can consume data only from S3, whereas Azure Machine Learning is well integrated with Azure Stream Analytics. Because of this limitation, we are forced to push the output of Kinesis Data Firehose to AWS S3 first.
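The extra hop through storage means that the stream is turned into batches before the ML part sees it. A minimal sketch of that buffering idea follows, assuming records are flushed when a record count or an age threshold is reached. The `BufferedWriter` class, the key layout and the thresholds are hypothetical, and a plain dict stands in for the object store.

```python
import json


class BufferedWriter:
    """Collects records and writes them as one object when a threshold is hit."""

    def __init__(self, store, max_records=500, max_age_seconds=60):
        self.store = store  # dict standing in for an object store such as S3
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self._buffer = []
        self._opened_at = None
        self._sequence = 0

    def add(self, record, now):
        """Buffer a record; `now` is a timestamp in seconds supplied by the caller."""
        if not self._buffer:
            self._opened_at = now
        self._buffer.append(record)
        too_many = len(self._buffer) >= self.max_records
        too_old = now - self._opened_at >= self.max_age_seconds
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Write the buffered records as one JSON Lines object."""
        if not self._buffer:
            return
        key = f"transactions/batch-{self._sequence:06d}.jsonl"  # hypothetical layout
        self.store[key] = "\n".join(json.dumps(r) for r in self._buffer)
        self._buffer = []
        self._sequence += 1


store = {}
writer = BufferedWriter(store, max_records=2, max_age_seconds=100)
for i in range(5):
    writer.add({"id": i}, now=i)
writer.flush()  # write whatever is left at shutdown
print(sorted(store))
```

Note that in this sketch the age check runs only when a new record arrives, so the batching thresholds add latency between a transaction and its score, which the Azure blueprint avoids.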
Both machine learning solutions are robust and flexible, and both offer a great UI and UX. From some perspectives, you could say that Azure Machine Learning is aimed more at citizen data scientists, while SageMaker looks like a solution closer to developers and data scientists. At the same time, don’t forget that both support the R language, and that Jupyter Notebook interfaces are available (SageMaker) or can be imported (Azure Machine Learning). From a features perspective, both platforms support anomaly detection, ranking, recommendation and many more.
For transactions that are marked as suspect, both blueprints use a messaging-based solution to send notifications to other systems. For Azure, we used an ESB service offered out of the box, Azure Service Bus, which gives us the subscriber concept. In this way, we can hook up multiple subscribers dynamically. For enterprise applications where a reliable message broker is required, Azure Service Bus is the way to go. It is a good starting point for most situations.
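The subscriber concept can be pictured as a topic that copies each message to every interested subscription. The model below is a conceptual sketch only and does not use the Azure Service Bus SDK; the `Topic` class, the subscription names and the filter are hypothetical.

```python
class Topic:
    """Fan-out model: each subscription receives its own copy of matching messages."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, name, accepts=lambda message: True):
        """Register a subscription with an optional filter predicate."""
        self._subscriptions[name] = (accepts, [])

    def publish(self, message):
        for accepts, queue in self._subscriptions.values():
            if accepts(message):
                queue.append(message)

    def receive(self, name):
        """Return the messages delivered to one subscription so far."""
        return self._subscriptions[name][1]


alerts = Topic()
alerts.subscribe("case-management")  # receives every alert
alerts.subscribe("branch-ro", lambda m: m["country"] == "RO")  # filtered

alerts.publish({"transaction_id": "t-1", "country": "RO"})
alerts.publish({"transaction_id": "t-2", "country": "DE"})
print(len(alerts.receive("case-management")), len(alerts.receive("branch-ro")))
```

New subscribers can be attached at any time without changing the publisher, which is what makes the platform easy to extend.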
Similar to Azure, AWS offers a wide variety of messaging options. In the blueprint, the initial concept was based on a single AWS SQS queue, where the AWS Lambda function that consumes the messages decides whether the subsidiary system needs to be notified. The solution can be extended successfully with AWS MQ, which provides functionality similar to an ESB system.
In both solutions, we rely on a serverless architecture that scales dynamically, and we pay only for what we consume. On AWS, we rely on Lambda functions to run our logic. They are flexible enough to connect to almost any kind of internal or external system. The same functionality is implemented inside Azure using Azure Functions. From an implementation point of view, there are differences in how we need to design our serverless solutions. Inside Azure Functions, we work with bindings and triggers, whereas in AWS the hooks are done through the handler and the context. The concepts are similar, but the way you use them is a little different.
Both services offer a similar set of features. They are evolving so fast that any comparison becomes obsolete in just a few days. We need to remember that even if they offer the same functionality, the way you implement the logic inside AWS Lambda is different from the way you do it inside Azure Functions, mainly because of the handlers, bindings, triggers and context.
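One way to limit the impact of these differences is to keep the business rule provider-independent and wrap it in two thin entry points. The sketch below is illustrative only: the threshold, the message fields and `FakeServiceBusMessage` are hypothetical, the fake class merely imitates the `get_body()` method of the message object that a Service Bus trigger binds in Azure Functions, and the return values are there just to make the behaviour visible.

```python
import json

SUSPICION_THRESHOLD = 0.8  # hypothetical score above which a subsidiary is notified


def should_notify(transaction):
    """Provider-independent business rule."""
    return transaction.get("score", 0.0) >= SUSPICION_THRESHOLD


def lambda_handler(event, context):
    """AWS-style entry point: the whole trigger payload arrives in `event`."""
    # An SQS-triggered invocation delivers messages under event["Records"],
    # each carrying its payload as a string in "body".
    bodies = [json.loads(record["body"]) for record in event["Records"]]
    return [tx["id"] for tx in bodies if should_notify(tx)]


class FakeServiceBusMessage:
    """Local stand-in for the message object bound by the Azure trigger."""

    def __init__(self, body):
        self._body = body

    def get_body(self):
        return self._body


def main(msg):
    """Azure-style entry point: the trigger binding hands over one message."""
    tx = json.loads(msg.get_body())
    return tx["id"] if should_notify(tx) else None


event = {"Records": [
    {"body": json.dumps({"id": "t-1", "score": 0.93})},
    {"body": json.dumps({"id": "t-2", "score": 0.10})},
]}
print(lambda_handler(event, None))
print(main(FakeServiceBusMessage(b'{"id": "t-3", "score": 0.85}')))
```

Only the two adapters know about the provider, so moving the rule between the blueprints would not require touching `should_notify`.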
The blueprints use a NoSQL approach based on documents. Azure Cosmos DB and AWS DynamoDB are the canonical databases offered by the two cloud providers. You will find that the Azure Cosmos DB offering includes multiple database models, not only a document store and a key-value store. This does not mean that AWS has no option for graph storage or a wide-column store. AWS offers them as separate services: AWS Neptune and a managed Apache Cassandra service.
The most significant difference between them is the consistency model. AWS DynamoDB has only 2 consistency levels, whereas Azure Cosmos DB has 5. Inside Azure, the cost of running the solution is the same across consistency levels, whereas in AWS DynamoDB the consistency level has a visible impact on the running cost of the service.
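The cost impact on the DynamoDB side can be estimated with simple arithmetic. In provisioned mode, one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, and an eventually consistent read costs half of that. The sketch below assumes 4 KB means 4,096 bytes; the item size and the read rate are hypothetical.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent):
    """Estimate the provisioned read capacity for a steady read rate."""
    # Assumption: one unit covers up to 4 KB (4,096 bytes) per strongly consistent read.
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    # Eventually consistent reads consume half the capacity.
    return total if strongly_consistent else math.ceil(total / 2)


# Hypothetical workload: 6 KB transaction documents read 100 times per second.
print(read_capacity_units(6000, 100, strongly_consistent=True))
print(read_capacity_units(6000, 100, strongly_consistent=False))
```

For a read-heavy dashboard, choosing strong consistency therefore roughly doubles the read capacity we have to pay for.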
The dashboards are hosted inside AWS EC2 and Azure Web Apps. Azure Web Apps offers a more managed, PaaS-style approach, where you deploy the web application and you are done. The configuration, infrastructure and security are managed by Azure by default, and the development team can focus on the content. Inside AWS, even if you are using EC2, templates are already available that configure your machine with the right distribution of Linux and Apache.
Another approach for the web interface would be to use microservices, where you could host your web application inside AWS ECS or Azure Kubernetes Service. The blueprint does not cover load balancing and caching, where both providers offer strong solutions: Application Load Balancer together with Route 53 inside AWS, or Azure Load Balancer and Traffic Manager inside Azure.
For bot integration, the blueprints propose a SaaS solution from each provider: AWS Lex and Azure Bot Service.
AWS Lex is well integrated with the AWS ecosystem and supports a wide variety of technology stacks, with text and speech support. The integration with AWS Lambda and mobile applications offers an out-of-the-box solution for our case. The limitations that could affect our business use cases are the multilingual support and the data set preparation and mapping, which are not so easy. When you need AWS Lex inside a web application, things could be slower than you would expect at the beginning.
The bot framework offered by Microsoft inside Azure supports multiple languages, is well integrated with multiple channels and is built on top of the LUIS API. The documentation is excellent, and the ramp-up phase is much faster than with AWS. The biggest drawback is the SDK limitation: we have libraries for C# and Node.js, and for the rest we need to use the REST API.
Both blueprints use reporting capabilities offered out of the box. Power BI is well known inside the Microsoft ecosystem and is well integrated with different systems. It’s super easy to use, and the ML layer that analyses your report data can help users discover insights they were not aware of until then.
Amazon QuickSight offers capabilities similar to Power BI, including ML capabilities. An interesting fact is the payment mechanism, which is per session. In terms of learning curve, the Amazon solution is easier to use, but the number of features is not high.
Looking at the blueprints, they are similar in the types of cloud services used as building blocks. The functionality and capabilities of the two cloud providers are almost the same. The differences are not big, and you can accomplish the same things with either provider. You need to be aware of the small differences that might affect the way you design your solution. Good examples are the connectors supported by AWS SageMaker and the lack of an Azure Event Grid equivalent inside AWS.