Azure Event Hub (Day 3 of 31)

List of all posts from this series: http://vunvulearadu.blogspot.ro/2014/11/azure-blog-post-marathon-is-ready-to.html

Short Description 
Event Hub is an event aggregator created specifically for the ingress use case. It works well when millions of events arrive from different sources and we need to collect and process them. It can be the best solution for the event-collecting part.
All these benefits come with a price. Features like dead-letter queues, transaction support, or delivery guarantees are not available. When we need them, we should use a Service Bus Queue or Topic (but they are not as fast as Event Hub).


Main Features 
Streaming Capability
Event Hub has streaming capability: we can consume events as an event stream and connect it to different big data systems.

AMQP support
Event Hub can be accessed using the REST API over HTTP. AMQP is also supported (it is one of the most widely used message queue protocols).

Partitioning
Events are grouped in partitions. On each partition, events are appended to the end of the sequence in the order they arrive. Each partition is independent and grows at its own rate; the retention period is configured at the Event Hub level and applies to every partition. The number of partitions to choose depends directly on how many consumers we expect to run concurrently.
At this moment the number of partitions can be between 8 and 32, and it cannot be changed once the Event Hub is created.

Event Data
Represents the content of an event (comparable to the message body in Service Bus). Event data arrive and are stored in partitions. We can specify how long event data are retained.

Event Consumer
Is an application that consumes events from a partition. Within a consumer group, each partition should have only one event consumer at a time.

Consumer Group
Is the mechanism used to deliver the same content to multiple consumers. Each consumer group has its own offset on the Event Hub. For example, if we have 4 consumer groups, the same event data will be received by all 4 of them. It is similar to the Subscriptions concept from Service Bus Topics.
All event data that are sent to the Event Hub will be received by each consumer group.
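To make the "own offset per consumer group" idea concrete, here is a minimal in-memory sketch. It does not use the Event Hub SDK; the `PartitionLog` class and its members are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Illustrative in-memory model of one partition; NOT the Event Hub SDK.
public class PartitionLog
{
    private readonly List<string> events = new List<string>();

    // One read position per consumer group name.
    private readonly Dictionary<string, int> offsets = new Dictionary<string, int>();

    // New events are always added at the end of the sequence.
    public void Append(string eventData)
    {
        events.Add(eventData);
    }

    // Returns the events this group has not read yet and advances
    // only this group's offset; other groups are not affected.
    public List<string> ReadNew(string consumerGroup)
    {
        int start;
        offsets.TryGetValue(consumerGroup, out start); // stays 0 on the first read
        List<string> unread = events.GetRange(start, events.Count - start);
        offsets[consumerGroup] = events.Count;
        return unread;
    }
}
```

If a group named "analytics" reads after two events were appended, it gets both. After a third event is appended, "analytics" gets only the third one, while a group named "archiver" that reads for the first time still gets all three.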

Partition Offsets
Each consumer can set the stream offset on a partition. This value can be based on a timestamp or on an offset value. It is important to remember that this value is managed by the consumer.
It is recommended to use timestamps, because they are easier to manage.

Checkpoints
Can be used to 'commit' the reading position of a consumer group on a partition. Using checkpoints we can record how many events we were able to consume successfully. If something bad happens, we can use these checkpoints to resume event data processing from the last checkpoint.
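The resume-from-checkpoint behavior can be sketched with a small in-memory model. This is not the Event Hub SDK; `CheckpointingReader` and its members are hypothetical names used only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;

// Illustrative in-memory model of a reader with checkpoints; NOT the Event Hub SDK.
public class CheckpointingReader
{
    private readonly IList<string> partition; // events of one partition, in arrival order
    private int position;                     // next event to read (lost on a crash)

    // Last committed position (assumed to survive a crash).
    public int Checkpoint { get; private set; }

    public CheckpointingReader(IList<string> partition)
    {
        this.partition = partition;
    }

    // Processes up to 'count' events and commits a checkpoint only when asked to.
    public List<string> Process(int count, bool commit)
    {
        var processed = new List<string>();
        while (count-- > 0 && position < partition.Count)
        {
            processed.Add(partition[position++]);
        }
        if (commit)
        {
            Checkpoint = position;
        }
        return processed;
    }

    // Simulates a restart: the in-memory position is gone,
    // so reading resumes at the last checkpoint.
    public void Restart()
    {
        position = Checkpoint;
    }
}
```

Note that events processed after the last checkpoint are read again after a restart, so the processing logic should tolerate seeing the same event twice.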

Event Publisher
Is represented by the devices (senders) that send events to Event Hub.

Security
Access to the Event Hub (from event publishers) is based on Shared Access Signature (SAS). Don’t forget that when you create a token for an Event Publisher you should grant only send rights. The token can already contain the partition key; otherwise, we need to specify the partition key when we send data to the Event Hub.

Events Data Size
The maximum size of a single event data, or of a batch that contains multiple event data, is 256KB.
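A publisher can check the payload size before sending. The sketch below is a rough guard only: `EventSizeGuard` is a hypothetical helper, 256KB is taken as 256 * 1024 bytes, and any overhead added by headers or properties is ignored (both are assumptions).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; NOT part of the Event Hub SDK.
public static class EventSizeGuard
{
    // 256KB limit from the text; interpreting it as 256 * 1024 bytes is an assumption.
    public const int MaxBytes = 256 * 1024;

    // True when the UTF-8 encoded payload fits in a single event data.
    public static bool FitsInOneEvent(string payload)
    {
        return Encoding.UTF8.GetByteCount(payload) <= MaxBytes;
    }

    // True when the sum of all UTF-8 encoded payloads fits in a single batch.
    public static bool FitsInOneBatch(IEnumerable<string> payloads)
    {
        long total = 0;
        foreach (string payload in payloads)
        {
            total += Encoding.UTF8.GetByteCount(payload);
        }
        return total <= MaxBytes;
    }
}
```

The size is measured in bytes, not characters, so a payload with non-ASCII characters reaches the limit sooner than its length suggests.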

AMQP vs HTTP
The first connection over AMQP is more expensive, because it opens a bidirectional socket over which a secure channel is established. Once created, however, it is less expensive to send event data (like a session, we don’t need to set up the secure channel for each new event data).
HTTP/S has a lower initial cost, but we need to establish the secure channel for each event data.
AMQP works well when we send data to Event Hub at a constant rate. HTTP should be used when we send data only from time to time.

Partitions (Partitions Keys) 
Are used to group events for event consumers. We can group events based on our own business model (device type, location and so on).
The partitions themselves are not relevant for an Event Publisher; it only needs to send the partition key that will be used to group event data into partitions.
For consumer groups, the partitions are very important, because each consumer in a consumer group consumes messages from a specific partition.

Throughput Capacity
Each Event Hub has Throughput Units that control how much event data can be processed by the Event Hub. At this moment a Throughput Unit is defined by:

  • 1MB or 1K events per second for Ingress
  • 2MB per second for Egress

From the management portal we have full control over how many Throughput Units we want to allocate.
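As a rough sizing aid, the ingress figures above can be turned into a small calculation. This sketch assumes that whichever ingress limit is reached first (bytes or event count) determines the number of units, and that 1MB means 1024 * 1024 bytes; `ThroughputPlanner` is a hypothetical helper, not an Azure API.

```csharp
using System;

// Hypothetical sizing helper; NOT an Azure API.
public static class ThroughputPlanner
{
    // Ingress limits of one Throughput Unit, as quoted in the text.
    public const double IngressBytesPerUnit = 1024 * 1024; // 1MB per second (assumed binary MB)
    public const double IngressEventsPerUnit = 1000;       // 1K events per second

    // Estimates how many Throughput Units the ingress side needs.
    public static int RequiredUnits(double eventsPerSecond, double averageEventBytes)
    {
        double byEvents = eventsPerSecond / IngressEventsPerUnit;
        double byBytes = eventsPerSecond * averageEventBytes / IngressBytesPerUnit;

        // The stricter of the two limits decides; at least one unit is always needed.
        return Math.Max(1, (int)Math.Ceiling(Math.Max(byEvents, byBytes)));
    }
}
```

For example, 5000 events per second of 100 bytes each are limited by the event count and need 5 units, while 500 events per second of 4096 bytes each are limited by the byte volume and need 2 units.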

Black List
We have the ability to put a device (the access token of the device) on a black list. A blacklisted device can no longer access the Event Hub (send event data).

Partition Key
It is used to distribute event data across partitions. All event data with the same partition key will be sent to the same partition. If we don’t specify a partition key, event data will be sent to partitions in a round-robin manner.
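The routing rule can be sketched as follows. This is an illustration only: `PartitionRouter` is a hypothetical class, and the FNV-1a hash used here is a stand-in, since the hash that the service really applies is not described in this post.

```csharp
using System;
using System.Text;

// Illustrative router; NOT the Event Hub SDK and NOT the service's real hash.
public class PartitionRouter
{
    private readonly int partitionCount;
    private int next; // round-robin cursor for events without a partition key

    public PartitionRouter(int partitionCount)
    {
        this.partitionCount = partitionCount;
    }

    // Returns the index of the partition that receives the event.
    public int Route(string partitionKey)
    {
        if (partitionKey == null)
        {
            // No key: spread events over the partitions in turn.
            int chosen = next;
            next = (next + 1) % partitionCount;
            return chosen;
        }

        // Same key, same hash, same partition (FNV-1a as a stand-in hash).
        uint hash = 2166136261;
        foreach (byte b in Encoding.UTF8.GetBytes(partitionKey))
        {
            hash = unchecked((hash ^ b) * 16777619);
        }
        return (int)(hash % (uint)partitionCount);
    }
}
```

With this model, every event that carries the key "9997" lands on the same partition index, while events sent with a null key go to partitions 0, 1, 2 and so on.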

IEventProcessor
This interface can be implemented when we want to define Event Consumer entities.

Message Retention
Messages are stored on Event Hub for a specific time interval. Once this interval expires, messages are removed automatically. Besides the automatic cleanup, a benefit of this mechanism is that you can process messages more than once by using the Partition Offset.

Limitations 
The maximum size of event data (256KB) can be seen as a limitation, but it is not. We are talking about a transport platform that needs to manage millions of messages per second. It makes sense to work with small units.
The number of partitions (32) and Service Bus Brokered Connections (1000) is limited, but we have the ability to request more.
Only one consumer per consumer group and partition is recommended. Yes, this is okay. Why? We have the stream capability, and a stream cannot be split. The concept is different in comparison with Service Bus Topic or Queue.
We don’t have support for sequencing, dead-lettering, transactions, or strong delivery assurances.
The maximum number of consumers on a partition from a consumer group is 5. The recommended value is one.

Applicable Use Cases 
Below you can find 4 use cases where Event Hub can be used with success.

Telemetry data
If you need to collect telemetry data from devices, you can collect all of it through Event Hub. Event Hub can be the perfect channel for collecting data, and we can plug it into different big data ingest systems.

Audit Information
Event Hub can be used to collect audit data from devices in the field. It is useful when we scale from 10k devices to 1M devices, or when we execute commands on devices and the audit volume increases drastically.

GPS Location
It can be the perfect channel to collect the GPS location of devices. This is a use case where we can afford to lose 1 or 2 GPS positions from time to time.

Device Status
When we need to collect the device status at a specific time interval, Event Hub is a cheap and simple way to collect it.

Code Sample 
// Create event hub (namespaceManager is an existing NamespaceManager instance)
EventHubDescription hubDescr = new EventHubDescription("foohub");
hubDescr.PartitionCount = 16;
namespaceManager.CreateEventHubAsync(hubDescr).Wait();

// Create publisher (the connection string is read from the application configuration)
EventHubClient hubClient = EventHubClient.Create("foohub");

// Send event data
FooEventData fooData = new FooEventData()
{
    DeviceId = 9997,
    Location = "12345.242423"
};
// fooSerialized is assumed to be the XmlObjectSerializer used for FooEventData
EventData data = new EventData(fooData, fooSerialized)
{
    // Use the device id as partition key, so all events of a device land on the same partition
    PartitionKey = fooData.DeviceId.ToString()
};

hubClient.Send(data);

// Create consumer for messages from last day
EventHubConsumerGroup defaultConsumerGroup = hubClient.GetDefaultConsumerGroup();
EventHubReceiver hubConsumer = await defaultConsumerGroup
    .CreateReceiverAsync(
        partitionId: fooPartitionId,
        startingDateTimeUtc: DateTime.UtcNow.AddDays(-1));

// Consume a message
var message = await hubConsumer.ReceiveAsync();

// Connect to an event processor using a consumer group
EventProcessorHost host = new EventProcessorHost(
    WorkerName, EventHubName, defaultConsumerGroup.GroupName,
    eventHubConnectionString, blobConnectionString);
await host.RegisterEventProcessorAsync<SimpleEventProcessor>();


Pros and Cons 

Pros

  • Log millions of events per second 
  • Simple authorization mechanism 
  • Time-based event buffering 
  • Elastic scale 
  • Pluggable adapters for other cloud services

Cons

  • Not all features from Service Bus Topics exist (but this is acceptable)
  • The size of the event data is limited to 256KB (the size is pretty okay)
  • The number of partitions is limited to a maximum of 32


Pricing
When we calculate the price of Event Hub we should take into account the following components:

  • Outbound traffic
  • Throughput units
  • Ingress Events count
  • Event Data Storage (1 day is free, additional days cost)
  • Number of Service Bus brokered connections needed


Conclusion
In conclusion, I would say that Event Hub is a perfect solution for the IoT world and can be very useful when we need to manage millions of events (messages) per second.
