
[IoT Home Project] Part 6 - Stream Analytics and Power BI

In this post we will discover how to:
  • Store all sensor data into blob storage
  • Calculate the average temperature and humidity for every 1-minute time span
  • Store the average temperature and humidity information to Azure Tables
  • Send the average temperature and humidity information to Power BI
  • Create reports in Power BI to display information for the current day
  • Create a web application that displays Power BI reports

Store all sensor data into blob storage
Once we have sensor data in Azure IoT Hub, we can do anything with it. Let's create a Stream Analytics job that takes sensor data from Azure IoT Hub and pushes it to Azure Blob Storage. Blob Storage is a very cheap place to store data, and it is perfect for bulk data.
The first step when you create a Stream Analytics job is to specify the inputs and outputs. For now, we only need the IoT Hub as input. The input can be created easily, especially when the Stream Analytics job and your instance of Azure IoT Hub are in the same subscription. I named the input that brings data from IoT Hub 'inputDeviceData'.
The output of the job will be Azure Blob Storage. For this we'll specify an output pointing to the blob storage where we want to persist our content - 'outputSensorDataBlobStorage'. When you write to blobs, multiple files are created, so you are required to specify a container where all the files will be created.
The last thing we need to do is specify the query.
SELECT
    deviceId as deviceId,
    sensorInf.temp as temp,
    sensorInf.humidity as humidity,
    sensorInf.distance as distance,
    sensorInf.light as light,
    EventEnqueuedUtcTime as receivedTime
INTO
    outputSensorDataBlobStorage
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
As you can see, in the FROM clause we specify the input source and, using INTO, the output (blob storage). Even if at this moment we don't send multiple message types, I want only messages of type 'sensorData' to be stored. To do this we use the WHERE clause.
In the SELECT we specified the list of fields that we want to store in the output. Do you remember the message format that we send as JSON from the Raspberry Pi? This is how I know the name of each property from the input.
As you can see, there is a default property called 'EventEnqueuedUtcTime' that represents the time when the event arrived at Azure IoT Hub. I decided to ignore the device time and use this value as the reference.
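Based on the properties referenced in the query, the JSON message sent from the Raspberry Pi should look roughly like this (a sketch reconstructed from the query fields - the values below are just placeholders):
{
    "deviceId": "myDevice",
    "msgType": "sensorData",
    "sensorInf": {
        "temp": 22.5,
        "humidity": 40.2,
        "distance": 120,
        "light": 310
    }
}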
Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Calculate the average temperature and humidity for every 1-minute time span
To calculate the average temperature for a specific time interval, the simplest way is to do it on top of Stream Analytics. Each job allows us to specify a time window over which we can apply different aggregation functions, like average (AVG).
SELECT 
    deviceId as deviceId,
    System.TimeStamp as timeslot,
    AVG(sensorInf.temp) as avgtemp,
    AVG(sensorInf.humidity) as avghumidity,
    AVG(sensorInf.distance) as avgdistance,
    AVG(sensorInf.light) as avglight
INTO 
    outputSensorDataConsolidatedTableStorage
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
GROUP BY
    deviceId,
    TumblingWindow(second, 60)
As we can see above, I used the GROUP BY clause together with TumblingWindow to specify the window frame, which in our case is 1 minute. This means that all values received within each minute are aggregated - for example, all events enqueued between 12:00:00 and 12:01:00 fall into the same window, and System.Timestamp returns the end time of that window as 'timeslot'. For the sensor data I applied AVG.

Last, but not least, we need to specify the output of our job, which in this case is an Azure Table. Don't forget to create this output before writing the query. Besides the table name, you need to specify two additional fields: Partition Key and Row Key. These values are the names of the properties from the query (SELECT) that will be used to set the Partition Key and Row Key - for this query, 'deviceId' and 'timeslot' are natural choices.
These values are mandatory and important. The combination of the two keys forms the key of the entity that is stored in the Azure Table. This combination needs to be unique and is used to retrieve data from the table.
For cost optimization you can increase the batch size to 10 or 50. But in our case the cost impact is low, and a larger batch would force us to see new data in the Azure Table only every 10 or 50 minutes - we don't want this.
Both queries can be inside the same job.
This is how the table should look.
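If you want to check the consolidated data from code, a minimal sketch like the one below reads the entities back using the classic WindowsAzure.Storage SDK. The table name 'sensorConsolidated' and the choice of 'deviceId'/'timeslot' as Partition Key/Row Key are my assumptions, not settings from the post - adjust them to match your own output configuration.
using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

class ReadConsolidatedData
{
    static void Main()
    {
        // Storage account used as the Stream Analytics output (connection string is a placeholder).
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        var table = account.CreateCloudTableClient().GetTableReference("sensorConsolidated");

        // Retrieve all aggregated rows for one device (assuming Partition Key = deviceId).
        var query = new TableQuery<DynamicTableEntity>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, "myDevice"));

        foreach (var entity in table.ExecuteQuery(query))
        {
            // RowKey holds the timeslot; 'avgtemp' is one of the aggregated properties.
            Console.WriteLine($"{entity.RowKey}: avgtemp={entity["avgtemp"].DoubleValue}");
        }
    }
}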
Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Send the average temperature and humidity information to Power BI
For Power BI, the trick is not sending the data - there is already a connector built into Stream Analytics. The trick is creating and using Power BI itself. It is not complicated at all, but it might be something new. For me, it was the first time I used Power BI as a service, and I didn't need more than 2 hours to discover everything I needed.
First, you need to go to https://powerbi.microsoft.com/en-us/ and sign up for a new account. Once you do this, you can go back to Stream Analytics and create a new output for Power BI. The dataset name and table name will be used later to create the reports.

The query that sends data to the Azure Table and the one that sends data to Power BI look the same. The only difference is the output. We can calculate the AVG data only once and send it to both outputs. This is done using the WITH statement, which defines a temporary source - if we can call it that. The final query should look like this:
WITH avgdata AS (
SELECT 
    deviceId as deviceId,
    System.TimeStamp as timeslot,
    AVG(sensorInf.temp) as avgtemp,
    AVG(sensorInf.humidity) as avghumidity,
    AVG(sensorInf.distance) as avgdistance,
    AVG(sensorInf.light) as avglight
FROM
    inputDeviceData TIMESTAMP BY EventEnqueuedUtcTime
WHERE
    msgType = 'sensorData'
GROUP BY
    deviceId,
    TumblingWindow(second, 60)
)

SELECT 
    *
INTO 
    outputSensorDataConsolidatedTableStorage
FROM
    avgdata

SELECT 
    *
INTO 
    outputPowerBIDataConsolidated
FROM
    avgdata

Final Stream Analytics Query: https://github.com/vunvulear/IoTHomeProject/blob/master/stream-analytics-jobs/dump-sensor-data-to-blob-calculate-avg-dump-to-azuretable-and-powerbi.sql

Now that we have defined all inputs and outputs for Stream Analytics, let's take a look at how it should look (see below).

Create reports in Power BI to display information for the current day
Even if this might sound complicated, it is extremely simple. If you navigate to the Power BI dashboard for your app, you'll find under Datasets the 'iot' dataset with a table called 'sensorConsolidated' (see the definition of the Stream Analytics output in the image below). This will be used to generate our reports.
To display a chart with the average temperature data you need to:
  • Create a new report in 'Reports' section
  • Select 'Area Chart' and drag a new chart
  • Select 'avgtemp' and 'timeslot' from fields section
We need the 'timeslot' field because it contains the minute for which the average temperature was calculated.

You can create similar reports for the other fields as well. Feel free to play around with different types of reports. Also, if you want, you can drag fields to the 'Values' section, which allows us to calculate the min/max/avg for a specific interval. I did this for each field, as you will see in the image from the next section, where I created a small dashboard.
I don't want to go too deep into Power BI, because it's a complex subject. The tool is extremely powerful and can be used to create impressive reports in a few seconds.

Create a web application that displays Power BI reports
The cool thing about Power BI is that it allows us to export a report to a web page. Basically, from the report we can get a piece of HTML that needs to be added to the page where we want to show our reports. The HTML is an iframe that is populated by Power BI. Pretty cool.
From the report, if you go to 'File > Publish to web', you'll end up in the right place, where the iframe is generated and ready to copy/paste.
To publish the report we'll create a Web App (the Free tier is enough) with an empty page where we'll paste the iframe HTML from Power BI. You'll see on GitHub that I created an empty ASP.NET Core application that I deployed to the Web App. This was only for fun; at this moment, an index.html page under 'wwwroot' is more than enough.
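The embed code generated by 'Publish to web' is a plain iframe, so index.html can be as simple as the sketch below. The src value is a placeholder - use the URL that Power BI generates for your own report:
<!DOCTYPE html>
<html>
<head>
    <title>IoT Home - Sensor Reports</title>
</head>
<body>
    <!-- Iframe produced by Power BI 'Publish to web'; replace the src value
         with the one generated for your report. -->
    <iframe width="800" height="600"
            src="https://app.powerbi.com/view?r=<your-report-token>"
            frameborder="0" allowfullscreen="true"></iframe>
</body>
</html>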
Don't worry if you see that the report is not updated in your web app every second. Inside Power BI there is a cache for published reports that refreshes every hour.
The web app is live for now - http://vunvulear-iot-portal.azurewebsites.net. I don't know for how long I will keep the web app, but for now it is working (smile). A screenshot can be found below.
The source code for this web app can be found on GitHub: https://github.com/vunvulear/IoTHomeProject/tree/master/webapp-iot-portal
HINT: To set index.html as the default page in an ASP.NET Core application, you need to navigate to Startup.cs and specify in the Configure method that the app uses default files and serves static content. Static content is expected to be found under the wwwroot folder.
public void Configure(IApplicationBuilder app)
{
    // Rewrites requests for '/' to '/index.html', making it the default page.
    // Must be registered before UseStaticFiles.
    app.UseDefaultFiles();

    // Serves static files (such as index.html) from the wwwroot folder.
    app.UseStaticFiles();
}
At the end of the post, let's take a look at our web report dashboard.

Conclusion
Every time, I'm impressed by how easily we can create and deliver functionality using out-of-the-box services. Stream Analytics and Power BI change the way we need to write our applications.

Next post
[IoT Home Project] Part 7 - Read/Write data to device twin

Next Step 
In the next post we will start to use the device twin to control how often sensor data is sent to Azure IoT Hub.
