In this post we will talk about how we can group charts, dashboards and reports into different categories based on how fast we need to ingest and update the data inside them.
Time is a relative term, especially when you put it together with business insights and application reporting. There are two important aspects of time from a business and application insights perspective.
1. Time Granularity
The first one is time granularity. In the beginning, most business stakeholders ask for the granularity to be as small as possible, until they realize that there is not much to see at that level and that, in order to understand something, the time granularity needs to be increased.
Most of the systems available on the market today allow us to change the time interval (granularity) on the fly, enabling us to navigate our data from a different time perspective.
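To make the idea of changing granularity concrete, here is a minimal sketch in plain Python. It is not how any particular product does it; the function name `bucket_average` and the sample readings are made up for illustration. The same raw points are simply averaged into wider or narrower time buckets.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def bucket_average(points, granularity):
    """Average (timestamp, value) points into fixed-width time buckets.

    `granularity` is a timedelta; each bucket is keyed by its start time (UTC).
    """
    width = granularity.total_seconds()
    sums = defaultdict(lambda: [0.0, 0])  # bucket start (epoch s) -> [sum, count]
    for ts, value in points:
        start = (ts.timestamp() // width) * width
        sums[start][0] += value
        sums[start][1] += 1
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): total / count
        for start, (total, count) in sorted(sums.items())
    }


# Hypothetical sample: one reading every 10 seconds for 2 minutes.
t0 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
points = [(t0 + timedelta(seconds=10 * i), float(i)) for i in range(12)]

# The same data seen at two different granularities.
per_minute = bucket_average(points, timedelta(minutes=1))
per_30s = bucket_average(points, timedelta(seconds=30))
```

With a 1 minute granularity the 12 readings collapse into 2 points, with 30 seconds into 4. The raw data is the same, only the perspective changes.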
2. Time Interval
The second aspect is the time interval from the moment the data arrives inside our storage system until the moment it can be displayed inside a specific report. I’m not saying that it is impossible to have all the data inside your reports at moment 0, but it can be extremely expensive and you might not even need it.
In my experience, business stakeholders often start by saying that the data needs to be displayed inside the reports immediately. After 2 or 3 meetings they realize that the report is generated for the previous week and that there is no extra value in having the data ready immediately rather than after 1 day. Of course, this has a direct impact on the cost and on how you implement the solution. Also, terms like immediately or real-time are relative. Each stakeholder has a different understanding of them, even when they are on the same team. This is why it is important to clarify these terms from the beginning.
Taking this into consideration, I started to group reports and any kind of application insights into 4 different categories:
- Real Time
- Near-real Time
- Reporting
- Consolidation
These categories are defined based on the time interval that is allowed to exist from the moment the data arrives inside the backend system until the moment when it can be displayed inside the charts (reports).
It is important to know that a specific chart can be in multiple categories (e.g. Near-real time and Reporting). Even so, you should treat it as if it were a different chart for each category, because the solutions that you need might be different for each category.
Real Time
In this category I group any kind of insight that needs to be displayed inside a chart or dashboard with a maximum delay of 2-3 seconds. These are the kind of views that are used by maintenance and support teams to get real time insights about the system.
In general, when you have such a system you will try not to modify the input data in any way. Not because this is not possible, but because any ETL process might increase the latency between the moment when you receive the data and the moment when you can display it.
Nowadays, it is a real trend for companies to require this kind of chart, but the reality is that not too many of them really need a real time chart. For example, if you don’t have a team that is monitoring the system 24/7, there is not much use in having a chart like this.
In addition, not all the data that is produced by a system needs to be inside a real time chart. A good example is when you monitor a pool of servers. You care about the current state of each server and of the applications that are running on them. There is no extra value in seeing in real time that the number of processes running on the servers is changing, or other similar things, except when you are debugging. For that we have Performance Counters and other ways to collect metrics.
Real time charts are used especially in trading and for monitoring production lines inside factories, for example. In most cases, even if the initial requirement is defined around real time, once you start to clarify the business use case and what is required, you realize that it is nothing more than a near-real time chart.
Near-real Time
This is the kind of chart where data is updated every few seconds. The latency from the moment the data arrives in the backend until the moment when it is displayed is less than 30-60 seconds. Most real time charts are in reality near-real time, because there is no impact on the business if data is displayed with a latency of 30-60 seconds.
From a latency perspective, I would say that even a latency of a few minutes (2-5 minutes) is still near-real time. For this kind of system, the most important element is not the update interval, but the ability to drill into the data and get different perspectives on the same data. In this way, the support team will be able to identify issues before they even happen.
Most of the reporting and monitoring tools that are available today are part of this category. From a running cost perspective, the difference between a real time and a near-real time system can be 3x or even 5x.
A fairly new but powerful tool that can be used for this kind of charts and insights is Azure Time Series Insights, which has great capabilities and lets you dynamically change the way you look at data.
Reporting
This is the category of charts and reporting capabilities that I usually like to call classical. These are the kind of reports that can be generated every hour, every day or every few days. There is nothing special from this perspective.
The new versions of the systems that are available for this category offer the ability to create dynamic reports where a user can drill down into the data based on their needs.
Consolidation
Many times this category overlaps with the previous one. This happens because the tools that are used to create consolidation charts are in general the same as the ones used for reporting. There are only some edge cases where, because of the data volume and complexity, Hadoop and other similar tools are used to pre-process the data before pushing it into the reporting systems.
The charts that are part of the consolidation category are the kind of charts that you generate once per week, month or quarter and that are used by business stakeholders to get a high-level view and status of their business.
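The four categories can be summed up as a small lookup on the longest delay the business can accept. The sketch below is only an illustration: the cut-offs are my reading of the ranges mentioned in this post (3 seconds, 5 minutes, and "every few days" taken as 3 days, which is an assumption), and the name `classify` is made up.

```python
from datetime import timedelta

# Upper bounds based on the ranges in this post. The exact cut-offs are a
# judgment call and should be agreed with the stakeholders.
CATEGORY_LIMITS = [
    ("Real Time", timedelta(seconds=3)),
    ("Near-real Time", timedelta(minutes=5)),
    ("Reporting", timedelta(days=3)),  # assumption: "every few days" read as 3 days
]


def classify(max_delay):
    """Return the category for the longest delay the business can accept."""
    for name, limit in CATEGORY_LIMITS:
        if max_delay <= limit:
            return name
    # Anything slower (weekly, monthly, quarterly) is Consolidation.
    return "Consolidation"


print(classify(timedelta(seconds=45)))  # a dashboard that may lag by 45 seconds
```

Asking each stakeholder for a concrete number to pass into a function like this is a simple way to find out what they actually mean by real time.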
Mixing them
The Reporting and Consolidation categories can be mixed successfully in the same tool, allowing users to have a high-level overview while at the same time being able to drill down into the data.
In some cases, the Near-real time and Reporting categories can be combined, but I don’t recommend this. The biggest problem is the different perspectives that can exist. For Near-real time, most of the charts are built around a time perspective, while for Reporting the charts can be built around other dimensions, not only time.
From Real time data it is easy to derive a Near-real time perspective, so these two go hand in hand.
Conclusion
Based on the data refresh time and how fast we can process new data, we can generate a different perspective of the same chart. Even if it is trendy to display real time or near-real time application insights, ask yourself if this is really required and what the tradeoffs and costs are. There is no point in offering this kind of solution if there is no real business requirement and it adds no extra value.
In addition, storing real time data that is sent every 10 milliseconds is not the same thing as storing data that is sent every 5 minutes. Storage and processing costs are different, and the tools and mechanisms that are used will also be different.
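A quick back-of-the-envelope calculation shows how far apart these two frequencies are. The helper name `points_per_day` is made up, and the numbers are for a single data source.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def points_per_day(interval_ms):
    """Number of data points one source produces per day at a fixed interval."""
    return MS_PER_DAY // interval_ms


fast = points_per_day(10)             # one point every 10 milliseconds
slow = points_per_day(5 * 60 * 1000)  # one point every 5 minutes

print(fast, slow, fast // slow)
```

A single source sending every 10 milliseconds produces 8,640,000 points per day, compared with 288 points per day at 5 minutes, which is 30,000 times more data to store and process.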
YES, a solution can have insights from all 4 of these categories. YES, you might even find ways to store them in the same storage type or system. And YES, you’ll need to be able to create static data points for the Reporting and Consolidation categories. You might not want to process 100GB of data with a frequency of 10 milliseconds to generate a report for the last 3 months where each data point covers 1 day.
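One possible way to get such static data points is to keep a small running aggregate per day and update it as data arrives, so the report never has to touch the raw data again. This is only a sketch under that assumption. The names `DailyPoint` and `roll_up` are made up, and a real system would persist the aggregates instead of keeping them in memory.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass
class DailyPoint:
    """Running aggregate for one day of raw values."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value):
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def average(self):
        return self.total / self.count


def roll_up(points, daily=None):
    """Fold (timestamp, value) points into per-day aggregates.

    Pass an existing `daily` dict to keep updating it with new batches.
    """
    daily = {} if daily is None else daily
    for ts, value in points:
        daily.setdefault(ts.date(), DailyPoint()).add(value)
    return daily


# Hypothetical raw readings spread over two days.
utc = timezone.utc
batch = [
    (datetime(2024, 1, 1, 0, 0, tzinfo=utc), 1.0),
    (datetime(2024, 1, 1, 12, 0, tzinfo=utc), 3.0),
    (datetime(2024, 1, 2, 6, 0, tzinfo=utc), 10.0),
]
daily = roll_up(batch)
```

A 3 month report then reads about 90 of these daily points instead of the raw stream.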