Skip to main content

Data Model for Reporting over Windows Azure Tables

One of my colleague tried implemented a browser history mechanism for MVC. Based on this data he would like to generate two simple reports:
  1. Top 5 web addresses accessed by a given user per day
  2. Top 5 web addresses accessed by all user
When the data store is implemented using SQL Azure, this problem can be resolved very simple.  The question that appeared here is: Can we implement a data store model using Windows Azure Table Services?
I will try to propose a possible data model that is using Windows Azure Table.
In the default implementation (using SQL Azure), there were 3 kind of information that is stored in the SQL tables:
  • URL visited
  • User
  • Date
Because we don’t have an order by, count or a max function in a query over Windows Azure Table we need to think at a model that would help us with this. We will start with the first requirement:

Top 5 web addresses accessed by a given user per day
To be able to solve this problem we need a data model that permit us to refer to URLs for a given day and user.
Theoretically, we can have an unlimited number of tables in Windows Azure Tables (and we don’t need to pay for each table in part). Because of this we can have a different table for each day. In this way the cleanup mechanism will be extremely simple. Also, when we want to access historical, selecting a specific day will be very easily.
We already know that each table from Windows Azure Tables contains two fields that play the role of keys: Partition Key and Row Key. Partition key can be used with success when we have different items types saved in the same table. Entities could be grouped based on the user, because of this we can save in the Partition Key the user id. In this way we will be able to specify a specific date and user.
In the row key we can store the visited URL. Another property (column) will be needed to store how many times the URL was visited in a specific day. On the server we can define a mechanism that will add or increment a visited URL.
The downside of this solution is that we will need to make two different transactions when we want to increment the history counter. One transaction that brings the current counter for the given user and URL (if exist) and another one that update (insert) the counter value.
When we need to generate the TOP 5 for a given day per user we will have to load all URLs for specific user and order this by the counter value. We don’t need to forget that in Windows Azure Tables we don’t have support for Order By and Top N functions of a query – because of this we need to retrieve all the URLs that a user visited in a given day.

Top 5 web addresses accessed by all user

Option 1
To full fit this request we need to create another Windows Azure Table that will store the URL and the counter for each URL. To be able to support Top 5 not only for URL but also for domains we will store in the Partition Key the URL domain and the Row Key the rest of the URL path.
Another solution would be to have in the Partition Key the same values for all rows and in the Row Key the full URL. We don’t want to have in the Partition Key the URL (and in the Row Key the counter) because in this case we risk having a table that is fragmented – when a Windows Azure Table is too big, Windows Azure can split our table based on the Partition Key and move our spited table on different machines (this is not visible for consumer – is an implementation detail).
The biggest downside of this solution in the moment when we need to retrieve the top 5 most visited URLs. We will need to retrieve all the content from our table and calculate the top 5.

Option 2
Another possible solution for this problem is to have more than one table. Based on how many a URL was visited we will be stored in a specific table. For example we would have
  • VisitedUrls1to100
  • VisitedUrls101to1000
  • VisitedUrls1001to10000
  • VisitedUrls10001to100000
For example when the URL is in the first table (VisitedUrls1to100) and the counter value will reach 101, that the entity will be moved to the next table and so on. In this implementation we will have a big problem to find a URL to increment the counter. To optimize this we would need to use another table that would store the URL and the table name where our URL counter can be found (VisitedUrls1to100).
The good part of this implementation is in the moment when we need to calculate top 5 and we can retrieve only a part of the URLs.

These were the possible implementation that I see using Windows Azure Tables. I think that this problem is not suitable for Windows Azure Table and a relational database is better for this case.
I didn’t forget about a problem that needs to be solved here – concurrency. I will come with a post tomorrow.
Part 2


Post a Comment

Popular posts from this blog

ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded

Today blog post will be started with the following error when running DB tests on the CI machine:
threw exception: System.InvalidOperationException: The Entity Framework provider type 'System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer' registered in the application config file for the ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded. Make sure that the assembly-qualified name is used and that the assembly is available to the running application. See for more information. at System.Data.Entity.Infrastructure.DependencyResolution.ProviderServicesFactory.GetInstance(String providerTypeName, String providerInvariantName) This error happened only on the Continuous Integration machine. On the devs machines, everything has fine. The classic problem – on my machine it’s working. The CI has the following configuration:

TeamCity.NET 4.51EF 6.0.2VS2013
It seems that there …

How to check in AngularJS if a service was register or not

There are cases when you need to check in a service or a controller was register in AngularJS.
For example a valid use case is when you have the same implementation running on multiple application. In this case, you may want to intercept the HTTP provider and add a custom step there. This step don’t needs to run on all the application, only in the one where the service exist and register.
A solution for this case would be to have a flag in the configuration that specify this. In the core you would have an IF that would check the value of this flag.
Another solution is to check if a specific service was register in AngularJS or not. If the service was register that you would execute your own logic.
To check if a service was register or not in AngularJS container you need to call the ‘has’ method of ‘inhector’. It will return TRUE if the service was register.
if ($injector.has('httpInterceptorService')) { $httpProvider.interceptors.push('httpInterceptorService&#…

Fundamental Books of a Software Engineer (version 2018)

More then six years ago I wrote a blog post about fundamental books that any software engineer (developer) should read. Now it is an excellent time to update this list with new entries.

There are 5 different categories of books, that represent the recommended path. For example, you start with Coding books, after that, you read books about Programming, Design and so on.
There are some books about C++ that I recommend not because you shall know C++, only because the concepts that you can learn from it.


Writing solid codeCode completeProgramming Pearls, more programming pearls(recommended)[NEW] Introduction to Algorithms


Refactoring (M. Fowler)Pragmatic ProgrammerClean code[NEW] Software Engineering: A Practitioner's Approach[NEW] The Mythical Man-Month[NEW] The Art of Computer Programming


Applying UML and Patterns (GRASP patterns)C++ coding standards (Sutter, Alexandrescu)The C++ programming language (Stroustrup, Part IV)Object-oriented programming (Peter Coad)P…