Skip to main content

Data Model for Reporting over Windows Azure Tables

One of my colleague tried implemented a browser history mechanism for MVC. Based on this data he would like to generate two simple reports:
  1. Top 5 web addresses accessed by a given user per day
  2. Top 5 web addresses accessed by all user
When the data store is implemented using SQL Azure, this problem can be resolved very simple.  The question that appeared here is: Can we implement a data store model using Windows Azure Table Services?
I will try to propose a possible data model that is using Windows Azure Table.
In the default implementation (using SQL Azure), there were 3 kind of information that is stored in the SQL tables:
  • URL visited
  • User
  • Date
Because we don’t have an order by, count or a max function in a query over Windows Azure Table we need to think at a model that would help us with this. We will start with the first requirement:

Top 5 web addresses accessed by a given user per day
To be able to solve this problem we need a data model that permit us to refer to URLs for a given day and user.
Theoretically, we can have an unlimited number of tables in Windows Azure Tables (and we don’t need to pay for each table in part). Because of this we can have a different table for each day. In this way the cleanup mechanism will be extremely simple. Also, when we want to access historical, selecting a specific day will be very easily.
We already know that each table from Windows Azure Tables contains two fields that play the role of keys: Partition Key and Row Key. Partition key can be used with success when we have different items types saved in the same table. Entities could be grouped based on the user, because of this we can save in the Partition Key the user id. In this way we will be able to specify a specific date and user.
In the row key we can store the visited URL. Another property (column) will be needed to store how many times the URL was visited in a specific day. On the server we can define a mechanism that will add or increment a visited URL.
The downside of this solution is that we will need to make two different transactions when we want to increment the history counter. One transaction that brings the current counter for the given user and URL (if exist) and another one that update (insert) the counter value.
When we need to generate the TOP 5 for a given day per user we will have to load all URLs for specific user and order this by the counter value. We don’t need to forget that in Windows Azure Tables we don’t have support for Order By and Top N functions of a query – because of this we need to retrieve all the URLs that a user visited in a given day.

Top 5 web addresses accessed by all user

Option 1
To full fit this request we need to create another Windows Azure Table that will store the URL and the counter for each URL. To be able to support Top 5 not only for URL but also for domains we will store in the Partition Key the URL domain and the Row Key the rest of the URL path.
Another solution would be to have in the Partition Key the same values for all rows and in the Row Key the full URL. We don’t want to have in the Partition Key the URL (and in the Row Key the counter) because in this case we risk having a table that is fragmented – when a Windows Azure Table is too big, Windows Azure can split our table based on the Partition Key and move our spited table on different machines (this is not visible for consumer – is an implementation detail).
The biggest downside of this solution in the moment when we need to retrieve the top 5 most visited URLs. We will need to retrieve all the content from our table and calculate the top 5.

Option 2
Another possible solution for this problem is to have more than one table. Based on how many a URL was visited we will be stored in a specific table. For example we would have
  • VisitedUrls1to100
  • VisitedUrls101to1000
  • VisitedUrls1001to10000
  • VisitedUrls10001to100000
For example when the URL is in the first table (VisitedUrls1to100) and the counter value will reach 101, that the entity will be moved to the next table and so on. In this implementation we will have a big problem to find a URL to increment the counter. To optimize this we would need to use another table that would store the URL and the table name where our URL counter can be found (VisitedUrls1to100).
The good part of this implementation is in the moment when we need to calculate top 5 and we can retrieve only a part of the URLs.

These were the possible implementation that I see using Windows Azure Tables. I think that this problem is not suitable for Windows Azure Table and a relational database is better for this case.
I didn’t forget about a problem that needs to be solved here – concurrency. I will come with a post tomorrow.
Part 2


Post a Comment

Popular posts from this blog

ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded

Today blog post will be started with the following error when running DB tests on the CI machine:
threw exception: System.InvalidOperationException: The Entity Framework provider type 'System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer' registered in the application config file for the ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded. Make sure that the assembly-qualified name is used and that the assembly is available to the running application. See for more information. at System.Data.Entity.Infrastructure.DependencyResolution.ProviderServicesFactory.GetInstance(String providerTypeName, String providerInvariantName) This error happened only on the Continuous Integration machine. On the devs machines, everything has fine. The classic problem – on my machine it’s working. The CI has the following configuration:

TeamCity.NET 4.51EF 6.0.2VS2013
It seems that there …

Entity Framework (EF) TransactionScope vs Database.BeginTransaction

In today blog post we will talk a little about a new feature that is available on EF6+ related to Transactions.
Until now, when we had to use transaction we used ‘TransactionScope’. It works great and I would say that is something that is now in our blood.
using (var scope = new TransactionScope(TransactionScopeOption.Required)) { using (SqlConnection conn = new SqlConnection("...")) { conn.Open(); SqlCommand sqlCommand = new SqlCommand(); sqlCommand.Connection = conn; sqlCommand.CommandText = ... sqlCommand.ExecuteNonQuery(); ... } scope.Complete(); } Starting with EF6.0 we have a new way to work with transactions. The new approach is based on Database.BeginTransaction(), Database.Rollback(), Database.Commit(). Yes, no more TransactionScope.
In the followi…

GET call of REST API that contains '/'-slash character in the value of a parameter

Let’s assume that we have the following scenario: I have a public HTTP endpoint and I need to post some content using GET command. One of the parameters contains special characters like “\” and “/”. If the endpoint is an ApiController than you may have problems if you encode the parameter using the http encoder.
using (var httpClient = new HttpClient()) { httpClient.BaseAddress = baseUrl; Task<HttpResponseMessage> response = httpClient.GetAsync(string.Format("api/foo/{0}", "qwert/qwerqwer"))); response.Wait(); response.Result.EnsureSuccessStatusCode(); } One possible solution would be to encode the query parameter using UrlTokenEncode method of HttpServerUtility class and GetBytes method ofUTF8. In this way you would get the array of bytes of the parameter and encode them as a url token.
The following code show to you how you could write the encode and decode methods.