Skip to main content

How we can remove millions of entities from a Windows Azure Table (part 1)

Part 2
Windows Azure Table is a great please to persist different information. We can store in the same table thousands of thousands of thousands of thousands of items. This sounds so good, but we can have small problems. The first problem that we can face up is how we can delete all the content of a table very fast.
The maximum number of items that we can update/delete in a batch is 100 entities. Because of this deleting 1 million of entities will take a long of time. We could try to parallelize this action, but this is a little complicated and maybe we don’t want to do this.
Another solution that we could use is to delete this table and recreate it. If you need to drop all the content from a table this is faster than delete entity by entity. Think in this way. When you need to remove the content of the text file, it is faster to delete row by row or to delete the file and recreate it.
CloudTableClient tableStorage = new CloudTableClient(
  [absoluteUri], 
  [credentials]);
tableStorage.DeleteTableIfExist(tableName);
bool wasRecreated = false;
while (!wasRecreated)
{  
  try
  {  
    tableStorage.CreateTableIfNotExist(tableName);
    wasRecreated = true;
  }  
  catch (StorageClientException storageClientException)
  {
    if (!(storageClientException.ErrorCode == StorageErrorCode.ResourceAlreadyExists
                        && storageClientException.StatusCode == HttpStatusCode.Conflict))
    {  
      throw;
    }
    Thread.Sleep(1000);  
  }
}
When we call the delete method, even is a sync or an async method the table will be mark for deletion. The real delete action will be in background and you don’t have any kind of possibility to be notified. When you try to recreate the table and the delete action is still in progress, a StorageClientException will be throwing when the error code will be set to “ResourceAlreadyExist” and the status code will be set to “Conflict”.
Usually this kind of action will take for around 40s. depends on the load of the servers. The problem with this solution is with the time interval while the table is deleted. In this period of time clients will not be able to access this table and they need to handle this expectation.  This can be accepted in some situations.
We saw a solution to delete the content of a table that have millions of entities. What do you think? Do you see a better solution?
Part 2

Comments

Popular posts from this blog

ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded

Today blog post will be started with the following error when running DB tests on the CI machine:
threw exception: System.InvalidOperationException: The Entity Framework provider type 'System.Data.Entity.SqlServer.SqlProviderServices, EntityFramework.SqlServer' registered in the application config file for the ADO.NET provider with invariant name 'System.Data.SqlClient' could not be loaded. Make sure that the assembly-qualified name is used and that the assembly is available to the running application. See http://go.microsoft.com/fwlink/?LinkId=260882 for more information. at System.Data.Entity.Infrastructure.DependencyResolution.ProviderServicesFactory.GetInstance(String providerTypeName, String providerInvariantName) This error happened only on the Continuous Integration machine. On the devs machines, everything has fine. The classic problem – on my machine it’s working. The CI has the following configuration:

TeamCity.NET 4.51EF 6.0.2VS2013
It seems that there …

How to check in AngularJS if a service was register or not

There are cases when you need to check in a service or a controller was register in AngularJS.
For example a valid use case is when you have the same implementation running on multiple application. In this case, you may want to intercept the HTTP provider and add a custom step there. This step don’t needs to run on all the application, only in the one where the service exist and register.
A solution for this case would be to have a flag in the configuration that specify this. In the core you would have an IF that would check the value of this flag.
Another solution is to check if a specific service was register in AngularJS or not. If the service was register that you would execute your own logic.
To check if a service was register or not in AngularJS container you need to call the ‘has’ method of ‘inhector’. It will return TRUE if the service was register.
if ($injector.has('httpInterceptorService')) { $httpProvider.interceptors.push('httpInterceptorService&#…

Fundamental Books of a Software Engineer (version 2018)

More then six years ago I wrote a blog post about fundamental books that any software engineer (developer) should read. Now it is an excellent time to update this list with new entries.

There are 5 different categories of books, that represent the recommended path. For example, you start with Coding books, after that, you read books about Programming, Design and so on.
There are some books about C++ that I recommend not because you shall know C++, only because the concepts that you can learn from it.

Coding

Writing solid codeCode completeProgramming Pearls, more programming pearls(recommended)[NEW] Introduction to Algorithms

Programming

Refactoring (M. Fowler)Pragmatic ProgrammerClean code[NEW] Software Engineering: A Practitioner's Approach[NEW] The Mythical Man-Month[NEW] The Art of Computer Programming

Design

Applying UML and Patterns (GRASP patterns)C++ coding standards (Sutter, Alexandrescu)The C++ programming language (Stroustrup, Part IV)Object-oriented programming (Peter Coad)P…