How we can remove millions of entities from a Windows Azure Table (part 1)

Part 2
Windows Azure Table is a great place to persist different kinds of information. A single table can hold many millions of entities. This sounds good, but it can cause a few problems. The first one we may face is how to delete all the content of a table quickly.
The maximum number of entities that we can update or delete in one batch is 100. Because of this, deleting 1 million entities takes at least 10,000 batch requests, which is a long time. We could try to parallelize this work, but that is somewhat complicated and may not be worth the effort.
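To make the cost of the batch limit concrete, here is a minimal sketch that plans delete batches from a list of entity keys. It assumes that each key is a (PartitionKey, RowKey) pair and that a batch may only contain entities from a single partition, which is how Table batch operations work. The class `DeleteBatchPlanner` and its `Plan` method are hypothetical names made up for this illustration; the sketch only groups keys and does not call the storage service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any Azure library.
public static class DeleteBatchPlanner
{
    // Limit quoted in the post: at most 100 entities per batch.
    public const int MaxBatchSize = 100;

    // Each pair is (PartitionKey, RowKey). Keys are grouped by partition,
    // then every group is cut into chunks of at most MaxBatchSize keys.
    public static List<List<KeyValuePair<string, string>>> Plan(
        IEnumerable<KeyValuePair<string, string>> keys)
    {
        var batches = new List<List<KeyValuePair<string, string>>>();
        foreach (var partition in keys.GroupBy(k => k.Key))
        {
            List<KeyValuePair<string, string>> items = partition.ToList();
            for (int i = 0; i < items.Count; i += MaxBatchSize)
            {
                int size = Math.Min(MaxBatchSize, items.Count - i);
                batches.Add(items.GetRange(i, size));
            }
        }
        return batches;
    }
}
```

For example, 250 keys in partition "A" and 30 keys in partition "B" produce four batches (100, 100, 50 and 30 keys). Each batch still costs one round trip to the service, so the number of batches is a rough lower bound on the number of requests.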
Another solution is to delete the table and recreate it. If you need to drop all the content of a table, this is faster than deleting entity by entity. Think of it this way: when you need to remove the content of a text file, is it faster to delete it row by row, or to delete the file and recreate it?
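// Needs: Microsoft.WindowsAzure.StorageClient, System.Net, System.Threading.
// [absoluteUri] and [credentials] are placeholders for the storage account endpoint and credentials.
CloudTableClient tableStorage = new CloudTableClient(
  [absoluteUri],
  [credentials]);
// Mark the table for deletion; the actual delete runs in the background.
tableStorage.DeleteTableIfExist(tableName);
bool wasRecreated = false;
while (!wasRecreated)
{
  try
  {
    tableStorage.CreateTableIfNotExist(tableName);
    wasRecreated = true;
  }
  catch (StorageClientException storageClientException)
  {
    // A Conflict with ResourceAlreadyExists means the delete is still in progress.
    // Any other error is unexpected and is rethrown.
    if (!(storageClientException.ErrorCode == StorageErrorCode.ResourceAlreadyExists
                        && storageClientException.StatusCode == HttpStatusCode.Conflict))
    {
      throw;
    }
    // Wait one second before trying to recreate the table again.
    Thread.Sleep(1000);
  }
}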
When we call the delete method, whether it is the sync or the async version, the table is only marked for deletion. The real delete happens in the background, and there is no way to be notified when it finishes. If you try to recreate the table while the delete is still in progress, a StorageClientException is thrown with the error code set to “ResourceAlreadyExists” and the status code set to “Conflict”.
Usually this takes around 40 seconds, depending on the load of the servers. The drawback of this solution is the interval during which the table is being deleted. In this period clients cannot access the table, and they need to handle the resulting failures. This can be acceptable in some situations.
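Because the delete time depends on server load, a loop that retries forever can hang if something else goes wrong. As a hedged sketch, the retry could be capped with a timeout. The helper `BoundedRetry.Until` below is a hypothetical name, not part of the storage client library; the `tryOnce` delegate is assumed to wrap the create call and return false when the conflict described above is caught.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper, not part of any Azure library.
public static class BoundedRetry
{
    // Calls tryOnce until it returns true, or gives up once another
    // delay would push the total waiting time past the timeout.
    public static bool Until(Func<bool> tryOnce, TimeSpan timeout, TimeSpan delay)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (true)
        {
            if (tryOnce())
            {
                return true;
            }
            if (elapsed.Elapsed + delay > timeout)
            {
                return false;
            }
            Thread.Sleep(delay);
        }
    }
}
```

A caller could pass a timeout of a few minutes and a delay of one second, then log or raise an alert when `Until` returns false instead of blocking indefinitely. The right timeout is a judgment call, since the 40 seconds mentioned above is only a typical value.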
We saw a solution for deleting the content of a table that has millions of entities. What do you think? Do you see a better solution?