NoSQL in 5 minutes

NoSQL is one of 2013's trends. Three or four years ago we rarely heard about a project using NoSQL; nowadays the number of projects using non-relational databases is extremely high. In this article we will look at the advantages and challenges we may encounter when we use NoSQL. In the second part of the article we will analyze several non-relational solutions and their benefits.

What is NoSQL?
The easiest definition would be: NoSQL is a database that does not follow the rules of a relational database (RDBMS). A non-relational database is not based on the relational model. Data is not grouped in tables; therefore there is no mathematical relationship between them.
These databases are built to run on a large cluster. Data in such storage does not have a predefined schema. For this reason, any new field can be added without any problem. NoSQL appeared and developed around web applications; consequently, the vast majority of its features are those that a web application needs.

Benefits and risks
The non-relational database model is a flexible one. Depending on the solution we use, we can have a very loose model that can be changed at minimal cost. Many NoSQL solutions are not model-based, and even where a model exists it stays flexible: although Cassandra and HBase have a pre-defined model, adding a new field is easy. Various solutions can store any kind of data structure without defining a model, for example those storing key-value pairs or documents. In NoSQL taxonomy, a document is seen as a record from a relational database, and a collection is seen as a table. The main difference is that in a table all records have the same structure, while a collection can contain documents with different fields.
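The difference between a table and a collection can be illustrated with a minimal Python sketch. Plain dictionaries stand in for documents and a list stands in for a collection; the field names (`name`, `price`, `tags`, `warranty_years`) and the `find` helper are hypothetical, and no real database is involved.

```python
# A "collection" modelled as a list of documents (plain dicts).
# Unlike rows in a table, the documents need not share the same fields.
products = [
    {"_id": 1, "name": "Laptop", "price": 900},
    {"_id": 2, "name": "Phone", "price": 500, "tags": ["mobile", "5G"]},
    {"_id": 3, "name": "Gift card"},  # this document has no price field
]


def find(collection, **criteria):
    """Return the documents whose fields match all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Adding a new field to one document requires no schema change.
products[0]["warranty_years"] = 2

# Documents that lack the queried field simply do not match.
cheap = find(products, price=500)
```

In a relational table, the equivalent of `warranty_years` would require altering the table so that every row gains the new column.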
Non-relational databases are much more scalable than the classical ones. To scale a relational database we need more powerful servers, rather than adding a few machines with an ordinary configuration to the cluster. This is due to the way a relational database works, which makes adding a new node expensive.
The way in which a non-relational database is built makes horizontal scaling easy. Moreover, these databases are well suited to virtualization and the cloud.
Taking into account the size of the databases and the growing number of transactions, a relational database is much more expensive than NoSQL. Solutions like Hadoop can process a lot of data. They are extremely horizontally scalable, which makes them very attractive.
Concerning costs, a non-relational database is a lot cheaper. We do not need custom hardware or special features to create a very powerful cluster. Using a few regular servers, we can have an efficient database.
Certainly, NoSQL is not all milk and honey. Most of the solutions are rather new on the market compared to relational databases. For this reason some important functionality may be missing, such as data mining and business intelligence. NoSQL evolved to meet the requirements of web applications, which is the main reason why some features that are not needed on the web are missing. That does not mean they cannot be found at all; rather, they are not yet mature enough, or they are specific to the problem that a given NoSQL solution is trying to solve.
Because they are so new to the market, many NoSQL solutions are pre-production versions, which cannot always be used in the enterprise world. The lack of official support for some products could be a blocker for medium and large projects.
The syntax used to query a NoSQL database is different from a simple SQL query, and we usually need some programming knowledge. The number of NoSQL experts is much lower than the number of SQL experts. Administration may be a nightmare, because support for administrators is presently weak.
In addition, support for ACID and transactions is not common in NoSQL storage. The queries that can be written are fairly simple, and sometimes the storage does not allow us to "join" collections, so we have to write the code to do this ourselves.
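When the storage cannot join collections, the join has to happen in application code. The following is a minimal Python sketch of that idea, assuming two hypothetical collections (`customers` and `orders`) already loaded as lists of dictionaries; it shows a simple hash join and is not tied to any particular NoSQL product.

```python
# Two "collections" held as lists of dicts; all names are illustrative only.
customers = [
    {"_id": 1, "name": "Ana"},
    {"_id": 2, "name": "Dan"},
]
orders = [
    {"_id": 10, "customer_id": 1, "total": 40},
    {"_id": 11, "customer_id": 1, "total": 15},
    {"_id": 12, "customer_id": 2, "total": 99},
]


def join_orders_with_customers(orders, customers):
    """Hash join in application code: index one side, then scan the other."""
    customers_by_id = {customer["_id"]: customer for customer in customers}
    joined = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is not None:  # inner join: skip orders with no customer
            joined.append({
                "order_id": order["_id"],
                "customer": customer["name"],
                "total": order["total"],
            })
    return joined


rows = join_orders_with_customers(orders, customers)
```

Building the dictionary first keeps the work proportional to the size of the two collections, instead of comparing every order with every customer.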
All these issues will be solved in time. The question we must ask ourselves when we think about architecture and believe NoSQL could help is: "Why not?"

The most widely used NoSQL solutions
There are countless NoSQL solutions on the market, and no universal solution that solves all the problems we have. For this reason, when we want to integrate NoSQL we need to look at several types of storage. We may identify several problems within our application that require a NoSQL solution, and we may need a different solution for each of them. This adds extra complexity, because we would have two or more storages that we need to integrate.
MongoDB
This is one of the most widely used types of storage. All content is stored in the form of documents, and over these collections of documents we can run any kind of dynamic query to extract different data. In many ways MongoDB is the closest to a relational database. All the data we want to store is kept as a hash, which facilitates information retrieval. Basic CRUD operations are fast on MongoDB.
It is a good solution when we need to store a lot of data that must be accessed in a very short time. MongoDB can be used successfully when we do not perform many insert, update and delete operations and the information remains unchanged for a period of time. It also works well when the stored properties are used in queries and/or indexes, for example in a voting system, a CMS or a storage system for comments. Another case is storing the lists of categories and products of an online store. Because MongoDB is oriented towards queries and the list of products does not change every two seconds, queries on it will be fast.
Another benefit is auto-sharding: a MongoDB database can very easily be spread across two or three servers. The mechanism for distributing the data is documented.
Cassandra
Cassandra is second on the list of NoSQL storage solutions used in eCommerce. This storage can become our friend when we have data that changes frequently. If the problem we want to solve is dominated by insertions and modifications of stored data, then Cassandra is our solution. Compared to inserts and changes, any query on our data is much slower: this storage is oriented more towards writes than towards queries that retrieve data. While in MongoDB the data we work with is seen as documents with a hash attached to each of them, Cassandra stores all content in the form of columns.
In MongoDB, the data we access may not be the latest version. Cassandra, by contrast, guarantees that the data we obtain through queries is the latest version. So if we access an email that is stored in Cassandra, we get the latest version of the message. This solution can be installed in multiple data centers in different locations, providing support for failover or backup, and therefore extremely high availability.
Cassandra can be used successfully as a tool for logging. In such a system we have many writes, and the queries are rare and quite simple. For the same reason it is the ideal solution when we have an eCommerce application where we need a storage system for the shopping cart. Insert and update operations will be fast, and each query will return the latest version of the shopping cart, which is very important when we perform check-out.
Cassandra has come to be used in the financial industry, where it is ideal due to the performance of its insert operations. In this environment data changes very often, with share prices being new at every moment.
CouchDB
If most of the operations we perform are just inserts and reads, with no updates, then CouchDB is a much better solution. This storage is targeted only at read and write operations.
Besides this, we have efficient support for pre-defined queries and for controlling the different versions that stored data may have. As a result, update operations are not so fast. Of all the storages presented so far, this is the first that guarantees ACID, through the versioning system it implements.
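The versioning idea can be illustrated with a toy in-memory store in which every update must quote the revision it was based on. This is only a sketch of revision-based conflict detection in general; the `VersionedStore` and `ConflictError` names are hypothetical, and the code is not CouchDB's API.

```python
class ConflictError(Exception):
    """Raised when an update is based on an outdated revision."""


class VersionedStore:
    """Toy in-memory store that keeps a revision number per document."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        revision, body = self._docs[doc_id]
        return revision, dict(body)  # copy so callers cannot mutate the store

    def put(self, doc_id, body, revision=None):
        # A new document has no current revision; an update must match it.
        current_revision = self._docs.get(doc_id, (None, None))[0]
        if revision != current_revision:
            raise ConflictError(
                f"expected revision {current_revision}, got {revision}"
            )
        new_revision = (current_revision or 0) + 1
        self._docs[doc_id] = (new_revision, dict(body))
        return new_revision


store = VersionedStore()
rev1 = store.put("page", {"title": "Home"})                    # create
rev2 = store.put("page", {"title": "Home v2"}, revision=rev1)  # update
try:
    # A second writer still holding rev1 is rejected instead of overwriting.
    store.put("page", {"title": "Stale edit"}, revision=rev1)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

The extra revision check on every write is one way to see why updates cost more than plain inserts in a store built around versioning.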
Another feature of this storage is its support for replication. CouchDB is a good solution when we want to take the database offline, for example onto a mobile device that does not have an internet connection. Through this functionality we have support for distributed architectures, with replication in both directions.
It can be a solution for applications on mobile devices that do not have round-the-clock internet connectivity. It is also very useful for a CMS or CRM, where we need versioning and predefined queries.
HBase
This database is fully integrated into Hadoop and is intended for cases where we need to perform data analysis. HBase is designed to store large amounts of data that could not normally be stored in an ordinary database.
It can work in memory without any problem, and the data it stores can be compressed; it is one of the few NoSQL databases that support this feature. Due to its specific nature, HBase is used together with Hadoop. When working with tens or hundreds of millions of records, HBase is worth considering.
Membase
As the name implies, this non-relational database can reside in memory. It is a good fit when very low latency is needed, and content replication is an easy process.
It is very common in game backends, especially for online games. Many systems that need to manipulate or display real-time data use Membase. In these cases Membase may not be the only storage layer that the application uses.
Redis
This storage is perfect when the number of updates we need to perform on our data is very high; it is optimized for such operations. It is based on a very simple key-value model, so the queries that can be made are very limited. Although there is support for transactions, support for clustering is not yet mature enough. This can become a problem when the data we want to store does not fit in memory, because the size of the database is bounded by the amount of available memory.
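To see why a key-value model makes updates cheap but queries limited, consider the following toy Python sketch. The `KeyValueStore` class and the key names are hypothetical; this is an in-memory dictionary for illustration, not the Redis client API.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dict."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def incr(self, key, amount=1):
        """Update a counter in place; cost does not depend on store size."""
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def keys_with_value(self, value):
        """Any query that is not by key has to scan every entry."""
        return [key for key, stored in self._data.items() if stored == value]


store = KeyValueStore()
for _ in range(3):
    store.incr("votes:post:1")
store.incr("votes:post:2")

votes_for_first_post = store.get("votes:post:1")
posts_with_one_vote = store.keys_with_value(1)
```

Reads and updates by key touch a single entry, while the lookup by value must walk the whole store, which is the kind of query such a model does not serve well.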
Redis is quite interesting when we have real-time systems that need to communicate; in these cases it is one of the best solutions. There are several stock market applications using this storage.

What does the future hold for us?
We see an increasing number of applications that use NoSQL. This does not mean that relational databases will disappear. The two types of storage will continue to exist and will often coexist. Hybrid applications, which use both relational databases and NoSQL, are becoming more common. Also, an application does not need to use only a single database: there are solutions using two or more NoSQL databases. A good example is an eCommerce application that uses MongoDB to store the list of items and categories, and Cassandra to store the shopping cart of each client.
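The eCommerce example can be sketched in Python as an application that talks to two separate stores, each behind its own small class. Everything here is hypothetical: `CatalogStore` and `CartStore` are in-memory stand-ins for the roles the article assigns to a document store and a write-oriented store, and no real database driver is used.

```python
class CatalogStore:
    """Stand-in for a read-oriented store holding items and categories."""

    def __init__(self, products):
        self._products = {product["sku"]: product for product in products}

    def get(self, sku):
        return self._products[sku]

    def find_by_category(self, category):
        return [p for p in self._products.values() if p["category"] == category]


class CartStore:
    """Stand-in for a write-oriented store holding one cart per client."""

    def __init__(self):
        self._carts = {}  # client_id -> {sku: quantity}

    def add_item(self, client_id, sku, quantity=1):
        cart = self._carts.setdefault(client_id, {})
        cart[sku] = cart.get(sku, 0) + quantity

    def get_cart(self, client_id):
        return dict(self._carts.get(client_id, {}))


def cart_total(catalog, carts, client_id):
    """Combine data from both stores in application code."""
    return sum(
        catalog.get(sku)["price"] * quantity
        for sku, quantity in carts.get_cart(client_id).items()
    )


catalog = CatalogStore([
    {"sku": "A1", "category": "books", "price": 10},
    {"sku": "B2", "category": "games", "price": 30},
])
carts = CartStore()
carts.add_item("client-7", "A1", quantity=2)
carts.add_item("client-7", "B2")

total = cart_total(catalog, carts, "client-7")
```

Keeping each store behind its own class means the rest of the application does not need to know which database serves which kind of data.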

Conclusion
In conclusion, NoSQL databases must be part of our area of knowledge. Compared to relational databases we have many options, and each of them does one thing very well. In the NoSQL world there is no single storage that solves all the problems we may have; each type of storage solves different problems. The future belongs neither to non-relational databases nor to relational ones. The future belongs to applications that use both types of storage, depending on their needs.
