
Top 10 misconceptions about NoSQL

I was really surprised to find the same prevalent misconceptions about NoSQL in dozens of vlogs and blogs while I was preparing materials for my Advanced Database course last year.

Of course, no one is expected to know every type of database and all of its features comprehensively (even DB professionals have expertise in only one or a few). But the comparative differences are the tender spots for IT professionals who have to choose the correct DB architecture for a project.

Let’s cut the warm-up short. In this post I’m sharing “my” top ten of those misconceptions that push many developers the wrong way.

1.) The Meaning of “NO” SQL

The “No” part of NoSQL doesn’t mean “anti-SQL”, like the no in a “no-war” slogan. It also doesn’t mean the absence of SQL, like the no in “no-sugar diet”. NO is commonly read as “Not Only”. Let’s expand all the letters to make it clearer: it is “Not Only Structured Query Language”.

The name tells us the main advantage of a NoSQL DB: it supports not only structured queries and data but also unstructured ones.

So if you do not like SQL, being a supporter of NoSQL may not be a good idea, because it is not the anti-SQL movement you may have expected. It is something plus SQL.

2.) SQL Does Not Support Large-Scale Data (Big Data)

Neither SQL nor NoSQL has any built-in limit on the size or number of data collections. The limits depend only on the resources the DB is running on. All major SQL platforms (a.k.a. RDBMS), such as MySQL and MSSQL, provide at least one solution for handling huge amounts of data. The solutions can be clustering, distribution or DB replication. Since server platforms support vertical and horizontal scaling to increase total capacity, it does not make sense to talk about a hard limit for these DBs.

RDBMS are currently used by banks, global social networks and even national health systems, which typically hold billions of records distributed across hundreds of servers, because SQL provides more consistency and stability for high-profile data. I have chosen MariaDB and MySQL for several payment institution and banking projects and have worked with incredibly large datasets there.

On the other hand, NoSQL DBs are very powerful at handling big data that is too complicated to fit into a solid structure. The point is that NoSQL DBs have to use the same features, such as clustering, and none of them has a magic wand for keeping big data on small resources. Let’s keep that for the next heading.

3.) The Performance of NoSQL is Better

It depends on what is running on the database. “Data” is a very broad term. Databases are not developed with the same abilities or for the same jobs. Cars and trucks are both vehicles, but each is built to do a different job well.

Since all database families have been optimized for years on the same underlying technology, they use very similar data engines (there is only one math in this universe), and it is not easy to find a significant performance difference that holds for all types of data.

If you are looking for the best-performing DB, you should get to know your data (and its flow) first, then search for the best match for it.

For hosting structured and relational data, SQL DBs are generally the better choice. The document and key-value subtypes of NoSQL are usually better at reading and writing unstructured big data (e.g. searching patterns in images of a social network). So it is up to your data.

4.) Only NoSQL Supports Multi Dimensional Data

As you know, a table (I mean a table like in Excel, not the dinner table :) ) can have only two dimensions: the columns and the rows. That’s where the relations of SQL DBs come in. Relations, i.e. links between tables, were the only way to model more dimensions, while NoSQL does it in a single table (the correct NoSQL term is “collection” rather than “table”).

But although this feature is treated as the motto of NoSQL, many of the SQL platforms that I know now provide JSON column support as well. A JSON column lets you keep multi-dimensional data in a single cell and run queries on it.

You may read this post, which simply explains all these queries and architectures.

In a nutshell, SQL does support multi-dimensional data. There is one more point to highlight about JSON columns:
* This feature moves SQL towards NewSQL (another DB class name, for platforms that have both SQL and NoSQL abilities)
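To make the JSON column idea concrete, here is a minimal sketch in Python. It uses the built-in sqlite3 module as a stand-in for a SQL platform with JSON support and assumes the SQLite build includes the JSON functions (most recent builds do). The users table, the profile column and the sample records are made up for illustration.

```python
import json
import sqlite3

# In-memory SQLite database as a stand-in for any SQL platform with JSON support.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, profile TEXT)")

# The nested, "multi-dimensional" part of each record lives in a single JSON cell.
rows = [
    ("ada", {"city": "London", "skills": ["math", "engines"]}),
    ("linus", {"city": "Helsinki", "skills": ["kernels"]}),
]
conn.executemany(
    "INSERT INTO users (name, profile) VALUES (?, ?)",
    [(name, json.dumps(profile)) for name, profile in rows],
)

# json_extract reaches inside the JSON cell, so the filter runs in the database.
query = """
    SELECT name, json_extract(profile, '$.skills[0]')
    FROM users
    WHERE json_extract(profile, '$.city') = ?
"""
result = conn.execute(query, ("London",)).fetchall()
print(result)
```

MySQL, MariaDB and PostgreSQL offer comparable JSON functions and operators, although the exact syntax differs between platforms.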

5.) NoSQL is Better for Data Mining

As with the performance misconception, this depends on the data too. In most cases SQL can be the better choice. Generally, NoSQL can only return the data, and then software has to run the complicated mining algorithms, while SQL lets you run more complicated queries that contain math, statistics and even text mining and then gives you the final result without extra software. Here is a MySQL example (col1, col2, col3 and 'key' are placeholders):

SELECT s.mini, s.maxi, s.standard_dev,
       (t.col1 - s.average) / s.standard_dev AS difference,
       AES_DECRYPT(t.col2, 'key') AS decrypted,
       SHA2(t.col3, 256) AS hashed
FROM test t
CROSS JOIN (SELECT MIN(col1) AS mini, MAX(col1) AS maxi,
                   AVG(col1) AS average, STDDEV(col1) AS standard_dev
            FROM test) s;

Around two years ago, I used MySQL in a “parallel genetic algorithm” (an AI method) project and put the fitness calculation inside the query, which made it much faster. (I’ll write a post about it soon.)
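The idea of pushing the statistics into the query can be tried out with a small Python sketch. It uses the built-in sqlite3 module as a stand-in for MySQL. SQLite has no STDDEV function, so the population standard deviation is derived from AVG, and SQRT is registered from Python because not every SQLite build ships math functions. The table, column and sample values are made up for illustration.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register SQRT explicitly, because not every SQLite build ships math functions.
conn.create_function("SQRT", 1, math.sqrt)
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, score REAL)")
conn.executemany("INSERT INTO test (score) VALUES (?)", [(2.0,), (4.0,), (6.0,)])

# The subquery computes the mean and the population standard deviation once;
# the outer query standardizes every row, so the application receives final values.
query = """
    SELECT t.score,
           (t.score - s.mean) / s.std_dev AS z_score
    FROM test AS t
    CROSS JOIN (
        SELECT AVG(score) AS mean,
               SQRT(AVG(score * score) - AVG(score) * AVG(score)) AS std_dev
        FROM test
    ) AS s
    ORDER BY t.id
"""
z_scores = conn.execute(query).fetchall()
for score, z in z_scores:
    print(score, round(z, 4))
```

The application code only iterates over the finished results; the arithmetic itself stays in the database.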

6.) Only NoSQL Supports Clustering

First, I think clustering should not be considered a responsibility of the database software. It can simply be provided by the OS layer, the datacenter or even virtualization software.

Nevertheless, almost all DB platforms that I know have at least one solution in their pocket. The SQL types do too. I have personally used MySQL replication (primary-secondary) and MariaDB replication and clustering. In addition, MS SQL and PostgreSQL have offered replication and clustering for years.

Note: I use the words “Primary” and “Secondary” as replication terms instead of the old ugly words “Master” and “Slave”.

7.) NoSQL is an Alternative to SQL

The most important point is that you do not have to choose only one of them. They do different things and they can work better together as a team. Almost all modern programming environments provide drivers and frameworks for both of them.

In many IT projects the data can be classified into two classes. One is the structured, high-profile data that needs consistency. The other is the complicated big data that is less important, like logs, statistics etc. In these cases, the best way is to put the consistent data in SQL and use NoSQL for the complicated (unstructured) data.

8.) NoSQL is the NewSQL

“NewSQL” is a term of its own and one of the three class names in DB classification:

  • SQL (RDBMS like MS SQL, MySQL, MariaDB, PostgreSQL …)
  • NoSQL (Couchbase, MongoDB, Redis …)
  • NewSQL (a combination of the better features of both SQL and NoSQL)

The familiar NoSQL products are relatively younger than the SQL ones (I’ve just checked: Microsoft SQL Server is now 31 years old). But NewSQL is a separate term and should not be confused with NoSQL.

9.) NoSQL is More Compatible

Many NoSQL DB platforms do not fully support ACID transactions or some rollback features, which are really important for managing high-profile data. Compliance with security standards is another question mark for NoSQL. Additionally, some SQL operations like JOIN and GROUP BY are not supported by some NoSQL DB platforms.
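To show why rollback matters for high-profile data, here is a minimal sketch of an atomic money transfer. It uses Python's built-in sqlite3 module as a stand-in for any ACID-compliant SQL platform; the accounts table, the names and the transfer function are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money atomically: both updates are committed, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice's account
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit to bob is executed first and then undone when the debit violates the CHECK constraint, so a half-finished transfer never becomes visible. On a platform without transactions, that cleanup would have to be written by hand.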

10.) Software Development With NoSQL is Easier

NoSQL DB platforms allow you to store and query data without designing schemas and consistent structures. But as far as I have experienced, and as all developers know, if you do not set the rules in the database you have to set them in your code, because unstructured data is not usable by the software.

The new generation of developers likes these lazy-style NoSQL platforms and calls this rapid development. In many cases it is not rapid development, it is “postponed” development.

Schemas (structures) in SQL are initial rule sets that validate the data before it is saved or read. SQL platforms return errors/warnings if you try to save something unexpected. Schemas are standards for the consistency, stability and validation of the data.

Let’s say you have a human table with age and name columns, and you define the age column as an integer and the name column as a string. Now you cannot mistakenly save a record like age=“mahmut”, name=35. The error you get in this case is going to help you understand what went wrong and improve your code. With NoSQL, on the other hand, you can save a document with name=35 and age=“mahmut”, or even 35=name, lastname=“mahmut”, without any error. In that case you have to put the conditions in your code.
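The human table example can be sketched in Python as follows. The built-in sqlite3 module stands in for a SQL platform, and a plain list of dictionaries stands in for a schemaless NoSQL collection; both are assumptions made for illustration. SQLite is loosely typed by default, so a CHECK constraint plays the role of the strict integer column that a platform like MySQL in strict mode would enforce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite is loosely typed by default, so the CHECK constraint stands in for
# the strict integer column that other SQL platforms enforce on their own.
conn.execute(
    """
    CREATE TABLE human (
        name TEXT NOT NULL,
        age INTEGER NOT NULL CHECK (typeof(age) = 'integer')
    )
    """
)


def save_sql(name, age):
    """Let the schema validate the row; return True if it was stored."""
    try:
        with conn:
            conn.execute("INSERT INTO human (name, age) VALUES (?, ?)", (name, age))
        return True
    except sqlite3.IntegrityError:
        return False


documents = []  # schemaless stand-in for a NoSQL collection


def save_document(doc):
    """Without a schema, the same rules must be written in application code."""
    if not isinstance(doc.get("name"), str) or not isinstance(doc.get("age"), int):
        return False
    documents.append(doc)
    return True


sql_ok = save_sql("mahmut", 35)    # accepted by the schema
sql_bad = save_sql(35, "mahmut")   # rejected by the database itself
doc_bad = save_document({"name": 35, "age": "mahmut"})  # rejected only by our own check
print(sql_ok, sql_bad, doc_bad)
```

If the isinstance check in save_document is removed, the swapped record is stored silently, which is exactly the “postponed” work described above.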