The internet and, more recently, cloud-based services are posing new challenges for companies. Flexibility, speed, security and control are the keywords of modern IT. This shift requires integrating new technologies and using increasingly agile applications.
DBMSs are particularly affected, since they are by nature the core of the information system.
How have DBMSs evolved to meet these new requirements? What issues do modern databases raise? How can these new application tools be protected?
Through this series of articles, we propose to answer these questions and lift the veil on NoSQL databases.
The birth of NoSQL DBMSs
Since the 2000s, with the explosion of web applications and the growth in their use, the major Web players have had to face new challenges concerning their IT infrastructures.
Data volumes have exploded over the last ten years, and various studies forecast exponential growth in data volume between 2015 and 2020. Faced with this growth, open source relational database solutions quickly showed their limits, especially in the web domain.
Who has not been the victim of “a SQL query of death”?
When thousands of users query a database at the same time, the search can be very expensive in resources and introduce wait times for users.
To meet these performance needs, and to handle data at massive scale, the approach has to change in order to:
- Gain flexibility regarding architectures and infrastructures;
- Manage a large volume of data and exponential growth;
- Avoid a significant increase in infrastructure costs;
- Store and restore data with minimum latency.
For DBMSs in particular, the relational model has run into strong limitations in distributed architectures.
In this context, new paradigms have driven the evolution of DBMSs. This line of thinking, based on distributed computing, has given rise to NoSQL (Not only SQL) database engines, which are starting to appear in enterprise applications through the following solutions:
- Apache Cassandra
- Apache CouchDB
NoSQL engines can be approached along two axes:
- Uses (needs and type of application);
- The data schema used by the NoSQL engine.
Note: In this article, we will only discuss the infrastructure aspect and the protection of these new architectures.
We will mainly see two types of uses for NoSQL engines:
- For performance-oriented applications: some NoSQL engines are designed to optimize data manipulation performance, either by offering an in-memory cache in front of an RDBMS (such as the InnoDB memcached plugin for MySQL) or by acting as a full-fledged DBMS. These improvements are possible thanks to different mechanisms:
- Using RAM to store data: the MemTable allows data to be written and manipulated directly in RAM.
Figure: Cassandra NoSQL engine – writing new data
- Using distributed algorithms to process requests on different nodes of the cluster.
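The two mechanisms above can be illustrated with a minimal Python sketch. It is not Cassandra's implementation: the class `TinyNode`, the `flush_threshold` parameter and the `node_for_key` routing function are hypothetical names chosen for illustration, and the commit log and on-disk files are simplified to in-memory lists.

```python
import hashlib


class TinyNode:
    """Toy storage node: writes land in an in-RAM memtable first."""

    def __init__(self, name, flush_threshold=3):
        self.name = name
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.commit_log = []   # stand-in for an append-only log on disk
        self.memtable = {}     # recent writes, held in RAM
        self.segments = []     # stand-in for immutable sorted files on disk

    def write(self, key, value):
        self.commit_log.append((key, value))  # record kept for crash recovery
        self.memtable[key] = value            # fast in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the memtable as one key-sorted, immutable segment, then reset.
        self.segments.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        if key in self.memtable:                 # look in RAM first
            return self.memtable[key]
        for segment in reversed(self.segments):  # then newest segment first
            for k, v in segment:
                if k == key:
                    return v
        return None


def node_for_key(key, nodes):
    """Pick the node that owns a key by hashing it (simplified partitioning)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


cluster = [TinyNode("node-a"), TinyNode("node-b"), TinyNode("node-c")]
for i in range(10):
    key = f"user:{i}"
    node_for_key(key, cluster).write(key, {"id": i})

owner = node_for_key("user:7", cluster)
print(owner.name, owner.read("user:7"))
```

Because every client hashes a key the same way, reads and writes for that key always reach the same node, and the load is spread over the cluster without a central coordinator. Real engines add replication and more elaborate partitioning on top of this idea.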
Relational vs NoSQL Databases – Properties and Theorems
To support digital transformation and the “big data” trend, new solutions had to be studied to get around the limits of SQL solutions (which stem mainly from enforcing the ACID constraints). The main limits are:
- Resource-hungry queries that require increasingly powerful servers
- Analysis of ever-growing quantities of data (petabytes)
- Limited fault tolerance (rigid architecture)
A NoSQL engine is designed to gain agility and flexibility for these new issues.
SQL: Atomicity, Consistency, Isolation, Durability, or ACID Constraints
Relational databases are designed to work on the basis of transactions and must respect the ACID properties. These properties guarantee that each transaction is processed reliably, but they constrain a distributed computing approach:
- Atomicity: all the actions that make up a transaction must be carried out; otherwise, the transaction is canceled and the data is returned to the state it was in before the transaction. Atomicity must hold in every situation, including hardware failure or unexpected service interruption.
- Consistency: a transaction must take the database from one valid state to another valid state. Any data change must comply with the defined integrity rules and other constraints; a change that violates them is rolled back.
- Isolation: a transaction is carried out in isolation, that is, it must behave as if it were alone on the system. There is no dependency between transactions.
- Durability: a successful transaction is final; once a transaction is confirmed, its result is permanent, even in case of failure.
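As an illustration of atomicity and consistency, here is a minimal sketch using Python's standard `sqlite3` module with an in-memory database. The `accounts` table, its `CHECK` constraint and the `transfer` and `balances` helpers are assumptions made up for this example; they do not come from any particular application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
# True {'alice': 70, 'bob': 80}
print(transfer(conn, "alice", "bob", 500), balances(conn))
# False {'alice': 70, 'bob': 80}
```

In the second call the credit to `bob` is executed first, yet it never becomes visible: the failed debit cancels the whole transaction, and the database stays in its previous valid state.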
Unlike relational databases, and in order to work with a distributed architecture, NoSQL engines generally do not guarantee all of the properties above. They are designed around the CAP theorem (Consistency, Availability, Partition tolerance). This theorem identifies three basic requirements to consider when designing applications for a distributed architecture (consistency, availability, and partition tolerance) and states that a distributed system cannot guarantee all three at the same time.