Relational vs non-relational databases
| Relational database (tables + link keys) |
Non-relational database |
| Stores and organizes data in tables. |
Not tabular relational database model. |
| SQL follows a strict schema. |
NoSQL deals with dynamic schema and is very flexible. |
| These tables are related to each other through the use of foreign keys, which are used to link data stored in different tables. |
Types of NoSQL databases: |
- document=JSON
- key-value
- graph=nodes (real information) and edges (relationships)
- wide-column=dynamic columns. |
| Pros: Great querying capabilities. You may perform complex queries with SQL. Because queries are executed directly inside the database.
You can’t use Python, because first you need to load to memory large amount of data. | Pros: flexible schema, horizontal scaling, fault-tolerant. |
| ACID (atomicity=success, consistency=schema, integrity=no effect to others, durability=no lost) | Good if you have a lot of data, you want law latency. |
| Indexes: performance optimizer, improve the speed of data access | |
| Normalization: reduces redundancy and ensures that data is stored in the most efficient way. | |
| SQL queries tuning: analyzing the execution plans of slow queries and making adjustments to the query structure | |
| Partitioning (creating new tables that divide table into smaller chunks). A partition is a logical database’s split into separate, independent portion.
- Range-based partitioning based on 12 months, based on years.
- Hash-based partitioning.
Sharding is a partitioning technique, dividing data across multiple databases or servers. | |
| Replication
- Master-slave=ONLY MASTER receives updates
- peer-to-peer=no special master, all are slaves
- multi-master=MULTIPLE SERVERS receive write operations simultaneous | |
| | |
All databases have well defined, TCP based networking.
Other differences: which one to choose?
Here are some general guidelines for when to use a relational database:
- You need to store structured data that follows a predefined schema
- You need to perform complex queries on the data, including joins across multiple tables
- You need to ensure data consistency and integrity through transactions
- You need to ensure data security through access control and encryption
On the other hand, here are some general guidelines for when to use a NoSQL database:
- You need to store large amounts of data that may have a more flexible or hierarchical structure
- You need to scale horizontally across multiple servers
- You need a database that can handle a high volume of read and write operations
- You need a database that can handle a high degree of data variability
Data can be operational and analytical
- Operational data is used for day-to-day transactions and needs to be fresh—for example, product inventory and bank balance. Such data is captured in real-time using Online Transaction Processing (OLTP) systems.
- Analytical data is used by businesses to make decisions and to find insights about customer behavior, product performance, and forecasting. It includes data collected over a period of time and is usually stored in OLAP (Online Analytical Processing) systems, warehouses, or data lakes.
Data can be structured (tables), unstructured (emails, texts, image data, social media posts), semi-structures (JSON, XML).