Data and Database¶

Data¶

Data is a collection of a distinct small unit of information. It can be used in a variety of forms like text, numbers, media, bytes, etc. it can be stored in pieces of paper or electronic memory, etc.

Word 'Data' is originated from the word 'datum' that means 'single piece of information.' It is plural of the word datum.

Database¶

The database is a collection of inter-related data which is used to retrieve, insert and delete the data efficiently. It is also used to organize the data in the form of a table, schema, views, and reports, etc.

Hierarchical Database¶

A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. The hierarchical database model mandates that each child record has only one parent, whereas each parent record can have one or more child records. In order to retrieve data from a hierarchical database, the whole tree needs to be traversed starting from the root node. This model is recognized as the first database model created by IBM in the 1960s.

Network Database¶

A network database is a type of database model wherein multiple member records or files can be linked to multiple owner files and vice versa. The model can be viewed as an upside-down tree where each member information is the branch linked to the owner, which is the bottom of the tree.

In Layman’s terms, a network database is a hierarchical database, but with a major tweak. The child records are given the freedom to associate with multiple parent records. As a result, a network or net of database files linked with multiple threads is observed.

Object-Oriented Database¶

An object-oriented database (OOD) is a database system that can work with complex data objects — that is, objects that mirror those used in object-oriented programming languages.

In object-oriented programming, everything is an object, and many objects are quite complex, having different properties and methods. An object-oriented database management system works in concert with an object-oriented programming language to facilitate the storage and retrieval of object-oriented data.

The elements of a OODM are:

Object: A real world entity, such as a specific life task in a to-do list — “take the garbage out”.
Attributes and Methods: An object has state and behaviors. An object has properties (which might also be called attributes) like name, status, and create_date. The set of properties taken together represents its state. Along with this, an object has behaviors (also known as methods, actions, or functions) that modify or operate on its properties, like update_task() or get_task_history().
Class: The grouping of all objects with the same properties and behaviors form a class. In our example above, we talked about task objects. These objects together all belong to the Task class.

In the chart above, we have different objects linked to one another using methods; one can get the address of the Person (represented by the Person Object) using the livesAt() method. Furthermore, these objects have attributes which are in fact the data elements that need to be defined in the database.

NoSQL Database¶

NoSQL originally referring to non SQL or non relational is a database that provides a mechanism for storage and retrieval of data. This data is modeled in means other than the tabular relations used in relational databases.

NoSQL systems are also sometimes called Not only SQL to emphasize the fact that they may support SQL-like query languages.

Pros and Cons of NoSQL Database¶

Major advantages are:

Flexible Data Model
Elastic Scalability
High Performance
Open Source

Major disadvantages are:

Lack of Standardization
Backup
Consistency
Management challenge
Large Document Size

Typed of NoSQL Database¶

Types of NoSQL databases and the name of the databases system that falls in that category are:

Graph Databases: Amazon Neptune, Neo4j
Key value store: Memcached, Redis, Coherence
Tabular: Hbase, Big Table, Accumulo
Document-based: MongoDB, CouchDB, Cloudant

Relational Databases¶

A relational database organizes data into rows and columns, which collectively form a table. Data is typically structured across multiple tables, which can be joined together via a primary key or a foreign key. These unique identifiers demonstrate the different relationships which exist between tables, and these relationships are usually illustrated through different types of data models.

Database Management System¶

A database management system (DBMS) is system software for creating and managing databases. A DBMS makes it possible for end users to create, protect, read, update and delete data in a database. The most prevalent type of data management platform, the DBMS essentially serves as an interface between databases and users or application programs, ensuring that data is consistently organized and remains easily accessible.

Allowed Tasks¶

DBMS provides an interface to perform various operations like database creation, storing data in it, updating data, creating a table in the database and a lot more.
It provides protection and security to the database. In the case of multiple users, it also maintains data consistency.

Following tasks are allowed by DBMS:

Data Definition: It is used for creation, modification, and removal of definition that defines the organization of data in the database.
Data Updation: It is used for the insertion, modification, and deletion of the actual data in the database.
Data Retrieval: It is used to retrieve the data from the database which can be used by applications for various purposes.
User Administration: It is used for registering and monitoring users, maintain data integrity, enforcing data security, dealing with concurrency control, monitoring performance and recovering information corrupted by unexpected failure.

Characterstics of DBMS¶

It uses a digital repository established on a server to store and manage the information.
It can provide a clear and logical view of the process that manipulates data.
DBMS contains automatic backup and recovery procedures.
It can reduce the complex relationship between data.
It is used to support manipulation and processing of data.
It is used to provide security of data.
It can view the database from different viewpoints according to the requirements of the user.

Advantages of DBMS¶

The advanatges of DBMS are:

Controls database redundancy: It can control data redundancy because it stores all the data in one single database file and that recorded data is placed in the database.
Data sharing: In DBMS, the authorized users of an organization can share the data among multiple users.
Easily Maintenance: It can be easily maintainable due to the centralized nature of the database system.
Reduce time: It reduces development time and maintenance need.
Backup: It provides backup and recovery subsystems which create automatic backup of data from hardware and software failures and restores the data if required.
multiple user interface: It provides different types of user interfaces like graphical user interfaces, application program interfaces

Disadvantages of DBMS¶

DBMS has some disadvantages too, like:

Cost of Hardware and Software: It requires a high speed of data processor and large memory size to run DBMS software.
Size: It occupies a large space of disks and large memory to run them efficiently.
Complexity: Database system creates additional complexity and requirements.
Higher impact of failure: Failure is highly impacted the database because in most of the organization, all the data stored in a single database and if the database is damaged due to electric failure or database corruption then the data may be lost forever.

DBMS vs. File System¶

File based systems were an early attempt to computerize the manual system. It is also called a traditional based approach in which a decentralized approach was taken where each department stored and controlled its own data with the help of a data processing specialist.

Meanwhile, A database approach is a well-organized collection of data that are related in a meaningful way which can be accessed by different users but stored only once in a system. The various operations performed by the DBMS system are: Insertion, deletion, selection, sorting etc.

Some more differences are:

Basis	DBMS Approach	File System Approach
Meaning	DBMS is a collection of data. In DBMS, the user is not required to write the procedures.	The file system is a collection of data. In this system, the user has to write the procedures for managing the database.
Sharing of data	Due to the centralized approach, data sharing is easy.	Data is distributed in many files, and it may be of different formats, so it isn't easy to share data.
Data Abstraction	DBMS gives an abstract view of data that hides the details.	The file system provides the detail of the data representation and storage of data.
Security and Protection	DBMS provides a good protection mechanism.	It isn't easy to protect a file under the file system.
Recovery Mechanism	DBMS provides a crash recovery mechanism, i.e., DBMS protects the user from system failure.	The file system doesn't have a crash mechanism, i.e., if the system crashes while entering some data, then the content of the file will be lost.
Manipulation Techniques	DBMS contains a wide variety of sophisticated techniques to store and retrieve the data.	The file system can't efficiently store and retrieve the data.
Concurrency Problems	DBMS takes care of Concurrent access of data using some form of locking.	In the File system, concurrent access has many problems like redirecting the file while deleting some information or updating some information.
Where to use	Database approach used in large systems which interrelate many files.	File system approach used in large systems which interrelate many files.
Cost	The database system is expensive to design.	The file system approach is cheaper to design.
Data Redundancy and Inconsistency	Due to the centralization of the database, the problems of data redundancy and inconsistency are controlled.	In this, the files and application programs are created by different programmers so that there exists a lot of duplication of data which may lead to inconsistency.
Structure	The database structure is complex to design.	The file system approach has a simple structure.
Data Independence	In this system, Data Independence exists, and it can be of two types. Logical Data Independence Physical Data Independence	In the File system approach, there exists no Data Independence.
Integrity Constraints	Integrity Constraints are easy to apply.	Integrity Constraints are difficult to implement in file system.
Data Models	In the database approach, 3 types of data models exist: Hierarchal data models Network data models Relational data models	In the file system approach, there is no concept of data models exists.
Flexibility	Changes are often a necessity to the content of the data stored in any system, and these changes are more easily with a database approach.	The flexibility of the system is less as compared to the DBMS approach.
Examples	Oracle, SQL Server, Sybase etc.	Cobol, C++ etc.

DBMS Architecture¶

1-Tier Architecture¶

In this architecture, the database is directly available to the user. It means the user can directly sit on the DBMS and uses it.
Any changes done here will directly be done on the database itself. It doesn't provide a handy tool for end users.
The 1-Tier architecture is used for development of the local application, where programmers can directly communicate with the database for the quick response.

2-Tier Architecture¶

The 2-Tier architecture is same as basic client-server. In the two-tier architecture, applications on the client end can directly communicate with the database at the server side. For this interaction, API's like: ODBC, JDBC are used.
The user interfaces and application programs are run on the client-side.
The server side is responsible to provide the functionalities like: query processing and transaction management.
To communicate with the DBMS, client-side application establishes a connection with the server side.

3-Tier Architecture¶

The 3-Tier architecture contains another layer between the client and server. In this architecture, client can't directly communicate with the server.
The application on the client-end interacts with an application server which further communicates with the database system.
End user has no idea about the existence of the database beyond the application server. The database also has no idea about any other user beyond the application.
The 3-Tier architecture is used in case of large web application.

Database Language¶

A DBMS has appropriate languages and interfaces to express database queries and updates. Database languages can be used to read, store and update the data in the database.

Data Definition Language¶

DDL is used to define database structure or pattern. It is used to create schema, tables, indexes, constraints, etc. in the database. Using the DDL statements, you can create the skeleton of the database. Data definition language is used to store the information of metadata like the number of tables and schemas, their names, indexes, columns in each table, constraints, etc.

Here are some tasks that come under DDL

Create: It is used to create objects in the database.
Alter: It is used to alter the structure of the database.
Drop: It is used to delete objects from the database.
Truncate: It is used to remove all records from a table.
Rename: It is used to rename an object.
Comment: It is used to comment on the data dictionary.

Data Manipulation Language¶

It is used for accessing and manipulating data in a database. It handles user requests.

Here are some tasks that come under DML

Select: It is used to retrieve data from a database.
Insert: It is used to insert data into a table.
Update: It is used to update existing data within a table.
Delete: It is used to delete all records from a table.
Merge: It performs UPSERT operation, i.e., insert or update operations.
Call: It is used to call a structured query language or a Java subprogram.
Explain Plan: It has the parameter of explaining data.
Lock Table: It controls concurrency.

Data Control Language¶

It is used to retrieve the stored or saved data. The DCL execution is transactional. It also has rollback parameters.

Under this, we have

Grant: It is used to give user access privileges to a database.
Revoke: It is used to take back permissions from the user.

Transaction Control Language¶

TCL is used to run the changes made by the DML statement. TCL can be grouped into a logical transaction.

Here,

Commit: It is used to save the transaction on the database.
Rollback: It is used to restore the database to original since the last Commit.

ACID Properties in DBMS¶

DBMS is the management of data that should remain integrated when any changes are done in it. It is because if the integrity of the data is affected, whole data will get disturbed and corrupted. Therefore, to maintain the integrity of the data, there are four properties described in the database management system, which are known as the ACID properties.

Atomicity¶

The term atomicity defines that the data remains atomic. It means if any operation is performed on the data, either it should be performed or executed completely or should not be executed at all. It further means that the operation should not break in between or execute partially. In the case of executing operations on the transaction, the operation should be completely executed and not partially.

Consistency¶

The word consistency means that the value should remain preserved always. In DBMS, the integrity of the data should be maintained, which means if a change in the database is made, it should remain preserved always. In the case of transactions, the integrity of the data is very essential so that the database remains consistent before and after the transaction.

Isolation¶

In DBMS, Isolation is the property of a database where no data should affect the other one and may occur concurrently. In short, the operation on one database should begin when the operation on the first database gets complete. It means if two operations are being performed on two different databases, they may not affect the value of one another. In the case of transactions, when two or more transactions occur simultaneously, the consistency should remain maintained. Any changes that occur in any particular transaction will not be seen by other transactions until the change is not committed in the memory.

Durability¶

In DBMS, the term durability ensures that the data after the successful execution of the operation becomes permanent in the database. The durability of the data should be so perfect that even if the system fails or leads to a crash, the database still survives. For committing the values, the COMMIT command must be used every time we make changes.

Transaction¶

A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.

Let's talk about various states of transaction.

Active State¶

The active state is the first state of every transaction. In this state, the transaction is being executed.
For example: Insertion or deletion or updating a record is done here. But all the records are still not saved to the database.

Partially Committed¶

In the partially committed state, a transaction executes its final operation, but the data is still not saved to the database.
In the total mark calculation example, a final display of the total marks step is executed in this state.

Commited¶

A transaction is said to be in a committed state if it executes all its operations successfully. In this state, all the effects are now permanently saved on the database system.

Failed State¶

If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state.
In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail to execute.

Aborted¶

If any of the checks fail and the transaction has reached a failed state then the database recovery system will make sure that the database is in its previous consistent state. If not then it will abort or roll back the transaction to bring the database into a consistent state.
If the transaction fails in the middle of the transaction then before executing the transaction, all the executed transactions are rolled back to its consistent state.
After aborting the transaction, the database recovery module will select one of the two operations:

Re-start the transaction
Kill the transaction

RDBMS¶

RDBMS stands for Relational Database Management System.

All modern database management systems like SQL, MS SQL Server, IBM DB2, ORACLE, My-SQL, and Microsoft Access are based on RDBMS. It is called Relational Database Management System (RDBMS) because it is based on the relational model. Here, data is represented in terms of tuples (rows).

Different Part of RDBMS¶

Table/Relation¶

Everything in a relational database is stored in the form of relations. The RDBMS database uses tables to store data. A table is a collection of related data entries and contains rows and columns to store data. Each table represents some real-world objects such as person, place, or event about which information is collected. The organized collection of data into a relational table is known as the logical view of the database.

Properties:

Each relation has a unique name by which it is identified in the database.
Relation does not contain duplicate tuples.
The tuples of a relation have no specific order.
All attributes in a relation are atomic, i.e., each cell of a relation contains exactly one value.

Row/Tuple¶

A row of a table is also called a record or tuple. It contains the specific information of each entry in the table. It is a horizontal entity in the table.

Properties

No two tuples are identical to each other in all their entries.
All tuples of the relation have the same format and the same number of entries.
The order of the tuple is irrelevant. They are identified by their content, not by their position.

Column/Attribute¶

A column is a vertical entity in the table which contains all information associated with a specific field in a table. For example, "name" is a column in the above table which contains all information about a student's name.