Distributed database

A distributed database is a database in which data is stored across different physical locations. It may be stored on multiple computers located in the same physical location, or it may be dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.
System administrators can distribute collections of data across multiple physical locations. A distributed database can reside on organised network servers or decentralised independent computers on the Internet, on corporate intranets or extranets, or on other organisational networks. Because distributed databases store data across multiple computers, they may improve performance at end-user worksites by allowing transactions to be processed on many machines instead of only one.
Two processes keep the data in a distributed database up to date: replication and duplication.

Replication uses specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process propagates them so that all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and can require considerable time and computer resources.
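The change-detection-and-propagation idea behind replication can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular replication product works: each hypothetical `Site` records its own writes in a change log, and the hypothetical `replicate` function ships every not-yet-shipped change to all other sites. Conflicting writes to the same key are not detected; whichever change is applied last wins.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical site holding one copy of the data as a key-value map."""
    name: str
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # local changes, in order
    shipped: int = 0  # how many log entries have already been propagated

    def write(self, key, value):
        # Local writes update this copy and are recorded for replication.
        self.data[key] = value
        self.log.append((key, value))


def replicate(sites):
    """Find unshipped changes at every site and apply them to all other sites."""
    for source in sites:
        pending = source.log[source.shipped:]
        for target in sites:
            if target is source:
                continue
            for key, value in pending:
                # Applied directly, not via write(), so it is not re-logged.
                target.data[key] = value
        source.shipped = len(source.log)
```

After `a.write("x", 1)` on one site and `b.write("y", 2)` on another, a call to `replicate([a, b, c])` leaves all three copies equal to `{"x": 1, "y": 2}`.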
Duplication, on the other hand, is less complex. It designates one database as the master and then duplicates that database, normally at a set time after hours, so that each distributed location has the same data. In the duplication process, users may change only the master database, which ensures that no local changes are lost when the copies are overwritten.
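Duplication can be illustrated the same way. In this sketch, the hypothetical `DuplicatedDatabase` class accepts writes only on the master, and `run_nightly_duplication` (assumed to be triggered by a scheduler after hours) overwrites every copy with a full snapshot of the master.

```python
import copy


class DuplicatedDatabase:
    """Hypothetical master/copy setup in which only the master accepts writes."""

    def __init__(self, copy_names):
        self.master = {}
        self.copies = {name: {} for name in copy_names}

    def write(self, key, value):
        # Users may change only the master database.
        self.master[key] = value

    def run_nightly_duplication(self):
        # At the scheduled time, replace every copy with a snapshot of the master.
        for name in self.copies:
            self.copies[name] = copy.deepcopy(self.master)
```

Until the scheduled run, the copies keep serving the previous snapshot; that staleness is the price of the simpler design.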

Both replication and duplication can keep the data current at all distributed locations.
Besides distributed database replication and fragmentation, there are many other distributed database design technologies, for example local autonomy and synchronous and asynchronous distributed database technologies. How these technologies are implemented depends on the needs of the business, the sensitivity or confidentiality of the data stored in the database, and the amount the business is willing to spend on ensuring data security, consistency and integrity.
When discussing access to distributed databases, Microsoft favors the term distributed query, which it defines in a protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources".
Oracle provides a more language-centric view in which distributed queries and distributed transactions form part of distributed SQL.
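SQLite's `ATTACH DATABASE` statement offers a small, runnable analogy to a distributed query: a single SELECT can reference tables that live in separate databases. This is not the OLE DB mechanism Microsoft describes, and both databases here are in memory on one machine; the schema names, tables and data are invented for illustration.

```python
import sqlite3

# The main connection stands in for "site A"; the attached database stands in
# for an external data source ("site B"). Both live in memory for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS site_b")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE site_b.orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO main.customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany(
    "INSERT INTO site_b.orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0)],
)

# One SELECT statement that references tables in both databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.total)
    FROM main.customers AS c
    JOIN site_b.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)  # [('Ada', 30.0), ('Lin', 40.0)]
```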

Architecture

A database user accesses the distributed database through one of the following methods:

Local applications: applications that do not require data from other sites.
Global applications: applications that do require data from other sites.
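Whether an application is local or global depends only on where the data it uses is stored. The sketch below assumes a hypothetical catalog that maps each table to the one site storing it; an application is local if all of its tables are at its own site, and global otherwise, in which case the function also reports which remote sites must be contacted.

```python
# Hypothetical data catalog: which site stores each table.
CATALOG = {
    "employees": "london",
    "departments": "london",
    "payroll": "paris",
}


def classify(tables, local_site, catalog=CATALOG):
    """Return ("local", set()) or ("global", {remote sites needed})."""
    remote = {catalog[t] for t in tables} - {local_site}
    return ("local", set()) if not remote else ("global", remote)
```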

A homogeneous distributed database has identical software and hardware running all database instances, and may appear through a single interface as if it were a single database. A heterogeneous distributed database may have different hardware, operating systems, database management systems and even data models for different databases.

Homogeneous Distributed Database Management System

With a homogeneous distributed database, all sites have identical software, are aware of each other, and agree to cooperate in processing user requests. Each site surrenders part of its autonomy in terms of the right to change schema or software. A homogeneous DBMS appears to the user as a single system, and it is much easier to design and manage than a heterogeneous one. The following conditions must be satisfied for a homogeneous database:

The data structures used at each location must be the same or compatible.
The database application used at each location must be the same or compatible.

Heterogeneous DDBMS

In a heterogeneous distributed database, different sites may use different schemas and software. Differences in schema are a major problem for query processing and transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. In heterogeneous systems, nodes may have different hardware and software, and the data structures at the various nodes or locations are also incompatible. Different computers, operating systems, database applications, or data models may be used at each location. For example, one location may have the latest relational database management technology, while another may store data in conventional files or in old versions of the database management system. Similarly, one location may run the Windows operating system, while another runs UNIX. Heterogeneous systems are usually used when individual sites use their own hardware and software.

On heterogeneous systems, translations are required to allow communication between different sites. Users must be able to make requests in a database language at their local sites; usually the SQL database language is used for this purpose. If only the hardware differs, the translation is straightforward: computer codes and word lengths are changed. A heterogeneous system is often not technically or economically feasible. In such a system, a user at one location may be able to read but not update the data at another location.
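The hardware-only translation case can be illustrated with Python's `struct` module and the built-in `cp037` (EBCDIC) codec. The record layouts are assumptions chosen for the example: the source site stores a 2-byte big-endian integer followed by 8 bytes of EBCDIC text, while the target site expects a 4-byte little-endian integer followed by 8 bytes of ASCII text. Only the word length, byte order and character code change; the data itself does not.

```python
import struct


def translate_record(raw: bytes) -> bytes:
    """Translate one record between two hypothetical site formats.

    Source (assumed): 2-byte big-endian signed int + 8 bytes EBCDIC (cp037) text.
    Target (assumed): 4-byte little-endian signed int + 8 bytes ASCII text.
    The text is assumed to contain only ASCII-representable characters.
    """
    quantity, text = struct.unpack(">h8s", raw)   # source word length and byte order
    name = text.decode("cp037")                    # EBCDIC bytes -> str
    return struct.pack("<i8s", quantity, name.encode("ascii"))
```

For example, a 10-byte source record built with `struct.pack(">h8s", 300, "WIDGET  ".encode("cp037"))` becomes a 12-byte target record that unpacks to `(300, b"WIDGET  ")`.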

Important considerations

Care must be taken to ensure the following:

The distribution is transparent — users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access among other things.
Transactions are transparent — each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, each sub-transaction affecting one database system.
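One widely used way to split a transaction into sub-transactions while keeping it atomic across database systems is the two-phase commit protocol, which this section does not name explicitly. The sketch below is a minimal in-memory version with hypothetical class and function names: every participant first votes on whether its sub-transaction can commit, and the coordinator commits everywhere only if all votes are yes. It omits the logging, timeouts and failure recovery a real implementation needs.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """One database system executing a sub-transaction (hypothetical)."""
    name: str
    can_commit: bool = True  # simulates whether this site's sub-transaction succeeds
    committed: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, writes):
        # Phase 1: stage this site's writes and vote yes or no.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (commit): make the staged writes visible.
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self):
        # Phase 2 (abort): discard the staged writes.
        self.staged = {}


def two_phase_commit(sub_transactions):
    """sub_transactions: list of (Participant, writes) pairs. Returns True on commit."""
    # Phase 1: collect every vote (a list, so all participants are asked).
    all_yes = all([p.prepare(w) for p, w in sub_transactions])
    # Phase 2: commit at every site, or abort at every site.
    for p, _ in sub_transactions:
        if all_yes:
            p.commit()
        else:
            p.abort()
    return all_yes
```

If any one site votes no, no site applies its part of the transaction, which is the "as a whole or not at all" property listed under Advantages.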

There are two principal approaches to storing a relation r in a distributed database system: replication and fragmentation (partitioning).

Replication

In replication, the system maintains several identical replicas of the same relation r in different sites.
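A simple way to keep the replicas identical is to apply every write to all of them (write-all) while letting a read use any single reachable replica (read-one). That policy is one option among several and is an assumption here; the `ReplicatedRelation` class is hypothetical and keeps each replica as an in-memory list of tuples.

```python
class ReplicatedRelation:
    """Hypothetical: identical copies of relation r kept at several sites."""

    def __init__(self, sites):
        self.replicas = {site: [] for site in sites}  # site -> list of tuples

    def insert(self, row):
        # Write-all: apply the change to every replica so they stay identical.
        for rows in self.replicas.values():
            rows.append(row)

    def read(self, available_sites):
        # Read-one: any single reachable replica holds the whole relation.
        for site in available_sites:
            if site in self.replicas:
                return list(self.replicas[site])
        raise LookupError("no replica of r is reachable")
```

Read-one/write-all makes reads resilient to site failures, but a write cannot complete while any replica is unreachable; quorum-based and asynchronous schemes are common alternatives that trade some consistency or read cost for write availability.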

Fragmentation

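The relation r is fragmented into several relations r₁, r₂, ..., rₙ in such a way that the original relation can be reconstructed from the fragments, and the fragments are then scattered to different locations. There are basically two schemes of fragmentation:

Horizontal fragmentation: splits the relation by assigning each tuple of r to one or more fragments.
Vertical fragmentation: splits the relation by decomposing the schema R of relation r.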
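Horizontal fragmentation (splitting r by tuples) and vertical fragmentation (splitting r by columns), together with the requirement that r be reconstructible, can be sketched on a toy relation. Horizontal fragments are rebuilt by taking their union; vertical fragments are rebuilt by joining on a key that every fragment keeps. The relation, the predicates, the function names, and the assumption that `id` is a primary key are all invented for illustration.

```python
# Toy relation r as a list of dicts; "id" is assumed to be the primary key.
r = [
    {"id": 1, "name": "Ada", "city": "London", "salary": 100},
    {"id": 2, "name": "Lin", "city": "Paris", "salary": 120},
    {"id": 3, "name": "Sam", "city": "London", "salary": 90},
]


def horizontal_fragments(rel, predicates):
    """Assign each tuple to every fragment whose predicate it satisfies."""
    return {name: [t for t in rel if pred(t)] for name, pred in predicates.items()}


def vertical_fragments(rel, columns, key="id"):
    """Project rel onto column subsets; each fragment keeps the key for rejoining."""
    return {
        name: [{c: t[c] for c in {key, *cols}} for t in rel]
        for name, cols in columns.items()
    }


def rebuild_from_horizontal(frags, key="id"):
    # Union of all fragments, deduplicated by key (a tuple may sit in several).
    merged = {}
    for rows in frags.values():
        for t in rows:
            merged[t[key]] = t
    return sorted(merged.values(), key=lambda t: t[key])


def rebuild_from_vertical(frags, key="id"):
    # Join the fragments on the key by merging the columns of matching tuples.
    joined = {}
    for rows in frags.values():
        for t in rows:
            joined.setdefault(t[key], {}).update(t)
    return sorted(joined.values(), key=lambda t: t[key])
```

For example, fragmenting r by `city` and rebuilding with `rebuild_from_horizontal`, or splitting it into `name`/`city` and `salary` column groups and rebuilding with `rebuild_from_vertical`, both return a relation equal to r.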

A distributed database can also be run by independent or even competing parties, as in Bitcoin or Hasq.

Advantages

Management of distributed data with different levels of transparency like network transparency, fragmentation transparency, replication transparency, etc.
Increased reliability and availability
Easier expansion
Reflects organizational structure — database fragments potentially stored within the departments they relate to
Local autonomy or site autonomy — a department can control the data about them
Protection of valuable data — if there were ever a catastrophic event such as a fire, all of the data would not be in one place, but distributed in multiple locations
Improved performance — data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing the load on the databases to be balanced among servers.
Economics — it may cost less to create a network of smaller computers with the power of a single large computer
Modularity — systems can be modified, added and removed from the distributed database without affecting other modules
Reliable transactions - due to replication of the database
Hardware, operating system, network, fragmentation, DBMS, replication and location independence
Continuous operation, even if some nodes go offline
Distributed query processing can improve performance
A single-site failure need not affect the performance of the whole system.
For those systems that support full distributed transactions, operations enjoy the ACID properties:
* A-atomicity, the transaction takes place as a whole or not at all
* C-consistency, the transaction maps one consistent DB state to another
* I-isolation, each transaction sees a consistent DB
* D-durability, the results of a transaction must survive system failures

Merge replication is a popular method for consolidating data between databases.
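Because merge replication lets copies be updated independently and reconciles them later, conflicting updates must be resolved somehow. The sketch below merges two replicas stored as `key -> (timestamp, value)` maps using a simple last-writer-wins rule; that policy and the `merge_replicas` name are assumptions, not a description of any product's default behavior, and deletions are not handled.

```python
def merge_replicas(a, b):
    """Merge two replicas whose entries are key -> (timestamp, value).

    Keys changed at only one site are copied over; keys present in both are
    resolved by keeping the later timestamp (last-writer-wins). On a timestamp
    tie, replica a's value is kept (an arbitrary, assumed tie-break rule).
    """
    merged = dict(a)
    for key, (ts_b, val_b) in b.items():
        if key not in merged or ts_b > merged[key][0]:
            merged[key] = (ts_b, val_b)
    return merged
```

Last-writer-wins is easy to implement but silently discards the losing update; applications that cannot tolerate that usually need a domain-specific resolution rule.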

Disadvantages

Complexity — DBAs may have to do extra work to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database — for example, joins become prohibitively expensive when performed across multiple systems.
Economics — increased complexity and a more extensive infrastructure mean extra labor costs
Security — remote database fragments must be secured, and because they are not centralized, the remote sites must be secured as well. The infrastructure must also be secured.
Difficult to maintain integrity — in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible
Inexperience — distributed databases are difficult to work with, and in such a young field there is not much readily available experience in "proper" practice
Lack of standards — there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS
Database design more complex — in addition to traditional database design challenges, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites and data replication
Additional software is required
The operating system should support a distributed environment
Concurrency control poses a major issue; it can be addressed by locking and timestamping.
Distributed access to data
Analysis of distributed data
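Of the two concurrency-control techniques named above, timestamping is easy to sketch. The class below implements basic timestamp ordering under simplifying assumptions: every transaction has a unique, increasing start timestamp starting at 1, all data lives in one in-memory store, and an operation that arrives too late is rejected so that its transaction can restart with a new timestamp. The class and exception names are hypothetical; locking (for example two-phase locking) is the alternative approach and is not shown.

```python
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""


class TimestampOrdering:
    """Basic timestamp-ordering concurrency control over an in-memory store."""

    def __init__(self):
        self.values = {}
        self.read_ts = {}   # item -> largest timestamp that has read it
        self.write_ts = {}  # item -> largest timestamp that has written it

    def read(self, ts, item):
        # Too late to read: a younger transaction already overwrote the item.
        if ts < self.write_ts.get(item, 0):
            raise Abort(f"T{ts} must restart: {item} was overwritten by a younger transaction")
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return self.values.get(item)

    def write(self, ts, item, value):
        # Too late to write: a younger transaction already read or wrote the item.
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            raise Abort(f"T{ts} must restart: a younger transaction already used {item}")
        self.values[item] = value
        self.write_ts[item] = ts
```

Unlike locking, this scheme never makes a transaction wait, so it cannot deadlock; the cost is that conflicting transactions are aborted and restarted instead.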
