Distributed Datastore Basics

Basics:
  1. A distributed DB is divided into multiple nodes, and data is spread across those nodes. 
  2. There is no single master/slave config in a peer-to-peer distributed DB; every node is equal. 
  3. Every machine is given the responsibility of saving part of the data, based on a partitioning scheme.
  4. If some node goes down it's fine, since its data is also captured on some other node. 
  5. For a given piece of data, one node has the primary responsibility of storing it, and several secondary (backup) nodes store copies. Each node is therefore primary for its own range of data and secondary for parts of other nodes' ranges. In case of loss, data is retrieved from the secondary nodes.
  6. The consistent hashing technique is used to distribute the data across the nodes.
    1. The partition key of each record is hashed, and the hash value determines which node on the ring owns that record. 
  7. The datastore takes care of hashing the partition key and finding the responsible node for storing the record.
  8. NoSQL datastores like Cassandra have a replication factor, which is usually a small number. If the replication factor is 3, the data will be stored on 3 different nodes. 
    1. Say the backup strategy is to store the data on the next available nodes: the data goes to the computed node, then the next available node, then the next after that. 
    2. So it's a combination of consistent hashing and backing up the data onto different nodes. 
  9. To handle more load, extra nodes are added. Responsibility for the data is automatically redistributed across new and existing nodes, with membership changes spread via the Gossip protocol. 
    1. All nodes talk to each other, so every node knows about every other node. 
    2. A node that goes down stops sending and answering messages. When the other nodes get no response from it, they understand the node is down and automatically share its responsibility among themselves: another node becomes the primary for that node's range, say keys 10k-20k. 
  10. Similar strategies are implemented in Redis, HDFS, and Cassandra. 
    1. HDFS has a master node (name node) and data nodes; the data resides on the data nodes. 
    2. In such cases ZooKeeper can take care of reassigning responsibilities to slave nodes if the master goes down. 
  11. Commodity computers are ordinary, inexpensive machines, like the desktops we use at home; clusters scale by adding many of them rather than one specialized server. 
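The consistent hashing plus replication scheme described in points 6-8 can be sketched as a toy ring in Python. The node names, the choice of MD5 as the hash, and the "next distinct nodes clockwise" replica placement are illustrative assumptions; real systems like Cassandra add virtual nodes and pluggable partitioners on top of this idea.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Toy consistent-hash ring: hashes a partition key to a point on the
    ring and walks clockwise to find the primary node plus its replicas."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Each node gets a position on the ring based on the hash of its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def nodes_for(self, partition_key):
        """Return the RF nodes responsible for this key: the first node at or
        after the key's hash position, then the next distinct nodes clockwise."""
        h = self._hash(partition_key)
        idx = bisect.bisect_left(self.ring, (h, ""))
        owners = []
        for i in range(len(self.ring)):
            node = self.ring[(idx + i) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == self.rf:
                break
        return owners

ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.nodes_for("user:42"))  # primary plus 2 replicas: 3 distinct nodes
```

The same key always hashes to the same ring position, so reads and writes for a key consistently land on the same set of nodes, and adding a node only remaps the keys that fall in its slice of the ring.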
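The gossip-based failure detection from point 9 can be sketched as a small simulation. The `Node` class, heartbeat counters, and the miss threshold are hypothetical simplifications; real gossip protocols (e.g. Cassandra's) exchange state with a few random peers per round rather than the full mesh used here.

```python
class Node:
    """Toy gossip participant: tracks the freshest heartbeat it has seen
    from every node, and suspects nodes whose heartbeats have gone stale."""

    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # last heartbeat round seen, per node

    def tick(self, round_no):
        self.heartbeats[self.name] = round_no  # beat my own heart

    def gossip_with(self, other):
        # Merge views both ways: keep the freshest counter for every node.
        for node, beat in other.heartbeats.items():
            self.heartbeats[node] = max(self.heartbeats.get(node, -1), beat)
        for node, beat in self.heartbeats.items():
            other.heartbeats[node] = max(other.heartbeats.get(node, -1), beat)

    def suspected_down(self, round_no, threshold=3):
        return [n for n, beat in self.heartbeats.items()
                if round_no - beat > threshold]

names = ["n1", "n2", "n3"]
nodes = {n: Node(n) for n in names}
alive = set(names)

for round_no in range(1, 8):
    if round_no == 3:
        alive.discard("n3")          # n3 crashes and stops heartbeating
    for n in alive:
        nodes[n].tick(round_no)
    for a in alive:                  # every surviving node gossips
        for b in alive:
            if a != b:
                nodes[a].gossip_with(nodes[b])

print(nodes["n1"].suspected_down(round_no=7))  # → ['n3']
```

Once n1 marks n3 as down, a real cluster would also hand n3's key range to the next node on the ring, exactly as described above.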

    Why do you need to scale?
    Performance degrades: customers complain that requests are taking more time.

    Vertical Scaling: 
    Upgrade a single server's capability (CPU, RAM, disk) so that it can handle the volume of requests easily. 

    Horizontal Scaling:
    1. Add more machines to the cluster. 
    2. Cheaper than vertical scaling. 
    3. If the service is not heavily used, machines can be removed from the cluster. The service is not impacted: traffic is directed to the remaining machines, so requests are served by one machine or another at any given point.
    4. Moreover, machines can be placed in different continents. 
    Conclusion:
    1. Vertical scaling is cost efficient in the short term, as you buy less hardware. Horizontal scaling is cost efficient in the long run. 
    2. Vertical scaling is not fault tolerant, as everything runs on a single machine. 

    Advantages of distributed systems: 
    1. Fault tolerance:
      1. If a server in one country goes down, requests can be directed to a server in another country.
      2. User will see more latency but the service remains available. 
    2. Low Latency:
      1. With horizontal scaling you can place servers near where most requests originate; with vertical scaling there are only a few servers, so far-away users wait longer for a response. 
      2. The time taken to send and receive the request goes down; the only remaining time is the time taken by the service itself. 
