Technical Interview Questions: List of things to read/know for system design

List of things to read/know for system design

How to approach it?

Ask questions on:

Features
How much to scale:

How much data you need to store in the DBs
What latency is expected
How many requests to expect within a second/minute?

Don't use buzzwords
Clear and organized thinking:

Draw all the boxes
Users
APIs

Drive discussions via 80-20 rule, where you should be talking 80% of the time

Basics: Things to think about:

Features for MVP
Define APIs
Think about System Availability
Think about Latency Performance:

For background jobs high latency is ok
For customer facing latency needs to be low.

Scalability - For more users
Durability - Data need to be stored in the DB without comprimising
Class Diagram
Security & Privacy
Cost effective

Concepts to know about:

Vertical v/s Horizontal Scaling

Add CPU on the Vertical. Adding more hosts is Horizontal.

CAP Theorem:

Consistency: Read will give you most recent write.
Availability: You will get the response back for sure. You might not get most recent writes.
Partition: Between 2 nodes, N/W packets could get dropped.
RDBMS: for Consistency
No-SQL: for Availability

ACID v/s BASE

RDBMS: ACID properties; Atomicity, Consistency, Isolation, Durability
No-SQL: BASE properties: Basically available Soft state Eventual Consistency.

Partitioning v/s Sharding Data

Sharding or spilting data based on some values.

Consistent Hashing
Optimistic V/s Pessimistic locking

Optimistic: Don't acquire locks till you are ready to commit the transaction
Pessimistic: Acquire the lock from the start

Strong V/s Eventual consistency

Strong: Reads will always see the latest write.

RDBMS

Eventual: Reads will sometimes see the latest write and eventually will see all the writes.

No-SQL: Both: You can configure to get Strong or Eventual consistency.
Eventual helps with high availability and delayed consistency.

RDBMS V/s NoSQL
Types of NoSQL

Key Value:
Wide Column:
Document Based: XML or Json data
Graph Based:

Caching

Cache data is not shared between nodes.
Shared between nodes.
Cannot be source of truth.
Limited storage

Datacenter/Hosts/Racks

Datacenter has Racks and racks have hosts.
Latency to communicate between Racks and hosts.

CPU/Memory/Hard drive/Network B/W

Improve throughput, latency

Random V/s Sequential write on the disk

Sequential reads and writes are better.
Random are Slower

Http vs Http2 vs Websockets

Entire web run on Http
Http2
Websockets: Bi-directional communication

TCP/IP Model:

4 layers

Ipv4 vs Ipv6

Ipv4: 32 bit address
Ipv6: 128 bit address
How does IP routing works

TCP vs UDP

TCP: Connection oriented reliable connection. Sending documents.
UDP: Unreliable connection. Video Streaming. Its ok to lose some data packets in video streaming. Losing some data packets doesn't lose the consistency.

DNS Lookup.
Https v/s TLS(Transport Level Security)

Secure communication between client and server in terms of data integrity and privacy.

Public key infrastructure & certificate authority
Symmetric v/s Asymetric key

Asymetric: Computational more expensive. Public-Private Key encryption.
Symmetric: AES

Loadbalancer -> L4 V/s L7

Load is distributed based on Round robin basis or average load on the node behind the LB.
L4
L7

CDNs & Edge

CDN: Content Delivery Network. Helps with:

Lower latency
Performance

Edge

Bloom Filters & Count-min Sketch
Paxos: Consensus over distributed hosts.
Design patterns & Object oriented design
Virtual machines & containers
Publisher-Subscriber over a Queue

Customer Facing request shouldn't be exposed over pub-sub.

MapReduce

Distributed and parallel processing of massive amounts of data.

Multithreading, concurrency, locks, synchronization, CAS

Actual implementation of some of these concepts:

Cassandra:

Wide column highly scalable DB. use cases:

Key-value storage
Time Series data
Storing more traditional rows with a lot of columns.

Can provide eventual and strong consistency.
Uses consistent hashing to shard the data.
Uses gossip protocol to make the nodes aware of other nodes status.

MongoDB/Couchbase:

Json like structure can be persisted in it.
scale well.

MySQL

Traditional DBs.
Master-Slave architecture for scalability.

Memcached

Distributed cache
Simple fast key-value storage

Redis

Setup as cluster
Flush data to hard-drive if you configure it that way.
All properties of Memcached in it.

ZooKeeper

Central configuration management tool.
Used for leader election and distributed locking.
Scales well for reads not writes
in-memory so can't store a lot of data.

Kafka

Fault tolerance highly available queue used in pub-sub or streaming application.
deliver message exactly once. Keep the messages in proper order in the assigned partition.

NGINX/HAProxy

Load balancers.
Manager 1000s of connection from clients with a single instance.

Solr, Elastic Search

Search platform on top of
Highly available, scalable, fault tolerant search funtionality
full search text

BlobStore like Amazon S3(part of AWS platform)

Stores big picture/files on cloud

Docker -> Kubernets/Mesos(Software tools to manage containers)

Software platform that provides the containers on DC or laptop or cloud.

Hadoop/Spark -> HDFS

Hadoop: MapReduce
Spark: Faster Hadoop as its in-memory mapreduce.
HDFS: Java based file system that is distributed and fault tolerant and Hadoop relies on it for all processing.

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)