1. Request Collapsing
Request collapsing is a strategy used to combine multiple similar or identical requests into a single request, which helps reduce redundant load on a system and improves efficiency. For example, imagine a popular product page on an e-commerce website where hundreds of users request price information simultaneously. Instead of querying the database for each user, the system collapses these requests into a single database query, and then shares the response with all users.
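The idea can be sketched in a few lines. This is a minimal single-flight implementation, assuming a hypothetical `fetch_fn` that performs the expensive lookup (such as the database query in the example): the first caller for a key does the real work, and concurrent callers for the same key wait and share its result.

```python
import threading

class RequestCollapser:
    """Collapse concurrent identical requests into one backend call."""

    def __init__(self, fetch_fn):
        self.fetch_fn = fetch_fn          # the expensive lookup (e.g. a DB query)
        self.lock = threading.Lock()
        self.in_flight = {}               # key -> {"event": Event, "result": ...}

    def get(self, key):
        with self.lock:
            entry = self.in_flight.get(key)
            if entry is None:
                # First caller for this key becomes the leader and does the fetch.
                entry = {"event": threading.Event(), "result": None}
                self.in_flight[key] = entry
                leader = True
            else:
                leader = False
        if leader:
            try:
                entry["result"] = self.fetch_fn(key)
            finally:
                entry["event"].set()      # wake every waiting follower
                with self.lock:
                    del self.in_flight[key]
        else:
            entry["event"].wait()         # followers share the leader's result
        return entry["result"]
```

Followers never touch the backend: a hundred simultaneous requests for the same key trigger exactly one call to `fetch_fn`.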
2. Load Balancing
Load balancing ensures that incoming network traffic is distributed evenly across multiple servers, avoiding the scenario where a single server gets overwhelmed while others remain underutilized. This helps in maximizing resource utilization, preventing failures, and maintaining system performance. For instance, a popular website might use a load balancer to distribute visitor requests between several backend servers, so that no one server becomes a bottleneck.
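The simplest distribution policy is round-robin, which hands each incoming request to the next server in rotation. A minimal sketch (the server names are illustrative; a real load balancer would also run health checks and drop failed backends):

```python
import itertools

class RoundRobinBalancer:
    """Distribute requests across servers in strict rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # endless rotation over the pool

    def pick(self):
        # Each call returns the next backend, so load spreads evenly.
        return next(self._cycle)

balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
```

Other common policies include least-connections and weighted round-robin, which account for servers of unequal capacity.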
3. Scaling
Scaling refers to the process of increasing a system’s capacity to handle more users, data, or traffic as demand grows. It can be done in two ways: vertical scaling, which involves upgrading the hardware resources of an existing server, or horizontal scaling, which involves adding more servers to handle increased load. For example, during Black Friday sales, an e-commerce site might scale horizontally by adding extra servers to handle the sudden spike in traffic.
4. Latency
Latency is the time it takes for data to travel from a sender to a receiver; round-trip latency also counts the time for the response to come back. It is a critical factor in system performance, and low latency is essential for a good user experience, especially in applications that require real-time responses, such as video conferencing. For instance, the lag a participant notices between speaking and being heard on a live video call is largely the result of high network latency.
5. Throughput
Throughput is a measure of the number of requests or transactions a system can handle in a given period of time, and it represents the system’s processing capacity. High throughput indicates an efficient system capable of handling large volumes of requests effectively. For example, a web server that can serve 10,000 page requests per second has a high throughput, enabling it to handle many simultaneous users.
6. Database Indexing
Database indexing is a technique that improves the speed of data retrieval operations in a database by maintaining a separate data structure (the index) that maps the values of specific fields to the locations of their records. This allows queries to quickly locate the needed records without scanning the entire dataset. For example, if a library database has an index on book titles, searching for a specific title is much faster than searching through every book.
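The library example can be illustrated with a toy in-memory index that maps a field's value to row positions, so a lookup avoids a full scan. Real databases typically use B-tree or hash indexes; the book data here is purely illustrative.

```python
class FieldIndex:
    """Toy index: maps a field value to the positions of matching rows."""

    def __init__(self, rows, field):
        self.rows = rows
        self.index = {}
        for pos, row in enumerate(rows):
            # Record every position where this field value appears.
            self.index.setdefault(row[field], []).append(pos)

    def find(self, value):
        # O(1) average-case lookup instead of an O(n) scan of every row.
        return [self.rows[pos] for pos in self.index.get(value, [])]
```

The trade-off is the same as in a real database: faster reads in exchange for extra storage and extra work on every write to keep the index current.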
7. Cache
A cache is a temporary storage area where frequently accessed data is kept for quick retrieval. Caching helps reduce response times and lowers the load on databases or remote servers. For instance, web browsers use a cache to store web page content, enabling quicker loading when revisiting the same page.
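Because a cache is finite, it needs an eviction policy deciding what to discard when full. A common choice is least-recently-used (LRU); here is a minimal sketch built on `collections.OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least-recently-used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None                      # cache miss
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
```

Production systems typically reach for a dedicated cache such as Redis or Memcached, which implement the same idea at scale.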
8. Distributed System
A distributed system is made up of multiple independent computers (nodes) that work together to achieve a common goal. These nodes communicate and coordinate with each other, creating a unified experience for users. An example is Google Search, where many servers work together to handle billions of search queries, providing fast and accurate results.
9. CAP Theorem
The CAP theorem states that a distributed system cannot simultaneously guarantee all three of Consistency, Availability, and Partition Tolerance; when a network partition occurs, the system must choose between consistency and availability. This means trade-offs must be made when designing such systems. For instance, during a network partition, a NoSQL database like Cassandra chooses availability over consistency, ensuring that data is still accessible even if some parts of the system are temporarily out of sync.
10. Redundancy
Redundancy involves having backup components, systems, or processes in place to ensure continuity in case of failure. It provides fault tolerance, reducing the impact of individual component failures on the entire system. For example, cloud storage services often store copies of data in multiple data centers, ensuring that data remains accessible even if one data center experiences issues.
11. Fault Tolerance
Fault tolerance is the ability of a system to continue functioning correctly even in the presence of faults or failures. It is achieved by adding redundancy and implementing error-handling strategies, ensuring that the failure of one component doesn’t lead to a complete system crash. For instance, a redundant power supply in a server can take over if the primary power supply fails, keeping the system running.
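In code, fault tolerance often takes the form of retries plus failover to a redundant component. A minimal sketch, where `primary` and `backup` are hypothetical zero-argument callables standing in for two redundant services:

```python
def call_with_failover(primary, backup, retries=2):
    """Try the primary component, retry on failure, then fail over."""
    for _ in range(retries):
        try:
            return primary()
        except Exception:
            continue          # treat the failure as transient and retry
    return backup()           # primary exhausted its retries: fail over
```

Real systems add refinements such as exponential backoff between retries and circuit breakers that stop calling a component known to be down.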
12. Microservices
Microservices is an architectural style where an application is divided into smaller, independent services, each responsible for a specific business function. These services communicate with each other through APIs, enabling flexibility in development and scaling. For example, an e-commerce site might have separate services for inventory, payment, and user authentication.
13. Database Sharding
Database sharding is the practice of splitting a large database into smaller, more manageable pieces called shards. Each shard holds a subset of the data, allowing for faster queries and better performance as the system scales. For instance, a social media platform might use sharding to separate user data by geographic regions to reduce latency and improve performance.
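The core of sharding is a routing function that maps each record's key to a shard. A minimal hash-based sketch (production systems often prefer consistent hashing or range-based sharding, since plain modulo forces most keys to move when the shard count changes):

```python
import hashlib

def shard_for(user_id, num_shards):
    """Route a key to one of num_shards shards using a stable hash."""
    # A cryptographic hash gives a deterministic, evenly spread value;
    # the modulo then picks a shard in [0, num_shards).
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The same user always lands on the same shard, so every query for that user touches only one database instance.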
14. Consistency
In distributed systems, consistency ensures that all nodes have the same data view, meaning any write operation is reflected in all subsequent read operations. This helps maintain data integrity across the system. For example, in a banking system, if a user transfers money, the updated balance must be consistent across all servers, preventing discrepancies.
15. Availability
Availability means that a system remains operational and accessible even during hardware failures or network issues. It ensures continuous service to users, which is crucial for applications like online banking or e-commerce. For instance, Amazon’s e-commerce platform uses load balancing and redundant systems to ensure that users can shop even if a server goes down.
16. Caching Strategies
Caching strategies are crucial for improving system performance by storing frequently accessed data in a temporary storage area known as a cache. This reduces latency and load on the primary data source. Three common strategies are:
- Write-Through Caching: Data is written to both the cache and the database simultaneously, ensuring consistency. For example, when a user updates their profile, the change is reflected immediately in the cache and the database.
- Write-Around Caching: Data is written directly to the database, bypassing the cache. This is useful for infrequently accessed data, preventing cache pollution. For instance, if a user uploads a large file, it goes directly to the database, and the cache remains unaffected until that file is accessed.
- Write-Back Caching: Data is written to the cache first and then asynchronously written to the database. This improves performance but can lead to data loss if the cache fails before the data is saved to the database. An example is when a user makes changes to settings; they’re saved in the cache quickly, and the database update occurs later.
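The contrast between write-through and write-back is easiest to see side by side. In this sketch, `store` is any dict-like object standing in for the database:

```python
class WriteThroughCache:
    """Write-through: every write updates the cache and the store together."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value   # update the cache...
        self.store[key] = value   # ...and the database in the same operation

class WriteBackCache:
    """Write-back: writes land in the cache first and are flushed later."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()        # keys not yet persisted to the store

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)       # defer the database write

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

The write-back variant makes the risk mentioned above concrete: any dirty key lost before `flush()` runs never reaches the database.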
17. Message Queue
A message queue is an asynchronous communication mechanism that allows different components or services within a system to send and receive messages independently. This decouples the message sender and receiver, enabling better scalability and reliability. For example, in an e-commerce application, when a user places an order, the order service sends a message to a message queue. The inventory service can then process this message independently, ensuring the order is fulfilled without blocking the user interface.
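The order example can be sketched with Python's standard-library `queue` module: the producer enqueues orders and returns immediately, while a separate inventory worker drains the queue at its own pace. A real deployment would use a broker such as RabbitMQ or Kafka, but the decoupling is the same.

```python
import queue
import threading

orders = queue.Queue()   # the message queue between the two services
processed = []

def inventory_worker():
    """Consumer: processes order messages independently of the producer."""
    while True:
        message = orders.get()
        if message is None:          # sentinel value: shut the worker down
            break
        processed.append(f"reserved stock for {message}")
        orders.task_done()

worker = threading.Thread(target=inventory_worker)
worker.start()

# Producer side: placing orders never blocks on inventory work.
orders.put("order-1001")
orders.put("order-1002")
orders.put(None)                     # signal the worker to stop
worker.join()
```

If the inventory service is slow or briefly down, orders simply accumulate in the queue instead of failing, which is exactly the reliability benefit described above.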
18. Design Patterns
Design patterns are standardized solutions to common software design problems that improve code organization and maintainability. Three patterns that appear frequently in system design are:
- Singleton Pattern: Ensures a class has only one instance and provides a global point of access to it. For example, a configuration manager that loads settings only once during the application lifecycle.
- Factory Pattern: Creates objects without specifying the exact class of object that will be created. For example, a payment processing system might use a factory pattern to create different payment methods (credit card, PayPal) based on user selection.
- Observer Pattern: Allows an object to notify other objects about state changes. For instance, in a stock market application, when stock prices change, all subscribed clients receive updates.
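The observer pattern from the stock example above can be sketched in a few lines; the class and symbol names here are illustrative:

```python
class StockTicker:
    """Subject in the observer pattern: notifies subscribers of price changes."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        # Each observer registers a callback to receive updates.
        self._observers.append(callback)

    def set_price(self, symbol, price):
        # Push the state change to every subscribed client.
        for notify in self._observers:
            notify(symbol, price)
```

The subject never needs to know who its subscribers are or what they do with updates, which keeps the two sides loosely coupled.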
19. Horizontal Scaling vs Vertical Scaling
Scaling a system can be approached in two main ways:
- Horizontal Scaling: Involves adding more machines or nodes to a system to handle increased load. This allows for distributing workloads across multiple servers. For example, an online service may add more web servers to accommodate growing traffic.
- Vertical Scaling: Involves upgrading the resources (CPU, RAM, etc.) of existing machines to handle more load. While it can provide immediate performance improvements, there is a limit to how much a single machine can be upgraded. For example, upgrading a server from 16GB to 64GB of RAM to improve performance during peak times.
20. CDN (Content Delivery Network)
A CDN is a distributed network of servers located in various geographic locations designed to deliver content (like images, videos, scripts) to users efficiently and quickly. By caching content closer to users, CDNs reduce latency and improve load times. For example, when a user accesses a website, the CDN serves images from the nearest server, enhancing the user experience by speeding up content delivery.
21. Consistent Hashing
Consistent hashing is a technique used to distribute data across nodes in a distributed system efficiently. It minimizes the need for rehashing when nodes are added or removed, thus maintaining balance. For instance, in a distributed cache system, if a new server is added, only a portion of the keys needs to be redistributed, rather than all of them, reducing the overhead and maintaining performance.
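A minimal hash ring makes the "only a portion of the keys move" property concrete. This sketch omits virtual nodes (which real systems add to smooth out load) and uses illustrative cache-server names:

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hashing ring: keys map to the first node
    clockwise from their hash position."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0  # wrap around past the highest hash
        return self._ring[idx][1]
```

When a node is added, the only keys that move are those falling in the new node's arc; every other key keeps its old assignment, unlike plain `hash(key) % n`, which reshuffles almost everything.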
22. Polyglot Persistence
Polyglot persistence refers to using multiple types of databases to store different types of data within a system. This approach allows developers to select the most suitable database technology for specific use cases. For example, an application might use a relational database for transactional data (like user accounts) and a NoSQL database for storing user-generated content (like comments and posts).