How Netflix Stores 140 Million Hours of Viewing Data Per Day PART-3

Reads and Writes in the Initial System The diagram below shows how the initial system handled reads and writes to the viewing history data. Every time a user started watching a show or movie, Netflix added a new column to their viewing history record in the database. If the user paused or stopped watching, that same column was updated to reflect their latest progress. While storing data was easy, retrieving it efficiently became more challenging as users' viewing histories grew. Netflix used three different methods to fetch data, each with its advantages and drawbacks: Retrieving the Entire Viewing History: If a user had only watched a few shows, Netflix could quickly fetch their entire history in one request. However, for long-time users with large viewing histories, this approach became slow and inefficient as more data accumulated. Searching by Time Range: Sometimes, Netflix only needed to fetch records from a specific time period, like a user’s viewing history from the last month. While this method worked well in some cases, performance varied depending on how many records were stored within the selected time range. Using Pagination for Large Histories: To avoid loading huge amounts of data at once, Netflix used pagination. This approach prevented timeouts (when a request takes too long and fails), but it also increased the overall time needed to retrieve all records. At first, this system worked well because it provided a fast and scalable way to store viewing history. However, as more users watched more content, this system started to hit performance limits. Some of the issues were as follows: Too Many SSTables: Apache Cassandra® stores data in files called SSTables (Sorted String Tables). Over time, the number of these SSTables increased significantly. For every read, the system had to scan multiple SSTables across the disk, making the process slower. Compaction Overhead: Apache Cassandra® performs compactions to merge multiple SSTables into fewer, more efficient ones. With more data, these compactions took longer and required more processing power. Other operations like read repair and full column repair also became expensive. To speed up data retrieval, Netflix introduced a caching solution called EVCache. Instead of reading everything from the Apache Cassandra® database every time, each user’s viewing history is stored in a cache in a compressed format. When a user watches a new show, their viewing data is added to Apache Cassandra® and merged with the cached value in EVCache. If a user’s viewing history is not found in the cache, Netflix fetches it from Cassandra®, compresses it, and then stores it in EVCache for future use. STAY TUNED... technology #innovation #future #techupdate" #technews #futuretech #innovation #AI #automation #techtrends #digitaltransformation #fullstackdeveloper reactdeveloper

Mar 22, 2025 - 14:40

How Netflix Stores 140 Million Hours of Viewing Data Per Day PART-3

Reads and Writes in the Initial System
The diagram below shows how the initial system handled reads and writes to the viewing history data.

Every time a user started watching a show or movie, Netflix added a new column to their viewing history record in the database. If the user paused or stopped watching, that same column was updated to reflect their latest progress.

While storing data was easy, retrieving it efficiently became more challenging as users' viewing histories grew. Netflix used three different methods to fetch data, each with its advantages and drawbacks:

Retrieving the Entire Viewing History: If a user had only watched a few shows, Netflix could quickly fetch their entire history in one request. However, for long-time users with large viewing histories, this approach became slow and inefficient as more data accumulated.

Searching by Time Range: Sometimes, Netflix only needed to fetch records from a specific time period, like a user’s viewing history from the last month. While this method worked well in some cases, performance varied depending on how many records were stored within the selected time range.

Using Pagination for Large Histories: To avoid loading huge amounts of data at once, Netflix used pagination. This approach prevented timeouts (when a request takes too long and fails), but it also increased the overall time needed to retrieve all records.

At first, this system worked well because it provided a fast and scalable way to store viewing history. However, as more users watched more content, this system started to hit performance limits. Some of the issues were as follows:

Too Many SSTables: Apache Cassandra® stores data in files called SSTables (Sorted String Tables). Over time, the number of these SSTables increased significantly. For every read, the system had to scan multiple SSTables across the disk, making the process slower.

Compaction Overhead: Apache Cassandra® performs compactions to merge multiple SSTables into fewer, more efficient ones. With more data, these compactions took longer and required more processing power. Other operations like read repair and full column repair also became expensive.

To speed up data retrieval, Netflix introduced a caching solution called EVCache.

Instead of reading everything from the Apache Cassandra® database every time, each user’s viewing history is stored in a cache in a compressed format. When a user watches a new show, their viewing data is added to Apache Cassandra® and merged with the cached value in EVCache. If a user’s viewing history is not found in the cache, Netflix fetches it from Cassandra®, compresses it, and then stores it in EVCache for future use.

STAY TUNED...