How does data store compression speed up data warehouses?
I often see the claim that various data warehouse/analytical database systems derive significant performance benefits from compressing their data stores. On the face of it, though, this seems to be an absurd claim:
- Fast datastore reads in a database come from indexing (in the "array index" sense, not necessarily the SQL `INDEX` sense): the ability to selectively read only the relevant data in a random-access manner.
- Data compression does not work on individual data points; it is a bulk operation which is highly stateful and of varying efficiency depending on the data being compressed and the history of data that has previously been compressed.
- Therefore, indexing into a compressed data store is impossible. Reads will necessarily require a decompression step, possibly over a lot of data you don't care about, in order to get at the data you want.
- Therefore, data compression must significantly slow down a database.
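To make the premise concrete, here is a minimal sketch (using Python's `zlib`, a general-purpose stateful codec, and a hypothetical fixed-width integer column) of why a compressed byte stream defeats random access: in the raw layout, record *i* lives at a computable offset, but in the compressed stream there is no such offset, so reaching one record means inflating the prefix before it.

```python
import struct
import zlib

# Hypothetical column of fixed-width 8-byte integers.
records = list(range(100_000))
raw = b"".join(struct.pack("<q", r) for r in records)

i = 12_345
offset = i * 8  # record i has a computable byte offset

# Uncompressed: one random-access read.
value = struct.unpack_from("<q", raw, offset)[0]
assert value == records[i]

# Compressed: the stream is stateful, so there is no byte offset
# for record i; we must decompress everything to reach it.
compressed = zlib.compress(raw)
inflated = zlib.decompress(compressed)  # ~800 KB inflated for 8 bytes of interest
assert struct.unpack_from("<q", inflated, offset)[0] == records[i]
```

This is the intuition behind the bullets above: with a whole-stream codec, the decompression step scans data you don't care about.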
This seems obvious, and yet data warehouses claim that compression gives them performance boosts. So... what am I missing?
(Note: Making the data store smaller will obviously result in lower storage costs. But that's not the claim I'm questioning here; data warehouses claim that compression helps them run queries faster and at lower compute cost.)