The heart of the data warehouse database model, different to both the relational and object models, vaguely combines aspects of both relations and objects. A data warehouse database is effectively a highly denormalized structure, consisting of preferably only two distinct hierarchical layers. A central table contains highly denormalized transactional data. The second layer contains referential static data. This data warehouse database model is known as the dimensional model or the fact-dimensional model. A data warehouse consists of facts and dimensions. Facts and dimensions are types of tables. Each data warehouse structure consists of a single fact table, surrounded by multiple dimensions. It is possible to have more than a single fact table but essentially different fact tables are likely to be related to the same set of dimension tables. Therefore, different fact tables represent an entire new set of tables, or a new modular structure. That entire new set of tables is essentially another subset data warehouse, also known as a data mart.

The term data mart is used to describe a self-contained, subsection of a data warehouse. Fact and dimension tables contain different types of data. Where dimensions contain static data, and facts contain transactional data. Transactional data is the record of a company’s activities, such as invoices sent to its customers. The dimensions describe the facts, such as the customer’s name and address.

For example, an online retailer selling thousands of items per day could ship 20 items to each customer every year. Over the course of a number of years, each customer might be shipped hundreds of separate items.

The detail of a customer (such as the address) is static information. Static information does not change very often. The customer is a dimension. The customer dimension describes the fact, or the transactions (the invoices or details of every item shipped over many years). The active database (OLTP database) would likely have all records of transactions deleted from its active files on a periodical basis (annually, for example). Annual historical data could be archived into a data warehouse. The data warehouse data can then be used for forecasting (making guesses as to what customers might purchase over the next 10 years). The result of all this mishmash of complicated activities and other wonderful stuff is a table structure looking similar to that shown in Figure 6-17. Figure 6-17 shows a pseudo-table structure, describing graphically what is known as a star schema (a single fact table surrounded by a group of dimensions). Data warehouse database models are ideally made up of data mart, subset star schemas. Effectively, different schemas are also known as data marts. Each data mart is a single fact table, all linked to shared dimension tables (not necessarily all the same dimensions, but it is possible). Each fact-dimensional structure is a star schema (a data mart). Each star schema is likely to contain data for a different department of a company, or a different region (however a company may decide to split its data).

Some data warehouses are built using 3NF table structures, or even combine normalized structures with fact-dimensional structures in the same database.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8a6c5a7b-e871-4c23-be80-e304b3a937d7/Untitled

The Origin of Data Warehouses

Data warehouses were originally devised because existing databases were being subjected to conflicting requirements. Conflict arose between operational use and decision-support requirements as follows:

There is a big difference between requirements for operational and decision-support databases. Operational systems require instant responses on small amounts of information. Decision-support systems need access to large amounts of data (large portions of a database), allowing for good all-round estimates as to future prospects for a company. The invention of data warehouses was inevitable to reduce conflict between small transactional (OLTP databases) and large historical analytical reporting requirements (data warehouses).

The demands of the modern global economy and the Internet dictate that end user operational applications are required to be active 24/7, 365 days a year. There is no window for any type of batch activity because when people are asleep in Europe, others are awake down under in Australia. The global economy requires instant and acceptable servicing of the needs of a global user population.

In reality, the most significant difference between OLTP databases and data warehouses extends all the way down to the hardware layer. OLTP databases need highly efficient sharing of critical resources such as onboard memory (RAM), and have very small I/O requirements. Data warehouses are completely opposite. Data warehouses can consume large portions of RAM by transferring between disk and memory, in detriment to an OLTP database running on the same machine. Where OLTP databases need resource sharing, data warehouses need to hog those resources for extended periods of time. So, a data warehouse hogs machine resources. An OLTP database attempts to share those same resources. It is likely to have unacceptable response times because of a lack of basic I/O resources for both database types. The result, therefore, is a requirement for a complete separation between operational (OLTP) and decision-support (data warehouse) activity. This is why data warehouses exist!

The Relational Database Model and Data Warehouses