It is the process of structuring the data in the problem space to achieve these two goals of eliminating redundancy and ensuring flexibility. The normal forms specify increasingly stringent rules for the structure of relations. Each form extends the previous one in such a way as to prevent certain kinds of update anomalies. Bear in mind that the normal forms are not a prescription for creating a “correct” data model. A data model could be perfectly normalized and still fail to answer the questions asked of it; or, it might provide the answers, but so slowly and awkwardly that the database system built around it is unusable. But if your data model is normalized that is, if it conforms to the rules of relational structure the chances are high that the result will be an efficient and effective data model.

By dictionary definition, the term normalization means to make normal in terms of causing something to conform to a standard, or to introduce consistency with respect to style and content. In terms of relational database modeling, that consistency becomes a process of removing duplication in data, among other factors. Removal of duplication tends to minimize redundancy. Minimization of redundancy implies getting rid of unneeded data present in particular places, or tables.

Normalization manages to divide information into smaller, more manageable parts, preferably not too small. The most obvious redundancies can usually be removed without getting too deeply mathematical about everything. Commercially speaking, primary objectives are usually to save space and organize data for usability and manageability, without sacrificing performance. Additionally the demands of intensely busy applications and end-user needs can tend to necessitate breaking the rules of normalization in many ways to meet performance requirements. Rules are usually broken simply by not applying every possible layer of normalization. Normal Forms beyond 3rd Normal Form are often ignored and sometimes even 3rd Normal Form itself is discounted.

Normalization can be described as being one of introduction of granularity, removal of duplication, or minimizing of redundancy, or simply the introduction of tables, all of which place data into a better organized state. Each Normal Form is a refinement of the previous Normal Form. It means each Normal Form layer adds to whatever Normal Forms have already been applied. For example, 2nd Normal Form can only be applied to tables in 1st Normal Form, and 3rd Normal Form only applied to tables in 2nd Normal Form, and so on.

In general, normalization removes duplication and minimizes redundant chunks of data. The result is better organization and more effective use of physical space, among other factors. Normalization is not always the best solution. For example, in data warehouses, there is a completely different approach. In short, normalization is not the be-all and end-all of relational database model design.

This section describes a brief user-friendly interpretation of Normal Forms. It is just as important to understand Normal Forms from a more academic, more precise but possibly less commercially viable perspective. The problem with the academic approach to normalization is that it seems to insist on always expecting a designer to apply every Normal Form layer in every situation. In a commercial environment this is nearly always a mistake. The trouble with the deeper and more precisely refined aspects of normalization is that normalization tends to over-define itself for the sake of simply defining itself further.

Benefits of Normalization

Effectively minimizing redundancy is another way of describing removal of duplication. The effect of removing duplication is as follows:

Potential Normalization Hazards

Performance is always a problem with too much granularity caused by over-application of normalization. Very demanding concurrency OLTP databases can be very adversely affected by too much granularity. Data warehouses often require non-technical end-user access and over-granularity tends to make table structure more technically oriented to the point of being impossible to interpret by end-users. Keep the following in mind:

The Concept of Anomalies

The intention of relational database theory is to eliminate anomalies from occurring in a database. Anomalies can potentially occur during changes to a database. An anomaly is a bad thing because data can become logically corrupted. An anomaly with respect to relational database design is essentially an erroneous change to data, more specifically to a single record. To put this into perspective, data warehouses can add and change millions of records in single transactions, making accounting for anomalies overzealous. In the interests of mathematical precision, explicit definition is required. Why? Mathematics is very precise and anomalies always should be accounted for. That is just the way it is.