It is the process of structuring the data in the problem space to achieve these two goals of eliminating redundancy and ensuring flexibility. The normal forms specify increasingly stringent rules for the structure of relations. Each form extends the previous one in such a way as to prevent certain kinds of update anomalies. Bear in mind that the normal forms are not a prescription for creating a “correct” data model. A data model could be perfectly normalized and still fail to answer the questions asked of it; or, it might provide the answers, but so slowly and awkwardly that the database system built around it is unusable. But if your data model is normalized that is, if it conforms to the rules of relational structure the chances are high that the result will be an efficient and effective data model.

By dictionary definition, the term normalization means to make normal in terms of causing something to conform to a standard, or to introduce consistency with respect to style and content. In terms of relational database modeling, that consistency becomes a process of removing duplication in data, among other factors. Removal of duplication tends to minimize redundancy. Minimization of redundancy implies getting rid of unneeded data present in particular places, or tables.

Normalization manages to divide information into smaller, more manageable parts, preferably not too small. The most obvious redundancies can usually be removed without getting too deeply mathematical about everything. Commercially speaking, primary objectives are usually to save space and organize data for usability and manageability, without sacrificing performance. Additionally the demands of intensely busy applications and end-user needs can tend to necessitate breaking the rules of normalization in many ways to meet performance requirements. Rules are usually broken simply by not applying every possible layer of normalization. Normal Forms beyond 3rd Normal Form are often ignored and sometimes even 3rd Normal Form itself is discounted.

Normalization can be described as being one of introduction of granularity, removal of duplication, or minimizing of redundancy, or simply the introduction of tables, all of which place data into a better organized state. Each Normal Form is a refinement of the previous Normal Form. It means each Normal Form layer adds to whatever Normal Forms have already been applied. For example, 2nd Normal Form can only be applied to tables in 1st Normal Form, and 3rd Normal Form only applied to tables in 2nd Normal Form, and so on.

In general, normalization removes duplication and minimizes redundant chunks of data. The result is better organization and more effective use of physical space, among other factors. Normalization is not always the best solution. For example, in data warehouses, there is a completely different approach. In short, normalization is not the be-all and end-all of relational database model design.

This section describes a brief user-friendly interpretation of Normal Forms. It is just as important to understand Normal Forms from a more academic, more precise but possibly less commercially viable perspective. The problem with the academic approach to normalization is that it seems to insist on always expecting a designer to apply every Normal Form layer in every situation. In a commercial environment this is nearly always a mistake. The trouble with the deeper and more precisely refined aspects of normalization is that normalization tends to over-define itself for the sake of simply defining itself further.

Benefits of Normalization

Effectively minimizing redundancy is another way of describing removal of duplication. The effect of removing duplication is as follows:

Physical space needed to store data is reduced.
Data becomes better organized.
Normalization allows changes to a single record of data in one table to be made at once. In other words, a single table record is updated when a specific item is added, changed, or removed from the database.

Potential Normalization Hazards

Performance is always a problem with too much granularity caused by over-application of normalization. Very demanding concurrency OLTP databases can be very adversely affected by too much granularity. Data warehouses often require non-technical end-user access and over-granularity tends to make table structure more technically oriented to the point of being impossible to interpret by end-users. Keep the following in mind:

Physical space is not nearly as big a concern as it used to be, because disk space is one of the cheapest cost factors to consider (unless, of course, when dealing with a truly huge data warehouse).
Too much minimization of redundancy implies too much granularity and too many tables. Too many tables can lead to extremely huge SQL join queries. The more tables in a SQL join query, the slower queries execute. Performance can be so drastically affected as to make applications completely useless.
Better organization of data with extreme amounts of redundancy minimization can actually result in more complexity, particularly if end-users are exposed to database model structure.
The deeper the level of normalization, the more mathematical the model becomes, making the model “techie-friendly” and thus very “user-unfriendly.” Ask yourself: Who is accessing the database, end-users or OLTP applications?

The Concept of Anomalies

The intention of relational database theory is to eliminate anomalies from occurring in a database. Anomalies can potentially occur during changes to a database. An anomaly is a bad thing because data can become logically corrupted. An anomaly with respect to relational database design is essentially an erroneous change to data, more specifically to a single record. To put this into perspective, data warehouses can add and change millions of records in single transactions, making accounting for anomalies overzealous. In the interests of mathematical precision, explicit definition is required. Why? Mathematics is very precise and anomalies always should be accounted for. That is just the way it is.

Insert anomaly: Caused when a record is added to a detail table, with no related record existing in a master table.