The most abstract level of a database design is the data model, the conceptual description of a problem space. Data models are expressed in terms of entities, attributes, domains, and relationships.

Entities

The concept of entity is intuitively quite straightforward: An entity is anything about which the system needs to store information. When you begin to design your data model, compiling an initial list of entities isn't difficult. When we talk about the problem space, most of the nouns and verbs used will be candidate entities. "Customers buy products. Employees sell products. Suppliers sell us products." The nouns "Customers," "Products," "Employees," and "Suppliers" are all clearly entities.

The events represented by the verbs "buy" and "sell" are also entities, but a couple of traps exist here. First, the verb "sell" is used to represent two distinct events: the sale of a product to a customer (Salesman è Customer) and the purchase of a product by the organization (Supplier è Company).

The second trap is the inverse of the first: two different verbs ("buy" in the first sentence and "sell" in the second) are used to describe the same event, the purchase of a product by a customer. If the client is a tailor, for example, "customer buys suit" and "customer orders suit" might both result in the sale of a suit, but in the first case it's a ready-to-wear sale and in the second it's bespoke. Sometimes all you need to model is the fact that a relationship exists. Other times you'll want to store additional information about the relationships, such as the date on which it was established or some characteristic of the relationship.

Attributes

Your system will need to keep track of certain facts about each entity. These facts are the entity's attributes. If your system includes a Customer entity, for example, you'll probably want to know the names and addresses of the customers and perhaps the businesses they're in. All of these are attributes.

Determining the attributes to be included in your model is a semantic process. That is, you must make your decisions based on what the data means and how it will be used. There are no specific rules to make sure that a relation design and its attributes are final and complete, but there are some strategies exist.

The first strategy: Start with the result and don't make the design any more complex than it needs to be. What questions does your database have to answer? You need to be careful, of course, that you try to provide the flexibility to handle not just the questions your users are asking now but also the ones you can foresee them asking in the future.

You should also be on the lookout for questions the users would ask if they only knew they could, particularly if you're automating a manual system. Imagine asking a head librarian how many of the four million books in the collection were published in Chicago before 1900. He or she would point you to the card file and tell you to have fun. Yet this is trivial information to request from a well-designed database system.

Second strategy: Find the exceptions. There are two sides to this strategy. First, that you must identify all the exceptions, and second, that you must design the system to handle as many exceptions as you can without confusing users.

Domains

You might recall that a relation heading contains an AttributeName:DomainName pair for each attribute. A domain definition specifies the kind of data represented by the attribute. More particularly, a domain is the set of all possible values that an attribute may validly contain. Domains are often confused with data types; they are not the same. Data type is a physical concept while domain is a logical one. "int" is a data type; "Age" is a domain.

You might be thinking that a domain is the combination of the data type and the validation rule. Well, if you think of it this way, you won't go too far wrong. But validation rules are strictly part of the data integrity, not part of the data description. For example, the validation rule for a zip code might refer to the State attribute, whereas the domain of Zip-Code is "a five-digit string". For any two domains, if it makes sense to compare attributes defined on them, then the two domains are said to be type-compatible.

The question is that why should we bother with domains at all? The answer is that they're extremely useful design tools. "Are these two attributes interchangeable?" These are important questions when you're designing a data model, and domain analysis helps you think about them.

Relationships

A data model must specify the relationships between entities. At the conceptual level, relationships are simply associations between entities. The statement "Customers buy products" indicates that a relationship exists between these entities. The entities involved in a relationship are called its participants. The number of participants is the degree of the relationship. It might look similar to the degree of relation (which is the number of attributes), but they are not same.

The vast majority of relationships are binary, like the "Customers buy products" example, but this is not a requirement. Ternary relationships, those with three participants, are also common. Given the binary relationships "Employees sell products" and "Customers buy products," there is an implicit ternary relationship "Employees sell products to customers." However, specifying the two binary relationships do not allow us to identify which employees sold which products to which customers; only a ternary relationship can do that.

A special case of a binary relationship is an entity that participates in a relationship with itself. This is often called the bill of materials relationship and is most often used to represent hierarchical structures. A common example is the relationship between employees and managers: Any given employee might both be a manager and have a manager. The relationship between any two entities can be one-to-one, one-to-many, or many-to-many.

The participation of any given entity in a relationship can be partial or total. If it is not possible for an entity to exist unless it participates in the relationship, the participation is total; otherwise, it is partial. For example, Salesperson details can't logically exist unless there is a corresponding Employee. The reverse is not true. An employee might be something other than a salesperson, so an Employee record can exist without a corresponding Salesperson record. Thus, the participation of Employee in the relationship is partial, while the participation of Salesperson is total.