chapter 3

 There are hundreds of database types and numerous database classes (relational, graph, NoSQL, flat file, etc.). Why do you believe technology firms keep on re-inventing the wheel and coming up with a new medium for data management (virtually every day). Please cite at least three examples and elaborate with business use cases.  

C H A P T E R

Don't use plagiarized sources. Get Your Custom Essay on
chapter 3
Just from $13/Page
Order Essay

3

THE DATABASE
MANAGEMENT SYSTEM

CONCEPT

D ata has always been the key component of information systems. In the beginning
of the modern information systems era, data was stored in simple files. As

companies became more and more dependent on their data for running their businesses,
shortcomings in simple files became apparent. These shortcomings led to the development
of the database management system concept, which provides a solid basis for the modern
use of data in organizations of all descriptions.

OBJECTIVES

■ Define data-related terms such as entity and attribute and storage-related terms
such as field, record, and file.

■ Identify the four basic operations performed on stored data.
■ Compare sequential access of data with direct access of data.
■ Discuss the problems encountered in a non-database information systems

environment.
■ List the five basic principles of the database concept.
■ Describe how data can be considered to be a manageable resource.
■ List the three problems created by data redundancy.
■ Describe the nature of data redundancy among many files.
■ Explain the relationship between data integration and data redundancy in one file.
■ State the primary defining feature of a database management system.
■ Explain why the ability to store multiple relationships is an important feature of

the database approach.

■ Explain why providing support for such control issues as data security, backup

and recovery, and concurrency is an important feature of the database approach.
■ Explain why providing support for data independence is an important feature of

the database approach.

42 C h a p t e r 3 The Database Management System Concept

CHAPTER OUTLINE

Introduction

Data Before Database Management

Records and Files

Basic Concepts in Storing and

Retrieving

Data

The Database Concept

Data as a Manageable Resource

Data Integration and Data
Redundancy

Multiple

Relationships

Data Control Issues

Data Independence

DBMS Approaches

Summary

INTRODUCTION

Before the database concept was developed, all data in information systems (then
generally referred to as ‘‘data processing systems’’) was stored in simple linear
files. Some applications and their programs required data from only one file. Some
applications required data from several files. Some of the more complex applications
used data extracted from one file as the search argument (the item to be found)
for extracting data from another file. Generally, files were created for a single
application and were used only for that application. There was no sharing of files or
of data among applications and, as a result, the same data often appeared redundantly
in multiple files. In addition to this data redundancy among multiple files, a lack of
sophistication in the design of individual files often led to data redundancy within
those individual files.

As information systems continued to grow in importance, a number of
the ground rules began to change. Hardware became cheaper—much cheaper
relative to the computing power that it provided. Software development took on a
more standardized, ‘‘structured’’ form. Large backlogs of new applications to be
implemented built up, making the huge amount of time spent on maintaining existing
programs more and more unacceptable. It became increasingly clear that the lack of
a focus on data was one of the major factors in this program maintenance dilemma.
Furthermore, the redundant data across multiple files and even within individual
files was causing data accuracy nightmares (to be explained further in this chapter),
just as companies were relying more and more on their information systems to
substantially manage their businesses. As we will begin to see in this chapter, the
technology that came to the rescue was the database management system.

Summarizing, the problems included:

■ Data was stored in different formats in different files.
■ Data was often not shared among different programs that needed it, necessitating

the duplication of data in redundant files.
■ Little was understood about file design, resulting in redundant data within

individual files.
■ Files often could not be rebuilt after damage by a software error or a hardware

failure.
■ Data was not secure and was vulnerable to theft or malicious mischief by people

inside or outside the company.
■ Programs were usually written in such a manner that if the way that the data was

stored changed, the program had to be modified to continue working.
■ Changes in everything from access methods to tax tables required programming

changes.

Data Before Database Management 43

This chapter will begin by presenting some basic definitions and concepts
about data. Then it will describe the type of file environment that existed before
database management emerged. Then it will describe the problems inherent in the
file environment and show how the database concept overcame them and set the
stage for a vastly improved information systems environment.

DATA BEFORE DATABASE MANAGEMENT

As we said in Chapter 1, pieces of data are facts in our environment that are
important to us. Usually we have many facts to describe something of interest to us.
For example, let’s consider the facts we might be interested in about an employee
of ours named John Baker. Our company is a sales-oriented company and John
Baker is one of our salespersons. We want to remember that his employee number
(which we will now call his salesperson number) is 137. We are also interested in
the facts that his commission percentage on the sales he makes is 10%, his home
city is Detroit, his home state is Michigan, his office number is 1284, and he was
hired in 1995. There are, of course, reasons that we need to keep track of these facts
about John Baker, such as generating his paycheck every week. It certainly seems
reasonable to collect together all of the facts about Baker that we need and to hold
all of them together. Figure 3.1 shows all of these facts about John Baker presented
in an organized way.

Records and Files

Since we have to generate a paycheck each week for every employee in our
company, not just for Baker, we are obviously going to need a collection of facts
like those in Figure 3.1 for every one of our employees. Figure 3.2 shows a portion
of that collection.

F I G U R E 3.1
Facts about salesperson

Baker

Salesperson Salesperson Office Commission Year of
Number Name City State Number Percentage Hire

137 Baker Detroit MI 1284 10

1995

F I G U R E 3.2
Salesperson file

Salesperson Salesperson Office Commission Year of
Number Name City State Number Percentage Hire

119 Taylor

New York

NY 1211 15 2003

137 Baker Detroit MI 1284 10 1995

186 Adams Dallas TX 1253

15

2001

204 Dickens

Dallas TX 1209 10

1998

255 Lincoln Atlanta GA 1268 20 2003

361 Carlyle Detroit MI 1227 20 2001

420 Green Tucson AZ 1263 10 1993

44 C h a p t e r 3 The Database Management System Concept

C O N C E P T S

I N A C T I O N

3-A MEMPHIS LIGHT, GAS AND WATER

Memphis Light, Gas and Water
(MLGW) is the largest ‘‘three-service’’ (electricity, natu-
ral gas and water) municipal utility system in the United
States. It serves over 400,000 customers in Memphis and
Shelby County, TN, and has 2,600 employees. MLGW is
the largest of the 159 distributors of the federal Tennessee
Valley Authority’s electricity output. It brings in natural
gas via commercial pipelines and it supplies water from
a natural aquifer beneath the city of Memphis.

Like any supplier of electricity, MLGW is particularly
sensitive to electrical outages. It has developed a two-
stage application system to determine the causes of
outages and to dispatch crews to fix them. The first
stage is the Computer-Aided Restoration of Electric
Service (CARES) system, which was introduced in 1996.
Beginning with call-in patterns as customers report
outages, CARES uses automated data from MLGW’s
electric grid, wiring patterns to substations, and other
information, to function as an expert system to determine
the location and nature of the problem. It then feeds
its conclusion to the second-stage Mobile Dispatching
System (MDS), which was introduced in 1999. MDS

‘‘Photo Courtesy of Memphis Light, Gas, and Water Division’’

sends a repairperson to an individual customer’s location
if that is all that has been affected or sends a crew to
a malfunctioning or damaged piece of equipment in the
grid that is affecting an entire neighborhood. There is a
feedback loop in which the repairperson or crew reports
back to indicate whether the problem has been fixed or
a higher-level crew is required to fix it.

The CARES and MDS systems are supported by
an Oracle database running on Hewlett-Packard and
Compaq Alpha Unix platforms. The database includes
a wide range of tables: a Customer Call table has one
record per customer reporting call; an Outage table has
one record per outage; a Transformer table has one
record for each transformer in the grid; a Device table
has records for other devices in the grid. These can
also interface to the Customer Information System, which
has a Customer table with one record for each of the
over 400,000 customers. In addition to its operational
value, CARES and other systems feed a System Reliability
Monitoring database that generates reports on outages
and can be queried to gain further knowledge of outage
patterns for improving the grid.

Data Before Database Management 45

Let’s proceed by revisiting some terminology from Chapter 2, and introducing
some additional terminology along with some additional concepts. What we have
been loosely referring to as a ‘‘thing’’ or ‘‘object’’ in our environment that we want
to keep track of is called an entity. Remember that this is the real physical object or
event, not the facts about it. John Baker, the real, living, breathing person whom you
can go over to and touch, is an entity. A collection of entities of the same type (e.g.,
all the company’s employees) is called an entity set. An attribute is a property of,
a characteristic of, or a fact that we know about an entity. Each characteristic or
property of John Baker, including his salesperson number 137, his name, city of
Detroit, state of Michigan, office number 1284, commission percentage 10, and year
of hire 1995, are all attributes of John Baker. Some attributes have unique values
within an entity set. For example, the salesperson numbers are unique within the
salesperson entity set, meaning each salesperson has a different salesperson number.
We can use the fact that salesperson numbers are unique to distinguish among the
different salespersons.

Using the structure in Figure 3.2, we can define some standard file-structure
terms and relate them to the terms entity, entity set, and attribute. Each row in
Figure 3.2 describes a single entity. In fact, each row contains all the facts that we
know about a particular entity. The first row contains all the facts about salesperson
119, the second row contains all the facts about salesperson 137, and so on. Each
row of a structure like this is called a record. The columns representing the facts
are called fields. The entire structure is called a file. The file in Figure 3.2, which
is about the most basic kind of file imaginable, is often called a simple file or a
simple linear file (linear because it is a collection of records listed one after the
other in a long line). Since the salesperson attribute is unique, the salesperson field
values can be used to distinguish the individual records of the file. Speaking loosely
at this point, the salesperson number field can be referred to as the key field or key
of the file.

Tying together the two kinds of terminology that we have developed, a record
of a file describes an entity, a whole file contains the descriptions of an entire entity
set, and a field of a record contains an attribute of the entity described by that
record. In Figure 3.2, each row is a record that describes an entity, specifically a
single salesperson. The whole file, row by row or record by record, describes each
salesperson in the collection of salespersons. Each column of the file represents a
different attribute of salespersons. At the row or entity level, the salesperson name
field for the third row of the file indicates that the third salesperson, salesperson
186, has Adams as his salesperson name attribute, i.e. he is named Adams.

One last terminology issue is the difference between the terms ‘‘type’’ and
‘‘occurrence.’’ Let’s talk about it in the context of a record. If you look at a file,
like that in Figure 3.2, there are two ways to describe ‘‘a record.’’ One, which is
referred to as the record type, is a structural description of each and every record in
the file. Thus, we would describe the salesperson record type as a record consisting
of a salesperson number field, a salesperson name field, a city field, and so forth.
This is a general description of what any of the salesperson records looks like. The
other way of describing a record is referred to as a record occurrence or a record
instance. A specific record of the salesperson file is a record occurrence or instance.
Thus, we would say that, for example, the set of values {186, Adams, Dallas, TX,
1253, 15, 2001} is an occurrence of the salesperson record type.

46 C h a p t e r 3 The Database Management System Concept

Y O U R

T U R N

3.1 ENTITIES AND ATTRIBUTES

Entities and their attributes are all
around us in our everyday lives. Normally, we don’t stop
to think about the objects or events in our world formally
as entities with their attributes, but they’re there.

QUESTION:

Choose an object in your world that you interact with

frequently. It might be a university, a person, an
automobile, your home, etc. Make a list of some of

the chosen entity’s attributes. Then, generalize them to
‘‘type.’’ For example, you may have a backpack (an
entity) that is green in color (an attribute of that entity).
Generalize that to the entity set of all backpacks and
to the attribute type color. Next, go through the same
exercise for an event in your life, such as taking a
particular exam, your last birthday party, eating dinner
last night, etc.

Basic Concepts in Storing and Retrieving Data

Having established the idea of a file and its records, we can now, in simple terms at
this point, envision a company’s data as a large collection of files. The next step is to
discuss how we might want to access data from these files and otherwise manipulate
the data in them.

Retrieving and Manipulating Data There are four fundamental operations that can
be performed on stored data, whether it is stored in the form of a simple linear file,
such as that of Figure 3.2, or in any other form. They are:

■ Retrieve or Read
■ Insert
■ Delete
■ Update

It is convenient to think of each of these operations as basically involving one
record at a time, although in practice they can involve several records at once, as
we will see later in the book. Retrieving or reading a record means looking at a
record’s contents without changing them. For example, using the Salesperson file
of Figure 3.2, we might read the record for salesperson 204 because we want to
find out what year she was hired. Insertion means adding a new record to the file,
as when a new salesperson is hired. Deletion means deleting a record from the
file, as when a salesperson leaves the company. Updating means changing one or
more of a record’s field values, for example if we want to increase salesperson
420’s commission percentage from 10 to 15. There is clearly a distinction between
retrieving or reading data and the other three operations. Retrieving data allows a
user to refer to the data for some business purpose without changing it. All of the
other three operations involve changing the data. Different topics in this book will
focus on one or another of these operations simply because a particular one of the
four operations may be more important for a particular topic than the others.

One particularly important concept concerning data retrieval is that, while
information systems applications come in a countless number of variations, there
are fundamentally only two kinds of access to stored data that any of them require.

Data Before Database Management 47

These two ways of retrieving data are known as sequential access and direct
access.

Sequential Access The term sequential access means the retrieval of all or a portion
of the records of a file one after another, in some sequence, starting from the
beginning, until all the required records have been retrieved. This could mean all the
records of the file, if that is the goal, or all the records up to some point, such as up
to the point that a record being searched for is found. The records will be retrieved
in some order and there are two possibilities for this. In ‘‘physical’’ sequential
access, the records are retrieved one after the other, just as they are stored on the disk
device (more on these devices later). In ‘‘logical’’ sequential access the records
are retrieved in order based on the values of one or a combination of the fields.

Assuming the records of the Salesperson file of Figure 3.2 are stored on the
disk in the order shown in the figure, if they are retrieved in physical sequence they
will be retrieved in the order shown in the figure. However, if, for example, they
are to be retrieved in logical sequence based on the Salesperson Name field, then
the record for Adams would be retrieved first, followed by the record for Baker,
followed by the record for Carlyle, and so on in alphabetic order. An example of
an application that would require the sequential retrieval of the records of this file
would be the weekly payroll processing. If the company wants to generate a payroll
check for each salesperson in the order of their salesperson numbers, it can very
simply retrieve the records physically sequentially, since that’s the order in which
they are stored on the disk. If the company wants to produce the checks in the order
of the salespersons’ names, it will have to perform a logical sequential retrieval
based on the Salesperson Name field. It can do this either by sorting the records on
the Salesperson Name field or by using an index (see below) that is built on this
field.

We said that sequential access could involve retrieving a portion of the records
of a file. This sense of sequential retrieval usually means starting from the beginning
of the file and searching every record, in sequence, until finding a particular record
that is being sought. Obviously, this could take a long time for even a moderately
large file and so is not a particularly desirable kind of operation, which leads to the
concept of direct access.

Direct Access The other mode of access is direct access. Direct access is the
retrieval of a single record of a file or a subset of the records of a file based on
one or more values of a field or a combination of fields in the file. For example, in
the Salesperson file of Figure 3.2, if we need to retrieve the record for salesperson
204 to find out her year of hire, we would perform a direct access operation on
the file specifying that we want the record with a value of 204 in the Salesperson
Number field. How do we know that we would retrieve only one record? Because
the Salesperson Number field is the unique, key field of the file, there can only be
one record (or none) with any one particular value. Another possibility is that we
want to retrieve the records for all the salespersons with a commission percentage of
10. The subset of the records retrieved would consist of the records for salespersons
137, 204, and 420.

Direct access is a crucial concept in information systems today. If you
telephone a bank with a question about your account, you would not be happy
having to wait on the phone while the bank’s information system performs a
sequential access of its customer file until it finds your record. Clearly this example

48 C h a p t e r 3 The Database Management System Concept

calls for direct access. In fact, the vast majority of information systems operations
that all companies perform today require direct access.

Both sequential access and direct access can certainly be accomplished with
data stored in simple files. But simple files leave a lot to be desired. What is the
concept of database and what are its advantages?

THE DATABASE CONCEPT

The database concept is one of the most powerful, enduring technologies in
the information systems environment. It encompasses a variety of technical and
managerial issues and features that are at the heart of today’s information systems
scene. In order to get started and begin to develop the deep understanding of
database that we seek, we will focus on five issues that establish a set of basic
principles of the database concept:

1. The creation of a datacentric environment in which a company’s data can
truly be thought of as a significant corporate resource. A key feature of this
environment is the ability to share data among those inside and outside of the
company who require access to it.

2. The ability to achieve data integration while at the same time storing data
in a non-redundant fashion. This, alone, is the central, defining feature of the
database approach.

3. The ability to store data representing entities involved in multiple relationships
without introducing data redundancy or other structural problems.

4. The establishment of an environment that manages certain data control issues,
such as data security, backup and recovery, and concurrency control.

5. The establishment of an environment that permits a high degree of data
independence.

Data as a Manageable Resource

Broadly speaking, the information systems environment consists of several
components including hardware, networks, applications software, systems software,
people, and data. The relative degree of focus placed on each of these has varied
over time. In particular, the amount of attention paid to data has undergone a
radical transformation. In the earlier days of ‘‘data processing,’’ most of the time
and emphasis in application development was spent on the programs, as opposed
to on the data and data structures. Hardware was expensive and the size of main
memory was extremely limited by today’s standards. Programming was a new
discipline and there was much to be learned about it in order to achieve the goal
of efficient processing. Standards for effective programming were unknown. In this
environment, the treatment of the data was hardly the highest-priority concern.

At the same time, as more and more corporate functions at the operational,
tactical, and strategic levels became dependent on information systems, data
increasingly became recognized as an important corporate resource. Furthermore,
the corporate community became increasingly convinced that a firm’s data
about its products, manufacturing processes, customers, suppliers, employees,
and competitors could, with proper storage and use, give the firm a significant
competitive advantage.

The Database Concept 49

F I G U R E 3.3
Corporate resources

People

Money Plant &
Equipment

Inventory

Data

0 0 0 0 1 1 0 0
1 0 1 1 1 0 1 1
0 1 1 0 0 1 1 0
1 0 0 0 1 1 1 1

0 0 0 0 1 1 0 0
1 0 1 1 1 0 1 1
0 1 1 0 0 1 1 0
1 0 0 0 1 1 1 1

Money, plant and equipment, inventories, and people are all important
enterprise resources and, indeed, a great deal of effort has always been expended
to manage them. As corporations began to realize that data is also an important
enterprise resource, it became increasingly clear that data would have to be managed
in an organized way, too, Figure 3.3. What was needed was a software utility that
could manage and protect data while providing controlled shared access to it so that
it could fulfill its destiny as a critical corporate resource. Out of this need was born
the database management system.

As we look to the future and look back at the developments of the last few years,
we see several phenomena that emphasize the importance of data and demand its
careful management as a corporate resource. These include reengineering, electronic
commerce, and enterprise resource planning (ERP) systems that have placed an
even greater emphasis on data. In reengineering, data and information systems are
aggressively used to redesign business processes for maximum efficiency. At the
heart of every electronic commerce Web site is a database through which companies
and their customers transact business. Another very important development was that
of enterprise resource planning (ERP) systems, which are collections of application
programs built around a central shared database. ERP systems very much embody
the principles of shared data and of data as a corporate resource.

Data Integration and Data Redundancy

Data integration and data redundancy, each in their own right, are critical issues in
the field of database management.

■ Data integration refers to the ability to tie together pieces of related data within
an information system. If a record in one file contains customer name, address,
and telephone data and a record in another file contains sales data about an item
that the customer has purchased, there may come a time when we want to contact
the customer about the purchased item.

■ Data redundancy refers to the same fact about the business environment being
stored more than once within an information system. Data integration is clearly a

50 C h a p t e r 3 The Database Management System Concept

positive feature of a database management system. Data redundancy is a negative
feature (except for performance reasons under certain circumstances that will be
discussed later in this book).

In terms of the data structures used in database management systems, data
integration and data redundancy are tied together and will be discussed together in
this section of the book.

Data stored in an information system describes the real-world business
environment. Put another way, the data is a reflection of the environment. Over the
years that information systems have become increasingly sophisticated, they and
the data that they contain have revolutionized the ways that we conduct virtually
all aspects of business. But, as valuable as the data is, if the data is duplicated
and stored multiple times within a company’s information systems facilities, it can
result in a nightmare of poor performance, lack of trust in the accuracy of the data,
and a reduced level of competitiveness in the marketplace. Data redundancy and
the problems it causes can occur within a single file or across multiple files. The
problems caused by data redundancy are threefold:

■ First, the redundant data takes up a great deal of extra disk space. This alone can
be quite significant.

■ Second, if the redundant data has to be updated, additional time is needed to do
so since, if done correctly, every copy of the redundant data must be updated.
This can create a major performance issue.

■ Third and potentially the most significant is the potential for data integrity
problems. The term data integrity refers to the accuracy of the data. Obviously,
if the data in an information system is inaccurate, it and the whole information
system are of limited value. The problem with redundant data, whether in a single
file or across multiple files, occurs when it has to be updated (or possibly when
it is first stored). If data is held redundantly and all the copies of the data record
being updated are not all correctly updated to the new values, there is clearly
a problem in data integrity. There is an old saying that has some applicability
here, ‘‘The person with one watch always knows what time it is. The person with
several watches is never quite sure,’’ Figure 3.4.

Data Redundancy Among Many Files Beginning with data redundancy across multiple
files, consider the following situation involving customer names and addresses.
Frequently, different departments in an enterprise in the course of their normal
everyday work need the same data. For example, the sales department, the accounts
receivable department, and the credit department may need customer name and

F I G U R E 3.4
With several watches the correct time
might not be clear

The Database Concept 51

F I G U R E 3.5
Three files with redundant data

Sales file

Customer

Customer

Number Name Address

2746795 John Jones 123 Elm Street

Accounts Receivable file

Customer

Customer
Number Name Address

2746795 John Jones 123 Elm Street

Credit file

Customer Customer

Customer
Customer
Customer
Number Name Address
2746795 John Jones 123 Elm Street

address data. Often, the solution to this multiple need is redundant data. The sales
department has its own stored file that, among other things, contains the customer
name and address, and likewise for the accounts receivable and credit departments,
Figure 3.5.

One day customer John Jones, who currently lives at 123 Elm Street, moves
to 456 Oak Street. If his address is updated in two of the files but not the third, then
the company’s data is inconsistent, Figure 3.6. Two of the files indicate that John
Jones lives at 456 Oak Street but one file still shows him living at 123 Elm Street.
The company can no longer trust its information system. How could this happen?
It could have been a software or a hardware error. But more likely it was because
whoever received the new information and was responsible for updating one or two
of the files simply did not know of the existence of the third. As mentioned earlier,

F I G U R E 3.6
Three files with a data integrity problem

Sales file

Customer Customer
Number Name Address

2746795 John Jones 456 Oak Street

Accounts Receivable file
Customer Customer
Number Name Address
2746795 John Jones 456 Oak Street
Credit file
Customer Customer
Customer
Customer
Customer
Number Name Address
2746795 John Jones 123 Elm Street

52 C h a p t e r 3 The Database Management System Concept

at various times in information systems history it has not been unusual in large
companies for the same data to be held redundantly in sixty or seventy files! Thus,
the possibility of data integrity problems is great.

Multiple file redundancy begins as more a managerial issue than single file
redundancy, but it also has technical components. The issue is managerial to the
extent that a company’s management does not encourage data sharing among
departments and their applications. But it is technical when it comes to the reality
of whether the company’s software systems are capable of providing shared access
to the data without compromising performance and data security.

Data Integration and Data Redundancy Within One File Data redundancy in a single
file results in exactly the same three problems that resulted from data redundancy
in multiple files: wasted storage space, extra time on data update, and the potential
for data integrity problems. To begin developing this scenario, consider Figure 3.7,
which shows two files from the General Hardware Co. information system. General
Hardware is a wholesaler of hardware, tools, and related items. Its customers are
hardware stores, home improvement stores, and department stores, or chains of
such stores. Figure 3.7a shows the Salesperson file, which has one record for each
of General Hardware’s salespersons. Salesperson Number is the unique identifying
‘‘key’’ field and as such is underlined in the figure. Clearly, there is no data
redundancy in this file. There is one record for each salesperson and each individual
fact about a salesperson is listed once in the salesperson’s record.

Figure 3.7b shows General Hardware’s Customer file. Customer Number is
the unique key field. Again, there is no data redundancy, but two questions have

F I G U R E 3.7
General Hardware Company files

(a) Salesperson file

Salesperson Salesperson Commission

Year

Number Name Percentage of Hire

137 Baker 10 1995

186 Adams 15 2001

204 Dickens 10

1998

361 Carlyle 20 2001

(b) Customer file

Customer Customer Salesperson
Number Name Number HQ City

0121 Main St. Hardware 137

New York

0839 Jane’s Stores 186

Chicago

0933 ABC Home Stores 137

Los Angeles

1047 Acme Hardware Store 137 Los Angeles

1525 Fred’s Tool Stores 361

Atlanta

1700 XYZ Stores 361

Washington

1826 City Hardware 137 New York

2198 Western Hardware 204 New York

2267 Central Stores 186 New York

The Database Concept 53

to be answered regarding the Salesperson Number field appearing in this file. First,
why is it there? After all, it seems already to have a good home as the unique
identifying field of the Salesperson file. The Salesperson Number field appears in
the Customer file to record which salesperson is responsible for a given customer
account. In fact, there is a one-to-many relationship between salespersons and
customers. A salesperson can and generally does have several customer accounts,
while each customer is serviced by only one General Hardware salesperson. The
second question involves the data in the Salesperson Number field in the Customer
file. For example, salesperson number 137 appears in four of the records (plus once
in the first record of the Salesperson file!). Does this constitute data redundancy?
The answer is no. For data to be redundant (and examples of data redundancy will be
coming up shortly), the same fact about the business environment must be recorded
more than once. The appearance of salesperson number 137 in the first record of
the Salesperson file establishes 137 as the identifier of one of the salespersons.
The appearance of salesperson number 137 in the first record of the Customer file
indicates that salesperson number 137 is responsible for customer number 0121. This
is a different fact about the business environment. The appearance of salesperson
number 137 in the third record of the Customer file indicates that salesperson
number 137 is responsible for customer number 0933. This is yet another distinct
fact about the business environment. And so on through the other appearances of
salesperson number 137 in the Customer file.

Retrieving data from each of the files of Figure 3.7 individually is
straightforward and can be done on a direct basis if the files are set-up for direct
access. Thus, if there is a requirement to find the name or commission percentage
or year of hire of salesperson number 204, it can be satisfied by retrieving the
record for salesperson number 204 in the Salesperson file. Similarly, if there is a
requirement to find the name or responsible salesperson (by salesperson number!)
or headquarters city of customer number 1525, we simply retrieve the record for
customer number 1525 in the Customer file.

But, what if there is a requirement to find the name of the salesperson
responsible for a particular customer account, say for customer number 1525? Can
this requirement be satisfied by retrieving data from only one of the two files of
Figure 3.7? No, it cannot! The information about which salesperson is responsible
for which customers is recorded only in the Customer file and the salesperson
names are recorded only in the Salesperson file. Thus, finding the salesperson
name will be an exercise in data integration. In order to find the name of the
salesperson responsible for a particular customer, first the record for the customer
in the Customer file would have to be retrieved. Then, using the salesperson number
found in that record, the correct salesperson record can be retrieved from the
Salesperson file to find the salesperson name. For example, if there is a need to
find the name of the salesperson responsible for customer number 1525, the first
operation would be to retrieve the record for customer number 1525 in the Customer
file. As shown in Figure 3.7b, this would yield salesperson number 361 as the
number of the responsible salesperson. Then, accessing the record for salesperson
361 in the Salesperson file in Figure 3.7a determines that the name of the salesperson
responsible for customer 1525 is Carlyle. While it’s true that the data in the record
in the Salesperson file and the data in the record in the Customer file have been
integrated, the data integration process has been awfully laborious.

This kind of custom-made, multicommand, multifile access (which, by the
way, could easily require more than two files, depending on the query and the files

54 C h a p t e r 3 The Database Management System Concept

involved) is clumsy, potentially error prone, and expensive in terms of performance.
While the two files have the benefit of holding data non-redundantly, what is lacking
is a good level of data integration. That is, it is overly difficult to find and retrieve
pieces of data in the two files that are related to each other. For example, customer
number 1525 and salesperson name Carlyle in the two files in Figure 3.7 are related
to each other by virtue of the fact that the two records they are in both include
a reference to salesperson number 361. Yet, as shown above, ultimately finding
the salesperson name Carlyle by starting with the customer number 1525 is an
unacceptably laborious process.

A fair question to ask is, if we knew that data integration was important in
this application environment and if we knew that there would be a frequent need to
find the name of the salesperson responsible for a particular customer, why were
the files structured as in Figure 3.7 in the first place? An alternative arrangement is
shown in Figure 3.8. The single file in Figure 3.8 combines the data in the two files
of Figure 3.7. Also, the Customer Number field values of both are identical.

The file in Figure 3.8 was created by merging the salesperson data from
Figure 3.7a into the records of Figure 3.7b, based on corresponding salesperson
numbers. As a result, notice that the number of records in the file in Figure 3.8
is identical to the number of records in the Customer file of Figure 3.7b. This is
actually a result of the ‘‘direction’’ of the one-to-many relationship in which each
salesperson can be associated with several customers. The data was ‘‘integrated’’
in this merge operation. Notice, for example, that in Figure 3.7b, the record
for customer number 1525 is associated with salesperson number 361. In turn,
in Figure 3.7a, the record for salesperson number 361 is shown to have the name
Carlyle. Those two records were merged, based on the common salesperson number,
into the record for customer number 1525 in Figure 3.8. (Notice, by the way, that the
Salesperson Number field appears twice in Figure 3.8 because it appeared in each
of the files of Figure 3.7. The field values in each of those two fields are identical
in each record in the file in Figure 3.8, which must be the case since it was on those
identical values that the record merge that created the file in Figure 3.8 was based.
That being the case, certainly one of the two Salesperson Number fields in the file
in Figure 3.8 could be deleted without any loss of information.)

The file in Figure 3.8 is certainly well integrated. Finding the name of
the salesperson who is responsible for customer number 1525 now requires a
single record access of the record for customer number 1525. The salesperson
name, Carlyle, is right there in that record. This appears to be the solution to the

F I G U R E 3.8
General Hardware Company combined file

Customer Customer Salesperson Salesperson Salesperson Commission Year
Number Name Number HQ City Number Name Percentage of Hire

0121 Main St. Hardware 137 New York 137 Baker 10 1995

0839 Jane’s Stores 186 Chicago 186 Adams 15 2001

0933 ABC Home Stores 137 Los Angeles 137 Baker 10 1995

1047 Acme Hardware Store 137 Los Angeles 137 Baker 10 1995

1525 Fred’s Tool Stores 361 Atlanta 361 Carlyle 20 2001

1700 XYZ Stores 361 Washington 361 Carlyle 20 2001

1826 City Hardware 137 New York 137 Baker 10 1995

2198 Western Hardware 204 New York

204 Dickens 10 1998

2267 Central Stores 186 New York 186 Adams 15 2001

The Database Concept 55

earlier multifile access problem. Unfortunately, integrating the two files caused
another problem: data redundancy. Notice in Figure 3.8 that, for example, the fact
that salesperson number 137 is named Baker is repeated four times, as are his
commission percentage and year of hire. This is, indeed, data redundancy, as it
repeats the same facts about the business environment multiple times within the
one file. If a given salesperson is responsible for several customer accounts, then
the data about the salesperson must appear in several records in the merged or
integrated file. It would make no sense from a logical or a retrieval standpoint to
specify, for example, the salesperson name, commission percentage, and year of
hire for one customer that the salesperson services and not for another. This would
imply a special relationship between the salesperson and that one customer that
does not exist and would remove the linkage between the salesperson and his other
customers. To be complete, the salesperson data must be repeated for every one of
his customers.

The combined file in Figure 3.8 also illustrates what have come to be referred
to as anomalies in poorly structured files. The problems arise when two different
kinds of data, like salesperson and customer data in this example, are merged into
one file. Look at the record in Figure 3.8 for customer number 2198, Western
Hardware. The salesperson for this customer is Dickens, salesperson number 204.
Look over the table and note that Western Hardware happens to be the only
customer that Dickens currently has. If Western Hardware has gone out of business
or General Hardware has stopped selling to it and they decide to delete the record
for Western Hardware from the file, they also lose everything they know about
Dickens: his commission percentage, his year of hire, even his name associated with
his salesperson number, 204. This situation, which is called the deletion anomaly,
occurs because salesperson data doesn’t have its own file, as in Figure 3.7a. The
only place in the combined file of Figure 3.8 that you can store salesperson data is
in the records with the customers. If you delete a customer and that record was the
only one for that salesperson, the salesperson’s data is gone.

Conversely, in the insertion anomaly, General Hardware can’t record data in
the combined file of Figure 3.8 about a new salesperson the company just hired until
she is assigned at least one customer. After all, the identifying field of the records
of the combined file is Customer Number! Finally, the update anomaly notes that
the redundant data of the combined file, such as Baker’s commission percentage of
10 repeated four times, must be updated each place it exists when it changes (for
example, if Baker is rewarded with an increase to a commission percentage of 15).

There appears to be a very significant tradeoff in the data structures between
data integration and data redundancy. The two files of Figure 3.7 are non-redundant
but have poor data integration. Finding the name of the salesperson responsible for
a particular customer account requires a multicommand, multifile access that can be
slow and error-prone. The merged file of Figure 3.8, in which the data is very well
integrated, eliminates the need for a multicommand, multifile access for this query,
but is highly data redundant. Neither of these situations is acceptable. A poor level
of data integration slows down the company’s information systems and, perhaps, its
business! Redundant data can cause data accuracy and other problems. Yet both the
properties of data integration and of non-redundant data are highly desirable. And,
while the above example appears to show that the two are hopelessly incompatible,
over the years a few—very few—ways have been developed to achieve both goals
in a single data management system. In fact, this concept is so important that it is
the primary defining feature of database management systems:

56 C h a p t e r 3 The Database Management System Concept

A database management system is a software utility for storing and retrieving
data that gives the end-user the impression that the data is well integrated
even though the data can be stored with no redundancy at all.

Any data storage and retrieval system that does not have this property should
not be called a database management system. Notice a couple of fine points in the
above definition. It says, ‘‘data can be stored with no redundancy,’’ indicating that
non-redundant storage is feasible but not required. In certain situations, particularly
involving performance issues, the database designer may choose to compromise
on the issue of data redundancy. Also, it says, ‘‘that gives the end-user the
impression that the data is well integrated.’’ Depending on the approach to database
management taken by the particular database management system, data can be
physically integrated and stored that way on the disk or it can be integrated at the
time that a data retrieval query is executed. In either case, the data will, ‘‘give the
end-user the impression that the data is well integrated.’’ Both of these fine points
will be explored further later in this book.

Multiple Relationships

Chapter 2 demonstrated how entities can relate to each other in unary, binary,
and ternary one-to-one, one-to-many, and many-to-many relationships. Clearly,
a database management system must be able to store data about the entities in
a way that reflects and preserves these relationships. Furthermore, this must be
accomplished in such a way that it does not compromise the fundamental properties
of data integration and non-redundant data storage described above. Consider the
following problems with attempting to handle multiple relationships in simple
linear files, using the binary one-to-many relationship between General Hardware
Company’s salespersons and customers as an example.

First, the Customer file of Figure 3.7 does the job with its Salesperson Number
field. The fact that, for example, salesperson number 137 is associated with four
of the customers (it appears in four of the records) while, for example, customer
number 1826 has only one salesperson associated with it demonstrates that the
one-to-many relationship has been achieved. However, as has already been shown,
the two files of this figure lack an efficient data integration mechanism; i.e., trying to
link detailed salesperson data with associated customer data is laborious. (Actually,
as will be seen later in this book, the structures of Figure 3.7 are quite viable in
the relational DBMS environment. In that case, the relational DBMS software will
handle the data integration requirement. But without that relational DBMS software,
these structures are deficient in terms of data integration.) Also, the combined file
of Figure 3.8 supports the one-to-many relationship but, of course, introduces data
redundancy.

Figure 3.9 shows a ‘‘horizontal’’ solution to the problem. The Salesperson
Number field has been removed from the Customer file. Instead, each record in
the Salesperson file lists all the customers, by customer number, that the particular
salesperson is responsible for. This could conceivably be implemented as one
variable-length field of some sort containing all the associated customer numbers
for each salesperson, or it could be implemented as a series of customer number

The Database Concept 57

F I G U R E 3.9
General Hardware Company combined
files: One-to-many relationship horizontal
variation

(a) Salesperson file

Salesperson Salesperson Commission Year Customer
Number Name Percentage of Hire Numbers

137 Baker 10 1995 0121, 0933, 1047,

1826

186 Adams 15 2001 0839,

2

267

204 Dickens 10 1998

2198

361 Carlyle 20 2001 1525,

1700

(b) Customer file

Customer Customer
Number Name HQ City

0121 Main St. Hardware New York

0839 Jane’s Stores Chicago

0933 ABC Home Stores Los Angeles

1047 Acme Hardware Store Los Angeles

1525 Fred’s Tool Stores Atlanta

1700 XYZ Stores Washington

1826 City Hardware New York

2198 Western Hardware New York

2267 Central Stores New York

fields. While this arrangement does represent the one-to-many relationship, it is
unacceptable for two reasons. One is that the record length could be highly variable
depending on how many customers a particular salesperson is responsible for. This
can be tricky from a space management point of view. If a new customer is added
to a salesperson’s record, the new larger size of the record may preclude its being
stored in the same place on the disk as it came from, but putting it somewhere else
may cause performance problems in future retrievals. The other reason is that once
a given salesperson record is retrieved, the person or program that retrieved it would
have a difficult time going through all the associated customer numbers looking for
the one desired. With simple files like these, the normal expectation is that there
will be one value of each field type in each record (e.g. one salesperson number,
one salesperson name, and so on). In the arrangement in Figure 3.9, the end-user
or supporting software would have to deal with a list of values, i.e. of customer
numbers, upon retrieving a salesperson record. This would be an unacceptably
complex process.

Figure 3.10 shows a ‘‘vertical’’ solution to the problem. In a single file, each
salesperson record is immediately followed by the records for all of the customers
for which the salesperson is responsible. While this does preserve the one-to-many
relationship, the complexities involved in a system that has to manage multiple
record types in a single file make this solution unacceptable, too.

A database management system must be able to handle all of the various
unary, binary, and ternary relationships in a logical and efficient way that does
not introduce data redundancy or interfere with data integration. The database
management system approaches that are in use today all satisfy this requirement. In

58 C h a p t e r 3 The Database Management System Concept

F I G U R E 3.

10

General Hardware Company combined
files: One-to-many relationship vertical
variation

0121

0933

1047

1826

Main St. Hardware

ABC Home Stores

Acme Hardware Store

City Hardware

137

137
137
137
New York
Los Angeles
Los Angeles
New York
2198 Western Hardware 204 New York
361 Carlyle 20 2001
204 Dickens 10 1998
186 Adams 15 2001
137 Baker 10 1995

1525

1700

Fred’s Tool Stores

XYZ Stores

361

361
Atlanta
Washington

0839

2267

Jane’s Stores

Central Stores

186

186
Chicago
New York

particular, the way that the relational approach to database management handles it
will be explained in detail.

Data Control Issues

The people responsible for managing the data in an information systems environment
must be concerned with several data control issues. This is true regardless of which
database management system approach is in use. It is even true if no database
management system is in use, that is, if the data is merely stored in simple files.
Most prominent among these data control issues are data security, backup and
recovery, and concurrency control, Figure 3.11. These are introduced here and will
be covered in more depth later in this book. The reason for considering these data
control issues in this discussion of the essence of the database management system

F I G U R E 3.11
Three data control issues

Concurrency Control

Security Backup and Recovery

The Database Concept 59

concept is that such systems should certainly be expected to handle these issues
frequently for all the data stored in the system’s databases.

Computer security has become a very broad topic with many facets and
concerns. These include protecting the physical hardware environment, defending
against hacker attacks, encrypting data transmitted over networks, educating
employees on the importance of protecting the company’s data, and many more. All
computer security exposures potentially affect a company’s data. Some exposures
represent direct threats to data while others are more indirect. For example, the theft
of transmitted data is a direct threat to data while a computer virus, depending on
its nature, may corrupt programs and systems in such a way that the data is affected
on an incidental or delayed basis. The types of direct threats to data include outright
theft of the data, unauthorized exposure of the data, malicious corruption of the
data, unauthorized updates of the data, and loss of the data. Protecting a company’s
data assets has become a responsibility that is shared by its operating systems,
special security utility software, and its database management systems. All database
management systems incorporate features that are designed to help protect the data
in their databases.

Data can be lost or corrupted in any of a variety of ways, not just from the
data security exposures just mentioned. Entire files, portions of databases, or entire
databases can be lost when a disk drive suffers a massive accidental or deliberate
failure. At the extreme, all of a company’s data can be lost to a disaster such as
a fire, a hurricane, or an earthquake. Hackers, computer viruses, or even poorly
written application programs can corrupt from a few to all of the records of a file
or database. Even an unintentional error in entering data into a single record can
be propagated to other records that use its values as input into the creation of their
values. Clearly, every company (and even every PC user!) must have more than
one copy of every data file and database. Furthermore, some of the copies must be
kept in different buildings, or even different cities, to prevent a catastrophe from
destroying all copies of the data. The process of using this duplicate data, plus
other data, special software, and even specially designed disk devices to recover
lost or corrupted data is known as ‘‘backup and recovery.’’ As a key issue in data
management, backup and recovery must be considered and incorporated within the
database management system environment.

In today’s multi-user environments, it is quite common for two or more users
to attempt to access the same data record simultaneously. If they are merely trying
to read the data without updating it, this does not cause a problem. However, if two
or more users are trying to update a particular record simultaneously, say a bank
account balance or the number of available seats on an airline flight, they run the
risk of generating what is known as a ‘‘concurrency problem.’’ In this situation,
the updates can interfere with each other in such a way that the resulting data values
will be incorrect. This intolerable possibility must be guarded against and, once
again, the database management system must be designed to protect its databases
from such an eventuality.

A fundamental premise of the database concept is that these three data control
issues—data security, backup and recovery, and concurrency—must be managed
by or coordinated with the database management system. This means that when a
new application program is written for the database environment, the programmers
can concentrate on the details of the application and not have to worry about writing
code to manage these data control issues. It means that there is a good comfort
level that the potential problems caused by these issues are under control since

60 C h a p t e r 3 The Database Management System Concept

they are being managed by long-tested components of the DBMS. It means that
the functions are standard for all of the data in the environment, which leads to
easier management and economies of scale in assigning and training personnel to
be responsible for the data. This kind of commonality of control is a hallmark of the
database approach.

Data Independence

In the earlier days of ‘‘data processing,’’ many decisions involving the way that
application programs were written were made in concert with the specific file
designs and the choice of file organization and access method used. The program
logic itself was dependent upon the way in which the data is stored. In fact,
the ‘‘data dependence’’ was often so strong that if for any reason the storage
characteristics of the data had to be changed, the program itself had to be modified,
often extensively. That was a very undesirable characteristic of the data storage
and programming environments because of the time and expense involved in such
efforts. In practice, storage structures sometimes have to change, to reflect improved
storage techniques, application changes, attempts at sharing data, and performance
tuning, to name a few reasons. Thus, it is highly desirable to have a data storage and
programming environment in which as many types of changes in the data structure
as possible would not require changes in the application programs that use them.
This goal of ‘‘data independence’’ is an objective of today’s database management
systems.

DBMS APPROACHES

We have established a set of principles for the database concept and said that a
database management system is a software utility that embodies those concepts. The
next question concerns the nature of a DBMS in terms of how it organizes data and
how it permits its retrieval. Considering that the database concept is such a crucial
component of the information systems environment and that there must be a huge
profit motive tied up with it, you might think that many people have worked on the
problem over the years and come up with many different approaches to designing
DBMSs. It’s true that many very bright people have worked on this problem for a
long time but, interestingly, you can count the number of different viable approaches
that have emerged on the fingers of one hand. In particular, the central issue of
providing a non-redundant data environment that also looks as though it is integrated
is a very hard nut to crack. Let’s just say that we’re fortunate that even a small
number of practical ways to solve this problem have been discovered.

Basically, there are four major DBMS approaches:

■ Hierarchical
■ Network
■ Relational
■ Object-Oriented

The hierarchical and network approaches to database are both called
‘‘navigational’’ approaches because of the way that programs have to ‘‘navigate’’
through hierarchies and networks of data to find the data they need. Both

DBMS Approaches 61

C O N C E P T S
I N A C T I O N

3-B LANDAU UNIFORMS

Landau Uniforms is a premier sup-
plier of professional apparel to the healthcare community,
offering a comprehensive line of healthcare uniforms and
related apparel. Headquartered in Olive Branch, MS, the
company, which dates back to 1938, has continuously
expanded its operations both domestically and interna-
tionally and today includes corporate apparel among
its products. Landau sells its apparel though authorized
dealers throughout the U.S. and abroad.

Controlling Landau’s product flow in its warehouse
is a sophisticated information system that is anchored
in database management. Their order filling system,

‘‘Photo Courtesy of Landau Uniforms’’

implemented in 2001, is called the Garment Sortation
System It begins with taking orders that are then queued
in preparation for ‘‘waves’’ of as many as 80 orders to
be filled simultaneously. Each order is assigned a bin
at the end of a highly automated conveyor line. The
garments for the orders are picked from the shelves and
placed onto the beginning of the conveyor line. Scanning
devices then automatically direct the bar-coded garments
into the correct bin. When an order is completed, it
is boxed and sealed. The box then goes on another
conveyor where it is automatically weighed, a shipping
label is printed and attached to it, and it is routed to one

62 C h a p t e r 3 The Database Management System Concept

of several shipping docks, depending on which shipper is
being used. In addition, a bill is automatically generated
and sent to the customer. In fact, Landau bills its more
sophisticated customers electronically using an electronic
data interchange (EDI) system.

There are two underlying relational databases. The
initial order processing is handled using a DB2 database

running on an IBM ‘‘i’’ series computer. The orders are
passed on to the Garment Sortation System’s Oracle
database running on PCs. The shipping is once again
under the control of the DB2/‘‘i’’ series system. The
relational tables include an order table, a customer table,
a style master table, and, of course, a garment table with
2.4 million records.

of these technologies were developed in the 1960s and, relative to the other
approaches, are somewhat similar in structure. IBM’s Information Management
System (IMS), a DBMS based on the hierarchical approach, was released in 1969.
It was followed in the early 1970s by several network-based DBMSs developed
by such computer manufacturers of the time as UNIVAC, Honeywell, Burroughs,
and Control Data. There was also a network-based DBMS called Integrated Data
Management Store (IDMS) produced by an independent software vendor originally
called Cullinane Systems, which was eventually absorbed into Computer Associates.
These navigational DBMSs, which were suitable only for mainframe computers,
were an elegant solution to the redundancy/integration problem at the time that
they were developed. But they were complex, difficult to work with in many
respects, and, as we said, required a mainframe computer. Now often called ‘‘legacy
systems,’’ some of them interestingly have survived to this very day for certain
applications that require a lot of data and fast data response times.

The relational database approach became commercially viable in about 1980.
After several years of user experimentation, it became the preferred DBMS approach
and has remained so ever since. Chapters 4–8 of this book, as well as portions of later
chapters, are devoted to the relational approach. The object-oriented approach has
proven useful for a variety of niche applications and will be discussed in Chapter 9.
It is interesting to note that some key object-oriented database concepts have found

Y O U R
T U R N

3.2 INTEGRATING DAT

A

The need to integrate data is all
around us, even in our personal lives. We integrate data
many times each day without realizing that that’s what
we’re doing. When we compare the ingredients needed
for a recipe with the food ‘‘inventory’’ in our cupboards,
we are integrating data. When we think about buying
something and relate its price to the money we have in our
wallets or in our bank accounts or to the credit remaining
on our credit cards, we are integrating data. When we
compare our schedules with our children’s schedules and
perhaps those of others with whom we carpool, we are

integrating data. Can you think of other ways in which
you integrate data on a daily basis?

QUESTION:
Consider a medical condition for which you or someone

you know is being treated. Describe the different ways
that you integrate data in taking care of that condition.
Hints: Consider your schedule, your doctors’ schedules,
the amount of prescription medication you have on
hand, the inventory of medication at the pharmacy you
use, and so on.

Questions 63

their way into some of the mainstream relational DBMSs and some are described
as taking a hybrid ‘‘object/relational’’ approach to database.

SUMMARY

There are five major components in the database concept. One is the development of
a datacentric environment that promotes the idea of data being a significant corporate
resource and encourages the sharing of data. Another, which is really the central
premise of database management, is the ability to achieve data integration while
at the same time storing data in a non-redundant fashion. The third, which at the
structural level is actually closely related to the integration/redundancy paradigm,
is the ability to store data representing entities involved in multiple relationships
without introducing redundancy. Another component is the presence of a set of
data controls that address such issues as data security, backup and recovery, and
concurrency control. The final component is that of data independence, the ability
to modify data structures without having to modify programs that access them.

There are basically four approaches to database management: the early
hierarchical and network approaches, the current standard relational approach, and
the specialty object-oriented approach, many features of which are incorporated
into today’s expanded relational database management systems.

KEY TERMS

Attribute
Backup and recovery
Computer security
Concurrency control
Concurrency problem
Corporate resource
Data control issues
Data dependence
Data independence
Data integration
Data integrity problem

Data redundancy
Data retrieval
Data security
Datacentric environment
Direct access
Enterprise resource planning (ERP)

system
Entity
Entity set
Fact
Field

File
Logical sequential access
Manageable resource
Multiple relationships
Physical sequential access
Record
Sequential access
Software utility
Well integrated

QUESTIONS

1. What is data? Do you think the word ‘‘data’’ should
be treated as a singular or plural word? Why?

2. Name some entities and their attributes in a
university environment.

3. Name some entities and attributes in an insurance
company environment.

4. Name soe entities and attributes in a furniture store
environment.

5. What is the relationship between:
a. An entity and a record?

b. An attribute and a field?
c. An entity set and a file?

6. What is the difference between a record type and an
occurrence of that record? Give some examples.

7. Name the four basic operations on stored data. In
what important way is one in particular different
from the other three?

8. What is sequential access? What is direct access?
Which of the two is more important in today’s
business environment? Why?

64 C h a p t e r 3 The Database Management System Concept

9. Give an example of and describe an application that
would require sequential access in:
a. The university environment.
b. The insurance company environment.
c. The furniture store environment.

10. Give an example of and describe an application that
would require direct access in:
a. The university environment.
b. The insurance company environment.
c. The furniture store environment.

11. Should data be considered a true corporate resource?
Why or why not? Compare and contrast data to other
corporate resources (capital, plant and equipment,
personnel, etc.) in terms of importance, intrinsic
value, and modes of use.

12. Defend or refute the following statement: ‘‘Data is
the most important corporate resource because it
describes all of the others.’’

13. What are the two kinds of data redundancy, and
what are the three types of problems that they cause
in the information systems environment?

14. What factors might lead to redundant data across
multiple files? Is the problem managerial or techni-
cal in nature?

15. Describe the apparent tradeoff between data redun-
dancy and data integration in simple linear files.

16. In your own words, describe the key quality of a
DBMS that sets it apart from other data handling
systems.

17. Do you think that the single-file redundancy problem
is more serious, less serious, or about the same as
the multifile redundancy problem? Why?

18. What are the two defining goals of a database
management system?

19. What expectation should there be for a database
management system with regard to handling multi-
ple relationships? Why?

20. What are the problems with the ‘‘horizontal’’ and
‘‘vertical’’ solutions to the handling of multiple
relationships as described in the chapter?

21. What expectation should there be for a database
management system with regard to handling data
control issues such as data security, backup and
recovery, and concurrency control? Why?

22. What would the alternative be if database man-
agement systems were not designed to handle data
control issues such as data security, backup and
recovery, and concurrency control?

23. What is data independence? Why is it desirable?
24. What expectation should there be for a database

management system with regard to data indepen-
dence? Why?

25. What are the four major DBMS approaches? Which
approaches are used the most and least today?

EXERCISES

1. Consider a hospital in which each doctor is
responsible for many patients while each patient
is cared for by just one doctor. Each doctor has a
unique employee number, name, telephone number,
and office number. Each patient has a unique patient
number, name, home address, and home telephone
number.
a. What kind of relationship is there between

doctors and patients?
b. Develop sample doctor and patient data and

construct two files in the style of Figure 3.5
in which to store your sample data.

c. Do any fields have to be added to one or the
other of the two files to record the relationship
between doctors and patients? Explain.

d. Merge these two files into one, in the style of
Figure 3.6. Does this create any problems with
the data? Explain.

2. The Dynamic Chemicals Corp. keeps track of its
customers and its orders. Customers typically have
several outstanding orders while each order was
generated by a single customer. Each customer has a
unique customer number, a customer name, address,
and telephone number. An order has a unique order
number, a date, and a total cost.
a. What kind of relationship is there between

customers and orders?
b. Develop sample customer and order data and

construct two files in the style of Figure 3.5 in
which to store your sample data.

Minicases 65

c. Do any fields have to be added to one or the
other of the two files to record the relationship
between customers and orders? Explain.

d. Merge these two files into one, in the style of
Figure 3.6. Does this create any problems with
the data? Explain.

MINICASES

1. Answer the following questions based on the following
Happy Cruise Lines’ data.

(a) Ship table

Ship Ship Year Weight
Number Name Built (Tons)

005 Sea Joy 1999 80,000

009 Ocean IV 2003 75,000

012 Prince Al 2004 90,000

020 Queen Shirley 1999 80,000

(b) Crew Member table

Sailor Sailor Ship Home Job

Number Name Number Country Title

00536 John Smith 009 USA Purser

00732 Ling Chang 012 China Engineer

06988 Maria Gonzalez 020 Mexico Purser

16490 Prashant Kumar 005 India Navigator

18535 Alan Jones 009 UK Cruise Director

20254 Jane Adams 012 USA Captain

23981 Rene Lopez 020 Philippines Captain

27467 Fred Jones 020 UK Waiter

27941 Alain DuMont 009 France Captain

28184 Susan Moore 009 Canada Wine Steward

31775 James Collins 012 USA Waiter

32856 Sarah McLachlan 012 Ireland Cabin Steward

a. Regarding the Happy Cruise Lines Crew Member
file.

i. Describe the file’s record type.
ii. Show a record occurrence.

iii. Describe the set or range of values that the Ship
Number field can take.

iv. Describe the set or range of values that the
Home Country field can take.

b. Assume that the records of the Crew Memberfile
are physically stored in the order shown.

i. Retrieve all of the records of the file physically
sequentially.

ii. Retrieve all of the records of the file logically
sequentially based on the Sailor Name field.

iii. Retrieve all of the records of the file logi-
cally sequentially based on the Sailor Number
field.

iv. Retrieve all of the records of the file logi-
cally sequentially based on the Ship Number
field.

v. Perform a direct retrieval of the records with a
Sailor Number field value of 27467.

vi. Perform a direct retrieval of the records with a
Ship Number field value of 020.

vii. Perform a direct retrieval of the records with a
Job Title field value of Captain.

c. The value 009 appears as a ship number once in the
Ship file and four times in the Crew Member file.
Does this constitute data redundancy? Explain.

d. Merge the Ship and Crew Member files based on
the common ship number field (in a manner similar
to Figure 3.8 for the General Hardware database).
Is the merged file an improvement over the two
separate files in terms of:

i. Data redundancy? Explain.
ii. Data integration? Explain.

e. Explain why the Ship Number field is in the Crew
Member file.

f. Explain why ship number 012 appears three times
in the Crew Member file.

g. How many files must be accessed to find:
i. The year that ship number 012 was built?

ii. The home country of sailor number 27941?
iii. The name of the ship on which sailor number

18535 is employed?
h. Describe the procedure for finding the weight of the

ship on which sailor number 00536 is employed.
i. What is the mechanism for recording the one-to-

many relationship between crew members and ships
in the Happy Cruise Lines database above?

66 C h a p t e r 3 The Database Management System Concept

2. Answer the following questions based on the following
Super Baseball League data.

(a) TEAM file.

Team Team
Number Name City Manager

137 Eagles Orlando Smith

275 Cowboys San Jose Jones

294 Statesmen Springfield Edwards

368 Pandas El Paso

Adams

422 Sharks Jackson Vega

(b) PLAYER file.

Player Player Team
Number Name Age Position Number

1209 Steve Marks 24 Catcher 294

1254 Roscoe Gomez 19 Pitcher 422

1536 Mark Norton 32 First Baseman 368

1953 Alan Randall 24 Pitcher 137

2753 John Harbor 22 Shortstop 294

2843 John Yancy 27 Center Fielder 137

3002 Stuart Clark 20 Catcher 422

3274 Lefty Smith 31 Third Baseman 137

3388 Kevin Taylor 25 Shortstop 294

3740 Juan Vidora 25 Catcher 368

a. Regarding the Super Baseball League Player file
shown below.

i. Describe the file’s record type.
ii. Show a record occurrence.

iii. Describe the set or range of values that the
Player Number field can take.

b. Assume that the records of the Player file are
physically stored in the order shown.

i. Retrieve all of the records of the file physically
sequentially.

ii. Retrieve all of the records of the file logically
sequentially based on the Player Name field.

iii. Retrieve all of the records of the file logically
sequentially based on the Player Number field.

iv. Retrieve all of the records of the file logically
sequentially based on the Team Number field.

v. Perform a direct retrieval of the records with a
Player Number field value of 3834.

vi. Perform a direct retrieval of the records with a
Team Number field value of 20.

vii. Perform a direct retrieval of the records with an
Age field value of 24.

c. The value 294 appears as a team number once in the
Team file and three times in the Player file. Does
this constitute data redundancy? Explain.

d. Merge the Team and Player files based on the
common Team Number field (in a manner similar
to Figure 3.8 for the General Hardware database).
Is the merged file an improvement over the two
separate tables in terms of:

i. Data redundancy? Explain.
ii. Data integration? Explain.

e. Explain why the Team Number field is in the Player
file.

f. Explain why team number 422 appears twice in the
Player file.

g. How many files must be accessed to find:
i. The age of player number 1953?

ii. The name of the team on which player number
2288 plays?

iii. The number of the team on which player number
2288 plays?

h. Describe the procedure for finding the name of the
city in which player number 3002 is based.

i. What is the mechanism for recording the one-to-
many relationship between players and teams in the
Super Baseball League database, above?

C H A P T E R 4

RELATIONAL DATA
RETRIEVAL: SQL

A s we move forward into the discussion of database management systems, we
will cover a wide range of topics and skills including how to design databases,

how to modify database designs to improve performance, how to organize corporate
departments to manage databases, and others. But first, to whet your appetites for what
is to come, we’re going to dive right into one of the most intriguing aspects of database
management: retrieving data from relational databases using the industry-standard SQL
database management language.

Note: Some instructors may prefer to cover relational data retrieval with SQL
after logical database design, Chapter 7, or after physical database design,
Chapter 8. This chapter, Chapter 4 on relational data retrieval with SQL, is
designed to work just as well in one of those positions as it is here.

OBJECTIVES

■ Write SQL SELECT commands to retrieve relational data using a variety of
operators including GROUP BY, ORDER BY, and the built-in functions AVG,
SUM, MAX, MIN, COUNT.

■ Write SQL SELECT commands that join relational tables.
■ Write SQL SELECT subqueries.
■ Describe a strategy for writing SQL SELECT statements.
■ Describe the principles of a relational query optimizer.

68 C h a p t e r 4 Relational Data Retrieval: SQL

CHAPTER OUTLINE

Introduction
Data Retrieval with the SQL SELECT

Command
Introduction to the SQL SELECT

Command

Basic Functions

Built-In Functions

Grouping Rows

The Join

Subqueries

A Strategy for Writing SQL
SELECT Commands

Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Relational Query Optimizer

Relational DBMS Performance

Relational Query Optimizer Concepts

Summary
INTRODUCTION

There are two aspects of data management: data definition and data manipulation.
Data definition, which is operationalized with a data definition language (DDL),
involves instructing the DBMS software on what tables will be in the database,
what attributes will be in the tables, which attributes will be indexed, and so
forth. Data manipulation refers to the four basic operations that can and must be
performed on data stored in any DBMS (or in any other data storage arrangement,
for that matter): data retrieval, data update, insertion of new records, and deletion
of existing records. Data manipulation requires a special language with which users
can communicate data manipulation commands to the DBMS. Indeed, as a class,
these are known as data manipulation languages (DMLs).

A standard language for data management in relational databases, known as
Structured Query Language or SQL, was developed in the early 1980s. SQL
incorporates both DDL and DML features. It was derived from an early IBM
research project in relational databases called ‘‘System R.’’ SQL has long since
been declared a standard by the American National Standards Institute (ANSI) and
by the International Standards Organization (ISO). Indeed, several versions of the
standards have been issued over the years. Using the standards, many manufacturers
have produced versions of SQL that are all quite similar, at least at the level at which
we will look at SQL in this book. These SQL versions are found in such mainstream
DBMSs as DB2, Oracle, MS Access, Informix, and others. SQL in its various imple-
mentations is used very heavily in practice today by companies and organizations
of every description, Advance Auto Parts being one of countless examples.

SQL is a comprehensive database management language. The most interesting
aspect of SQL and the aspect that we want to explore in this chapter is its rich
data retrieval capability. The other SQL data manipulation features, as well as the
SQL data definition features, will be considered in the database design chapters that
come later in this book.

DATA RETRIEVAL WITH THE SQL SELECT COMMAND

Introduction to the SQL SELECT Command

Data retrieval in SQL is accomplished with the SELECT command. There are a few
fundamental ideas about the SELECT command that you should understand before
looking into the details of using it. The first point is that the SQL SELECT command

Data Retrieval with the SQL SELECT Command 69

C O N C E P T S
I N A C T I O N

4-A ADVANCE AUTO PARTS

Advance Auto Parts is the second
largest retailer of automotive parts and accessories in the
U. S. The company was founded in 1932 with three stores
in Roanoke, VA, where it is still headquartered today. In
the 1980s, with fewer than 175 stores, the company
developed an expansion plan that brought it to over 350
stores by the end of 1993. It has rapidly accelerated its
expansion since then and, with mergers and acquisitions,
now has more than 2,400 stores and over 32,000
employees throughout the United States. Advance Auto
Parts sells over 250,000 automotive components. Its
innovative ‘‘Parts Delivered Quickly’’ (PDQ) system, which
was introduced in 1982, allows its customers access to
this inventory within 24 hours.

One of Advance Auto Parts’ key database appli-
cations, its Electronic Parts Catalog, gives the company
an important competitive advantage. Introduced in the
early 1990s and continually upgraded since then, this
system allows store personnel to look up products they
sell based on the customer’s vehicle type. The system’s
records include part descriptions, images, and drawings.

Photo Courtesy of Advance Auto Parts

Once identified, store personnel pull an item from the
store’s shelves if it’s in stock. If it’s not in stock, then
using the system they send out a real-time request for the
part to the home office to check on the part’s warehouse
availability. Within minutes the part is picked at a regional
warehouse and it’s on its way. In addition to its in-store
use, the system is used by the company’s purchasing and
other departments.

The system runs on an IBM mid-range system at
company headquarters and is built on the SQL Server
DBMS. Parts catalog data, in the form of updates,
is downloaded weekly from this system to a small
server located in each store. Additional data retrieval
at headquarters is accomplished with SQL. The 35-table
database includes a Parts table with 2.5 million rows
that accounts not only for all of the items in inventory
but for different brands of the same item. There is also
a Vehicle table with 31,000 records. These two lead to
a 45-million-record Parts Application table that describes
which parts can be used in which vehicles.

70 C h a p t e r 4 Relational Data Retrieval: SQL

is not the same thing as the relational algebra Select operator discussed in
Chapter 5. It’s a bit unfortunate that the same word is used to mean two different
things, but that’s the way it is. The fact is that the SQL SELECT command is
capable of performing relational Select, Project, and Join operations singly or in
combination, and much more

SQL SELECT commands are considered, for the most part, to be ‘‘declarative’’
rather than ‘‘procedural’’ in nature. This means that you specify what data you are
looking for rather than provide a logical sequence of steps that guide the system in
how to find the data. Indeed, as we will see later in this chapter, the relational DBMS
analyzes the declarative SQL SELECT statement and creates an access path, a
plan for what steps to take to respond to the query. The exception to this, and the
reason for the qualifier ‘‘for the most part’’ at the beginning of this paragraph, is
that a feature of the SELECT command known as ‘‘subqueries’’ permits the user to
specify a certain amount of logical control over the data retrieval process.

Another point is that SQL SELECT commands can be run in either a ‘‘query’’
or an ‘‘embedded’’ mode. In the query mode, the user types the command at a
workstation and presses the Enter key. The command goes directly to the relational
DBMS, which evaluates the query and processes it against the database. The
result is then returned to the user at the workstation. Commands entered this way
can normally also be stored and retrieved at a later time for repetitive use. In
the embedded mode, the SELECT command is embedded within the lines of a
higher-level language program and functions as an input or ‘‘read’’ statement for
the program. When the program is run and the program logic reaches the SELECT
command, the program executes the SELECT. The SELECT command is sent to
the DBMS which, as in the query-mode case, processes it against the database and
returns the results, this time to the program that issued it. The program can then use
and further process the returned data. The only tricky part to this is that traditional
higher-level language programs are designed to retrieve one record at a time. The
result of a relational retrieval command is itself, a relation. A relation, if it consists
of a single row, can resemble a record, but a relation of several rows resembles, if
anything, several records. In the embedded mode, the program that issued the SQL
SELECT command and receives the resulting relation back, must treat the rows of
the relation as a list of records and process them one at a time.

SQL SELECT commands can be issued against either the actual, physical
database tables or against a ‘‘logical view’’ of one table or of several joined tables.
Good business practice dictates that in the commercial environment, SQL SELECT
commands should be issued against such logical views rather than directly against
the base tables. As we will see later in this book, this is a simple but effective
security precaution.

Finally, the SQL SELECT command has a broad array of features and options
and we will only cover some of them at this introductory level. But what is also
very important is that our discussion of the SELECT command and the features that
we will cover will work in all of the major SQL implementations, such as Oracle,
MS Access, SQL Server, DB2, Informix, and so on, possibly with minor syntax
variations in some cases.

Basic Functions

The Basic SELECT Format In the simplest SELECT command, we will indicate from
which table of the database we want to retrieve data, which rows of that table we

Data Retrieval with the SQL SELECT Command 71

are interested in, and which attributes of those rows we want to retrieve. The basic
format of such a SELECT statement is:

SELECT FROM

WHERE;

We will illustrate the SQL SELECT command with the General Hardware
Co. database of Figure 4.1, which is derived from the General Hardware

F I G U R E 4.1
The General Hardware Company relational
database

(a) SALESPERSON table

SPNUM SPNAME COMMPERCT YEARHIRE OFFNUM

137 Baker 10 1995 1284

186 Adams 15 2001 1253

204 Dickens 10 1998 1209

361 Carlyle 20 2001 1227

(b)

CUSTOMER table

CUSTNUM CUSTNAME SPNUM

HQCITY

0121 Main St. Hardware 137 New York

0839 Jane’s Stores 186 Chicago

0933 ABC Home Stores 137 Los Angeles

1047 Acme Hardware Store 137 Los Angeles

1525 Fred’s Tool Stores 361 Atlanta

1700 XYZ Stores 361 Washington

1826 City Hardware 137 New York
2198 Western Hardware 204 New York

2267 Central Stores 186 New York

(c) CUSTOMER EMPLOYEE table

CUSTNUM EMPNUM EMPNAME TITLE

0121 27498 Smith Co-Owner

0121 30441 Garcia Co-Owner

0933 25270 Chen VP Sales

0933 30441 Levy Sales Manager

0933 48285 Morton President

1525 33779 Baker Sales Manager

2198 27470 Smith President

2198 30441 Jones VP Sales

2198 33779 Garcia VP Personnel

2198 35268 Kaplan Senior Accountant

(Continues)

72 C h a p t e r 4 Relational Data Retrieval: SQL

F I G U R E 4.1 (Continued)
The General Hardware Company relational
database

(d) PRODUCT table

PRODNUM PRODNAME UNITPRICE

16386 Wrench 12.95

19440 Hammer 17.50

21765 Drill 32.99

24013 Saw 26.25

26722 Pliers 11.50

(e) SALES table

SPNUM PRODNUM QUANTITY

137 19440 473

137 24013

170

137 26722 688

186 16386 1745

186 19440 2529

186 21765 1962

186 24013 3071

204 21765

809

204 26722 734

361 16386 3729

361 21765

3110

361 26722

2738

(f) OFFICE Table

OFFNUM TELEPHONE SIZE

1253 901-555-4276 1

20

1227 901-555-0364 100

1284 901-555-7335 120

1209 901-555-3108 95

entity-relationship diagram of Figure 2.9. If you have not as yet covered the
database design chapters in this book, just keep in mind that some of the columns
are present to tie together related data from different tables, as discussed in Chapter
3. For example, the SPNUM column in the CUSTOMER table is present to tie
together related salespersons and customers.

As is traditional with SQL, the SQL statements will be shown in all capital
letters, except for data values taken from the tables. Note that the attribute names in
Figure 4.1 have been abbreviated for convenience and set in capital letters to make
them easily recognizable in the SQL statements. Also, spaces in the names have
been removed. Using the General Hardware database, an example of a simple query
that demonstrates the basic SELECT format is:

‘‘Find the commission percentage and year of hire of salesperson number 186.’’

Data Retrieval with the SQL SELECT Command 73

The SQL statement to accomplish this would be:

SELECT COMMPERCT, YEARHIRE

FROM

SALESPERSON

WHERE SPNUM=186;

How is this command constructed? The desired attributes are listed in the SELECT
clause, the required table is listed in the FROM clause, and the restriction or
predicate indicating which row(s) is involved is shown in the WHERE clause in
the form of an equation. Notice that SELECT statements always end with a single
semicolon (;) at the very end of the entire statement.

The result of this statement is:

COMMPERCT YEARHIRE

15 2001

As is evident from this query, an attribute like SPNUM that is used to search
for the required rows, also known as a ‘‘search argument,’’ does not have to appear
in the query result, as long as its absence does not make the result ambiguous,
confusing, or meaningless.

To retrieve the entire record for salesperson 186, the statement would change
to:

SELECT *
FROM SALESPERSON
WHERE SPNUM=186;

resulting in:

SPNUM SPNAME COMMPERCT YEARHIRE OFFNUM
186 Adams 15 2001 1253

The ‘‘*’’ in the SELECT clause indicates that all attributes of the selected
row are to be retrieved. Notice that this retrieval of an entire row of the table is, in
fact, a relational Select operation (see Chapter 5)! A relational Select operation can
retrieve one or more rows of a table, depending, in this simple case, on whether
the search argument is a unique or non-unique attribute. The search argument is
non-unique in the following query:

‘‘List the salesperson numbers and salesperson names of those salespersons
who have a commission percentage of 10.’’

SELECT SPNUM,

SPNAME

FROM SALESPERSON

WHERE COMMPERCT=10;

which

results in:

SPNUM SPNAME

137 Baker

204

Dickens

74 C h a p t e r 4 Relational Data Retrieval: SQL

The SQL SELECT statement can also be used to accomplish a relational
Project operation. This is a vertical slice through a table involving all rows and
some attributes. Since all of the rows are included in the Project operation, there
is no need for a WHERE clause to limit which rows of the table are included. For
example,

‘‘List the salesperson number and salesperson name of all of the salespersons.’’

SELECT SPNUM, SPNAME

FROM SALESPERSON;

results in:
SPNUM SPNAME
137 Baker

186 Adams

204 Dickens

361

Carlyle

To retrieve an entire table, that is to design an SQL SELECT statement that
places no restrictions on either the rows or the attributes, you would issue:

SELECT *
FROM SALESPERSON;

and have as the result:

SPNUM SPNAME COMMPERCT YEARHIRE OFFNUM
137 Baker 10 1995 1284
186 Adams 15 2001 1253
204 Dickens 10 1998 1209
361 Carlyle 20 2001 1227

Comparisons In addition to equal (=), the standard comparison operators, greater
than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), and not equal to (<>) can be used in the WHERE clause.

‘‘List the salesperson numbers, salesperson names, and commission percentages
of the salespersons whose commission percentage is less than 12.’’

SELECT SPNUM, SPNAME, COMMPERCT
FROM SALESPERSON
WHERE COMMPERCT<12;

This results in:

SPNUM SPNAME COMMPERCT

137 Baker 10

204 Dickens 10

Data Retrieval with the SQL SELECT Command 75

As another example:

‘‘List the customer numbers and headquarters cities of the customers that have
a customer number of at least 1700.’’

SELECT CUSTNUM, HQCITY
FROM

CUSTOMER

WHERE CUSTNUM>=1700;

results in:

CUSTNUM HQCITY

1700 Washington

1826 New York

2198 New York

2267 New York

ANDs and ORs Frequently, there is a need to specify more than one limiting
condition on a table’s rows in a query. Sometimes, for a row to be included in
the result it must satisfy more than one condition. This requires the Boolean AND
operator. Sometimes a row can be included if it satisfies one of two or more
conditions. This requires the Boolean OR operator.

AND An example in which two conditions must be satisfied is:
‘‘List the customer numbers, customer names, and headquarters cities of the

customers that are headquartered in New York and that have a customer number
higher than 1500.’’

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY=‘New York’
AND CUSTNUM>1500;

resulting in:

CUSTNUM CUSTNAME HQCITY

1826 City Hardware New York
2198 Western Hardware New York
2267 Central Stores New York

Notice that customer number 0121, which is headquartered in New York, was
not included in the results because it failed to satisfy the condition of having a
customer number greater than 1500. With the AND operator, it had to satisfy both
conditions to be included in the result.

OR To look at the OR operator, let’s change the last query to:
‘‘List the customer numbers, customer names, and headquarters cities of the

customers that are headquartered in New York or that have a customer number
higher than 1500.’’

76 C h a p t e r 4 Relational Data Retrieval: SQL

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY=‘New York’
OR CUSTNUM>1500;

results in:
CUSTNUM CUSTNAME HQCITY
0121 Main St. Hardware New York
1525 Fred’s Tool Stores Atlanta
1700 XYZ Stores Washington
1826 City Hardware New York
2198 Western Hardware New York
2267 Central Stores New York

Notice that the OR operator really means one or the other or both. Customer
0121 is included because it is headquartered in New York. Customers 1525 and 1700
are included because they have customer numbers higher than 1500. Customers
1826, 2198, and 2267 are included because they satisfy both conditions.

Both AND and OR What if both AND and OR are specified in the same WHERE
clause? AND is said to be ‘‘higher in precedence’’ than OR, and so all ANDs are
considered before any ORs are considered. The following query, which has to be
worded very carefully, illustrates this point:

‘‘List the customer numbers, customer names, and headquarters cities of the
customers that are headquartered in New York or that satisfy the two conditions
of having a customer number higher than 1500 and being headquartered in
Atlanta.’’

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY=‘New York’
OR CUSTNUM>1500
AND HQCITY=‘Atlanta’;

The result of this query is:

CUSTNUM CUSTNAME HQCITY
0121 Main St. Hardware New York
1525 Fred’s Tool Stores Atlanta
1826 City Hardware New York
2198 Western Hardware New York
2267 Central Stores New York

Notice that since the AND is considered first, one way for a row to qualify
for the result is if its customer number is greater than 1500 and its headquarters city
is Atlanta. With the AND taken first, it’s that combination or the headquarters city
has to be New York. Considering the OR operator first would change the whole

Data Retrieval with the SQL SELECT Command 77

complexion of the statement. The best way to deal with this, especially if there are
several ANDs and ORs in a WHERE clause, is by using parentheses. The rule is that
anything in parentheses is done first. If the parentheses are nested, then whatever
is in the innermost parentheses is done first and then the system works from there
towards the outermost parentheses. Thus, a ‘‘safer’’ way to write the last SQL
statement would be:

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE HQCITY=‘New York’
OR (CUSTNUM>1500
AND HQCITY=‘Atlanta’);

If you really wanted the OR to be considered first, you could force it by writing
the query as:

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE (HQCITY=‘New York’
OR CUSTNUM>1500)
AND HQCITY=‘Atlanta’;

This would mean that, with the AND outside of the parentheses, both of two
conditions have to be met for a row to qualify for the results. One condition is that
the headquarters city is New York or the customer number is greater than 1500.
The other condition is that the headquarters city is Atlanta. Since for a given row,
the headquarters city can’t be both Atlanta and New York, the situation looks grim.
But, in fact, customer number 1525 qualifies. Its customer number is greater than
1500, which satisfies the OR of the first of the two conditions, and its headquarters
city is Atlanta, which satisfies the second condition. Thus, both conditions are met
for this and only this row.

BETWEEN, IN, and LIKE BETWEEN, IN, and LIKE are three useful operators.
BETWEEN allows you to specify a range of numeric values in a search. IN allows
you to specify a list of character strings to be included in a search. LIKE allows you
to specify partial character strings in a ‘‘wildcard’’ sense.

BETWEEN Suppose that you want to find the customer records for those customers
whose customer numbers are between 1000 and 1700 inclusive (meaning that both
1000 and 1700, as well as all numbers in between them, are included). Using the
AND operator, you could specify this as:

SELECT *
FROM CUSTOMER
WHERE (CUSTNUM>=1000
AND CUSTNUM>=1700);

Or, you could use the BETWEEN operator and specify it as:

SELECT *
FROM CUSTOMER
WHERE CUSTNUM BETWEEN 1000 AND 1700;

78 C h a p t e r 4 Relational Data Retrieval: SQL

With either way of specifying it, the result would be:

CUSTNUM CUSTNAME SPNUM HQCITY
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington

IN Suppose that you want to find the customer records for those customers
headquartered in Atlanta, Chicago, or Washington. Using the OR operator, you
could specify this as:

SELECT *
FROM CUSTOMER
WHERE (HQCITY=‘Atlanta’
OR HQCITY=‘Chicago’
OR HQCITY=‘Washington’);

Or, you could use the IN operator and specify it as:

SELECT *
FROM CUSTOMER
WHERE HQCITY IN (‘Atlanta’, ‘Chicago’, ‘Washington’);

With either way of specifying it, the result would be:
CUSTNUM CUSTNAME SPNUM HQCITY
0839 Jane’s Stores 186 Chicago
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington

LIKE Suppose that you want to find the customer records for those customers whose
names begin with the letter ‘‘A’’. You can accomplish this with the LIKE operator
and the ‘‘%’’ character used as a wildcard to represent any string of characters.
Thus, ‘A%’ means the letter ‘‘A’’ followed by any string of characters, which is the
same thing as saying ‘any word that begins with ‘‘A’’.’

SELECT *
FROM CUSTOMER
WHERE CUSTNAME LIKE ‘A%’;

The result would be:

CUSTNUM CUSTNAME SPNUM HQCITY
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles

Note that, unlike BETWEEN and IN, there is no easy alternative way in SQL
of accomplishing what LIKE can do.

In a different kind of example, suppose that you want to find the customer
records for those customers whose names have the letter ‘‘a’’ as the second letter of
their names. Could you specify ‘%a%’? No, because the ‘%a’ portion of it would

Data Retrieval with the SQL SELECT Command 79

mean any number of letters followed by ‘‘a’’, which is not what you want. In
order to make sure that there is just one character followed by ‘‘a’’, which is the
same thing as saying that ‘‘a’’ is the second letter, you would specify ‘_a%’. The
‘‘_’’ wildcard character means that there will be exactly one letter (any one letter)
followed by the letter ‘‘a’’. The ‘‘%’’, as we already know, means that any string
of characters can follow afterwards.

SELECT *
FROM CUSTOMER
WHERE CUSTNAME LIKE ‘_a%’;

The result would be:
CUSTNUM CUSTNAME SPNUM HQCITY
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago

Notice that both the words ‘‘Main’’ and ‘‘Jane’s’’ have ‘‘a’’ as their second
letter. Also notice that, for example, customer number 2267 was not included in the
result. Its name, ‘‘Central Stores’’, has an ‘‘a’’ in it but it is not the second letter
of the name. Again, the single ‘‘_’’ character in the operator LIKE ‘_a%’ specifies
that there will be one character followed by ‘‘a’’. If the operator had been LIKE
‘%a%’, then Central Stores would have been included in the result.

Filtering the Results of an SQL Query Two ways to modify the results of an SQL SELECT
command are by the use of DISTINCT and the use of ORDER BY. It is important
to remember that these two devices do not affect what data is retrieved from the
database but rather how the data is presented to the user.

DISTINCT There are circumstances in which the result of an SQL query may contain
duplicate items and this duplication is undesirable. Consider the following query:

‘‘Which cities serve as headquarters cities for General Hardware customers?’’

This could be taken as a simple relational Project that takes the HQCITY column of
the CUSTOMER table as its result. The SQL command would be:

SELECT HQCITY
FROM CUSTOMER;

which results in:

HQCITY New York

Chicago
Los Angeles
Los Angeles
Atlanta
Washington
New York
New York
New York

80 C h a p t e r 4 Relational Data Retrieval: SQL

Technically, this is the correct result, but why is it necessary to list New York
four times or Los Angeles twice? Not only is it unnecessary to list them more than
once, but doing so produces unacceptable clutter. Based on the way the query was
stated, the result should have each city listed once. The DISTINCT operator is used
to eliminate duplicate rows in a query result. Reformulating the SELECT statement
as:

SELECT DISTINCT HQCITY
FROM CUSTOMER;

results in:
HQCITY
New York
Chicago
Los Angeles
Atlanta
Washington

ORDER BY The ORDER BY clause simply takes the results of an SQL query and
orders them by one or more specified attributes. Consider the following query:

‘‘Find the customer numbers, customer names, and headquarters cities of
those customers with customer numbers greater than 1000. List the results in
alphabetic order by headquarters cities.’’

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE CUSTNUM>1000
ORDER BY HQCITY;

This results in:
CUSTNUM CUSTNAME HQCITY
1525 Fred’s Tool Stores Atlanta
1047 Acme Hardware Store Los Angeles
1826 City Hardware New York
2198 Western Hardware New York
2267 Central Stores New York
1700 XYZ Stores Washington

If you wanted to have the customer names within the same city alphabetized,
you would write:

SELECT CUSTNUM, CUSTNAME, HQCITY
FROM CUSTOMER
WHERE CUSTNUM>1000
ORDER BY HQCITY, CUSTNAME;

Data Retrieval with the SQL SELECT Command 81

This results in:
CUSTNUM CUSTNAME HQCITY
1525 Fred’s Tool Stores Atlanta
1047 Acme Hardware Store Los Angeles
2267 Central Stores New York
1826 City Hardware New York
2198 Western Hardware New York
1700 XYZ Stores Washington

The default order for ORDER BY is ascending. The clause can include the
term ASC at the end to make ascending explicit or it can include DESC for
descending order.

Built-In Functions

A number of so-called ‘‘built-in functions’’ give the SQL SELECT command
additional capabilities. They involve the ability to perform calculations based on
attribute values or to count the number of rows that satisfy stated criteria.

AVG and SUM Recall that the SALES table shows the lifetime quantity of particular
products sold by particular salespersons. For example, the first row indicates that
Salesperson 137 has sold 473 units of Product Number 19440 dating back to when
she joined the company or when the product was introduced. Consider the following
query:

‘‘Find the average number of units of the different products that Salesperson
137 has sold (i.e., the average of the quantity values in the first three records of
the SALES table).’’

Using the AVG operator, you would write:

SELECT

AVG(QUANTITY)

FROM

SALES

WHERE SPNUM=137;

and the result would be:

AVG(QUANTITY)

443.67

To find the total number of units of all products that she has sold, you would
use the SUM operator and write:

SELECT

SUM(QUANTITY)

FROM SALES
WHERE SPNUM=137;

and the result would be:
SUM(QUANTITY)

1331

82 C h a p t e r 4 Relational Data Retrieval: SQL

MIN and MAX You can also find the minimum or maximum of a set of attribute
values. Consider the following query:

‘‘What is the largest number of units of Product Number 21765 that any
individual salesperson has sold?’’

Using the MAX operator, you would write:

SELECT

MAX(QUANTITY)

FROM SALES
WHERE PRODNUM=21765;

and the result would be:
MAX(QUANTITY)
3110

To find the smallest number of units you simply replace MAX with MIN:

SELECT

MIN(QUANTITY)

FROM SALES
WHERE PRODNUM=21765;

and get:

MIN(QUANTITY)
809

COUNT COUNT is a very useful operator that counts the number of rows that
satisfy a set of criteria. It is often used in the context of ‘‘how many of something’’
meet some stated conditions. Consider the following query:

‘‘How many salespersons have sold Product Number 21765?’’

Remember that each row of the SALES table describes the history of a particular
salesperson selling a particular product. That is, each combination of

SPNUM

and PRODNUM is unique; there can only be one row that involves a particular
SPNUM/PRODNUM combination. If you can count the number of rows of that
table that involve Product Number 21765, then you know how many salespersons
have a history of selling it. Using the notational device COUNT(*), the SELECT
statement is:

SELECT

COUNT(*)

FROM SALES
WHERE PRODNUM=21765;

and the answer is:

COUNT(*)
3

Data Retrieval with the SQL SELECT Command 83

Don’t get confused by the difference between SUM and COUNT. As we
demonstrated above, SUM adds up a set of attribute values; COUNT counts the
number of rows of a table that satisfy a set of stated criteria.

Grouping Rows

Using the built-in functions, we were able to calculate results based on attribute
values in several rows of a table. In effect, we formed a single ‘‘group’’ of rows and
performed some calculation on their attribute values. There are many situations that
require such calculations to be made on several different groups of rows. This is a
job for the GROUP BY clause.

GROUP BY A little earlier we found the total number of units of all products that
one particular salesperson has sold. It seems reasonable that at some point we might
want to find the total number of units of all products that each salesperson has sold.
That is, we want to group together the rows of the SALES table that belong to each
salesperson and calculate a value—the sum of the Quantity attribute values in this
case—for each such group. Here is how such a query might be stated:

‘‘Find the total number of units of all products sold by each salesperson.’’

The SQL statement, using the GROUP BY clause, would look like this:

SELECT SPNUM, SUM(QUANTITY)
FROM SALES
GROUP BY SPNUM;

and the results would be:

SPNUM SUM(QUANTITY)

137

1331

186

9307

204

1543

361

9577

Notice that GROUP BY SPNUM specifies that the rows of the table are to
be grouped together based on having the same value in their SPNUM attribute.
All the rows for Salesperson Number 137 will form one group, all of the rows for
Salesperson Number 186 will form another group, and so on. The Quantity attribute
values in each group will then be summed—SUM(QUANTITY)—and the results
returned to the user. But it is not enough to provide a list of sums:

1331
9307
1543
9577

These are indeed the sums of the quantities for each salesperson But, without
identifying which salesperson goes with which sum, they are meaningless! That’s

84 C h a p t e r 4 Relational Data Retrieval: SQL

why the SELECT clause includes both the SPNUM and the SUM(QUANTITY).
Including the attribute(s) specified in the GROUP BY clause in the SELECT clause
allows you to properly identify the sums calculated for each group.

An SQL statement with a GROUP BY clause may certainly also include a
WHERE clause. Thus, the query:

‘‘Find the total number of units of all products sold by each salesperson whose
salesperson number is at least 150.’’

would look like:

SELECT SPNUM, SUM(QUANTITY)
FROM SALES
WHERE SPNUM>=150
GROUP BY SPNUM;

and the results would be:
SPNUM SUM(QUANTITY)

186 9307

204 1543

361 9577

HAVING Sometimes there is a need to limit the results of a GROUP BY based on
the values calculated for each group with the built-in functions. For example, take
the last query above,

‘‘Find the total number of units of all products sold by each salesperson whose
salesperson number is at least 150.’’

Now modify it with an additional sentence so that it reads:

‘‘Find the total number of units of all products sold by each salesperson
whose salesperson number is at least 150. Include only salespersons whose total
number of units sold is at least 5000.’’

This would be accomplished by adding a HAVING clause to the end of the SELECT
statement:

SELECT SPNUM, SUM(QUANTITY)
FROM SALES
WHERE SPNUM>=150
GROUP BY SPNUM
HAVING SUM(QUANTITY)>=5000;

and the results would be:
SPNUM SUM(QUANTITY)
186 9307
361 9577

with Salesperson Number 204, with a total of only 1543 units sold, dropping out of
the results.

Data Retrieval with the SQL SELECT Command 85

Notice that in this last SELECT statement, there are two limitations One,
that the Salesperson Number must be at least 150, appears in the WHERE clause
and the other, that the sum of the number of units sold must be at least 5000,
appears in the HAVING clause. It is important to understand why this is so. If the
limitation is based on individual attribute values that appear in the database, then
the condition goes in the WHERE clause. This is the case with the limitation based
on the Salesperson Number value. If the limitation is based on the group calculation
performed with the built-in function, then the condition goes in the HAVING clause.
This is the case with the limitation based on the sum of the number of product units
sold.

The Join

Up to this point, all the SELECT features we have looked at have been shown in
the context of retrieving data from a single table. The time has come to look at how
the SQL SELECT command can integrate data from two or more tables or ‘‘join’’
them. There are two specifications to make in the SELECT statement to make a
join work. One is that the tables to be joined must be listed in the FROM clause.
The other is that the join attributes in the tables being joined must be declared and
matched to each other in the WHERE clause. And there is one more point. Since two
or more tables are involved in a SELECT statement that involves a join, there is the
possibility that the same attribute name can appear in more than one of the tables.
When this happens, these attribute names must be ‘‘qualified’’ with a table name
when used in the SELECT statement. All of this is best illustrated in an example.

Consider the following query, which we discussed earlier in this book:

‘‘Find the name of the salesperson responsible for Customer Number 1525.’’

The SELECT statement to satisfy this query is:

SELECT SPNAME

FROM SALESPERSON, CUSTOMER

WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM

AND CUSTNUM=1525;

and the result is:

SPNAME
Carlyle

Let’s take a careful look at this last SELECT statement. Notice that the two
tables involved in the join, SALESPERSON and CUSTOMER, are listed in the
FROM clause. Also notice that the first line of the WHERE clause:

SALESPERSON.SPNUM = CUSTOMER.SPNUM
links the two join attributes: the SPNUM attribute of the SALESPERSON table
(SALESPERSON.SPNUM) and the SPNUM attribute of the CUSTOMER table
(CUSTOMER.SPNUM). The notational device of having the table name ‘‘.’’ the
attribute name is known as ‘‘qualifying’’ the attribute name. As we said earlier,
this qualification is necessary when the same attribute name is used in two or more
tables in a SELECT statement. By the way, notice in the SELECT statement that

86 C h a p t e r 4 Relational Data Retrieval: SQL

the attributes SPNAME and CUSTNUM don’t have to be qualified because each
appears in only one of the tables included in the SELECT statement.

Here is an example of a join involving three tables, assuming for the moment
that salesperson names are unique:

‘‘List the names of the products of which salesperson Adams has sold more
than 2000 units.’’

The salesperson name data appears only in the SALESPERSON table and the
product name data appears only in the PRODUCT table. The SALES table shows
the linkage between the two, including the quantities sold. And so the SELECT
statement will be:

SELECT

PRODNAME

FROM SALESPERSON, PRODUCT, SALES
WHERE SALESPERSON.SPNUM=SALES.SPNUM
AND SALES.PRODNUM=PRODUCT.PRODNUM
AND SPNAME=‘Adams’
AND QUANTITY>2000;

which results in:
PRODNAME

Hammer

Saw

Subqueries

A variation on the way that the SELECT statement works is when one SELECT
statement is ‘‘nested’’ within another in a format known as a subquery. This can
go on through several levels of SELECT statements, with each successive SELECT
statement contained in a pair of parentheses. The execution rule is that the innermost
SELECT statement is executed first and its results are then provided as input to the
SELECT statement at the next level up. This procedure can be an alternative to
the join. Furthermore, there are certain circumstances in which this procedure must

Y O U R
T U R N

4.1 QUERIES GALORE!

Having a relational database to
query in any business environment opens up a new world
of information for managers to use to help them run their
portion of the business.

QUESTION:
Think about a business environment that you are familiar

with from your daily life. It might be a university, a

supermarket, a department store, even a sports league.
Write a list of ten questions that you would like to be
able to ask that would enhance your interaction with
that environment. Is it reasonable that a database could
be constructed that would support your ability to ask
the questions you’ve come up with? Do you think that
you would be able to formulate your questions using
SQL? Explain.

Data Retrieval with the SQL SELECT Command 87

be used. These latter circumstances are common enough and important enough to
include in this treatment of the SQL SELECT command.

Subqueries as Alternatives to Joins Let’s reconsider the first join example given
above:

‘‘Find the name of the salesperson responsible for Customer Number 1525.’’

If you methodically weave through the database tables to solve this, as we discussed
earlier in the book, you start at the CUSTOMER table, find the record for Customer
Number 1525 and discover in that record that the salesperson responsible for
this customer is Salesperson Number 361. You then take that information to the
SALESPERSON table where you look up the record for Salesperson Number 361
and discover in it that the salesperson’s name is Carlyle. Using a subquery, this
logic can be built into an SQL statement as:

SELECT SPNAME
FROM SALESPERSON
WHERE SPNUM=

(SELECT SPNUM
FROM CUSTOMER
WHERE CUSTNUM=1525);

and the result will again be:

SPNAME
Carlyle

Follow the way that the description given above of methodically solving
the problem is reconstructed as a SELECT statement with a subquery. Since the
innermost SELECT (the indented one), which constitutes the subquery, is considered
first, the CUSTOMER table is queried first, the record for Customer Number 1525
is found and 361 is returned as the SPNUM result. How do we know that only one
salesperson number will be found as the result of the query? Because CUSTNUM is
a unique attribute, Customer Number 1525 can only appear in one record and that
one record only has room for one salesperson number! Moving along, Salesperson
Number 361 is then fed to the outer SELECT statement. This, in effect, makes the
main query, that is the outer SELECT, look like:

SELECT SPNAME
FROM SALESPERSON
WHERE SPNUM=361;

and this results in:

SPNAME
Carlyle

Notice, by the way, that in the SELECT statement, there is only one semicolon
at the end of the entire statement, including the subquery.

88 C h a p t e r 4 Relational Data Retrieval: SQL

When a Subquery is Required There is a very interesting circumstance in which a
subquery is required. This situation is best explained with an example up front.
Consider the following query:

‘‘Which salespersons with salesperson numbers greater than 200 have the lowest
commission percentage?’’ (We’ll identify salespersons by their salesperson
number.)

This seems like a perfectly reasonable request, and yet it turns out to be deceptively
difficult. The reason is that the query really has two very different parts. First, the
system has to determine what the lowest commission percentage is for salespersons
with salesperson numbers greater than 200. Then, it has to see which of these
salespersons has that lowest percentage. It’s really tempting to try to satisfy this
type of query with an SQL SELECT statement like:

SELECT SPNUM, MIN(COMMPERCT)
FROM SALESPERSON
WHERE SPNUM>200;

or, perhaps:

SELECT SPNUM
FROM SALESPERSON

WHERE SPNUM>200
AND COMMPERCT=MIN(COMMPERCT);

But these will not work! It’s like asking SQL to perform two separate operations
and somehow apply one to the other in the correct sequence. This turns out to be
asking too much. But there is a way to do it and it involves subqueries. In fact, what
we will do is ask the system to determine the minimum commission percentage
first, in a subquery, and then use that information in the main query to determine
which salespersons have it:

SELECT SPNUM
FROM SALESPERSON
WHERE SPNUM>200
AND COMMPERCT=

(SELECT MIN(COMMPERCT)
FROM SALESPERSON)
WHERE SPNUM>200);

which results in:
SPNUM

204

The minimum commission percentage across all of the salespersons with
salesperson numbers greater than 200 is determined first in the subquery and the
result is 10. The main query then, in effect, looks like:

SELECT SPNUM
FROM SALESPERSON

Data Retrieval with the SQL SELECT Command 89

WHERE SPNUM>200
AND COMMPERCT=10;

which yields the result of salesperson number 204, as shown.
Actually, this is a very interesting example of a required subquery. What

makes it really interesting is why the predicate, SPNUM>200, appears in both
the main query and the subquery. Clearly it has to be in the subquery because
you must first find the lowest commission percentage among the salespersons with
salesperson numbers greater than 200. But then why does it have to be in the
main query, too? The answer is that the only thing that the subquery returns to the
main query is a single number, specifically a commission percentage. No memory
is passed on to the main query of how the subquery arrived at that value. If you
remove SPNUM>200 from the main query, so that it now looks like:

SELECT SPNUM
FROM SALESPERSON
WHERE COMMPERCT=

(SELECT MIN(COMMPERCT)
FROM SALESPERSON)
WHERE SPNUM>200);

you would find every salesperson with any salesperson number whose commission
percentage is equal to the lowest commission percentage of the salespersons with
salesperson numbers greater than 20. (Of course, if for some reason you do want
to find all of the salespersons, regardless of salesperson number, who have the
same commission percentage as the salesperson who has the lowest commission
percentage of the salespersons with salesperson numbers greater than 20, then this
last SELECT statement is exactly what you should write!)

A Strategy for Writing SQL SELECT Commands

Before we go on to some more examples, it will be helpful to think about developing
a strategy for writing SQL SELECT statements. The following is an ordered list of
steps.

1. Determine what the result of the query is to be and write the needed attributes
and functions in the SELECT clause. This may seem an obvious instruction, but
it will really pay to think this through carefully before going on. In fact, it is
at this very first step that you must determine whether the query will require a
GROUP BY clause or a subquery. If either of these is required, you should start
outlining the overall SELECT statement by writing the GROUP BY clause or
the nested SELECT for the subquery further down the page (or screen).

2. Determine which tables of the database will be needed for the query and write
their names in the FROM clause. Include only those tables that are really
necessary for the query. Sometime this can be tricky. For example, you might
need an attribute that is the primary key of a table and you might be tempted
immediately to include that table in the FROM clause. However, it could be that
the attribute in question is a foreign key in another table that is already in the
FROM clause for other reasons. It is then unnecessary to include the table in
which it is the primary key unless, of course, other attributes from that table are
needed, too.

90 C h a p t e r 4 Relational Data Retrieval: SQL

3. Begin constructing the WHERE clause by equating the join attributes from the
tables that are in the FROM clause. Once this job is out of the way, you can
begin considering the row limitations that must be stated in the WHERE clause.

4. Continue filling in the details of the WHERE clause, the GROUP BY clause,
and any subqueries.

One final piece of advice: If you are new to writing SQL SELECT commands
but you have a programming background, you may be tempted to avoid setting up
joins and try writing subqueries instead. Resist this temptation, for two reasons!
One is that joins are an essential part of the relational database concept. Embrace
them; don’t be afraid of them. The other is that writing multiple levels of nested
subqueries can be extremely error prone and difficult to debug.

EXAMPLE: GOOD READING BOOK STORES

The best way to gain confidence in understanding SQL SELECT statements is to
write some! And there are some further refinements of the SQL SELECT that we
have yet to present. We will use the same three example databases that appeared
in previous chapters but, as with the General Hardware database, we will shorten
the attribute names. We will state a variety of queries and then give the SELECT
statements that will satisfy them, plus commentary as appropriate. You should try
to write the SELECT statements yourself before looking at our solutions!

Figure 4.2 is the Good Reading Bookstores relational database. Here is a list
of queries for Good Reading Bookstores.

F I G U R E 4.2
Good reading Bookstores Relational
database

SALE table

BOOKNUM CUSTNUM DATE PRICE QUANTITY

WRITING table

BOOKNUM AUTHORNUM

CUSTOMER table

CUSTNUM CUSTNAME STREET CITY STATE COUNTRY

BOOK table

BOOKNUM BOOKNAME PUBYEAR PAGES PUBNAME

AUTHOR table

AUTHORNUM AUTHORNAME YEARBORN YEARDIED

PUBLISHER table

PUBNAME CITY COUNTRY TELEPHONE YRFOUND

Example: Good Reading Book Stores 91

1. ‘‘Find the book number, book name, and number of pages of all the books
published by London Publishing Ltd. List the results in order by book
name.’’

This query obviously requires the PUBNAME attribute but it does not require
the PUBLISHER table. All of the information needed is in the BOOK table,
including the PUBNAME attribute, which is there as a foreign key. The SELECT
statement is:

SELECT BOOKNUM, BOOKNAME, PAGES
FROM

BOOK

WHERE PUBNAME=‘London Publishing Ltd.’
ORDER BY BOOKNAME;

2. ‘‘How many books of at least 400 pages does Good Reading Bookstores
carry that were published by publishers based in Paris, France?’’

This is a straightforward join between the PUBLISHER and BOOK tables that
uses the built-in function COUNT. All of the attribute names are unique between
the two tables, except for PUBNAME, which must be qualified with a table name
every time it is used. Notice that ‘Good Reading Bookstores’ does not appear as
a condition in the SELECT statement, although it was mentioned in the query.
The entire database is about Good Reading Bookstores and no other! There is no
BOOKSTORE CHAIN table in the database and there is no STORENAME or
CHAINNAME attribute in any of the tables.

SELECT COUNT(*)
FROM PUBLISHER, BOOK
WHERE PUBLISHER.PUBNAME=BOOK.PUBNAME
AND CITY=‘Paris’
AND COUNTRY=‘France’
AND PAGES>=400;

3. ‘‘List the publishers in Belgium, Brazil, and Singapore that publish books
written by authors who were born before 1920.’’

Sometimes a relatively simple-sounding query can be fairly involved. This
query actually requires four tables of the database! To begin with, we need the
PUBLISHER table because that’s the only place that a publisher’s country is
stored. But we also need the AUTHOR table because that’s where author birth
years are stored. The only way to tie the PUBLISHER table to the AUTHOR table
is to connect PUBLISHER to BOOK, then to connect BOOK to WRITING, and
finally to connect WRITING to AUTHOR. With simple, one-attribute keys such
as those in these tables, the number of joins will be one fewer than the number
of tables. The FROM clause below shows four tables and the first three lines of
the WHERE clause show the three joins. Also, notice that since a publisher may
have published more than one book with the stated specifications, DISTINCT
is required to prevent the same publisher name from appearing several, perhaps
many, times in the result. Finally, since we want to include publishers in three
specific countries, we list the three countries as Belgium, Brazil, and Singapore.
But, in the SELECT statement, we have to indicate that for a record to be
included in the result, the value of the COUNTRY attribute must be Belgium,
Brazil or Singapore.

92 C h a p t e r 4 Relational Data Retrieval: SQL

SELECT DISTINCT PUBNAME
FROM PUBLISHER, BOOK, WRITING,

AUTHOR

WHERE PUBLISHER.PUBNAME=BOOK.PUBNAME
AND BOOK.BOOKNUM=WRITING.BOOKNUM
AND WRITING.AUTHORNUM=AUTHOR.AUTHORNUM
AND COUNTRY IN (‘Belgium’, ‘Brazil’, ‘Singapore’)
AND YEARBORN < 1920;

4. ‘‘How many books did each publisher in Oslo, Norway; Nairobi, Kenya;
and Auckland, New Zealand, publish in 2001?’’

The keyword here is ‘‘each.’’ This query requires a separate total for each
publisher that satisfies the conditions. This is a job for the GROUP BY clause.
We want to group together the records for each publisher and count the number of
records in each group. Each line of the result must include both a publisher name
and count of the number of records that satisfy the conditions. This SELECT
statement requires both a join and a GROUP BY. Notice the seeming complexity
but really the unambiguous beauty of the ANDs and ORs structure regarding the
cities and countries.

SELECT PUBNAME, CITY, COUNTRY, COUNT(*)
FROM PUBLISHER, BOOK
WHERE PUBLISHER.PUBNAME=BOOK.PUBNAME
AND ((CITY=‘Oslo’ AND COUNTRY=‘Norway’)

OR (CITY=‘Nairobi’ AND COUNTRY=‘Kenya’)
OR (CITY=‘Auckland’ AND COUNTRY=‘New Zealand’))

AND PUBYEAR=2001
GROUP BY PUBNAME;

5. ‘‘Which publisher published the book that has the earliest publication year
among all the books that Good Reading Bookstores carries?’’

All that is called for in this query is the name of the publisher, not the name
of the book. This is a case that requires a subquery. First the system has to
determine the earliest publication year, then it has to see which books have that
earliest publication year. Once you know the books, their records in the BOOK
table give you the publisher names. Since more than one publisher may have
published a book in that earliest year, there could be more than one publisher
name in the result. And, since a particular publisher could have published more
than one book in that earliest year, DISTINCT is required to avoid having that
publisher’s name listed more than once.

SELECT DISTINCT PUBNAME
FROM BOOK
WHERE PUBYEAR=

(SELECT MIN(PUBYEAR)
FROM BOOK);

EXAMPLE: WORLD MUSIC ASSOCIATION

Figure 4.3 is the World Music Association relational database. Here is a list of
queries for the World Music Association.

Example: World Music Association 93

F I G U R E 4.3
World Music Association relational
database

RECORDING table

ORCHNAME YEAR PRICE

COMPOSITIONNAME

COMPOSITION table

COMPOSITIONNAME
COMPOSITIONNAME

YEAR

YEAR

COMPOSERNAME

COMPOSERNAME

COMPOSER table

COMPOSERNAME COUNTRY DATEBIRTH

DEGREE table

MUSNUM DEGREE UNIVERSITY YEAR

MUSICIAN table

MUSNUM MUSNAME INSTRUMENT ANNSALARY ORCHNAME

ORCHESTRA table

ORCHNAME CITY COUNTRY MUSICDIR

1. ‘‘What is the total annual salary cost for all the violinists in the Berlin
Symphony Orchestra?’’

SELECT SUM(ANNSALARY)
FROM

MUSICIAN

WHERE ORCHNAME=‘Berlin Symphony Orchestra’
AND INSTRUMENT=‘Violin’;

2. ‘‘Make a single list, in alphabetic order, of all of the universities attended by
the cellists in India.’’

SELECT DISTINCT UNIVERSITY
FROM ORCHESTRA, MUSICIAN,

DEGREE

WHERE ORCHESTRA.ORCHNAME=MUSICIAN.ORCHNAME
AND MUSICIAN.MUSNUM=DEGREE.MUSNUM
AND INSTRUMENT=‘Cello’
AND COUNTRY=‘India’
ORDER BY UNIVERSITY;

3. ‘‘What is the total annual salary cost for all of the violinists of each orchestra
in Canada? Include in the result only those orchestras whose total annual
salary for its violinists is in excess of $150,000.’’

Since this query requires a separate total for each orchestra, the SELECT
statement must rely on the GROUP BY clause. Since the condition that the

94 C h a p t e r 4 Relational Data Retrieval: SQL

total must be over 150,000 is based on figures calculated by the SUM built-in
function, it must be placed in a HAVING clause rather than in the WHERE
clause.

SELECT ORCHNAME, SUM(ANNSALARY)
FROM ORCHESTRA, MUSICIAN
WHERE ORCHESTRA.ORCHNAME=MUSICIAN.ORCHNAME
AND COUNTRY=‘Canada’
AND INSTRUMENT=‘Violin’
GROUP BY ORCHNAME
HAVING SUM(ANNSALARY)>150,000;

4. ‘‘What is the name of the most highly paid pianist?’’

It should be clear that a subquery is required. First the system has to determine
what the top salary of pianists is and then it has to find out which pianists have
that salary.

SELECT MUSNAME
FROM MUSICIAN
WHERE INSTRUMENT=‘Piano’
AND ANNSALARY=

(SELECT MAX(ANNSALARY)
FROM MUSICIAN
WHERE INSTRUMENT=‘Piano’);

This is another example in which a predicate, INSTRUMENT=‘Piano’ in this
case, appears in both the main query and the subquery. Clearly it has to be in
the subquery because you must first find out how much money the highest-paid
pianist makes. But then why does it have to be in the main query, too? The
answer is that the only thing that the subquery returns to the main query is a single
number, specifically a salary value. No memory is passed on to the main query of
how the subquery arrived at that value. If you remove INSTRUMENT=‘Piano’
from the main query so that it now looks like:

SELECT MUSNAME
FROM MUSICIAN
WHERE ANNSALARY=

(SELECT MAX(ANNSALARY)
FROM MUSICIAN
WHERE INSTRUMENT=‘Piano’);

you would find every musician who plays any instrument whose salary is equal
to the highest- paid pianist. Of course, if for some reason you do want to find
all of the musicians, regardless of the instrument they play, who have the same
salary as the highest-paid pianist, then this last SELECT statement is exactly
what you should write.

5. ‘‘What is the name of the most highly paid pianist in any orchestra in
Australia?’’

This is the same idea as the last query but involves two tables, both of which
must be joined in both the main query and the subquery. The reasoning for this
is the same as in the last query. The salary of the most highly paid pianist in

Example: Lucky Rent-A-Car 95

Australia must be determined first in the subquery. Then that result must be used
in the main query, where it must be compared only to the salaries of Australian
pianists.

SELECT MUSNAME
FROM MUSICIAN,

ORCHESTRA

WHERE MUSICIAN.ORCHNAME=ORCHESTRA.ORCHNAME
AND INSTRUMENT=‘Piano’
AND COUNTRY=‘Australia’
AND ANNSALARY=

(SELECT MAX(ANNSALARY)
FROM MUSICIAN, ORCHESTRA
WHERE MUSICIAN.ORCHNAME=ORCHESTRA.ORCHNAME
AND INSTRUMENT=‘Piano’
AND COUNTRY=‘Australia’);

EXAMPLE: LUCKY RENT-A-

CAR

Figure 4.4 is the Lucky Rent-A-Car relational database. Here is a list of queries for
Lucky Rent-A-Car.

1. ‘‘List the manufacturers whose names begin with the letter ‘‘C’’ or the letter
‘‘D’’ and that are located in Japan.’’

SELECT MANUFNAME
FROM

MANUFACTURER

F I G U R E 4.4
Lucky Rent-A-Car relational database

RENTAL table

CARNUM CUSTNUM RENTALDATE RETURNDATE COST

CUSTOMER table

CUSTNUM CUSTNAME CUSTADDR CUSTPHONE

MAINTENANCE table

REPAIRNUM CARNUM DATE PROCEDURE MILEAGE REPAIRTIME

CAR table

CARNUM MODEL YEAR CLASS MANUFNAME

MANUFACTURER table

MANUFNAME COUNTRY SALESREPNAME SALESREPPHONE

96 C h a p t e r 4 Relational Data Retrieval: SQL

WHERE (MANUFNAME LIKE ‘C%’
OR MANUFNAME LIKE ‘D%’)

AND COUNTRY=‘Japan’;

2. ‘‘What was the average mileage of the cars that had tune-ups in August,
2003?’’

SELECT AVG(MILEAGE)
FROM

MAINTENANCE

WHERE PROCEDURE=‘Tune-Up’
AND DATE BETWEEN ‘AUG-01-2003’ AND ‘AUG-31-2003’;

The exact format for specifying dates may differ among SQL processors and a
given processor may have several options.

3. ‘‘How many different car models are made by manufacturers in Italy?’’

This query will use an interesting combination of COUNT and DISTINCT that
may not work in all SQL processors. In this case it literally counts the different
models among the cars made in Italy. Since many different cars are of the same
model, DISTINCT is needed to make sure that each model is counted just once.

SELECT COUNT(DISTINCT MODEL)
FROM MANUFACTURER, CAR
WHERE MANUFACTURER.MANUFNAME=CAR.MANUFNAME
AND COUNTRY=‘Italy’;

4. ‘‘How many repairs were performed on each car manufactured by Superior
Motors during the month of March, 2004? Include only cars in the result
that had at least three repairs.’’

SELECT CAR.CARNUM, COUNT(*)
FROM CAR, MAINTENANCE
WHERE CAR.CARNUM=MAINTENANCE.CARNUM
AND MANUFNAME=‘Superior Motors’
AND DATE BETWEEN ‘MAR-01-2004’ AND ‘MAR-31-2004’
GROUP BY CAR.CARNUM
HAVING COUNT(*)>=3;

5. ‘‘List the cars of any manufacturer that had an oil change in January, 2004,
and had at least as many miles as the highest-mileage car manufactured by
Superior Motors that had an oil change that same month.’’

SELECT MAINTENANCE.CARNUM
FROM MAINTENANCE
WHERE PROCEDURE=‘Oil Change’
AND DATE BETWEEN ‘JAN-01-2004’ AND ‘JAN-31-2004’
AND MILEAGE>=

(SELECT MAX(MILEAGE)
FROM CAR, MAINTENANCE
WHERE CAR.CARNUM, MAINTENANCE.CARNUM
AND PROCEDURE=‘Oil Change’
AND DATE BETWEEN ‘JAN-01-2004’ AND ‘JAN-31-2004’
AND MANUFNAME=‘Superior Motors’);

Relational Query Optimizer 97

RELATIONAL QUERY OPTIMIZER

Relational DBMS Performance

An ever-present issue in data retrieval is performance: the speed with which
the required data can be retrieved. In a typical relational database application
environment, and as we’ve seen in the examples above, many queries require only
one table. It is certainly reasonable to assume that such single-table queries using
indexes, hashing, and the like, should, more or less, not take any longer in a relational
database system environment than in any other kind of file management system.
But,what about the queries that involve joins? Recall the detailed explanation of how
data integration works earlier in the book that used the Salesperson and Customer
tables as an example. These very small tables did not pose much of a performance
issue, even if the join was carried out in the worst-case way, comparing every row
of one table to every row of the other table, as was previously described. But what
if we attempted to join a 1-million-row table with a 3-million-row table? How long
do you think that would take—even on a large, fast computer? It might well take
much longer than a person waiting for a response at a workstation would be willing
to tolerate. This was actually one of the issues that caused the delay of almost ten
years from the time the first article on relational database was published in 1970
until relational DBMSs were first offered commercially almost ten years later.

The performance issue in relational database management has been approached
in two different ways. One, the tuning of the database structure, which is known
as ‘‘physical database design,’’ will be the subject of an entire chapter of this
book, Chapter 8. It’s that important. The other way that the relational database
performance issue has been approached is through highly specialized software in
the relational DBMS itself. This software, known as a relational query optimizer,
is in effect an ‘‘expert system’’ that evaluates each SQL SELECT statement sent to
the DBMS and determines an efficient way to satisfy it.

Relational Query Optimizer Concepts

All major SQL processors (meaning all major relational DBMSs) include a query
optimizer. Using a query optimizer, SQL attempts to figure out the most efficient
way of answering a query, before actually responding to it. Clearly, a query that
involves only one table should be evaluated to take advantage of aids such as indexes
on pertinent attributes. But, again, the most compelling and interesting reason for
having a query optimizer in a relational database system is the goal of executing
multiple-table data integration or join-type operations without having to go through
the worst-case, very time-consuming, exhaustive row-comparison process. Exactly
how a specific relational DBMS’s query optimizer works is typically a closely
held trade secret. Retrieval performance is one way in which the vendors of these
products compete with one another. Nevertheless, there are some basic ideas that
we can discuss here.

When an SQL query optimizer is presented with a new SELECT statement to
evaluate, it seeks out information about the tables named in the FROM clause. This
information includes:

■ Which attributes of the tables have indexes built over them.
■ Which attributes have unique values.
■ How many rows each table has.

98 C h a p t e r 4 Relational Data Retrieval: SQL

The query optimizer finds this information in a special internal database known
as the ‘‘relational catalog,’’ which will be described further in Chapter 10.

The query optimizer uses the information about the tables, together with the
various components of the SELECT statement itself, to look for an efficient way to
retrieve the data required by the query. For example, in the General Hardware Co.
SELECT statement:

SELECT SPNUM, SPNAME
FROM SALESPERSON
WHERE COMMPERCT=10;

the query optimizer might check on whether the COMMPERCT attribute has an
index built over it. If this attribute does have an index, the query optimizer might
decide to use the index to find the rows with a commission percentage of 10.
However, if the number of rows of the SALESPERSON table is small enough, the
query optimizer might decide to read the entire table into main memory and scan it
for the rows with a commission percentage of 10.

Another important decision that the query optimizer makes is how to satisfy
a join. Consider the following General Hardware Co. example that we looked at
above:

SELECT SPNAME
FROM SALESPERSON, CUSTOMER
WHERE SALESPERSON.SPNUM=CUSTOMER.SPNUM
AND CUSTNUM=1525;

In this case, the query optimizer should be able to recognize that since CUSTNUM
is a unique attribute in the CUSTOMER table and only one customer number is
specified in the SELECT statement, only a single record from the CUSTOMER
table, the one for customer number 1525, will be involved in the join. Once it finds
this CUSTOMER record (hopefully with an index), it can match the SPNUM value
found in it against the SPNUM values in the SALESPERSON records looking for
a match. If it is clever enough to recognize that SPNUM is a unique attribute in
the SALESPERSON table, then all it has to do is find the single SALESPERSON
record (hopefully with an index) that has that salesperson number and pull the
salesperson name (SPNAME) out of it to satisfy the query. Thus, in this type of
case, an exhaustive join can be completely avoided.

When a more extensive join operation can’t be avoided, the query optimizer
can choose from one of several join algorithms. The most basic, known as a
Cartesian product, is accomplished algorithmically with a ‘‘nested-loop join.’’ One
of the two tables is selected for the ‘‘outer loop’’ and the other for the ‘‘inner loop.’’
Each of the records of the outer loop is chosen in succession and, for each, the
inner-loop table is scanned for matches on the join attribute. If the query optimizer
can determine that only a subset of the rows of the outer or inner tables is needed,
then only those rows need be included in the comparisons.

A more efficient join algorithm than the nested-loop join, the ‘‘merge-scan
join,’’ can be used only if certain conditions are met. The principle is that for the
merge-scan join to work, each of the two join attributes either must be in sorted
order or must have an index built over it. An index, by definition, is in sorted order
and so, one way or the other, each join attribute has a sense of order to it. If this

Questions 99

condition is met, then comparing every record of one table to every record of the
other table as in a nested-loop join is unnecessary. The system can simply start at
the top of each table or index, as the case may be, and move downwards, without
ever having to move upwards.

SUMMARY

SQL has become the standard relational database management data definition
and data manipulation language. Data retrieval in SQL is accomplished with the
SELECT command. SELECT commands can be run in a direct query mode or
embedded in higher-level language programs in an embedded mode. The SELECT
command can be used to retrieve one or more rows of a table, one or more columns
of a table, or particular columns of particular rows. There are built-in functions
that can sum and average data, find the minimum and maximum values of a set of
data, and count the number of rows that satisfy a condition. These built-in functions
can also be applied to multiple subsets or groups of rows. The SELECT command
can also integrate data by joining two or more tables. Subqueries can be developed
for certain specific circumstances. There is a strategy for writing SQL commands
successfully.

Performance is an important issue in the retrieval of data from relational
databases. All relational database management systems have a relational query
optimizer, which is software that looks for a good way to solve each relational query
presented to it. While the ways that these query optimizers work are considered
trade secrets, there are several standard concepts and techniques that they generally
incorporate.

KEY TERMS

Access path
AND/OR
Base table
BETWEEN
Built-in functions
Comparisons
Data definition language (DDL)
Data manipulation language (DML)
Declarative

DISTINCT
Embedded mode
Filtering
GROUP BY
HAVING
IN
LIKE
Merge-scan join
ORDER BY

Nested-loop join
Procedural
Query
Relational query optimizer
Search argument
SELECT
Structured Query Language (SQL)
Subquery

QUESTIONS

1. What are the four basic operations that can be
performed on stored data?

2. What is Structured Query Language (SQL)?
3. Name several of the fundamental SQL commands

and discuss the purpose of each.
4. What is the purpose of the SQL SELECT command?

5. How does the SQL SELECT command relate to the
relational Select, Project, and Join concepts?

6. Explain the difference between running SQL in
query mode and in embedded mode.

7. Describe the basic format of the SQL SELECT
command.

100 C h a p t e r 4 Relational Data Retrieval: SQL

8. In a general way, describe how to write an SQL
SELECT command to accomplish a relational Select
operation.

9. In a general way, describe how to write an SQL
SELECT command to accomplish a relational
Project operation.

10. In a general way, describe how to write an SQL
SELECT command to accomplish a combination of
a relational Select operation and a relational Project
operation.

11. What is the purpose of the WHERE clause in SQL
SELECT commands?

12. List and describe some of the common operators
that can be used in the WHERE clause.

13. Describe the purpose of each of the following
operators in the WHERE clause:
a. AND
b. OR
c. BETWEEN
d. IN
e. LIKE

14. What is the purpose of the DISTINCT operator?
15. What is the purpose of the ORDER BY clause?

16. Name the five SQL built-in functions and describe
the purpose of each.

17. Explain the difference between the SUM and
COUNT built-in functions.

18. Describe the purpose of the GROUP BY clause.
Why must the attribute in the GROUP BY clause
also appear in the SELECT clause?

19. Describe the purpose of the HAVING clause. How
do you decide whether to place a row-limiting
predicate in the WHERE clause or in the HAVING
clause?

20. How do you construct a Join operation in an SQL
SELECT statement?

21. What is a subquery in an SQL SELECT statement?
22. Describe the circumstances in which a subquery

must be used.
23. What is a relational query optimizer? Why are they

important?
24. How do relational query optimizers work?
25. What information does a relational query optimizer

use in making its decisions?
26. What are some of the ways that relational query

optimizers can handle joins?

EXERCISES

1. Consider the following relational database that
Best Airlines uses to keep track of its mechanics,
their skills, and their airport locations. Mechanic
number (MECHNUM), airport name (AIRNAME),
and skill number are all unique fields. SIZE is
an airport’s size in acres. SKILLCAT is a skill
category, such as an engine skill, wing skill, tire
skill, etc. YEARQUAL is the year that a mechanic
first qualified in a particular skill; PROFRATE is the
mechanic’s proficiency rating in a particular skill.

MECHANIC Table

MECHNUM MECHNAME AGE SALARY −−−−−−AIRNAME

AIR

PORT Table

AIRNAME CITY STATE SIZE YEAROPENED

SKILL Table

SKILLNUM SKILLNAME SKILLCAT

QUALIFICATION Table

−−−−−−−MECHNUM −−−−−−−SKILLNUM YEARQUAL PROFRATE

Write SQL SELECT commands to answer the
following queries.
a. List the names and ages of all the mechanics.
b. List the airports in California that are at least

20 acres in size and have been open since 1935.
Order the results from smallest to largest airport.

c. List the airports in California that are at least 20
acres in size or have been open since 1935.

d. Find the average size of the airports in California
that have been open since 1935.

Exercises 101

e. How many airports have been open in California
since 1935?

f. How many airports have been open in each state
since 1935?

g. How many airports have been open in each state
since 1935? Include in your answer only those
states that have at least five such airports.

h. List the names of the mechanics who work in
California.

i. Fan blade replacement is the name of a skill.
List the names of the mechanics who have a
proficiency rating of 4 in fan blade replacement.

j. Fan blade replacement is the name of a skill.
List the names of the mechanics who work in
California who have a proficiency rating of 4 in
fan blade replacement.

k. List the total, combined salaries of all of the
mechanics who work in each city in California.

l. Find the largest of all of the airports.
m. Find the largest airport in California.

2. Consider the following relational database for the
Quality Appliance Manufacturing Co. The database
is designed to track the major appliances (refrig-
erators, washing machines, dishwashers, etc.) that
Quality manufactures. It also records information
about Quality’s suppliers, the parts they supply, the
buyers of the finished appliances, and the finished
goods inspectors. Note the following facts about this
environment:

• Suppliers are the companies that supply Quality
with its major components, such as electric
motors, for the appliances. Supplier number is
a unique identifier.

• Parts are the major components that the suppliers
supply to Quality. Each part comes with a part
number but that part number is only unique within
a supplier. Thus, from Quality’s point of view, the
unique identifier of a part is the combination of
part number and supplier number.

• Each appliance that Quality manufactures is given
an appliance number that is unique across all of
the types of appliances that Quality makes.

• Buyers are major department stores, home
improvement chains, and wholesalers. Buyer
numbers are unique.

• An appliance may be inspected by several
inspectors. There is clearly a many-to-many
relationship among appliances and inspectors, as
indicated by the INSPECTION table.

• There are one-to-many relationships between
suppliers and parts (Supplier Number is a foreign
key in the PART table), parts and appliances
(Appliance Number is a foreign key in the PART
table), and appliances and buyers (Buyer Number
is a foreign key in the APPLIANCE table).

SUPPLIER Table

SUPPLIERNUM SUPPLIERNAME CITY COUNTRY PRESIDENT

PART Table

PARTNUM −−−−−−−−−SUPPLIERNUM PARTTYPE COST −−−−−−−−−−APPLIANCENUM

APPLIANCE Table

APPLIANCENUM APPLIANCETYPE DATEMANUF −−−−−−−BUYERNUM PRICE

BUYER Table

BUYERNUM BUYERNAME CITY COUNTRY CREDITRATING

INSPECTOR Table

INSPECTORNUM INSPECTORNAME SALARY DATEHIRE

INSPECTION Table

−−−−−−−−−−APPLIANCENUM −−−−−−−−−−INSPECTORNUM DATEINSPECTION SCORE

Write SQL SELECT commands to answer the
following queries.
a. List the names, in alphabetic order, of the

suppliers located in London, Liverpool, and
Manchester, UK.

b. List the names of the suppliers that supply motors
(see PARTTYPE) costing between $50 and $100.

102 C h a p t e r 4 Relational Data Retrieval: SQL

c. Find the average cost of the motors (see
PARTTYPE) supplied by supplier number 3728.

d. List the names of the inspectors who were
inspecting refrigerators (see APPLIANCE-
TYPE) on April 17, 2011.

e. What was the highest inspection score achieved
by a refrigerator on November 3, 2011?

f. Find the total amount of money spent on Quality
Appliance products by each buyer from Mexico,
Venezuela, and Argentina.

g. Find the total cost of the parts used in each
dishwasher manufactured on February 28, 2010.
Only include in the results those dishwashers that
used at least $200 in parts.

h. List the highest0paid inspectors.
i. List the highest0paid inspectors who were hired

in 2009.
j. Among all of the inspectors, list those who earn

more than the highest-paid inspector who was
hired in 2009.

MINICASES

1. Consider the following relational database for Happy
Cruise Lines. It keeps track of ships, cruises, ports, and
passengers. A ‘‘cruise’’ is a particular sailing of a ship
on a particular date. For example, the seven-day journey

of the ship Pride of Tampa that leaves on June 13,
2011, is a cruise. Note the following facts about this
environment.

• Both ship number and ship name are unique in the
SHIP Table.

• A ship goes on many cruises over time. A cruise is
associated with a single ship.

• A port is identified by the combination of port name
and country.

• As indicated by the VISIT Table, a cruise includes
visits to several ports and a port is typically included
in several cruises.

• Both Passenger Number and Social Security Number
are unique in the PASSENGER Table. A particular
person has a single Passenger Number that is used for
all the cruises she takes.

• The VOYAGE Table indicates that a person can
take many cruises and a cruise, of course, has many
passengers.

SHIP Table

SHIPNUM SHIPNAME BUILDER LAUNCHDATE WEIGHT

CRUISE Table

CRUISENUM STARTDATE ENDDATE DIRECTOR −−−−−−SHIPNUM

PORT Table

PORTNAME COUNTRY NUMDOCKS MANAGER

VISIT Table

CRUISENUM −−−−−−−PORTNAME −−−−−−COUNTRY ARRDATE DEPDATE

PASSENGER Table

PASSENGERNUM PASSENGERNAME SOCSECNUM STATE COUNTRY

VOYAGE Table

−−−−−−−−−−−PASSENGERNUM −−−−−−−−CRUISENUM ROOMNUM FARE

Write SQL SELECT commands to answer the
following queries.
a. Find the start and end dates of cruise number 35218.
b. List the names and ship numbers of the ships built

by the Ace Shipbuilding Corp. that weigh more than
60,000 tons.

c. List the companies that have built ships for Happy
Cruise Lines.

d. Find the total number of docks in all the ports in
Canada.

e. Find the average weight of the ships built by the Ace
Shipbuilding Corp. that have been launched since
2000.

f. How many ports in Venezuela have at least three
docks?

Minicases 103

g. Find the total number of docks in each country. List
the results in order from most to least.

h. Find the total number of ports in each country.
i. Find the total number of docks in each country but

include only those countries that have at least twelve
docks in your answer.

j. Find the name of the ship that operated on (was used
on) cruise number 35218.

k. List the names, states and countries of the passengers
who sailed on The Spirit of Nashville on cruises that
began during July, 2011.

l. Find the names of the company’s heaviest ships.
m. Find the names of the company’s heaviest ships that

began a cruise between July 15, 2011 and July 31,
2011.

2. Consider the following relational database for the Super
Baseball League. It keeps track of teams in the league,
coaches and players on the teams, work experience of
the coaches, bats belonging to each team, and which
players have played on which teams. Note the following
facts about this environment:

• The database keeps track of the history of all the teams
that each player has played on and all the players who
have played on each team.

• The database only keeps track of the current team that
a coach works for.

• Team number, team name, and player number are
each unique attributes across the league.

• Coach name is unique only within a team (and we
assume that a team cannot have two coaches of the
same name).

• Serial number (for bats) is unique only within a team.
• In the Affiliation table, the Years attribute indicates

the number of years that a player played on a team;
the batting average is for the years that a player played
on a team.

TEAM Table

TEAMNUM TEAMNAME CITY MANAGER

COACH Table

−−−−−−−TEAMNUM COACHNAME ADDRESS

WORK EXPERIENCE Table

−−−−−−−TEAMNUM −−−−−−−−COACHNAME EXPERIENCETYPE YEARSEXPERIENCE

BATS Table

−−−−−−−TEAMNUM SERIALNUM MANUFACTURER

PLAYER Table

PLAYERNUM PLAYERNAME AGE

AFFILIATION Table

−−−−−−−−PLAYERNUM −−−−−−−TEAMNUM YEARS BATTINGAVG

Write SQL SELECT commands to answer the
following queries.
a. Find the names and cities of all of the teams

with team numbers greater than 15. List the results
alphabetically by team name.

b. List all of the coaches whose last names begin with
‘‘D’’ and who have between 5 and 10 years of experi-
ence as college coaches (see YEARSEXPERIENCE
and EXPERIENCETYPE).

c. Find the total number of years of experience of Coach
Taylor on team number 23.

d. Find the number of different types of experience that
Coach Taylor on team number 23 has.

e. Find the total number of years of experience of each
coach on team number 23.

f. How many different manufacturers make bats for the
league’s teams?

g. Assume that team names are unique. Find the names
of the players who have played for the Dodgers for
at least five years (see YEARS in the AFFILIATION
Table.)

h. Assume that team names are unique. Find the total
number of years of work experience of each coach
on the Dodgers, but include in the result only
those coaches who have more than eight years of
experience.

i. Find the names of the league’s youngest players.
j. Find the names of the league’s youngest players

whose last names begin with the letter ‘‘B’’.

C H A P T E R 5

THE RELATIONAL
DATABASE MODEL:

INTRODUCTION

I n 1970, Dr. Edgar F. (Ted) Codd of IBM published in Communications of the
ACM a paper entitled ‘‘A Relational Model of Data for Large Shared Data Banks.’’

This paper marked the beginning of the field of relational databases. During the 1970s,
the relational approach to databases progressed from being a technical curiosity to a
subject of serious interest in the information systems community. But it was not until the
early 1980s that commercially viable relational database management systems became
available. There were two basic reasons for this. One was that, while the relational
database was very tempting in concept, its application in a real-world environment
was elusive for performance-related reasons. The second reason was that at exactly the
time that Codd’s paper was published, the earlier hierarchical and network database
management systems were just coming onto the commercial scene and were the focus of
intense marketing efforts by the software and hardware vendors of the day. Eventually,
both of these obstacles were overcome and the relational model became and remains the
database model of choice.

OBJECTIVES

■ Explain why the relational database model became practical in about 1980.
■ Define such basic relational database terms as relation and tuple.
■ Describe the major types of keys including primary, candidate, and foreign.
■ Describe how one-to-one, one-to-many, and many-to-many binary relationships

are implemented in a relational database.
■ Describe how relational data retrieval is accomplished using the relational algebra

select, project, and join operators.
■ Understand how the join operator facilitates data integration in relational database.

CHAPTER OUTLINE

Introduction
The Relational Database Concept

Relational Terminology

Primary and Candidate Keys

Foreign Keys and Binary
Relationships

Data Retrieval from a Relational Database
Extracting Data from a Relation

106 C h a p t e r 5 The Relational Database Model: Introduction

The Relational Select Operator

The Relational Project Operator

Combination of the Relational

Select and Project Operators
Extracting Data Across Multiple

Relations: Data Integration

Example: Good Reading Book Stores
Example: World Music Association
Example: Lucky Rent-A-Car
Summary

INTRODUCTION

Several factors converged in the early 1980s to begin turning the tide toward
relational database. One was that the performance issues that held back its adoption
in the 1970s began to be resolved. Another was that, after a decade of use
of hierarchical and network database management systems, information systems
professionals were interested in an alternative that would move toward simplifying
the database design process and produce database structures that were easier to use
and understand at all levels. Also, at this time there was increasing interest in a
DBMS environment that would allow easier, more intuitive access to the data by an
increasingly broad range of personnel. Finally, the early 1980s saw the advent of
the personal computer. As software developers began trying to create all manner
of applications and supporting software utilities for the PC, it quickly became clear
that the existing hierarchical and network database approaches would not work in
the PC environment, for two reasons. One was that these DBMSs were simply too
large to store and use on the early PCs. The other was that they were too complex to
be used by the very broad array of non-information-systems professionals at whom
the PCs were targeted.

Today, the relational approach to database management is by far the primary
database management approach used in all levels of information systems and for
most application purposes, from accounting to banking to manufacturing to sales on
the World Wide Web. Relational database management is represented today by such
products as Microsoft Access and SQL Server, Oracle, Sybase, and IBM’s DB2 and
Informix. While these and other relational database systems differ in their features
and implementations, they all share a common data structure philosophy and a
common data access tool: Structured Query Language (SQL) (often pronounced
‘‘sequel’’). This chapter will focus on the basic concepts of how data is stored
and retrieved in a relational database by a relational DBMS. Chapter 6 will discuss
some additional relational database concepts. Then, Chapter 7 will describe logical
database design and Chapter 8 will go into physical database design.

THE RELATIONAL DATABASE CONCEPT

Relational Terminology

In spite of the apparent conflict between non-redundant, linear file data storage
and data integration demonstrated in Chapter 3, the relative simplicity of simple,
linear files or structures that resemble them in a true database environment is very
desirable. After all, the linear file arrangement is the most basic and commonly used
data structure there is. This is precisely one of the advantages of relational database
management.

The Relational Database Concept 107

C O N C E P T S
I N A C T I O N

5-A BLACK & DECKER

Black & Decker is one of the world’s
largest producers of electric power tools and power tool
accessories; it is among the largest-selling residential lock
manufacturers in the U.S., and is a major manufacturer
of faucets sold in the U.S. It is also the world’s
largest producer of certain types of technology-based
industrial fastening systems. The company’s brand names
include Black & Decker and DeWalt power tools, Emhart
Teknologies, Kwikset locks and other home security
products, and Price Pfister plumbing fixtures. Based in
Towson, MD, Black & Decker has manufacturing plants
in ten countries and markets its products in over 100
countries around the globe.

One of the major factors in Black & Decker’s Power
Tools Division’s leadership position is its highly advanced,
database-focused information system that assures a steady
and accurate supply of raw materials to the manufacturing
floor. Using Manugistics’ Demand and Supply Planning
software, the system forecasts demand for Black &
Decker’s power tools and then generates a raw material

Printed by permission of Black & Decker

supply plan based on the forecast and on the company’s
manufacturing capacity. These results are fed into SAP’s
Plant Planning System that takes into account suppliers’
capabilities and lead-time constraints to set up orders for
the raw materials.

Both the Manugistics and SAP software use Ora-
cle databases to keep track of all the data involved in
these processes. Black & Decker runs the system, which
became fully integrated in 1998, on clustered Compaq
Alphas. The databases are also shared by the company’s
purchasing, receiving, finance, and accounting depart-
ments, assuring very high degrees of accuracy and speed
throughout the company’s operations and procedures.
Included among the major database tables that support
this information system are a material master table, a
vendor master table, a bill-of-materials table (indicating
which parts go into making which other parts), a rout-
ing table (indicating the work stations that the part will
move through during manufacturing), planning, purchase
order, customer, and other tables.

108 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.1
Relational database terminology

Relation (or Table or File)

Student Student
Number Name Class Major

03657 Robert Shaw Senior Biology

05114 Gloria Stuart Freshman English

05950 Fred Simpson Junior Mathematics

12746 W. Shin Junior English

15887 Pedro Marcos Senior History

19462 H. Yamato Sophomore French

21682 Mary Jones Freshman Chemistry

24276 Steven Baker Sophomore History

Attribute
(or Column or Field)

Tuple
(or Row or Record)

To begin with, consider the data structure used in relational databases. In a
relational database, the data appears to be stored in what we have been referring
to as simple, linear files. Following the conventions of the area of mathematics on
which relational database is based, we will begin calling those simple linear files
relations, although in common practice they are also referred to as ‘‘tables.’’ In the
terminology of files, each row is called a ‘‘record,’’ while in a relation, each row
is called a tuple. In files, each column is called a ‘‘field,’’ while in a relation each
column is called an attribute. In practice, in speaking about relational database,
people commonly use the terms relation, table, and file synonymously. Similarly,
tuple, row, and record are often used synonymously, as are attribute, column, and
field, Figure 5.1. We will use an appropriate term in each particular situation during
our discussion. In particular, we will use the term ‘‘relation’’ in this chapter and
the next, in which we are talking about relational database concepts. Following
common usage, we will generally use the word ‘‘table’’ in the more applied parts
of the book, such as in the corporate database stories in each chapter and in the
discussion of SQL in Chapter 4.

It is important to note that there are technical differences between the concept
of a file and the concept of a relation (which is why we say that in a relational
database the data only appears to be stored in structures that look like files).The
differences include:

■ The columns of a relation can be arranged in any order without affecting the
meaning of the data. This is not true of a file.

■ Similarly, the rows of a relation can be arranged in any order, which is not true
of a file.

■ Every row/column position, sometimes referred to as a ‘‘cell,’’ can have only a
single value, which is not necessarily true in a file.

■ No two rows of a relation are identical, which is not necessarily true in a file.

A relational database is simply a collection of relations that, as a group,
contain the data describing a particular business environment.

The Relational Database Concept 109

Primary and Candidate Keys

Primary Keys Figure 5.2 contains two relations, the

SALESPERSON relation

and the CUSTOMER relation, from General Hardware Company’s relational
database. The SALESPERSON relation has four rows, each representing one
salesperson. Also, the SALESPERSON relation has four columns, each representing
a characteristic of salespersons. Similarly, the CUSTOMER relation has nine rows,
each representing a customer, and four columns.

A relation always has a unique primary key. A primary key (sometimes
shortened in practice to just ‘‘the key’’) is an attribute or group of attributes whose
values are unique throughout all rows of the relation. In fact, the primary key
represents the characteristic of a collection of entities that uniquely identifies each
one. For example, in the situation described by the relations in Figure 5.2, each
salesperson has been assigned a unique salesperson number and each customer
has been assigned a unique customer number. Therefore the Salesperson Number
attribute is the primary key of the SALESPERSON relation and the Customer
Number attribute is the primary key of the CUSTOMER relation. As in Figure 5.2,
we will start marking the primary key attribute(s) with a single, solid underline.

The number of attributes involved in the primary key is always the minimum
number of attributes that provide the uniqueness quality. For example, in the
SALESPERSON relation, it would make no sense to have the combination of
Salesperson Number and Salesperson Name as the primary key because Salesperson
Number is unique by itself. However, consider the situation of a SALESPERSON

F I G U R E 5.2
General Hardware Company
relational database

(a) SALESPERSON relation

Salesperson Salesperson Commission Year
Number Name Percentage of Hire
137 Baker 10 1995
186 Adams 15 2001
204 Dickens 10 1998
361 Carlyle 20 2001

(b)

CUSTOMER relation

Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York

110 C h a p t e r 5 The Relational Database Model: Introduction

relation that does not include a Salesperson Number attribute, but instead has a First
Name attribute, a Middle Name attribute, and a Last Name attribute. The primary
key might then be the combination of the First, Middle, and Last Name attributes
(assuming this would always produce a unique combination of values. If it did not,
then a fourth attribute could be added to the relation and to the primary key as a
sequence field to produce, for example, John Alan Smith #1, John Alan Smith #2,
and so forth). Some attribute or combination of attributes of a relation has to be
unique and this can serve as the unique primary key, since, by definition, no two
rows can be identical. In the worst case, all of the relation’s attributes combined
could serve as the primary key if necessary (but this situation is uncommon in
practice).

Candidate Keys If a relation has more than one attribute or minimum group of
attributes that represents a way of uniquely identifying the entities, then they are
each called a candidate key. (Actually, if there is only one unique attribute or
minimum group of attributes it can also be called a candidate key.) For example, in
a personnel relation, an employee number attribute and a Social Security Number
attribute (each of which is obviously unique) would each be a candidate key of that
relation. When there is more than one candidate key, one of them must be chosen to
be the primary key of the relation. That is where the term ‘‘candidate key’’ comes
from, since each one is a candidate for selection as the primary key. The decision of
which candidate key to pick to be the primary key is typically based on which one
will be the best for the purposes of the applications that will use the relation and the
database. Sometimes the term alternate key is used to describe a candidate key that
was not chosen to be the primary key of the relation, Figure 5.3.

F I G U R E 5.3
Candidate keys become either primary or
alternate keys

CandidateKey 1

CandidateKey 3

CandidateKey 2

CandidateKey 1
CandidateKey 3
CandidateKey 2

Alternate
Key

Alternate
Key

The Winner and
Primary Key

The Relational Database Concept 111

Foreign Keys and Binary Relationships

Foreign Keys If, in a collection of relations that make up a relational database, an
attribute or group of attributes serves as the primary key of one relation and also
appears in another relation, then it is called a foreign key in that other relation. Thus
Salesperson Number, which is the primary key of the SALESPERSON relation,
is considered a foreign key in the CUSTOMER relation, Figure 5.4. As shown
in Figure 5.4, we will start marking the foreign key attribute(s) with a dashed
underline. The concept of the foreign key is crucial in relational databases, as
the foreign key is the mechanism that ties relations together to represent unary,
binary, and ternary relationships. We begin the discussion by considering how
binary relationships are stored in relational databases. These are both the most
common and the easiest to deal with. The unary and ternary relationships will come
later. Recall from the discussion of the entity-relationship model that the three
kinds of binary relationships among the entities in the business environment are
the one-to-one, one-to-many, and many-to-many relationships. The first case is the
one-to-many relationship, which is typically the most common of the three.

One-to-Many Binary Relationship Consider the SALESPERSON and CUSTOMER
relations of Figure 5.2, repeated in Figure 5.4. As one would expect in most
sales-oriented companies, notice that each salesperson is responsible for several
customers while each customer has a single salesperson as their point of contact with
General Hardware. This one-to-many binary relationship can be represented as:

Salesperson Customer

137
186
204
361
Baker
Adams
Dickens
Carlyle
10

15

10
20
1995
2001
1998
2001

Salesperson
Number

Salesperson

Name

Year
of Hire

Commission
Percentage

0121
0839
0933
1047
1525
1700
1826
2198
2267
Main St. Hardware

Jane’s Hardware

ABC Home Stores
Acme Hardware Store
Fred’s Tool Stores
XYZ Stores
City Hardware

Western Hardware

Central Stores
137
186
137
137
361
361
137
204
186
New York
Chicago
Los Angeles
Los Angeles
Atlanta
Washington
New York
New York
New York

Customer
Number

Customer
Name HQ City

Salesperson
Number

Primary
Key

Foreign
Key

(a) SALESPERSON relation

(b) CUSTOMER relation

United States Europe

F I G U R E 5.4
A foreign key

112 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.5
A salesperson and his four customers

Customer 1826

Customer 1047

Customer 0933

Customer 0121

Salesperson 137

Mr. Baker

For example, the Salesperson Number attribute of the CUSTOMER relation
shows that salesperson 137 is responsible for customers 0121, 0933, 1047, and 1826.
Looking at it from the point of view of the customer, the same relation shows that
the only salesperson associated with customer 0121 is salesperson 137, Figure 5.5.
This last point has to be true. After all, there is only one record for each customer in
the CUSTOMER relation (the Customer Number attribute is unique since it is the
relation’s primary key) and there is only one place to put a salesperson number in
it. The bottom line is that the Salesperson Number foreign key in the CUSTOMER
relation effectively establishes the one-to-many relationship between salespersons
and customers.

By the way, notice that, in this case, the primary key of the SALESPERSON
relation and the corresponding foreign key in the CUSTOMER relation both have
the same attribute name, Salesperson Number. This will often be the case but it
does not have to be. What is necessary is that both attributes have the same domain
of values; that is, they must both have values of the same type, such as (in this case)
three-digit whole numbers that are the identifiers for salespersons.

It is the presence of a salesperson number in a customer record that indicates
which salesperson the customer is associated with. Fundamentally, that is why
the Salesperson Number attribute is in the CUSTOMER relation and that is the
essence of its being a foreign key in that relation. In Chapter 7, we will discuss
database design issues in detail. But, for now, note that when building a one-to-many
relationship into a relational database, it will always be the case that the unique
identifier of the entity on the ‘‘one side’’ of the relationship (Salesperson Number,
in this example) will be placed as a foreign key in the relation representing the entity
on the ‘‘many side’’ of the relationship (the CUSTOMER relation, in this example).

Here’s something else about foreign keys. There are situations in which a
relation doesn’t have a single, unique attribute to serve as its primary key. Then, it

The Relational Database Concept 113

requires a combination of two or more attributes to reach uniqueness and serve as
its primary key. Sometimes one or more of the attributes in that combination can be
a foreign key! Yes, when this happens, a foreign key is actually part of the relation’s
primary key! This was not the case in the CUSTOMER relation of Figure 5.2b.
In this relation, the primary key only consists of one attribute, Customer Number,
which is unique all by itself. The foreign key, Salesperson Number, is clearly not a
part of the primary key.

Here is an example of a situation in which a foreign key is part of a relation’s
primary key. Figure 5.6 adds the CUSTOMER EMPLOYEE relation, Figure 5.6c,
to the General Hardware database. Remember that General Hardware’s customers
are the hardware stores, home improvement stores, or chains of such stores that it
supplies. Figure 5.6c, the CUSTOMER EMPLOYEE relation, lists the employees of
each of General Hardware’s customers. In fact, there is a one-to-many relationship
between customers and customer employees. A customer (like a hardware store)
has many employees but an employee, a person, works in only one store:

Customer Customer Employee

For example, Figure 5.6c shows that customer 2198 has four employees,
Smith, Jones, Garcia, and Kaplan. Each of those people is assumed to work for
only one customer company, customer 2198. Following the rule we developed for
setting up a one-to-many relationship with a foreign key, the Customer attribute
must appear in the CUSTOMER EMPLOYEE relation as a foreign key, and indeed
it does.

Now, what about finding a legitimate primary key for the CUSTOMER
EMPLOYEE relation? The assumption here is that employee numbers are unique
only within a company; they are not unique across all of the customer companies.
Thus, as shown in the CUSTOMER EMPLOYEE relation in Figure 5.6c, there can
be an employee of customer number 0121 who is employee number 30441 in that
company’s employee numbering system, an employee of customer number 0933
who is employee number 30441 in that company’s system, and also an employee of
customer number 2198 who is also employee number 30441. That being the case, the
Employee Number is not a unique attribute in this relation. Neither it nor any other
single attribute of the CUSTOMER EMPLOYEE relation is unique and can serve,
alone, as the relation’s primary key. But the combination of Customer Number and
Employee Number is unique. After all, we know that customer numbers are unique
and within each customer company, employee numbers are unique. That means
that, as shown in Figure 5.6c, the combination of Customer Number and Employee
Number can be and is the relation’s primary key. Further, that means that Customer
Number is both a foreign key in the CUSTOMER EMPLOYEE relation and a part
of its primary key. As shown in Figure 5.6c, we will start marking attributes that
are both a foreign key and a part of the primary key with an underline consisting of
a dashed line over a solid line.

Many-to-Many Binary Relationship

Storing the Many-to-Many Binary Relationship Figure 5.7 expands the General Hardware
database by adding two more relations, the PRODUCT relation, Figure 5.7d, and the
SALES relation, Figure 5.7e. The PRODUCT relation simply lists the products that
General Hardware sells, one row per product, with Product Number as the unique
identifier and thus the primary key of the relation. Each of General Hardware’s

114 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.6
General Hardware Company relational
database including the CUSTOMER
EMPLOYEE relation

(a) SALESPERSON relation
Salesperson Salesperson Commission Year
Number Name Percentage of Hire
137 Baker 10 1995
186 Adams 15 2001
204 Dickens 10 1998
361 Carlyle 20 2001
(b) CUSTOMER relation
Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York

(c) CUSTOMER EMPLOYEE relation

Customer Employee Employee
Number Number Name Title

0121 27498 Smith Co-Owner
0121 30441 Garcia Co-Owner
0933 25270 Chen VP Sales
0933 30441 Levy Sales Manager
0933 48285 Morton President
1525 33779 Baker Sales Manager
2198 27470 Smith President
2198 30441 Jones VP Sales
2198 33779 Garcia VP Personnel
2198 35268 Kaplan Senior Accountant

salespersons can sell any or all of the company’s products and each product can be
sold by any or all of its salespersons. Therefore the relationship between salespersons
and products is a many-to-many relationship.

Salesperson Product

The Relational Database Concept 115

So, the database will somehow have to keep track of this many-to-many
relationship between salespersons and products. The way that a many-to-many
relationship is represented in a relational database is by the creation of an additional
relation, in this example, the SALES relation in Figure 5.7e. The SALES relation
of Figure 5.7e is intended to record the lifetime sales of a particular product
by a particular salesperson. Thus, there will be a single row in the relation for
each applicable combination of salesperson and product (i.e., when a particular
salesperson has actually sold some of the particular product). For example, the first
row of the SALES relation indicates that salesperson 137 has sold product 19440.

F I G U R E 5.7
General Hardware Company relational
database including the PRODUCT and
SALES relation

(a) SALESPERSON relation
Salesperson Salesperson Commission Year
Number Name Percentage of Hire
137 Baker 10 1995
186 Adams 15 2001
204 Dickens 10 1998
361 Carlyle 20 2001
(b) CUSTOMER relation
Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York
(c) CUSTOMER EMPLOYEE relation
Customer Employee Employee
Number Number Name Title
0121 27498 Smith Co-Owner
0121 30441 Garcia Co-Owner
0933 25270 Chen VP Sales
0933 30441 Levy Sales Manager
0933 48285 Morton President
1525 33779 Baker Sales Manager
2198 27470 Smith President
2198 30441 Jones VP Sales
2198 33779 Garcia VP Personnel
2198 35268 Kaplan Senior Accountant

(Continues)

116 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.7 (Continued)
General Hardware Company relational
database including the PRODUCT and
SALES relation

(d)

PRODUCT relation

Product Product Unit
Number Name Price

16386 Wrench 12.95
19440 Hammer 17.50
21765 Drill 32.99
24013 Saw 26.25
26722 Pliers 11.50

(e) SALES relation

Salesperson Product
Number Number

Quantity

137 19440 473
137 24013 170
137 26722 688

186 16386 1,745

186 19440 2,529

186 21765 1,962

186 24013 3,071

204 21765 809

204 26722 734

361 16386 3,729

361 21765 3,110

361 26722 2,738

Since it is sufficient to record that fact once, the combination of the Salesperson
Number and Product Number attributes always produces unique values. So, in
general, the new relation created to record the many-to-many relationship will
have as its primary key the combined unique identifiers of the two entities in the
many-to-many relationship. That’s why, in this example, the Salesperson Number
and Product Number attributes both appear in the SALES relation. Each of the two
is a foreign key in the SALES relation since each is the primary key of another
relation in the database. The combination of these two attributes is unique, and
combined they comprise the primary key of the newly created SALES relation.

The new SALES relation of Figure 5.7e effectively records the many-to-
many relationship between salespersons and products. This is illustrated from the
‘‘salesperson side’’ of the many-to-many relationship by looking at the first three
rows of the SALES relation and seeing that salesperson 137 sells products 19440,
24013, and 26722. It is illustrated from the ‘‘product side’’ of the many-to-many
relationship by scanning down the Product Number column of the SALES relation,
looking for the value 19440, and seeing that product 19440 is sold by salespersons
137 and 186, Figure 5.8.

Intersection Data What about the Quantity attribute in the SALES relation? In addition
to keeping track of which salespersons have sold which products, General Hardware

The Relational Database Concept 117

F I G U R E 5.8
Many-to-many relationship between
salespersons and products as shown in
the SALES relation

Salesperson 137
Mr. Baker

Product 19440

Hammer

Product 24013

Saw

Product 26722
Pliers

Salesperson 186

Ms. Adams

wants to record how many of each particular product each salesperson has sold
since the product was introduced or since the salesperson joined the company. So,
it sounds like there has to be a ‘‘Quantity’’ attribute. And, an attribute describes
an entity, right? Then, which entity does the Quantity attribute describe? Does
it describe salespersons the way the Year of Hire does in the SALESPERSON
relation? Does it describe products the way Unit Price does in the

PRODUCT

relation? Each salesperson has exactly one date of hire. Each product has exactly
one unit price. But a salesperson doesn’t have just one ‘‘quantity’’ associated with
her because she sells many products and similarly, a product doesn’t have just one
‘‘quantity’’ associated with it because it is sold by many salespersons.

While year of hire is clearly a characteristic of salespersons and unit price is
clearly a characteristic of products, ‘‘quantity’’ is a characteristic of the relationship
between salesperson and product. For example, the fact that salesperson 137 appears
in the first row of the SALES relation of Figure 5.7e along with product 19440
indicates that he has a history of selling this product. But do we know more about
his history of selling it? Yes! That first row of Figure 5.7e indicates that salesperson
137 has sold 473 units of product 19440. Quantity describes the many-to-many
relationship between salespersons and products. In a sense it falls at the intersection
between the two entities and is thus called ‘‘intersection data,’’ Figure 5.9.

Since the many-to-many relationship has its own relation in the database and
since it can have attributes, does that mean that we should think of it as a kind of
entity? Yes! Many people do just that and refer to it as an ‘‘associative entity,’’ a
concept we first described when discussing data modeling in Chapter 2!

Additional Many-to-Many Concepts Before leaving the subject of many-to-many relation-
ships, there are a few more important points to make. First, will the combination
of the two primary keys representing the two entities in the many-to-many
relationship always serve as a unique identifier or primary key in the additional
relation representing the many-to-many relationship? The answer is that this
depends on the precise nature of the many-to-many relationship. For example,

118 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.9
Intersection data that indicates that
salesperson 137 has sold 473 units of
product 19440

Salesperson 137
Product 19440

473
Units

Intersection
Data

in the situation of the SALES relation in Figure 5.7e, the combination of the two
entity identifier attributes works perfectly as the primary key, as described above.
But, what if General Hardware decides it wants to keep track of each salesperson’s
annual sales of each product instead of their lifetime sales? Fairly obviously, a
new attribute, Year, would have to be added to the SALES relation, as shown in
Figure 5.10. Moreover, as demonstrated by a few sample rows of that relation, the
combination of Salesperson Number and Product Number is no longer unique. For
example, salesperson 137 sold many units of product 19440 in each of 1999, 2000,
and 2001. The first three records of the relation all have the salesperson number,
product number combination of 137, 19440. Clearly, the way to solve the problem
in this instance is to add the Year attribute to the Salesperson Number and Product
Number attributes to form a three-attribute unique primary key. It is quite common
in practice to have to add such a ‘‘timestamp’’ to a relation storing a many-to-many
relationship in order to attain uniqueness and have a legitimate primary key.
Sometimes, as in the example in Figure 5.10, this is accomplished with a Year
attribute. A Date attribute is required if the data may be stored two or more times in a
year. A Time attribute is required if the data may be stored more than once in a day.

Next is the question of why an additional relation is necessary to represent
a many-to-many relationship. For example, could the many-to-many relationship
between salespersons and products be represented in either the SALESPERSON or
PRODUCT relations? The answer is no! If, for instance, you tried to represent the
many-to-many relationship in the SALESPERSON relation, you would have to list
all of the products (by Product Number) that a particular salesperson has sold in that
salesperson’s record. Furthermore, you would have to carry the Quantity intersection
data along with it in some way. For example, in the SALESPERSON relation, the
row for salesperson 137 would have to be extended to include products 19440,

The Relational Database Concept 119

F I G U R E 5.10
Modified SALES relation of the General
Hardware Company relational database,
including a Year attribute

SALES relation (modified)

Salesperson Product
Number Number Year Quantity

137 19440 1999 132

137 19440 2000 168

137 19440 2001 173

137 24013 2000 52

137 24013 2001 118

137 26722 1999 140

137 26722 2000 203

137 26722 2001 345

186 16386 1998 250

186 16386 1999 245

186 16386 2000 581

186 16386 2001 669

24013, and 26722, plus the associated intersection data, Figure 5.11a. Alternatively,
one could envision a single additional attribute in the SALESPERSON relation into
which all the related product number and intersection data for each salesperson
would somehow be stuffed, Figure 5.11b (although, aside from other problems,
this would violate the rule that every cell in a relation must have only a single
value). In either case, it would be unworkable. Because, in general, each salesperson
has been involved in selling different numbers of product types, each record of
the SALESPERSON relation would be a different length. Furthermore, additions,

(a) Additional Product and Quantity columns

Salesperson Salesperson Commission Year
Number Name Percentage of Hire Product Qty Product Qty Product Qty Product Qty

137 Baker 10 1995 19440 473 24013 170 26722 688

186 Adams 15 2001 16386 1745 19440 2529 21765 1962 24013 3071

204 Dickens 10 1998 21765 809 26722 734

361 Carlyle 20 2001 16386 3729 21765 3110 26722 2738

(b) One additional column for Product and Quantity Pairs

Salesperson Salesperson Commission Year
Number Name Percentage of Hire Product and Quantity Pairs

137 Baker 10 1995 (19440, 473) (24013, 170) (26722, 688)

186 Adams 15 2001 (16386, 1745) (19440, 2529) (21765, 1962) (24013, 3071)

204 Dickens 10 1998 (21765, 809) (26722, 734)

361 Carlyle 20 2001 (16386, 3729) (21765, 3110) (26722, 2738)

F I G U R E 5.11
Unacceptable ways of storing a binary many-to-many relationship

120 C h a p t e r 5 The Relational Database Model: Introduction

deletions, and updates of product/quantity pairs would be a nightmare. Also, trying
to access the related data from the ‘‘product side,’’ for example looking for all
of the salespersons who have sold a particular product, would be very difficult.
And, incidentally, trying to make this work by putting the salesperson data into the
PRODUCT relation, instead of putting the product data into the SALESPERSON
relation as in Figure 5.11, would generate an identical set of problems. No,
the only way that’s workable is to create an additional relation to represent the
many-to-many relationship. Each combination of a related salesperson and product
has its own record, making the insertion, deletion, and update of related items
feasible, providing a clear location for intersection data, and avoiding the issue of
variable-length records.

Finally, there is the question of whether an additional relation is required to
represent a many-to-many relationship if there is no intersection data. For example,
suppose that General Hardware wants to track which salespersons have sold which
products, but has no interest in how many units of each product they have sold.
The SALES relation of Figure 5.7e would then have only the Salesperson Number
and Product Number attributes, Figure 5.12. Could this information be stored in
some way other than with the additional SALES relation? The answer is that the
additional relation is still required. Note that in the explanation above of why an
additional relation is necessary in general to represent a many-to-many relationship,
the intersection data played only a small role. The issues would still be there, even
without intersection data.

One-to-One Binary Relationship After considering one-to-many and many-to-many
binary relationships in relational databases, the remaining binary relationship is the
one-to-one relationship. Each of General Hardware’s salespersons has exactly one
office and each office is occupied by exactly one salesperson, Figure 5.13.

Salesperson Office

F I G U R E 5.12
The many-to-many SALES relation without
intersection data

SALES relation
(without intersection data)

Salesperson Product
Number Number

137 19440

137 24013

137 26722

186 16386

186 19440

186 21765

186 24013

204 21765

204 26722

361 16386

361 21765

361 26722

The Relational Database Concept 121

F I G U R E 5.13
A one-to-one binary relationship

Salesperson 186

Salesperson 204

Salesperson 361

Salesperson 137 Office 1253

Office 1227

Office 1284

Office 1209

Figure 5.14f shows the addition of the OFFICE relation to the General
Hardware relational database. The SALESPERSON relation has the Office Number
attribute as a foreign key so that the company can look up the record for a salesperson
and see to which office she is assigned. Because this is a one-to-one relationship and
each salesperson has only one office, the company can also scan down the Office
Number column of the SALESPERSON relation, find a particular office number
(which can only appear once, since it’s a one-to-one relationship), and see which
salesperson is assigned to that office. In general, this is the way that one-to-one
binary relationships are built into relational databases. The unique identifier, the
primary key, of one of the two entities in the one-to-one relationship is inserted into
the other entity’s relation as a foreign key. The question of which of the two entities
is chosen as the ‘‘donor’’ of its primary key and which as the ‘‘recipient’’ will be
discussed further when we talk about logical design in Chapter 7.

But there is another interesting question about this arrangement. Could the
SALESPERSON and OFFICE relations of Figure 5.14 be combined into one
relation? After all, a salesperson has only one office and an office has only
one salesperson assigned to it. So, if an office and its unique identifier, Office
Number, ‘‘belongs’’ to one particular salesperson, so does that office’s Telephone

122 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.14
General Hardware Company relational
database including the OFFICE relation

(a) SALESPERSON relation

Salesperson Salesperson Commission Year Office
Number Name Percentage of Hire Number

137 Baker 10 1995 1284
186 Adams 15 2001 1253
204 Dickens 10 1998 1209
361 Carlyle 20 2001 1227
(b) CUSTOMER relation
Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York
(c) CUSTOMER EMPLOYEE relation
Customer Employee Employee
Number Number Name Title
0121 27498 Smith Co-Owner
0121 30441 Garcia Co-Owner
0933 25270 Chen VP Sales
0933 30441 Levy Sales Manager
0933 48285 Morton President
1525 33779 Baker Sales Manager
2198 27470 Smith President
2198 30441 Jones VP Sales
2198 33779 Garcia VP Personnel
2198 35268 Kaplan Senior Accountant
(Continues)

Number and Size. Indeed, when we want to contact a salesperson, we ask for her
phone number, not for ‘‘her office’s phone number!’’ So, could we combine the
SALESPERSON and OFFICE relations of Figure 5.14 into the single relation of
Figure 5.15? The answer is, it’s possible in some cases, but you have to be very
careful about making such a decision. In the General Hardware case, how would
you store an unoccupied office in the database? The relation of Figure 5.15 allows
data about an office to be stored only if the office is occupied. After all, the primary
key of Figure 5.15’s relation is Salesperson Number! You can’t have a record with
office data in it and no salesperson data. A case where it might work is a database
of U.S. states and their governors. Every state always has exactly one governor and

The Relational Database Concept 123

F I G U R E 5.14 (Continued)
General Hardware Company relational
database including the OFFICE relation

(d) PRODUCT relation
Product Product Unit
Number Name Price
16386 Wrench 12.95
19440 Hammer 17.50
21765 Drill 32.99
24013 Saw 26.25
26722 Pliers 11.50
(e) SALES relation

Salesperson Product
Number Number Quantity

137 19440 473
137 24013 170
137 26722 688
186 16386 1,745
186 19440 2,529
186 21765 1,962
186 24013 3,071
204 21765 809
204 26722 734
361 16386 3,729
361 21765 3,110
361 26722 2,738

(f) OFFICE relation

Office
Number Telephone Size (sq. ft.)

1253 901-555-4276 120

1227 901-555-0364 100
1284 901-555-7335 120
1209 901-555-3108 95

F I G U R E 5.15
Combining the SALESPERSON and

OFFICE

relations into a single relation

Combined SALESPERSON/OFFICE relation

Salesperson Salesperson Commission Year Office
Number Name Percentage of Hire Number Telephone Size (sq. ft.)

137 Baker 10 1995 1284 901-555-7335 120

186 Adams 15 2001 1253 901-555-4276 120

204 Dickens 10 1998 1209 901-555-3108 95

361 Carlyle 20 2001 1227 901-555-0364 100

124 C h a p t e r 5 The Relational Database Model: Introduction

Y O U R
T U R N

5.1 ENTITIES, KEYS, AND RELATIONSHIPS

Think about a retail store that sells
TVs, computers, cameras, DVDs, etc. What entities must
it keep track of? What are some of the attributes of
those entities? What about relationships among those
entities? What do you think would be appropriate
primary, candidate, and foreign keys in the relational
tables designed to store data about the entities?

QUESTION:
Specifically consider a chain of retail electronics stores.

List several of the main entities that the chain must keep
track of. What are the relationships between those
entities? What would appropriate primary, candidate,
and foreign keys be in the relational tables that would
store the data about these entities?

anyone who is a governor must be associated with one state. There can’t be a state
without a governor or a governor without a state.

At any rate, in practice, there are a variety of reasons for keeping the two
relations involved in the one-to-one relationship separate. It may be that because
each of the two entities involved is considered sufficiently important in its own
right, this separation simply adds clarity to the database. It may be because most
users at any one time seek data about only one of the two entities. It may have to
do with splitting the data between different geographic sites. It can even be done
for system performance in the case where the records would be unacceptably long
if the data was all contained in one relation. These issues will be discussed later in
this book but it is important to have at least a basic idea of the intricacies of the
one-to-one relationship, at this point.

DATA RETRIEVAL FROM A RELATIONAL DATABASE

Extracting Data from a Relation

Thus far, the discussion has concentrated on how a relational database is structured.
But building relations and loading them with data is only half of the story. The other
half is the effort to retrieve the data in a way that is helpful and beneficial to the
business organization that built the database. If the database management system did
not provide any particular help in this effort, then the problem would revert to simply
writing a program in some programming language to retrieve data from the relations,
treating them as if they were simple, linear files. But the crucial point is that a
major, defining feature of a relational DBMS is the ability to accept high-level data
retrieval commands, process them against the database’s relations, and return the
desired data. The data retrieval mechanism is a built-in part of the DBMS and does
not have to be written from scratch by every program that uses the database. As we
shall soon see, this is true even to the extent of matching related records in different
relations (integrating data), as in the earlier example of finding the name of the sales-
person on a particular customer account. We shall address what relational retrieval
might look like, first in terms of single relations and then across multiple relations.

Since a relation can be viewed as a tabular or rectangular arrangement of data
values, it would seem to make sense to want to approach data retrieval horizontally,
vertically, or in a combination of the two. Taking a horizontal slice of a relation
implies retrieving one or more rows of the relation. In effect, that’s an expression

Data Retrieval from a Relational Database 125

for retrieving one or more records or retrieving the data about one or more entities.
Taking a vertical slice of a relation means retrieving one or more entire columns of
the relation (down through all of its rows). Taken in combination, we can retrieve
one or more columns of one or more rows, the minimum of which is a single column
of a single row, or a single attribute value of a single record. That’s as fine a sense
of retrieval as we would ever want.

Using terminology from a database formalism called relational algebra and
an informal, hypothetical command style for now, there are two commands called
Select and Project that are capable of the kinds of horizontal and vertical manipula-
tions just suggested. (Note: the use of the word ‘‘Select’’ here is not the same as its
use in the SQL data retrieval language discussed in Chapter 4.)

The Relational Select Operator

Consider the database of Figure 5.14 and its SALESPERSON relation, Figure 5.14a.
To begin with, suppose that we want to find the row or record for salesperson number
204. In a very straightforward way, the informal command might be:

Select rows from the SALESPERSON relation in which Salesperson Number
= 204.

The result would be:

Salesperson Salesperson Commission Year
Number Name Percentage Of Hire

204 Dickens 10 1998

Notice that the result of the Select operation is itself a relation, in this case
consisting of only one row. The result of a relational operation will always be a
relation, whether it consists of many rows with many columns or one row with one
column (i.e., a single attribute value).

In order to retrieve all of the records with a common value in a particular
(nonunique) attribute, for example all salespersons with a commission percentage
of 10, the command looks the same as when dealing with a unique attribute:

Select rows from the SALESPERSON relation in which Commission
Percentage = 10.

But the result of the operation may include several rows:

Salesperson Salesperson Commission Year
Number Name Percentage Of Hire
137 Baker 10 1995
204 Dickens 10 1998

If the requirement is to retrieve the entire relation, the command would be:
Select all rows from the SALESPERSON relation.

The Relational Project Operator

To retrieve what we referred to earlier as a vertical slice of the relation requires the
Project operator. For example, the command to retrieve the number and name of
each salesperson in the file might look like:

Project the Salesperson Number and Salesperson Name over the SALESPER-
SON relation.

126 C h a p t e r 5 The Relational Database Model: Introduction

The result will be a long narrow relation:

Salesperson Salesperson
Number Name

137 Baker
186 Adams
204 Dickens

361 Carlyle

If we project a nonunique attribute, then a decision must be made on whether
or not we want duplicates in the result (although, since the result is itself a relation,
technically there should not be any duplicate rows). For example, whether:

Project the Year of Hire over the SALESPERSON relation.
produces

Year Of Hire

1995
2001
1998
2001

or (eliminating the duplicates in the identical rows) produces

Year Of Hire
1995
2001
1998

would depend on exactly how this hypothetical informal command language was
implemented.

Combination of the Relational Select and Project Operators

More powerful still is the combination of the Select and Project operators. Suppose
we apply them serially, with the relation that results from one operation being used
as the input to the next operation. For example, to retrieve the numbers and names
of the salespersons working on a 10 % commission, we would issue:

Select rows from the SALESPERSON relation in which Commission
Percentage = 10.

Project the Salesperson Number and Salesperson Name over that result.
The first command ‘‘selects out’’ the rows for salespersons 137 and 204. Then

the second command ‘‘projects’’ the salesperson numbers and names from those
two rows, resulting in:

Salesperson Salesperson
Number Name
137 Baker
204 Dickens

Data Retrieval from a Relational Database 127

The following combination illustrates the ability to retrieve a single attribute
value. Suppose that there is a need to find the year of hire of salesperson number
204. Since Salesperson Number is a unique attribute, only one row of the relation
can possibly be involved. Since the goal is to find one attribute value in that row,
the result must be just that: a single attribute value. The command is:

Select rows from the SALESPERSON relation in which Salesperson Number
= 204.

Project the Year of Hire over that result.
The result is the single value:

Year of Hire

1998

Extracting Data Across Multiple Relations: Data Integration

In Chapter 3, the issue of data integration was broached and the concept was defined.
First, the data in the Salesperson and Customer files of Figure 3.7 was shown to be
non-redundant. Then it was shown that integrating data would require extracting
data from one file and using that extracted data as a search argument to find
the sought-after data in the other file. For example, recall that finding the name of
the salesperson who was responsible for customer number 1525 required finding the
salesperson number in customer 1525’s record in the Customer file (i.e. salesperson
number 361) and then using that salesperson number as a search argument in the
Salesperson file to discover that the sought-after name was Carlyle. The alternative
was the combined file of Figure 3.8 that introduced data redundancy.

A fundamental premise of the database approach is that a DBMS must be
able to store data non-redundantly while also providing a data integration facility.
But it seems that we may have a problem here. Since relations appear to be largely
similar in structure to simple, linear files, do the lessons learned from the files
of Figure 3.7 and Figure 3.8 lead to the conclusion that it is impossible to have
simultaneously non-redundant data storage and data integration with relations in a
relational database? In fact, one of the elegant features of relational DBMSs is that
they automate the cross-relation data extraction process in such a way that it appears
that the data in the relations is integrated while also remaining non-redundant. The
data integration takes place at the time that a relational query is processed by the
relational DBMS for solution. This is a unique feature of relational databases and
is substantially different from the functional equivalents in the older navigational
database systems and in some of the newer object-oriented database systems, in both
of which the data integration is much more tightly built into the data structure itself.
In relational algebra terms, the integration function is known as the Join command.

Now, focus on the SALESPERSON and CUSTOMER relations of Figure 5.14,
which outwardly look just like the SALESPERSON and CUSTOMER files of
Figure 3.7. Adding the Join operator to our hypothetical, informal command style,
consider the following commands designed to find the name of the salesperson
responsible for customer number 1525. Again, this was the query that seemed to be
so problematic in Chapter 3.

Join the SALESPERSON relation and the CUSTOMER relation, using the
Salesperson Number of each as the join fields.

Select rows from that result in which Customer Number = 1525.
Project the Salesperson Name over that last result.

128 C h a p t e r 5 The Relational Database Model: Introduction

Obviously, the first sentence represents the use of the join command. The
join operation will take advantage of the common Salesperson Number attribute,
which for this purpose is called the join field, in both relations. The Salesperson
Number attribute is, of course, the primary key of the SALESPERSON relation
and is a foreign key in the CUSTOMER relation. Remember that the point of
the foreign key is to represent a one-to-many (in this case) relationship between
salespersons and customers. Some rows of the SALESPERSON relation are related
to some rows of the CUSTOMER relation by virtue of having the same salesperson
number. The Salesperson Number attribute serves to identify each salesperson in the
SALESPERSON relation, while the Salesperson Number attribute indicates which
salesperson is responsible for a particular customer in the CUSTOMER relation.
Thus, the rows of the two relations that have identical Salesperson Number values
are related. It is these related rows that the join operation will bring together in
order to satisfy the query that was posed.

The join operation tries to find matches between the join field values of the
rows in the two relations. For example, it finds a match between the Salesperson
Number value of 137 in the first row of the SALESPERSON relation and the
Salesperson Number value of 137 in the first, third, fourth, and seventh rows of the
CUSTOMER relation. When it finds such a pair of rows, it takes all the attribute
values from both rows and creates a single new row out of them in the resultant
relation. In its most basic form, as shown here, the join is truly an exhaustive
operation, comparing every row of one relation to every row of the other relation,
looking for a match in the join fields. (Comparing every possible combination of
two sets, in this case rows from the two relations, is known as taking the ‘‘Cartesian
product.’’) So the result of the join command, the first of the three commands in the
example command sequence we’re executing, is:

SalesPerson SalesPerson Commission Year of Customer Customer SalesPerson
Number Name Percentage Hire Number Name Number HQ City

137 Baker 10 1995 0121 Main St. Hardware 137 New York

137 Baker 10 1995 0933 ABC Home Stores 137 Los Angeles

137 Baker 10 1995 1047 Acme Hardware Store 137 Los Angeles

137 Baker 10 1995 1826 City Hardware 137 New York

186 Adams 15 2001 0839 Jane’s Stores 186 Chicago

186 Adams 15 2001 2267 Central Stores 186 New York

204 Dickens 10 1998 2198 Western Hardware 204 New York

361 Carlyle 20 2001 1525 Fred’s Tool Stores 361 Atlanta

361 Carlyle 20 2001 1700 XYZ Stores 361 Washington

Notice that the first and seventh columns are identical in all of their
values, row by row. They represent the Salesperson Number attributes from
the SALESPERSON and CUSTOMER relations respectively. Remember that two
rows from the SALESPERSON and CUSTOMER relations would not be combined
together to form a row in the resultant relation unless their two join field values
were identical in the first place. This leads to identical values of the two Salesperson
Number attributes within each of the rows of the resultant relation. This type of
join is called an ‘‘equijoin.’’ If, as seems reasonable, one of the two identical join
columns is eliminated in the process, the result is called a ‘‘natural join.’’

Example: Good Reading Book Stores 129

Continuing with the command sequence to eventually find the name of the
salesperson responsible for customer number 1525, the next part of the command
issued is:

Select rows from that result (the relation that resulted from the join) in which
Customer Number = 1525.

This produces:

SalesPerson SalesPerson Commission Year of Customer Customer SalesPerson
Number Name Percentage Hire Number Name Number HQ City
361 Carlyle 20 2001 1525 Fred’s Tool Stores 361 Atlanta

Finally, we issue the third command
Project the Salesperson Name over that last result.
and get:

SalesPerson Name

Carlyle

Notice that the process could have been streamlined considerably if the relational
DBMS had more ‘‘intelligence’’ built into it. The query dealt with only a single
customer, customer 1525, and there is only one row for each customer in the
CUSTOMER relation, since Customer Number is the unique key attribute. Therefore,
the query needed to look at only one row in the CUSTOMER relation, the one for
customer 1525. Since this row references only one salesperson, salesperson 361,
it follows that, in turn, it needed to look at only one row in the SALESPERSON
relation, the one for salesperson 1525. This type of performance issue in relational
query processing will be covered later in this book in Chapter 8.

EXAMPLE: GOOD READING BOOK STORES

Figure 5.16 shows the relational database for the Good Reading Book Stores
example described earlier. Since publishers are in a one-to-many relationship to
books, the primary key of the PUBLISHER Relation, Publisher Name, is inserted
into the BOOK relation as a foreign key. There are two many-to-many relationships.
One, between books and authors, keeps track of which authors wrote which books.
Recall that a book can have multiple authors and a particular author may have written
or partly written many books. The other many-to-many relationship, between books
and customers, records which customers bought which books.

The WRITING relation handles the many-to-many relationship between
books and authors. The primary key is the combination of Book Number and
Author Number. There is no intersection data! Could there be a reason for having
intersection data in this relation? If, for example, this database belonged to a
publisher instead of a bookstore chain, an intersection data attribute might be
Royalty Percentage, i.e. the percentage of the royalties to which a particular author
is entitled for a particular book. The SALE relation takes care of the many-to-many
relationship between books and customers. Certainly Book Number and Customer
Number are part of the primary key of the SALE relation, but is the combination
of the two the entire primary key? The answer is that this depends on whether the
assumption is made that a given customer can or cannot buy copies of a given

130 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.16
Good Reading Bookstores relational
database

SALE relation

Book Customer
Number Number Date Price Quantity

WRITING relation

Book Author
Number Number

CUSTOMER relation

Customer Customer
Number Name Street City State Country

BOOK relation

Book Book Publication Publisher
Number Name Year Pages Name

AUTHOR relation

Author Author Year Year
Number Name Born Died

PUBLISHER relation

Publisher Year
Name City Country Telephone Founded

book on different days. If the assumption is that a customer can only buy copies
of a particular book on one single day, then the combination of Book Number and
Customer Number is fine as the primary key. If the assumption is that a customer
may indeed buy copies of a given book on different days, then the Date attribute
must be part of the primary key to achieve uniqueness.

EXAMPLE: WORLD MUSIC ASSOCIATION

Figure 5.17 shows the relational database for the World Music Association example
described earlier. There is a one-to-many relationship from orchestras to musicians
and, in turn, a one-to-many relationship from musicians to degrees. Thus, the primary
key of the ORCHESTRA relation, Orchestra Name, appears in the MUSICIAN
relation as a foreign key. In turn, the primary key of the MUSICIAN relation,
Musician Number, appears in the DEGREE relation as a foreign key. In fact, since
the DEGREE attribute is unique only within a musician, the Musician Number
attribute and the Degree attribute together serve as the compound primary key of the
DEGREE relation. A similar situation exists between composers and compositions.
The one-to-many relationship from composers to compositions requires that the

Example: World Music Association 131

F I G U R E 5.17
World Music Association relational
database

RECORDING relation

Orchestra Composition
Name Name Year Price

COMPOSITION relation

Composition
Name Year

COMPOSER relation

Composer Date of
Country Birth

DEGREE relation

Musician
Number Degree University Year

MUSICIAN relation

Musician Musician Annual Orchestra
Number Name Instrument Salary Name

ORCHESTRA relation

Orchestra Music
Name City Country Director

Name

Composer
Name

Composer
Name

primary key of the COMPOSER relation, Composer Name, appear as a foreign key
in the COMPOSITION relation. Since composition names are unique only within
composers, the combination of Composition Name and Composer Name serves as
the compound primary key of the COMPOSITION relation.

The many-to-many relationship between orchestras and compositions indicates
which orchestras have recorded which compositions and which compositions have
been recorded by which orchestras. As a many-to-many relationship, it requires
that an additional relation be created. The primary key of this new

RECORDING

relation has three attributes: Orchestra Name, Composition Name, and Composer
Name. Orchestra Name is the unique identifier of orchestras. The combination of
Composition Name and Composer Name is the unique identifier of compositions.
The combination of Orchestra Name, Composition Name, and Composer Name is
the unique identifier of the RECORDING relation. The Year and Price attributes are
intersection data in the RECORDING relation. If a particular orchestra could have
recorded a particular composition multiple times in different years (although we
assume that this is limited to once per year), Year must also be part of the primary
key of the RECORDING relation to provide uniqueness.

132 C h a p t e r 5 The Relational Database Model: Introduction

F I G U R E 5.18
Lucky Rent-A-Car relational database

RENTAL relation

Car Serial Customer Rental Return Total
Number Number Date Date Cost

CUSTOMER relation

Customer Customer Customer Customer
Number Name Address Telephone

MAINTENANCE relation

Repair Car Serial Repair
Number Number Date Procedure Mileage Time

CAR Relation

Car Serial Manufacturer
Number Model Year Class Name

MANUFACTURER relation

Manufacturer Manufacturer Sales Rep Sales Rep
Name Country Name Telephone

EXAMPLE: LUCKY RENT-A-CAR

Figure 5.18 shows the relational database for the Lucky Rent-A-Car example
described earlier. There is a one-to-many relationship from manufacturers to cars
and another one-to-many relationship from cars to maintenance events. The former
requires the manufacturer primary key, Manufactuer Name, to be placed in the CAR
relation as a foreign key. The latter requires the car primary key, Car Serial Number,
to be placed in the MAINTENANCE relation as a foreign key. The many-to-many
relationship among cars and customers requires the creation of a new relation, the
RENTAL relation. Each record of the RENTAL relation records the rental of a
particular car by a particular customer. Note that the combination of the Car Serial
Number and Customer Number attributes is not sufficient as the primary key of the
RENTAL relation. A given customer might have rented a given car more than once.
Adding Rental Date to the primary key achieves the needed uniqueness.

SUMMARY

The relational approach to database management is by far the primary database
management approach used in all levels of information systems applications today.
The basic structural component of a relational database is the relation, which
appears to be a simple linear file but has some technical differences.

Questions 133

Every relation has a unique primary key consisting of one or more attributes that
have unique values in that relation. Multiple such unique attributes or combinations
of attributes that have the uniqueness property are called candidate keys. The
candidate keys that are not chosen to be the one primary key are called alternate
keys. If the primary key of one relation in the database also appears in another
relation of the database, it called a foreign key in that second relation. Foreign keys
tie relations together in the sense that they implement relationships between the
entities represented by the relations. A one-to-many relationship is implemented
by adding the primary key on the ‘‘one side’’ of the relationship to the relation
representing the ‘‘many side’’ of the relationship. Many-to-many relationships are
implemented by constructing an additional relation that includes the primary keys of
the two entities in the many-to-many relationship. Additional attributes that describe
the many-to-many relationship are called intersection data.

Three basic relational algebra commands permit data retrieval from a relational
database. The Select command retrieves one or more rows of a relation. The
Project command retrieves one or more columns of a relation. The Join command
accomplishes data integration by tying together relations that have a common
primary key/foreign key pair. These three commands can be used in combination to
retrieve the specific data required in a particular query.

KEY TERMS

Alternate key
Attribute
Candidate key
Cell
Column
Data retrieval
Domain of values
Entity identifier
Entity type

Equijoin
Foreign key
Integrating data
Join operator
Natural join
Non-redundant data
Personal computer (PC)
Primary key
Project operator

Redundant data
Relation
Relational algebra
Relational database
Relational model
Row
Select operator
Tuple
Unique attribute

QUESTIONS

1. Why was the commercial introduction of relational
database delayed during the 1970s? What factors
encouraged its introduction in the early 1980s?

2. How does a relation differ from an ordinary file?
3. Define the terms ‘‘tuple’’ and ‘‘attribute.’’
4. What is a relational database?
5. What are the characteristics of a candidate key?
6. What is a primary key? What is an alternate key?
7. Define the term ‘‘foreign key.’’
8. In your own words, describe how foreign keys are

used to set up one-to-many binary relationships in
relational databases.

9. Describe why an additional relation is needed to
represent a many-to-many relationship in a relational
database.

10. Describe what intersection data is, what it describes,
and why it does not describe a single entity.

11. What is a one-to-one binary relationship?
12. Describe the purpose and capabilities of:

a. The relational Select operator.
b. The relational Project operator.
c. The relational Join operator.

13. Describe how the join operator works.

134 C h a p t e r 5 The Relational Database Model: Introduction

EXERCISES

1. The main relation of a motor vehicle registration
bureau’s relational database includes the following
attributes:

Vehicle License Owner
Identification Plate Serial
Number Number Number Manufacturer Model Year Color

The Vehicle Identification Number is a unique num-
ber assigned to the car when it is manufactured. The
License Plate Number is, in effect, a unique number
assigned to the car by the government when it is
registered. The Owner Serial Number is a unique
identifier of each owner. Each owner can own more
than one vehicle. The other attributes are not unique.
What is/are the candidate key(s) of this relation? If
there is more than one candidate key, choose one
as the primary key and indicate which is/are the
alternate key(s).

2. A relation consists of attributes A, B, C, D, E, F, G,
and H.

No single attribute has unique values.
The combination of attributes A and E is unique.
The combination of attributes B and D is unique.
The combination of attributes B and G is unique.
Select a primary key for this relation and indicate

and alternate keys.
3. In the General Hardware Corp. relational database

of Figure 5.14:
a. How many foreign keys are there in each of the

six relations?

b. List the foreign keys in each of the six relations.
4. Identify the relations that support many-to-many

relationships, the primary keys of those relations,
and any intersection data in the General Hardware
Corp. database.

5. Consider the General Hardware Corp. relational
database. Using the informal relational command
language described in this chapter, write commands
to:

a. List the product name and unit price of all of the
products.

b. List the employee names and titles of all the
employees of customer 2198.

c. Retrieve the record for office number 1284.
d. Retrieve the records for customers headquartered

in Los Angeles.
e. Find the size of office number 1209.
f. Find the name of the salesperson assigned to

office number 1209.
g. List the product name and quantity sold of each

product sold by salesperson 361.
6. Consider the General Hardware Corp. relational

database and the data stored in it, as shown
in Figure 5.14. Find the answer to each of the
following queries (written in the informal relational
command language described in this chapter).
a. Select rows from the

CUSTOMER EMPLOYEE

relation in which Customer Number = 2198.
b. Select rows from the CUSTOMER EMPLOYEE

relation in which Customer Number = 2198.
Project Employee Number and Employee Name
over that result.

c. Select rows from the PRODUCT relation in
which Product Number = 21765.

d. Select rows from the PRODUCT relation in
which Product Number = 21765. Project Unit
Price over that result.

e. Join the SALESPERSON and CUSTOMER
relations using the Salesperson Number attribute
of each as the join fields. Select rows from that
result in which Salesperson Name = Baker.
Project Customer Name over that result.

f. Join the PRODUCT relation and the SALES
relation using the Product Number attribute of
each as the join fields. Select rows in which
Product Name = Pliers. Project Salesperson
Number and Quantity over that result.

7. For each of Exercise 6, describe in words what the
query is trying to accomplish.

MINICASES
1. Consider the following relational database for Happy
Cruise Lines. It keeps track of ships, cruises, ports, and
passengers. A ‘‘cruise’’ is a particular sailing of a ship
on a particular date. For example, the seven-day journey

of the ship Pride of Tampa that leaves on June 13,
2009, is a cruise. Note the following facts about this
environment.

Minicases 135

• Both ship number and ship name are unique in the
SHIP Relation.

• A ship goes on many cruises over time. A cruise is
associated with a single ship.
• A port is identified by the combination of port name
and country.

• As indicated by the VISIT Relation, a cruise includes
visits to several ports, and a port is typically included
in several cruises.

• Both Passenger Number and Social Security Number
are unique in the PASSENGER Relation. A particular
person has a single Passenger Number that is used for
all of the cruises that she takes.

• The VOYAGE Relation indicates that a person can
take many cruises and a cruise, of course, has many
passengers.

SHIP Relation

Ship Number Ship Ship Launch Gross
Number Name Builder Date Weight

CRUISE Relation

Cruise Start End Cruise Ship
Number Date Date Director Number

PORT Relation

Port Number Port
Name Country of Docks Manager

VISIT Relation

Cruise Port Arrival Departure
Number Name Country Date Date

PASSENGER Relation

Passenger Passenger Social Security Home Telephone
Number Name Number Address Number

VOYAGE Relation

Passenger Cruise Stateroom
Number Number Number Fare

a. Identify the candidate keys of each relation.
b. Identify the primary key and any alternate keys of

each relation.
c. How many foreign keys does each relation have?
d. Identify the foreign keys of each relation.
e. Indicate any instances in which a foreign key serves

as part of the primary key of the relation in which
it is a foreign key. Why does each of those relations
require a multi-attribute primary key?

f. Identify the relations that support many-to-many
relationships, the primary keys of those relations,
and any intersection data.

g. Using the informal relational command language
described in this chapter, write commands to:
i. Retrieve the record for passenger number 473942.

ii. Retrieve the record for the port of Nassau in the
Bahamas.

iii. List all of the ships built by General Shipbuilding,
Inc.

iv. List the port name and number of docks of every
port in Mexico.

v. List the name and number of every ship.
vi. Who was the cruise director on cruise number

38232?
vii. What was the gross weight of the ship used for

cruise number 39482?
viii. List the home address of every passenger on cruise

number 17543.
2. Super Baseball League Consider the following relational

database for the Super Baseball League. It keeps track
of teams in the league, coaches and players on the teams,
work experience of the coaches, bats belonging to each
team, and which players have played on which teams.
Note the following facts about this environment:

• The database keeps track of the history of all of the
teams that each player has played on and all of the
players who have played on each team.

• The database keeps track of only the current team that
a coach works for.

• Team Number, Team Name, and Player Number are
each unique attributes across the league.

• Coach Name is unique only within a team (and we
assume that a team cannot have two coaches of the
same name).

136 C h a p t e r 5 The Relational Database Model: Introduction

• Serial Number (for bats) is unique only within a team.
• In the AFFILIATION relation, the Years attribute

indicates that number of years that a player played
on a team; the Batting Average is for the years that a
player played on a team.

TEAM Relation

Team Team
Number Name City Manager

COACH Relation

Team Coach Coach
Number Name Telephone

WORK EXPERIENCE Relation

Team Coach Experience Years Of
Number Name Type Experience

BATS Relation

Team Serial
Number Number Manufacturer

PLAYER Relation

Number Name
Player Player Age

AFFILIATION Relation

Player Team Batting
Number Number Years Average

a. Identify the candidate keys of each relation.
b. Identify the primary key and any alternate keys of
each relation.
c. How many foreign keys does each relation have?
d. Identify the foreign keys of each relation.
e. Indicate any instances in which a foreign key serves
as part of the primary key of the relation in which
it is a foreign key. Why does each of those relations
require a multi-attribute primary key?
f. Identify the relations that support many-to-many
relationships, the primary keys of those relations,
and any intersection data.

g. Assume that we add the following STADIUM
relation to the Super Baseball League relational
database. Each team has one home stadium, which
is what is represented in this relation. Assume that a
stadium can serve as the home stadium for only one
team. Stadium Name is unique across the league.

STADIUM Relation

Stadium Year Team
Name Built Size Team Number

What kind of binary relationship exists between the
STADIUM relation and the TEAM relation? Could
the data from the two relations be combined into one
without introducing data redundancy? If so, how?

h. Using the informal relational command language
described in this chapter, write commands to:

i. Retrieve the record for team number 12.
ii. Retrieve the record for coach Adams on team

number 12.
iii. List the player number and age of every player.
iv. List the work experience of every coach.
v. List the work experience of every coach on team

number 25.
vi. Find the age of player number 42459.

vii. List the serial numbers and manufacturers of all
of the Vultures’ (the name of a team) bats.

viii. Find the number of years of college coaching
experience that coach Taylor of the Vultures has.

C H A P T E R 6

THE RELATIONAL DATABASE
MODEL: ADDITIONAL

CONCEPTS

C hapter 5 defined the basic terminology of relational database and then
demonstrated some fundamental ideas about constructing relations in relational

databases and manipulating data in them. The discussion focused on relationships between
two different entity types, i.e. binary relationships. This chapter will go beyond binary
relationships into unary and ternary relationships. It will then address the important issue
of referential integrity.

OBJECTIVES

■ Describe how unary and ternary relationships are implemented in a relational
database.

■ Explain the concept of referential integrity.
■ Describe how the referential integrity restrict, cascade, and set-to-null delete rules

operate in a relational database.

CHAPTER OUTLINE

Introduction
Relational Structures for Unary and

Ternary Relationships

Unary One-to-Many Relationships

Unary Many-to-Many Relationships

Ternary Relationships
Referential Integrity

The Referential Integrity Concept

Three Delete Rules

Summary

138 C h a p t e r 6 The Relational Database Model: Additional Concepts

INTRODUCTION

The previous chapter talked about how binary relationships, i.e. those involving
two entity types, can be constructed in relational databases so that the data can be
integrated while data redundancy is avoided. Unary relationships, with one entity
type, and ternary relationships, with three entity types, while perhaps not quite as
common as binary relationships, are also facts of life in the real world and so must
also be handled properly in relational databases.

Referential integrity addresses a particular issue that can arise between two
tables in a relational database. The issue has to do with a foreign key value in one
table being able to find a matching primary key value in another table during a join
operation. Interestingly, in the older hierarchical and network database management
systems, the equivalents of primary and foreign keys were linked by physical
address pointers and so were always tied together. But, in relational databases, the
tables are basically independent of each other. So, if there are no controls in place,
the proper foreign key-primary key matches can be lost when data is updated or
records are deleted.

This chapter will address the issues of unary relationships, ternary relation-
ships, and referential integrity, all of which will move us much closer to modeling
real-world business environments properly in relational databases.

C O N C E P T S
I N A C T I O N

6-A CITY OF MEMPHIS, TN—VEHICLE SERVICE CENTER

The city of Memphis, TN, is the
18th largest city in the United States in both population
(650,000) and land area (280 square miles). Memphis
was founded in 1819 by General/President Andrew
Jackson and others and was incorporated as a city in
1826. Because of its position on the Mississippi River in
the midst of the country’s largest cotton-farming region,
Memphis has traditionally been the center of the U.S.
cotton industry. It is still the world’s largest spot-cotton
market and also the world’s largest hardwood market.
The concept of the grocery supermarket was invented in
Memphis in the early 1900s and the concept of the motel
chain was invented in Memphis in the 1950s. Today,
because of its central location in the country and because
of its position as a major transportation hub, Memphis is
known as the ‘‘Distribution Center’’ of the United States.

The Vehicle Service Center of the Memphis city
government’s General Services Division is responsible
for all of the city’s municipal vehicles except for Fire
Department vehicles. The approximately 4,000 vehicles

include everything from police cruisers and sanitation
trucks to street cleaners and even lawn-mowing tractors.
Since 1991, the city has kept track of all these vehicles
with a database application that manages them through
their complete lifecycle. New vehicles are entered into
the database when they are purchased and they’re
assigned to a city department. The application then keeps
each vehicle’s maintenance history, generates automatic
reports on maintenance due dates, tracks mileage and
gasoline use, and produces monthly reports for the
departments listing all of this activity for each of their
vehicles. Finally, the system tracks the reassignment of
older vehicles and the auctioning of vehicles being
disposed of.

Memphis’ vehicle tracking system uses an Oracle
database running on Dell servers. For vehicle main-
tenance, the system’s major tables include a Vehicle
Downtime Detail table with 1.6 million records, a Work
Order Job Notes table with 3.3 million records, and a
Parts Journal table with 950,000 records.

Relational Structures for Unary and Ternary Relationships 139

‘‘Photo by Permission of City of Memphis’’

RELATIONAL STRUCTURES FOR UNARY AND TERNARY RELATIONSHIPS

Unary One-to-Many Relationships

Let’s continue with the General Hardware Co. example of Figure 5.14, reprinted
here for convenience as Figure 6.1. Suppose that General Hardware’s salespersons
are organized in such a way that some of the salespersons, in addition to having
their customer responsibilities, serve as the sales managers of other salespersons,
Figure 6.2. A salesperson reports to exactly one sales manager, but each salesperson
who serves as a sales manager typically has several salespersons reporting to
him. Thus, there is a one-to-many relationship within the set or entity type of
salespersons.

Salesperson who is also a sales manager Salesperson

This is known as a unary one-to-many relationship. It is unary because there is
only one entity type involved. It is one-to-many because among the individual
entity occurrences, i.e. among the salespersons, a particular salesperson reports

140 C h a p t e r 6 The Relational Database Model: Additional Concepts

F I G U R E 6.1
General Hardware Company relational
database

(a) SALESPERSON relation
Salesperson Salesperson Commission Year Office
Number Name Percentage of Hire Number
137 Baker 10 1995 1284
186 Adams 15 2001 1253
204 Dickens 10 1998 1209
361 Carlyle 20 2001 1227
(b) CUSTOMER relation
Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York
(c) CUSTOMER EMPLOYEE relation
Customer Employee Employee
Number Number Name Title
0121 27498 Smith Co-Owner
0121 30441 Garcia Co-Owner
0933 25270 Chen VP Sales
0933 30441 Levy Sales Manager
0933 48285 Morton President
1525 33779 Baker Sales Manager
2198 27470 Smith President
2198 30441 Jones VP Sales
2198 33779 Garcia VP Personnel
2198 35268 Kaplan Senior Accountant
(Continues)

to one salesperson who is his sales manager while a salesperson who is a sales
manager may have several salespersons reporting to her. Note that, in general, this
arrangement can involve as few as two levels of entity occurrences or can involve
many levels in a hierarchical arrangement. In general, in a company, an employee
can report to a manager who in turn reports to a higher-level manager, and so on up
to the CEO.

Relational Structures for Unary and Ternary Relationships 141

F I G U R E 6.1 (Continued)
General Hardware Company relational
database

(d) PRODUCT relation

Product Number Unit
Number Name Price

16386 Wrench 12.95
19440 Hammer 17.50
21765 Drill 32.99
24013 Saw 26.25
26722 Pliers 11.50
(e) SALES relation
Salesperson Product
Number Number Quantity
137 19440 473
137 24013 170
137 26722 688
186 16386 1745
186 19440 2529
186 21765 1962
186 24013 3071
204 21765 809
204 26722 734
361 16386 3729

361 21765 3110

361 26722 2738

(f) OFFICE relation
Office
Number Telephone Size (sq. ft.)
1253 901-555-4276 120
1227 901-555-0364 100
1284 901-555-7335 120
1209 901-555-3108 95

Assume that the General Hardware Co. has two levels of sales managers,
resulting in a three-level hierarchy. That is, each salesperson reports to a sales
manager (who is himself a salesperson) and each sales manager reports to one of
several chief sales managers (who is herself a salesperson). Figure 6.3 shows two
levels of sales managers plus the salespersons who report to them. For example,
salespersons 142, 323, and 411 all report to salesperson (and sales manager) 137.
Salespersons 137 and 439, both of whom are sales managers, report to salesperson
186 who is a chief sales manager. As you go upward in the hierarchy, each
salesperson is associated with exactly one other salesperson. As you go downward
in the hierarchy from any salesperson/sales manager, each salesperson/sales manager

142 C h a p t e r 6 The Relational Database Model: Additional Concepts

F I G U R E 6.2
Salespersons 142, 323, and 411
reporting to salesperson 137 who is their
sales manager

Salesperson 142 Salesperson 323 Salesperson 411

Sales Manager
Salesperson 137

Reports to

F I G U R E 6.3
General Hardware Company salesperson
reporting hierarchy

137

412
323
411

186

439

170
267

198
204

285

483

361
388
446

is associated with many salespersons below, except for the bottom-level salespersons
who are not sales managers and thus have no one reporting to them.

Figure 6.4, which is an expansion of the General Hardware Co. SALESPER-
SON relation in Figure 6.1a, demonstrates how this type of relationship is reflected
in a relational database. A one-to-many unary relationship requires the addition of
one column to the relation that represents the single entity involved in the unary
relationship. In Figure 6.4, the Sales Manager Number attribute is the new attribute
that has been added to the SALESPERSON relation. The domain of values of the
new column is the same as the domain of values of the relation’s primary key. Thus,
the values in the new Sales Manager Number column will be three-digit whole
numbers representing the unique identifiers for salespersons, just like the values
in the Salesperson Number column. The value in the new column for a particular
row represents the value of the next entity ‘‘upward’’ in the unary one-to-many
hierarchy. For example, in the row for salesperson number 323, the sales manager

Relational Structures for Unary and Ternary Relationships 143

F I G U R E 6.4
General Hardware Company
SALESPERSON relation including Sales
Manager Number attribute

SALESPERSON relation

Salesperson Salesperson Commission Year Sales Manager
Number Name Percentage of Hire Number

137 Baker 10 1995 186

142 Smith 15 2001 137

170 Taylor 18 1992 439

186 Adams 15 2001

198 Wang 20 1990 267

204 Dickens 10 1998 267

267 Perez 22 2000 285

285 Costello 10 1996

323 McNamara 15 1995 137

361 Carlyle 20 2001 483

388 Goldberg 20 1997 483

411 Davidson 18 1992 137

439 Warren 10 1996 186

446 Albert 10 2001 483

483 Jones 15 1995 285

value is 137 because salesperson 323’s sales manager is salesperson/sales manager
137, as shown in Figure 6.3. Similarly, the row for salesperson 137, who happens
also to be a sales manager, shows salesperson number 186 in its sales manager
number column. Salesperson/sales manager 137 reports to chief sales manager 186,
also as shown in Figure 6.3. The sales manager column value for salesperson/chief
sales manager 186 is blank because the reporting structure happens to end with each
chief sales manager; i.e., there is nothing ‘‘above’’ salesperson 186 in Figure 6.3.

Note that a unary one-to-one relationship, for example one salesperson
backing-up another (see Figure 2.7a) is handled in a manner similar to Figure 6.4.
The difference is that the Sales Manager Number column would be replaced by
a Back-Up Number column and a particular salesperson number would appear at
most once in that column.

Unary Many-to-Many Relationships

The unary many-to-many relationship is a special case that has come to be known
as the ‘‘bill of materials’’ problem. Among the entity occurrences of a single entity
type, which is what makes this ‘‘unary,’’ each particular entity occurrence can be
related to many other occurrences and each of those latter occurrences can, in turn,
be related to many other occurrences. Put another way, every entity occurrence
can be related to many other occurrences, which, if you think about it, makes this
a many-to-many relationship because only one entity type is involved. (Yes, that
sounds a little strange, but keep reading.) The general idea is that in a complex
item, say an automobile engine, small parts are assembled together to make a small
component or assembly. Then some of those small components or assemblies (and
maybe some small parts) are assembled together to make medium-sized components
or assemblies, and so on until the final, top-level ‘‘component’’ is the automobile
engine. The key concept here is that an assembly at any level is considered to be

144 C h a p t e r 6 The Relational Database Model: Additional Concepts

F I G U R E 6.5
General Hardware Company product bill of
materials

Wrench Model A (#11)

Wrench Model B (#14)
Deluxe Wrench Set (#43)

Wrench Model C (#17) Supreme Tool Set (#53)

Wrench Model D (#19)
Master Wrench Set (#44)

Hammer Model A (#22)

Hammer Model B (#24) Deluxe Hammer Set (#48)
Grand ToolSet (#56)

Hammer Model C (#28)

Drill Model A (#31)

Drill Model B (#35)

both a part made up of smaller units and a unit that can be a component of a larger
part. Parts and assemblies at all levels are all considered occurrences of the same
entity type and they all have a unique identifier in a single domain of values.

Certainly, this requires an example! Figure 6.5 illustrates this concept using
an expansion of General Hardware Co.’s product set.

Product Product

The numbers in parentheses are product numbers. Assume, as is quite reasonable,
that General Hardware not only sells individual tools but also sells sets of tools. Both
individual tools and sets of tools are considered to be ‘‘products,’’ which also makes
sense. As shown in Figure 6.5, General Hardware carries several types (or perhaps
sizes) of wrenches, hammers, and drills. Various combinations of wrenches and
hammers are sold as wrench and hammer sets. Various combinations of these sets
and other tools such as drills are sold as even larger sets. Very importantly, notice
the many-to-many nature of this arrangement. For example, the Master Wrench Set
(product number 44), looking to its left, is comprised of three different wrenches,
including Wrench Model A (#11). Conversely, Wrench Model A, looking to its
right, is a component of two different wrench sets, both the Deluxe Wrench Set
(#43) and the Master Wrench Set (#44). This demonstrates the many-to-many nature
of products. Similarly, both the Supreme Tool Set (#53) and the Grand Tool Set
(#56) are, obviously, comprised of several smaller sets and tools, while the Deluxe
Hammer Set (#48) is a component of both the Supreme Tool Set (#53) and the
Grand Tool Set (#56).

How can this unary many-to-many relationship be represented in a relational
database? First of all, note that Figure 6.6 is a modification and expansion of the
PRODUCT relation in the General Hardware Co. relational database of Figure 6.1d.
Note that the product numbers matching the product numbers in Figure 6.5 have
been reduced to two digits for simplicity in the explanation. Every individual unit
item and every set in Figure 6.5 has its own row in the relation in Figure 6.6 because
every item and set in Figure 6.5 is a product that General Hardware has for sale.

Now, here is the main point. Just as a binary many-to-many relationship
requires the creation of an additional relation in a relational database, so does
a unary many-to-many relationship. The new additional relation is shown in
Figure 6.7. It consists of two attributes. The domain of values of each column is
that of the Product Number column in the PRODUCT relation of Figure 6.6. The
relation of Figure 6.7 represents, in a tabular format, the way that the assemblies
of Figure 6.5 are constructed. The first two rows of Figure 6.7 literally say that
product (assembly) number 43 (the Deluxe Wrench Set) is comprised of products

Relational Structures for Unary and Ternary Relationships 145

F I G U R E 6.6
General Hardware Company modified
PRODUCT relation

PRODUCT relation
Product Product Unit
Number Name Price

11 Wrench Model A 12.50

14 Wrench Model B 13.75

17 Wrench Model C 11.62

19 Wrench Model D 15.80

22 Hammer Model A 17.50

24 Hammer Model B 18.00

28 Hammer Model C 19.95

31 Drill Model A 31.25

35 Drill Model B 38.50

43 Deluxe Wrench Set 23.95

44 Master Wrench Set 35.00

48 Deluxe Hammer Set 51.00

53 Supreme Tool Set 100.00

56 Grand Tool Set 109.95

F I G U R E 6.7
General Hardware Company unary
many-to-many relation

Assembly Part

43 11

43 14

44 11

44 17

44 19

48 22

48 24

48 28

53 43

53 48

53 31

56 44

56 48

56 35

11 and 14, as indicated in Figure 6.5. Next, product (assembly) 44 is comprised
of products 11, 17, and 19. Moving to the last three rows of the relation, product
(assembly) 56 is comprised of products 44 and 48, both of which happen to be
assemblies, and product 35. Again, notice the many-to-many relationship as it is
represented in the relation of Figure 6.7. The first two rows indicate that assembly
43 is comprised of two parts. Conversely, the first and third rows indicate that part
11 is a component of two different assemblies.

146 C h a p t e r 6 The Relational Database Model: Additional Concepts

Ternary Relationships

A ternary relationship is a relationship that involves three different entity types. If
the entity types are A, B, and C, then we might illustrate this as:

B

C

A

To demonstrate this concept in the broadest way using the General Hardware
Co. database, let’s slightly modify part of the General Hardware premise. The
assumption has always been that there is a one-to-many relationship between
salespersons and customers. A salesperson is responsible for several customers,
while a customer is in contact with (is sold to by) exactly one of General Hardware’s
salespersons. For the purposes of describing a general ternary relationship,
we change that premise temporarily to a many-to-many relationship between
salespersons and customers. That is, we now assume that any salesperson can make
a sale to any customer and any customer can buy from any salesperson.

With that change, consider the ternary relationship among salespersons,
customers, and products. Such a relationship allows us to keep track of which
salesperson sold which product to which customer. This is very significant. In this
environment, a salesperson can sell many products and a salesperson can sell to
many customers. A product can be sold by many salespersons and can be sold
to many customers. A customer can buy many products and can buy from many
salespersons. All of this leads to a lot of different possibilities for any given sale. So,
it is very important to be able to tie down a particular sale by noting and recording
which salesperson sold which product to which customer. For example, we might
store the fact that salesperson 137 sold some of product number 24013 to customer
0839, Figure 6.8.

Relations a, b, and c of Figure 6.9 show the SALESPERSON, CUSTOMER,
and PRODUCT relations, respectively, from the General Hardware relational
database of Figure 6.1, except for one change. Since there is no longer a one-to-
many relationship between salespersons and customers, the Salesperson Number
foreign key in the CUSTOMER relation has been removed! The three relations are
now all quite independent with no foreign keys in any of them.

Figure 6.9d, the SALES relation, shows how this ternary relationship is
represented in a relational database. Similarly to how we created an additional
relation to accommodate a binary many-to-many relationship, an additional relation
has to be created to accommodate a ternary relationship, and that relation is
Figure 6.9d. Clearly, as in the binary many-to-many case, the primary key of the
additional relation will be (at least) the combination of the primary keys of the
entities involved in the relationship. Thus, in Figure 6.9d, the Salesperson Number,
Customer Number, and Product Number attributes all appear as foreign keys and
the combination of the three serve as part of the primary key. Why just ‘‘part of’’

Relational Structures for Unary and Ternary Relationships 147

F I G U R E 6.8
A ternary relationship

Customer 0839

Salesperson 137

Salesperson 137 sold
Product 24013 to
Customer 0839

Product 24013

F I G U R E 6.9
A portion of General Hardware Company
relational database modified to
demonstrate a ternary relationship

(a) SALESPERSON relation
Salesperson Salesperson Commission Year
Number Name Percentage of Hire
137 Baker 10 1995
186 Adams 15 2001
204 Dickens 10 1998
361 Carlyle 20 2001
(b) CUSTOMER relation
Customer Customer
Number Name HQ City
0121 Main St. Hardware New York
0839 Jane’s Stores Chicago
0933 ABC Home Stores Los Angeles
1047 Acme Hardware Store Los Angeles
1525 Fred’s Tool Stores Atlanta
1700 XYZ Stores Washington
1826 City Hardware New York
2198 Western Hardware New York
2267 Central Stores New York
(Continues)

148 C h a p t e r 6 The Relational Database Model: Additional Concepts

F I G U R E 6.9 (Continued)
A portion of General Hardware Company
relational database modified to
demonstrate a ternary relationship

(c) PRODUCT relation

Product Product Unit
Number Name Price
16386 Wrench 12.95
19440 Hammer 17.50
21765 Drill 32.99
24013 Saw 26.25
26722 Pliers 11.50

(d) SALES relation

Salesperson Customer Product
Number Number Number Date Quantity

137 0839 24013 2/21/2002 25

361 1700 16386 2/27/2002 70

137 2267 19440 3/1/2002 40

204 1047 19440 3/1/2002 15

186 0839 26722 3/12/2002 35

137 1700 16386 3/17/2002 65

361 0121 21765 3/21/2002 40

204 2267 19440 4/03/2002 30

204 0839 19440 4/17/2002 20

the primary key? Because in this example, a particular salesperson may have sold
a particular product to a particular customer more than once on different dates.
Thus the Date attribute must also be part of the primary key. (We assume that this
combination of the three could not have happened more than once on the same
date. If it could, then there would also need to be a ‘‘time’’ attribute in the key.)
Recall that this need for an additional attribute in the primary key also came up
when we discussed binary many-to-many relationships in the last chapter. Finally,
the Quantity attribute in Figure 6.9d is intersection data, just as it would be in a
binary many-to-many relationship. The quantity of the product that the salesperson
sold to the customer is clearly an attribute of the ternary relationship, not of any one
of the entities.

There is one more important point to make about ternary relationships. In the
process of describing the ternary relationship, you may have noticed that, taken two
at a time, every pair of the three entities, salespersons, customers, and products, are
in a binary many-to-many relationship. In general, this would be shown as:

A B

B C

A C

The question is: are these three many-to-many relationships the equivalent of
the ternary relationship? Do they provide the same information that the ternary
relationship does? The answer is, no!

Relational Structures for Unary and Ternary Relationships 149

F I G U R E 6.10
Ternary relationship counter-example

(a) Salespersons and customers

Salesperson 137 Customer 0839

Salesperson 204 Customer 1826

(b) Customers and products

Customer 0839 Product 19440

Customer 1826 Product 24013

(c) Salespersons and products

Salesperson 137 Product 19440

Salesperson 204 Product 24013

Again, consider salespersons, customers, and products. You might know that
a particular salesperson has made sales to a particular customer. You might also
know that a particular salesperson has sold certain products at one time or another.
And,you might know that a particular customer has bought certain products. But all
of that is not the same thing as knowing that a particular salesperson sold a particular
product to a particular customer. Still skeptical? Look at Figure 6.10. Parts a, b, and
c of the figure clearly illustrate three many-to-many relationships. They are between
(a) salespersons and customers, (b) customers and products, and (c) salespersons
and products. Part a shows, among other things, that salesperson 137 sold something
to customer 0839. Part b shows that customer 0839 bought product 19440. Does that
mean that we can infer that salesperson 137 sold product 19440 to customer 0839?
No! That’s a possibility and, indeed, part c of the figure shows that salesperson 137
did sell product 19440. But part c of the figure also shows that salesperson 204
sold product 19440. Is it possible that salesperson 204 sold it to customer 0839?
According to part a, salesperson 204 sold something to customer 0839, but it doesn’t
indicate what. You can go around and around Figure 6.10 and never conclude with
certainty that salesperson 137 sold product 19440 to customer 0839. That would

Y O U R
T U R N

6.1 TERNARY RELATIONSHIPS

Ternary relationships are all around
us. Think about an automobile dealership. Certainly the
dealership management wants to keep track of which
car was sold to which customer by which salesperson.
Certainly this is important for billing, accounting, and
commission purposes. But also, in that kind of high-
priced product environment, it’s simply good business
to keep track of such information for future marketing and
customer relationship reasons.

QUESTION:

Consider a hospital environment involving patients,
doctors, nurses, procedures, medicines, hospital rooms,
etc. Make a list of five ternary relationships in this
environment. Remember that each one has to make
sense from a business point of view.

150 C h a p t e r 6 The Relational Database Model: Additional Concepts

require a ternary relationship and a relation like the one in Figure 6.9d. Notice that
the last row of Figure 6.9d shows, without a doubt, that it was salesperson 204 who
sold product 19440 to customer 0839.

REFERENTIAL INTEGRITY

The Referential Integrity Concept

Thus far in this chapter and the previous one, we have been concerned with how
relations are constructed and how data can be retrieved from them. Data retrieval is
the operation that clearly provides the ultimate benefit from maintaining a database,
but it is not the only operation needed. Certainly, we should expect that, as with any
data storage scheme, in addition to retrieving data we must be prepared to perform
such data maintenance operations as inserting new records (or rows of a relation),
deleting existing records, and updating existing records. All database management
systems provide the facilities and commands to accomplish these data maintenance
operations. But there are some potential pitfalls in these operations that must be
dealt with.

The problem is that the logically related (by foreign keys) but physically
independent nature of the relations in a relational database exposes the database
to the possibility of a particular type of data integrity problem. This problem has
come to be known as a referential integrity problem because it revolves around
the circumstance of trying to refer to data in one relation in the database, based
on values in another relation. (Actually, referential integrity is an issue in all of
the DBMS approaches, not just the relational approach. We discuss this issue here
because we are focusing on relational databases and the concept is much easier
to explain in the context of an example, again the General Hardware database.)
Also, while referential integrity problems can surface in any of the three operations
that result in changes to the database—insert, delete, and update records—we will
generally use the case of delete to explain the concept while mentioning insert and
update where appropriate.

First, consider the situation of record deletion in the two relations of
Figure 6.11, which is a repeat of Figure 5.2. Suppose that salesperson 361, Carlyle,
left the company and his record was deleted from the SALESPERSON relation. The
problem is that there are still two records in the CUSTOMER relation (the records
for customers 1525 and 1700) that refer to salesperson 361, i.e. that have the value
361 in the Salesperson Number foreign key attribute. It is as if Carlyle left the
company and his customers have not as yet been reassigned to other salespersons.
If a relational join command was issued to join the two relations in order to (say)
find the name of the salesperson responsible for customer 1525, there would be a
problem. The relational DBMS would pick up the salesperson number value 361 in
the record for customer 1525 in the CUSTOMER relation, but would not be able to
match 361 to a record in the SALESPERSON relation because there no longer is a
record for salesperson 361 in the SALESPERSON relation—it was deleted! Notice
that the problem arose because the deleted record, a salesperson record, was on
the ‘‘one side’’ of a one-to-many relationship. What about the customer records on
the ‘‘many side’’ of the one-to-many relationship? Suppose customer 1047, Acme
Hardware Store, is no longer one of General Hardware’s customers. Deleting the
record for customer 1047 in the CUSTOMER relation has no referential integrity
exposure. Nothing else in these two relations refers to customer 1047.

Referential Integrity 151

F I G U R E 6.11
General Hardware Company
SALESPERSON and CUSTOMER relations

(a) SALESPERSON relation
Salesperson Salesperson Commission Year
Number Name Percentage of Hire
137 Baker 10 1995
186 Adams 15 2001
204 Dickens 10 1998
361 Carlyle 20 2001
(b) CUSTOMER relation
Customer Customer Salesperson
Number Name Number HQ City
0121 Main St. Hardware 137 New York
0839 Jane’s Stores 186 Chicago
0933 ABC Home Stores 137 Los Angeles
1047 Acme Hardware Store 137 Los Angeles
1525 Fred’s Tool Stores 361 Atlanta
1700 XYZ Stores 361 Washington
1826 City Hardware 137 New York
2198 Western Hardware 204 New York
2267 Central Stores 186 New York

Similar referential integrity arguments can be made for the record insertion
and update operations, but the issue of whether the exposure is on the ‘‘one side’’
or the ‘‘many side’’ of the one-to-many relationship changes! Again, in the case
of deletion, the problem occurred when a record was deleted on the ‘‘one side’’
of the one-to-many relationship. But, for insertion, if a new salesperson record is
inserted into the Salesperson relation, i.e. a new record is inserted into the ‘‘one
side’’ of the one-to-many relationship, there is no problem. All it means is that a
new salesperson has joined the company but, as yet, has no customer responsibility.
On the other hand, if a new customer record is inserted into the CUSTOMER
relation, i.e. a new record is inserted into the ‘‘many side’’ of the one-to-many
relationship, and it happens to include a salesperson number that does not have a
match in the SALESPERSON relation, that would cause the same kind of problem
as the deletion example above. Similarly, the update issue would concern updating
a foreign key value, i.e. a salesperson number in the CUSTOMER relation with a
new salesperson number that has no match in the SALESPERSON relation.

The early relational DBMSs did not provide any control mechanisms for
referential integrity. Programmers and users were on their own to keep track of
it and this upset many people. This was particularly the case because referential
integrity issues in the older hierarchical and network DBMSs were more naturally
controlled by the nature of the hierarchical and network data structures on which they
were based, at the expense of some flexibility in database design. Modern relational
DBMS’s provide sophisticated control mechanisms for referential integrity with
so-called ‘‘delete rules,’’ ‘‘insert rules,’’ and ‘‘update rules.’’ These rules are
specified between pairs of relations. We will take a look at the three most common
delete rules, ‘‘restrict,’’ ‘‘cascade,’’ and ‘‘set-to-null,’’ to illustrate the problem.

152 C h a p t e r 6 The Relational Database Model: Additional Concepts

F I G U R E 6.12
Delete rule: Restrict

Customer 1700

Customer 1525

Salesperson 361
Mr. Carlyle

Delete Rule: Restrict

Three Delete Rules

Delete Rule: Restrict Again, consider the two relations in Figure 6.11. If the delete
rule between the two relations is restrict and an attempt is made to delete a record on
the ‘‘one side’’ of the one-to-many relationship, the system will forbid the delete to
take place if there are any matching foreign key values in the relation on the ‘‘many
side.’’ For example, if an attempt is made to delete the record for salesperson 361 in
the SALESPERSON relation, the system will not permit the deletion to take place
because the CUSTOMER relation records for customers 1525 and 1700 include
salesperson number 361 as a foreign key value, Figure 6.12. This is as if to say,
‘‘You can’t delete a salesperson record as long as there are customers for whom
that salesperson is responsible.’’ Clearly, this is a reasonable and necessary course
of action in many business situations.

Delete Rule: Cascade If the delete rule between the two relations is cascade and
an attempt is made to delete a record on the ‘‘one side’’ of the relationship, not
only will that record be deleted but all of the records on the ‘‘many side’’ of the
relationship that have a matching foreign key value will also be deleted. That is,
the deletion will cascade from one relation to the other. For example, if an attempt
is made to delete the record for salesperson 361 in the SALESPERSON relation
and the delete rule is cascade, that salesperson record will be deleted and so too,
automatically, will the records for customers 1525 and 1700 in the CUSTOMER
relation because they have 361 as a foreign key value, Figure 6.13. It is as if the
assumption is that when a salesperson leaves the company she always takes all of
her customers along with her. While that might be a bit of a stretch in this case,
there are many other business situations where it is not a stretch at all. For example,
think about a company that has a main employee relation with name, home address,
telephone number, etc., plus a second relation that lists and describes the several
skills of each employee. Certainly, when an employee leaves the company you
would expect to delete both his record in the main employee relation and all his
records in the skills relation.

Delete Rule: Set-to-Null If the delete rule between the two relations is set-to-null
and an attempt is made to delete a record on the ‘‘one side’’ of the one-to-many
relationship, that record will be deleted and the matching foreign key values in

Summary 153

Customer 1700
Customer 1525
Salesperson 361
Mr. Carlyle

Delete Rule: Cascade

F I G U R E 6.13
Delete rule: Cascade

F I G U R E 6.14
Delete rule: Set-to-Null

Salesperson 361
Mr. Carlyle
Customer 1700
Customer 1525

Delete Rule: Set-to-Null

Temporarily
Without a
Saleperson
Assigned

the records on the ‘‘many side’’ of the relationship will be changed to null. For
example, if an attempt is made to delete the record for salesperson 361 in the
SALESPERSON relation, that record will be deleted, and the Salesperson Number
attribute values in the records for customers 1525 and 1700 in the CUSTOMER
relation will be changed from 361 to null, Figure 6.14. This is as if to say, ‘‘You
can delete a salesperson record and, we will indicate that, temporarily at least, their
former customers are without a salesperson.’’ Obviously this is the appropriate
response in many business situations.

SUMMARY

Relational databases must be capable of handling unary and ternary relationships,
as well as binary relationships. All of these have to promote data integration while
avoiding data redundancy. As this chapter demonstrated, the relational database
concept is up to this task.

154 C h a p t e r 6 The Relational Database Model: Additional Concepts

Referential integrity is an important issue in relational databases. Relational
database management systems must be able to allow users to specify referential
integrity controls between related tables. Otherwise, changes to one table that are
not coordinated with a related table may cause serious data integrity problems.

KEY TERMS

Cascade delete rule
Delete rules
Entity occurrence

Insert rules
Record deletion
Referential integrity

Restrict delete rule
Set-to-null delete rule
Update rules

QUESTIONS

1. Describe the concept of the unary one-to-many
relationship.

2. How is a unary one-to-many relationship con-
structed in a relational database?

3. Describe the concept of the unary many-to-many
relationship.

4. How is a unary many-to-many relationship con-
structed in a relational database?

5. Describe the concept of the ternary relationship.

6. How is a ternary relationship constructed in a
relational database?

7. Is a ternary relationship the equivalent of the
three possible binary relationships among the three
entities involved? Explain.

8. Describe the problem of referential integrity.
9. Compare and contrast the three delete rules: restrict,

cascade, and set-to-null.

EXERCISES

1. Leslie’s Auto Sales has a relational database with
which it maintains data on its salespersons, its
customers, and the automobiles it sells. Each of these
three entity types has a unique attribute identifier.
The attributes that it stores are as follows:

• Salesperson Number (unique), Salesperson Name,
Salesperson Telephone, Years with Company

• Customer Number (unique), Customer Name,
Customer Address, Value of Last Purchase From
Us

• Vehicle Identification Number (unique), Manu-
facturer, Model, Year, Sticker Price Leslie’s also
wants to keep track of which salesperson sold
which car to which customer, including the date
of the sale and the negotiated price. Construct a
relational database for Leslie’s Auto Sales.

2. The State of New York certifies firefighters through-
out the state and must keep track of all of them,
as well as of the state’s fire departments. Each
fire department has a unique department number, a
name that also identifies its locale (city, county, etc.),
the year it was established, and its main telephone

number. Each certified firefighter has a unique fire-
fighter number, a name, year of certification, home
telephone number, and a rank (firefighter, fire lieu-
tenant, fire captain, etc.) The state wants to record the
fire department for which each firefighter currently
works and each firefighter’s supervisor. Supervi-
sors are always higher-ranking certified firefighters.
Construct a relational database for New York’s fire
departments and firefighters.

3. The ABC Consulting Corp. contracts for projects
that, depending on their size and skill requirements,
can be assigned to an individual consultant or to
a team of consultants. A consultant or a team can
work on several projects simultaneously. Several
employees can be organized into a team. Larger
teams can consist of a combination of smaller teams,
sometimes with additional individual consultants
added. This pattern can continue to larger and larger
teams. ABC wants to keep track of its consultants,
teams, and projects, including which consultant or
team is responsible for each project. Each consultant
has a unique employee number, plus a name, home
address, and telephone number. Each project has a

Minicases 155

unique project number, plus a name, budgeted cost,
and due date. Construct a relational database for
ABC Consulting. Hint: You may want to develop
an attribute called ‘‘responsible party’’ that can
be either a team or an individual consultant. Each
project has one responsible party that is responsible
for its completion. Or you may want to think of an
individual consultant as a potential ‘‘team of one’’
and have the responsibility for each project assigned
to a ‘‘team’’ that could then be an individual
consultant or a genuine team.

4. Consider the General Hardware Corp. database
of Figure 6.1. Describe the problem of referen-
tial integrity in terms of the CUSTOMER and
CUSTOMER EMPLOYEE relations if the record
for customer 2198 in the CUSTOMER relation is
deleted. (Assume that no delete rules exist.)

5. In the General Hardware Corp. database of
Figure 6.1, what would happen if:
a. The delete rule between the CUSTOMER and

CUSTOMER EMPLOYEE relations is restrict
and an attempt is made to delete the record for
customer 2198 in the CUSTOMER relation?

b. The delete rule between the CUSTOMER and
CUSTOMER EMPLOYEE relations is cascade
and an attempt is made to delete the record for
customer 2198 in the CUSTOMER relation?

c. The delete rule between the CUSTOMER and
CUSTOMER EMPLOYEE relations is set-to-
null and an attempt is made to delete the record
for customer 2198 in the CUSTOMER relation?

d. The delete rule between the CUSTOMER and
CUSTOMER EMPLOYEE relations is restrict
and an attempt is made to delete the record
for employee 33779 of customer 2198 in the
CUSTOMER EMPLOYEE relation?

e. The delete rule between the CUSTOMER and
CUSTOMER EMPLOYEE relations is cascade
and an attempt is made to delete the record
for employee 33779 of customer 2198 in the
CUSTOMER EMPLOYEE relation?

f. The delete rule between the CUSTOMER and
CUSTOMER EMPLOYEE relations is set-to-
null and an attempt is made to delete the record
for employee 33779 of customer 2198 in the
CUSTOMER EMPLOYEE relation?

MINICASES

1. Happy Cruise Lines
a. Look at the Happy Cruise Lines database of Chapter

5, Minicase 1 but, for this question, consider only
the SHIP, PORT, and PASSENGER relations. The
company wants to keep track of which passengers
visited which ports on which ships on which dates.
Reconstruct these three relations as necessary and/or
add additional relation(s) as necessary to store this
information.

b. Consider the following data from the SHIP and
CRUISE relations of the Happy Cruise Lines database
of Chapter 5, Minicase 1:

SHIP Relation

Ship Ship Ship Launch Gross
Number Name Builder Date Weight

005 Sea Joy Jones 1999 80,000

009 Ocean IV Ajax 2003 75,000

012 Prince Al Ajax 2004 90,000

020 Queen Shirley Master 1999 80,000

CRUISE Relation
Cruise Start End Cruise Ship
Number Date Date Director Number

21644 7/5/2002 7/12/2002 Smith 009

23007 8/14/2002 8/24/2002 Chen 020

24288 3/28/2003 4/4/2003 Smith 009

26964 7/1/2003 7/11/2003 Gomez 020

27045 7/15/2003 7/22/2003 Adams 012

28532 8/17/2003 8/24/2003 Adams 012

29191 12/20/2003 12/27/2003 Jones 009

29890 1/15/2004 1/22/2004 Levin 020

What would happen if:
i. The delete rule between the SHIP and CRUISE

relations is restrict and an attempt is made to
delete the record for ship number 012 in the
SHIP relation?

ii. The delete rule between the SHIP and CRUISE
relations is restrict and an attempt is made to

156 C h a p t e r 6 The Relational Database Model: Additional Concepts

delete the record for ship number 005 in the
SHIP relation?

iii. The delete rule between the SHIP and CRUISE
relations is cascade and an attempt is made to
delete the record for ship number 012 in the
SHIP relation?

iv. The delete rule between the SHIP and CRUISE
relations is cascade and an attempt is made to
delete the record for ship number 005 in the
SHIP relation?

v. The delete rule between the SHIP and CRUISE
relations is set-to-null and an attempt is made to
delete the record for ship number 012 in the SHIP
relation?

vi. The delete rule between the SHIP and CRUISE
relations is set-to-null and an attempt is made to
delete the record for ship number 005 in the SHIP
relation?

vii. The delete rule between the SHIP and CRUISE
relations is restrict and an attempt is made to
delete the record for cruise number 26964 in the
CRUISE relation?

viii. The delete rule between the SHIP and CRUISE
relations is cascade and an attempt is made to
delete the record for cruise number 26964 in the
CRUISE relation?

ix. The delete rule between the SHIP and CRUISE
relations is set-to-null and an attempt is made to
delete the record for cruise number 26964 in the
CRUISE relation?

2. Super Baseball League
a. In the Super Baseball League database of Chapter

5, Minicase 2, assume that instead of having
coaches who are different from players, now some

of the players serve as coaches to other players.
A player/coach can have several players whom
he coaches. Each player is coached by only one
player/coach. Reconstruct the database structure to
reflect this change.

b. In the Super Baseball League database of Chapter
5, Minicase 2, assume that the TEAM relation has
a record for team number 17 and that the COACH
relation has records for three coaches on that team.
What would happen if:
i. The delete rule between the TEAM and COACH

relations is restrict and an attempt is made to
delete the record for team 17 in the TEAM
relation?

ii. The delete rule between the TEAM and COACH
relations is cascade and an attempt is made to
delete the record for team 17 in the TEAM
relation?

iii. The delete rule between the TEAM and COACH
relations is set-to-null and an attempt is made
to delete the record for team 17 in the TEAM
relation?

iv. The delete rule between the TEAM and COACH
relations is restrict and an attempt is made to
delete the record for one of team 17’s coaches in
the COACH relation?

v. The delete rule between the TEAM and COACH
relations is cascade and an attempt is made to
delete the record for one of team 17’s coaches in
the COACH relation?

vi. The delete rule between the TEAM and COACH
relations is set-to-null and an attempt is made to
delete the record for one of team 17’s coaches in
the COACH relation?

C H A P T E R 7

LOGICAL DATABASE
DESIGN

L ogical database design is the process of deciding how to arrange the attributes
of the entities in a given business environment into database structures, such as

the tables of a relational database. The goal of logical database design is to create well
structured tables that properly reflect the company’s business environment. The tables will
be able to store data about the company’s entities in a non-redundant manner and foreign
keys will be placed in the tables so that all the relationships among the entities will be
supported. Physical database design, which will be treated in the next chapter, is the
process of modifying the logical database design to improve performance.

OBJECTIVES

■ Describe the concept of logical database design.
■ Design relational databases by converting entity-relationship diagrams into

relational tables.
■ Describe the data normalization process.
■ Perform the data normalization process.
■ Test tables for irregularities using the data normalization process.
■ Learn basic SQL commands to build data structures.
■ Learn basic SQL commands to manipulate data.

CHAPTER OUTLINE

Introduction
Converting E-R Diagrams into Relational

Tables
Introduction

Converting a Simple Entity

Converting Entities in Binary

Relationships
Converting Entities in Unary

Relationships

Converting Entities in Ternary
Relationships

Designing the General
Hardware Co. Database

Designing the Good Reading
Bookstores Database

Designing the World Music
Association Database

Designing the Lucky Rent-A-Car
Database

158 C h a p t e r 7 Logical Database Design

The Data Normalization Process
Introduction to the Data

Normalization Technique
Steps in the Data Normalization

Process
Example: General Hardware Co.
Example: Good Reading Bookstores
Example: World Music Association

Example: Lucky Rent-A-Car
Testing Tables Converted from E-R

Diagrams
with Data Normalization

Building the Data Structure with SQL
Manipulating the Data with SQL
Summary

INTRODUCTION

Historically, a number of techniques have been used for logical database design. In
the 1970s, when the hierarchical and network approaches to database management
were the only ones available, a technique known as data normalization was
developed. While data normalization has some very useful features, it was difficult
to apply in that environment. Data normalization can also be used to design
relational databases and, actually, is a better fit for relational databases than it
was for the hierarchical and network databases. But, as the relational approach
to database management and the entity-relationship approach to data modeling
both blossomed in the 1980s, a very natural and pleasing approach to logical
database design evolved in which rules were developed to convert E-R diagrams
into relational tables. Optionally, the result of this process can then be tested with the
data normalization technique. Thus, this chapter on the logical design of relational
databases will proceed in three parts: first, the conversion of E-R diagrams into
relational tables, then the data normalization technique, and finally the use of the
data normalization technique to test the tables resulting from the E-R diagram
conversions.

CONVERTING E-R DIAGRAMS INTO RELATIONAL TABLES

Introduction

Converting entity-relationship diagrams to relational tables is surprisingly straight-
forward, with just a few simple rules to follow. Basically, each entity will convert
to a table, plus each many-to-many relationship or associative entity will convert
to a table. The only other issue is that during the conversion, certain rules must be
followed to ensure that foreign keys appear in their proper places in the tables. We
will demonstrate these techniques by methodically converting the E-R diagrams of
Chapter 2 into relational tables.

Converting a Simple Entity

Figure 7.1 repeats the simple entity box in Figure 2.1. Figure 7.2 shows a relational
table that can store the data represented in the entity box. The table simply contains
the attributes that were specified in the entity box. Notice that Salesperson Number
is underlined to indicate that it is the unique identifier of the entity, and the primary
key of the table. Clearly, the more interesting issues and rules come about when, as
almost always happens, entities are involved in relationships with other entities.

Converting E-R Diagrams into Relational Tables 159

C O N C E P T S
I N A C T I O N

7-A ECOLAB

Ecolab is a $3-billion-plus developer
and marketer of cleaning, sanitizing, pest elimination,
and industrial maintenance and repair products and
services that was founded in 1923. Its customers include
restaurants, hotels, hospitals, food and beverage plants,
laundries, schools, and other retail and commercial
facilities. Headquartered in St. Paul, MN, Ecolab is truly
a global company, operating directly in 70 countries and
through distributors, licensees, and export operations in
an additional 100 countries. Its domestic and worldwide
operations are supported by 20,000 employees and
over 50 manufacturing and distribution facilities. A large
percentage of the employees are sales and service
individuals who work in a mobile, remote environment.

One of Ecolab’s applications with a significant
database component is called ‘‘EcoNet.’’ EcoNet gives
the large sales and service work force access to infor-
mation distributed across many databases. EcoNet pro-
vides Ecolab’s North American sales and service people
with a portal into pertinent information needed when

‘‘Photo Courtesy of Ecolab’’ Printed by permission of Ecolab, Inc. (c) 2002 Ecolab
Inc. All rights reserved. Ecolab Inc., 370 Wabasha Street North, St. Paul, Minnesota
55102, U.S.A.

interacting with customers for sales and service purposes.
EcoNet also enables the standardization of processes
across the sales and service organizations within the
seven various North American business units. This is
achieved by having one application get data from
different databases.

The system is also used as a sales planning tool.
Using EcoNet, a salesperson can access such customer
information as past and outstanding invoices, service
reports, and order status. The salesperson can also use
the system to place new orders. Being Web-based, Econet
can be accessed from a home or office PC, from a laptop
at the customer location, and even through handheld
devices. In addition, customers can view their own data
through ‘‘My Ecolab.com.’’

Implemented in 2002, EcoNet uses an interesting
mix of databases.

1. The transactional data, including the last six month’s
orders, is held in a Computer Associates IDMS

160 C h a p t e r 7 Logical Database Design

network-type database. EcoNet accesses this ‘‘up-
to-the-minute’’ information using screen scrapping
technology against the IBM mainframe computer
rather than migrating the data in real time to a
relational DBMS.

2. Completed transaction data is bridged nightly to a
data warehouse holding seven years of sales data in
IBM DB2 Unix.

3. Summarized Sales tables and Key Performance
Indicators are also bridged to Microsoft SQL Server
relational databases.

Ecolab is continually looking for additional informa-
tion to add to the EcoNet application in order to provide
their sales and service people with valuable information
when interacting with customers.

F I G U R E 7.1
The entity box from Figure 2.1

SALESPERSON

PK

Salesperson
Number

Salesperson
Name
Commission
Percentage
Year of Hire

F I G U R E 7.2
Conversion of an E-R diagram entity
box to a relational table

SALESPERSON
Salesperson Salesperson Commission Year
Number Name Percentage of Hire

Converting Entities in Binary Relationships

One-to-One Binary Relationship Figure 7.3 repeats the one-to-one binary relation-
ship of Figure 2.4a. There are three options for designing tables to represent this
data, as shown in Figure 7.4. In Figure 7.4a, the two entities are combined into one
relational table. On the one hand, this is possible because the one-to-one relationship
means that for one salesperson, there can only be one associated office and con-
versely, for one office there can be only one salesperson. So a particular salesperson
and office combination can fit together in one record, as shown in Figure 7.4a. On
the other hand, this design is not a good choice for two reasons. One reason is that
the very fact that salesperson and office were drawn in two different entity boxes
in the E-R diagram of Figure 7.3 means that they are thought of separately in this
business environment and thus should be kept separate in the database. The other
reason is the modality of zero at the salesperson in Figure 7.3. Reading that diagram
from right to left, it says that an office might have no one assigned to it. Thus, in
the table in Figure 7.4a, there could be a few or possibly many record occurrences
that have values for the office number, telephone, and size attributes but have the
four attributes pertaining to salespersons empty or null! This could result in a lot of
wasted storage space, but it is worse than that. If Salesperson Number is declared

Converting E-R Diagrams into Relational Tables 161

F I G U R E 7.3
The one-to-one (1-1) binary
relationship from Figure 2.4a

OFFICE

PK Office
Number

Telephone
Size

SALESPERSON

PK Salesperson
Number

Salesperson
Name
Commission
Percentage
Year of Hire

Works in

Occupied by

to be the primary key of the table, this scenario would mean that there would be
records with no primary key values, a situation which is clearly not allowed.

Figure 7.4b is a better choice. There are separate tables for the salesperson
and office entities. In order to record the relationship, i.e. which salesperson is
assigned to which office, the Office Number attribute is placed as a foreign key in
the SALESPERSON table. This connects the salespersons with the offices to which

F I G U R E 7.4
Conversion of an E-R diagram with two
entities in a one-to-one binary relationship
into one or two relational tables

a. One-to-one binary relationship converted to a single relational table.

b. One-to-one binary relationship converted to two relational tables, with the for-
eign key in the SALESPERSON table.

c. One-to-one binary relationship converted to two relational tables, with the for-
eign key in the OFFICE table.

OFFICE

Office Salesperson
Number Telephone Number Size

SALESPERSON

Salesperson Salesperson Commission Year of
Number Name Percentage Hire

OFFICE

Office
Number Telephone Size

SALESPERSON

Salesperson Salesperson Commission Year of Office
Number Name Percentage Hire Number

SALESPERSON/OFFICE

Salesperson Salesperson Commission Year of Office
Number Name Percentage Hire Number Telephone Size

162 C h a p t e r 7 Logical Database Design

they are assigned. Again, look at the modalities in the E-R diagram in Figure 7.3.
Reading from left to right, each salesperson is assigned to exactly one office
(indicated by the two ‘‘ones’’ adjacent to the office entity). That translates directly
into each record in the SALESPERSON table of Figure 7.4b having a value (and a
single value, at that) for its Office Number foreign key attribute. That’s good! But
what about the problem of unassigned offices mentioned in the previous paragraph?
In Figure 7.4b, unassigned offices will each have a record in the OFFICE table, with
Office Number as the primary key, which is fine. Their office numbers will simply
not appear as foreign key values in the SALESPERSON table.

Finally, instead of placing Office Number as a foreign key in the
SALESPERSON table, could you instead place Salesperson Number as a foreign key
in the OFFICE table, Figure 7.4c? Recall that, reading the E-R diagram of Figure 7.3
from right to left, the modality of zero adjacent to the salesperson entity says that
an office might be empty, i.e. it might not be assigned to any salesperson. But then,
some or perhaps many records of the OFFICE table of Figure 7.4c would have no
value or a null in their Salesperson Number foreign key attribute positions. Why
bother having to deal with this situation when the design in Figure 7.4b avoids it?

Certainly, it follows that if the modalities were reversed, meaning that the zero
modality was adjacent to the office entity box and the one modality was adjacent to
the salesperson entity box, then the design in Figure 7.4c would be the preferable
one. This would mean that every office must have a salesperson assigned to it but a
salesperson may or may not be assigned to an office. Perhaps lots of the salespersons
travel most of the time and don’t need offices. By the way, while we’re in ‘‘what if’’
mode, what if the modality was zero on both sides? Then there would be a judgment
call to make between the designs of Figure 7.4b and Figure 7.4c. If the goal is to
minimize the number of null values in the foreign key, then you have to decide
whether it is more likely that a salesperson is not assigned to an office (Figure 7.4c
is preferable) or that an office is empty (Figure 7.4b is preferable).

One-to-Many Binary Relationship Figure 7.5 (copied from Figure 2.4b) shows an
E-R diagram for a one-to-many binary relationship. Figure 7.6 shows the conversion
of this E-R diagram into two relational tables. This is, perhaps, the simplest case
of all. The rule is that the unique identifier of the entity on the ‘‘one side’’ of the
one-to-many relationship is placed as a foreign key in the table representing the
entity on the ‘‘many side.’’ In this case, the Salesperson Number attribute is placed
in the CUSTOMER table as a foreign key. Each salesperson has one record in
the SALESPERSON table, as does each customer in the CUSTOMER table. The
Salesperson Number attribute in the CUSTOMER table links the two and, since

F I G U R E 7.5
The one-to-many (1-M) binary
relationship from Figure 2.4b

CUSTOMER

PK

Customer
Number

Customer
Name
HQ City

SALESPERSON
PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire

Sells to

Buys from

Converting E-R Diagrams into Relational Tables 163

F I G U R E 7.6
Conversion of an E-R diagram with two
entities in a one-to-many binary
relationship into two relational tables

CUSTOMER

Customer Customer Salesperson
Number Name HQ City Number

SALESPERSON
Salesperson Salesperson Commission Year
Number Name Percentage of Hire

the E-R diagram tells us that every customer must have a salesperson, there are no
empty attributes in the CUSTOMER table records.

Many-to-Many Binary Relationship Figure 7.7 shows the E-R diagram with the
many-to-many binary relationship from Figure 2.5. The equivalent diagram from
Figure 2.6, using an associative entity, is shown in Figure 7.8. An E-R diagram with
two entities in a many-to-many relationship converts to three relational tables, as
shown in Figure 7.9. Each of the two entities converts to a table with its own attributes

F I G U R E 7.7
The many-to-many binary relationship
from Figure 2.5

Sells

Sold by

PRODUCT

PK

Product
Number

Product
Name
Unit Price

SALESPERSON
PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire
Quantity

Sold

Sold by
Sold

Sold
Product

PRODUCT

PK Product
Number

Product
Name
Unit Price

SALESSALESPERSON

PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire

PK

Quantity
PK Product
Number
Salesperson
Number

F I G U R E 7.8
The associative entity from Figure 2.6

164 C h a p t e r 7 Logical Database Design

F I G U R E 7.9
Conversion of an E-R diagram in Figure
7.7 (and Figure 7.8) with two entities in
a many-to-many binary relationship into
three relational tables

SALE

Salesperson Product
Number Number Quantity
PRODUCT

Product Product
Number Name Unit Price

SALESPERSON
Salesperson Salesperson Commission Year
Number Name Percentage of Hire

but with no foreign keys (regarding this relationship). The SALESPERSON table
and the PRODUCT table in Figure 7.9 each contain only the attributes shown in the
salesperson and product entity boxes of Figure 7.7 and Figure 7.8.

In addition, there must be a third ‘‘many-to-many’’ table for the many-to-many
relationship, the reasons for which were explained in Chapter 5. The primary key of
this additional table is the combination of the unique identifiers of the two entities in
the many-to-many relationship. Additional attributes consist of the intersection data,
Quantity in this example. Also as explained in Chapter 5, there are circumstances
in which additional attributes, such as date and timestamp attributes, must be added
to the primary key of the many-to-many table to achieve uniqueness.

Converting Entities in Unary Relationships

One-to-One Unary Relationship Figure 7.10 repeats the E-R diagram with a one-
to-one unary relationship from Figure 2.7a. In this case, with only one entity type
involved and with a one-to-one relationship, the conversion requires only one table,
as shown in Figure 7.11. For a particular salesperson, the Backup Number attribute
represents the salesperson number of his backup person, i.e. the person who handles
his accounts when he is away for any reason.

F I G U R E 7.10
The one-to-one (1-1) unary relationship
from Figure 2.7a

SALESPERSON
PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire

Backs-up

Backed-up by

Converting E-R Diagrams into Relational Tables 165

F I G U R E 7.11
Conversion of the E-R diagram in Figure
7.10 with a one-to-one unary relationship
into a relational table

SALESPERSON

Salesperson Salesperson Commission Year Backup
Number Name Percentage of Hire Number

F I G U R E 7.12
The one-to-many (1-M) unary relationship
from Figure 2.7b

SALESPERSON
PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire

Manages

Reports to

One-to-Many Unary Relationship The one-to-many unary relationship situation is
very similar to the one-to-one unary case. Figure 7.12 repeats the E-R diagram
from Figure 2.7b. Figure 7.13 shows the conversion of this diagram into a relational
database. Some employees manage other employees. An employee’s manager
is recorded in the Manager Number attribute in the table in Figure 7.13. The
manager numbers are actually salesperson numbers since some salespersons are
sales managers who manage other salespersons. This arrangement works because
each employee has only one manager. For any particular SALESPERSON record,
there can only be one value for the Manager Number attribute. However, if you
scan down the Manager Number column, you will see that a particular value may
appear several times because a person can manage several other salespersons.

Many-to-Many Unary Relationship Figure 7.14 shows the E-R diagram for the
many-to-many unary relationship of Figure 2.7c. As Figure 7.15 indicates, this
relationship requires two tables in the conversion. The PRODUCT table has no
foreign keys. The COMPONENT table indicates which items go into making up
which other items, as was described in the bill-of-materials discussion in Chapter 6.
This table also contains any intersection data that may exist in the many-to-many
relationship. In this example, the Quantity attribute indicates how many of a
particular item go into making up another item.

The fact that we wind up with two tables in this conversion is really not
surprising. The general rule is that in the conversion of a many-to-many relationship
of any degree (unary, binary, or ternary), the number of tables will be equal to the
number of entity types (one, two, or three, respectively) plus one more table for
the many-to-many relationship. Thus, the conversion of the many-to-many unary
relationship required two tables, the many-to-many binary relationship three tables,
and, as will be shown next, the many-to-many ternary relationship four tables.

F I G U R E 7.13
Conversion of the E-R diagram in Figure
7.12 with a one-to-many unary
relationship into a relational table

SALESPERSON

Salesperson Salesperson Commission Year
Number Name Percentage of Hire Manager

166 C h a p t e r 7 Logical Database Design

F I G U R E 7.14
The many-to-many unary relationship
from Figure 2.7c

COMPONENT

PK
Quantity

PK Subassembly
Number

Product
Number
PRODUCT
PK Product
Number
Product
Name
Unit Price

Part of
Includes

Part of
Includes

F I G U R E 7.15
Conversion of the E-R diagram in Figure
7.14 with a many-to-many unary
relationship into two relational tables

COMPONENT

Product Subassembly
Number Number Quantity

PRODUCT
Product Product
Number Name Unit Price

Converting Entities in Ternary Relationships

Finally, Figure 7.16 repeats the E-R diagram with the ternary relationship from
Figure 2.8. Figure 7.17 shows the four tables necessary for the conversion to
relational tables. Notice that the primary key of the SALE table, which is the
table added for the many-to-many relationship, is the combination of the unique
identifiers of the three entities involved, plus the Date attribute. In this case, with
the premise being that a particular salesperson can have sold a particular product to
a particular customer on different days, the Date attribute is needed in the primary
key to achieve uniqueness.

Designing the General Hardware Co. Database

Having explored the specific E-R diagram-to-relational database conversion rules,
let’s look at a few examples, beginning with the General Hardware Co. Figure 7.18
is the General Hardware E-R diagram. It is convenient to begin the database
design process with an important, central E-R diagram entity, such as salesperson,
that has relationships with several other entities. Thus, the relational database in

Converting E-R Diagrams into Relational Tables 167

CUSTOMER

PK Customer
Number

Customer
Name
HQ City
SALE
PK Salesperson
Number
PK Product
Number
PK Customer
Number

Date
Quantity

SALESPERSON
PK Salesperson
Number
Salesperson
Name
Commission
Percentage
Year of Hire
PRODUCT
PK Product
Number
Product
Name
Unit Price

Purchased
Sold to

Sold
Sold
Product
Sold
Sold by

F I G U R E 7.16
The ternary relationship from Figure 2.8

F I G U R E 7.17
Conversion of the E-R diagram in Figure
7.16 with three entities in a ternary
relationship into four relational tables

SALE
Salesperson Customer Product
Number Number Number Date Quantity
PRODUCT
Product Product
Number Name Unit Price
CUSTOMER
Customer Customer
Number Name HQ City
SALESPERSON
Salesperson Salesperson Commission Year
Number Name Percentage of Hire

Figure 7.19 includes a SALESPERSON table with the four salesperson attributes
shown in Figure 7.18’s salesperson entity box (plus the Office Number attribute, to

168 C h a p t e r 7 Logical Database Design

F I G U R E 7.18
The General Hardware
Company E-R diagram

PK Employee
Number

Customer
Number
PK

CUSTOMER
EMPLOYEE

Employee
Name
Title

Office
Number

OFFICE
PK
Telephone
Size
Salesperson
Number
SALESPERSON
PK
Salesperson
Name
Commission
Percentage
Year of Hire
Customer
Number
CUSTOMER
PK
Customer
Name
HQ City
Product
Number
PRODUCT
PK
Product
Name
Unit Price
PK Product
Number
Salesperson
Number
SALES
PK
Quantity

Occupied by
Works in

Sells to

Buys from

Sold
Sold by

Sold
Sold
Product

Employs

Employed by

Converting E-R Diagrams into Relational Tables 169

F I G U R E 7.19
The General Hardware Company
relational database

OFFICE
Office
Number Telephone Size
SALES
Salesperson Product
Number Number Quantity
PRODUCT
Product Product
Number Name Unit Price
CUSTOMER EMPLOYEE
Customer Employee Employee
Number Number Name Title
CUSTOMER
Customer Customer Salesperson
Number Name Number HQ City
SALESPERSON
Salesperson Salesperson Commission Year Office
Number Name Percentage of Hire Number

which we will return shortly). To the right of the salesperson entity box in the E-R
diagram, there is a one-to-many relationship (‘‘Sells To’’) between salespersons and
customers. The database then includes a CUSTOMER table with the Salesperson
Number attribute as a foreign key, because salesperson is on the ‘‘one side’’ of the
one-to-many relationship and customer is on the ‘‘many side’’ of the one-to-many
relationship.

Customer employee is a dependent entity of customer and there is a one-to-
many relationship between them. Because of this relationship, the CUSTOMER
EMPLOYEE table in the database includes the Customer Number attribute as a
foreign key. Furthermore, the Customer Number attribute is part of the primary key
of the CUSTOMER EMPLOYEE table because customer employee is a dependent
entity and we’re told that employee numbers are unique only within a customer.

The PRODUCT table contains the three attributes of the product entity.
The many-to-many relationship between the salesperson and product entities is
represented by the SALES table in the database. Notice that the combination
of the unique identifiers (Salesperson Number and Product Number) of the two
entities in the many-to-many relationship is the primary key of the SALES table.
Finally, the office entity has its table in the database with its three attributes, which
brings us to the presence of the Office Number attribute as a foreign key in the

170 C h a p t e r 7 Logical Database Design

SALESPERSON table. This is needed to maintain the one-to-one binary relationship
between salesperson and office. A fair question is, since the relationship is ‘‘one’’
on both sides, why did we decide to put the foreign key in the SALESPERSON
table rather than in the OFFICE table? The answer lies in the fact that the modality
adjacent to SALESPERSON is zero while the modality adjacent to OFFICE is one.
An office may or may not have a salesperson assigned to it but a salesperson must
be assigned to an office. The result is that every salesperson must have an associated
office number; the Office Number attribute in the SALESPERSON table can’t be
null. If we reversed it and put the Salesperson Number attribute in the OFFICE
table, many of the Salesperson Number attribute values could be null since the zero
modality going from office to salesperson tells us that an office can be empty.

One last thought: Why did the PRODUCT table end-up without any foreign
keys? Because it is not the ‘‘target’’ (it is not on the ‘‘many side’’) of any one-to-
many binary relationship. It is also not involved in a one-to-one binary relationship
that would require the presence of a foreign key. Finally, it is not involved in a
unary relationship that would require repeating the primary key in the table.

Designing the Good Reading Bookstores Database

The Good Reading Bookstores’ E-R diagram is repeated in Figure 7.20. Beginning
with the central book entity and looking to its left, we see that there is a one-to-many

Publisher
Name

PUBLISHER

PK

City
Country
President
Year Founded

Customer
Number

Author
Number

Book
Number

BOOK
PK

Book Name
Publication
Year
Pages

PK Author
Number

Book
Number

WROTE

PK
PK Customer
Number
Book
Number
CUSTOMER
PK

Customer
Name
Street
City
State
Country

AUTHOR
PK

Author Name
Year Born
Year Died

SALE
PK

Date
Price
Quantity

Published

Published by

Wrote

Written by

Wrote
Written by

Bought

Bought by

Sold
In sale

F I G U R E 7.20
Good Reading Bookstores entity-relationship diagram

Converting E-R Diagrams into Relational Tables 171

F I G U R E 7.21
The Good Reading Bookstores
relational database

SALE
Book Customer
Number Number Date Price Quantity

WRITING

Book Author
Number Number
CUSTOMER
Customer Customer
Number Name Street City State Country
BOOK
Book Book Publication Publisher
Number Name Year Pages Name
AUTHOR
Author Author Year Year
Number Name Born Died
PUBLISHER
Publisher Year
Name City Country Telephone Founded

relationship between books and publishers. A publisher publishes many books but
a book is published by just one publisher. The Good Reading Bookstores relational
database of Figure 7.21 shows the BOOK and PUBLISHER tables. Publisher Name
is a foreign key in the BOOK table because publisher is on the ‘‘one side’’ of the one-
to-many relationship and book is on the ‘‘many side.’’ Next is the AUTHOR table,
which is straightforward. The many-to-many binary relationship between books and
authors is reflected in the WRITING table, which has no intersection data. Finally,
there is the customer entity and the many-to-many relationship between books and
customers. Correspondingly, the relational database includes a CUSTOMER table
and a SALE table to handle the many-to-many relationship. Notice the Date, Price,
and Quantity attributes appearing in the SALE table as intersection. Also notice that
since a customer can buy the same book on more than one day, the Date attribute
must be part of the primary key to achieve uniqueness.

Designing the World Music Association Database

Looking at the World Music Association E-R diagram in Figure 7.22, it appears that
the orchestra entity would be a good central starting point for the database design

172 C h a p t e r 7 Logical Database Design

Orchestra
Name

ORCHESTRA
PK

City
Country
Music
Director

Orchestra
Name

Composer
Name

Musician
Number

MUSICIAN
PK

Musician
Name
Instrument
Annual
Salary

PK Degree

Musician
Number
DEGREE
PK

University
Year

PK Composer
Name

Composition
Name

RECORDING
PK
Composition
Name
PK
Composer
Name
PK

Year
Price

COMPOSER

PK

Country
Date of Birth

COMPOSITION

PK
Year
Employs

Employed by Earned by

Earned

Recorded

Contains

Wrote
Written by

Recorded
Recorded by

F I G U R E 7.22
World Music Association entity-relationship diagram

process. Thus, the relational database in Figure 7.23 begins with the ORCHESTRA
table. The Orchestra Name foreign key in the MUSICIAN table reflects the one-to-
many relationship from orchestra to musician. Since degree is a dependent entity
of musician in a one-to-many relationship and degrees (e.g. B.A.) are unique only
within a musician, not only does Musician Number appear as a foreign key in the
DEGREE table but also it must be part of that table’s primary key. A similar situation
exists between the composer and composition entities, as shown in the COMPOSER
and COMPOSITION tables in the database. Finally, the many-to-many relationship
between orchestra and composition is converted into the RECORDING table.

Y O U R
T U R N

7.1 THE E-R DIAGRAM CONVERSION LOGICAL DESIGN TECHNIQUE

In Your Turn in Chapter 2, you
created an entity-relationship diagram for your university
environment.

QUESTION:
Using the logical design techniques just described, convert

your university E-R diagram into a logical database
design.

Converting E-R Diagrams into Relational Tables 173

F I G U R E 7.23
The World Music Association
relational database

RECORDING
Orchestra Composition
Name Name Year Price
COMPOSITION

Composition Composer
Name Name

Composer
Name
Year
COMPOSER

Composer Date of
Name Country Birth

DEGREE
Musician
Number Degree University Year
MUSICIAN
Musician Musician Annual Orchestra
Number Name Instrument Salary Name
ORCHESTRA
Orchestra Music
Name City Country Director

Notice that the primary key of the RECORDING table begins with the Orchestra
Name attribute and then continues with both the Composition Name and Composer
Name attributes. This is because the primary key of one of the two entities in the
many-to-many relationship, composition, is the combination of those two latter
attributes.

Designing the Lucky Rent-A-Car Database

Figure 7.24 shows the Lucky Rent-A-Car E-R diagram. The conversion to a
relational database structure begins with the car entity and its four attributes, as
shown in the CAR table of the database in Figure 7.25. Because car is on the ‘‘many
side’’ of a one-to-many relationship with the manufacturer entity, the CAR table
also has the Manufacturer Name attribute as a foreign key. The straightforward one-
to-many relationship from car to maintenance event produces a MAINTENANCE
EVENT table with Car Serial Number as a foreign key. The customer entity converts
to the CUSTOMER table with its four attributes. The many-to-many relationship
between car and customer converts to the RENTAL table. Car Serial Number, the
unique identifier of the car entity, and Customer Number, the unique identifier
of the customer entity, plus the Rental Date intersection data attribute form the
three-attribute primary key of the RENTAL table, with Return Date and Total
Cost as additional intersection data attributes. Rental Date has to be part of the

174 C h a p t e r 7 Logical Database Design

F I G U R E 7.24
Lucky Rent-A-Car entity-
relationship diagram

PK Customer
Number

Car Serial
Number

RENTAL

PK

Rental Date
Return Date
Total Cost

Manufacturer
Name

MANUFACTURER
PK

Manufacturer
Country
Sales Rep
Name
Sales Rep
Number

Car Serial
Number
CAR
PK

Model
Year
Class

Customer
Number
CUSTOMER
PK

Customer
Name
Customer
Address
Customer
Credit Rating

MAINTENANCE
EVENT

Manufactured
Manufactured by

Rented

Car rented

Repaired
Car Repaired

Rented
Rented by

Repair
Number

PK

Date
Procedure
Mileage
Repair Time

primary key to achieve uniqueness because a particular customer may have rented
a particular car on several different dates.

THE DATA NORMALIZATION PROCESS

Data normalization was the earliest formalized database design technique and
at one time was the starting point for logical database design. Today, with the
popularity of the Entity-Relationship model and other such diagramming tools and
the ability to convert its diagrams to database structures, data normalization is used
more as a check on database structures produced from E-R diagrams than as a

The Data Normalization Process 175

F I G U R E 7.25
The Lucky Rent-A-Car relational database

RENTAL
Car Serial Customer Rental Return Total
Number Number Date Date Cost
CUSTOMER
Customer Customer Customer Customer
Number Name Address Telephone
MAINTENANCE
Repair Car Serial Repair
Number Number Date Procedure Mileage Time
CAR
Car Serial Manufacturer
Number Model Year Class Name
MANUFACTURER
Manufacturer Manufacturer Sales Rep Sales Rep
Name Country Name Telephone

full-scale database design technique. That’s one of the reasons for learning about
data normalization. Another reason is that the data normalization process is another
way of demonstrating and learning about such important topics as data redundancy,
foreign keys, and other ideas that are so central to a solid understanding of database
management.

Data normalization is a methodology for organizing attributes into tables so
that redundancy among the non-key attributes is eliminated. Each of the resultant
tables deals with a single data focus, which is just another way of saying that
each resultant table will describe a single entity type or a single many-to-many
relationship. Furthermore, foreign keys will appear exactly where they are needed.
In other words, the output of the data normalization process is a properly structured
relational database.

Introduction to the Data Normalization Technique

The input required by the data normalization process has two parts. One is a list of all
the attributes that must be incorporated into the database: that is, all of the attributes
in all of the entities involved in the business environment under discussion plus all
of the intersection data attributes in all of the many-to-many relationships between
these entities. The other input, informally, is a list of all of the defining associations
among the attributes. Formally, these defining associations are known as functional
dependencies. And what are defining associations or functional dependencies? They
are a means of expressing that the value of one particular attribute is associated with

176 C h a p t e r 7 Logical Database Design

a specific single value of another attribute. If we know that one of these attributes
has a particular value, then the other attribute must have some other value. For
example, for a particular Salesperson Number, 137, there is exactly one Salesperson
Name, Baker, associated with it. Why is this true? In this example, a Salesperson
Number uniquely identifies a salesperson and, after all, a person can have only one
name! And this is true for every person! Informally, we might say that Salesperson
Number defines Salesperson Name. If I give you a Salesperson Number, you
can give me back the one and only name that goes with it. (It’s a little like the
concept of independent and dependent variables in mathematics. Take a value of the
independent variable, plug it into the formula and you get back the specific value of
the dependent variable associated with that independent variable.) These defining
associations are commonly written with a right-pointing arrow like this:

Salesperson Number Salesperson Name

In the more formal terms of functional dependencies, Salesperson Number, in
general the attribute on the left side, is referred to as the determinant. Why? Because
its value determines the value of the attribute on the right side. Conversely, we also
say that the attribute on the right is functionally dependent on the attribute on the left.

Data normalization is best explained with an example and this is a good place
to start one. In order to demonstrate the main points of the data normalization
process, we will modify part of the General Hardware Co. business environment
and focus on the salesperson and product entities. Let’s assume that salespersons are
organized into departments and each department has a manager who is not herself
a salesperson. Then the list of attributes we will consider is shown in Figure 7.26.
The list of defining associations or functional dependencies is shown in Figure 7.27.

Notice a couple of fine points about the list of defining associations in
Figure 7.27. The last association:

Salesperson Number, Product Number Quantity

shows that the combination of two or more attributes may possibly define another
attribute. That is, the combination of a particular Salesperson Number and a
particular Product Number defines or specifies a particular Quantity. Put another
way, in this business context, we know how many units of a particular product
a particular salesperson has sold. Another point, which will be important in

F I G U R E 7.26
List of attributes for salespersons
and products

Salesperson Number
Salesperson Name
Commission

Percentage
Year of Hire
Department

Number
Manager Name
Product Number
Product Name
Unit Price
Quantity

The Data Normalization Process 177

F I G U R E 7.27
List of defining associations (functional
dependencies) for the attributes of
salespersons and products

Salesperson Number Salesperson Name
Salesperson Number Commission Percentage
Salesperson Number Year of Hire

Salesperson Number Department Number

Salesperson Number Manager Name

Product Number Product Name
Product Number Unit Price

Department Number Manager Name

Salesperson Number, Product Number Quantity

demonstrating one step of the data normalization process, is that Manager Name
is defined, independently, by two different attributes: Salesperson Number and
Department Number:

Salesperson Number Manager Name
Department Number Manager Name

Both these defining associations are true! If I identify a salesperson by his
Salesperson Number, you can tell me who his manager is. Also, if I state a
department number, you can tell me who the manager of the department is. How
did we wind up with two different ways to define the same attribute? Very easily!
It simply means that during the systems analysis process, both these equally true
defining associations were discovered and noted. By the way, the fact that I know
the department that a salesperson works in:

Salesperson Number Department Number

(and that each of these two attributes independently define Manager Name) will
also be an issue in the data normalization process. More about this later.

Steps in the Data Normalization Process

The data normalization process is known as a ‘‘decomposition process.’’ Basically,
we are going to line up all the attributes that will be included in the relational
database and start subdividing them into groups that will eventually form the
database’s tables. Thus, we are going to ‘‘decompose’’ the original list of all of
the attributes into subgroups. To do this, we are going to step through a number
of normal forms. First, we will demonstrate what unnormalized data looks like.
After all, if data can exist in several different normal forms, then there should be
the possibility that data is in none of the normal forms, too! Then we will basically
work through the three main normal forms in order:

First Normal Form
Second Normal Form
Third Normal Form

There arc certain ‘‘exception conditions’’ that have also been described as normal
forms. These include the Boyce-Codd Normal Form, Fourth Normal Form, and
Fifth Normal Form. They are less common in practice and will not be covered here.

178 C h a p t e r 7 Logical Database Design

Here are three additional points to remember:

1. Once the attributes are arranged in third normal form (and if none of the
exception conditions are present), the group of tables that they comprise is, in
fact, a well-structured relational database with no data redundancy.

2. A group of tables is said to be in a particular normal form if every table in the
group is in that normal form.

3. The data normalization process is progressive. If a group of tables is in second
normal form it is also in first normal form. If they are in third normal form they
are also in second normal form.

Unnormalized Data Figure 7.28 shows the salesperson and product-related attributes
listed in Figure 7.26 arranged in a table with sample data. The salesperson and
product data is taken from the General Hardware Co. relational database of
Figure 5.14, with the addition of Department Number and Manager Name data.
Note that salespersons 137, 204, and 361 are all in department number 73 and their
manager is Scott. Salesperson 186 is in department number 59 and his manager is
Lopez.

The table in Figure 7.28 is unnormalized. The table has four records, one for
each salesperson. But, since each salesperson has sold several products and there is
only one record for each salesperson, several attributes of each record must have
multiple values. For example, the record for salesperson 137 has three product
numbers, 19440, 24013, and 26722, in its Product Number attribute, because
salesperson 137 has sold all three of those products. Having such multivalued
attributes is not permitted in first normal form, and so this table is unnormalized.

First Normal Form The table in Figure 7.29 is the first normal form representation
of the data. The attributes under consideration have been listed out in one table and

SALESPERSON/PRODUCT table

Salesperson Product Salesperson Commission Year of Department Manager Product Unit
Number Number Name Percentage Hire Number Name Name Price Quantity

137 19440 Baker 10 1995 73 Scott Hammer 17.50 473

24013 Saw 26.25 170
26722 Pliers 11.50 688

186 16386 Adams 15 2001 59 Lopez Wrench 12.95 1745
19440 Hammer 17.50 2529
21765 Drill 32.99 1962
24013 Saw 26.25 3071

204 21765 Dickens 10 1998 73 Scott Drill 32.99 809
26722 Pliers 11.50 734

361 16386 Carlyle 20 2001 73 Scott Wrench 12.95 3729
21765 Drill 32.99 3110
26722 Pliers 11.50 2738

F I G U R E 7.28
The salesperson and product attributes, unnormalized with sample data

The Data Normalization Process 179

SALESPERSON/PRODUCT table
Salesperson Product Salesperson Commission Year of Department Manager Product Unit
Number Number Name Percentage Hire Number Name Name Price Quantity

F I G U R E 7.29
The salesperson and product attributes in first normal form

a primary key has been established. As the sample data of Figure 7.30 shows, the
number of records has been increased (over the unnormalized representation) so
that every attribute of every record has just one value. The multivalued attributes
of Figure 7.28 have been eliminated. Indeed, the definition of first normal form is a
table in which every attribute value is atomic, that is, no attribute is multivalued.

The combination of the Salesperson Number and Product Number attributes
constitutes the primary key of this table. What makes this combination of attributes a
legitimate primary key? First of all, the business context tells us that the combination
of the two provides unique identifiers for the records of the table and that there is no
single attribute that will do the job. That, of course, is how we have been approaching
primary keys all along. Secondly, in terms of data normalization, according to the list
of defining associations or functional dependencies of Figure 7.27, every attribute
in the table is either part of the primary key or is defined by one or both attributes
of the primary key. Salesperson Name, Commission Percentage, Year of Hire,
Department Number, and Manager Name are each defined by Salesperson Number.
Product Name and Unit Price are each defined by Product Number. Quantity is
defined by the combination of Salesperson Number and Product Number.

Are these two different ways of approaching the primary key selection
equivalent? Yes! If the combination of a particular Salesperson Number and a
particular Product Number is unique, then it identifies exactly one record of the
table. And, if it identifies exactly one record of the table, then that record shows
the single value of each of the non-key attributes that is associated with the unique
combination of the key attributes.

SALESPERSON/PRODUCT table
Salesperson Product Salesperson Commission Year of Department Manager Product Unit
Number Number Name Percentage Hire Number Name Name Price Quantity

137 19440 Baker 10 1995 73 Scott Hammer 17.50 473
137 24013 Baker 10 1995 73 Scott Saw 26.25 170
137 26722 Baker 10 1995 73 Scott Pliers 11.50 688
186 16386 Adams 15 2001 59 Lopez Wrench 12.95 1475
186 19440 Adams 15 2001 59 Lopez Hammer 17.50 2529
186 21765 Adams 15 2001 59 Lopez Drill 32.99 1962
186 24013 Adams 15 2001 59 Lopez Saw 26.25 3071
204 21765 Dickens 10 1998 73 Scott Drill 32.99 809
204 26722 Dickens 10 1998 73 Scott Pliers 11.50 734
361 16386 Carlyle 20 2001 73 Scott Wrench 12.95 3729
361 21765 Carlyle 20 2001 73 Scott Drill 32.99 3110
361 26722 Carlyle 20 2001 73 Scott Pliers 11.50 2738

F I G U R E 7.30
The salesperson and product attributes in first normal form with sample data

180 C h a p t e r 7 Logical Database Design

But that is the same thing as saying that each of the non-key attributes is
defined by or is functionally dependent on the primary key! For example, consider
the first record of the table in Figure 7.30.

Sales-person Product Sales-person Commission Year of Department Manager Product Unit
Number Number Name Percentage Hire Number Name Name Price Quantity

137 19440 Baker 10 1995 73 Scott Hammer 17.50 473

The combination of Salesperson Number 137 and Product Number 19440 is
unique. There is only one record in the table that can have that combination of
Salesperson Number and Product Number values. Therefore, if someone specifies
those values, the only Salesperson Name that can be associated with them is Baker,
the only Commission Percentage is 10, and so forth. But that has the same effect
as the concept of functional dependency. Since Salesperson Name is functionally
dependent on Salesperson Number, given a particular Salesperson Number, say
137, there can be only one Salesperson Name associated with it, Baker. Since
Commission Percentage is functionally dependent on Salesperson Number, given
a particular Salesperson Number, say 137, there can be only one Commission
Percentage associated with it, 10. And so forth.

First normal form is merely a starting point in the normalization process. As
can immediately be seen from Figure 7.30, there is a great deal of data redundancy
in first normal form. There are three records involving salesperson 137 (the first
three records) and so there are three places in which his name is listed as Baker, his
commission percentage is listed as 10, and so on. Similarly, there are two records
involving product 19440 (the first and fifth records) and this product’s name is listed
twice as Hammer and its unit price is listed twice as 17.50. Intuitively, the reason
for this is that attributes of two different kinds of entities, salespersons and products,
have been mixed together in one table.

Second Normal Form Since data normalization is a decomposition process, the
next step will be to decompose the table of Figure 7.29 into smaller tables to
eliminate some of its data redundancy. And, since we have established that at least
some of the redundancy is due to mixing together attributes about salespersons
and attributes about products, it seems reasonable to want to separate them out at
this stage. Informally, what we are going to do is to look at each of the non-key
attributes of the table in Figure 7.29 and, on the basis of the defining associations
of Figure 7.27, decide which attributes of the key are really needed to define it. For
example, Salesperson Name really only needs Salesperson Number to define it; it
does not need Product Number. Product Name needs only Product Number to define
it; it does not need Salesperson Number. Quantity indeed needs both attributes,
according to the last defining association of Figure 7.27.

More formally, second normal form, which is what we are heading for, does
not allow partial functional dependencies. That is, in a table in second normal form,
every non-key attribute must be fully functionally dependent on the entire key of
that table. In plain language, a non-key attribute cannot depend on only part of
the key, in the way that Salesperson Name, Product Name, and most of the other
non-key attributes of Figure 7.29 do.

Figure 7.31 shows the salesperson and product attributes arranged in second
normal form. There is a SALESPERSON Table in which Salesperson Number is

What Will You Get?

We provide professional writing services to help you score straight A’s by submitting custom written assignments that mirror your guidelines.

Premium Quality

Get result-oriented writing and never worry about grades anymore. We follow the highest quality standards to make sure that you get perfect assignments.

Experienced Writers

Our writers have experience in dealing with papers of every educational level. You can surely rely on the expertise of our qualified professionals.

On-Time Delivery

Your deadline is our threshold for success and we take it very seriously. We make sure you receive your papers before your predefined time.

24/7 Customer Support

Someone from our customer support team is always here to respond to your questions. So, hit us up if you have got any ambiguity or concern.

Complete Confidentiality

Sit back and relax while we help you out with writing your papers. We have an ultimate policy for keeping your personal and order-related details a secret.

Authentic Sources

We assure you that your document will be thoroughly checked for plagiarism and grammatical errors as we use highly authentic and licit sources.

Moneyback Guarantee

Still reluctant about placing an order? Our 100% Moneyback Guarantee backs you up on rare occasions where you aren’t satisfied with the writing.

Order Tracking

You don’t have to wait for an update for hours; you can track the progress of your order any time you want. We share the status after each step.

image

Areas of Expertise

Although you can leverage our expertise for any writing task, we have a knack for creating flawless papers for the following document types.

Areas of Expertise

Although you can leverage our expertise for any writing task, we have a knack for creating flawless papers for the following document types.

image

Trusted Partner of 9650+ Students for Writing

From brainstorming your paper's outline to perfecting its grammar, we perform every step carefully to make your paper worthy of A grade.

Preferred Writer

Hire your preferred writer anytime. Simply specify if you want your preferred expert to write your paper and we’ll make that happen.

Grammar Check Report

Get an elaborate and authentic grammar check report with your work to have the grammar goodness sealed in your document.

One Page Summary

You can purchase this feature if you want our writers to sum up your paper in the form of a concise and well-articulated summary.

Plagiarism Report

You don’t have to worry about plagiarism anymore. Get a plagiarism report to certify the uniqueness of your work.

Free Features $66FREE

  • Most Qualified Writer $10FREE
  • Plagiarism Scan Report $10FREE
  • Unlimited Revisions $08FREE
  • Paper Formatting $05FREE
  • Cover Page $05FREE
  • Referencing & Bibliography $10FREE
  • Dedicated User Area $08FREE
  • 24/7 Order Tracking $05FREE
  • Periodic Email Alerts $05FREE
image

Our Services

Join us for the best experience while seeking writing assistance in your college life. A good grade is all you need to boost up your academic excellence and we are all about it.

  • On-time Delivery
  • 24/7 Order Tracking
  • Access to Authentic Sources
Academic Writing

We create perfect papers according to the guidelines.

Professional Editing

We seamlessly edit out errors from your papers.

Thorough Proofreading

We thoroughly read your final draft to identify errors.

image

Delegate Your Challenging Writing Tasks to Experienced Professionals

Work with ultimate peace of mind because we ensure that your academic work is our responsibility and your grades are a top concern for us!

Check Out Our Sample Work

Dedication. Quality. Commitment. Punctuality

Categories
All samples
Essay (any type)
Essay (any type)
The Value of a Nursing Degree
Undergrad. (yrs 3-4)
Nursing
2
View this sample

It May Not Be Much, but It’s Honest Work!

Here is what we have achieved so far. These numbers are evidence that we go the extra mile to make your college journey successful.

0+

Happy Clients

0+

Words Written This Week

0+

Ongoing Orders

0%

Customer Satisfaction Rate
image

Process as Fine as Brewed Coffee

We have the most intuitive and minimalistic process so that you can easily place an order. Just follow a few steps to unlock success.

See How We Helped 9000+ Students Achieve Success

image

We Analyze Your Problem and Offer Customized Writing

We understand your guidelines first before delivering any writing service. You can discuss your writing needs and we will have them evaluated by our dedicated team.

  • Clear elicitation of your requirements.
  • Customized writing as per your needs.

We Mirror Your Guidelines to Deliver Quality Services

We write your papers in a standardized way. We complete your work in such a way that it turns out to be a perfect description of your guidelines.

  • Proactive analysis of your writing.
  • Active communication to understand requirements.
image
image

We Handle Your Writing Tasks to Ensure Excellent Grades

We promise you excellent grades and academic excellence that you always longed for. Our writers stay in touch with you via email.

  • Thorough research and analysis for every order.
  • Deliverance of reliable writing service to improve your grades.
image

Disclaimer: All Original and Custom Writing Services are solely for the purpose of your understanding and information.

Copyrights © 2024 Assignment Research Writer. All Rights Reserved.

Follow Us |

Order your essay today and save 30% with the discount code Happy