Last column, we took a look at creating relationships between different tables in databases as part of our preparation for learning how to use SQL JOINs. This week, we dig deeper into each of the three types of relationships I introduced last week.
The one-to-many relationship is the workhorse of relational databases as well as being the easiest relationship to understand. Let's say you need to build a shopping cart application for an e-commerce site. Your first draft of the database has columns for Item1, Item2, and Item3 with the corresponding Quantity1, Quantity2, and Quantity3 fields.
OrderNum | ShippingInfo | Item1 | Quantity1 | Item2 | Quantity2 | Item3 | Quantity3 |
Of course, this immediately starts to break down with more than three orders! Any time you find yourself designing a database and adding similar fields like this to the same table, you need to break the table into two (or more!) related tables using a one-to-many relationship.
A one-to-many relationship allows records in Table 1 to be connected to an arbitrary number of records in Table 2 without the limitations imposed by resorting to redundant or limited numbers of fields in a single table. This reduces the size of the database and greatly increases the flexibility and performance of queries operating on that data. We can take our shopping cart example and break it into an Order table and an Item table quite simply.
Order Table
OrderID | ShippingInfo |
OrderItem Table
OrderItemID | OrderID | Item | Quantity |
The two tables are linked together using the OrderID field. The contents of any order in the Order table can easily be found by finding all the items with that value in the OrderID field. There is also the added advantage that the two pieces of data are independent and can easily be modified. If we now want to add an ItemNumber to the OrderItem table, we add a single column; in our original monolithic data table, we'd be adding ItemNumber1, ItemNumber2, etc.
One-to-one table relationships are a little more interesting and more underused than either of the other two types of relationships. The key indicator of a possible need for a one-to-one relationship is a table that contains fields that are only used for a certain subset of the records in that table.
Let's take a look at building a Catalog table for the items that your store sells. Odds are that you need to store some information about the individual items like catalog numbers, weight, and other common data. But if you're selling different kinds of items, books and CDs for example, you may want some item-specific information in the database. For example, you may want a page count, author, publish date, and ISBN for books, while you want playing time, number of tracks, artist, and label for the CDs. You could come up with some way to fit both sets of data into the same structure, but then when management decides you're also selling pet supplies, your system will probably break!
A better solution would be a one-to-one relationship between the Item table and another table of item-specific data for each type of item. The resulting structure is essentially one "master" table (CatalogItems) with one or more "subtables" (CDs and Books in this example). You link the two subtables to the master table through the primary key of the master table.
Catalog Table
CatalogID | Price | Description | QuantityOnHand |
CDs
CatalogID | PlayingTime | NumOfTracks | Artist | Label |
Books
CatalogID | PageCount | Author | PublishDate | ISBN |
It may take a few minutes for this design to sink in. As a comparison, here's what the proposed database table would look like as a single monolithic table.
The one-to-one relationship has saved us from doubling the number of fields in the Catalog table and, more importantly, helped us break the database into more discrete entities. In this scenario, we can get all the general information about an item from the Catalog table and can use the primary key of that table to pull up the appropriate information from the subtable.
Finally, there is the many-to-many table. This relationship is a little more complex than the one-to-many because, in addition to the two tables of data, we need another table to join the two tables of interest together. That's right, we're adding a table to the database -- but it is a simple table and saves us lots of effort down the road. As an example, let's say you want to add the ability to search for CDs by the musicians on any given song. From the musician side, you have one musician related to many songs.
Musician Table
MusicianID | MusicianName |
44 | Paul McCartney |
Song Table
SongID | MusicianID | SongName |
200 | 44 | Sgt. Pepper's Lonely Heart's Club Band |
201 | 44 | Ebony and Ivory |
But from the song side, you potentially have a song related to many musicians. The following visual represents that situation.
Song Table
SongID | SongName |
200 | Sgt. Pepper's Lonely Heart's Club Band |
Musician Table
MusicianID | SongID | MusicianName |
43 | 200 | John Lennon |
44 | 200 | Paul McCartney |
These two tables work individually, but when you try to put them together you end up with this mish-mash table.
Song Table
SongID | MusicianID | SongName |
200 | 43 | Sgt. Pepper's Lonely Heart's Club Band |
200 | 44 | Sgt. Pepper's Lonely Heart's Club Band |
201 | 44 | Ebony and Ivory |
Musician Table
MusicianID | SongID | MusicianName |
43 | 200 | John Lennon |
44 | 200 | Paul McCartney |
44 | 201 | Paul McCartney |
This has saved us nothing -- in fact, it has complicated the structure by introducing lots of redundant data to manage. The way to handle this situation is to create two one-to-many relationships involving a linking table which we'll call Song_Musician, since it links those tables. We create a one-to-many from Song to Song_Musician since one song will have 0-N musicians and then another one-to-many from Musician to Song_Musician since any one musician will be in one or more songs. The results look like the following:
Musician Table
MusicianID | MusicianName |
43 | John Lennon |
44 | Paul McCartney |
Song_Musician Table
SongID | MusicianID |
200 | 43 |
200 | 44 |
201 | 44 |
Song Table
SongID | SongName |
200 | Sgt. Pepper's Lonely Heart's Club Band |
201 | Ebony and Ivory |
This time around, all of the redundant data is in the Song_Musician table, which is only two columns of integers. Any changes to the structure of the Song or Musician table remain independent of their relationship, which is precisely what we're after.
After our whirlwind two-week tour of database relationships, we're now ready to jump into the SQL JOIN statement, which is the core of what relational databases are all about. Until then, feel free to contact me with comments, questions, and criticisms aboutSQL.
John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.
oreillynet.com Copyright © 2003 O'Reilly & Associates, Inc.