start page | rating of books | rating of authors | reviews | copyrights

Book HomeManaging and Using MySQLSearch this book

3.7. Queries

The last common SQL command, SELECT, enables you to view the data in the database. This action is by far the most common action performed in SQL. While data entry and modifications do happen on occasion, most databases spend the vast majority of their lives serving up data for reading. The general form of the SELECT statement is as follows:

SELECT column1, column2, ..., columnN 
FROM table1, table2, ..., tableN
[WHERE clause]

This syntax is certainly the most common way to retrieve data from any SQL database. The SELECT statement enables you to identify the columns you want from one or more tables. The WHERE clause identifies the rows with the data you seek.

Of course, there are variations for performing complex and powerful queries. (We cover the full range of the SELECT syntax in Chapter 15.) The simplest form is:

SELECT 1;

This simple, though completely useless query returns a result set with a single row containing a single column with the value of 1. A more useful version of this query might be something like:

mysql> SELECT DATABASE( );
+------------+
| DATABASE( ) |
+------------+
| test       |
+------------+
1 row in set (0.01 sec)

The expression DATABASE( ) is a MySQL function that returns the name of the current database. (We will cover functions in more detail later in the chapter.) Nevertheless, you can see how simple SQL can provide a quick-and-dirty way of finding out important information.

Most of the time, however, you should use slightly more complex queries that help you pull data from a table in the database. The first part of a SELECT statement enumerates the columns you wish to retrieve. You may specify a * to say that you want to select all columns. The FROM clause specifies which tables those columns come from. The WHERE clause identifies the specific rows to be used and enables you to specify how to join two tables.

3.7.1. Joins

Joins put the "relational" in relational databases by enabling you to relate the data in one table with data in other tables. The basic form of a join is sometimes described as an inner join . Joining tables is a matter of specifying equality in columns from two tables:

SELECT book.title, author.name 
FROM author, book
WHERE book.author = author.id

This query pulls columns from two different tables where a relationship exists between rows in the two tables. Specifically, this query looks for situations in which the value of the author column in the book table matches the id value in the author table. Consider a database in which the book table looks like Table 3-3 and the author table looks like Table 3-4.

Table 3-3. A book table

ID

Title

Author

Pages

1

The Green Mile

4

894

2

Guards, Guards!

2

302

3

Imzadi

3

354

4

Gold

1

405

5

Howling Mad

3

294

Table 3-4. An author table

ID

Name

Citizen

1

Isaac Asimov

US

2

Terry Pratchett

UK

3

Peter David

US

4

Stephen King

US

5

Neil Gaiman

UK

An inner join creates a virtual table by combining the fields of both tables for rows that satisfy the query in both tables. In our example, the query specifies that the author field of the book table must be identical to the id field of the author table. The query's result would look like Table 3-5.

Table 3-5. Query results based on an inner join

Book title

Author name

The Green Mile

Stephen King

Guards, Guards!

Terry Pratchet

Imzadi

Peter David

Gold

Isaac Asimov

Howling Mad

Peter David

Neil Gaiman is nowhere to be found in these results. He is left out because there is no value for his author.id value found in the author column of the book table. In other words, he did not write any of the books in our database! An inner join contains only those rows that match the query exactly. We will discuss the concept of an outer join later in the chapter for situations in which we have an author in the database who does not have a book in the database.

3.7.2. Aliasing

When you use column names that are fully qualified with their table and column name, the names can grow to be quite unwieldy. In addition, when referencing SQL functions (which will be discussed later in the chapter), you will likely find it cumbersome to refer to the same function more than once within a statement. You can get around these issues by using aliases. An alias is usually a shorter and more descriptive way of referring to a cumbersome name. You can use it anywhere in the same SQL statement in place of the longer name. For example:

# A column alias
SELECT long_field_names_are_annoying AS myfield
FROM table_name
WHERE myfield = 'Joe'
# A table alias
SELECT people.names, tests.score 
FROM tests, really_long_people_table_name AS people

3.7.3. Ordering and Grouping

The results from a SELECT are, by default, indeterminate in the order they will appear. Fortunately, SQL provides some tools for imposing discipline on this seemingly random list: ordering and grouping.

3.7.3.1. Basic ordering

You can tell a database to order any results you see by a certain column. For example, if you specify that a query should order the results by last_name, then the results will appear alphabetized according to the last_name value. Ordering is handled by the ORDER BY clause:

SELECT last_name, first_name, age
FROM people
ORDER BY last_name, first_name

In this situation, we are ordering by two columns. You can order by any number of columns. You can also use the special ORDER BY RAND( ) clause to return results in a random order.

If you want to see things in reverse order, add the DESC (descending) keyword:

ORDER BY last_name DESC

The DESC keyword applies only to the field that comes directly before it. If you are sorting on multiple fields, only the field directly before DESC is reversed; the others are sorted in ascending order.

3.7.3.2. Localized sorting

Sorting is actually a complex problem for applications that need to run on computers all over the world. The rules for sorting strings vary from alphabet to alphabet, even when two alphabets use mostly the same symbols. MySQL handles the problem of sorting by making it dependent on the character set used by the MySQL engine. Out of the box, the default character set is ISO-8859-1 (Latin-1). MySQL uses the sorting rules for Swedish and Finnish with ISO-8859-1.

To change the sorting rules, you change the character set. First, you need to make sure the correct character set was compiled into the server when you compiled and installed MySQL. With the proper character set compiled into the server, you can change the default character set by launching the server with the argument --default-character-set=CHARSET.

Because of the simplicity of the English alphabet, the use of a single set of sorting rules MySQL associates with ISO-8859-1 does not affect English sorting. This is not true, however, for languages such as Swedish and German, which both use the ISO-8859-1 character set. Swedish sorts ä after z, while German sorts ä before a. The default rules therefore fail German users.

MySQL lets you address this problem by creating custom character sets. When you compile the driver, you can compile in support for whatever character sets you desire as long as you have a configuration file for that character set. This file contains the characters that make up the character set and the rules for sorting them. You can write your own or use the ones that come with MySQL.

TIP: The real problem here is that MySQL incorrectly associates sorting rules with character sets. A character set is nothing more than a grouping of characters with a related purpose. Nothing about the ISO-8859-1 character set implies sorting for Swedes, Italians, Germans, or anyone else. When working with MySQL, however, you need to remember that sorting rules are directly tied to the character set.

3.7.3.3. Grouping

Grouping lets you group rows with matching values for a specific column into a single row in order to operate on them together. You usually do this to perform aggregate functions on the results. We will go into functions a little later in the chapter.

Consider the following:

mysql> SELECT name, rank, salary FROM people;
+--------------+----------+--------+
| name         | rank     | salary |
+--------------+----------+--------+
| Jack Smith   | Private  |  23000 |
| Jane Walker  | General  | 125000 |
| June Sanders | Private  |  22000 |
| John Barker  | Sergeant |  45000 |
| Jim Castle   | Sergeant |  38000 |
+--------------+----------+--------+
5 rows in set (0.01 sec)

If you want to get a list of different ranks, you can use the GROUP BY clause to get a full account of the ranks:

mysql> SELECT rank FROM people GROUP BY rank;
+----------+
| rank     |
+----------+
| General  |
| Private  |
| Sergeant |
+----------+
3 rows in set (0.01 sec)

You should not, however, think of these results as simply a listing of the different ranks. The GROUP BY clause actually groups all of the rows matching the WHERE clause (in this case, every row) based on the GROUP BY clause. The two privates are thus grouped together into a single row with the rank Private. The two sergeants are similarly aggregated. With the individuals grouped according to rank, you can find out the average salary for each rank. Again, we will further discuss the functions you see in this example later in the chapter.

mysql> SELECT rank, AVG(salary) FROM people GROUP BY rank;
+----------+-------------+
| rank     | AVG(salary) |
+----------+-------------+
| General  | 125000.0000 |
| Private  |  22500.0000 |
| Sergeant |  41500.0000 |
+----------+-------------+
3 rows in set (0.04 sec)

Here you see the true power of grouping. This query uses an aggregate function, AVG( ), to operate on all of the rows grouped together for each row. In this case, the salaries of the two privates (23000 and 22000) are grouped together in the same row, and the AVG( ) function is applied to them.

The power of ordering and grouping combined with the utility of SQL functions enables you to do a great deal of data manipulation even before you retrieve the data from the server. However, you should take great care not to rely too heavily on this power. While it may seem more efficient to place as much processing load as possible onto the database server, this is not really the case. Your client application is dedicated to the needs of a particular client, while the server is shared by many clients. Because of the greater amount of work a server already has to do, it is almost always more efficient to place as little load as possible on the database server. MySQL may be the fastest database around, but you do not want to waste that speed on processing that can be handled by client applications.

3.7.4. Limiting Results

A WHERE clause is not the only way to constrain the results you see from a query. MySQL provides two other common mechanisms: HAVING and LIMIT.

You will most commonly use HAVING with the GROUP BY clause we just described. Like a WHERE clause, it defines your result set based on some set of calculations. Unlike a WHERE clause, it performs these calculations after your results have been retrieved from the tables in which they are stored. A WHERE clause, for example, scans the table in the database and pulls all records matching the WHERE clause. A HAVING clause, on the other hand, looks only at rows that have been pulled from a database after they have been extracted. The following query goes one step beyond our previous search for the average salary of different ranks in getting the average salaries only for ranks with an average salary greater than $100,000.

mysql> SELECT rank, AVG(salary) FROM people
     > GROUP BY rank HAVING AVG(salary) > 100000.00;
+----------+-------------+
| rank     | AVG(salary) |
+----------+-------------+
| General  | 125000.0000 |
+----------+-------------+
1 row in set (0.04 sec)

Restricting the result set in a WHERE clause would make no sense. If it were to be valid SQL, it would work on the entire table! Instead, we first want to perform the select and then find only those groups in the result set whose average salary is greater than $100,000. The HAVING clause enables us to perform that further restriction. More importantly, consider a case with both a WHERE clause and a HAVING clause:

mysql> SELECT rank, AVG(salary) FROM people
     > WHERE rank <> 'Private'
     > GROUP BY rank HAVING AVG(salary) > 100000.00;
+----------+-------------+
| rank     | AVG(salary) |
+----------+-------------+
| General  | 125000.0000 |
+----------+-------------+
1 row in set (0.02 sec)

Because the HAVING clause executes on the results of the query, the average is calculated only for generals and sergeants—not the excluded privates.

Sometimes an application is looking for only the first few rows that match a query. Limiting queries can help prevent bogging down the network with unwanted results. MySQL enables an application to limit the number of results through a LIMIT clause in a query:

SELECT * FROM people ORDER BY name LIMIT 10;

To get the last 10 people from the table, you can use the DESC keyword. If you want people from the middle, however, you have to get a bit trickier; you need to specify the number of the first record you want to see (record 0 is the first record, 1 the second) and the number of rows you want to see:

SELECT * FROM people ORDER BY name LIMIT 19, 30;

This sample displays records 20 through 49. The 19 in the LIMIT clause tells MySQL to start with record 19, which is the twentieth record. The 30 then tells MySQL to return the next 30 records.



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.