MySQL is a complex piece of software that may seem overwhelming when you're first trying to learn it. This article describes a set of best practices for MySQL administrators, architects, and developers that should help in the security, maintenance, and performance of a MySQL installation.
The first thing you should do with a clean MySQL install is set a password for the root user:
[01:19:00] george@firenze$ mysqladmin -u root password
'somepassword'
Once you've set the password, change the name of the "root" user to something else. A hack attempt on a MySQL server might target the one user that exists on most systems, "root", both because it has superuser powers and because it is a known user. By changing the name of the "root" user, you make it more difficult for would-be hackers to try a brute-force attack. The following sequence of commands will rename the "root" user:
[01:25:29] george@firenze$ mysql -u root -p mysql
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 72 to server version: 4.0.0-alpha-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> UPDATE user set user = 'admin' where user = 'root';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2 Changed: 2 Warnings: 0
mysql> quit;
Bye
[01:25:51] george@firenze$ mysqladmin -u root -p reload
Enter password:
Of course, you may want to select a more creative alternative name than "admin".
MySQL has a pretty solid track record for security of a network service. Nevertheless, there simply is no good reason to expose MySQL directly to the Internet-- so don't do it. When you hide MySQL behind a firewall and enable communication to the server for only hosts running application servers and Web servers, you constrain the path of attack a would-be hacker might take.
First, MySQL should be installed under a special user ID such as "mysql". Second, no user on the system but MySQL should be able to access MySQL's data directories. If you grant access to those data directories to other users, you begin down a path of compromising the internal security built into MySQL. Even a database administrator should not be put into the "mysql" group. Instead, most DBA functions should pass through MySQL itself. For those few operations that need to happen at the file-system level, the DBA should login as the MySQL user.
It is true that MySQL supports binary data types. Just because you can, however, does not imply that you should. MySQL sends results to a client all at once. Consequently, any application parsing a result set with binary data needs to wait for each row to arrive before it can be processed. Furthermore, there is no real benefit to storing the binary data in MySQL.
A better approach to binary data is to store that data on the file system and store pointers to those files in MySQL. With this approach, you can actually stream the binary data in background threads while you process the result set.
This tip does not apply only to binary data; it applies to any kind of large data objects. The performance issues that plague binary data also plague character data. In other words, the reading of any part of a result set is done serially. You will take more notice with binary data since it is usually large. You will notice the problems equally with large character data. You do need to weigh the performance benefits of storing large character data on the file system against the ability to search against that data in the database.
MySQL provides many convenient additions to the ANSI standard that are very tempting for programmers. These additions include timesaving tools like multitable deletes and multirow inserts. When you rely on these features in a MySQL application, you limit the ability to adapt the application to any other database engine. In fact, you may make it impossible to port the application to another database without a significant rewrite. For maximal portability, you should therefore stick to ANSI SQL for your applications.
On the other hand, I do not want to give the impression that you should ignore these cool tools. They definitely have their place in MySQL maintenance. When you are working at the MySQL command line or you're building scripts specifically for the maintenance of a MySQL installation, it would be folly to ignore the advantages of these tools.
This "best practice" is related to the last one. MySQL provides a MySQL-specific tool
for generating unique sequences using the AUTO_INCREMENT
keyword. Specifically,
MySQL enables you to define one column in each table as auto-generating unique values
whenever you do an insert. Unfortunately, it is specific to MySQL and comes with some
important limitations:
You can have only one AUTO_INCREMENT
column per table.
You cannot have a unique sequence for multiple tables. For example, you cannot
use
You cannot easily determine from an application what values MySQL has automatically generated.
It is better to develop your own cross-platform approach to sequence generation. In the forthcoming O'Reilly book Java Best Practices: J2EE Edition, I describe such an approach. In short, this approach involves the creation of a special table in the database that can seed different sequences. You can have as many sequences as you like and even share a sequence across multiple columns--you simply access the sequence by name. An application then uses the seed from this special table to generate unique values in memory. There is no need to go back to the database until the list of available sequences for a seed is exhausted.
It is very difficult to maintain an application in which database code mingles with display code. An example of such a monstrosity is a JSP page that contains JDBC code. This situation should never happen.
Instead, applications are much easier to maintain when you divide application logic according to the model-view-controller (MVC) design pattern. This best practice applies both to Web programming and GUI application programming. In short, MVC forces you to split your code between the model (a component housing your database code), a view (a component that describes the user interface), and a controller (an object that handles user actions).
Normalization is the process of removing redundant data from your database design. By removing redundancies, you minimize the places where you have to maintain data consistency. The result is increased system maintainability. You should always take your data model to the Third Normal Form (3NF).
In some cases, normalization incurs performance penalties. These performance penalties do not occur, however, as often as people might lead you to believe. The process of optimizing the data model for performance by compromising its normalization is called denormalization. It is important to note that a data model that has not gone through normalization is not a denormalized database--it is instead unnormalized.
Denormalization is a very conscious, deliberate process by which normalizations to a data model are removed for specific performance goals. It should always occur after the data model is fully normalized and it should only occur when the benefits are readily provable.
Connecting to a database is an expensive operation. For client/server applications, this cost goes mostly without notice. Web servers and application servers, however, are constantly starting and ending user sessions with the database and therefore suffer if a new connection is made for each user session. Fortunately, most programming languages these days provide tools for pooling database connections. In other words, a connection pool enables you to reuse the same connection for multiple user sessions without constantly connecting and disconnecting. You should seek out and take advantage of this support in whatever your choice of programming language may be.
EXPLAIN SELECT
.EXPLAIN SELECT
is a critical SQL command to understand. Its output can help you
understand how your SQL will perform before you ever execute a single query with
them. It can also provide strong indications of where you need to make changes, such as with the
creation of indexes.
O'Reilly & Associates recently released (April 2002) Managing & Using MySQL, 2nd Edition.
Sample Chapter 13, Java, is available free online.
You can also look at the Table of Contents, the Index, and the Full Description of the book.
For more information, or to order the book, click here.
George Reese
(
oreillynet.com Copyright © 2003 O'Reilly & Associates, Inc.