
JavaScript: The Definitive Guide

10.2. String Methods for Pattern Matching

So far we have discussed the grammar used to create regular expressions, but not how those regular expressions are actually used in JavaScript code. This section covers the methods of the String object that use regular expressions to perform pattern matching and search-and-replace operations. The sections that follow continue with the RegExp object and its methods and properties. Note that what follows is only an overview of the methods and properties related to regular expressions; as usual, complete details can be found in the core reference section of this book.

Strings support four methods that make use of regular expressions. The simplest is search( ). This method takes a regular expression argument and returns either the character position of the start of the first matching substring, or -1 if there is no match. For example, the following call returns 4:

"JavaScript".search(/script/i); 

If the argument to search( ) is not a regular expression, it is first converted to one by passing it to the RegExp constructor. search( ) does not support global searches -- it ignores the g flag of its regular expression argument.
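The two points above can be checked with a short sketch. The variable names are illustrative only; note that because a string argument is converted to a regular expression, a period in it acts as a wildcard rather than a literal character:

```javascript
var s = "JavaScript";

// A string argument is passed to the RegExp constructor first,
// so "a.a" behaves exactly like the pattern /a.a/.
var viaString = s.search("a.a");   // 1 ("ava" starts at position 1)
var viaRegExp = s.search(/a.a/);   // 1

// The g flag is ignored: only the position of the first match is returned.
var withG = s.search(/a/g);        // 1

// No match at all yields -1.
var noMatch = s.search(/python/i); // -1
```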

The replace( ) method performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second argument. It searches the string on which it is called for matches with the specified pattern. If the regular expression has the g flag set, the replace( ) method replaces all matches in the string with the replacement string; otherwise, it replaces only the first match it finds. If the first argument to replace( ) is a string rather than a regular expression, the method searches for that string literally rather than converting it to a regular expression with the RegExp( ) constructor, as search( ) does. As an example, we could use replace( ) as follows to provide uniform capitalization of the word "JavaScript" throughout a string of text:

// No matter how it is capitalized, replace it with the correct capitalization
text.replace(/javascript/gi, "JavaScript"); 
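As a rough illustration of the rules just described, the following sketch contrasts a global replacement, a nonglobal one, and a literal string pattern. The sample strings and variable names are made up for the example. Keep in mind that replace( ) returns a new string and leaves the original unchanged, so the result has to be stored somewhere:

```javascript
var text = "javascript, Javascript, and JAVASCRIPT";

// With the g flag, every match is replaced.
var all = text.replace(/javascript/gi, "JavaScript");
// "JavaScript, JavaScript, and JavaScript"

// Without the g flag, only the first match is replaced.
var first = text.replace(/javascript/i, "JavaScript");
// "JavaScript, Javascript, and JAVASCRIPT"

// A string pattern is matched literally: "." means a period here,
// and only its first occurrence is replaced.
var literal = "a.b.c".replace(".", "-");   // "a-b.c"

// As a regular expression, . matches any character,
// so the first character of the string is replaced instead.
var asRegExp = "a.b.c".replace(/./, "-");  // "-.b.c"
```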

replace( ) is more powerful than this, however. Recall that parenthesized subexpressions of a regular expression are numbered from left to right and that the regular expression remembers the text that each subexpression matches. If a $ followed by a digit appears in the replacement string, replace( ) replaces those two characters with the text that matched the specified subexpression. This is a very useful feature. We can use it, for example, to replace straight quotes in a string with curly quotes, simulated with ASCII characters:

// A quote is a quotation mark, followed by any number of
// non-quotation-mark characters (which we remember), followed
// by another quotation mark.
var quote = /"([^"]*)"/g;
// Replace the straight quotation marks with "curly quotes,"
// and leave the contents of the quote (stored in $1) unchanged.
text.replace(quote, "``$1''"); 
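To see the $ substitutions at work on concrete input, here is a small sketch. The sample strings are invented for illustration; the second call applies the same quote pattern as above to a short sentence:

```javascript
// Swap two remembered subexpressions: $1 is the first word, $2 the second.
var name = "Flanagan, David";
var swapped = name.replace(/(\w+),\s*(\w+)/, "$2 $1");
// "David Flanagan"

// The quote pattern from above, applied to a sample sentence.
var sentence = 'He said "stop" and then "go"';
var curly = sentence.replace(/"([^"]*)"/g, "``$1''");
// "He said ``stop'' and then ``go''"
```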

The replace( ) method has other important features as well, which are described in the "String.replace( )" reference page in the core reference section. Most notably, the second argument to replace( ) can be a function that dynamically computes the replacement string.
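A minimal sketch of a replacement function follows. The function is called once per match and receives the matched text followed by the text of each parenthesized subexpression; the parameter names used here (match, first, rest) are arbitrary, and the example of capitalizing each word is only one possible use:

```javascript
// Capitalize the first letter of every word.
// first holds the text matched by (\w), rest the text matched by (\w*).
var title = "the definitive guide".replace(/\b(\w)(\w*)/g,
    function (match, first, rest) {
        return first.toUpperCase() + rest;
    });
// "The Definitive Guide"
```

The return value of the function is used as the replacement for that particular match, which makes it possible to compute replacements that a fixed string with $ substitutions cannot express.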

The match( ) method is the most general of the String regular expression methods. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp( ) constructor) and returns an array that contains the results of the match. If the regular expression has the g flag set, the method returns an array of all matches that appear in the string. For example:

"1 plus 2 equals 3".match(/\d+/g)  // returns ["1", "2", "3"] 

If the regular expression does not have the g flag set, match( ) does not do a global search; it simply searches for the first match. However, match( ) returns an array even when it does not perform a global search. In this case, the first element of the array is the matching string, and any remaining elements are the substrings that matched the parenthesized subexpressions of the regular expression. Thus, if match( ) returns an array a, a[0] contains the complete match, a[1] contains the substring that matched the first parenthesized expression, and so on. To draw a parallel with the replace( ) method, a[n] holds the contents of $n.

For example, consider parsing a URL with the following code:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my home page at http://www.isp.com/~david";
var result = text.match(url);
if (result != null) {
    var fullurl = result[0];   // Contains "http://www.isp.com/~david"
    var protocol = result[1];  // Contains "http"
    var host = result[2];      // Contains "www.isp.com"
    var path = result[3];      // Contains "~david"
} 

Finally, there is one more feature of the match( ) method that you should know about. The array it returns has a length property, as all arrays do. When match( ) is invoked on a nonglobal regular expression, however, the returned array also has two other properties: the index property, which contains the character position within the string at which the match begins; and the input property, which is a copy of the target string. So in the previous code, the value of the result.index property would be 22, since the matched URL begins at character position 22 in the text (positions are counted from 0). The result.input property would hold the same string as the text variable. For a regular expression r that does not have the g flag set, calling s.match(r) returns the same value as r.exec(s). We'll discuss the RegExp.exec( ) method a little later in this chapter.
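The sketch below, using a made-up string and pattern, shows the extra properties of a nonglobal match and compares the result with exec( ). It also shows that the array returned for a global match carries no index property:

```javascript
var s = "1 plus 2 equals 3";
var r = /(\d+) equals (\d+)/;   // no g flag

var m = s.match(r);
// m[0] is "2 equals 3", m[1] is "2", m[2] is "3"
// m.index is 7 (where "2 equals 3" begins), m.input is the same string as s

var e = r.exec(s);
// e holds the same elements, index, and input as m

// With the g flag, the result is a plain list of all matches.
var g = s.match(/\d+/g);        // ["1", "2", "3"]; g.index is undefined
```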

The last of the regular expression methods of the String object is split( ). This method breaks the string on which it is called into an array of substrings, using the argument as a separator. For example:

"123,456,789".split(",");  // Returns ["123","456","789"] 

The split( ) method can also take a regular expression as its argument, which makes it considerably more powerful. For example, we can specify a separator that allows an arbitrary amount of whitespace on either side of the comma:

"1,2, 3 , 4 ,5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"] 

The split( ) method has other features as well. See the "String.split( )" entry in the core reference section for complete details.
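As a quick comparison, this sketch splits the same sample string with a plain string separator and with a regular expression. It also uses the optional second argument of split( ), which limits the number of elements returned; that argument is not covered in the text above and is shown here only as one example of the other features:

```javascript
var csv = "1,2, 3 , 4 ,5";

// A string separator keeps the surrounding whitespace in the pieces.
var rough = csv.split(",");              // ["1", "2", " 3 ", " 4 ", "5"]

// A regular expression separator can absorb the whitespace.
var clean = csv.split(/\s*,\s*/);        // ["1", "2", "3", "4", "5"]

// The optional second argument caps the length of the returned array.
var firstTwo = csv.split(/\s*,\s*/, 2);  // ["1", "2"]
```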




Copyright © 2003 O'Reilly & Associates. All rights reserved.