Languages were first invented by humans, for the benefit of humans. In the annals of computer science, this fact has occasionally been forgotten.[2] Since Perl was designed (loosely speaking) by an occasional linguist, it was designed to work smoothly in the same ways that natural language works smoothly. Naturally, there are many aspects to this, since natural language works well at many levels simultaneously. We could enumerate many of these linguistic principles here, but the most important principle of language design is that easy things should be easy, and hard things should be possible. (Actually, that's two principles.) They may seem obvious to you, but many computer languages fail at one or the other.
[2] More precisely, this fact has occasionally been remembered.
Natural languages are good at both because people are continually trying to express both easy things and hard things, so the language evolves to handle both. Perl was designed first of all to evolve, and indeed it has evolved. Many people have contributed to the evolution of Perl over the years. We often joke that a camel is a horse designed by a committee, but if you think about it, the camel is pretty well adapted for life in the desert. The camel has evolved to be relatively self-sufficient. (On the other hand, the camel has not evolved to smell good. Neither has Perl.) This is one of the many strange reasons we picked the camel to be Perl's mascot, but it doesn't have much to do with linguistics.
Now when someone utters the word "linguistics", many folks focus in on one of two things. Either they think of words, or they think of sentences. But words and sentences are just two handy ways to "chunk" speech. Either may be broken down into smaller units of meaning or combined into larger units of meaning. And the meaning of any unit depends heavily on the syntactic, semantic, and pragmatic context in which the unit is located. Natural language has words of various sorts: nouns and verbs and such. If someone says "dog" in isolation, you think of it as a noun, but you can also use the word in other ways. That is, a noun can function as a verb, an adjective, or an adverb when the context demands it. If you dog a dog during the dog days of summer, you'll be a dog tired dogcatcher.[3]
[3] And you're probably dog tired of all this linguistics claptrap. But we'd like you to understand why Perl is different from the typical computer language, doggone it!
Perl also evaluates words differently in various contexts. We will see how it does that later. Just remember that Perl is trying to understand what you're saying, like any good listener does. Perl works pretty hard to try to keep up its end of the bargain. Just say what you mean, and Perl will usually "get it". (Unless you're talking nonsense, of course--the Perl parser understands Perl a lot better than either English or Swahili.)
But back to nouns. A noun can name a particular object, or it can name a class of objects generically without specifying which one is currently being referred to. Most computer languages make this distinction, only we call the particular one a value and the generic one a variable. A value just exists somewhere, who knows where, but a variable gets associated with one or more values over its lifetime. So whoever is interpreting the variable has to keep track of that association. That interpreter may be in your brain or in your computer.
A variable is just a handy place to keep something, a place with a name, so you know where to find your special something when you come back looking for it later. As in real life, there are various kinds of places to store things, some of them rather private, and some of them out in public. Some places are temporary, and other places are more permanent. Computer scientists love to talk about the "scope" of variables, but that's all they mean by it. Perl has various handy ways of dealing with scoping issues, which you'll be happy to learn later when the time is right. Which is not yet. (Look up the adjectives local, my, and our in Chapter 29, "Functions", when you get curious, or see "Scoped Declarations" in Chapter 4, "Statements and Declarations".)
But a more immediately useful way of classifying variables is by what sort of data they can hold. As in English, Perl's primary type distinction is between singular and plural data. Strings and numbers are singular pieces of data, while lists of strings or numbers are plural. (And when we get to object-oriented programming, you'll find that the typical object looks singular from the outside but plural from the inside, like a class of students.) We call a singular variable a scalar, and a plural variable an array. Since a string can be stored in a scalar variable, we might write a slightly longer (and commented) version of our first example like this:

    $phrase = "Howdy, world!\n";          # Set a variable.
    print $phrase;                        # Print the variable.

Note that we did not have to predefine what kind of variable $phrase is. The $ character tells Perl that phrase is a scalar variable, that is, one containing a singular value. An array variable, by contrast, would start with an @ character. (It may help you to remember that a $ is a stylized "s", for "scalar", while @ is a stylized "a", for "array".)
Perl has some other variable types, with unlikely names like "hash", "handle", and "typeglob". Like scalars and arrays, these types of variables are also preceded by funny characters. For completeness, here are all the funny characters you'll encounter:
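As a rough sketch of how the most common funny characters look in practice, the snippet below declares one variable of each everyday kind. The names ($cents, @large, %interest, and the subroutine how) are purely illustrative, and the typeglob character (*) is left out because it rarely turns up in beginner code.

```perl
use strict;
use warnings;

my $cents    = 42;                 # $ marks a scalar: one value
my @large    = (10, 20, 30);       # @ marks an array: values keyed by number
my %interest = (savings => 0.02);  # % marks a hash: values keyed by string

sub how { return "like this" }     # & marks a subroutine name

# A single element is itself a scalar, so it is written with a $.
my $second = $large[1];            # second element of @large
my $rate   = $interest{savings};   # value stored under the key "savings"
my $answer = &how();               # call the subroutine by its & name

print "$cents $second $rate $answer\n";
```

Notice that the funny character describes what you are getting back, not what container it came from: one element of @large is written $large[1].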
Some language purists point to these funny characters as a reason to abhor Perl. This is superficial. These characters have many benefits, not least of which is that variables can be interpolated into strings with no additional syntax. Perl scripts are also easy to read (for people who have bothered to learn Perl!) because the nouns stand out from verbs. And new verbs can be added to the language without breaking old scripts. (We told you Perl was designed to evolve.) And the noun analogy is not frivolous--there is ample precedent in English and other languages for requiring grammatical noun markers. It's how we think! (We think.)
From our earlier example, you can see that scalars may be assigned a new value with the = operator, just as in many other computer languages. Scalar variables can be assigned any form of scalar value: integers, floating-point numbers, strings, and even esoteric things like references to other variables, or to objects. There are many ways of generating these values for assignment.
As in the Unix[4] shell, you can use different quoting mechanisms to make different kinds of values. Double quotation marks (double quotes) do variable interpolation[5] and backslash interpolation (such as turning \n into a newline) while single quotes suppress interpolation. And backquotes (the ones leaning to the left) will execute an external program and return the output of the program, so you can capture it as a single string containing all the lines of output.

    $answer = 42;                # an integer
    $pi = 3.14159265;            # a "real" number
    $avocados = 6.02e23;         # scientific notation
    $pet = "Camel";              # string
    $sign = "I love my $pet";    # string with interpolation
    $cost = 'It costs $100';     # string without interpolation
    $thence = $whence;           # another variable's value
    $salsa = $moles * $avocados; # a gastrochemical expression
    $exit = system("vi $file");  # numeric status of a command
    $cwd = `pwd`;                # string output from a command

And while we haven't covered fancy values yet, we should point out that scalars may also hold references to other data structures, including subroutines and objects.

    $ary = \@myarray;            # reference to a named array
    $hsh = \%myhash;             # reference to a named hash
    $sub = \&mysub;              # reference to a named subroutine

    $ary = [1,2,3,4,5];          # reference to an unnamed array
    $hsh = {Na => 19, Cl => 35}; # reference to an unnamed hash
    $sub = sub { print $state }; # reference to an unnamed subroutine

    $fido = new Camel "Amelia";  # reference to an object
[4] Here and elsewhere, when we say Unix, we mean any operating system resembling Unix, including BSD, Linux, and, of course, Unix.
[5] Sometimes called "substitution" by shell programmers, but we prefer to reserve that word for something else in Perl. So please call it interpolation. We're using the term in the textual sense ("this passage is a Gnostic interpolation") rather than in the mathematical sense ("this point on the graph is an interpolation between two other points").
If you use a variable that has never been assigned a value, the uninitialized variable automatically springs into existence as needed. Following the principle of least surprise, the variable is created with a null value, either "" or 0. Depending on where you use them, variables will be interpreted automatically as strings, as numbers, or as "true" and "false" values (commonly called Boolean values). Remember how important context is in human languages. In Perl, various operators expect certain kinds of singular values as parameters, so we will speak of those operators as "providing" or "supplying" a scalar context to those parameters. Sometimes we'll be more specific, and say it supplies a numeric context, a string context, or a Boolean context to those parameters. (Later we'll also talk about list context, which is the opposite of scalar context.) Perl will automatically convert the data into the form required by the current context, within reason. For example, suppose you said this:

    $camels = '123';
    print $camels + 1, "\n";

The original value of $camels is a string, but it is converted to a number to add 1 to it, and then converted back to a string to be printed out as 124. The newline, represented by "\n", is also in string context, but since it's already a string, no conversion is necessary. But notice that we had to use double quotes there--using single quotes to say '\n' would result in a two-character string consisting of a backslash followed by an "n", which is not a newline by anybody's definition.
So, in a sense, double quotes and single quotes are yet another way of specifying context. The interpretation of the innards of a quoted string depends on which quotes you use. (Later, we'll see some other operators that work like quotes syntactically but use the string in some special way, such as for pattern matching or substitution. These all work like double-quoted strings too. The double-quote context is the "interpolative" context of Perl, and is supplied by many operators that don't happen to resemble double quotes.)
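To make the last few paragraphs concrete, here is a small sketch that puts the same string into a numeric context, a string context, and the two kinds of quotes. The variable names other than $camels are made up for the illustration, and the concatenation operator (.) is borrowed from later in the book.

```perl
use strict;
use warnings;

my $camels = '123';                 # starts life as a string

my $sum    = $camels + 1;           # + supplies numeric context: 124
my $text   = $camels . 1;           # . supplies string context: "1231"

my $double = "$camels camels\n";    # variable and \n are both interpolated
my $single = '$camels camels\n';    # everything is taken literally

# The lengths differ because only the double-quoted string was interpolated.
print "$sum $text ", length($double), " ", length($single), "\n";
```

The same scalar gives 124 or "1231" depending only on which operator is asking, which is the sense in which the operator "supplies" the context.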
Similarly, a reference behaves as a reference when you give it a "dereference" context, but otherwise acts like a simple scalar value. For example, we might say:

    $fido = new Camel "Amelia";
    if (not $fido) { die "dead camel"; }
    $fido->saddle();

Here we create a reference to a Camel object and put it into the variable $fido. On the next line, we test $fido as a scalar Boolean to see if it is "true", and we throw an exception (that is, we complain) if it is not true, which in this case would mean that the new Camel constructor failed to make a proper Camel object. But on the last line, we treat $fido as a reference by asking it to look up the saddle() method for the object held in $fido, which happens to be a Camel, so Perl looks up the saddle() method for Camel objects. More about that later. For now, just remember that context is important in Perl because that's how Perl knows what you want without your having to say it explicitly, as many other computer languages force you to do.
Some kinds of variables hold multiple values that are logically tied together. Perl has two types of multivalued variables: arrays and hashes. In many ways, these behave like scalars--they spring into existence with nothing in them when needed, for instance. But they are different from scalars in that, when you assign to them, they supply a list context to the right side of the assignment rather than a scalar context.
Arrays and hashes also differ from each other. You'd use an array when you want to look something up by number. You'd use a hash when you want to look something up by name. The two concepts are complementary. You'll often see people using an array to translate month numbers into month names, and a corresponding hash to translate month names back into month numbers. (Though hashes aren't limited to holding only numbers. You could have a hash that translates month names to birthstone names, for instance.)
An array is an ordered list of scalars, accessed[6] by the scalar's position in the list. The list may contain numbers, or strings, or a mixture of both. (It might also contain references to subarrays or subhashes.) To assign a list value to an array, you simply group the values together (with a set of parentheses):

    @home = ("couch", "chair", "table", "stove");

Conversely, if you use @home in a list context, such as on the right side of a list assignment, you get back out the same list you put in. So you could set four scalar variables from the array like this:

    ($potato, $lift, $tennis, $pipe) = @home;

These are called list assignments. They logically happen in parallel, so you can swap two variables by saying:

    ($alpha,$omega) = ($omega,$alpha);

As in C, arrays are zero-based, so while you would talk about the first through fourth elements of the array, you would get to them with subscripts 0 through 3.[7] Array subscripts are enclosed in square brackets [like this], so if you want to select an individual array element, you would refer to it as $home[n], where n is the subscript (one less than the element number) you want. See the example that follows. Since the element you are dealing with is a scalar, you always precede it with a $.
[6] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
[7] If this seems odd to you, just think of the subscript as an offset, that is, the count of how many array elements come before it. Obviously, the first element doesn't have any elements before it, and so has an offset of 0. This is how computers think. (We think.)
If you want to assign to one array element at a time, you could write the earlier assignment as:

    $home[0] = "couch";
    $home[1] = "chair";
    $home[2] = "table";
    $home[3] = "stove";

Since arrays are ordered, you can do various useful operations on them, such as the stack operations push and pop. A stack is, after all, just an ordered list, with a beginning and an end. Especially an end. Perl regards the end of your array as the top of a stack. (Although most Perl programmers think of an array as horizontal, with the top of the stack on the right.)
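A minimal sketch of those stack operations, reusing the @home array from above (the extra "lamp" and the helper variables $last and $count are invented for the example):

```perl
use strict;
use warnings;

my @home = ("couch", "chair", "table", "stove");

push @home, "lamp";          # add an element at the end (the top of the stack)
my $last  = pop @home;       # remove and return that same end element
my $count = scalar @home;    # an array in scalar context yields its length

# A subscript of -1 counts from the end, so it names the current top.
print "$last $count $home[-1]\n";
```

After the push and the pop cancel out, the array is back to its original four elements, with "stove" on top.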
A hash is an unordered set of scalars, accessed[8] by some string value that is associated with each scalar. For this reason hashes are often called associative arrays. But that's too long for lazy typists to type, and we talk about them so often that we decided to name them something short and snappy. The other reason we picked the name "hash" is to emphasize the fact that they're disordered. (They are, coincidentally, implemented internally using a hash-table lookup, which is why hashes are so fast, and stay so fast no matter how many values you put into them.) You can't push or pop a hash though, because it doesn't make sense. A hash has no beginning or end. Nevertheless, hashes are extremely powerful and useful. Until you start thinking in terms of hashes, you aren't really thinking in Perl. Figure 1-1 shows the ordered elements of an array and the unordered (but named) elements of a hash.
[8] Or keyed, or indexed, or subscripted, or looked up. Take your pick.
Since the keys to a hash are not automatically implied by their position, you must supply the key as well as the value when populating a hash. You can still assign a list to it like an ordinary array, but each pair of items in the list will be interpreted as a key and a value. Since we're dealing with pairs of items, hashes use the funny character % to mark hash names. (If you look carefully at the % character, you can see the key and the value with a slash between them. It may help to squint.)
Suppose you wanted to translate abbreviated day names to the corresponding full names. You could write the following list assignment:

    %longday = ("Sun", "Sunday", "Mon", "Monday", "Tue", "Tuesday",
                "Wed", "Wednesday", "Thu", "Thursday", "Fri",
                "Friday", "Sat", "Saturday");

But that's rather difficult to read, so Perl provides the => (equals sign, greater-than sign) sequence as an alternative separator to the comma. Using this syntactic sugar (and some creative formatting), it is much easier to see which strings are the keys and which strings are the associated values.

    %longday = (
        "Sun" => "Sunday",
        "Mon" => "Monday",
        "Tue" => "Tuesday",
        "Wed" => "Wednesday",
        "Thu" => "Thursday",
        "Fri" => "Friday",
        "Sat" => "Saturday",
    );

Not only can you assign a list to a hash, as we did above, but if you mention a hash in list context, it'll convert the hash back to a list of key/value pairs, in a weird order. This is occasionally useful. More often people extract a list of just the keys, using the (aptly named) keys function. The key list is also unordered, but can easily be sorted if desired, using the (aptly named) sort function. Then you can use the ordered keys to pull out the corresponding values in the order you want.
Because hashes are a fancy kind of array, you select an individual hash element by enclosing the key in braces (those fancy brackets also known as "curlies"). So, for example, if you want to find out the value associated with Wed in the hash above, you would use $longday{"Wed"}. Note again that you are dealing with a scalar value, so you use $ on the front, not %, which would indicate the entire hash.
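Putting keys, sort, and the brace subscript together might look something like the sketch below. The foreach loop and the @abbrevs and $abbr variables are not part of the discussion above and are used here only to walk the sorted keys.

```perl
use strict;
use warnings;

my %longday = (
    "Sun" => "Sunday",
    "Mon" => "Monday",
    "Tue" => "Tuesday",
    "Wed" => "Wednesday",
    "Thu" => "Thursday",
    "Fri" => "Friday",
    "Sat" => "Saturday",
);

# keys returns the abbreviations in no particular order; sort makes the
# order predictable (alphabetical by default, not calendar order).
my @abbrevs = sort keys %longday;

foreach my $abbr (@abbrevs) {
    print "$abbr => $longday{$abbr}\n";   # look up each value by its key
}
```

Note that the default sort is alphabetical, so Fri comes out first. Getting the days back in calendar order would take an array, which is exactly the kind of job arrays are for.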
Linguistically, the relationship encoded in a hash is genitive or possessive, like the word "of" in English, or like "'s". The wife of Adam is Eve, so we write:
$wife{"Adam"} = "Eve";
Arrays and hashes are lovely, simple, flat data structures. Unfortunately, the world does not always cooperate with our attempts to oversimplify. Sometimes you need to build not-so-lovely, not-so-simple, not-so-flat data structures. Perl lets you do this by pretending that complicated values are really simple ones. To put it the other way around, Perl lets you manipulate simple scalar references that happen to refer to complicated arrays and hashes. We do this all the time in natural language when we use a simple singular noun like "government" to represent an entity that is completely convoluted and inscrutable. Among other things.
To extend our previous example, suppose we want to switch from talking about Adam's wife to Jacob's wife. Now, as it happens, Jacob had four wives. (Don't try this at home.) In trying to represent this in Perl, we find ourselves in the odd situation where we'd like to pretend that Jacob's four wives were really one wife. (Don't try this at home, either.) You might think you could write it like this:

    $wife{"Jacob"} = ("Leah", "Rachel", "Bilhah", "Zilpah");        # WRONG

But that wouldn't do what you want, because even parentheses and commas are not powerful enough to turn a list into a scalar in Perl. (Parentheses are used for syntactic grouping, and commas for syntactic separation.) Rather, you need to tell Perl explicitly that you want to pretend that a list is a scalar. It turns out that square brackets are powerful enough to do that:

    $wife{"Jacob"} = ["Leah", "Rachel", "Bilhah", "Zilpah"];        # ok

That statement creates an unnamed array and puts a reference to it into the hash element $wife{"Jacob"}. So we have a named hash containing an unnamed array. This is how Perl deals with both multidimensional arrays and nested data structures. As with ordinary arrays and hashes, you can also assign individual elements, like this:

    $wife{"Jacob"}[0] = "Leah";
    $wife{"Jacob"}[1] = "Rachel";
    $wife{"Jacob"}[2] = "Bilhah";
    $wife{"Jacob"}[3] = "Zilpah";

You can see how that looks like a multidimensional array with one string subscript and one numeric subscript. To see something that looks more tree-structured, like a nested data structure, suppose we wanted to list not only Jacob's wives but all the sons of each of his wives. In this case we want to treat a hash as a scalar. We can use braces for that. (Inside each hash value we'll use square brackets to represent arrays, just as we did earlier. But now we have an array in a hash in a hash.)

    $kids_of_wife{"Jacob"} = {
        "Leah"   => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
        "Rachel" => ["Joseph", "Benjamin"],
        "Bilhah" => ["Dan", "Naphtali"],
        "Zilpah" => ["Gad", "Asher"],
    };

That would be more or less equivalent to saying:

    $kids_of_wife{"Jacob"}{"Leah"}[0]   = "Reuben";
    $kids_of_wife{"Jacob"}{"Leah"}[1]   = "Simeon";
    $kids_of_wife{"Jacob"}{"Leah"}[2]   = "Levi";
    $kids_of_wife{"Jacob"}{"Leah"}[3]   = "Judah";
    $kids_of_wife{"Jacob"}{"Leah"}[4]   = "Issachar";
    $kids_of_wife{"Jacob"}{"Leah"}[5]   = "Zebulun";
    $kids_of_wife{"Jacob"}{"Rachel"}[0] = "Joseph";
    $kids_of_wife{"Jacob"}{"Rachel"}[1] = "Benjamin";
    $kids_of_wife{"Jacob"}{"Bilhah"}[0] = "Dan";
    $kids_of_wife{"Jacob"}{"Bilhah"}[1] = "Naphtali";
    $kids_of_wife{"Jacob"}{"Zilpah"}[0] = "Gad";
    $kids_of_wife{"Jacob"}{"Zilpah"}[1] = "Asher";

You can see from this that adding a level to a nested data structure is like adding another dimension to a multidimensional array. Perl lets you think of it either way, but the internal representation is the same.
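Getting data back out of such a structure works the same way in reverse. The sketch below is only one way to do it. It relies on the dereferencing forms @{ ... } and %{ ... }, which this chapter has not introduced, to turn a reference back into the array or hash it refers to. The variables $second, @rachels_sons, $mother, and @sons are invented for the example.

```perl
use strict;
use warnings;

my %kids_of_wife;
$kids_of_wife{"Jacob"} = {
    "Leah"   => ["Reuben", "Simeon", "Levi", "Judah", "Issachar", "Zebulun"],
    "Rachel" => ["Joseph", "Benjamin"],
    "Bilhah" => ["Dan", "Naphtali"],
    "Zilpah" => ["Gad", "Asher"],
};

# One element, reached by chaining subscripts through each level.
my $second = $kids_of_wife{"Jacob"}{"Leah"}[1];

# A whole inner array: @{ ... } dereferences the array reference.
my @rachels_sons = @{ $kids_of_wife{"Jacob"}{"Rachel"} };

# Walk the inner hash: %{ ... } dereferences the hash reference.
foreach my $mother (sort keys %{ $kids_of_wife{"Jacob"} }) {
    my @sons = @{ $kids_of_wife{"Jacob"}{$mother} };
    print "$mother: @sons\n";
}
```

The chained subscripts read left to right just like the genitive in English: the second of the kids of Leah of Jacob.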
The important point here is that Perl lets you pretend that a complex data structure is a simple scalar. On this simple kind of encapsulation, Perl's entire object-oriented structure is built. When we earlier invoked the Camel constructor like this:

    $fido = new Camel "Amelia";

we created a Camel object that is represented by the scalar $fido. But the inside of the Camel is more complicated. As well-behaved object-oriented programmers, we're not supposed to care about the insides of Camels (unless we happen to be the people implementing the methods of the Camel class). But generally, an object like a Camel would consist of a hash containing the particular Camel's attributes, such as its name ("Amelia" in this case, not "fido"), and the number of humps (which we didn't specify, but probably defaults to 1; check the front cover).
If your head isn't spinning a bit from reading that last section, then you have an unusual head. People don't generally like to deal with complex data structures, whether governmental or genealogical. So in our natural languages, we have many ways of sweeping complexity under the carpet. Many of these fall into the category of topicalization, which is just a fancy linguistics term for agreeing with someone about what you're going to talk about (and by exclusion, what you're probably not going to talk about). This happens on many levels in language. On a high level, we divide ourselves up into various subcultures that are interested in various subtopics and establish sublanguages that talk primarily about those subtopics. The lingo of the doctor's office ("indissoluable asphyxiant") is different from the lingo of the chocolate factory ("everlasting gobstopper"). Most of us automatically switch contexts as we go from one lingo to another.
On a conversational level, the context switch has to be more explicit, so our language gives us many ways of saying what we're about to say. We put titles on our books and headers on our sections. On our sentences, we put quaint phrases like "In regard to your recent query" or "For all X". Usually, though, we just say things like, "You know that dangley thingy that hangs down in the back of your throat?"
Perl also has several ways of topicalizing. One important topicalizer is the package declaration. Suppose you want to talk about Camels in Perl. You'd likely start off your Camel module by saying:

    package Camel;

This has several notable effects. One of them is that Perl will assume from this point on that any unspecified verbs or nouns are about Camels. It does this by automatically prefixing any global name with the module name "Camel::". So if you say:

    package Camel;
    $fido = &fetch();

then the real name of $fido is $Camel::fido (and the real name of &fetch is &Camel::fetch, but we're not talking about verbs yet). This means that if some other module says:

    package Dog;
    $fido = &fetch();

Perl won't get confused, because the real name of this $fido is $Dog::fido, not $Camel::fido. A computer scientist would say that a package establishes a namespace. You can have as many namespaces as you like, but since you're only in one of them at a time, you can pretend that the other namespaces don't exist. That's how namespaces simplify reality for you. Simplification is based on pretending. (Of course, so is oversimplification, which is what we're doing in this chapter.)
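The two $fido variables can be seen side by side in a sketch like the following. The fetch routines and their return values are made up for the illustration. Each package is wrapped in its own block, and its $fido is declared with our, so that the example also runs under the strict pragma.

```perl
use strict;
use warnings;

{
    package Camel;
    sub fetch { return "a camel" }   # really &Camel::fetch
    our $fido = fetch();             # really $Camel::fido
}

{
    package Dog;
    sub fetch { return "a dog" }     # really &Dog::fetch
    our $fido = fetch();             # really $Dog::fido
}

package main;

# From outside, the fully qualified names tell the two apart.
print "$Camel::fido, $Dog::fido\n";
```

Inside each package the short name $fido is all you need. The long names matter only when you step outside and want to say which $fido you mean.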
Now it's important to keep your nouns straight, but it's just as important to keep your verbs straight. It's nice that &Camel::fetch is not confused with &Dog::fetch within the Camel and Dog namespaces, but the really nice thing about packages is that they classify your verbs so that other packages can use them. When we said:

    $fido = new Camel "Amelia";

we were actually invoking the &new verb in the Camel package, which has the full name of &Camel::new. And when we said:

    $fido->saddle();

we were invoking the &Camel::saddle routine, because $fido remembers that it is pointing to a Camel. This is how object-oriented programming works.
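For the curious, here is one way a very small Camel package could be written so that those two calls work. It is only a sketch and not the definition of any real module. The attribute names (name, humps, saddled) are assumptions, and the built-in bless function, which is what makes $fido "remember" its package, has not been introduced yet. The constructor is called in the arrow form Camel->new("Amelia"), which invokes the same &Camel::new routine as new Camel "Amelia".

```perl
use strict;
use warnings;

package Camel;

# Constructor: build a hash of attributes and mark it as a Camel.
sub new {
    my ($class, $name) = @_;
    my $self = { name => $name, humps => 1 };   # hypothetical attributes
    return bless $self, $class;                 # the hash now knows its package
}

# Method: receives the object itself as its first argument.
sub saddle {
    my ($self) = @_;
    $self->{saddled} = 1;
    return $self;
}

package main;

my $fido = Camel->new("Amelia");
if (not $fido) { die "dead camel"; }
$fido->saddle();                                # finds &Camel::saddle

print "$fido->{name} has $fido->{humps} hump(s)\n";
```

From the outside $fido is a single scalar. The hash inside is the business of whoever maintains the Camel package.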
When you say package Camel, you're starting a new package. But sometimes you just want to borrow the nouns and verbs of an existing package. Perl lets you do that with a use declaration, which not only borrows verbs from another package, but also checks that the module you name is loaded in from disk. In fact, you must say something like:

    use Camel;

before you say:

    $fido = new Camel "Amelia";

because otherwise Perl wouldn't know what a Camel is.
The interesting thing is that you yourself don't really need to know what a Camel is, provided you can get someone else to write the Camel module for you. Even better would be if someone had already written the Camel module for you. It could be argued that the most powerful thing about Perl is not Perl itself, but CPAN (Comprehensive Perl Archive Network), which contains myriads of modules that accomplish many different tasks that you don't have to know how to do. You just have to download it and know how to say:

    use Some::Cool::Module;

and then use the verbs from that module in a manner appropriate to the topic under discussion.
So, like topicalization in a natural language, topicalization in Perl "warps" the language that you'll use from there to the end of the program. In fact, some of the built-in modules don't actually introduce verbs at all, but simply warp the Perl language in various useful ways. These special modules we call pragmas. For instance, you'll often see people use the pragma strict, like this:

    use strict;

What the strict module does is tighten up some of the rules so that you have to be more explicit about various things that Perl would otherwise guess about, such as how you want your variables to be scoped. Making things explicit is helpful when you're working on large projects. By default Perl is optimized for small projects, but with the strict pragma, Perl is also good for large projects that need to be more maintainable. Since you can add the strict pragma at any time, Perl is also good for evolving small projects into large ones, even when you didn't expect that to happen. Which is usually.
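A rough sketch of what that tightening looks like for variable scoping follows. The variable names are invented, and the undeclared assignment is compiled inside a string eval purely so that the example can show the complaint instead of refusing to run at all.

```perl
use strict;
use warnings;

my $camels = 3;            # declared with my, so strict is satisfied
$camels = $camels + 1;

# Under strict, a variable that was never declared is a compile-time error.
# The string eval inherits the strict pragma, compiles the text at run time,
# and leaves the error message in $@ rather than halting the program.
my $ok = eval q{ $undeclared = 1; 1 };

print "strict said: $@" unless $ok;
```

Without the pragma, the assignment to $undeclared would quietly create a global variable, which is handy in a one-liner and a hazard in a large program.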
As is typical of your typical imperative computer language, many of the verbs in Perl are commands: they tell the Perl interpreter to do something. On the other hand, as is typical of a natural language, the meanings of Perl verbs tend to mush off in various directions depending on the context. A statement starting with a verb is generally purely imperative and evaluated entirely for its side effects. (We sometimes call these verbs procedures, especially when they're user-defined.) A frequently seen built-in command (in fact, you've seen it already) is the print command:

    print "Adam's wife is $wife{'Adam'}.\n";

This has the side effect of producing the desired output:

    Adam's wife is Eve.
But there are other "moods" besides the imperative mood. Some verbs are for asking questions and are useful in conditionals such as if statements. Other verbs translate their input parameters into return values, just as a recipe tells you how to turn raw ingredients into something (hopefully) edible. We tend to call these verbs functions, in deference to generations of mathematicians who don't know what the word "functional" means in normal English.
An example of a built-in function would be the exponential function:

    $e = exp(1);   # 2.718281828459 or thereabouts

But Perl doesn't make a hard distinction between procedures and functions. You'll find the terms used interchangeably. Verbs are also sometimes called operators (when built-in), or subroutines (when user-defined).[9] But call them whatever you like--they all return a value, which may or may not be a meaningful value, which you may or may not choose to ignore.
[9] Historically, Perl required you to put an ampersand character (&) on any calls to user-defined subroutines (see $fido = &fetch(); earlier). But with Perl version 5, the ampersand became optional, so that user-defined verbs can now be called with the same syntax as built-in verbs ($fido = fetch();). We still use the ampersand when talking about the name of the routine, such as when we take a reference to it ($fetcher = \&fetch;). Linguistically speaking, you can think of the ampersand form &fetch as an infinitive, "to fetch", or the similar form "do fetch". But we rarely say "do fetch" when we can just say "fetch". That's the real reason we dropped the mandatory ampersand in Perl 5.
As we go on, you'll see additional examples of how Perl behaves like a natural language. But there are other ways to look at Perl too. We've already sneakily introduced some notions from mathematical language, such as subscripts, addition, and the exponential function. But Perl is also a control language, a glue language, a prototyping language, a text-processing language, a list-processing language, and an object-oriented language. Among other things.
But Perl is also just a plain old computer language. And that's how we'll look at it next.
Copyright © 2002 O'Reilly & Associates. All rights reserved.