CS270 Language Blog - Perl: February 2020

Thursday, February 27, 2020

Binary Search

A binary search is an algorithm for searching for an item in a sorted list. It can be implemented in Perl for numeric data as follows:

sub binarySearch
{
	# $data will be an array reference
	my $data = shift;
	my $target = shift;
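	# // (defined-or) supplies the default only when no value was passed;
	# || would also replace a legitimate 0 passed by the recursive calls
	my $start = shift // 0;
	my $end = shift // $#$data;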
	# if start and end pointers cross over, $data does not contain $target
	return -1 if ($end < $start);
	# Perl always does floating point division,
	# so we have to tell it to convert to an integer
	my $mid = $start + int(($end - $start) / 2);
	if ($data->[$mid] == $target)
	{
		return $mid;
	}
	elsif ($data->[$mid] > $target)
	{
		return binarySearch($data, $target, $start, $mid - 1);
	}
	else
	{
		return binarySearch($data, $target, $mid + 1, $end);
	}
}

This subroutine handles the retrieval of arguments in a slightly different way than the subroutines we saw in the previous post. The shift function removes and returns the element at index 0 of a specified array or, if no array is specified, of @_. If shift is called on an empty array, it returns undef. This behavior is what makes the start and end arguments optional: each one is shifted from @_ and, if the caller did not supply it, a default is substituted. A logical or (||) would do this by short-circuiting: if the shifted value is true, || returns it; otherwise || evaluates and returns its second operand. The catch is that 0 is also a false value, so || would replace an explicit 0 with the default, and the recursive calls can legitimately pass 0 as $end (for example, when $mid is 1). The defined-or operator // (available since Perl 5.10) works the same way but falls back to its second operand only when the first is undef, so it is the safer choice for supplying the default values of $start and $end.
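
The difference between the two operators is easy to see in isolation. The following is only a small sketch; the subroutine names withOr and withDefinedOr and the default value 5 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns its argument, or 5 if none is usable.
sub withOr
{
    my $end = shift || 5;    # 0, "" and undef all trigger the default
    return $end;
}

sub withDefinedOr
{
    my $end = shift // 5;    # only undef triggers the default
    return $end;
}

my $orResult      = withOr(0);           # the explicit 0 is replaced by 5
my $definedResult = withDefinedOr(0);    # the explicit 0 is kept
my $noArgResult   = withDefinedOr();     # nothing passed, so the default 5 is used
print "$orResult $definedResult $noArgResult\n";
```

Running this prints 5 0 5: only the defined-or version lets a caller pass 0 on purpose.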

Also note that we can access the value at a particular index of an array to which we have a reference by using the arrow operator -> followed by the index we want to access. This is similar to how we called subroutines from a reference in the previous post.

Two additional items to note pertain to the use of this subroutine by the caller. Firstly, the input array to which $data is a reference must be sorted, since this is a requirement of the binary search algorithm. Secondly, because we have not specified the number and types of the arguments we are expecting (we left them unspecified to keep the optional arguments simple), Perl will not automatically convert an array into an array reference before passing it as an argument to the subroutine; it is the responsibility of the caller to do so. An example of a complete program that shows how this subroutine would be called is shown below:

sub binarySearch
{
	# $data will be an array reference
	my $data = shift;
	my $target = shift;
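	# // (defined-or) supplies the default only when no value was passed;
	# || would also replace a legitimate 0 passed by the recursive calls
	my $start = shift // 0;
	my $end = shift // $#$data;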
	# if start and end pointers cross over, $data does not contain $target
	return -1 if ($end < $start);
	# Perl always does floating point division,
	# so we have to tell it to convert to an integer
	my $mid = $start + int(($end - $start) / 2);
	if ($data->[$mid] == $target)
	{
		return $mid;
	}
	elsif ($data->[$mid] > $target)
	{
		return binarySearch($data, $target, $start, $mid - 1);
	}
	else
	{
		return binarySearch($data, $target, $mid + 1, $end);
	}
}

my @rawData = ();
push @rawData, int(rand(100)) for (1..500);
# input data to a binary search must be sorted
my @sortedData = sort { $a <=> $b } @rawData;
my $target;
until (defined $target)
{
    print "Enter a number 0-99: ";
    my $usrInput = <STDIN>;
    chomp $usrInput;
    unless ($usrInput =~ /^\d{1,2}$/)
    {
        print "$usrInput is not between 0-99. Please enter a number between 0-99.\n";
        next;
    }
    $target = $usrInput;
}
my $result = binarySearch(\@sortedData, $target);
if ($result == -1)
{
	print "$target was not found in the data.\n";
}
else
{
	print "$target was found at index $result of the data.\n";
}

Notice how the @rawData array is sorted to create the @sortedData array, and it is this array that is passed to binarySearch. The { $a <=> $b } in the invocation of the sort function instructs Perl to perform a numeric sort (the default sort treats the array’s elements as strings and sorts them lexicographically). Also notice how, when binarySearch is called, it is not @sortedData itself that is passed as an argument but rather a reference to it (created by prefixing a backslash). Finally, notice how only two arguments—the data to be searched and the value to search for—are passed to binarySearch when it is called by the main program. Since no values for them have been specified by the caller, the $start and $end variables will take on the default values specified in the binarySearch subroutine—in this case, the first and last indices of the data.
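
For comparison, the same search can be written with a loop instead of recursion, which removes the need for the optional $start and $end arguments altogether. This is only a sketch of that alternative, not a replacement for the subroutine above; the name binarySearchIterative and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical iterative variant: takes an array reference and a target,
# returns the index of the target or -1 if it is not present.
sub binarySearchIterative
{
    my ($data, $target) = @_;
    my ($start, $end) = (0, $#$data);
    while ($start <= $end)
    {
        my $mid = $start + int(($end - $start) / 2);
        if ($data->[$mid] == $target)
        {
            return $mid;
        }
        elsif ($data->[$mid] > $target)
        {
            $end = $mid - 1;      # keep searching the lower half
        }
        else
        {
            $start = $mid + 1;    # keep searching the upper half
        }
    }
    # the pointers crossed over, so the target is not in the data
    return -1;
}

my @sorted = (2, 3, 5, 7, 11, 13);
my $found = binarySearchIterative(\@sorted, 11);
my $missing = binarySearchIterative(\@sorted, 4);
print "$found $missing\n";
```

With this data the program prints 4 -1.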

Saturday, February 22, 2020

Subroutines

A subroutine is a block of code with a defined name that, instead of being executed immediately when it is encountered in the program, is stored for later use. The block of code can then be run (potentially multiple times) by using the name defined for it later in the program.

A Note About Terminology

Many languages refer to what Perl calls subroutines as functions. In Perl, a function is something that is built into Perl, such as chomp, which we’ve used in earlier posts to strip the trailing newline from user input, whereas a subroutine is written by the user.

A Simple Subroutine with No Parameters or Return Value

Subroutines in Perl are defined using the keyword sub, followed by the name of the subroutine, followed by the block of code to be executed when the subroutine is called. For example, the following subroutine, when called, will print the digits 0-9 to the screen, each on their own line:

sub printDigits
{
    print "$_\n" for (0..9);
}

The Argument Array

Okay, technically I lied when I said that the printDigits subroutine above has no parameters. Because I haven’t explicitly told Perl what types of parameters I’m looking for, Perl will allow me to pass in as many arguments as I want of whatever types I want when I go to call this subroutine later in the program. The arguments are stored in the variable @_. For example, this subroutine, when called, will print out any arguments passed to it as a comma-separated list:

sub commaSeparated
{
    my @args = @_;
    print "$args[$_], " for (0..($#args - 1));
    print $args[$#args];
}

Notice how the subroutine starts by unpacking @_ into a local variable, @args. As with $_ in the context of loops, @_ is just an alias to the actual arguments: copying the arguments into a local variable and accessing them using that local variable instead of @_ ensures that, if we modify one of the arguments, those modifications are not reflected outside of the subroutine.
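
To make the aliasing concrete, here is a small sketch contrasting a subroutine that writes through @_ with one that works on a copy. The names doubleInPlace and doubleCopy are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical subroutine that modifies its argument through the alias.
sub doubleInPlace
{
    $_[0] *= 2;          # changes the caller's variable
}

# Hypothetical subroutine that copies its arguments first.
sub doubleCopy
{
    my @args = @_;       # copy the arguments
    $args[0] *= 2;       # only the local copy changes
    return $args[0];
}

my $x = 10;
doubleCopy($x);
print "After doubleCopy: $x\n";       # $x is still 10
doubleInPlace($x);
print "After doubleInPlace: $x\n";    # $x is now 20
```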

Returning a Value

Subroutines can also return a value back to the caller using the keyword return followed by the value to be returned. For example, the following subroutine adds up the arguments passed to it and returns their sum:

sub add
{
    my @args = @_;
    my $result = 0;
    $result += $_ for (@args);
    return $result;
}

Specifying Parameters

Perl allows you to specify the number and type of parameters to be passed to a subroutine. This is done with a prototype: a list of sigils placed in parentheses after the subroutine’s name. For example, the following division subroutine takes two scalar values as its arguments:

sub divide($$)
{
    my ($dividend, $divisor) = @_;
    return $dividend / $divisor;
}

Unfortunately, it is not possible to specify the type any more specifically than by its sigil (so you can’t require that the arguments be numeric values). Also note that, even when the number and type of arguments are specified this way, the arguments are still stored in @_: this syntax does not let you name them in the subroutine declaration, so they must be unpacked into local variables, as is done on the first line of the divide subroutine shown above. Keep in mind as well that Perl only checks a call against the prototype if the subroutine has already been declared at the point of the call and is called without a leading &.

Passing Arrays and Hashes to a Subroutine Using References

Arrays in Perl are automatically flattened: if an array “contains” another array, the contents of the inner array are unpacked into the outer array, meaning each element of the former inner array is treated as a single element of the outer array, and it is impossible to determine just by looking at the outer array where the start and end of the inner array were. Likewise, if an array “contains” a hash, the hash is flattened into the array, destroying the key-value associations in the process. Because the arguments of a subroutine are passed to it as an array, this means that special measures must be taken in order to pass an array or hash as an argument to a subroutine without losing its structure.
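
The flattening is easy to observe by counting elements. The sketch below uses made-up variable names and a hypothetical countArgs subroutine that simply reports how many arguments it received.

```perl
use strict;
use warnings;

my @inner = (2, 3);
my @outer = (1, @inner, 4);            # @inner is flattened into @outer
my $flatCount = scalar(@outer);        # 4 elements, not 3

my @nested = (1, \@inner, 4);          # a reference keeps @inner as one element
my $nestedCount = scalar(@nested);     # 3 elements
my $innerSecond = $nested[1]->[1];     # second element of @inner, which is 3

# Hypothetical subroutine: returns how many arguments it received.
sub countArgs
{
    return scalar(@_);
}

my @first = (1, 2);
my @second = (3, 4, 5);
my $passedFlat = countArgs(@first, @second);      # 5: the two arrays merge
my $passedRefs = countArgs(\@first, \@second);    # 2: one reference each

print "$flatCount $nestedCount $innerSecond $passedFlat $passedRefs\n";
```

This prints 4 3 3 5 2, showing that only the reference-based versions preserve the boundaries between the arrays.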

To specify an array or hash in the parameter list, prepend a backslash \ to the appropriate sigil. When the subroutine is called, a reference to the array or hash passed as an argument is placed in @_. It can then be dereferenced into a local variable by using the appropriate sigil for an array or hash, followed by the reference as accessed from @_ enclosed in curly braces { }. For example, the following subroutine takes an array and a hash and returns an array containing the elements of the array that are keys in the hash:

sub findKeys(\@\%)
{
    my @searchingFor = @{$_[0]};
    my %hashToSearch = %{$_[1]};
    my @found = ();
    for (@searchingFor)
    {
        push @found, $_ if (exists $hashToSearch{$_});
    }
    return @found;
}

It is also possible to store the references themselves to local variables, which would carry the scalar sigil $, and work with the references directly, dereferencing them each time they are used. For example,

sub findKeys(\@\%)
{
    my ($searchingFor, $hashToSearch) = @_;
    my @found = ();
    for (@$searchingFor)
    {
        push @found, $_ if (exists $$hashToSearch{$_});
    }
    return @found;
}

Note that when the reference is stored in a scalar variable instead of at an index in an array, it is not necessary to surround it with curly braces when dereferencing it (although the curly braces can still be used if one so desires). Also note that when the references are used directly instead of being dereferenced into a local variable, any changes made to the contents of the array or hash being referenced will continue to be visible after the subroutine has returned. So, for example, the following subroutine takes an array and a hash and removes from the hash all key-value pairs for which the key is contained in the array:

sub removeKeys(\@\%)
{
    my ($keysToRemove, $hashToPrune) = @_;
    delete $$hashToPrune{$_} for (@$keysToRemove);
}

A simple program using this subroutine is shown below:

sub removeKeys(\@\%)
{
    my ($keysToRemove, $hashToPrune) = @_;
    delete $$hashToPrune{$_} for (@$keysToRemove);
}

sub printHash(\%)
{
    my %hash = %{$_[0]};
    print "$_ => $hash{$_}\n" for (keys %hash);
}

my @searchTerms = ("foo", "bar", "baz");
my %searchingIn = (
    foo => 3,
    bar => 7,
    qux => 9
);

printHash(%searchingIn);
removeKeys(@searchTerms, %searchingIn);
print "After calling removeKeys:\n";
printHash(%searchingIn);

This program produces the following output:

bar => 7
qux => 9
foo => 3
After calling removeKeys:
qux => 9

This output also demonstrates an important fact to note about the behavior of hashes: they are unordered. When you iterate over a hash, the only thing you are guaranteed is that every key-value pair will be generated exactly once—Perl makes no guarantees about the order in which they are generated.
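
If a predictable order is needed, one common approach is to sort the keys before iterating. A minimal sketch, reusing the hash contents from the program above:

```perl
use strict;
use warnings;

my %searchingIn = (foo => 3, bar => 7, qux => 9);
# sort with no comparison block sorts the keys as strings
my @orderedKeys = sort keys %searchingIn;
print "$_ => $searchingIn{$_}\n" for (@orderedKeys);
```

This always prints the pairs in the order bar, foo, qux, regardless of how Perl happens to store the hash internally.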

Subroutine References

Recall that in the post on conditional statements, we used a hash to substitute for an extended if-elsif-else chain to determine which of several strings to print. What if we have an extended if-elsif-else chain where the operations to be performed are more complicated than just printing some string? Can we still use a hash instead of an if-elsif-else chain? The answer is yes: we do it by storing references to subroutines as the hash’s values.

A reference to a subroutine is created by prepending \& to the name of the subroutine. A subroutine reference is never followed by a list of arguments—the arguments will be supplied when we dereference the subroutine and call it. Calling a subroutine from a reference is done by using the reference, followed by the arrow operator ->, followed by the argument list. Consider the following program:

sub getNumberInput
{
    while (1)
    {
        print "Enter a number: ";
        my $usrInput = <STDIN>;
        chomp $usrInput;
        return $usrInput if ($usrInput =~ /^-?\d+(\.\d+)?$/);
        # if user input was valid, subroutine will have returned on the previous line
        # and so this line will not be executed
        print "$usrInput is not a valid number. Please enter a number.\n";
    }
}

sub add($$)
{
    my ($a, $b) = @_;
    return $a + $b;
}

sub subtract($$)
{
    my ($a, $b) = @_;
    return $a - $b;
}

sub multiply($$)
{
    my ($a, $b) = @_;
    return $a * $b;
}

sub divide($$)
{
    my ($a, $b) = @_;
    return $a / $b;
}

sub mod($$)
{
    my ($a, $b) = @_;
    return $a % $b;
}

sub exp($$)
{
    my ($a, $b) = @_;
    return $a ** $b;
}

my %options = (
    addition => \&add,
    subtraction => \&subtract,
    multiplication => \&multiply,
    division => \&divide,
    modulo => \&mod,
    exponentiation => \&exp
);

my $first = getNumberInput();
my $second = getNumberInput();
my $operation;
until (exists $options{$operation})
{
    print "That operation is not supported.\n" if (defined $operation);
    print "Enter an operation: ";
    $operation = <STDIN>;
    chomp $operation;
}
my $result = $options{$operation}->($first, $second);
print "The result is $result.\n";

The second-to-last line is the one that is of interest to us. We retrieve a subroutine reference from the %options hash and use the arrow operator to simultaneously dereference it and call it with $first and $second as its arguments.

Saturday, February 15, 2020

Do It Again – Loops

Loops are the way in which Perl allows the same code to be executed multiple times. There are three main types of loops: for, while, and do-while. In order to understand the first type of loop in Perl—the for loop—we must first discuss a new data type—arrays.

Arrays

An array is a list of values. Array variables are declared using the sigil @. The listing of the array’s contents is bounded by parentheses, and the values are separated from one another by commas. For example, we could declare my @friends = ("Joe Smith", "Frank Jones", "Linda Brown");. When accessing the values in the array, the name of the array variable is prefixed by the scalar sigil $ (since the value to be retrieved is eventually a scalar) and followed by the desired value’s index (the position in the array at which the desired value appears) enclosed in square brackets [ ]. Note that array indices in Perl begin with 0. So, for example, to get "Joe Smith" out of the array declared earlier, we would use $friends[0].

It is also possible to access a slice (sub-array) of the array by placing a range inside the square brackets. A range is declared by giving the first and last numbers desired to be included in the range separated by two dots, for example 1..5. In this case, since the value that will eventually be retrieved is also an array, we would prefix the array variable with the array sigil @. So, for example,

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");
my @secondQuarter = @months[3..5];
print "The months in the second quarter are @secondQuarter";

produces the output The months in the second quarter are April May June. Note that a range by itself produces a list of values, so it is perfectly valid to say, for example, my @foo = 4..8;.

IMPORTANT: Ranges are inclusive at both ends. The declaration above is equivalent to my @foo = (4, 5, 6, 7, 8);, not my @foo = (4, 5, 6, 7); as a programmer coming from a language such as Java might expect.

An array’s length (the number of elements it contains) can be obtained by evaluating the array in scalar context, for example by assigning it to a scalar variable or by wrapping it in the scalar function. For example, to get the length of the months array from earlier, we would use scalar(@months), which would evaluate to 12. (Note that $months would not work: it names an unrelated scalar variable.) Prefixing $# to the name of an array variable gives the value of the last index of that array variable (i.e. one less than the length). So, for example, $#months would evaluate to 11.
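
As a quick sketch, the length and the last index of the months array can be read like this (the variable names $count, $sameCount, and $lastIndex are made up for illustration):

```perl
use strict;
use warnings;

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");
my $count = @months;                # an array assigned to a scalar gives its length
my $sameCount = scalar(@months);    # the scalar function forces the same context
my $lastIndex = $#months;           # index of the last element
print "$count $sameCount $lastIndex $months[$lastIndex]\n";
```

This prints 12 12 11 December.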

The `for` Loop

The for loop is used to iterate over the contents of some array (which could be a range). By default, the current element in the loop is aliased to the variable name $_. So, for example, we can print out the numbers 0-9 inclusive, each on their own line, with the following code:

for (0..9)
{
    print "$_\n";
}

We can also specify our own alias for the current loop element by placing it after the for and before the opening parenthesis. For example,

for my $i (0..9)
{
    print "$i\n";
}

produces exactly the same output as the version using $_.

We can also use an array variable to iterate over in a for loop. For example, we can print out the months of the year, each on their own line, using the following code:

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");
for (@months)
{
    print "$_\n";
}

WARNING: Whether you define your own name for the loop variable or use $_, it is just an alias for the current position in the array being iterated over. You can do anything with the alias that you could do with $someArray[$whateverIndex], including modify the contents of the array. For example,

my @nums = 0..4;
for (@nums)
{
    $_++;
}
print "My nums are @nums";

produces the output My nums are 1 2 3 4 5. The modification to the loop variable does persist outside the loop, unlike in languages such as Java, where the loop variable of an enhanced for loop holds a copy of the current element, so assigning to it leaves the array unchanged.

As with the simple if and unless statements, when the code to be executed by a for loop is a single line, the for can be placed after that line, for example print "$_\n" for (0..9). Unlike with the “normal” for loop, you cannot define your own alias for the loop variable when using the postfix syntax; you must use $_.

The `while` and `do`-`while` Loops

Instead of taking an array and iterating over it, while and do-while loops repeat the code they contain indefinitely until a supplied condition is no longer true. For example, the following code accepts a number as input from the user and decrements that number for as long as it remains positive:

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
while ($selected > 0)
{
    print $selected--, "\n";
}

Running this program with 8 as the user input produces the following:

Enter a positive number: 8
8
7
6
5
4
3
2
1

Note that the loop condition is checked at the beginning of the loop. This means it is possible (if the user enters, say, -6 as input) that the code inside the loop will not execute at all. If we want to guarantee that the code in the loop executes at least once, we use a do-while loop, which checks the loop condition at the end of the loop. For example, rewriting the above program to use a do-while loop gives the following:

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
do
{
    print $selected--, "\n";
} while ($selected > 0);

Now, even if we were to provide as input a non-positive number such as -6, the loop would still execute at least once, even though its loop condition is initially false:

Enter a positive number: -6
-6

Just as unless (condition) is equivalent to if (not condition), until (condition) is equivalent to while (not condition). So the first version of the above program could be written as

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
until ($selected <= 0)
{
    print $selected--, "\n";
}

and the second version as

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
do
{
    print $selected--, "\n";
} until ($selected <= 0);

Exiting Early and Skipping Iterations

Suppose we have a long array of numbers, and we want to find the index of the first occurrence of some particular number in our array. As soon as we find it, we don’t need to keep looking at the rest of the array. To exit a for loop before we reach the end of the array, we use the keyword last. The last keyword can also be used to exit a while loop even while the loop condition is still true. In a similar fashion, if we want to skip a value in the array being iterated over by a for loop, or if we want to skip the rest of a while loop’s body and reevaluate its condition, we use the keyword next. An example of the usage of last and next is shown below:

my @bigData = ();
push @bigData, int(rand(100)) for (1..500);
my $loc = -1;
my $target;
until (defined $target)
{
    print "Enter a number 0-99: ";
    my $usrInput = <STDIN>;
    chomp $usrInput;
    if ($usrInput =~ /\D/ or $usrInput < 0 or $usrInput > 99)
    {
        print "$usrInput is not between 0-99. Please enter a number between 0-99.\n";
        next;
    }
    $target = $usrInput;
}
for my $i (0..$#bigData)
{
    if ($bigData[$i] == $target)
    {
        $loc = $i;
        last;
    }
}
if ($loc == -1)
{
    print "The data does not contain the value $target\n";
}
else
{
    print "The first occurrence of $target is at index $loc\n";
}

This program introduces a few new elements. The push function adds a value to the end of an array. The rand function generates a random number that is at least 0 and less than its argument, and the int function truncates its argument toward zero, which for the non-negative values used here means rounding down to the nearest integer. So the combination of the two, int(rand($someNumber)), produces a random integer in the range 0..($someNumber - 1). The expression $usrInput =~ /\D/ checks to see whether $usrInput contains any non-digit characters. It does this using what’s called a regular expression, which we’ll discuss in more detail in a future post. This check is necessary because of the way in which Perl implicitly converts between strings and numeric data types. When a string that begins with a non-numeric character is used in a numeric context (such as being compared to a number using the < and > operators), it is implicitly converted to 0, which is within our range of valid inputs and thus will be accepted by the program as if the user had actually typed 0. To avoid this false positive, we have to explicitly check whether the input is non-numeric.
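
The implicit conversion can be seen directly in a short sketch. The isValidTarget subroutine below is a hypothetical, slightly stricter variant of the check in the program: it requires the input to consist only of digits, so it also rejects empty input, which contains no non-digit characters at all.

```perl
use strict;
use warnings;

my ($fromWord, $fromMixed);
{
    no warnings 'numeric';           # silence the "isn't numeric" warning for this demonstration
    $fromWord  = "foo" + 0;          # no leading digits, so the result is 0
    $fromMixed = "12abc" + 0;        # the leading digits are used, so the result is 12
}
print "$fromWord $fromMixed\n";

# Hypothetical validator: 1 only for all-digit strings with a value of at most 99.
sub isValidTarget
{
    my $input = shift;
    return ($input =~ /^\d+$/ and $input <= 99) ? 1 : 0;
}

print "$_: ", isValidTarget($_), "\n" for ("foo", "-6", "104", "47", "");
```

Of the five sample inputs, only 47 is reported as valid.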

Now let’s look at how the next and last statements are being used. In our until loop, we ask the user for a number and check to see if it is valid. If it is not, we print an error message and invoke next. This causes the last line of code in the loop, which assigns the user input to the $target variable, to be skipped over—we don’t want to store an invalid user input as the value we’re going to try to search for. Thus the loop exit condition, that $target has a defined value, remains false, and so we prompt the user for another input.

Once we have a valid input, we start searching the @bigData array for the target value provided by the user, which we do using a for loop (remember that $#bigData represents the last index in @bigData). As soon as we find our target value, we store the index we found it at and then invoke last. This causes the loop to immediately exit, and the program continues execution with the next line of code outside the loop (in this case, the line if ($loc == -1)). We do this because, once we’ve found the first occurrence of the target value, we have the information we need—continuing to search through the rest of the array would only waste time and resources. An example run of what the user would see is shown below:

Enter a number 0-99: foo
foo is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: -6
-6 is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: 104
104 is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: 47
The first occurrence of 47 is at index 274

Saturday, February 8, 2020

Conditional Statements

Comparison Operators

Since most conditional switching is based on comparing the values of two variables, we first need to know how to do those comparisons. Perl provides two sets of comparison operators—one set for comparing numeric values, another set for comparing string values.

Comparison	Numeric Operator	String Operator
Equals	`==`	`eq`
Does Not Equal	`!=`	`ne`
Is Greater Than	`>`	`gt`
Is Greater Than or Equal To	`>=`	`ge`
Is Less Than	`<`	`lt`
Is Less Than or Equal To	`<=`	`le`

Perl does not define a boolean (true/false) data type. Instead, the comparison operators return 1 if the comparison is true and a special false value if it is false; this false value is treated as 0 when used in a numeric context and "" (the empty string) when used in a string context. A related special value is undef, short for “undefined”, which is also false and is likewise treated as 0 in a numeric context and "" in a string context.
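
The values these operators produce can be inspected directly. A small sketch, with variable names chosen just for illustration:

```perl
use strict;
use warnings;

my $true  = (3 < 5);
my $false = (5 < 3);
print "true is '$true'\n";                       # the true result is 1
print "false is defined\n" if defined $false;    # the false result is a defined value
print "false as a string: '$false'\n";           # behaves as the empty string
print "false as a number: ", $false + 0, "\n";   # behaves as 0
```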

Logical Operators

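The logical operators and and or are used to combine multiple conditions. $a and $b evaluates to true if $a and $b each individually evaluate to true, and evaluates to false otherwise. $a or $b evaluates to true if at least one of $a and $b individually evaluates to true, and evaluates to false only if both individually evaluate to false. Both operators short-circuit: the second operand is evaluated only if the first does not already decide the result, and the value returned is the last operand that was evaluated. and and or can also be represented symbolically as && and ||, respectively; the symbolic forms behave the same way but have higher precedence.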

The logical operator not is used to reverse the value of the condition it precedes. So not $a is false if $a is true, and true if $a is false. not can also be represented symbolically as !.
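
Because these operators hand back one of their operands rather than a dedicated true or false value, they are often used to pick between values. A brief sketch, with made-up variable names:

```perl
use strict;
use warnings;

my $typedName = "";
my $typedPort = 8080;

my $name = $typedName || "anonymous";   # "" is false, so || returns its right operand
my $port = $typedPort || 80;            # 8080 is true, so || returns it and skips the 80
my $both = $typedPort && "second";      # && returns the last operand it evaluated
my $none = $typedName && "skipped";     # && stops at the false "" and returns it
print "$name $port $both '$none'\n";
```

This prints anonymous 8080 second ''.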

A Word on Truth and Falsity

As mentioned in the discussion of comparison operators, Perl does not define explicit true and false values. For the purposes of logical operations, the following values are treated as false:

0
0.0
00 # 0 in octal
0b0 # 0 in binary
0x0 # 0 in hexadecimal
"" # the empty string
'0' # the string containing 0
() # the empty list
undef # the undefined value

All other values are treated as true.

Note that in the above, # introduces a single-line comment: everything after the # until the end of the line is ignored by the Perl interpreter. Also note that '0' evaluates to false even though it is a non-empty string. It is the only such string: other strings that convert to the number 0, such as '0.0' or '00', are treated as true.
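
A few of these cases can be checked with a short sketch. The truthiness subroutine is a hypothetical helper that reports how Perl treats a value in a boolean context.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "true" or "false" for the given value.
sub truthiness
{
    my $value = shift;
    return $value ? "true" : "false";
}

print "number 0.0:   ", truthiness(0.0), "\n";     # false: it is the number zero
print "string '0':   ", truthiness('0'), "\n";     # false: the one special-cased string
print "string '':    ", truthiness(''), "\n";      # false: the empty string
print "string '0.0': ", truthiness('0.0'), "\n";   # true: a non-empty string other than '0'
print "string '00':  ", truthiness('00'), "\n";    # true: likewise
```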

Conditional Statements

Conditional Statements Generally

Conditional statements test the truth or falsity of a condition and, if the condition evaluates to the desired truth or falsity, execute the code in the following block. Code blocks in Perl begin with { and end with }. The conditional statement must always be followed by a code block, even if there is only one line of code to be executed if the condition evaluates to the desired truth value. So

if ($foo)
    print "blah blah blah";

is a syntax error: unlike in languages such as Java and C++, this code must be written as

if ($foo)
{
    print "blah blah blah";
}

even though there is only one line of code that is dependent on the conditional.

`if` and `unless`

The two simplest conditional statements are if, which executes the block of code that follows it if the given condition is true, and unless, which executes the block of code that follows it if the given condition is false (so unless ($foo) is the same as if (not $foo)). So, for example,

my $foo = 7;
if ($foo > 5)
{
    print "foo is big\n";
}
unless ($foo % 2 == 0)
{
    print "foo is odd\n";
}

produces the output

foo is big
foo is odd

In this simplest case of an if or unless that is used to execute a single statement, and does not have an attached else or elsif clause (we’ll discuss those in a moment), we can avoid having to create a code block by placing the if or unless after the statement we want executed. So the following program behaves exactly the same as the one above:

my $foo = 7;
print "foo is big\n" if ($foo > 5);
print "foo is odd\n" unless ($foo % 2 == 0);

`else` and `elsif`

An else clause can be placed after an if or unless statement, and the code in the else clause is executed if the code in the if or unless is not. So

my $foo = 7;
if ($foo < 5)
{
    print "foo is small\n";
}
else
{
    print "foo is big\n";
}

unless ($foo % 2 != 0)
{
    print "foo is even\n";
}
else
{
    print "foo is odd\n";
}

once again produces the output

foo is big
foo is odd

To chain two or more conditions together in this way (if the first condition isn’t fulfilled, check the second condition and execute its code if it is fulfilled, otherwise check the next condition, and so on and so forth until some code is executed if none of the conditions are fulfilled), the conditions after the first are stated using the keyword elsif (there is no elsunless; to get that behavior, you would have to nest an unless clause inside an else block). So, for example,

my $foo = 7;
if ($foo < 5)
{
    print "foo is small";
}
elsif ($foo < 10)
{
    print "foo is medium";
}
else
{
    print "foo is big";
}

produces the output foo is medium. Note that a final else clause is not required; if it is absent, the program will simply do nothing if none of the conditions are fulfilled. So, for example,

my $foo = 13;
if ($foo < 5)
{
    print "foo is small";
}
elsif ($foo < 10)
{
    print "foo is medium";
}

produces no output because neither of the conditions was fulfilled. Also note that the fulfilling of one condition means that the following conditions are not checked. For example, if the value of $foo in the above program were 3, the program would produce the output foo is small. It would not also print foo is medium, even though $foo < 10 is true, because as soon as one of the conditions is fulfilled, the rest of the if-elsif-else chain is bypassed.

Digression: Hashes

A hash is a built-in data type in Perl that associates keys with values. For example, a hash might be used like a contacts list to associate names with email addresses. Hash variables are declared using the sigil %. The listing of the hash’s contents is bounded by parentheses. Keys are separated from values using the so-called “fat comma” operator, =>, and key-value pairs are separated from each other by commas. For example,

my %contacts = (
	"Joe Smith" => 'jsmith@aol.com',
	"Frank Jones" => 'fjones@yahoo.com',
	"Linda Brown" => 'lbrown@hotmail.com'
);

When accessing the values in a hash, the name of the hash variable is prefixed by the sigil $ for a scalar (since the value that is eventually retrieved is a scalar), followed by the name of the key enclosed in curly braces { }. For example, using the hash declared above, print $contacts{"Joe Smith"}; produces the output jsmith@aol.com.

The builtin function exists can be used to check whether a hash contains a value for a particular key. Still using the hash declared above, exists $contacts{"Paul Williams"} would evaluate to false, since %contacts does not contain a value for the key "Paul Williams".
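
One subtlety worth a quick sketch: exists asks whether the key is present, not whether its value is defined. The "Mary Davis" entry below is a made-up contact added only to illustrate a key whose value is undef.

```perl
use strict;
use warnings;

my %contacts = (
    "Joe Smith"  => 'jsmith@aol.com',
    "Mary Davis" => undef,    # hypothetical contact with no address on file
);

my $joeExists    = (exists $contacts{"Joe Smith"}) ? "yes" : "no";
my $maryExists   = (exists $contacts{"Mary Davis"}) ? "yes" : "no";      # key is present
my $maryDefined  = (defined $contacts{"Mary Davis"}) ? "yes" : "no";     # but its value is undef
my $paulExists   = (exists $contacts{"Paul Williams"}) ? "yes" : "no";   # key is absent
print "$joeExists $maryExists $maryDefined $paulExists\n";
```

This prints yes yes no no.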

Using Hashes as an Alternative to Extended `if`-`elsif`-`else` Chains

Suppose we want to write a program that takes as input from the user a number between 1 and 10, inclusive, and prints out that number as a word. We could use an if-elsif-else chain: first check if the user input 1, then check 2, then 3, and so on and so forth until an error message is printed if the user’s input isn’t a number 1-10. But this long of a chain can get cumbersome very quickly. Is there any way to shorten the code? Yes: we use a hash to associate the numbers with their corresponding words. The code for this program looks like this:

my %numbersSpelledOut = (
	1 => "one",
	2 => "two",
	3 => "three",
	4 => "four",
	5 => "five",
	6 => "six",
	7 => "seven",
	8 => "eight",
	9 => "nine",
	10 => "ten"
);
print "Enter a number 1-10: ";
my $userInput = <STDIN>;
chomp $userInput;
unless (exists $numbersSpelledOut{$userInput})
{
	print "I don't have the name for that number.\n";
}
else
{
	print "$numbersSpelledOut{$userInput}\n";
}

We see two new elements in this code. First, <STDIN> reads one line of input from standard input, which is normally the keyboard. The user types their input into the console and presses Enter to submit. Unfortunately for the programmer, pressing Enter causes a newline character to be included at the end of the input string that is stored in $userInput. This is where the second new element comes in. The chomp function removes the trailing newline from the variable in place. So, for example, if the user inputs 7, the program runs as follows:

Enter a number 1-10: 7
seven

The key-value pairs stored in the hash take care of what would have been the if and elsif blocks in the chain. What would have been the else clause is handled by the unless exists check. For example, if the user enters 13, the program runs as follows:

Enter a number 1-10: 13
I don't have the name for that number.

The program checks to see whether %numbersSpelledOut contains a value for the key 13 and, finding that it does not, prints the error message.
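The same lookup can be packaged as a subroutine so that it can be exercised without typing at the keyboard. This is only a sketch: spellNumber is a hypothetical name, and the strings fed to it stand in for lines that would normally come from <STDIN>.

```perl
use strict;
use warnings;

my %numbersSpelledOut = (
	1 => "one", 2 => "two", 3 => "three", 4 => "four", 5 => "five",
	6 => "six", 7 => "seven", 8 => "eight", 9 => "nine", 10 => "ten"
);

# spellNumber is a hypothetical helper. It takes a raw input line,
# strips the trailing newline, and returns the word, or undef if
# the hash has no entry for that input.
sub spellNumber
{
	my $line = shift;
	chomp $line;
	return $numbersSpelledOut{$line} if exists $numbersSpelledOut{$line};
	return undef;
}

# Simulated input lines, each ending in the newline that Enter would add.
foreach my $line ("7\n", "13\n")
{
	my $word = spellNumber($line);
	my $message = defined($word) ? $word : "I don't have the name for that number.";
	print "$message\n";
}
```

Note that the hash keys are compared as strings, so an input such as 07 would not match the key 7 in this sketch.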

The Ternary Conditional Operator

One of the more common uses of conditionals is to set variables. For example, the following code sets $max to the larger of $a and $b:

my $a = 4;
my $b = 7;
my $max;
if ($a > $b)
{
    $max = $a;
}
else
{
    $max = $b;
}

The ternary conditional operator can be used to shorten this if-else construct to a single statement. It is written as $testCondition? $valueIfTrue : $valueIfFalse. So the above example could be rewritten as

my $a = 4;
my $b = 7;
my $max = ($a > $b)? $a : $b;

An unusual feature of Perl is that it allows the ternary conditional to be used on the left side of an assignment operator to determine which variable a value is to be assigned to. For example, this program assigns the larger of $a and $b to $max and the smaller of the two values to $min:

my $a = 4;
my $b = 7;
my $min;
my $max;
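($a > $b)? $max : $min = $a;
(defined $max)? $min : $max = $b;

The defined function used in the last line of this code snippet checks whether the specified variable holds a value other than undef. The first conditional assigns $a to either $max or $min, so the second conditional uses defined $max to find out which of the two was set and assigns $b to the other one.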
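One way to convince ourselves that the left-side form behaves correctly is to try it with the values in both orders. The sketch below assumes a hypothetical helper named minMax and adds parentheses around each conditional purely for readability.

```perl
use strict;
use warnings;

# minMax is a hypothetical helper used only to exercise both orderings.
sub minMax
{
	my ($first, $second) = @_;
	my ($min, $max);
	# Whichever variable the conditional selects receives $first...
	(($first > $second) ? $max : $min) = $first;
	# ...and the one that is still undefined receives $second.
	((defined $max) ? $min : $max) = $second;
	return ($min, $max);
}

my ($min, $max) = minMax(4, 7);
print "min=$min max=$max\n";
($min, $max) = minMax(7, 4);
print "min=$min max=$max\n";
```

Both calls should report a minimum of 4 and a maximum of 7, regardless of the order in which the values are passed.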

Saturday, February 1, 2020

Constants and Variables and Operators, Oh My!

Declaring a Variable

Sigils

Every variable name in Perl is prefixed by what is known as a sigil, which denotes in very broad terms the category of data being stored by the variable. Note that I use category rather than type. The most commonly encountered sigil, $, is used for any of the scalar data types, i.e. data types that store only a single value, such as strings, integers, floating-point numbers, etc. The sigil @ is used for arrays, and the sigil % for hashes, but we won’t be working with those just yet.

Identifiers

The identifier is the part of the variable that comes after the sigil—the variable’s actual name. The identifier can consist of any combination of uppercase letters, lowercase letters, digits, and the underscore (_) character, with the sole restriction that the first character of the identifier cannot be a digit.
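The naming rule can be written down as a regular expression. This is a sketch of the rule as stated above only (ASCII letters, digits, and underscores, with no leading digit); isValidIdentifier is a hypothetical helper, not something Perl provides.

```perl
use strict;
use warnings;

# isValidIdentifier is a hypothetical helper. It checks only the rule
# described above: letters, digits, and underscores, no leading digit.
sub isValidIdentifier
{
	my $name = shift;
	return ($name =~ /\A[A-Za-z_][A-Za-z0-9_]*\z/) ? 1 : 0;
}

foreach my $candidate ("foo", "_count2", "CM_PER_INCH", "2fast", "my-var")
{
	my $verdict = isValidIdentifier($candidate) ? "valid" : "invalid";
	print "$candidate is $verdict\n";
}
```

The first three candidates pass; 2fast fails because it starts with a digit, and my-var fails because the hyphen is not an allowed character.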

Scope

When the strict pragma is enabled (use strict;, which is strongly recommended), a variable’s scope must be specified the first time its name is used. This is usually done by placing one of two keywords in front of the sigil and identifier. Local variables (those that persist only in the current code block; Perl’s own documentation calls them lexical variables) are declared using the keyword my. For example, to declare a local variable named foo and assign it the value 7, we would write my $foo = 7;. On future uses of this variable within the same code block, we would simply use the sigil and identifier by themselves, for example $foo = 5;. As soon as we exit the current code block, $foo disappears.

Global variables do not disappear at the end of the current code block, but instead persist throughout the entire program. Global variables are declared using the keyword our. So to declare a global variable named foo and assign it the value 7, we would write our $foo = 7;. As with local variables, we would use only the sigil and identifier on future uses of this variable. Because they persist throughout the program, global variables can create potential problems and should therefore be used sparingly.
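A short sketch of the difference between the two keywords follows. The variable names and the helper currentGreeting are invented for this illustration.

```perl
use strict;
use warnings;

our $greeting = "hello";    # global variable, visible throughout the program

# currentGreeting is a hypothetical helper that reads the global.
sub currentGreeting
{
	return $greeting;
}

my $outer = 7;
{
	# This inner $outer is a separate variable that hides the outer one
	# until the closing brace.
	my $outer = 5;
	print "inside the block: $outer\n";
}
print "after the block: $outer\n";

# A change to the global is seen everywhere, including inside the subroutine.
$greeting = "goodbye";
print currentGreeting(), "\n";
```

The inner my $outer never touches the outer one, which is why the value printed after the block is still 7, while the change to $greeting shows up inside currentGreeting.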

Constants

Perl has no keyword for declaring constant variables, that is, variables whose values cannot be changed once they have been declared (the constant pragma that ships with Perl, use constant, can define named constants, but these are not variables and have no sigil). Variables whose values are intended not to change once they have been declared are conventionally given identifiers consisting of uppercase letters, with underscores if necessary to separate words. For example, the number of centimeters in one inch might be declared as our $CM_PER_INCH = 2.54;. While Perl won’t prevent us from changing this value later, the fact that its identifier is in all caps tells us we probably shouldn’t.
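For comparison, here is a minimal sketch using the constant pragma, which is part of the standard Perl distribution. The helper inchesToCm is a hypothetical name chosen for this example.

```perl
use strict;
use warnings;

# Defines a named constant. Note that it is written without a sigil.
use constant CM_PER_INCH => 2.54;

# inchesToCm is a hypothetical helper for illustration.
sub inchesToCm
{
	my $inches = shift;
	return $inches * CM_PER_INCH;
}

print inchesToCm(10), "\n";
```

Because such a constant has no sigil, it is not interpolated inside double-quoted strings, which is one reason the all-caps variable convention remains common.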

As an aside, declaring global constants is not nearly as problematic as declaring global non-constant variables, since the value associated with a constant is expected to remain the same throughout the entire program.

Data Types

Numeric Data Types

Perl treats integers and floating-point numbers equally and will freely convert between them. So, from Perl’s point of view, there is no difference between 7 (integer) and 7.0 (floating-point). Perl allows the use of the underscore as a thousands separator in numeric literals; for example, my $foo = 1_000.0; is the same as my $foo = 1000.0;. Additionally, integers can be declared in hexadecimal by prefixing 0x, in octal by prefixing 0, and in binary by prefixing 0b. So, for example, 0x20, 040, and 0b100000 all evaluate to 32.
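The literal forms can be checked directly. The sketch below also shows a related point that is easy to trip over: the prefixes apply to literals written in the source code, whereas a string such as "040" is converted as a decimal number unless the built-in oct function is used. Variable names are arbitrary.

```perl
use strict;
use warnings;

my $hex = 0x20;
my $octal = 040;
my $binary = 0b100000;
print "$hex $octal $binary\n";    # all three hold the number 32

my $thousand = 1_000.0;
print "same value\n" if $thousand == 1000;
print "no distinction\n" if 7 == 7.0;

# Prefixes are not honored when a string is used as a number...
my $fromString = "040" + 0;       # plain decimal conversion: 40
# ...but the built-in oct function understands them.
my $viaOct = oct("040");          # 32
print "$fromString $viaOct\n";
```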

Strings

Strings are how Perl stores text. Strings can be delimited either by single quotes (') or double quotes ("), and the two types of strings work slightly differently.

Single-Quoted Strings

Strings delimited by single quotes are treated as raw text. Escape sequences (such as \n for newline) within a single-quoted string are not processed. For example, print 'Hello, World!\n'; produces the output Hello, World!\n. The \n is printed to the screen as-is rather than being converted to a newline character. The only exceptions are the escape sequence \', which allows the single-quote character to appear within a string delimited by single quotes, and \\, which produces a single backslash.
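Measuring the strings with the built-in length function makes the difference concrete. This is a small sketch with arbitrary variable names.

```perl
use strict;
use warnings;

my $single = 'a\n';          # three characters: a, backslash, n
my $double = "a\n";          # two characters: a, newline
print length($single), " ", length($double), "\n";

my $quoted = 'It\'s here';   # \' yields a literal single quote
print "$quoted\n";

my $path = 'C:\\temp';       # \\ yields one backslash
print length($path), "\n";
```

The first line of output is 3 2, and the path has 7 characters because the doubled backslash is stored as a single one.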

Double-Quoted Strings

Escape sequences appearing in strings delimited by double quotes are converted into the special characters they represent. So print "Hello, World!\nMore text on the next line"; produces the output

Hello, World!
More text on the next line

Additionally, double-quoted strings allow what is known as interpolation. When the name of a scalar or array variable (complete with sigil) appears inside a double-quoted string, the value of the variable is substituted into the string. So the code

my $foo = 7;
print "The value of foo is $foo";

produces the output The value of foo is 7. Note that the substitution occurs immediately when the string is evaluated and will not reflect subsequent changes to the value of the variable. So the code

my $foo = 7;
my $output = "The value of foo is $foo";
$foo = 12;
print $output;

also produces the output The value of foo is 7, because $foo was 7 at the time the string was evaluated. The subsequent change to the value of $foo does not cause the string to be updated to reflect the new value of $foo.
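A few further interpolation cases come up often enough to be worth a sketch: marking the end of a variable name with braces, and interpolating arrays. The variable names here are invented for illustration.

```perl
use strict;
use warnings;

my $count = 7;
my @names = ("Joe", "Frank", "Linda");

# Braces mark where the variable name ends when text follows directly.
my $ordinal = "${count}th place";

# Array elements and whole arrays interpolate as well; the elements of a
# whole array are joined with a space by default.
my $first = "First: $names[0]";
my $all = "All: @names";

# Single quotes suppress interpolation entirely.
my $raw = 'Count: $count';

print "$ordinal\n$first\n$all\n$raw\n";
```

Without the braces, Perl would look for a variable named $countth, which does not exist.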

Operators

Arithmetic Operators

As one might expect, +, -, *, and / are used to add, subtract, multiply, and divide numbers. Because Perl treats integers and floating-point numbers equally, the / operator always performs floating-point division, unlike in many languages where the result of dividing two integers is always an integer. Perl also includes the exponentiation operator, **, which raises the first number to the power of the second, and the modulo operator, %, which gives the remainder when the first number is divided by the second. So the code

print 24 + 7, "\n";
print 24 - 7, "\n";
print 24 * 7, "\n";
print 24 / 7, "\n";
print 24 ** 7, "\n";
print 24 % 7, "\n";

produces the output

31
17
168
3.42857142857143
4586471424
3

All of the arithmetic operators work equally well with variables as they do with numeric literals. So, for example, the code

my $foo = 24;
my $bar = 7;
print $foo + $bar;

produces the output 31.
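Since / never truncates, whole-number division needs the built-in int function, as in the binary search post. The sketch below pairs it with % to recover quotient and remainder; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $total = 24;
my $groupSize = 7;

my $quotient = int($total / $groupSize);    # whole groups
my $remainder = $total % $groupSize;        # items left over
print "$total = $quotient * $groupSize + $remainder\n";
```

This prints 24 = 3 * 7 + 3, confirming that the quotient and remainder together account for the original value.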

Compound Assignment Operators

Any of the six arithmetic operators listed above can be prefixed to the assignment operator, =, to produce what is known as a compound assignment operator. A compound assignment operator always takes a variable as its first argument. Its second argument can be a literal or another variable. The effect of a compound assignment operator is to apply the specified arithmetic operation and then store the result back into the variable provided as the first argument. So the code

my $foo = 7;
$foo **= 2;
print $foo;

produces the output 49.
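For reference, here is a sketch that applies all six compound assignment operators in turn to one variable, recording the value after each step. The names $value and @history are arbitrary.

```perl
use strict;
use warnings;

my $value = 10;
my @history;

$value += 5;     push @history, $value;    # 15
$value -= 3;     push @history, $value;    # 12
$value *= 2;     push @history, $value;    # 24
$value /= 4;     push @history, $value;    # 6
$value **= 2;    push @history, $value;    # 36
$value %= 7;     push @history, $value;    # 1

print "@history\n";
```

The output is 15 12 24 6 36 1, one value per operator.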

Increment and Decrement Operators

One of the more common cases of modifying a variable’s value and storing the result back to the same variable is adding or subtracting 1 from the variable’s value. Perl provides special operators for doing this, known as the increment (++) and decrement (--) operators. Unlike the operators discussed above, the increment and decrement operators are unary—they work on only a single value. So, for example, the code

my $foo = 24;
my $bar = 7;
$foo++;
$bar--;
print "$foo\n$bar\n";

produces the output

25
6

The increment and decrement operators can also be placed before the variable (including sigil), i.e. ++$foo;. In the most common case, where the increment or decrement occurs on a line by itself, there is no difference between the two usages.
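The two placements do differ when the result is used as part of a larger expression, as this small sketch shows. The names $before and $after are arbitrary.

```perl
use strict;
use warnings;

my $foo = 24;
my $before = $foo++;    # $before gets 24, then $foo becomes 25
my $after = ++$foo;     # $foo becomes 26, then $after gets 26
print "$before $after $foo\n";
```

This prints 24 26 26: the postfix form yields the old value, while the prefix form yields the new one.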

String Operators

Perl also provides two operators whose use is specific to strings. The string concatenation operator, which puts two strings together one after the other, is the period (.). Note that this is different from many languages, which use + for this purpose as well as for addition. The reason for this is that, in Perl, a string whose contents can be interpreted as a numeric literal can be implicitly converted to a number. So, for example, the code

my $foo = "6.5";
my $bar = "27";
print $foo + $bar, "\n";
print $foo . $bar, "\n";

produces the output

33.5
6.527

Notice how, when the + operator was used, the two strings were implicitly converted to numbers and treated as if they were numbers.

The second string operator is the repetition operator, x. It takes a string as its first argument and a number as its second argument, and it repeats the given string the specified number of times (if the number given is not an integer, it is truncated to an integer; if the number given is zero or negative, the result is the empty string). So, for example, the code print "foo"x3; produces the output foofoofoo.
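The two string operators combine naturally, and both have compound assignment forms (.= and x=) in the same way the arithmetic operators do. The following is a sketch with invented variable names.

```perl
use strict;
use warnings;

my $title = "Report";

# Repeat a dash once per character of the title to underline it.
my $underline = "-" x length($title);

# Build the banner piece by piece with the compound form of concatenation.
my $banner = $title;
$banner .= "\n";
$banner .= $underline;
$banner .= "\n";
print $banner;

my $truncated = "ab" x 2.7;    # the count is truncated to 2
my $none = "ab" x 0;           # a count of zero gives the empty string
print "$truncated\n";
```

The underline is six dashes long because Report has six characters, and the last line printed is abab.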
