Saturday, February 15, 2020

Do It Again – Loops

Loops are Perl’s way of executing the same code multiple times. There are three main types of loops: for, while, and do-while. To understand the first of these, the for loop, we must first discuss a new data type: arrays.

Arrays

An array is a list of values. Array variables are declared using the sigil @. The listing of the array’s contents is bounded by parentheses, and the values are separated from one another by commas. For example, we could declare my @friends = ("Joe Smith", "Frank Jones", "Linda Brown");. When accessing the values in the array, the name of the array variable is prefixed by the scalar sigil $ (since the value to be retrieved is eventually a scalar) and followed by the desired value’s index (the position in the array at which the desired value appears) enclosed in square brackets [ ]. Note that array indices in Perl begin with 0. So, for example, to get "Joe Smith" out of the array declared earlier, we would use $friends[0].
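As a small illustration of declaring an array and reading or writing individual elements, here is a minimal sketch. The replacement name "Linda Green" is made up for the example.

```perl
use strict;
use warnings;

# Declare an array with the @ sigil.
my @friends = ("Joe Smith", "Frank Jones", "Linda Brown");

# Read single elements with the $ sigil and a zero-based index.
my $first  = $friends[0];
my $second = $friends[1];

# Elements can be assigned the same way ("Linda Green" is a made-up value).
$friends[2] = "Linda Green";

print "$first, $second, $friends[2]\n";   # Joe Smith, Frank Jones, Linda Green
```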

It is also possible to access a slice (sub-array) of the array by placing a range inside the square brackets. A range is declared by giving the first and last numbers desired to be included in the range separated by two dots, for example 1..5. In this case, since the value that will eventually be retrieved is also an array, we would prefix the array variable with the array sigil @. So, for example,

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");
my @secondQuarter = @months[3..5];
print "The months in the second quarter are @secondQuarter";

produces the output The months in the second quarter are April May June. Note that a range produces a list of values, so it can be assigned directly to an array; it is perfectly valid to say, for example, my @foo = 4..8;.

IMPORTANT: Ranges are inclusive at both ends. The declaration above is equivalent to my @foo = (4, 5, 6, 7, 8);, not my @foo = (4, 5, 6, 7); as a programmer coming from a language such as Java might expect.
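The inclusive behaviour is easy to confirm with a short sketch. The @letters array is a hypothetical example, not something used elsewhere in this post.

```perl
use strict;
use warnings;

# Both endpoints of the range are included.
my @foo = 4..8;
print "@foo\n";      # 4 5 6 7 8

# The same holds when a range is used to take a slice.
my @letters = ("a", "b", "c", "d", "e");   # hypothetical sample data
my @middle  = @letters[1..3];
print "@middle\n";   # b c d
```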

An array’s length (the number of elements it contains) is obtained by evaluating the array in scalar context, for example with scalar(@months) or by assigning the array to a scalar variable, as in my $count = @months;. For the months array from earlier, both give 12. Note that $months on its own does not give the length: it names a separate, unrelated scalar variable. Prefixing $# to the name of an array variable gives the value of the last index of that array variable (i.e. one less than the length). So, for example, $#months would evaluate to 11.
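A minimal sketch of reading an array’s size and last index. It relies on the standard Perl rule that an array evaluated in scalar context yields its element count; the variable names are chosen for the example.

```perl
use strict;
use warnings;

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");

my $count     = @months;          # assigning to a scalar gives the element count
my $explicit  = scalar(@months);  # the same count, with the context spelled out
my $lastIndex = $#months;         # index of the final element
my $final     = $months[$lastIndex];

print "count=$count last=$lastIndex final=$final\n";   # count=12 last=11 final=December
```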

The for Loop

The for loop is used to iterate over the contents of some array (which could be a range). By default, the current element in the loop is aliased to the variable name $_. So, for example, we can print out the numbers 0-9 inclusive, each on their own line, with the following code:

for (0..9) { print "$_\n"; }

We can also specify our own alias for the current loop element by placing it after the for and before the opening parenthesis. For example,

for my $i (0..9) { print "$i\n"; }

produces exactly the same output as the version using $_.

We can also use an array variable to iterate over in a for loop. For example, we can print out the months of the year, each on their own line, using the following code:

my @months = ("January", "February", "March", "April", "May", "June",
              "July", "August", "September", "October", "November", "December");
for (@months) {
  print "$_\n";
}

WARNING: Whether you define your own name for the loop variable or use $_, it is just an alias for the current position in the array being iterated over. You can do anything with the alias that you could do with $someArray[$whateverIndex], including modify the contents of the array. For example,

my @nums = 0..4;
for (@nums) {
  $_++;
}
print "My nums are @nums";

produces the output My nums are 1 2 3 4 5. The modification made through the loop variable persists outside the loop, unlike in languages such as Java, where the loop variable of an enhanced for loop holds a copy of the current element, so assigning to it leaves the array unchanged.
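If the array should stay untouched, one option is to copy the element into a separate variable before changing it. The sketch below contrasts the two behaviours; it uses push, which appends a value to the end of an array, and the names @doubled and $copy are chosen for the example.

```perl
use strict;
use warnings;

my @nums = 0..4;

# Work on a copy of each element so the array itself is not modified.
my @doubled;
for my $n (@nums) {
    my $copy = $n;        # $copy is a new scalar, not an alias into @nums
    $copy *= 2;
    push @doubled, $copy;
}
print "Original: @nums\n";     # Original: 0 1 2 3 4
print "Doubled:  @doubled\n";  # Doubled:  0 2 4 6 8

# Changing the loop variable itself writes through to the array.
$_ *= 2 for @nums;
print "In place: @nums\n";     # In place: 0 2 4 6 8
```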

As with the simple if and unless statements, when the code to be executed by a for loop is a single line, the for can be placed after that line, for example print "$_\n" for (0..9). Unlike with the “normal” for loop, you cannot define your own alias for the loop variable when using the postfix syntax; you must use $_.

The while and do-while Loops

Instead of taking an array and iterating over it, while and do-while loops repeat the code they contain indefinitely until a supplied condition is no longer true. For example, the following code accepts a number as input from the user and decrements that number for as long as it remains positive:

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
while ($selected > 0) {
  print $selected--, "\n";
}

Running this program with 8 as the user input produces the following:

Enter a positive number: 8
8
7
6
5
4
3
2
1

Note that the loop condition is checked at the beginning of the loop. This means it is possible (if the user enters, say, -6 as input) that the code inside the loop will not execute at all. If we want to guarantee that the code in the loop executes at least once, we use a do-while loop, which checks the loop condition at the end of the loop. For example, rewriting the above program to use a do-while loop gives the following:

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
do {
  print $selected--, "\n";
} while ($selected > 0);

Now, even if we were to provide as input a non-positive number such as -6, the loop would still execute at least once, even though its loop condition is initially false:

Enter a positive number: -6
-6
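The difference between the two forms can also be seen without user input by counting how many times each loop body runs. This is only a sketch: the starting value is hard-coded and the counter names are invented for the example.

```perl
use strict;
use warnings;

my $start = -6;   # hard-coded stand-in for a non-positive user input

# while: the condition is checked before the body runs.
my $whileRuns = 0;
my $n = $start;
while ($n > 0) {
    $whileRuns++;
    $n--;
}

# do-while: the body runs once before the condition is checked.
my $doRuns = 0;
$n = $start;
do {
    $doRuns++;
    $n--;
} while ($n > 0);

print "while ran $whileRuns time(s), do-while ran $doRuns time(s)\n";
# while ran 0 time(s), do-while ran 1 time(s)
```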

Just as unless (condition) is equivalent to if (not condition), until (condition) is equivalent to while (not condition). So the first version of the above program could be written as

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
until ($selected <= 0) {
  print $selected--, "\n";
}

and the second version as

print "Enter a positive number: ";
my $selected = <STDIN>;
chomp $selected;
do {
  print $selected--, "\n";
} until ($selected <= 0);

Exiting Early and Skipping Iterations

Suppose we have a long array of numbers, and we want to find the index of the first occurrence of some particular number in our array. As soon as we find it, we don’t need to keep looking at the rest of the array. To exit a for loop before we reach the end of the array, we use the keyword last. The last keyword can also be used to exit a while loop even while the loop condition is still true. In a similar fashion, if we want to skip a value in the array being iterated over by a for loop, or if we want to skip the rest of a while loop’s body and reevaluate its condition, we use the keyword next. An example of the usage of last and next is shown below:

# Fill the array with 500 random integers in the range 0..99.
my @bigData = ();
push @bigData, int(rand(100)) for (1..500);
my $loc = -1;
my $target;

# Keep asking until the user supplies a valid number.
until (defined $target) {
  print "Enter a number 0-99: ";
  my $usrInput = <STDIN>;
  chomp $usrInput;
  if ($usrInput =~ /\D/ or $usrInput < 0 or $usrInput > 99) {
    print "$usrInput is not between 0-99. Please enter a number between 0-99.\n";
    next;
  }
  $target = $usrInput;
}

# Search for the first occurrence of the target and stop as soon as it is found.
for my $i (0..$#bigData) {
  if ($bigData[$i] == $target) {
    $loc = $i;
    last;
  }
}

if ($loc == -1) {
  print "The data does not contain the value $target\n";
} else {
  print "The first occurrence of $target is at index $loc\n";
}

This program introduces a few new elements. The push function adds a value to the end of an array. The rand function generates a random number that is at least 0 and less than its argument, and the int function truncates its argument toward zero, which for the non-negative values produced here is the same as rounding down. So the combination of the two, int(rand($someNumber)), produces a random integer in the range 0..($someNumber - 1). The expression $usrInput =~ /\D/ checks whether $usrInput contains any non-digit characters. It does this using what’s called a regular expression, which we’ll discuss in more detail in a future post. This check is necessary because of the way Perl implicitly converts strings to numbers. When a string that begins with a non-numeric character, such as foo, is used in a numeric context (such as being compared to a number using the < and > operators), it is implicitly converted to 0, which is within our range of valid inputs and thus would be accepted by the program as if the user had actually typed 0. To avoid this false positive, we have to explicitly check whether the input is non-numeric.
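To make the implicit conversion concrete, the sketch below runs a handful of sample strings through both the /\D/ check and a numeric conversion. The sample inputs and the %numeric hash are invented for the illustration, and the numeric warning is switched off only inside the small do block so the conversion can be shown quietly.

```perl
use strict;
use warnings;

my @inputs = ("47", "foo", "12abc", "-6");   # made-up sample inputs
my %numeric;                                 # input string => value Perl uses numerically

for my $input (@inputs) {
    my $hasNonDigit = ($input =~ /\D/) ? "yes" : "no";

    # Adding 0 forces numeric context; warnings are silenced for this line only.
    my $asNumber = do { no warnings 'numeric'; $input + 0 };
    $numeric{$input} = $asNumber;

    printf "%-6s non-digit: %-3s numeric value: %s\n",
        $input, $hasNonDigit, $asNumber;
}
```

Here foo converts to 0 and 12abc converts to 12 (Perl uses any leading digits), so both would slip through a purely numeric range check. The minus sign in -6 also counts as a non-digit, so /\D/ rejects that input before the range comparison is reached.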

Now let’s look at how the next and last statements are being used. In our until loop, we ask the user for a number and check to see if it is valid. If it is not, we print an error message and invoke next. This causes the last line of code in the loop, which assigns the user input to the $target variable, to be skipped over—we don’t want to store an invalid user input as the value we’re going to try to search for. Thus the loop exit condition, that $target has a defined value, remains false, and so we prompt the user for another input.

Once we have a valid input, we start searching the @bigData array for the target value provided by the user, which we do using a for loop (remember that $#bigData represents the last index in @bigData). As soon as we find our target value, we store the index we found it at and then invoke last. This causes the loop to immediately exit, and the program continues execution with the next line of code outside the loop (in this case, the line if ($loc == -1)). We do this because, once we’ve found the first occurrence of the target value, we have the information we need—continuing to search through the rest of the array would only waste time and resources. An example run of what the user would see is shown below:

Enter a number 0-99: foo
foo is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: -6
-6 is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: 104
104 is not between 0-99. Please enter a number between 0-99.
Enter a number 0-99: 47
The first occurrence of 47 is at index 274
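Because the program above depends on random data and user input, its output changes from run to run. For a version that can be checked by hand, here is a small sketch of last and next on a fixed array. The data, the target, and the even-sum task are all made up for the example.

```perl
use strict;
use warnings;

my @data   = (17, 4, 42, 8, 42, 23);   # made-up fixed data
my $target = 42;

# last: stop searching at the first match.
my $loc = -1;
for my $i (0..$#data) {
    if ($data[$i] == $target) {
        $loc = $i;
        last;
    }
}

# next: skip odd values and add up only the even ones.
my $evenSum = 0;
for my $value (@data) {
    next if $value % 2;
    $evenSum += $value;
}

print "First $target at index $loc; sum of even values is $evenSum\n";
# First 42 at index 2; sum of even values is 96
```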

4 comments:

  1. Very neat. I like how simply you are able to iterate over arrays using a for-loop. How are you able to iterate over a hash? Do you need to obtain the keys/values first somehow as an array?

    Replies
    1. Yes. If you want to iterate over a hash %someHash, you would use for(keys %someHash).

  2. Is there any way to declare arrays as two-dimensional?

    Replies
    1. You would declare an array of array references (which are discussed in the next post on subroutines).
