Subroutines

A subroutine is a block of code with a defined name that, instead of being executed immediately when it is encountered in the program, is stored for later use. The block of code can then be run (potentially multiple times) by using the name defined for it later in the program.

A Note About Terminology

Many languages refer to what Perl calls subroutines as functions. In Perl, the word function usually refers to something built into Perl, such as chomp, which we’ve used in earlier posts to strip the trailing newline from user input, whereas a subroutine is written by the user.

A Simple Subroutine with No Parameters or Return Value

Subroutines in Perl are defined using the keyword sub, followed by the name of the subroutine, followed by the block of code to be executed when the subroutine is called. For example, the following subroutine, when called, will print the digits 0 through 9 to the screen, each on its own line:

sub printDigits
{
    print "$_\n" for (0..9);
}
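Defining a subroutine does not run it. Here is a minimal sketch of calling printDigits, assuming it has been defined as above. The second call shows that the same block can be reused, and that the parentheses may be omitted once Perl has already seen the definition:

```perl
use strict;
use warnings;

sub printDigits
{
    print "$_\n" for (0..9);
}

# Nothing is printed until the subroutine is called by name.
printDigits();   # prints 0 through 9, one per line
printDigits;     # parentheses are optional because the sub is already declared
```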

The Argument Array

Okay, technically I lied when I said that the printDigits subroutine above has no parameters. Because I haven’t explicitly told Perl what parameters I’m expecting, Perl will let me pass in as many arguments as I want, of whatever types I want, when I call this subroutine later in the program. The arguments are stored in the special array @_. For example, this subroutine, when called, will print out any arguments passed to it as a comma-separated list:

sub commaSeparated
{
    my @args = @_;
    # join puts ", " between elements, so there is no trailing comma,
    # and an empty argument list prints just the newline
    print join(", ", @args), "\n";
}
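Because no parameter list is declared, the same subroutine can be called with any number of arguments. A small sketch illustrating this, using two hypothetical helpers: countArgs, which reports how many arguments landed in @_, and commaJoined, a variant of commaSeparated that returns the string instead of printing it so the result is easy to check:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of elements in @_.
sub countArgs
{
    return scalar @_;
}

# Hypothetical variant of commaSeparated that returns the joined string.
sub commaJoined
{
    my @args = @_;
    return join(", ", @args);
}

print countArgs(), "\n";                 # 0
print countArgs(1, "two", 3.5), "\n";    # 3
print commaJoined("a", "b", "c"), "\n";  # a, b, c
```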

Notice how the subroutine starts by copying @_ into a local variable, @args, declared with my. As with $_ in a foreach loop, the elements of @_ are aliases for the actual arguments: copying the arguments into a local variable and working with that copy instead of @_ ensures that, if we modify one of them, the modifications are not reflected outside of the subroutine.
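The aliasing is easy to see by comparing two hypothetical subroutines, one that modifies $_[0] directly and one that modifies a copy. A minimal sketch (note that passing a literal such as 5 to doubleInPlace would die, because a constant cannot be modified):

```perl
use strict;
use warnings;

# Hypothetical: modifies $_[0] directly. Because @_ aliases the caller's
# variable, the change is visible after the call returns.
sub doubleInPlace
{
    $_[0] *= 2;
}

# Hypothetical: copies the argument first, so only the copy changes.
sub doubleCopy
{
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $x = 5;
doubleInPlace($x);
print "$x\n";             # 10

my $y = 5;
my $doubled = doubleCopy($y);
print "$y $doubled\n";    # 5 10
```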

Returning a Value

Subroutines can also return a value back to the caller using the keyword return followed by the value to be returned. For example, the following subroutine adds up the arguments passed to it and returns their sum:

sub add
{
    my @args = @_;
    my $result = 0;
    $result += $_ for (@args);
    return $result;
}
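A short usage sketch of add. It also shows a hypothetical variant, addImplicit, without an explicit return: a Perl subroutine returns the value of the last statement it evaluated, which is why the variant ends with $result on its own line rather than with the for loop:

```perl
use strict;
use warnings;

sub add
{
    my @args = @_;
    my $result = 0;
    $result += $_ for (@args);
    return $result;
}

# Hypothetical variant: the value of the last evaluated statement is returned.
sub addImplicit
{
    my $result = 0;
    $result += $_ for (@_);
    $result;
}

my $sum = add(1, 2, 3, 4);
print "$sum\n";                 # 10
print addImplicit(2, 5), "\n";  # 7
```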

Specifying Parameters

Perl also lets you declare how many arguments a subroutine expects, and of what kind, using a prototype: a list of sigils in parentheses after the subroutine’s name. For example, the following division subroutine takes two scalar values as its arguments:

sub divide($$)
{
    my ($dividend, $divisor) = @_;
    return $dividend / $divisor;
}
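Because a prototype describes context rather than type, passing an array where a $ is expected is not an error: the array is evaluated in scalar context, which yields its length. A short sketch using the divide subroutine above with an assumed array @nums; the second call uses the & prefix, which tells Perl to ignore the prototype, so the array is flattened instead:

```perl
use strict;
use warnings;

sub divide($$)
{
    my ($dividend, $divisor) = @_;
    return $dividend / $divisor;
}

my @nums = (10, 20, 30);

# The ($$) prototype puts each argument in scalar context, so @nums
# is passed as its element count (3), not as its contents.
my $q = divide(@nums, 2);
print "$q\n";    # 1.5

# The & prefix bypasses the prototype: divide receives (10, 20, 30, 2)
# and unpacks only the first two values.
my $r = &divide(@nums, 2);
print "$r\n";    # 0.5
```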

Unfortunately, a prototype cannot specify the type any more precisely than its sigil (so you can’t require that the arguments be numeric values), and it is only checked for calls that Perl compiles after it has seen the subroutine’s declaration. Also note that, even when the number and type of arguments are specified, the arguments are still stored in @_: a prototype does not give them names, so they must be unpacked into local variables, as is done on the first line of the divide subroutine shown above.
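Newer versions of Perl do offer a way to name parameters in the declaration: subroutine signatures, available as an experimental feature since Perl 5.20 and no longer experimental as of Perl 5.36. A minimal sketch, assuming Perl 5.20 or later; with the feature enabled, the parentheses after the name are a signature rather than a prototype, and Perl checks the number of arguments when the subroutine is called:

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';   # silences the warning on Perl 5.20 to 5.34

# The parameters are named directly in the declaration.
sub divide ($dividend, $divisor)
{
    return $dividend / $divisor;
}

print divide(9, 3), "\n";    # 3

# Too few arguments is a runtime error, caught here with eval.
my $ok = eval { divide(1); 1 };
print "error: $@" unless $ok;
```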

Passing Arrays and Hashes to a Subroutine Using References

Lists in Perl are automatically flattened: if an array “contains” another array, the elements of the inner array become individual elements of the outer array, and it is impossible to tell just by looking at the outer array where the inner array started and ended. Likewise, if an array “contains” a hash, the hash is flattened into a list of its keys and values, and the result no longer behaves like a hash. Because the arguments to a subroutine arrive in @_ as a single flat list, special measures must be taken to pass an array or hash to a subroutine without losing its structure.
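The flattening can be observed by counting what arrives in @_. A small sketch with a hypothetical countArgs helper and assumed example data:

```perl
use strict;
use warnings;

# Hypothetical helper: returns how many elements arrived in @_.
sub countArgs
{
    return scalar @_;
}

my @first  = (1, 2);
my @second = (3, 4, 5);
my %pairs  = (a => 1);

# Both arrays are flattened into one list of five scalars;
# the boundary between them is lost.
print countArgs(@first, @second), "\n";    # 5

# A hash is flattened into its key/value pairs.
print countArgs(%pairs), "\n";             # 2

# Building an array from arrays flattens in the same way.
my @combined = (@first, @second);
print scalar(@combined), "\n";             # 5
```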

To specify an array or hash in the prototype, prepend a backslash \ to the appropriate sigil. When the subroutine is called, a reference to the array or hash passed as an argument is placed in @_. Its contents can then be copied into a local variable by dereferencing it: write the sigil for an array or hash, followed by the reference (as accessed from @_) enclosed in curly braces { }. For example, the following subroutine takes an array and a hash and returns an array containing the elements of the array that are keys in the hash:

sub findKeys(\@\%)
{
    my @searchingFor = @{$_[0]};
    my %hashToSearch = %{$_[1]};
    my @found = ();
    for (@searchingFor)
    {
        push @found, $_ if (exists $hashToSearch{$_});
    }
    return @found;
}
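A usage sketch of findKeys with assumed example data. Because of the (\@\%) prototype, the caller writes the array and hash without backslashes and Perl passes references to them automatically:

```perl
use strict;
use warnings;

sub findKeys(\@\%)
{
    my @searchingFor = @{$_[0]};
    my %hashToSearch = %{$_[1]};
    my @found = ();
    for (@searchingFor)
    {
        push @found, $_ if (exists $hashToSearch{$_});
    }
    return @found;
}

my @wanted = ("apple", "kiwi", "pear");
my %stock  = (apple => 4, pear => 0, plum => 7);

# exists checks for the key, so "pear" is found even though its value is 0.
my @present = findKeys(@wanted, %stock);
print "@present\n";    # apple pear
```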

It is also possible to store the references themselves in local variables, which carry the scalar sigil $, and to work with the references directly, dereferencing them each time they are used. For example:

sub findKeys(\@\%)
{
    my ($searchingFor, $hashToSearch) = @_;
    my @found = ();
    for (@$searchingFor)
    {
        push @found, $_ if (exists $$hashToSearch{$_});
    }
    return @found;
}
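Prototypes are not the only way to pass references. A sketch of a hypothetical findKeysByRef with no prototype, where the caller creates the references explicitly with a backslash; it uses the arrow form $hashToSearch->{$_}, which is equivalent to $$hashToSearch{$_}, and grep in place of the explicit loop:

```perl
use strict;
use warnings;

# Hypothetical variant with no prototype: the caller passes references.
sub findKeysByRef
{
    my ($searchingFor, $hashToSearch) = @_;
    # grep keeps the elements for which the block returns true
    return grep { exists $hashToSearch->{$_} } @$searchingFor;
}

my @wanted = ("apple", "kiwi", "pear");
my %stock  = (apple => 4, pear => 0, plum => 7);

# The backslashes create the references at the call site.
my @present = findKeysByRef(\@wanted, \%stock);
print "@present\n";    # apple pear
```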

Note that when the reference is stored in a scalar variable instead of at an index in an array, it is not necessary to surround it with curly braces when dereferencing it (although the curly braces can still be used if one so desires). Also note that when the references are used directly instead of being dereferenced into a local copy, any changes made to the contents of the array or hash being referenced will still be visible after the subroutine has returned. So, for example, the following subroutine takes an array and a hash and removes from the hash every key-value pair whose key is contained in the array:

sub removeKeys(\@\%)
{
    my ($keysToRemove, $hashToPrune) = @_;
    delete $$hashToPrune{$_} for (@$keysToRemove);
}

A simple program using this subroutine is shown below:

sub removeKeys(\@\%)
{
    my ($keysToRemove, $hashToPrune) = @_;
    delete $$hashToPrune{$_} for (@$keysToRemove);
}

sub printHash(\%)
{
    my %hash = %{$_[0]};
    print "$_ => $hash{$_}\n" for (keys %hash);
}

my @searchTerms = ("foo", "bar", "baz");
my %searchingIn = (
    foo => 3,
    bar => 7,
    qux => 9
);

printHash(%searchingIn);
removeKeys(@searchTerms, %searchingIn);
print "After calling removeKeys:\n";
printHash(%searchingIn);

This program produces the following output:

bar => 7
qux => 9
foo => 3
After calling removeKeys:
qux => 9

This output also demonstrates an important fact to note about the behavior of hashes: they are unordered. When you iterate over a hash, the only thing you are guaranteed is that every key-value pair will be generated exactly once—Perl makes no guarantees about the order in which they are generated.
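Because the order can even differ from one run of the program to the next (since Perl 5.18, hash ordering is randomized per process), code that needs stable output usually sorts the keys first. A minimal sketch reusing the %searchingIn hash from the program above:

```perl
use strict;
use warnings;

my %searchingIn = (
    foo => 3,
    bar => 7,
    qux => 9
);

# sort puts the keys in string order, so the output is the same on every run.
my @lines = map { "$_ => $searchingIn{$_}" } sort keys %searchingIn;
print "$_\n" for @lines;
# bar => 7
# foo => 3
# qux => 9
```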

Subroutine References

Recall that in the post on conditional statements, we used a hash to substitute for an extended if-elsif-else chain to determine which of several strings to print. What if we have an extended if-elsif-else chain where the operations to be performed are more complicated than just printing some string? Can we still use a hash instead of an if-elsif-else chain? The answer is yes: we do it by storing references to subroutines as the hash’s values.

A reference to a subroutine is created by prepending \& to the name of the subroutine. No argument list follows the name when the reference is created; the arguments are supplied later, when we dereference the reference and call the subroutine. Calling a subroutine through a reference is done by writing the reference, followed by the arrow operator ->, followed by the argument list. Consider the following program:
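Before the full program, here is a minimal sketch of the basic mechanics with a hypothetical greet subroutine. Besides the arrow operator, Perl also accepts &$ref(...) and &{$ref}(...) for calling through a reference:

```perl
use strict;
use warnings;

# Hypothetical example subroutine.
sub greet
{
    my ($name) = @_;
    return "Hello, $name!";
}

# \&greet creates a reference; no argument list is given here.
my $greeter = \&greet;

# Three equivalent ways to call through the reference.
my $a1 = $greeter->("Ada");
my $a2 = &$greeter("Ada");
my $a3 = &{$greeter}("Ada");

print "$a1\n";    # Hello, Ada!
```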

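use strict;
use warnings;

sub getNumberInput
{
    while (1)
    {
        print "Enter a number: ";
        my $usrInput = <STDIN>;
        # <STDIN> returns undef at end of input; stop instead of looping forever
        die "No more input.\n" unless defined $usrInput;
        chomp $usrInput;
        # =~ matches the string against the pattern: optional minus sign,
        # digits, optional decimal part
        return $usrInput if ($usrInput =~ /^-?\d+(\.\d+)?$/);
        # if user input was valid, subroutine will have returned on the previous line
        # and so this line will not be executed
        print "$usrInput is not a valid number. Please enter a number.\n";
    }
}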

sub add($$)
{
    my ($x, $y) = @_;   # $a and $b are avoided because sort uses them
    return $x + $y;
}

sub subtract($$)
{
    my ($x, $y) = @_;
    return $x - $y;
}

sub multiply($$)
{
    my ($x, $y) = @_;
    return $x * $y;
}

sub divide($$)
{
    my ($x, $y) = @_;
    return $x / $y;
}

sub mod($$)
{
    my ($x, $y) = @_;
    return $x % $y;
}

# named power rather than exp, which is already a built-in function
sub power($$)
{
    my ($x, $y) = @_;
    return $x ** $y;
}

my %options = (
    addition => \&add,
    subtraction => \&subtract,
    multiplication => \&multiply,
    division => \&divide,
    modulo => \&mod,
    exponentiation => \&power
);

my $first = getNumberInput();
my $second = getNumberInput();
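my $operation;
# check defined first so exists is never given an undefined key
until (defined $operation && exists $options{$operation})
{
    print "That operation is not supported.\n" if (defined $operation);
    print "Enter an operation: ";
    $operation = <STDIN>;
    die "No more input.\n" unless defined $operation;
    chomp $operation;
}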
my $result = $options{$operation}->($first, $second);
print "The result is $result.\n";

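The second-to-last line is the one that is of interest to us. We retrieve a subroutine reference from the %options hash and use the arrow operator to dereference it and call it, in a single step, with $first and $second as its arguments. Note that Perl does not check prototypes for calls made through a reference, so the ($$) declarations on the arithmetic subroutines have no effect on this line.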


CS270 Language Blog - Perl

Saturday, February 22, 2020