Thursday, February 27, 2020

Binary Search

A binary search is an algorithm for searching for an item in a sorted list. It can be implemented in Perl for numeric data as follows:

sub binarySearch { # $data will be an array reference my $data = shift; my $target = shift; my $start = shift || 0; my $end = shift || $#$data; # if start and end pointers cross over, $data does not contain $target return -1 if ($end < $start); # Perl always does floating point division, # so we have to tell it to convert to an integer my $mid = $start + int(($end - $start) / 2); if ($data->[$mid] == $target) { return $mid; } elsif ($data->[$mid] > $target) { return binarySearch($data, $target, $start, $mid - 1); } else { return binarySearch($data, $target, $mid + 1, $end); } }

This subroutine handles the retrieval of arguments in a slightly different way than the subroutines we saw in the previous post. The shift function is used to remove and return the element at index 0 of a specified array or, if no array is specified, of @_. If shift is called on an empty array, it returns undef. This behavior is taken advantage of to make the start and end arguments optional—the program will first shift a value from @_ and evaluate it. If it is a true value, the short-circuit behavior of the logical or operator causes the value that was returned from shift to be returned by the logical or operator and thus assigned to the variable. If the shift returned undef, which is a false value, the logical or operator will then evaluate and return the second value, thus providing default values for $start and $end that will be used if the caller does not supply such values.

Also note that we can access the value at a particular index of an array to which we have a reference by using the arrow operator -> followed by the index we want to access. This is similar to how we called subroutines from a reference in the previous post.

Two additional items to note pertain to the use of this subroutine by the caller. Firstly, the input array to which $data is a reference must be sorted—this is a requirement of the binary search algorithm. Secondly, because we have not specified the number and types of the arguments we are expecting (this is because we make some of the arguments optional, which cannot be done if the number and types of arguments are specified), Perl will not automatically convert an array into an array reference before passing it as an argument to the subroutine—it is the responsibility of the caller to do so. An example of a complete program that shows how this subroutine would be called is shown below:

sub binarySearch { # $data will be an array reference my $data = shift; my $target = shift; my $start = shift || 0; my $end = shift || $#$data; # if start and end pointers cross over, $data does not contain $target return -1 if ($end < $start); # Perl always does floating point division, # so we have to tell it to convert to an integer my $mid = $start + int(($end - $start) / 2); if ($data->[$mid] == $target) { return $mid; } elsif ($data->[$mid] > $target) { return binarySearch($data, $target, $start, $mid - 1); } else { return binarySearch($data, $target, $mid + 1, $end); } } my @rawData = (); push @rawData, int(rand(100)) for (1..500); # input data to a binary search must be sorted my @sortedData = sort { $a <=> $b } @rawData; my $target; until (defined $target) { print "Enter a number 0-99: "; my $usrInput = <STDIN>; chomp $usrInput; unless ($usrInput =~ /^\d{1,2}$/) { print "$usrInput is not between 0-99. Please enter a number between 0-99.\n"; next; } $target = $usrInput; } my $result = binarySearch(\@sortedData, $target); if ($result == -1) { print "$target was not found in the data.\n"; } else { print "$target was found at index $result of the data.\n"; }

Notice how the @rawData array is sorted to create the @sortedData array, and it is this array that is passed to binarySearch. The { $a <=> $b } in the invocation of the sort function instructs Perl to perform a numeric sort (the default sort treats the array’s elements as strings and sorts them lexicographically). Also notice how, when binarySearch is called, it is not @sortedData itself that is passed as an argument but rather a reference to it (created by prefixing a backslash). Finally, notice how only two arguments—the data to be searched and the value to search for—are passed to binarySearch when it is called by the main program. Since no values for them have been specified by the caller, the $start and $end variables will take on the default values specified in the binarySearch subroutine—in this case, the first and last indices of the data.

No comments:

Post a Comment