What is Six Degrees of Kevin Bacon?
Six Degrees of Kevin Bacon is a parlor game based on trying to find the shortest “chain” connecting two actors and/or actresses. Each “link” in the chain is a pair of actors and/or actresses who have co-starred in a movie, and the number of links in the chain determines the length of the chain. In the seminal case of this game, one of the actors being linked is Kevin Bacon, hence the name.
To demonstrate the way in which this game works, consider the following two examples: William Shatner to Leonard Nimoy and Kevin Bacon to Natalie Portman. Shatner and Nimoy co-starred in Star Trek: The Motion Picture, so the length of the chain connecting them is 1. Bacon co-starred with Meg Gibson in Picture Perfect, and Gibson co-starred with Portman in Vox Lux, so the length of the chain connecting Bacon and Portman is 2.
The goal of this program is to find the shortest chain given the names of the two actors. Note that this program will only give the chain length as a number—it will not tell which movies and intermediate actors (if any) the chain was traced through. Throughout the development process, I will be using the Oracle of Bacon as a reference to verify my program’s results.
Loading the Raw Data into Memory
The Oracle of Bacon makes the dataset it uses to find matches publicly available for download as a compressed text file. After downloading and extracting the file, I opened it to look at how the data was formatted. Here is the first entry in the file:
{"title":"Actrius","cast":["Núria Espert","Rosa Maria Sardà","Anna Lizaran","Mercè Pons"],"directors":["Ventura Pons"],"producers":["Ventura Pons"],"companies":["Canal+ España","Els Films de la Rambla S.A.","Generalitat de Catalunya - Departament de Cultura","Televisión Española","Buena Vista International"],"year":1997}
Now knowing the data format, I was able to then write regular expressions to extract each film’s cast list. I then used the built-in split function to extract the individual actors’ and actresses’ names from the cast list and store them in an array. This portion of the code is shown below:
Note that since the string I want to match my regex against is stored in the special variable $_, I do not need to use the =~ operator to tell Perl to check for a match—I simply declare my regex, and Perl checks $_ for a match by default. Also note that I am storing array references into the @casts array; recall that this is necessary to prevent the sub-arrays from losing their structure when stored in the larger array. Also note the \p{L} character class in the regular expression. This is a special character class that matches anything considered a letter according to the Unicode standard. The reason I use this instead of A-Za-z is to ensure that I am correctly capturing actors and actresses such as Ricardo Montalbán whose names contain accented characters.
Having loaded into memory the cast lists of every movie in the datset, I can now construct my adjacency list—a hash that maps each actor or actress to the actors and/or actresses with whom they have co-starred. Since I can do this simply by iterating over the arrays I created earlier, I first allow my file variable to go out of scope so that Perl will close the connection to it. The code for constructing the adjacency list is shown below:
The grep function returns a list of all elements in an array that match a given regular expression. Recall that an empty list is treated as a false value in boolean context; hence, the grep function as it is being used here essentially acts to prevent duplicate mappings being added to the adjacency list. Also note that, in the else clause, I use square brackets rather than parentheses around the sole element of my newly created array. This is because using square brackets creates an array reference, saving me the step of having to declare the array and then obtain a reference to it using a backslash as I did with the @cast array in creating the first hash.
For those of you following along at home, be forewarned that the dataset is a very large file (about 43 MB); it will take several minutes to construct the adjacency list.
Finding the Shortest Chain
This essentially boils down to a breadth-first search algorithm: given a starting node (actor/actress), check all of the adjacent nodes (actors/actresses) to see whether they match the target node (actor/actress), then check all the nodes adjacent to the adjacent nodes, and so on and so forth until the target node is found, as seen below:
Note again the use of grep to ensure that no node (actor/actress) is visited more than once; this is also the reason for keeping all of the names in the queue and simply advancing a pointer ($currentIndex) instead of removing nodes from the queue as they are visited: if the nodes were removed as they were visited, it would create the possibility of nodes being visited more than once. Also recall that string equality uses the eq operator rather than the == operator used for numerical equality. Finally, recall that an array variable used in a scalar context evaluates to the number of elements it contains (as seen in the while loop condition).



