Files
popcyclical-blog-archive/posts/2012-04-25-how-do-you-solve-a-problem-like-a-euler-using-f.md

17 KiB
Raw Permalink Blame History

title, date, slug, published
title date slug published
How Do You Solve a Problem Like a Euler? Using F# 2012-04-25T19:44:10.7264-05:00 how-do-you-solve-a-problem-like-a-euler-using-f true

Ive been slowly making my way through the first 100 Project Euler problems while learning F# in an attempt to be more pragmatic - and to try and prevent my grey matter from getting too stale.  After many hours of keyboard-head-banging, Im now getting to the point where I dont flail uselessly when beginning to type in some F# code the pattern matching, automatic generalization and higher-order functions now feel like useful tools rather than strange curiosities.  Since Ive primarily coded in C# for many years, Ive been using the book “Real World Functional Programming” by Tomas Petricek as a starting point its been a pretty good read. So, for this post the code will be in F#, but itll be secondary to the more general topic of problem solving for puzzle-type scenarios commonly found on Project Euler.

I found this particular problem, #98, a bit more challenging than usual - and kind of fun to work through.  It involves matching anagrams with square numbers using character replacement heres the full description:

By replacing each of the letters in the word CARE with 1, 2, 9, and 6 respectively, we form a square number: 1296 = 362. What is remarkable is that, by using the same digital substitutions, the anagram, RACE, also forms a square number: 9216 = 962. We shall call CARE (and RACE) a square anagram word pair and specify further that leading zeroes are not permitted, neither may a different letter have the same digital value as another letter.

Using words.txt, a 16K text file containing nearly two-thousand common English words, find all the square anagram word pairs (a palindromic word is NOT considered to be an anagram of itself).

What is the largest square number formed by any member of such a pair?

This is what Id consider a pretty good puzzle the elements (a word list and square numbers) are easy enough to grasp, the way theyre related together is unique for this scenario, and an effective solution does not immediately pop into mind.  Well, it didnt pop into *my *mind, at least.  So where to begin?

Many of the problems (at least the ones Ive solved so far) on Project Euler involve a few common steps to reach a solution:

  • Generate a set of input values for the problem usually this will be a large set of potential values, such as all the prime numbers below 10,000,000, or in this case, 16K of words and a bunch of square numbers.
  • Given these inputs, build an algorithm that can test for the solution.
  • Add constraints to reduce the size of the input set so that the solution can quickly be found.  On modern hardware, this is usually under 1 second but anything under a minute is considered kosher.

Input Values

The input values are easy here theyve provided the word list, and a sequence of square numbers should be trivial.  First, reading the word list.  Instead of placing the words each on a new line, the file instead contains a single line formatted like:

"A","ABILITY","ABLE","ABOUT","ABOVE","ABSENCE"...

Ok so itll take a little parsing.  No problem.

let words =
    System.IO.File.ReadAllLines(@"words.txt") 
    |> Array.collect (fun line -> line.Split(','))
    |> Array.map (fun w -> w.Replace("\"", "").ToLower())

This loads the line from the file, splits the words by commas, and removes the quotations… and I threw in making the words lowercase since I hate having my puzzles constantly shouting at me.  A little aside for the language particulars: the |> is the magical-looking-but-actually-quite-simple pipeline operator it passes the output from the function on the left to be input for the function on the right. And the fun > is F# syntax for lambda expressions.  At the end of the pipeline comes a simple array containing the parsed words.

 

So for the rest of the input - the square numbers.  F# has the means to generate an infinite sequence which can then be sliced and diced to get at the bits that are needed.  The perfect squares can be generated with:

let allSquares =   
    Seq.unfold(fun (square,odd) -> Some(square, (square+odd, odd+2))) (0,1)

The square numbers are “unfolded that is, each element is calculated based on the result from the previous element, like youd do with.  Square numbers have a property where they can be generated by as the sum of a list of successive odd numbers. The tuple of (0,1) on the far right is fed in as an initial value, with 0 being the first “square” number and 1 being the first odd number.  At each step of the unfold, the odd number is increased by 2, and the square number is calculated by the adding the previous odd number. So, the (square,odd) tuples being computed will look like:(0,1),(1,3),(4,5),(9,7),(16,9), with only the squares being captured as output.  Works for me!

As with most simple-yet-effective bits of code, this one went through a couple of refinements before nailing it.  The first pass looked like this:

Seq.unfold(fun i -> Some(i, i + 1)) 0
|> Seq.map (fun i -> i*i)

…where the squares are generated in two steps instead of one. The first bit unfolds all the natural numbers from 0 to infinity (or more likely, maxint), and the second part maps these each by squaring them.  It produces the same result, but is less elegant because it uses more moving pieces.

Thats all the input this problem should require next, to design the…

Success Algorithm

A useful bit of the Project Euler problem descriptions is that they usually include an example which can be used as a test case for a solution algorithm. Here theyve provided:

care, race // <- anagrams map to  
1296, 9216 // <- squares

…with a replacement dictionary of (c,1),(a,2),(r,9),(e,6).  Its important to note that the anagrams and squares might be symmetric, where the numbers can be swapped (i.e. (c,9),(a,2),(r,1),(e,6) works as well), but there are cases (such as the final answer) where this doesnt hold.

To devise an algorithm to test for an answer, the first step is to think it over before doing any typing.  Its better to have some notion of a strategy than to rush in with the codes.  I struggle with adhering to this its just too easy to think “Im so good, Ill just figure it out as I program.”  Ive found its more productive to get away - get a pen and some paper and start sketching ideas, take a walk, etc.  Therell be plenty of time to work through all the details in code, but its best to go in with a plan a vague or even an incorrect strategy is better than none at all.

For this problem, I started with the idea of comparing each letter of the first word with the first square number, building a replacement dictionary while iterating through them.  If a valid replacement dictionary for the first set is found, it would then be tested against the second word-square-number pair. Writing in algorithm in F# meant I could treat the words as lists and iterate them using a common combination of recursive function calls and pattern matching.

let rec getReplacementDictionary (sToT, tToS) (source, target) =
    match source, target with
    // have already seen this exact mapping -> skip it
    | s::ss, t::tt when Map.containsKey s sToT && (Map.find s sToT) = t 
        -> getReplacementDictionary (sToT, tToS) (ss, tt)
    // have a mapping for the source, but it's not the target -> failure
    | s::_, _ when Map.containsKey s sToT
        -> None
    // have a mapping for the target, but it's not the source -> failure
    | _::_, t::_ when Map.containsKey t tToS 
        -> None
    // never before seen mapping -> add it
    | s::ss, t::tt
        -> getReplacementDictionary (Map.add s t sToT, Map.add t s tToS) (ss, tt)
    // end of the line - a successful translation!
    | [], [] -> Some(sToT, tToS)
    | _ -> raise(System.ArgumentException("words not equal length"))

Ive named the variables “source” and “target” theyll get passed the word and the number, respectively.  They are “matched” against conditions using the patterns seen on each line following the pipe “|” character.  The easiest of these to grasp is near the end | [], [] > Some(sToT, tToS)“ which matches two empty lists, indicating that the words have been completely checked and that there is a valid dictionary.  For the dictionary itself, I found that it was necessary to keep tabs on both the source-to-target values (sToT) as well as the target-to-source (tToS) values.  A bi-directional mapping structure would be ideal, but it would have been more effort to construct that than to fudge it with an extra variable.  If there is a better way to handle this, Id be interested to hear it… 

At any rate, the result of this function will either be a failure - with the None value - or with the completed dictionary(s) the Some(sToT, tToS).  Testing it against “care” and “1296” produces the expect result:

> getReplacementDictionary (Map.empty, Map.empty) ("care" |> List.ofSeq, "1296" |> List.ofSeq);;
val it : (Map<char,char> * Map<char,char>) option =
  Some
    (map [('a', '2'); ('c', '1'); ('e', '6'); ('r', '9')],
     map [('1', 'c'); ('2', 'a'); ('6', 'e'); ('9', 'r')])

…and changing a digit in the number to one of the already mapped digits results (Im using 1292 which repeats the 2) in a failure:

> getReplacementDictionary (Map.empty, Map.empty) ("care" |> List.ofSeq, "1292" |> List.ofSeq);;
val it : (Map<char,char> * Map<char,char>) option = None

To test this dictionary against the second group, “race” and “9216”, the original function may be reused because all of the character mappings will have been seen in the first group and they will simply be verified via the first pattern. 

let dict = getReplacementDictionary (Map.empty, Map.empty) ("care" |> List.ofSeq, "1296" |> List.ofSeq)
getReplacementDictionary dict.Value ("race" |> List.ofSeq, "9216" |> List.ofSeq);;

val dict : (Map<char,char> * Map<char,char>) option =
  Some
    (map [('a', '2'); ('c', '1'); ('e', '6'); ('r', '9')],
     map [('1', 'c'); ('2', 'a'); ('6', 'e'); ('9', 'r')])

This is making an assumption that the input words are anagrams of each other this is an OK assumption to make for now.  Indeed, it can be even assumed that the input data may be filtered for anagrams Ill cover that in the next section.

I noticed that this could be further streamlined by simply appending the words and the numbers together, since that is more-or-less what is occurring when calling it twice.  And in that case, the function no longer needs to return the mapping dictionary just a simple boolean indicating if a valid dictionary can be applied to the input.   So, the function can be modified as follows, making sure to rename it:

let rec hasReplacementDictionary (sToT, tToS) (source, target) =
    match source, target with
    | s::ss, t::tt when Map.containsKey s sToT && (Map.find s sToT) = t 
        -> hasReplacementDictionary (sToT, tToS) (ss, tt)
    | s::_, _ when Map.containsKey s sToT
        -> false
    | _::_, t::_ when Map.containsKey t tToS 
        -> false
    | s::ss, t::tt
        -> hasReplacementDictionary (Map.add s t sToT, Map.add t s tToS) (ss, tt)
    | [], [] -> true
    | _ -> raise(System.ArgumentException("words not equal length"))

and run a test:

> let los = List.ofSeq
let s = List.append (List.ofSeq "care") (List.ofSeq "race")
let t = List.append (List.ofSeq "1296") (List.ofSeq "9216");;

> hasReplacementDictionary (Map.empty, Map.empty) (s, t);;
val it : bool = true

Success!  Now, having all the necessary input and a valid algorithm, it would be possible to test every possibly combination of inputs to find the correct answer. 

Input Constraints

Project Euler problems are usually infeasible to answer without a computer.  (Theres sometimes really smart math folks in the forums who will find clever ways around this, but for the rest of us…) 

How many combinations of the input data could there be, anyway?  Theres a total of 1,786 words ranging in length from 1 to 14 characters, which leads to ~3.2 million combinations of just the words.  My script to calculate the count of all square numbers under 14 digits ran out of memory, but it turns out only up to 9 digit squares are needed.  There are 31,623 of those.  The number of combinations would be 1,7862 × 31,6232 ≈ 3.2E15.  Thats several thousand billion got to keep filtering or itll take a year!

The next obvious step is to pair up the anagrams and discard the rest of the words.  A simple way to do this is to alphabetically sort each character in each word and group together the ones that are exactly the same.  For example “care” and “race” will both map to “acer” neat trick!  Heres the code

let groupAnagrams ws =
    ws 
    // sort by characters in each string and then glue them back together
    |> Seq.groupBy (Array.ofSeq >> Array.sort >> (fun cs -> new string(cs)))
    // take only the anagrams - where the number of words is greater than one
    |> Seq.map snd
    |> Seq.filter (Seq.length >> ((<) 1))

This is a great first step but its not quite enough.  I want all the pairs of anagrams and there are cases where there are more than 2 words that all have the same letters.  For example, “post”, “spot” and “stop” will all group together.  What I need is combinations of all the pairs so Ill end up with (“post”, “spot”), (“spot”, stop”), and (“post”, “stop”).  Fortunately, I had already borrowed/stolen a generic combination function that Tomas had posted on Stack Overflow.

let combinations size set = 
    let rec combinations acc size set = seq {
        match size, set with 
        | n, x::xs -> 
            if n > 0 then yield! combinations (x::acc) (n - 1) xs
            if n >= 0 then yield! combinations acc n xs 
        | 0, [] -> yield acc 
        | _, [] -> () }
    combinations [] size (set |> List.ofSeq)

Now I can run all the whole list of anagram groups through the combinations algorithm and produce a single list of anagram pairs:

let pairwiseTuples =
    Seq.collect (
        combinations 2
        >> Seq.map (fun l -> l.[0], l.[1])
    )

And indeed it does its job:

[|("cat", "act"); ... ("race", "care");
  ...
  ("spot", "post"); ("stop", "post"); ("stop", "spot"); 
  ...|]

Running the words through this results in 44 pairs of anagrams with the lengthiest pair being 9 letters long ("reduction", "introduce"). This is starting to sound reasonable to process and its where I got stuck for a bit.  Looping through all of the squares for each pair still seems computationally prohibitive. 

I walked away from it for the day and in the shower the next morning “duh, the square numbers can be bundled up as anagrams as well!"  These are the ah-ha moments that make it all worthwhile.  A few examples (“1024”, “2401”), (“14400”, “10404”).  There turned out to still be quite a few for the larger digit counts (for example, there are 70052 anagram pairs for square numbers with 9 digits), but since there are so few words in the list that are above 5 characters long, it turned out to be OK.

All was left was to write a few more lines to run all the equal-length anagram pairs through the success function and find the result.  It ran in about a second on my hardware, so that is an acceptable solution in my mind.  There are undoubtedly numerous additional optimizations that could be applied and perhaps even a way to word it out on paper (I doubt it for this one) so Id be eager to hear any comments.