My Mom always does these anagram puzzles in her morning papers, so I took it upon myself to create a solution finder for her! Now I know that most of you probably aren't interested in what tmps Mother does or solving anagram problems, but you might find the code I made somewhat interesting.
What I am doing is taking a scrambled input string and checking to see if any combination of the string exists in the dictionary file (is it a valid english word or not). Here is what the program looks like in action:
Quote
bash-3.00$ time puzzle.sh adegln
Using anagram: adegln
Sorting adegln alphabetically... adegln
Scanning database...
Found 1 result(s):
angled
Found 2 result(s):
dangle
real 0m0.343s
As you can see, it works pretty quickly ;P The
time program tells us less than a second, anyways.
How it works:
I started off with just a dictionary of all english words (thanks iago). I created script.sh which takes every line in the wordlist and rearranges each
character alphabetically, keeping all of the lines in the original order. Now you have two wordlists, both parallel on a line to line basis. Then I created afilter.c.
afilter (once compiled) takes data from standard input and rearranges the data alphabetically, then spits out the arrangement, in classical unix fashion ;P. Then I wrote
puzzle.sh, a bash script that takes a command line argument (the scrambled word), pipes it to
afilter wich outputs an alphabetically arranged version of the anagram. That output is
grep'd from the second (alphabetized) wordlist. When a match is found, the line number is extracted via
awk and
awk is used again to scan the first (original) wordlist for that very line number, which is parallel to the second word list, thus giving you the matching (english) word.
Here is the link to the source (script.sh, afilter.c, puzzle.sh, both wordlists) if anybody is interested:
www.javaop.com/~tmp/puzzle.tar.gz
It is 600k, 95% of which is the dictionary file included.
That's pretty sweet.
I wonder: would it be possible to make this faster by creating a database, hashing each word by-letter, and then doing a word lookup by the smaller result subset?
Example:
H(x) is defined:
proc H(x):
hashcode := 0
for each letter in x
{
hashcode ^= letter
}
return hashcode
end proc
(Joe, that's real pseudocode :P)
You'll have many more collisions, but you also should speed up lookups. Other hash functions would be acceptable too, as long as they had the commutative property. For example, addition and multiplication could work because you get the same result regardless of the operations order. This is important for an anagram finder, and in fact addition might be better than XOR.
Anyway, once you have the rows with the matching hash, you can do the same row comparison that you do now with the words. :)
Quote
I wonder: would it be possible to make this faster by creating a database, hashing each word by-letter, and then doing a word lookup by the smaller result subset?
I could think of a few ways to optimize it (afilter is rather slow itself), however the difference can be no more than a fraction of a second, as it is running at a fraction of a second as it is. Note: all of the search/compare algorithm is implemented by
grep itself.
Of course it's no more than a fraction of a second. :P Still, though, a few jillion runs and those fractional seconds add up. :P
Tmp's solution is pretty similar to another problem I outlined, from the book Programming Pearls, by Bentley. The code to sort the letters in each word:
http://www.cs.bell-labs.com/cm/cs/pearls/sign.c
The wordlist is piped into that, then the output from that is piped into the sort program, and the output from that is piped into this:
http://www.cs.bell-labs.com/cm/cs/pearls/squash.c
Which combines similar words onto the same line. The output from that could be grepped or you could use an optimized search on it.
I highly recommend Programming Pearls, it's an amazing book.
This is a very interesting idea, and looks quite successful. Although, I think you may be ruining it for your mother.
Thanks! I'll have to use this for my science homework from now on. I *hate* unscrambling stuff.
If I get a chance, I suppose I'll port this to entirely C++, so that it can be compiled with VC++ for Windows.
I have to complement Tmp's choice in program. He chose my #1 favorite problem to solve :)
Quote from: Joe[e2] on December 05, 2005, 08:21:53 PM
Thanks! I'll have to use this for my science homework from now on. I *hate* unscrambling stuff.
If I get a chance, I suppose I'll port this to entirely C++, so that it can be compiled with VC++ for Windows.
It might be a pain to get that working on Windows, Windows sucks at piping and stuff. But who knows?
Well, I could probably do it all internally. Grep would be the hardest part, where I'd probably fail.