News:

Wieners, Brats, Franks, we've got 'em all.

Main Menu

Anagram solver!

Started by mynameistmp, December 05, 2005, 06:58:11 AM

Previous topic - Next topic

0 Members and 1 Guest are viewing this topic.

mynameistmp

My Mom always does these anagram puzzles in her morning papers, so I took it upon myself to create a solution finder for her! Now I know that most of you probably aren't interested in what tmps Mother does or solving anagram problems, but you might find the code I made somewhat interesting.

What I am doing is taking a scrambled input string and checking to see if any combination of the string exists in the dictionary file (is it a valid english word or not). Here is what the program looks like in action:

Quote
bash-3.00$ time puzzle.sh adegln
Using anagram: adegln
Sorting adegln alphabetically... adegln
Scanning database...
Found 1 result(s):
angled
Found 2 result(s):
dangle

real    0m0.343s

As you can see, it works pretty quickly ;P The time program tells us less than a second, anyways.

How it works:

I started off with just a dictionary of all english words (thanks iago). I created script.sh which takes every line in the wordlist and rearranges each character alphabetically, keeping all of the lines in the original order. Now you have two wordlists, both parallel on a line to line basis. Then I created afilter.c. afilter (once compiled) takes data from standard input and rearranges the data alphabetically, then spits out the arrangement, in classical unix fashion ;P. Then I wrote puzzle.sh, a bash script that takes a command line argument (the scrambled word), pipes it to afilter wich outputs an alphabetically arranged version of the anagram. That output is grep'd from the second (alphabetized) wordlist. When a match is found, the line number is extracted via awk and awk is used again to scan the first (original) wordlist for that very line number, which is parallel to the second word list, thus giving you the matching (english) word.

Here is the link to the source (script.sh, afilter.c, puzzle.sh, both wordlists) if anybody is interested:
www.javaop.com/~tmp/puzzle.tar.gz

It is 600k, 95% of which is the dictionary file included.

Joe

Quote from: Camel on June 09, 2009, 04:12:23 PMI'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.


MyndFyre

#2
I wonder: would it be possible to make this faster by creating a database, hashing each word by-letter, and then doing a word lookup by the smaller result subset?

Example:
H(x) is defined:

proc H(x):
  hashcode := 0
  for each letter in x
  {
     hashcode ^= letter
  }

  return hashcode
end proc

(Joe, that's real pseudocode :P)

You'll have many more collisions, but you also should speed up lookups.  Other hash functions would be acceptable too, as long as they had the commutative property.  For example, addition and multiplication could work because you get the same result regardless of the operations order.  This is important for an anagram finder, and in fact addition might be better than XOR.

Anyway, once you have the rows with the matching hash, you can do the same row comparison that you do now with the words.  :)
Quote from: Joe on January 23, 2011, 11:47:54 PM
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Quote from: Rule on May 26, 2009, 02:02:12 PMOur species really annoys me.

mynameistmp

Quote
I wonder: would it be possible to make this faster by creating a database, hashing each word by-letter, and then doing a word lookup by the smaller result subset?

I could think of a few ways to optimize it (afilter is rather slow itself), however the difference can be no more than a fraction of a second, as it is running at a fraction of a second as it is. Note: all of the search/compare algorithm is implemented by grep itself.

MyndFyre

Of course it's no more than a fraction of a second.  :P  Still, though, a few jillion runs and those fractional seconds add up.  :P
Quote from: Joe on January 23, 2011, 11:47:54 PM
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Quote from: Rule on May 26, 2009, 02:02:12 PMOur species really annoys me.

iago

Tmp's solution is pretty similar to another problem I outlined, from the book Programming Pearls, by Bentley.  The code to sort the letters in each word:
http://www.cs.bell-labs.com/cm/cs/pearls/sign.c

The wordlist is piped into that, then the output from that is piped into the sort program, and the output from that is piped into this:
http://www.cs.bell-labs.com/cm/cs/pearls/squash.c

Which combines similar words onto the same line.  The output from that could be grepped or you could use an optimized search on it. 

I highly recommend Programming Pearls, it's an amazing book. 

Quik

This is a very interesting idea, and looks quite successful. Although, I think you may be ruining it for your mother.
Quote[20:21:13] xar: i was just thinking about the time iago came over here and we made this huge bomb and light up the sky for 6 min
[20:21:15] xar: that was funny

Joe

Thanks! I'll have to use this for my science homework from now on. I *hate* unscrambling stuff.

If I get a chance, I suppose I'll port this to entirely C++, so that it can be compiled with VC++ for Windows.
Quote from: Camel on June 09, 2009, 04:12:23 PMI'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.


iago

I have to complement Tmp's choice in program.  He chose my #1 favorite problem to solve :)


Quote from: Joe[e2] on December 05, 2005, 08:21:53 PM
Thanks! I'll have to use this for my science homework from now on. I *hate* unscrambling stuff.

If I get a chance, I suppose I'll port this to entirely C++, so that it can be compiled with VC++ for Windows.

It might be a pain to get that working on Windows, Windows sucks at piping and stuff.  But who knows?

Joe

Well, I could probably do it all internally. Grep would be the hardest part, where I'd probably fail.
Quote from: Camel on June 09, 2009, 04:12:23 PMI'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.