Author Topic: String Puzzle  (Read 7226 times)

0 Members and 1 Guest are viewing this topic.

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
String Puzzle
« on: December 12, 2007, 12:56:47 pm »
Given a string, detect the number of occurrences of two substrings: "fo" and "oo" in O(n) time that works without traversing the string more than once.  For example, given the string

  "foobarfoo"

The answer would be 4, since there are two occurrences of "fo" and two occurrences of "oo".

After you develop the algorithm, program it in your favorite language and post it!

I don't think this is a super difficult problem, so I won't give any more hints. :)
« Last Edit: December 12, 2007, 03:19:40 pm by Sidoh »

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #1 on: December 12, 2007, 03:04:38 pm »
Am I crazy, or is there no trick to this problem? It seems to me that it can be done with two linear searches.

Code: [Select]
int fo = 0;
int oo = 0;
for(i = 0; i < strlen(str) - 1; i++)
  if(str[i] == 'f' && str[i+1] == 'o')
   fo++;

for(i = 0; i < strlen(str) - 1; i++)
  if(str[i] == 'o' && str[i+1] == 'o')
   oo++;

As far as I can tell, that runs in O(n). Each of the loops runs 2n times, for a total of 4n. You drop the constant, and end up with O(n).

Am I missing something?

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #2 on: December 12, 2007, 03:18:07 pm »
You're absolutely right.  That's an O(n) solution.  That's exactly the way the problem was phrased when it was given to me, but I think that the original problem had another condition: you can only pass through the loop once.

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #3 on: December 12, 2007, 04:26:18 pm »
That's still easy:

Code: [Select]
int fo = 0;
int oo = 0;
for(i = 0; i < strlen(str) - 1; i++)
{
  if(str[i] == 'f' && str[i+1] == 'o')
   fo++;
  if(str[i] == 'o' && str[i+1] == 'o')
   oo++;
}

Perhaps they meant doing it in "n" commands, or something? *shrug*

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #4 on: December 12, 2007, 04:51:08 pm »
Haha, yeah.  They actually meant something completely different.  They wanted you to find all repeated pairs of the strings in O(n).  I have a solution for that too, but your solution is fine, assuming you incorporate out-of-bounds checks, which I'm sure you thought of.

My solution for this problem was to use a state graph.  That's the first thing I thought of (within seconds) and it was fairly trivial, so I didn't bother to think past that.  They're both O(n), though.  You're right; that problem had no trick to it. :(

Here's the real problem (sort of):

Find all pairs in the string that are repeated and output the number for each pair in O(n) time.

What I'm thinking is you're supposed to use extra space.  Use a hash table or something similar, I guess.  I have a solution with a graph, but I'm still not sure if it's O(n).  I'll have to think about it more.
« Last Edit: December 12, 2007, 04:59:41 pm by Sidoh »

Offline Blaze

  • x86
  • Hero Member
  • *****
  • Posts: 7136
  • Canadian
    • View Profile
    • Maide
Re: String Puzzle
« Reply #5 on: December 12, 2007, 09:17:48 pm »
Well, I'll give this a shot in PHP.  :)


<?PHP
  $thatstring 
"foobarfoo";  #The string we're reading
  
$counter[];                 #The counter which keeps track of how many times a pattern has happened.  :)
  
  
for ($i 0$i strlen($thatstring) - 1;$i++)
  {
    
$counter[substr($thatstring$i,2)]++;    
  }
  
  
print_r($counter);
  
?>


Ouputs:
Quote
[fo] => 2
[oo] => 2
[ob] => 1
[ba] => 1
[ar] => 1
[rf] => 1

Woo?
And like a fool I believed myself, and thought I was somebody else...

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #6 on: December 12, 2007, 09:35:57 pm »
Yeah, that's effectively the solution using a hash table (but avoiding all of the messiness that's involved with hash tables).  There's some shifty stuff that goes on with arrays in PHP.  The way they are constructed in the engine depends on a bunch of stuff, including the size of the array, what you're puting into it and what you're doing with it after it's constructed.  When you use an array in that fashion, it's acting like a dictionary/hash table.

To quote the PHP manual:

Quote
An array in PHP is actually an ordered map. A map is a type that maps values to keys. This type is optimized in several ways, so you can use it as a real array, or a list (vector), hashtable (which is an implementation of a map), dictionary, collection, stack, queue and probably more. Because you can have another PHP array as a value, you can also quite easily simulate trees.

Part of the requirement of the original problem (this is a Microsoft interview question, apparently.  tests if you understand space/time trade-offs, maybe?) is that you write it in C (tests if you understand data structures, I guess?).

What would you use as the hash function?  Since it's a pretty simple value (the pair), you can pretty trivially concoct a perfect hash function (1:1).

The solution using the graph is also pretty interesting.  I'm not completely sure that you can guarantee that it's O(n) without use of O(R^2) space, though (where R is the range of characters). Just sort of thinking out loud here -- I suppose you have to do that with the hash table too, though.  Depending on how the hash table is implemented (in addition to [tex]\alpha[/tex], its load factor = the number of elements / the capacity), you could contrive a string that induces a bunch of collisions and make lookups in the hash table O(n), which would make the entire thing O(n^2).

Incidentally, one of my research advisor's past topics is a the DAWG (Directed Acyclic Word Graph).  It's a really interesting data structure that allows you to do efficient searches for substrings within strings and is intended to be used in things like the human genome project (where you're searching through gigabytes of data for a string that's a tiny tiny fraction of that).  If you have a string of length n and a substring of length m, it takes O(mn) time to build the DAWG, but then it requires O(m) time to search for any substring of that length.  The solution I thought of for this problem involving a graph is loosely related to this data structure, but it's definitely not a DAWG.

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #7 on: December 12, 2007, 10:00:06 pm »
this is a Microsoft interview question, apparently.
That explains why the originally specification was so lame. :D

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #8 on: December 12, 2007, 10:10:03 pm »
this is a Microsoft interview question, apparently.
That explains why the originally specification was so lame. :D

Man, I knew you'd say something!  I said the same thing, though.  hehe.

It was posted on some board that's designed for this sort of thing (technical interview questions in computer scienceish subjects).  The problem was worded strangely.  Sorry about all of the confusion.  Here's the original problem:

Quote
I was asked that you have a string say, "foobarfoo", in which "fo" and "oo" are repeated twice. You have to find all such repeated pairs in O(n) time. The string can only have alphanumeric elements in it and code it in C.

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #9 on: December 12, 2007, 10:42:28 pm »
this is a Microsoft interview question, apparently.
That explains why the originally specification was so lame. :D

Man, I knew you'd say something!  I said the same thing, though.  hehe.

It was posted on some board that's designed for this sort of thing (technical interview questions in computer scienceish subjects).  The problem was worded strangely.  Sorry about all of the confusion.  Here's the original problem:

Quote
I was asked that you have a string say, "foobarfoo", in which "fo" and "oo" are repeated twice. You have to find all such repeated pairs in O(n) time. The string can only have alphanumeric elements in it and code it in C.
Come on, of course I said something! Too easy!

But yeah, a sorta-hashtable could be used in C. I'm not positive if it would be O(n) still, because of scanning the hashtable, but I'd do something like this (note: untested, but seems like it ought to work):

Code: [Select]
int matches[256 * 256];
int i;

for(i = 0; i < strlen(str) - 1; i++)
 matches[(str[i] << 8) | str[i + 1]]++;

for(i = 0; i < sizeof(matches); i++) /* Sorta ruins O(n), but necessary? */
 if(matches[i] == 2)
  printf("Found a pair: %c%c\n", matches[i] & 0x0FF, matches[i] >> 8);

<edit> works, basically. Here is the full tested C code:
Code: [Select]
#include <stdio.h>
#define MAX (256 * 256)
int main()
{
    int matches[MAX];
    int i;
    char *str = "foobarfoo";

    memset(matches, 0, MAX * sizeof(int));

    for(i = 0; i < strlen(str) - 1; i++)
        matches[(str[i] << 8) | str[i + 1]]++;

    for(i = 0; i < MAX; i++) /* Sorta ruins O(n), but necessary? */
        if(matches[i] == 2)
            printf("Found a pair: %c%c\n", i >> 8, i & 0x0FF);

    return 0;
}

And the output:
ron@slayer:~$ ./a.out
Found a pair: fo
Found a pair: oo


Incidentally, this method sorts them, too. :D
« Last Edit: December 12, 2007, 10:51:41 pm by iago »

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #10 on: December 12, 2007, 11:02:17 pm »
Nice!  I think your initial size is a bit big, though.  You only have 36^2 or 62^2 possible pairs, depending on case-sensitivity.

So your hash function is the first character shifted by eight bits to the left + the second character?  Yeah, that seems like it'd work just fine.  The traversal of the hash table is really O(1) since there's nothing in the program that will change its running time.  I don't think it's necessary, however.  Just traverse the string again, look up each pair you encounter, output the pair and its count if it's > 1 and set the value = 0.

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #11 on: December 12, 2007, 11:07:55 pm »
Nice!  I think your initial size is a bit big, though.  You only have 36^2 or 62^2 possible pairs, depending on case-sensitivity.

So your hash function is the first character shifted by eight bits to the left + the second character?  Yeah, that seems like it'd work just fine.  The traversal of the hash table is really O(1) since there's nothing in the program that will change its running time.  I don't think it's necessary, however.  Just traverse the string again, look up each pair you encounter, output the pair and its count if it's > 1 and set the value = 0.
Yeah, my program can technically do any binary data, so the buffer is much larger than necessary (65535 entries).

You're right that I could just process the string a second time, I hadn't even thought of that! Also, if I'm ok with finding any number or repetitions >1 (not just pairs), I can build it into the loop itself, too:
Code: [Select]
#include <stdio.h>
#define MAX (256 * 256)
int main()
{
    int matches[MAX];
    int i;
    int temp;
    char *str = "foobarfoojkilajkfi23jkljlkwef89ujklsafdj23 ssdafjklasdfjklsdfkjlsadf s kjsdf23jklsdflweu";

    memset(matches, 0, MAX * sizeof(int));

    for(i = 0; i < strlen(str) - 1; i++)
    {
        temp = (str[i] << 8) | str[i + 1];
        matches[temp]++;
        if(matches[temp] == 2)
            printf("Found a pair: '%c%c'\n", str[i], str[i + 1]);
    }

    return 0;
}

And output:
ron@slayer:~$ ./a.out
Found a pair: 'fo'
Found a pair: 'oo'
Found a pair: 'jk'
Found a pair: 'kl'
Found a pair: '23'
Found a pair: 'af'
Found a pair: 'la'
Found a pair: 'sd'
Found a pair: 'fj'
Found a pair: 'ls'
Found a pair: 'df'
Found a pair: 'jl'
Found a pair: 'sa'
Found a pair: ' s'
Found a pair: 'kj'
Found a pair: '3j'
Found a pair: 'we


Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #12 on: December 12, 2007, 11:14:48 pm »
Awesome!  I'm going to whip up something in C.  I'll post my solution when I'm done.  I'd really like to give the graph solution a go, but I think it'd be sort of nasty in C.

Oh, forgot to mention: it'll handle any binary data that can be represented in <= 8 bits in all cases, right?  I guess is a safe assumption if you're using UTF-8, lol (which gcc does?).  After that, you start to have collisions.
« Last Edit: December 12, 2007, 11:22:22 pm by Sidoh »

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: String Puzzle
« Reply #13 on: December 12, 2007, 11:50:37 pm »
Well, mine won't really work on wide characters. To hell with those Asians, though! :)

In any case, binary data tends to be represented in bytes anyways, so it'll work as long as you're looking for pairs of single-byte sequences.

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: String Puzzle
« Reply #14 on: December 13, 2007, 12:00:23 am »
Well, mine won't really work on wide characters. To hell with those Asians, though! :)

In any case, binary data tends to be represented in bytes anyways, so it'll work as long as you're looking for pairs of single-byte sequences.

Hahaha.

Yep, that's exactly what I meant. :)

Here's my solution:

Code: [Select]
chris@shadow:~/src/c/pairs$ cat pairs.c
#include <stdio.h>

#define HASH_MAX 1296

int hash_digit(char d);

int main()
{
        int pairs[HASH_MAX];
        int i;
        memset(pairs, 0, HASH_MAX * sizeof(int));

        char * string;
        printf("Enter a string : ");
        scanf("%s", string);

        for(i = 0; i < strlen(string) - 1; i++)
        {
                int hash = ((36)*hash_digit(string[i]) + hash_digit(string[i + 1]));
                pairs[hash]++;
        }

        for(i = 0; i < strlen(string) - 1; i++)
        {
                int hash = ((36)*hash_digit(string[i]) + hash_digit(string[i + 1]));

                if(pairs[hash] > 1)
                {
                        printf("Found pair: %c%c, occurred %d times.\n", string[i], string[i+1], pairs[hash]);
                        pairs[hash] = 0;
                }
        }
}

int hash_digit(char d)
{
        if(d >= 0x61 && d <= 0x7A)
        {
                return (d - 0x60);
        }
        else if(d >= 0x30 && d <= 0x39)
        {
                return (d - 0x29);
        }
}

Output:

Code: [Select]
chris@shadow:~/src/c/pairs$ ./pairs
Enter a string : aloaiwleyrkasdjfhaksjcvbwieurgfasiudfyaisudcbaksdjbkjahsvdwieuyriasudyaksdjbflkasjdfbqjwhkeriuyasdfjkahsgdfkjhagsdrfwe
Found pair: ai, occurred 2 times.
Found pair: yr, occurred 2 times.
Found pair: ka, occurred 3 times.
Found pair: as, occurred 5 times.
Found pair: sd, occurred 5 times.
Found pair: dj, occurred 3 times.
Found pair: ha, occurred 2 times.
Found pair: ak, occurred 3 times.
Found pair: ks, occurred 3 times.
Found pair: sj, occurred 2 times.
Found pair: wi, occurred 2 times.
Found pair: ie, occurred 2 times.
Found pair: eu, occurred 2 times.
Found pair: iu, occurred 2 times.
Found pair: ud, occurred 3 times.
Found pair: df, occurred 4 times.
Found pair: ya, occurred 3 times.
Found pair: su, occurred 2 times.
Found pair: jb, occurred 2 times.
Found pair: kj, occurred 2 times.
Found pair: ah, occurred 2 times.
Found pair: hs, occurred 2 times.
Found pair: uy, occurred 2 times.
Found pair: ri, occurred 2 times.