Clan x86

General Forums => General Discussion => Topic started by: deadly7 on July 15, 2010, 11:43:38 PM

Title: Google Blog on Algorithms
Post by: deadly7 on July 15, 2010, 11:43:38 PM
I was randomly surfing the interwebs and I came across this blog post by Google: http://googlepublicpolicy.blogspot.com/2010/02/this-stuff-is-tough.html . It got me wondering what, exactly, the Google search algorithm looks like, and how they managed to scale it so well that it returns a search with millions of potential results in milliseconds. Any ideas?
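(For anyone curious about the basic idea: the core data structure behind fast text search is an inverted index, which maps each term to the documents containing it, so query time depends on the number of matches rather than the corpus size. This is just a toy single-machine sketch, not anything resembling Google's actual sharded, compressed implementation.)

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: term -> set of document IDs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Intersect the posting lists of the query terms. Each lookup is a
        # hash-table hit, so we never scan documents that lack a query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = self.postings[terms[0]].copy()
        for term in terms[1:]:
            result &= self.postings[term]
        return result

index = InvertedIndex()
index.add_document(1, "google search algorithms")
index.add_document(2, "search engine scaling")
index.search("search")         # -> {1, 2}
index.search("google search")  # -> {1}
```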
Title: Re: Google Blog on Algorithms
Post by: while1 on July 16, 2010, 12:11:58 AM
20 years of spaghetti code!
Title: Re: Google Blog on Algorithms
Post by: Sidoh on July 16, 2010, 05:59:53 AM
Quote from: while1 on July 16, 2010, 12:11:58 AM
20 years of spaghetti code!


hahahahahahahaha
Title: Re: Google Blog on Algorithms
Post by: iago on July 16, 2010, 10:50:27 AM
Google is nuts. Recently I did some light EULA violation and scraped a certain large social networking site for some very targeted information. The URLs alone were 10 GB and are a bitch to work with. I couldn't even imagine trying to _store_ the data from the site, not to mention search it in any reasonable timeframe. And that's only one site out of millions!
Title: Re: Google Blog on Algorithms
Post by: warz on July 16, 2010, 11:07:25 AM
Google has, like, an entire custom operating system that their software runs on, I think. (Somebody does, maybe Facebook?)

I also bet that in this scenario, high quality hard drives would help a ton. Drives with very fast read speeds would speed up queries quite a bit, I think.
Title: Re: Google Blog on Algorithms
Post by: Chavo on July 16, 2010, 05:40:36 PM
A lot of their speed comes from some very specifically tailored hardware architecture modifications. I wrote a paper/presentation on it when I was still an undergrad and could dig it up if anyone is interested.
Title: Re: Google Blog on Algorithms
Post by: Blaze on July 17, 2010, 02:07:36 AM
Yes, that would be quite interesting, I think!
Title: Re: Google Blog on Algorithms
Post by: truste1 on July 17, 2010, 06:36:36 PM
Quote from: Blaze on July 17, 2010, 02:07:36 AM
Yes, that would be quite interesting, I think!

Yes, I think that would be quite interesting.
Title: Re: Google Blog on Algorithms
Post by: Falcon on July 17, 2010, 06:43:13 PM
Quote from: Chavo on July 16, 2010, 05:40:36 PM
A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.
I'm interested.
Title: Re: Google Blog on Algorithms
Post by: Joe on August 11, 2010, 12:02:43 AM
It sounds interesting.
Title: Re: Google Blog on Algorithms
Post by: MyndFyre on August 12, 2010, 12:31:09 AM
Google's distribution algorithm is called MapReduce (http://en.wikipedia.org/wiki/MapReduce).  It basically is a massive parallelization algorithm.
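(Roughly, the pattern is: a map phase emits key/value pairs, the framework shuffles them so all values for a key end up together, and a reduce phase aggregates each group. Here's a minimal single-process sketch using the classic word-count example; in the real system the map and reduce tasks run on many machines and the shuffle happens over the network.)

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group values by key -- the framework does this between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate all values emitted for one key.
    return key, sum(values)

docs = ["the quick fox", "the lazy dog", "the fox"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts["the"] == 3, counts["fox"] == 2
```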
Title: Re: Google Blog on Algorithms
Post by: Joe on August 12, 2010, 04:44:55 AM
Quote from: MyndFyre on August 12, 2010, 12:31:09 AM
Google's distribution algorithm is called MapReduce (http://en.wikipedia.org/wiki/MapReduce).  It basically is a massive parallelization algorithm.

Combo breaker!
Title: Re: Google Blog on Algorithms
Post by: Sidoh on August 12, 2010, 06:22:41 AM
I have always thought of MapReduce as more of a framework. I don't think it's really an algorithm, is it?
Title: Re: Google Blog on Algorithms
Post by: Blaze on August 12, 2010, 12:21:39 PM
Quote from: Sidoh on August 12, 2010, 06:22:41 AM
I have always thought of map reduce more like a framework. I don't think it's really an algorithm is it?
Quote from: Wikipedia
MapReduce is a software framework
Title: Re: Google Blog on Algorithms
Post by: Chavo on August 12, 2010, 12:34:28 PM
I never remember to go look when I'm at home and have access to my backup server. However, here is a crappy article about it from people that don't know what they are talking about and are impressed by things that are actually pretty common in the enterprise environment:

http://news.cnet.com/8301-1001_3-10209580-92.html

What it doesn't talk about is the multi-tiered structure in which search requests are actually handled (at the hardware level).  Each cluster group has nodes dedicated to handling requests, routing requests to the servers most likely to have results cached, and servers that do nothing but optimize what is currently cached in their huge memory banks from disk.  Essentially, they replace a typical SAN environment with a distributed cache/routing/control cluster.
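(A hypothetical sketch of the routing idea described above: if you hash each query to pick a cache node, repeated queries always land on the same node, which maximizes the chance of a warm cache. The node names here are made up, and real systems use consistent hashing so that adding or removing a node doesn't reshuffle everything.)

```python
import hashlib

# Made-up pool of cache nodes for illustration.
NODES = ["cache-a", "cache-b", "cache-c"]

def route(query):
    # Hash the query so identical queries deterministically hit
    # the same node, keeping its cache warm for repeat requests.
    digest = hashlib.sha256(query.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

route("clan x86")  # always returns the same node for the same query
```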