
Google Blog on Algorithms

Started by deadly7, July 15, 2010, 11:43:38 PM


deadly7

I was randomly surfing the interwebs and I came across this blog post by Google: http://googlepublicpolicy.blogspot.com/2010/02/this-stuff-is-tough.html . It got me wondering what, exactly, the Google search algorithm looks like, and how they managed to scale it so well that it returns a search with millions of potential results in milliseconds. Any ideas?

while1

I tend to edit my topics and replies frequently.

http://www.operationsmile.org

Sidoh


iago

Google is nuts. Recently I did some light EULA violation and scraped a certain large social networking site for some very targeted information. The URLs alone were 10 GB and are a bitch to work with. I couldn't even imagine trying to _store_ the data from the site, not to mention search it in any reasonable timeframe. And that's only one site out of millions!

warz

Google has like an entire custom operating system that their software runs on, I think. (Somebody does, maybe Facebook?)

I also bet that in this scenario, high-quality hard drives would help a ton. Drives with very fast read speeds would speed up queries quite a bit, I think.
http://www.chyea.org/ - web based markup debugger

Chavo

A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.

Blaze

Yes, that would be quite interesting, I think!
And like a fool I believed myself, and thought I was somebody else...

truste1

Quote from: Blaze on July 17, 2010, 02:07:36 AM
Yes, that would be quite interesting, I think!

Yes, I think that would be quite interesting.
Ain't Life Grand?

Falcon

Quote from: Chavo on July 16, 2010, 05:40:36 PM
A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.
I'm interested.

Joe

Quote from: Camel on June 09, 2009, 04:12:23 PM
I'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.


MyndFyre

Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.
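For anyone who hasn't seen it, the MapReduce model is easy to sketch in a few lines. This is a toy single-machine word count, not Google's actual implementation: the map phase emits (key, value) pairs, a shuffle step groups values by key (the real framework does this across the network), and the reduce phase combines each group. All function names here are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values; for word count, just sum them."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # "the" appears 3 times, "fox" twice, the rest once
```

The point of the model is that map and reduce calls are independent, so the framework can scatter them across thousands of machines and only coordinate during the shuffle.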
Quote from: Joe on January 23, 2011, 11:47:54 PM
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Quote from: Rule on May 26, 2009, 02:02:12 PM
Our species really annoys me.

Joe

Quote from: MyndFyre on August 12, 2010, 12:31:09 AM
Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.

Combo breaker!


Sidoh

I have always thought of MapReduce as more of a framework. I don't think it's really an algorithm, is it?

Blaze

Quote from: Sidoh on August 12, 2010, 06:22:41 AM
I have always thought of map reduce more like a framework. I don't think it's really an algorithm is it?
Quote from: wikipedia
MapReduce is a software framework

Chavo

I never remember to go look when I'm at home and have access to my backup server.  However, here is a crappy article about it from people who don't know what they're talking about and are impressed by things that are actually pretty common in the Enterprise environment:

http://news.cnet.com/8301-1001_3-10209580-92.html

What it doesn't talk about is the multi-tiered structure in which search requests are actually handled (at a hardware level).  Each cluster group has nodes dedicated to handling requests, nodes that route requests to the servers most likely to have results cached, and servers that do nothing but optimize what is currently cached in their huge memory banks from disk.  Essentially, they replace a typical SAN environment with a distributed cache/routing/control cluster.
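The routing tier described above can be sketched with simple hash-based routing: the front end deterministically maps each query to one cache node, so repeats of the same query always land where the result is most likely to still be in memory. This is my own illustration of the idea, not Google's real design; the node names and function are made up.

```python
import hashlib

# Hypothetical cache tier: each node keeps hot results in its memory banks.
CACHE_NODES = ["cache-0", "cache-1", "cache-2", "cache-3"]

def route(query):
    """Deterministically pick the cache node for a query.

    Hashing the query means every front-end node agrees on the same
    destination without any coordination, so a repeated query hits a
    node whose in-memory cache is already warm for it.
    """
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return CACHE_NODES[int(digest, 16) % len(CACHE_NODES)]

node = route("restaurants in palo alto")
# the same query string always routes to the same node

Note the weakness of plain modulo hashing: adding or removing a node remaps almost every query, which is why real systems tend to use consistent hashing instead.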