
Google Blog on Algorithms

Started by deadly7, July 15, 2010, 11:43:38 PM


deadly7

I was randomly surfing the interwebs and I came across this blog post by Google: http://googlepublicpolicy.blogspot.com/2010/02/this-stuff-is-tough.html . It got me wondering what, exactly, the Google search algorithm looks like, and how they managed to scale it so well that it returns a search with millions of potential results in milliseconds. Any ideas?

while1

I tend to edit my topics and replies frequently.

http://www.operationsmile.org

Sidoh


iago

Google is nuts. Recently I did some light EULA violation and scraped a certain large social networking site for some very targeted information. The URLs alone were 10 GB and are a bitch to work with. I couldn't even imagine trying to _store_ the data from the site, not to mention search it in any reasonable timeframe. And that's only one site out of millions!

warz

Google has like an entire custom operating system that their software runs on, I think. (Somebody does, maybe Facebook?)

I also bet that in this scenario, high-quality hard drives would help a ton. Drives with very fast read speeds would speed up queries quite a bit, I think.
http://www.chyea.org/ - web based markup debugger

Chavo

A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.

Blaze

Yes, that would be quite interesting, I think!
And like a fool I believed myself, and thought I was somebody else...

truste1

Quote from: Blaze on July 17, 2010, 02:07:36 AM
Yes, that would be quite interesting, I think!

Yes, I think that would be quite interesting.
Ain't Life Grand?

Falcon

Quote from: Chavo on July 16, 2010, 05:40:36 PM
A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.
I'm interested.

Joe

Quote from: Camel on June 09, 2009, 04:12:23 PM
I'd personally do as Joe suggests

Quote from: AntiVirus on October 19, 2010, 02:36:52 PM
You might be right about that, Joe.


MyndFyre

Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.
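For anyone who hasn't seen it, the MapReduce model is easy to sketch in a few lines. This is a toy single-machine word count, not Google's actual implementation: the map phase emits (key, value) pairs, a shuffle step groups values by key (the real framework does this across the network), and the reduce phase combines each group. All function names here are illustrative.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values; for word count, just sum them."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # "the" appears 3 times, "fox" twice, the rest once
```

The point of the model is that map and reduce calls are independent, so the framework can scatter them across thousands of machines and only coordinate during the shuffle.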
Quote from: Joe on January 23, 2011, 11:47:54 PM
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Quote from: Rule on May 26, 2009, 02:02:12 PM
Our species really annoys me.

Joe

Quote from: MyndFyre on August 12, 2010, 12:31:09 AM
Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.

Combo breaker!


Sidoh

I have always thought of MapReduce as more of a framework. I don't think it's really an algorithm, is it?

Blaze

Quote from: Sidoh on August 12, 2010, 06:22:41 AM
I have always thought of map reduce more like a framework. I don't think it's really an algorithm is it?
Quote from: wikipedia
MapReduce is a software framework

Chavo

I never remember to go look when I'm at home and have access to my backup server.  However, here is a crappy article about it from people who don't know what they're talking about and are impressed by things that are actually pretty common in the Enterprise environment:

http://news.cnet.com/8301-1001_3-10209580-92.html

What it doesn't talk about is the multi-tiered structure in which search requests are actually handled (at a hardware level).  Each cluster group has nodes dedicated to handling requests, nodes that route requests to the servers most likely to have results cached, and servers that do nothing but optimize what is currently cached in their huge memory banks from disk.  Essentially, they replace a typical SAN environment with a distributed cache/routing/control cluster.
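The routing tier described above can be sketched with simple hash-based routing: the front end deterministically maps each query to one cache node, so repeats of the same query always land where the result is most likely to still be in memory. This is my own illustration of the idea, not Google's real design; the node names and function are made up.

```python
import hashlib

# Hypothetical cache tier: each node keeps hot results in its memory banks.
CACHE_NODES = ["cache-0", "cache-1", "cache-2", "cache-3"]

def route(query):
    """Deterministically pick the cache node for a query.

    Hashing the query means every front-end node agrees on the same
    destination without any coordination, so a repeated query hits a
    node whose in-memory cache is already warm for it.
    """
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return CACHE_NODES[int(digest, 16) % len(CACHE_NODES)]

node = route("restaurants in palo alto")
# the same query string always routes to the same node

Note the weakness of plain modulo hashing: adding or removing a node remaps almost every query, which is why real systems tend to use consistent hashing instead.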