Author Topic: Google Blog on Algorithms  (Read 4267 times)


Offline deadly7

  • 42
  • x86
  • Hero Member
  • *****
  • Posts: 6496
    • View Profile
Google Blog on Algorithms
« on: July 15, 2010, 11:43:38 pm »
I was randomly surfing the interwebs and came across this blog post by Google: http://googlepublicpolicy.blogspot.com/2010/02/this-stuff-is-tough.html . It got me wondering what, exactly, the Google search algorithm looks like and how they managed to scale it so well that it returns a search with millions of potential results in a fraction of a second. Any ideas?

Offline while1

  • x86
  • Hero Member
  • *****
  • Posts: 1013
    • View Profile
Re: Google Blog on Algorithms
« Reply #1 on: July 16, 2010, 12:11:58 am »
20 years of spaghetti code!
I tend to edit my topics and replies frequently.

http://www.operationsmile.org

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: Google Blog on Algorithms
« Reply #2 on: July 16, 2010, 05:59:53 am »
Quote from: while1
20 years of spaghetti code!


hahahahahahahaha

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Re: Google Blog on Algorithms
« Reply #3 on: July 16, 2010, 10:50:27 am »
Google is nuts. Recently I did some light EULA violation and scraped a certain large social networking site for some very targeted information. The URLs alone were 10 GB and are a bitch to work with. I couldn't even imagine trying to _store_ the data from the site, not to mention searching it in any reasonable timeframe. And that's only one site out of millions!

Offline warz

  • Hero Member
  • *****
  • Posts: 1134
    • View Profile
    • chyea.org
Re: Google Blog on Algorithms
« Reply #4 on: July 16, 2010, 11:07:25 am »
Google has, like, an entire operating system that their software runs on, I think. (Somebody does, maybe Facebook?)

I'd also bet that in this scenario, high-quality hard drives would help a ton. Drives with very fast read speeds would speed up queries quite a bit, I think.
http://www.chyea.org/ - web based markup debugger

Offline Chavo

  • x86
  • Hero Member
  • *****
  • Posts: 2219
  • no u
    • View Profile
    • Chavoland
Re: Google Blog on Algorithms
« Reply #5 on: July 16, 2010, 05:40:36 pm »
A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.

Offline Blaze

  • x86
  • Hero Member
  • *****
  • Posts: 7136
  • Canadian
    • View Profile
    • Maide
Re: Google Blog on Algorithms
« Reply #6 on: July 17, 2010, 02:07:36 am »
Yes, that would be quite interesting, I think!
And like a fool I believed myself, and thought I was somebody else...

Offline truste1

  • Hero Member
  • *****
  • Posts: 1130
  • I haven't visited my profile!
    • View Profile
Re: Google Blog on Algorithms
« Reply #7 on: July 17, 2010, 06:36:36 pm »
Quote from: Blaze
Yes, that would be quite interesting, I think!

Yes, I think that would be quite interesting.
Ain't Life Grand?

Offline Falcon

  • Full Member
  • ***
  • Posts: 241
  • I haven't visited my profile!
    • View Profile
Re: Google Blog on Algorithms
« Reply #8 on: July 17, 2010, 06:43:13 pm »
Quote from: Chavo
A lot of their speed is garnered by some very specifically tailored hardware architecture modifications.  I wrote a paper / presentation on it when I was still an undergrad and could dig it up if anyone is interested.
I'm interested.

Offline Joe

  • B&
  • x86
  • Hero Member
  • *****
  • Posts: 10319
  • In Soviet Russia, text read you!
    • View Profile
    • Github
Re: Google Blog on Algorithms
« Reply #9 on: August 11, 2010, 12:02:43 am »
It sounds interesting.
I'd personally do as Joe suggests

You might be right about that, Joe.


Offline MyndFyre

  • Boticulator Extraordinaire
  • x86
  • Hero Member
  • *****
  • Posts: 4540
  • The wait is over.
    • View Profile
    • JinxBot :: the evolution in boticulation
Re: Google Blog on Algorithms
« Reply #10 on: August 12, 2010, 12:31:09 am »
Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.
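
For anyone who hasn't seen the model, here is a minimal sketch of the map/shuffle/reduce idea as a single-process word count in Python. It is not Google's implementation: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real deployment would spread the map and reduce calls across many machines instead of running them in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(docs):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    # Each map_phase call is independent, which is what makes the
    # map step easy to parallelize across machines.
    mapped = chain.from_iterable(map_phase(d) for d in docs)
    grouped = shuffle(mapped)
    # Each key is reduced independently, so this step parallelizes too.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog"]))
```

The only parts specific to word counting are map_phase and reduce_phase; the grouping in between stays the same no matter what the job computes.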
I have a programming folder, and I have nothing of value there

Running with Code has a new home!

Our species really annoys me.

Offline Joe

  • B&
  • x86
  • Hero Member
  • *****
  • Posts: 10319
  • In Soviet Russia, text read you!
    • View Profile
    • Github
Re: Google Blog on Algorithms
« Reply #11 on: August 12, 2010, 04:44:55 am »
Quote from: MyndFyre
Google's distribution algorithm is called MapReduce.  It basically is a massive parallelization algorithm.

Combo breaker!


Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: Google Blog on Algorithms
« Reply #12 on: August 12, 2010, 06:22:41 am »
I have always thought of MapReduce as more of a framework. I don't think it's really an algorithm, is it?

Offline Blaze

  • x86
  • Hero Member
  • *****
  • Posts: 7136
  • Canadian
    • View Profile
    • Maide
Re: Google Blog on Algorithms
« Reply #13 on: August 12, 2010, 12:21:39 pm »
Quote from: Sidoh
I have always thought of MapReduce as more of a framework. I don't think it's really an algorithm, is it?
Quote from: wikipedia
MapReduce is a software framework

Offline Chavo

  • x86
  • Hero Member
  • *****
  • Posts: 2219
  • no u
    • View Profile
    • Chavoland
Re: Google Blog on Algorithms
« Reply #14 on: August 12, 2010, 12:34:28 pm »
I never remember to go look when I'm at home and have access to my backup server.  However, here is a crappy article about it from people who don't know what they are talking about and are impressed by things that are actually pretty common in an enterprise environment:

http://news.cnet.com/8301-1001_3-10209580-92.html

What it doesn't talk about is the multi-tiered structure in which search requests are actually handled (at a hardware level).  Each cluster group has nodes dedicated to handling requests, nodes that route requests to the servers most likely to have results cached, and servers that do nothing but optimize what is currently cached in their huge memory banks from disk.  Essentially, they replace a typical SAN environment with a distributed cache/routing/control cluster.
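
To make the routing/cache tiers a bit more concrete, here is a rough Python sketch under heavy assumptions. The Router and CacheNode classes, the hash-based routing rule, and the dict standing in for the on-disk index are all invented for illustration; none of it comes from the article or from Google. It only shows the general idea of sending the same query to the same node so that repeat queries are served from memory.

```python
import hashlib


class CacheNode:
    """Hypothetical cache tier: serves from memory, reads 'disk' on a miss."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk      # stand-in for the slow on-disk index
        self.memory = {}      # results currently held in RAM
        self.hits = 0
        self.misses = 0

    def lookup(self, query):
        if query in self.memory:
            self.hits += 1
            return self.memory[query]
        # Miss: fetch from the backing store and keep the result in memory.
        self.misses += 1
        result = self.disk.get(query, [])
        self.memory[query] = result
        return result


class Router:
    """Hypothetical routing tier: picks the node most likely to hold a query."""

    def __init__(self, nodes):
        self.nodes = nodes

    def node_for(self, query):
        # A stable hash means the same query always lands on the same node,
        # so that node's memory is the place a cached result would live.
        digest = hashlib.sha256(query.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def search(self, query):
        return self.node_for(query).lookup(query)


if __name__ == "__main__":
    disk = {"cats": ["page1", "page2"]}
    router = Router([CacheNode(f"node{i}", disk) for i in range(3)])
    router.search("cats")   # first request misses and loads from disk
    router.search("cats")   # second request is served from memory
    node = router.node_for("cats")
    print(node.name, node.hits, node.misses)
```

Plain modulo hashing is the simplest possible routing rule; it reshuffles most keys whenever the node count changes, which is why larger systems tend to use something like consistent hashing instead.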
« Last Edit: August 12, 2010, 12:37:49 pm by Chavo »