Author Topic: Graphing huge amounts of data  (Read 3474 times)

Offline iago

  • Leader
  • Administrator
  • Hero Member
  • *****
  • Posts: 17914
  • Fnord.
    • View Profile
    • SkullSecurity
Graphing huge amounts of data
« on: August 08, 2010, 08:38:05 pm »
I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?

Offline deadly7

  • 42
  • x86
  • Hero Member
  • *****
  • Posts: 6496
    • View Profile
Re: Graphing huge amounts of data
« Reply #1 on: August 08, 2010, 09:17:31 pm »
SPSS is a program I've commonly seen used to work with lots of numbers. It's a statistics tool, so I don't know if you need something that robust. Is graphing all you're looking to do? Is Linux friendliness a big requirement?
[17:42:21.609] <Ergot> Kutsuju you're girlfrieds pussy must be a 403 error for you
 [17:42:25.585] <Ergot> FORBIDDEN

on IRC playing T&T++
<iago> He is unarmed
<Hitmen> he has no arms?!

on AIM with a drunk mythix:
(00:50:05) Mythix: Deadly
(00:50:11) Mythix: I'm going to fuck that red dot out of your head.
(00:50:15) Mythix: with my nine

Offline rabbit

  • x86
  • Hero Member
  • *****
  • Posts: 8092
  • I speak for the entire clan (except Joe)
    • View Profile
Re: Graphing huge amounts of data
« Reply #2 on: August 08, 2010, 09:34:29 pm »
MATLAB?

Offline iago

Re: Graphing huge amounts of data
« Reply #3 on: August 08, 2010, 09:57:01 pm »
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.

I'm not really familiar with MATLAB. Can it do what I need?

Offline deadly7

Re: Graphing huge amounts of data
« Reply #4 on: August 08, 2010, 10:00:33 pm »
Quote
Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.
Well, SPSS is (IIRC) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. UIDs, etc., SPSS is probably one of the easiest ways to go about it. I've used it a couple of times and it has a pretty steep learning curve, but that shouldn't be a problem for you. :P
Quote
I'm not really familiar with MATLAB. Can it do what I need?
MATLAB can graph numbers if that's what you're looking to do.

Offline Chavo

  • x86
  • Hero Member
  • *****
  • Posts: 2219
  • no u
    • View Profile
    • Chavoland
Re: Graphing huge amounts of data
« Reply #5 on: August 08, 2010, 10:30:23 pm »
http://www.r-project.org/

Offline iago

Re: Graphing huge amounts of data
« Reply #6 on: August 08, 2010, 10:42:25 pm »
Quote
Well, SPSS is (iirc) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. uid's, etc etc, SPSS is probably one of the easiest ways to go about it. I've used it a couple times and it has a pretty steep learning curve but that shouldn't be a problem for you. :P
Ah, cool. Sounds like more than I need -- I don't care about names or anything, just the numbers.

Quote
MATLAB can graph numbers if that's what you're looking to do.
But can it graph 170 million of them? The input file is just one number per line and is 2.1 GB.

Quote
http://www.r-project.org/
Cool, looks promising. :)

Offline deadly7

Re: Graphing huge amounts of data
« Reply #7 on: August 08, 2010, 11:12:18 pm »
Quote
MATLAB can graph numbers if that's what you're looking to do.
It can probably do it, but you'll probably want a high-powered machine for it. Newer releases support multithreading. You may have to reshape your data to make it graphable (e.g. write a quick script that reads it and turns it into x-y coordinates). It supports Linux.
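A quick script along those lines might look something like the sketch below. It's in Python, which nobody in the thread suggested, and the function names and file paths are made up for illustration. It assumes the input is one integer per line and simply pairs each value with its position, so the result reads as x-y data.

```python
import io

def to_xy(lines):
    """Yield 'index value' strings from an iterable of one-number-per-line text."""
    index = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines rather than failing on them
        yield f"{index} {int(line)}"
        index += 1

def convert(src_path, dst_path):
    # Hypothetical paths. Streams line by line, so the 2.1 GB input
    # never has to fit in memory at once.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for pair in to_xy(src):
            dst.write(pair + "\n")

# Tiny in-memory demo standing in for the real file.
sample = io.StringIO("100\n105\n\n250\n")
pairs = list(to_xy(sample))
```

Calling `convert("ids.txt", "ids_xy.txt")` (both names are placeholders) would write a two-column, space-delimited file.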

Offline zorm

  • Hero Member
  • *****
  • Posts: 591
    • View Profile
    • Zorm's Page
Re: Graphing huge amounts of data
« Reply #8 on: August 08, 2010, 11:17:32 pm »
MATLAB isn't free though.

There is, however, its open source equivalent, Octave. It is used on some of the fastest supercomputers in the world, so I'm reasonably sure it can handle your 170 million data points.
"Frustra fit per plura quod potest fieri per pauciora"
- William of Ockham

Offline Sidoh

  • x86
  • Hero Member
  • *****
  • Posts: 17634
  • MHNATY ~~~~~
    • View Profile
    • sidoh
Re: Graphing huge amounts of data
« Reply #9 on: August 09, 2010, 06:22:25 am »
http://www.r-project.org/

Exactly what I was going to recommend.  I've used R to handle a few orders of magnitude more data, and it does it pretty well. :)

iago: put my vote in for R.  I love R. :)

Offline nslay

  • Hero Member
  • *****
  • Posts: 786
  • Giraffe meat, mmm
    • View Profile
Re: Graphing huge amounts of data
« Reply #10 on: August 09, 2010, 11:36:48 am »
Quote
I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?

If you want free, you could try gnuplot. I'm not sure how gracefully gnuplot will handle that much data, though. Octave is also free and MATLAB-like, but it also uses gnuplot for plotting.

Can't you subsample the data?

Code:
set term 'png'
set output 'myplot.png'
set grid
set xlabel 'x'
set ylabel 'y'
# Simple 2D plot (with connected lines) using columns 1 and 2 of a space-delimited file as x and y respectively.
plot 'mydata' using 1:2 title 'Baby Cow Population' w lines
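If subsampling is an option, a minimal sketch of it could look like this. It's Python rather than gnuplot, the names and paths are hypothetical, and it assumes one integer per line with no blank lines. It keeps every `step`-th value together with its original line index, so the x-axis still reflects position in the full data set. With 170 million lines, a step of 1000 would leave about 170,000 points.

```python
def subsample(lines, step):
    """Yield (index, value) for every `step`-th line, starting with the first."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for index, line in enumerate(lines):
        if index % step == 0:
            yield index, int(line)

def write_subsample(src_path, dst_path, step=1000):
    # Hypothetical paths. Reads the input as a stream, so memory use
    # stays small regardless of file size.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, value in subsample(src, step):
            dst.write(f"{index} {value}\n")

# Small demo: ten values, keep every fourth one.
demo = [str(n * n) for n in range(10)]
points = list(subsample(demo, 4))
```

The output is two space-delimited columns, so it should fit the `using 1:2` form in the script above. Plain striding can hide short bursts or gaps between the sampled points, so it is better suited to seeing the overall shape than fine detail.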
An adorable giant isopod!