Clan x86

General Forums => General Discussion => Topic started by: iago on August 08, 2010, 08:38:05 PM

Title: Graphing huge amounts of data
Post by: iago on August 08, 2010, 08:38:05 PM
I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?
Title: Re: Graphing huge amounts of data
Post by: deadly7 on August 08, 2010, 09:17:31 PM
SPSS is a program I've seen commonly used to interface with lots of numbers. It's a statistics tool, so I don't know if you need something as robust. Is all that you're looking to do graph things? Linux friendliness a big requirement?
Title: Re: Graphing huge amounts of data
Post by: rabbit on August 08, 2010, 09:34:29 PM
MATLAB?
Title: Re: Graphing huge amounts of data
Post by: iago on August 08, 2010, 09:57:01 PM
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.

I'm not really familiar with MATLAB. Can it do what I need?
Title: Re: Graphing huge amounts of data
Post by: deadly7 on August 08, 2010, 10:00:33 PM
Quote from: iago on August 08, 2010, 09:57:01 PM
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.
Well, SPSS is (iirc) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. uid's, etc etc, SPSS is probably one of the easiest ways to go about it. I've used it a couple times and it has a pretty steep learning curve but that shouldn't be a problem for you. :P
Quote
I'm not really familiar with MATLAB. Can it do what I need?
MATLAB can graph numbers if that's what you're looking to do.
Title: Re: Graphing huge amounts of data
Post by: Chavo on August 08, 2010, 10:30:23 PM
http://www.r-project.org/
Title: Re: Graphing huge amounts of data
Post by: iago on August 08, 2010, 10:42:25 PM
Quote from: deadly7 on August 08, 2010, 10:00:33 PM
Quote from: iago on August 08, 2010, 09:57:01 PM
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.
Well, SPSS is (iirc) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. uid's, etc etc, SPSS is probably one of the easiest ways to go about it. I've used it a couple times and it has a pretty steep learning curve but that shouldn't be a problem for you. :P
Ah, cool. Sounds like more than I need -- I don't care about names or anything, just the numbers.

Quote from: deadly7 on August 08, 2010, 10:00:33 PM
Quote
I'm not really familiar with MATLAB. Can it do what I need?
MATLAB can graph numbers if that's what you're looking to do.
But can it graph 170 million of them? The input file is just one number per line and is 2.1 GB.

Quote from: Chavo on August 08, 2010, 10:30:23 PM
http://www.r-project.org/
Cool, looks promising. :)
Title: Re: Graphing huge amounts of data
Post by: deadly7 on August 08, 2010, 11:12:18 PM
Quote
MATLAB can graph numbers if that's what you're looking to do.
It can probably do it, but you'll probably want a high-powered machine for it. Newer versions have support for multithreading. You may have to do some relabeling of your data to make it graphable (e.g. write a quick script that reads the file and turns it into x-y coordinates). Supports Linux.
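
A minimal sketch of that kind of relabeling script, assuming the input is one number per line as described earlier in the thread. The function name `to_xy` is made up for illustration; x is simply the running line index and y is the number on that line.

```python
import io


def to_xy(lines, out):
    """Write 'index value' pairs for each non-empty line; return how many were written."""
    count = 0
    for line in lines:
        value = line.strip()
        if not value:
            continue  # skip blank lines instead of emitting an empty y value
        out.write(f"{count} {value}\n")
        count += 1
    return count


# Small in-memory demo; for the real file, pass open file objects instead.
demo_out = io.StringIO()
to_xy(["1001\n", "1004\n", "1010\n"], demo_out)
print(demo_out.getvalue(), end="")
```

It reads line by line, so the whole 2.1 GB file never has to fit in memory, although the output file will be larger than the input.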
Title: Re: Graphing huge amounts of data
Post by: zorm on August 08, 2010, 11:17:32 PM
MATLAB isn't free though.

There is, however, its open source equivalent, Octave. It is used on some of the fastest supercomputers in the world, so I'm reasonably sure it can handle your 170 million data points.
Title: Re: Graphing huge amounts of data
Post by: Sidoh on August 09, 2010, 06:22:25 AM
Quote from: Chavo on August 08, 2010, 10:30:23 PM
http://www.r-project.org/

Exactly what I was going to recommend.  I've used R to handle a few orders of magnitude more data, and it does it pretty well. :)

iago: put my vote in for R.  I love R. :)
Title: Re: Graphing huge amounts of data
Post by: nslay on August 09, 2010, 11:36:48 AM
Quote from: iago on August 08, 2010, 08:38:05 PM
I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?


If you want free, you could try gnuplot.  I'm not sure how gracefully gnuplot will respond to that much data though.  Octave is also free and MATLAB-like, but it also uses gnuplot for plotting.

Can't you subsample the data?


set term 'png'
set output 'myplot.png'
set grid
set xlabel 'x'
set ylabel 'y'
# Simple 2D plot (with connected lines) using columns 1 and 2 of a space-delimited file as x and y respectively.
plot 'mydata' using 1:2 title 'Baby Cow Population' w lines
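
For the subsampling idea, here is a rough Python sketch that keeps every Nth line and writes "index value" pairs, which is the two-column layout the `using 1:2` plot above expects. It assumes every line holds exactly one integer; the names `subsample` and `write_xy`, the file paths, and the default step of 1000 are all placeholders, not anything from the thread.

```python
from itertools import islice


def subsample(lines, step):
    """Yield (original_line_index, value) for every step-th line."""
    for index, line in islice(enumerate(lines), 0, None, step):
        yield index, int(line)  # assumes one integer per line


def write_xy(src_path, dst_path, step=1000):
    """Stream src_path and write the subsampled 'index value' pairs to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, value in subsample(src, step):
            dst.write(f"{index} {value}\n")
```

With a step of 1000, 170 million lines come down to about 170,000 points, and because the numbers are sorted the overall growth curve should keep its shape. Something like `write_xy('numbers.txt', 'mydata')` would produce a file the script above can plot directly.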