Graphing huge amounts of data

Started by iago, August 08, 2010, 08:38:05 PM

iago

I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?

deadly7

SPSS is a program I've commonly seen used to work with large sets of numbers. It's a statistics tool, so I don't know if you need something that robust. Is graphing all you're looking to do? Is Linux friendliness a big requirement?
[17:42:21.609] <Ergot> Kutsuju you're girlfrieds pussy must be a 403 error for you
[17:42:25.585] <Ergot> FORBIDDEN

on IRC playing T&T++
<iago> He is unarmed
<Hitmen> he has no arms?!

on AIM with a drunk mythix:
(00:50:05) Mythix: Deadly
(00:50:11) Mythix: I'm going to fuck that red dot out of your head.
(00:50:15) Mythix: with my nine

rabbit


iago

Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.

I'm not really familiar with MATLAB. Can it do what I need?
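
One way to see how the ids are laid out, whichever tool ends up drawing the graph, is to count how many ids fall into fixed-width ranges and plot those counts instead of the raw 170 million values. The sketch below is only an illustration in Python: the function name `bucket_counts` and the bucket width are hypothetical choices, not something from this thread.

```python
from collections import Counter
from typing import Dict, Iterable


def bucket_counts(values: Iterable[int], width: int) -> Dict[int, int]:
    """Count how many values fall into each fixed-width bucket.

    Keys are the lower bound of each bucket. Values are consumed one at a
    time, so the whole input never has to sit in memory at once.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    counts: Counter = Counter()
    for value in values:
        counts[(value // width) * width] += 1
    # Return the buckets in ascending order of their lower bound.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Tiny stand-in for the real data (assumed: plain integers).
    sample = [1, 2, 15, 27, 29, 29]
    for lower, count in bucket_counts(sample, 10).items():
        print(lower, count)
```

Dense ranges show up as large counts and unused ranges as missing buckets, which is usually enough to spot how ids are being allocated.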

deadly7

Quote from: iago on August 08, 2010, 09:57:01 PM
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.
Well, SPSS is (iirc) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. uid's, etc etc, SPSS is probably one of the easiest ways to go about it. I've used it a couple times and it has a pretty steep learning curve but that shouldn't be a problem for you. :P
Quote
I'm not really familiar with MATLAB. Can it do what I need?
MATLAB can graph numbers if that's what you're looking to do.

Chavo

http://www.r-project.org/


iago

Quote from: deadly7 on August 08, 2010, 10:00:33 PM
Quote from: iago on August 08, 2010, 09:57:01 PM
Yeah, I want to simply visualize a huge series of numbers so I can see how they're laid out. Let's say they're user ids from a certain social networking site, and I'm curious how they choose user ids so we can save time when doing a full brute force. In theory. ;)

Statistics are okay, as long as it can handle 170 million lines. Linux friendliness would be nice.
Well, SPSS is (iirc) a pay-to-use program and I can't recall how expensive it is. If you want to do statistics based on names vs. uid's, etc etc, SPSS is probably one of the easiest ways to go about it. I've used it a couple times and it has a pretty steep learning curve but that shouldn't be a problem for you. :P
Ah, cool. Sounds like more than I need -- I don't care about names or anything, just the numbers.

Quote from: deadly7 on August 08, 2010, 10:00:33 PM
Quote
I'm not really familiar with MATLAB. Can it do what I need?
MATLAB can graph numbers if that's what you're looking to do.
But can it graph 170 million of them? The input file is just one number per line and is 2.1 GB.

Quote from: Chavo on August 08, 2010, 10:30:23 PM
http://www.r-project.org/
Cool, looks promising. :)

deadly7

Quote
MATLAB can graph numbers if that's what you're looking to do.
It can probably do it, but you'll probably want a high-powered machine for it. Newer versions have support for multithreading. You may have to do some relabeling of your data to make it graphable (e.g. write a quick script that reads the file and turns it into x-y coordinates). It also runs on Linux.
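
The "quick script" idea could look something like the sketch below. It assumes the input is one integer per line, as described earlier in the thread, and uses each number's position in the sorted file as x and the number itself as y. The name `to_xy` and the file names in the usage note are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def to_xy(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Turn one-number-per-line text into (rank, value) pairs.

    rank is the 0-based position of the number among the non-blank lines.
    Lines are processed lazily, so a multi-gigabyte file can be streamed.
    """
    rank = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines (assumed to be harmless padding)
        yield rank, int(text)
        rank += 1


if __name__ == "__main__":
    # Small in-memory stand-in for the real file.
    for x, y in to_xy(["5\n", "9\n", "12\n"]):
        print(x, y)
```

To convert a real file, the same generator can be fed an open file object, e.g. `for x, y in to_xy(open("ids.txt")): ...`, writing each pair out as a space-separated row.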

zorm

MATLAB isn't free though.

There is, however, its open source equivalent, Octave. It is used on some of the fastest supercomputers in the world, so I'm reasonably sure it can handle your 170 million datapoints.
"Frustra fit per plura quod potest fieri per pauciora"
- William of Ockham

Sidoh

Quote from: Chavo on August 08, 2010, 10:30:23 PM
http://www.r-project.org/

Exactly what I was going to recommend.  I've used R to handle a few orders of magnitude more data, and it does it pretty well. :)

iago: put my vote in for R.  I love R. :)

nslay

Quote from: iago on August 08, 2010, 08:38:05 PM
I have 170,000,000 numbers, sorted, and I want to visualize the growth. Does anybody know a tool that can handle that much data?


If you want free, you could try gnuplot. I'm not sure how gracefully gnuplot will respond to that much data, though. Octave is also free and MATLAB-like, but it also uses gnuplot for plotting.

Can't you subsample the data?


set terminal png
set output 'myplot.png'
set grid
set xlabel 'x'
set ylabel 'y'
# Simple 2D plot (with connected lines) using columns 1 and 2 of a space-delimited file as x and y, respectively.
plot 'mydata' using 1:2 title 'Baby Cow Population' with lines
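
The subsampling suggestion above could be sketched roughly as follows: keep every Nth line of the input and write it out as two space-delimited columns (line index, value), which is the layout the gnuplot script reads from 'mydata'. This is a minimal illustration, assuming one integer per line with no blank lines; `subsample`, `write_subsample`, and the input name `ids.txt` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def subsample(lines: Iterable[str], step: int) -> Iterator[Tuple[int, int]]:
    """Yield (line_index, value) for every step-th line, starting at line 0."""
    if step < 1:
        raise ValueError("step must be >= 1")
    for index, line in enumerate(lines):
        if index % step == 0:
            yield index, int(line)  # int() tolerates the trailing newline


def write_subsample(src_path: str, dst_path: str, step: int) -> int:
    """Stream src_path into dst_path as 'index value' rows; return row count."""
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for index, value in subsample(src, step):
            dst.write(f"{index} {value}\n")
            written += 1
    return written


if __name__ == "__main__":
    demo = ["10\n", "20\n", "30\n", "40\n", "50\n"]
    for index, value in subsample(demo, 2):
        print(index, value)
```

With a step of 10,000, a 170,000,000-line file shrinks to 17,000 rows, e.g. `write_subsample("ids.txt", "mydata", 10000)`. Because the input is sorted, evenly spaced samples should still follow the overall growth curve, though gaps narrower than the step can be missed.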
