News:

Who uses forums anymore?

Main Menu

K-Means "Regression" Forest (Pictures!)

Started by nslay, February 17, 2011, 01:56:56 AM

Previous topic - Next topic

0 Members and 3 Guests are viewing this topic.

nslay

Certainly, not a new idea, but something you'll probably not see very often (or ever) ... a "regression" tree that can infer things that aren't functions ... Very cool!

So, the typical regression problem is to infer some function on some observed points [latex]\{(\mathbf{x}_n,y_n)\}_{n=1}^N[/latex] where [latex]y_n[/latex] are possibly noisy (i.e. so interpolation wouldn't be appropriate).

The regression tree divides the feature space (the [latex]x[/latex]'s) in such a way so as to minimize
[latex]
\ell(Y) = \sum_{y \in Y} (y - \bar{y})^2 = |Y| \text{Var}(Y)
[/latex]

The divisions are binary splits, so each vertex is minimizing the above on each side
[latex]
\ell(Y_{\text{left}},Y_{\text{right}}) = \sum_{y \in Y_{\text{left}}} (y - \bar{y})^2 + \sum_{y \in Y_{\text{right}}} (y - \bar{y})^2 = N_{\text{left}} \text{Var}(Y_{\text{left}}) + N_{\text{right}}\text{Var}(Y_{\text{right}})
[/latex]

The left and right sides are per-feature (dimension) and determined by a threshold
[latex]
Y_{\text{left}} = \{ y_n\ :\ x_{nf} \leq t,\ n=1,2,3, ... N \} \\
Y_{\text{right}} = \{ y_n\ :\ x_{nf} > t,\ n = 1, 2, 3, ... N \}
[/latex]

So, to minimize the above, the vertex searches over thresholds [latex]t[/latex] in a subset of features [latex]f[/latex] and chooses the [latex](t,f)[/latex] with minimal residual.

Now, to generalize this to learn things that aren't functions, consider minimizing a modification of the above
[latex]
\ell(Y) = \sum_{k=1}^K \sum_{y \in Y_k} (y - \bar{y}_k)^2 = \sum_{k=1}^K |Y_k| \text{Var}(Y_k)
[/latex]

Each partition [latex]Y_k[/latex] is a cluster of [latex]y[/latex] values. So, for example, if you were to examine a cross-section of a cloud of points describing a circle, you would see two distinct clusters of [latex]y[/latex] values (unless you were at the ends of the circle).

The general residual to minimize is
[latex]
\ell(Y_{\text{left}},Y_{\text{right}}) = \ell(Y_{\text{left}}) + \ell(Y_{\text{right}})
[/latex]
And I don't really care to expand it further because all the subscripts get ugly.  The clusters are found using K-Means or similar. I chose to use K-Means because it is simple. The clustering problem is also simple ... 1D clustering.

So, I didn't explain any of this very well because I'm not really trying to lecture anyone ... just show some cool pictures with a little bit of explanation.

Here's some toy data:
Consider 1 feature [latex]x[/latex] and we want to predict [latex]y[/latex] (which can have two values).


So, here's some examples of individual KM regression trees:

These are plots of the pdfs which are estimated with Kernel Density Estimation. The parameters are inferred from K-Means clustering (the predictions for Y would be the K means).

This training set has 1000 instances. This tree has minimum local sample size of 20. This parameter is important so that clustering works well.


This is the same conditional, but at a different angle.


Here's an example of a cross-section which nicely captures the behavior of the data.


This training set has 3000 instances. This tree has minimum local sample size of 100.


This is the same conditional, but at a different angle.


Here's an attempt to infer Archimedes Spiral. This is the toy data below.


Here, the training set has 3000 instances. I trained a regression tree with K=5 clusters in mind and a minimum sample size of 100. Not a very nice picture


This is the same conditional, but at a different angle.


Lastly, here's a KM Regression Forest with 100 trees. It works very nicely here.


The same conditional at a different angle


Here are some cross-sections of the forest




You can use normal regression forest too ... but you don't get nice bimodal conditionals like that. Maybe if you had a lot of regression trees, it would work.

All of this was written in Octave using a mixture of C++ Octfile and Matlab scripting. The code can deal with general data ... these toy data sets were just to make some cool pictures.
An adorable giant isopod!

dark_drake

It took me a minute to figure out what was going on due to my ineptitude with set notation, but once I did, I thought this was pretty cool. Nice job!
errr... something like that...

nslay

Quote from: dark_drake on February 17, 2011, 10:24:23 AM
It took me a minute to figure out what was going on due to my ineptitude with set notation, but once I did, I thought this was pretty cool. Nice job!

Well, feel free to ask any questions about notation or whatever. I wasn't very mindful of my audience it seems. I think Random Forest (and its variants) are very cool. I was very excited to be able to make a regression tree do this. If I ever come across anything else cool, I'll be sure to post more pictures.

By the way, the bands on the edges of the data are due to the way the tree works (the thresholds). When an instance leaves the training domain, it's like querying the tree for a prediction at the very edge of the training domain.  That said, the tree always has an answer for any instance ... when the instance leaves the training domain, it's not a very good one.
An adorable giant isopod!

dark_drake

Thanks, but I'm familiar enough with set notation to puzzle the meaning out, but I have to think about it. Anyway, could you get a better picture of the Archimedes spiral by increasing the K-values/number of clusters?
errr... something like that...

nslay

Quote from: dark_drake on February 17, 2011, 01:01:19 PM
Thanks, but I'm familiar enough with set notation to puzzle the meaning out, but I have to think about it. Anyway, could you get a better picture of the Archimedes spiral by increasing the K-values/number of clusters?

Yes, I'm sure of it.  I'm not sure if it will totally eliminate the possibility of botched clustered branches though. There are more than 5 possible responses in that example.
An adorable giant isopod!