Clan x86
General Forums => Academic / School => Topic started by: nslay on February 17, 2011, 01:56:56 AM

Certainly, not a new idea, but something you'll probably not see very often (or ever) ... a "regression" tree that can infer things that aren't functions ... Very cool!
So, the typical regression problem is to infer some function on some observed points [latex]\{(\mathbf{x}_n,y_n)\}_{n=1}^N[/latex] where [latex]y_n[/latex] are possibly noisy (i.e. so interpolation wouldn't be appropriate).
The regression tree divides the feature space (the [latex]x[/latex]'s) in such a way so as to minimize
[latex]
\ell(Y) = \sum_{y \in Y} (y  \bar{y})^2 = Y \text{Var}(Y)
[/latex]
The divisions are binary splits, so each vertex is minimizing the above on each side
[latex]
\ell(Y_{\text{left}},Y_{\text{right}}) = \sum_{y \in Y_{\text{left}}} (y  \bar{y})^2 + \sum_{y \in Y_{\text{right}}} (y  \bar{y})^2 = N_{\text{left}} \text{Var}(Y_{\text{left}}) + N_{\text{right}}\text{Var}(Y_{\text{right}})
[/latex]
The left and right sides are perfeature (dimension) and determined by a threshold
[latex]
Y_{\text{left}} = \{ y_n\ :\ x_{nf} \leq t,\ n=1,2,3, ... N \} \\
Y_{\text{right}} = \{ y_n\ :\ x_{nf} > t,\ n = 1, 2, 3, ... N \}
[/latex]
So, to minimize the above, the vertex searches over thresholds [latex]t[/latex] in a subset of features [latex]f[/latex] and chooses the [latex](t,f)[/latex] with minimal residual.
Now, to generalize this to learn things that aren't functions, consider minimizing a modification of the above
[latex]
\ell(Y) = \sum_{k=1}^K \sum_{y \in Y_k} (y  \bar{y}_k)^2 = \sum_{k=1}^K Y_k \text{Var}(Y_k)
[/latex]
Each partition [latex]Y_k[/latex] is a cluster of [latex]y[/latex] values. So, for example, if you were to examine a crosssection of a cloud of points describing a circle, you would see two distinct clusters of [latex]y[/latex] values (unless you were at the ends of the circle).
The general residual to minimize is
[latex]
\ell(Y_{\text{left}},Y_{\text{right}}) = \ell(Y_{\text{left}}) + \ell(Y_{\text{right}})
[/latex]
And I don't really care to expand it further because all the subscripts get ugly. The clusters are found using KMeans or similar. I chose to use KMeans because it is simple. The clustering problem is also simple ... 1D clustering.
So, I didn't explain any of this very well because I'm not really trying to lecture anyone ... just show some cool pictures with a little bit of explanation.
Here's some toy data:
Consider 1 feature [latex]x[/latex] and we want to predict [latex]y[/latex] (which can have two values).
(http://nslay.36bit.com/tmp/circledata.png)
So, here's some examples of individual KM regression trees:
These are plots of the pdfs which are estimated with Kernel Density Estimation. The parameters are inferred from KMeans clustering (the predictions for Y would be the K means).
This training set has 1000 instances. This tree has minimum local sample size of 20. This parameter is important so that clustering works well.
(http://nslay.36bit.com/tmp/kmtreecontour1.png)
This is the same conditional, but at a different angle.
(http://nslay.36bit.com/tmp/kmtreeconditional1.png)
Here's an example of a crosssection which nicely captures the behavior of the data.
(http://nslay.36bit.com/tmp/kmtreepdf1.png)
This training set has 3000 instances. This tree has minimum local sample size of 100.
(http://nslay.36bit.com/tmp/kmtreecontour3.png)
This is the same conditional, but at a different angle.
(http://nslay.36bit.com/tmp/kmtreeconditional3.png)
Here's an attempt to infer Archimedes Spiral. This is the toy data below.
(http://nslay.36bit.com/tmp/kmtreedata2.png)
Here, the training set has 3000 instances. I trained a regression tree with K=5 clusters in mind and a minimum sample size of 100. Not a very nice picture
(http://nslay.36bit.com/tmp/kmtreecontour2.png)
This is the same conditional, but at a different angle.
(http://nslay.36bit.com/tmp/kmtreeconditional2.png)
Lastly, here's a KM Regression Forest with 100 trees. It works very nicely here.
(http://nslay.36bit.com/tmp/kmforestcontour1.png)
The same conditional at a different angle
(http://nslay.36bit.com/tmp/kmforestconditional1.png)
Here are some crosssections of the forest
(http://nslay.36bit.com/tmp/kmforestpdf1.png)
(http://nslay.36bit.com/tmp/kmforestpdf12.png)
(http://nslay.36bit.com/tmp/kmforestpdf14.png)
You can use normal regression forest too ... but you don't get nice bimodal conditionals like that. Maybe if you had a lot of regression trees, it would work.
All of this was written in Octave using a mixture of C++ Octfile and Matlab scripting. The code can deal with general data ... these toy data sets were just to make some cool pictures.

It took me a minute to figure out what was going on due to my ineptitude with set notation, but once I did, I thought this was pretty cool. Nice job!

It took me a minute to figure out what was going on due to my ineptitude with set notation, but once I did, I thought this was pretty cool. Nice job!
Well, feel free to ask any questions about notation or whatever. I wasn't very mindful of my audience it seems. I think Random Forest (and its variants) are very cool. I was very excited to be able to make a regression tree do this. If I ever come across anything else cool, I'll be sure to post more pictures.
By the way, the bands on the edges of the data are due to the way the tree works (the thresholds). When an instance leaves the training domain, it's like querying the tree for a prediction at the very edge of the training domain. That said, the tree always has an answer for any instance ... when the instance leaves the training domain, it's not a very good one.

Thanks, but I'm familiar enough with set notation to puzzle the meaning out, but I have to think about it. Anyway, could you get a better picture of the Archimedes spiral by increasing the Kvalues/number of clusters?

Thanks, but I'm familiar enough with set notation to puzzle the meaning out, but I have to think about it. Anyway, could you get a better picture of the Archimedes spiral by increasing the Kvalues/number of clusters?
Yes, I'm sure of it. I'm not sure if it will totally eliminate the possibility of botched clustered branches though. There are more than 5 possible responses in that example.