So when I was in the shower earlier and I wondered to myself “Hm. How would I generate random points on an N-sphere?” (true story).
I thought about it for a bit and generated the two obvious solutions. Unfortunately the two obvious solutions are, respectively, wrong and massively inefficient (actually the inefficient one isn’t too bad for low dimensions, but I was interested in the high dimensional case where it rapidly becomes massively inefficient). Once out of the shower I looked the answer up, and found the whole thing interesting enough that I figured someone else might too, so this is a post about that.
Obvious but wrong solution
Generate N random uniform numbers between -1 and 1. Divide the resulting vector by its norm.
Why is this wrong? Well, the initial N random numbers will lie uniformly in the N cube. So the probability density at a given point is proportional to the length of the line that goes from the zero, through that point and out to the N sphere. This line is not of uniform length for the very simple reason that cubes aren’t spheres.
Obvious but slow solution
Generate N uniform random numbers between -1 and 1. Retry this until the resulting norm is <= 1. Divide the resulting vector by its norm. Why does this work? Because we're using a rejection method to make sure we get points inside the unit ball. The result is not uniformly distributed inside the unit ball but it is (evidently) spherically symmetrical. By dividing by the norm we’re then projecting the spherically symmetrical distribution onto the sphere and thus getting a uniform distribution on it.
Why is it slow? Because of the curse of dimensionality. When we do the rejection we’re effectively generating a geometric distribution with probability of success equal to the proportion of the N-cube the N-ball occupies. This means we expect to perform the generation 1 / that probability times, which tends towards infinity quite rapidly as N gets large.
Obvious in retrospect solution
At this point I looked up the answer, though I should have been able to figure it out.
What we want is a spherically symmetric distribution a la the previous solution but which is more efficient to calculate.
It turns out that a good choice for this is an N dimensional normal distribution, which can be obtained by simply generating N Normal(0, 1) random variables. You can then divide by the norm of those variables and get a distribution on the unit sphere.
This is apparently not the best solution (generating normal random numbers is a little expensive), but I couldn’t find a reference to a version which works in N dimensions for any N.
Here’s some C code implementing this: https://gist.github.com/1354700.
Why not project from from a hypersphere back to euclidean space? I’ve forked your gist and it seems to beat this by an order of magnitude or so. Did I screw up somewhere? The returned values seem to be spherically normed.
https://gist.github.com/1356509
http://en.wikipedia.org/wiki/N-sphere#Hyperspherical_coordinates
Hm, I must have messed something up. Forgot a 2*PI and now my sphere is not so well normed. Looking.
Ah, got them all. Applied an undesired cos() to the last dimension and messed up my indexing of the inner loop.
Also, I’ve added a norm printout for debugging. Still an order of magnitude faster!
Oh, dear, I am wrong. Nevermind!
Hm, hypersphere performs much better on low-dimensional data (N <= 10) but gets much worse as N goes up.
Hi Jeff. Thanks for this!
Unfortunately I don’t think it works. The problem is that you get a probability distribution which is uniform in the hyperspherical angles, but the area is *not* uniform in the hyperspherical angles – the result you want is a probability distribution where the density is proportional to http://upload.wikimedia.org/wikipedia/en/math/4/0/a/40aef408d861b1266498e760f4304b8f.png
There is a way to do something like this for the 3-sphere (and I think the 4-sphere) but I don’t believe it generalizes.
On the other hand I’ve just woken up, so I could be completely confused. :-)
David