For a project Euler problem I had an algorithm the first step of which was to find the prime factorization of the number you fed it. This was by far the slowest part of the operation (mostly because the rest of it was pretty heavily cached to a much smaller number of solutions).
I was running this function on the numbers from 1 to 10^8, and it occurred to me that there must be a simpler way to precompute the prime factorizations of all of these without doing the full factorization for each. After some thought I came up with the following solution. It’s in some sense the most obvious thing you can possibly do (at least given a basic knowledge of some of the algorithms around this), but I’d not seen it before and knowing it exists might save you some time in similar scenarios, so I thought I’d post it here.
It’s essentially a modified Sieve of Eratosthenes. The difference is what you store in the sieve and what you do in the crossing off operation. There’s also an additional step.
The idea is this: We maintain a list of prime factors found so far for each number.
We then loop over the numbers in order from 2 upwards. If when we reach a number its list of prime factors is empty then it’s a prime, and we start marking off as follows:
Call the current number p. First we mark off all multiples of p and add p to their list. Then we mark off all multiples of p^2 and add p again. Then p^3, etc. until p^k > n.
Then we’re done, and move on to the next number.
That’s really all there is to it.
Here’s some python code that implements it:
def prime_factorizations(n): sieve = [ for x in xrange(0, n)] for i in xrange(2, n): if not sieve[i]: q = i while q < n: for r in xrange(q, n, q): sieve[r].append(i) q *= i return sieve
Here it is in action:
>>> prime_factorizations(15) [, , , , [2, 2], , [2, 3], , [2, 2, 2], [3, 3], [2, 5], , [2, 2, 3], , [2, 7]]
Edit: In the comments, Rich points out that you can make this more space efficient by instead storing a single prime for each integer: Specifically, the smallest prime that divides it. You can then reconstruct the desired list by iterated division – if the integer stored is equal to the current index, it’s a prime. If not, it contains one of its prime factors. Then divide by that factor and recurse. This might be a bit slower but is a lot more space efficient. Here’s some more python demonstrating this:
def prime_factorizations(n): sieve = [0 for x in xrange(0, n)] for i in xrange(2, n): if not sieve[i]: sieve[i] = i for r in xrange(r, n, r): sieve[r] = sieve[r] or i return sieve def factor_list_from_sieve(sieve, n): results =  while n > 1: p = sieve[n] results.append(p) if p == n: break n = n / p return results
The first method returns the described array of primes, the second reconstructs the prime factor list from it.
>>> sieve = prime_factorizations(1000) >>> factor_list_from_sieve(sieve, 992) [2, 2, 2, 2, 2, 31]