For a project Euler problem I had an algorithm the first step of which was to find the prime factorization of the number you fed it. This was by far the slowest part of the operation (mostly because the rest of it was pretty heavily cached to a much smaller number of solutions).
I was running this function on the numbers from 1 to 10^8, and it occurred to me that there must be a simpler way to precompute the prime factorizations of all of these without doing the full factorization for each. After some thought I came up with the following solution. It’s in some sense the most obvious thing you can possibly do (at least given a basic knowledge of some of the algorithms around this), but I’d not seen it before and knowing it exists might save you some time in similar scenarios, so I thought I’d post it here.
It’s essentially a modified Sieve of Eratosthenes. The difference is what you store in the sieve and what you do in the crossing off operation. There’s also an additional step.
The idea is this: We maintain a list of prime factors found so far for each number.
We then loop over the numbers in order from 2 upwards. If when we reach a number its list of prime factors is empty then it’s a prime, and we start marking off as follows:
Call the current number p. First we mark off all multiples of p and add p to their list. Then we mark off all multiples of p^2 and add p again. Then p^3, etc. until p^k > n.
Then we’re done, and move on to the next number.
That’s really all there is to it.
Here’s some python code that implements it:
def prime_factorizations(n): sieve = [ for x in xrange(0, n)] for i in xrange(2, n): if not sieve[i]: q = i while q < n: for r in xrange(q, n, q): sieve[r].append(i) q *= i return sieve
Here it is in action:
>>> prime_factorizations(15) [, , , , [2, 2], , [2, 3], , [2, 2, 2], [3, 3], [2, 5], , [2, 2, 3], , [2, 7]]
Edit: In the comments, Rich points out that you can make this more space efficient by instead storing a single prime for each integer: Specifically, the smallest prime that divides it. You can then reconstruct the desired list by iterated division – if the integer stored is equal to the current index, it’s a prime. If not, it contains one of its prime factors. Then divide by that factor and recurse. This might be a bit slower but is a lot more space efficient. Here’s some more python demonstrating this:
def prime_factorizations(n): sieve = [0 for x in xrange(0, n)] for i in xrange(2, n): if not sieve[i]: sieve[i] = i for r in xrange(r, n, r): sieve[r] = sieve[r] or i return sieve def factor_list_from_sieve(sieve, n): results =  while n > 1: p = sieve[n] results.append(p) if p == n: break n = n / p return results
The first method returns the described array of primes, the second reconstructs the prime factor list from it.
>>> sieve = prime_factorizations(1000) >>> factor_list_from_sieve(sieve, 992) [2, 2, 2, 2, 2, 31]
It’s more space efficient if you just store the smallest prime divisor for each number (or ‘0’ for prime). As you fill out the sieve, after finding a new prime, just fill in each slot with the current prime if it is still empty. When looking up the factorization of a number, you will need to do a bit of division and look further back in the sieve, but this cost is more than offset by the speed benefits of using a simple `int` for your sieve, instead of an `int`
Good point. Thanks.
My actual use case doesn’t look much like this – I only care about counts of prime divisors, so I can just use a counter and a fixed length array for each. It’s still less space efficient, but only by a factor of 4 or 5 (I can use bytes for each count as all numbers I care about here are well under 2^256 and consequently also have fewer than 256 distinct prime factors), which is a space/time tradeoff I’m happy to make.
See http://hackage.haskell.org/package/NumberSieves .
Pingback: Distinct factorizations | Bgpjax