Improving the prime sieve code

Improving the prime sieve code#

An inefficiency with our code is that we compute the prime numbers using two data structures. This wastes memory, but also means we need to check if a values already exists in crossed_out twice. The second check is needed to make sure we aren’t needlessly appending an already crossed out prime number.

The function prime_sieve2 delivers the same results as prime_sieve, but more efficiently. The primary source of this saving is by designing our code so that we required only one list.

Hide code cell source
# imports and test cases
import math

n_primes = {10 : 4,                 
            100 : 25,               
            1000 : 168,
            10000 : 1229,
            100000 : 9592,
            1000000 : 78498,
            10000000 : 664579}
def prime_sieve2(n):
    # We will use boolean's here instead of ints
    candidates = [True] * (n + 1)
    candidates[0] = candidates[1] = False
    limit = int(math.sqrt(n)) + 1    
    
    for i in range(2, limit): 
        if candidates[i]:
            candidates[i+i::i] = [False] * ((n - i) // i)
            
    return [i for i in range(n+1) if candidates[i]]        
print(len(prime_sieve2(100)) == n_primes[100])
print(len(prime_sieve2(1000)) == n_primes[1000])
True
True
%timeit prime_sieve2(1000)
36.7 μs ± 245 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Does the time saving matter?#

You should see a substantial reduction in computational overhead of prime_sieve2. We are now finding the primes under 1000 in microseconds (µs) as opposed milliseconds in our naive implementation (1 µs = 1000ms!).

Some of you might be thinking “both procedures give the same results and execution takes under a second. So does this type of optimisation matter?” To answer that question let’s compute primes under 10,000. You should now see a much bigger difference with our naive function returning in around a second while prime_sieve2 is still in µs (on my machine still below half a millsecond).

When first writing this section I first tested prime_sieve with n=1_000_000. Needless to say, I quickly decided to use an example of 10,000 instead! When you fancy a cup of tea or coffee set n=1_000_000 and come back after your break to see an enourmous difference in performance!

Hide code cell source
# original function
def prime_sieve(n):
    '''
    Compute the prime numbers between 1 and n.
    Naive implementation of the Sieve of Eratosthenes.
    
    Parameters:
    ----------
    n: int
        The upper limit
    
    Returns:
    -------
    list
        a list of primes
    '''
    # a list of candidate numbers to sieve
    candidates = [i for i in range(n + 1)]
    # a list of numbered eliminated from consideration
    crossed_out = [0, 1]
    # maximum iterations required
    limit = int(math.sqrt(n)) + 1
    
    for i in range(2, limit):
         for j in range(i+i, n+1, i):              
                if not candidates[j] in crossed_out:
                    crossed_out.append(candidates[j])
    
    return [i for i in candidates if i not in crossed_out]   
TEN_THOUSAND = 10_000
%timeit len(prime_sieve(TEN_THOUSAND))
1.09 s ± 11.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit len(prime_sieve2(TEN_THOUSAND))
368 μs ± 1 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Taking a look at the redesign#

The redesigned implementation only has single list candidates: its contains only bool values. In python that won’t make much, if any, difference to memory usage, as bool is really a special type of int. But its a nice simplification as we the index of the list item gives us our factor. This modification provides our biggest speed up. Each time we ‘cross out’ a number e.g. i we simply set candidate[i] = False. (We don’t bother checking as we are not appending to a second list). At the end of the function any items that are still True are the primes.

Don’t believe me about int and bool? Try executing (True + 4).

The second change might be considered a ‘micro optimisation’. I.e. list slicing in favour of an inner for loop.

candidates[i+i::i] = [False] * ((n - i) // i)

Here we start at the current index + one factor ahead e.g. if i=2 then we start at index i+i=4 and select items in steps of i=2 all the way to the end candidates[i+i::i] (Note we could also have used candidates[i+i:n+1:i]). In general, a slice will be faster than looping over the list. Here’s a simpler example of list slicing with a step:

list_to_slice = [i for i in range(31)]
list_to_slice[4:len(list_to_slice):2]
[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]