Do micro-optimisations matter?
In short, micro-optimisations of your Python code matter less than an efficient design. For example, ‘trial and error’ versus a ‘prime sieve’, or multiple versus single data structures (at least in this example), matters more than a for loop versus a list slice. The difference is that the former gave us an order-of-magnitude speed-up, while the latter helped, but to a much smaller extent.
Another example of a micro-optimisation (at least in standard Python) relates to the type of data structure used. prime_sieve2 made use of a list of boolean values. Let’s have a look at its size in memory in bytes.
import math
import sys
candidates_bool = [True] * (10)
sys.getsizeof(candidates_bool)
136
As we only effectively track 0s (False) and 1s (True), there is an option to shrink the memory requirement substantially by using a Python bytearray.
candidates_bytes = bytearray(b"\x01") * 10
print(candidates_bytes)
print(sys.getsizeof(candidates_bytes))
bytearray(b'\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01')
67
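Note that the 136 and 67 bytes above are dominated by fixed per-object overhead. At scale the gap grows: on a 64-bit CPython a list stores an 8-byte pointer per element, while a bytearray stores a single byte, so the ratio approaches 8 to 1. A quick sketch (exact byte counts are implementation details and vary by Python version and platform):

```python
import sys

n = 10_000_000
bool_list = [True] * (n + 1)             # one 8-byte pointer per element
byte_arr = bytearray(b"\x01") * (n + 1)  # one byte per element

# On a 64-bit build the list is roughly 8x larger.
print(sys.getsizeof(bool_list) / sys.getsizeof(byte_arr))
```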
For this tiny list that roughly halves memory usage. That’s a nice feature. prime_sieve3 updates our sieve to use a bytearray. We will also compare its performance to prime_sieve2.
def prime_sieve3(n):
    # bytearray here instead of a list of bools to reduce memory overhead
    candidates = bytearray(b"\x01") * (n + 1)
    # 0s and 1s instead of False and True
    candidates[0] = 0
    candidates[1] = 0
    limit = int(math.sqrt(n)) + 1
    for i in range(2, limit):
        if candidates[i]:
            candidates[i+i::i] = [0] * ((n - i) // i)
    return [i for i in range(n+1) if candidates[i]]
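As a quick sanity check that the bytearray version still sieves correctly (the function is repeated here so the cell runs on its own):

```python
import math

def prime_sieve3(n):
    # Same sieve as above, repeated so this cell is self-contained.
    candidates = bytearray(b"\x01") * (n + 1)
    candidates[0] = 0
    candidates[1] = 0
    limit = int(math.sqrt(n)) + 1
    for i in range(2, limit):
        if candidates[i]:
            candidates[i+i::i] = [0] * ((n - i) // i)
    return [i for i in range(n + 1) if candidates[i]]

print(prime_sieve3(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```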
TEN_THOUSAND = 10_000
%timeit prime_sieve3(TEN_THOUSAND)
389 μs ± 1.27 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Well, that’s a bit of a surprise. Our memory-efficient prime_sieve3 is slightly slower on average than prime_sieve2. But, and it’s a big BUT, that’s only for the first 10,000 natural numbers. When we compute larger primes the pattern reverses. For example, prime_sieve3 is faster at computing the primes up to 10 million, and the performance gap widens as we compute larger and larger primes. So for very small primes this ‘optimisation’ doesn’t help, but it may help for larger primes (large is, of course, relative).
def prime_sieve2(n):
    # We will use booleans here instead of ints
    candidates = [True] * (n + 1)
    candidates[0] = candidates[1] = False
    limit = int(math.sqrt(n)) + 1
    for i in range(2, limit):
        if candidates[i]:
            candidates[i+i::i] = [False] * ((n - i) // i)
    return [i for i in range(n+1) if candidates[i]]
TEN_MILLION = 10_000_000
%timeit prime_sieve2(TEN_MILLION)
%timeit prime_sieve3(TEN_MILLION)
577 ms ± 2.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
459 ms ± 3.38 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
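The two implementations should, of course, produce identical results; a quick cross-check (both functions repeated so the cell runs standalone):

```python
import math

def prime_sieve2(n):
    # List-of-booleans sieve, repeated from above.
    candidates = [True] * (n + 1)
    candidates[0] = candidates[1] = False
    limit = int(math.sqrt(n)) + 1
    for i in range(2, limit):
        if candidates[i]:
            candidates[i+i::i] = [False] * ((n - i) // i)
    return [i for i in range(n + 1) if candidates[i]]

def prime_sieve3(n):
    # bytearray sieve, repeated from above.
    candidates = bytearray(b"\x01") * (n + 1)
    candidates[0] = candidates[1] = 0
    limit = int(math.sqrt(n)) + 1
    for i in range(2, limit):
        if candidates[i]:
            candidates[i+i::i] = [0] * ((n - i) // i)
    return [i for i in range(n + 1) if candidates[i]]

assert prime_sieve2(100_000) == prime_sieve3(100_000)
print("identical output up to 100,000")
```

A plausible (though not the only possible) explanation for the reversal at large n is memory locality: the bytearray packs the whole sieve far more densely than the list of pointers, which is friendlier to CPU caches once the sieve no longer fits in the faster cache levels.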