Python is a great language that’s easy to learn (just keep in mind: "always indent after colons"), but it’s also easy to write code that ends up running quite slowly. Such inelegant code isn’t “Pythonic,” meaning it doesn’t follow the language’s best practices (simple, fast, easy to read and comprehend).
(In a previous column, I suggested that un-Pythonic code is analogous to carpentry: it’s like working against the grain, which makes it harder to get the code to do what you want, at the speed that you want. This becomes an even bigger issue as you expand into new, specialized areas of development. Speed is also a big deal when you plunge into GUI development.)
The fight against slow code hasn’t been helped by Python 2 (generally the faster one) being deprecated in favor of Python 3 (the slower one!). While new versions of a language are generally a good thing (and long overdue, in Python’s case), in past experiments Python 2 code has typically run 5-10 percent faster than Python 3. I haven't tried measuring the difference between 32-bit and 64-bit yet, but in other programming languages, 32-bit code is often faster than 64-bit because of smaller code size.
Yes, Python 3 can be faster than Python 2, but the developer really needs to work at it. I decided to do a brief rundown and discover some best practices for giving your code a need for speed.
My Setup
For this rundown, I wanted to take Visual Studio 2019 Community, install Python, and see how the language performs. Python is well integrated into the IDE, but before you go down this route, you should read Microsoft's Visual Studio Python documentation.
It takes less than one GB in total to install 32-bit and 64-bit Python 3.
The Visual Studio debugger is great with Python, as it is with other languages. With breakpoints, local variables, and the call stack all laid out in front of you, it is about as good as you’ll ever get. After installing 64-bit Python 3.7, I also installed the 32-bit version. When you switch between the two, Visual Studio creates a new virtual environment for you, making things easier than perhaps we deserve when it comes to such things.
Timing Code
Let’s get into timing code. Since Python 3.3, time.perf_counter() has returned elapsed time in fractional seconds with nanosecond-level resolution (previously, Python code used time.clock()). That's possibly more accuracy than Python really needs, but it's easy to use: just call perf_counter() twice and take the difference between the two values.
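Here's a minimal sketch of that pattern (the sleep() call is just an arbitrary stand-in for whatever code you want to measure):

from time import perf_counter, sleep

start = perf_counter()
sleep(0.25)                   # stand-in for the code you want to time
elapsed = perf_counter() - start
print(F"Elapsed: {elapsed:8.3} seconds")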
On my PC, I tested this with a simple cached Fibonacci calculator. Here's the code:
import sys
from time import perf_counter

sys.setrecursionlimit(10000)

cache = {}

def fib(n):
    # Return the cached value, computing and storing it on first request
    if n not in cache:
        cache[n] = _fib(n)
    return cache[n]

def _fib(n):
    if n < 2:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

print(F"Fib(500)={fib(500)}")

start = perf_counter()
for i in range(100):
    cache = {}               # clear the cache so every call does the full work
    fib(500)
elapsed = perf_counter() - start
print(F"Avg time for Fib(500) is {elapsed/100:10.7}")
That takes around 0.7 milliseconds on my PC (for 32-bit); the 64-bit version is faster at 0.4 milliseconds. The loop gives us an average time over 100 calls.
If you’re coding along, don't forget the sys.setrecursionlimit() call or you'll blow the limit shortly after fib(400). The 10000 value lets you go all the way to about fib(1900), which takes 2.4 milliseconds. Who says Python is slow!
Now that we can time Python code, let’s make it faster.
Tips
The Time Complexity page on the Python Wiki gives a good idea of how long certain operations take. O(1) is the fastest, while O(n log n) and O(nk) are among the slowest listed. O(n) means the time is proportional to the number of items (for example, in a list).
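To see what that means in practice, here's a rough sketch (the sizes and loop counts are arbitrary) comparing a membership test on a list, which is O(n), against the same test on a set, which is O(1) on average:

from time import perf_counter

items_list = list(range(100000))
items_set = set(items_list)

start = perf_counter()
for _ in range(1000):
    found = 99999 in items_list    # O(n): scans the whole list
print(F"list: {perf_counter()-start:8.3}")

start = perf_counter()
for _ in range(1000):
    found = 99999 in items_set     # O(1) on average: a single hash lookup
print(F"set:  {perf_counter()-start:8.3}")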
There’s also the performance tips page, but remember that it was originally written when Python 2 was popular, so some of its advice, such as using ‘xrange’ instead of ‘range’, is out of date. Python 3’s range is lazy in the manner of Python 2’s xrange, producing each value on demand rather than building a list, and there is no xrange in Python 3.
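You can see that lazy behavior for yourself with a small sketch like this (the range length is arbitrary):

import sys

r = range(1000000000)        # created instantly; no list is built
print(sys.getsizeof(r))      # a few dozen bytes, however long the range
print(999999999 in r)        # membership is computed, not searched: True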
Now let’s look a bit closer at generators.
Generators
Generators have a speed (and especially memory) advantage, as they produce just one item at a time. They're like functions (or, more precisely, they resemble coroutines in other languages). The difference between an ordinary function and a generator function is that the latter returns values through yield, and carries on from where it left off the next time it is called.
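Here's a minimal sketch of that pause-and-resume behavior (countdown() is just a toy generator made up for illustration):

def countdown(n):
    while n > 0:
        yield n              # hand back one value, then pause here
        n -= 1

gen = countdown(3)
print(next(gen))             # 3 -- runs until the first yield
print(next(gen))             # 2 -- resumes right after the yield
print(list(gen))             # [1] -- drains whatever is left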
Here’s an example I cooked up. It reads a 2GB file in about 20 seconds, line by line. In this case, no processing is done except getting the length of each line.
from time import perf_counter

def read_in_chunks(file_object, chunk_size=1024):
    # Yield the file one line (at most chunk_size characters) at a time
    while True:
        line = file_object.readline(chunk_size)
        if not line:
            break
        yield line

count = 0
start = perf_counter()
f = open(r"D:\development\largefile.csv", "rt", 8192)
for piece in read_in_chunks(f):
    count += len(piece)
f.close()
elapsed = perf_counter() - start
print(F"Len of file ={count} read in {elapsed:8.3}")
This “read rate” is about 95MB per second, which is good. Generators hold only one value at a time, unlike a list (for example), which keeps all of its values in memory. Reducing memory use can give us some speed improvement, but it’s probably not a game changer.
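As a rough illustration of that memory difference (the sizes here are arbitrary), compare a list comprehension with the equivalent generator expression:

import sys

squares_list = [n * n for n in range(100000)]    # every value held in memory
squares_gen = (n * n for n in range(100000))     # values produced on demand

print(sys.getsizeof(squares_list))   # roughly 800KB of pointers alone
print(sys.getsizeof(squares_gen))    # around a hundred bytes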
Is 'In' Faster Than Indexing for a List?
With a large list (in this case, strings), is it quicker to use ‘in’ to get the value out of the list, or index it (as in ‘list[index]’)?
Try this for yourself. The following creates a list of 100,000 strings, where each one is a random integer in the range 1 to 100,000 converted to a string. I then loop through this list, writing each value to a text file.
One method outputs the indexed value:
for i in range(len(values)):
    file1.write(values[i])
The other just gets the value from the list:
for i in values:
    file1.write(i)
Here’s the full listing:
import random
from time import perf_counter

values = []
for _ in range(100000):
    value = random.randint(1, 100000)
    values.append(str(value))

# Method 1: write out the indexed values
start = perf_counter()
file1 = open("myfile1.txt", "wt")
for i in range(len(values)):
    file1.write(values[i])
    file1.write('\n')
file1.close()
elapsed = perf_counter() - start
print(F"Done in {elapsed:8.3}")

# Method 2: iterate over the values directly
start = perf_counter()
file1 = open("myfile2.txt", "wt")
for i in values:
    file1.write(i)
    file1.write('\n')
file1.close()
elapsed = perf_counter() - start
print(F"Done in {elapsed:8.3}")
Both took approximately 0.2 seconds and generated text files of about 673KB each. More often than not, the indexed version is a bit faster (and on one occasion, three times faster), but not always. I had a suspicion that the garbage collector was perhaps running, so I disabled it (add ‘gc’ to the imports, put ‘gc.disable()’ at the start and ‘gc.enable()’ at the end), but that didn’t make a difference.
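For reference, the disabled-collector variant looked roughly like this (myfile3.txt is just an arbitrary output name):

import gc
from time import perf_counter

values = [str(i) for i in range(100000)]

gc.disable()                         # rule out collector pauses while timing
start = perf_counter()
file1 = open("myfile3.txt", "wt")
for i in values:
    file1.write(i)
    file1.write('\n')
file1.close()
elapsed = perf_counter() - start
gc.enable()                          # restore normal collection
print(F"Done in {elapsed:8.3}")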
Use Lists, Not Arrays
If you've come from another programming language (such as C), then you might be tempted to use arrays. Python's dictionaries and lists make for faster code; use them instead. Python's array type is a thin wrapper around a C array, and there’s extra overhead in converting each array element to a Python object every time you access it.
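Here's a rough sketch of that overhead (the array module is in the standard library; the element count is arbitrary), summing the same values from a list and from an array:

from array import array
from time import perf_counter

numbers_list = list(range(1000000))
numbers_array = array('l', numbers_list)    # a C array of signed longs

start = perf_counter()
total = 0
for n in numbers_list:        # items are already Python ints
    total += n
print(F"list:  {perf_counter()-start:8.3}")

start = perf_counter()
total = 0
for n in numbers_array:       # each element is boxed into a Python int
    total += n
print(F"array: {perf_counter()-start:8.3}")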
There is one exception to this "no-array" rule, and that's NumPy and its arrays, whose operations are implemented in C and very fast.
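Here's a small sketch of what that buys you, assuming NumPy is already installed (more on that in a moment):

import numpy as np
from time import perf_counter

data = np.arange(1000000, dtype=np.int64)

start = perf_counter()
total = data.sum()            # the loop runs in compiled C, not Python
elapsed = perf_counter() - start
print(F"Sum of {len(data)} ints = {total} in {elapsed:8.3}")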
The only problem with NumPy is that you need to install it. For Visual Studio, a Conda environment may be the best way to install NumPy; the Visual Studio Python documentation explains how to do it.
Use Built-in Functions
For version 3.7 (as installed by Visual Studio), Python comes with 69 built-in functions. These are implemented in C, and are therefore fast; they include abs(), len(), min(), max(), pow(), round() and so on.
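As a rough sketch of why that matters (the element count is arbitrary), compare a hand-rolled summing loop against the built-in sum():

from time import perf_counter

values = list(range(1000000))

start = perf_counter()
total = 0
for v in values:              # interpreted loop, one bytecode at a time
    total += v
print(F"manual loop:  {perf_counter()-start:8.3}")

start = perf_counter()
total = sum(values)           # built-in sum() does its looping in C
print(F"built-in sum: {perf_counter()-start:8.3}")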
Conclusion: Speedy Python
Making your code go faster is not always easy. Fortunately, if you use built-in functions, NumPy arrays where appropriate, and “Pythonic” best practices, you can squeeze some more speed out of your code!