When developing software, you should be aware of your development loop. In most cases, you will be following this pattern:
The faster you can complete this loop, the more iterations you can fit into a workday and the more you can get done.
To speed up your development loop, consider these things:
This blog post will help with item (2) from above.
If a function is deterministic, it should be possible to "cache" the results of the function.
deterministic: Given the same inputs, the function always returns the same output, i.e. \(f(x)=y\), where each input x maps to exactly one output y.
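To make the definition concrete, here is a small sketch contrasting a deterministic function with one that is not. Both function names are made up for illustration.

```python
import random

def add(a, b):
    # Deterministic: the result depends only on the arguments.
    return a + b

def add_with_noise(a, b):
    # Not deterministic: the result also depends on hidden random state,
    # so a cached value would not match what a fresh call returns.
    return a + b + random.random()

print(add(1, 2) == add(1, 2))  # True
print(add_with_noise(1, 2), add_with_noise(1, 2))  # two different-looking values
```

Only the first kind of function is a safe candidate for caching.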
Consider the following function:
from time import sleep
def myfunction(a, b):
    sleep(5)  # simulate a function being really slow
    return a + b
# runtime takes 15 seconds
myfunction(1, 2)
myfunction(2, 3)
myfunction(1, 2)
This function has the following properties:
You can add caching to this function by storing the return value in a dictionary and returning that stored value if the same inputs are provided in a subsequent call:
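from time import sleep
cache = dict()
def myfunction(a, b):
    try:
        return cache[a, b]
    except KeyError:
        sleep(5)  # simulate a function being really slow
        result = a + b
        cache[a, b] = result  # store the result so later calls can reuse it
        return result
# runtime takes 10 seconds
myfunction(1, 2)
myfunction(2, 3)
myfunction(1, 2)  # pulled from cache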
By using a dictionary, you add another requirement: the inputs to the function must be hashable. In Python, that means they must have a .__hash__() method. Numbers, strings, and tuples of those are hashable; lists and dictionaries are not.
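The hashability requirement is easy to run into in practice. The sketch below uses a hypothetical function, total, to show what happens when an unhashable argument reaches a dictionary-backed cache, along with one possible workaround.

```python
cache = {}

def total(values):
    # `values` is used as the dictionary key, so it must be hashable.
    if values not in cache:
        cache[values] = sum(values)
    return cache[values]

print(total((1, 2, 3)))  # a tuple is hashable, so this works

try:
    total([1, 2, 3])  # a list is not hashable
except TypeError as err:
    print("cannot cache:", err)

# One workaround: convert the list to a tuple before calling.
print(total(tuple([1, 2, 3])))
```

Converting to a tuple only helps when the elements themselves are hashable.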
Python ships with caching decorators in functools (cache, added in Python 3.9, and lru_cache) that you should prefer over writing your own. They can be applied to a function without rewriting its logic.
from functools import cache
from time import sleep
@cache
def myfunction(a, b):
    sleep(5)  # simulate a function being really slow
    return a + b
# runtime takes 10 seconds
myfunction(1, 2)
myfunction(2, 3)
myfunction(1, 2)  # pulled from cache
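If you want to confirm that the cache is actually being hit, the functools decorators expose counters. This sketch uses lru_cache(maxsize=None), which behaves like cache and also works on Python versions older than 3.9. The sleep is left out so the example runs instantly.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, equivalent to functools.cache
def myfunction(a, b):
    return a + b

myfunction(1, 2)
myfunction(2, 3)
myfunction(1, 2)  # served from the cache

info = myfunction.cache_info()
print(info.hits, info.misses)  # 1 2

myfunction.cache_clear()  # empties the cache and resets the counters
```

Two distinct argument pairs produce two misses, and the repeated call produces the single hit.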
This is all well and good, but it has one key problem: It doesn't persist between runs!
Python's built-in caches don't save to disk, so I wrote my own caching function that does.
from time import sleep
from cachefunctions import cachefunction  # import my own cachefunctions
@cachefunction("cachefile.pkl")
def myfunction(a, b):
    sleep(5)  # simulate a function being really slow
    return a + b
# runtime takes 10 seconds on first run
# runtime takes 0 seconds on second run
myfunction(1, 2)  # loaded from cache on second run
myfunction(2, 3)  # loaded from cache on second run
myfunction(1, 2)  # loaded from cache on first run
This package also offers a class, FunctionCache, for creating and configuring function cache objects. I intend to add more configurable features in the future.
from time import sleep
from cachefunctions import FunctionCache
fc = FunctionCache("cachefile.pkl")
@fc.decorator
def myfunction(a, b):
    sleep(5)  # simulate a function being really slow
    return a + b
# runtime takes 10 seconds on first run
# runtime takes 0 seconds on second run
myfunction(1, 2)
myfunction(2, 3)
myfunction(1, 2)
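For readers curious how a disk-backed cache can work, here is a minimal sketch built on pickle. It is not the implementation inside cachefunctions. The name disk_cache and everything in it are assumptions made for illustration, and it handles positional, hashable arguments only.

```python
import functools
import os
import pickle
import tempfile

def disk_cache(path):
    """Hypothetical decorator: memoize results in a pickle file at `path`."""
    def decorator(func):
        # Load results saved by a previous run, if the file exists.
        if os.path.exists(path):
            with open(path, "rb") as fh:
                results = pickle.load(fh)
        else:
            results = {}

        @functools.wraps(func)
        def wrapper(*args):
            if args not in results:
                results[args] = func(*args)
                # Rewrite the whole file after each new result (simple, not fast).
                with open(path, "wb") as fh:
                    pickle.dump(results, fh)
            return results[args]
        return wrapper
    return decorator

calls = []  # records every real computation
path = os.path.join(tempfile.mkdtemp(), "cachefile.pkl")

def slow_add(a, b):
    calls.append((a, b))
    return a + b

first = disk_cache(path)(slow_add)
first(1, 2)   # computed and written to disk
first(1, 2)   # in-memory hit

second = disk_cache(path)(slow_add)  # simulates a second run of the script
second(1, 2)  # hit loaded from disk
print(len(calls))  # 1
```

Because pickle can execute arbitrary code when loading, a cache like this should only ever read files you created yourself.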
I do a lot of "data pipeline" type development, where the __main__ block contains calls to functions that complete several steps. I use this toolkit to cache the results from steps I know are running correctly, so I can more quickly test parts I have just written.
from cachefunctions import functioncache
@functioncache("retrievemeasurements.pkl")
def retrievemeasurements(*args):
    """ omitted for brevity """
@functioncache("rangemeasurements2keplerianelements.pkl")
def rangemeasurements2keplerianelements(rangemeasurements):
    """ omitted for brevity """
def calculatenearestapproach(keplerianelements, target):
    """ function actively being developed, so not cached """
if __name__ == "__main__":
    # retrievemeasurements runs instantly because the values were cached
    rangemeasurements = retrievemeasurements('a', 'b', 'c')
    # rangemeasurements2keplerianelements runs instantly because values were cached
    keplerianelements = rangemeasurements2keplerianelements(rangemeasurements)
    # This function runs slowly because I am developing it
    nearestapproach = calculatenearestapproach(keplerianelements, 'ISS')
    # I usually remove all caching decorators before delivering a product
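As an alternative to deleting decorators by hand before delivery, caching could be switched on and off from one place. The sketch below is only an illustration of that idea. The environment variable DEV_CACHE and the helper dev_cache are hypothetical names, and functools.lru_cache stands in for whichever cache you actually use.

```python
import os
from functools import lru_cache

# Hypothetical switch: set DEV_CACHE=1 in the environment while developing.
USE_CACHE = os.environ.get("DEV_CACHE") == "1"

def dev_cache(func):
    """Wrap func in a cache only when the switch is on; otherwise return it unchanged."""
    return lru_cache(maxsize=None)(func) if USE_CACHE else func

@dev_cache
def myfunction(a, b):
    return a + b

print(myfunction(1, 2))  # 3, with or without caching
```

With the switch off, the decorated function is the original function, so the delivered code behaves as if the decorator were never there.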
When you are in a development loop, think carefully about how your time is being spent. Lots of little runs through your script can add up quickly. Caching results to disk is one convenient, generally applicable way to speed up your development loop, but you have to be aware of what can and should be cached.