Speeding up your Python Development Loop with Cache Functions

Speeding up your Python Development Loop with Cache Functions

Save time, be more productive

Development Loop

When developing software, you should be aware of your development loop. In most cases, you will be following this pattern:

  1. Add or modify some logic
  2. Run some code
  3. Identify errors and warnings
  4. Back to (1)

The faster this development loop can be completed, the more iterations you can finish in a work day, and the more you can get done. Speeding up this process increases your overall productivity and allows you to do more with your time.

Things to Consider

To speed up your development loop, consider these things:

  1. Is the code you are running telling you what you need to know? For example, it might be worth building unittests and running those instead of repeatedly running the "main" block of your script.
  2. Are you spending a lot of time waiting for the code to run? Can you speed that up or take a shortcut?
  3. Are you really reading the errors and warnings in a smart way?

This blog post will help with item (2) from above.

A primer on caches

If a function is deterministic, it should be possible to "cache" the results of the function.

deterministic: Given a consistent set of inputs, the function returns consistent outputs. i.e. \(f(x)=y\) where x->y is a 1:1 mapping.

How to Cache in Python

Consider the following function:

from time import sleep
def myfunction(a, b):
    sleep(5) #simulate a function being really slow
    return a+b

# runtime takes 15 seconds
myfunction(1,2)
myfunction(2,3)
myfunction(1,2)

This function has the following properties:

  1. It is deterministic
  2. The function is slow.

You can add caching to this function by storing the return value in a dictionary and returning that value if the same inputs are provided in a subsequent call.:

from time import sleep
cache = dict()
def myfunction(a, b):
    try:
        return cache[a, b]
    except KeyError:
        sleep(5)
        return a+b

# runtime takes 10 seconds
myfunction(1,2)
myfunction(2,3)
myfunction(1,2) #pulled from cache

By using a dictionary, you add another requirement: The inputs to the function must be hashable. In Python, that means they have to have a .__hash__() method.

Python's Standard Caches

Python ships with some caches in functools that you should use. These decorators can be easily applied to functions without re-writing their logic.

from functools import cache

@cache
def myfunction(a,b):
    return a+b

# runtime takes 10 seconds
myfunction(1,2)
myfunction(2,3)
myfunction(1,2) #pulled from cache

Caching to disk

This is all well and good, but it has one key problem: It doesn't persist between runs!

Python's built in caches don't save to disk, so I wrote my own caching function that does.

from cachefunctions import cachefunction #import my own cachefunctions

@cachefunction("cachefile.pkl")
def myfunction(a,b):
    return a+b

# runtime takes 10 seconds on first run
# runtime takes 0 seconds on second run
myfunction(1,2) #loaded from cache on second run
myfunction(2,3) #loaded from cache on second run
myfunction(1,2) #loaded from cache on first run

This package also offers a class object for creating and configuring function cache objects. I intend to add more features that can be configured in the future.

from cachefunctions import FunctionCache

fc = FunctionCache("cachefile.pkl")
@fc.decorator
def myfunction(a,b):
    return a+b

# runtime takes 10 seconds on first run
# runtime takes 0 seconds on second run
myfunction(1,2)
myfunction(2,3)
myfunction(1,2)

Use in rapid development

I do a lot of "data pipeline" type development, where the __main__ block contains calls to functions that complete several steps. I use this toolkit to cache the results from steps I know are running correctly, so I can more quickly test parts I have just written.

from cachefunctions import functioncache 

@functioncache("retrievemeasurements.pkl")
def retrievemeasurements()
    """ omitted for brevity """

@functioncache("rangemeasurements2keplerianelements.pkl")
def rangemeasurements2keplerianelements()
    """ omitted for brevity """

def calculatenearestapproach()
    """ function actively being developed, so not cached """

if __name__ == "__main__":
    # retrievemeasurements runs instantly because the values were cached
    rangemeasurements = retrievemeasurements('a','b','c')
    # rangemeasurements2keplerianelements runs instantly because values were cached
    keplerianelements = rangemeasurements2keplerianelements(rangemeasurements)
    # This function runs slowly because I am developing it
    nearestapproach = calculatenearestapproach(keplerianelements, 'ISS')

    # I usually remove all caching decorators before delivering a product

Conclusion

When you are in a development loop, think carefully about how your time is being spent. Lots of little runs through your script can add up quickly. Cacheing results to disk is one convenient, generally applicable way to speed up your development loop, but you have to be aware of what can and should be cached.