Property Cache
Basic Property
In some cases, an object has properties that don’t need to be computed until necessary, and once computed are generally static and could just be cached. This could be accomplished using the following simple recipe:
class Obj:
    def __init__(self):
        self._attribute = None
        ...

    @property
    def attribute(self):
        if self._attribute is None:
            self._attribute = some_slow_computation(self)
        return self._attribute
If you want to support re-computation (besides just setting the cached attribute to None again), it’s not hard to add:
class Obj:
    def __init__(self):
        self._attribute = None
        self._need_attribute = True
        ...

    @property
    def attribute(self):
        if self._need_attribute:
            self._attribute = some_slow_computation(self)
            self._need_attribute = False
        return self._attribute

...
attr1 = my_obj.attribute
my_obj._need_attribute = True
attr2 = my_obj.attribute  # Re-computes attribute
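For comparison, the standard library (Python 3.8+) ships functools.cached_property, which covers the same compute-once pattern; deleting the attribute forces re-computation on the next access. The sketch below is only an illustration: the value ** 2 body is a stand-in for some_slow_computation, and the calls counter is a hypothetical addition that exists purely to make the caching visible.

```python
from functools import cached_property


class Obj:
    def __init__(self, value):
        self.value = value
        self.calls = 0  # illustration only: counts how often the slow path runs

    @cached_property
    def attribute(self):
        self.calls += 1
        return self.value ** 2  # stand-in for some_slow_computation(self)


my_obj = Obj(4)
attr1 = my_obj.attribute  # computed on first access
attr2 = my_obj.attribute  # served from the instance's __dict__
del my_obj.attribute      # drop the cached value
attr3 = my_obj.attribute  # computed again
```

Note that functools.cached_property has no notion of dependencies between properties, which is the gap the rest of this page addresses.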
Adding inter-dependence between such properties is not hard, but quickly becomes verbose. In fact, all of this code is verbose relative to the simple goal: for some property x, define its value, but don’t actually compute it until necessary, and allow the code to make it “necessary” again. This is easy to describe and easy to reason about, but convoluted to code (though, fortunately, easy to template).
To simplify this process, miniutils provides a CachedProperty decorator that’s simple by default, and moderately powerful when necessary. Let’s take a look at a simple use case first, then we’ll examine its capabilities:
class Obj:
    @CachedProperty()
    def attribute(self):
        return some_slow_computation(self)
That’s all you need. No need to initialize, set up flags, or anything. It’s all handled automatically. A use case like above might look like:
attr1 = my_object.attribute # Computed the first time
attr2 = my_object.attribute # Loaded from cache
assert attr1 is attr2
del my_object.attribute # Deletes the cached object and marks for re-computation
attr3 = my_object.attribute # Re-computes the value
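To make the behaviour above concrete, here is a minimal sketch of how a decorator of this shape could be built on top of the built-in property function. It is not the miniutils implementation: cached_property_sketch, the "_cached_" storage prefix, and the calls counter are all hypothetical names chosen for illustration.

```python
def cached_property_sketch():
    """Hypothetical decorator factory; NOT the miniutils implementation."""
    def decorator(fn):
        slot = "_cached_" + fn.__name__  # assumed per-instance storage name

        def getter(self):
            if slot not in self.__dict__:
                self.__dict__[slot] = fn(self)  # compute lazily, only once
            return self.__dict__[slot]

        def deleter(self):
            self.__dict__.pop(slot, None)  # mark for re-computation

        # A plain property is returned, so no custom descriptor type is needed.
        return property(getter, fdel=deleter, doc=fn.__doc__)
    return decorator


class Obj:
    calls = 0  # illustration only: counts runs of the slow path

    @cached_property_sketch()
    def attribute(self):
        Obj.calls += 1
        return [1, 2, 3]


my_object = Obj()
attr1 = my_object.attribute  # computed the first time
attr2 = my_object.attribute  # loaded from cache
del my_object.attribute      # deletes the cached object
attr3 = my_object.attribute  # re-computes the value
```

Because the cache is only a per-instance slot, the returned value never needs to be hashable.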
Despite being simple to use, it’s still a fairly powerful decorator:
- Like @property, this method is converted to a property (in fact, the property function is used under the hood, so you don’t have any CachedProperty objects floating around)
- The result is lazy-computed, just like you’d expect from a property
- The result is cached and returned instantly if not marked for re-computation (note that the object doesn’t have to be hashable since there’s no lookup being performed)
- Its computation can affect the computation of other properties, and thus automatically mark those properties for re-computation when needed (i.e., it maintains a dependency chain amongst CachedProperties)
- A simple setter can be automatically defined which invalidates downstream properties without needing more code (note that, at this time, you can’t safely define a custom setter; you can either use the default or leave the property unsettable)
- If the property returns a basic iterable (list, dictionary, set), it’s wrapped so that modifications to its content (if permitted) invalidate downstream properties.
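The last point, wrapping a returned collection so that in-place changes trigger invalidation, can be pictured with a small sketch. NotifyingList and its on_change callback are hypothetical names, not part of miniutils, and only two mutating methods are covered here; a complete wrapper would need to intercept every method that changes the contents.

```python
class NotifyingList(list):
    """Hypothetical list wrapper that reports in-place changes via a callback."""

    def __init__(self, iterable, on_change):
        super().__init__(iterable)
        self._on_change = on_change

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._on_change()  # e.g. invalidate downstream cached properties

    def append(self, value):
        super().append(value)
        self._on_change()


invalidated = []  # records which downstream property would be invalidated
b = NotifyingList([5] * 3, on_change=lambda: invalidated.append("c"))
b[0] = 0     # item assignment notifies
b.append(7)  # append notifies as well
```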
A key feature not yet demonstrated is the ability to add dependencies amongst properties. Essentially, this defines a directed graph where resetting, re-computing, or altering upstream properties marks all dependent downstream properties for re-computation. This can be seen in the following demonstration:
class Printer:
    @CachedProperty('b', settable=True)
    def a(self):
        print("Running a")
        return 5

    @CachedProperty('c', is_collection=True)
    def b(self):
        print("Running b")
        return [self.a] * 100

    @CachedProperty('d')
    def c(self):
        print("Running c")
        return sum(self.b)

    @CachedProperty()
    def d(self):
        print("Running d")
        return str(self.c ** 2)

p = Printer()
p.a # Computes A
p.c # Computes C, during which it computes B
p.a = 3 # Sets A, invalidating B and C (and D, if it weren't already invalid)
p.c # Computes C, and thus B, again
p.c # Returns the cached value for C
p.b[0] = 0 # Alters a value within B (not B itself), which correctly invalidates C
p.c # Computes C, using cached B
del p.a # Invalidates A, and therefore B and C
p.d # Computes D, and thus C, B, and A
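The invalidation cascade in this demonstration can be approximated in a few lines of plain Python. The following is a sketch under stated assumptions, not the miniutils code: the cached decorator, the "_cache" dictionary stored on the instance, and the log list are hypothetical, and the sketch does not guard against cyclic dependencies.

```python
def cached(*affects):
    """Hypothetical dependency-aware cache; NOT the miniutils implementation."""
    def decorator(fn):
        name = fn.__name__

        def invalidate_downstream(obj):
            for other in affects:
                # Deleting a downstream property clears it and cascades further.
                delattr(obj, other)

        def getter(self):
            cache = self.__dict__.setdefault("_cache", {})
            if name not in cache:
                cache[name] = fn(self)
            return cache[name]

        def setter(self, value):
            self.__dict__.setdefault("_cache", {})[name] = value
            invalidate_downstream(self)

        def deleter(self):
            self.__dict__.setdefault("_cache", {}).pop(name, None)
            invalidate_downstream(self)

        return property(getter, setter, deleter)
    return decorator


class Chain:
    def __init__(self):
        self.log = []  # illustration only: records which bodies actually ran

    @cached("b")
    def a(self):
        self.log.append("a")
        return 5

    @cached("c")
    def b(self):
        self.log.append("b")
        return [self.a] * 100

    @cached()
    def c(self):
        self.log.append("c")
        return sum(self.b)


chain = Chain()
first = chain.c   # runs c, then b, then a
chain.a = 3       # assigns a and invalidates b and c
second = chain.c  # runs c and b again; a comes from the assigned value
```

Each property only names its direct dependents; the cascade falls out of each deleter invalidating the next link in turn.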
This isn’t the complete feature set of the decorator, but it’s a good initial taste of what can be accomplished using it.
class miniutils.caching.CachedProperty(*affects, settable=False, threadsafe=True, is_collection=False, allow_collection_mutation=True)
Marks this property to be cached. Delete this property to remove the cached value and force it to be rerun.
Parameters:
- affects – Strings that list the names of the other properties in this class that are directly invalidated when this property’s value is altered
- settable – Whether or not to allow this property to have values assigned directly to it
- threadsafe – Whether or not to restrict execution of this property’s code to a single thread at a time (safe for recursive calls)
- is_collection – Whether or not this property returns a collection (currently supports lists, sets, and dictionaries; others might not work exactly as expected)
- allow_collection_mutation – Whether or not the returned collection should allow its values to be altered
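The phrase "safe for recursive calls" in the threadsafe description suggests a re-entrant lock. As a rough illustration, and assuming (without confirmation from the source) that something like threading.RLock is what makes this work, the sketch below guards two hand-written lazy properties with one re-entrant lock so that one property can read the other while the lock is already held. The class, its _cache dictionary, and the calls counter are hypothetical.

```python
import threading


class Obj:
    def __init__(self):
        self._lock = threading.RLock()  # re-entrant: the same thread may re-acquire it
        self._cache = {}
        self.calls = 0  # illustration only: counts how many bodies ran

    @property
    def inner(self):
        with self._lock:
            if "inner" not in self._cache:
                self.calls += 1
                self._cache["inner"] = 2
            return self._cache["inner"]

    @property
    def outer(self):
        with self._lock:  # still held when self.inner acquires the lock again
            if "outer" not in self._cache:
                self.calls += 1
                self._cache["outer"] = self.inner * 21
            return self._cache["outer"]


obj = Obj()
threads = [threading.Thread(target=lambda: obj.outer) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

With a non-re-entrant threading.Lock, the nested access to inner from outer would deadlock on the same thread.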
Indexed Property
Even with the above tools, there is no concise way to index into a property whose values are lazily computed.
The LazyDictionary decorator allows you to write a __getitem__ style property that can be used like a dictionary and has its results cached:
import math

from miniutils.caching import LazyDictionary


class Primes:
    @LazyDictionary()
    def is_prime(self, i):
        if not isinstance(i, int) or i < 1:
            raise ValueError("Can only check if a positive integer is prime")
        elif i in [1, 2]:
            return True
        elif i % 2 == 0:
            return False
        else:
            # Only odd candidates already known (or lazily found) to be prime are tested
            return all(i % p != 0 for p in range(3, int(math.sqrt(i)) + 1, 2) if self.is_prime[p])

p = Primes()
p.is_prime[5]    # True; there are no odd candidate divisors up to sqrt(5), so nothing else is looked up
p.is_prime[500]  # False; even numbers are rejected before any cached primes are consulted
p.is_prime[501]  # False; 501 = 3 * 167, so only is_prime[3] is computed (and cached) along the way
The indexing notation is used and preferred to make clear that this decorator only aims to support one hashable argument, and is meant to behave like a dictionary or list. It is not iterable, since the result of that would depend on whatever prior code happened to be executed. Instead, you should iterate through all desired keys, and simply index them; that way, any that need to be re-computed are, and those that can are loaded from cache.
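The "iterate over the keys you need and index each one" pattern can be tried without miniutils by using a dict subclass with __missing__, which Python calls for keys that are not yet stored. LazyDict and _is_prime below are hypothetical stand-ins for illustration, not the library's classes, and the primality test deliberately treats 1 as non-prime.

```python
import math


class LazyDict(dict):
    """Hypothetical stand-in: computes missing keys on demand and stores them."""

    def __init__(self, compute):
        super().__init__()
        self._compute = compute

    def __missing__(self, key):
        value = self._compute(key)
        self[key] = value  # cache so the next lookup is a plain dict hit
        return value


def _is_prime(i):
    if i < 2:
        return False
    if i < 4:
        return True
    if i % 2 == 0:
        return False
    # Reuse cached answers for the odd candidate divisors.
    return all(i % p != 0 for p in range(3, math.isqrt(i) + 1, 2) if is_prime[p])


is_prime = LazyDict(_is_prime)

wanted = [2, 9, 15, 97]
results = {key: is_prime[key] for key in wanted}  # index only the keys you need
cached_keys = sorted(is_prime)                    # whatever happened to be computed
```

The contents of cached_keys depend on which lookups ran earlier, which is exactly why iterating over the cache itself is not a meaningful operation.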
This plugs cleanly into CachedProperty, accepting a list of properties whose values are invalidated when this dictionary is modified. It also supports allowing or disallowing explicit assignment to certain indices:
# Assumes is_prime was declared with @LazyDictionary(allow_collection_mutation=True)
p = Primes()
p.is_prime[3] = False
p.is_prime[9]  # This is now True, since 3 is no longer recorded as prime
This is meant to add a little on top of a plain cached dictionary, though honestly it’s probably a very small improvement over self.is_prime = defaultdict(self._is_prime): what it adds is invalidation of cached properties and values that depend on their indices.
Values can be explicitly assigned to indices (if allow_collection_mutation=True); assigned values override cached values. Raised KeyErrors are cached to prevent re-running indices where failure is known. If an error is not due solely to the index, raise some other error to allow that index to be retried later if some variation to the program’s state might allow it to succeed. .get(key, default) and .update(dict) are also provided to offer a more dictionary-like interface. A particular object instance will have a miniutils.caching._LazyDictionary instance which provides its caching, though the decorated function is once again replaced with a simple @property.
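The KeyError behaviour described above can be sketched as follows. ErrorCachingDict, lookup, and the calls counter are hypothetical names used only for illustration; this is not the miniutils._LazyDictionary code, merely one way the described semantics could be realised.

```python
class ErrorCachingDict:
    """Hypothetical sketch: remembers KeyErrors so failing keys are not re-run."""

    def __init__(self, compute):
        self._compute = compute
        self._values = {}
        self._failed = set()
        self.calls = 0  # illustration only: counts runs of the compute function

    def __getitem__(self, key):
        if key in self._failed:
            raise KeyError(key)  # known failure, do not recompute
        if key not in self._values:
            self.calls += 1
            try:
                self._values[key] = self._compute(key)
            except KeyError:
                self._failed.add(key)  # other exception types are not remembered
                raise
        return self._values[key]

    def __setitem__(self, key, value):
        self._failed.discard(key)  # an assigned value overrides a cached failure
        self._values[key] = value

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


def lookup(key):
    if key < 0:
        raise KeyError(key)
    return key * 10


d = ErrorCachingDict(lookup)
a = d.get(-1, "missing")  # runs lookup once, which fails
b = d.get(-1, "missing")  # answered from the remembered failure
d[-1] = 0                 # explicit assignment overrides the failure
c = d[-1]
```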
class miniutils.caching.LazyDictionary(*affects, allow_collection_mutation=False)
Marks this indexable property to be a cached dictionary. Delete this property to remove the cached value and force it to be rerun.
Parameters:
- affects – Strings that list the names of the other properties in this class that are directly invalidated when this property’s value is altered
- allow_collection_mutation – Whether or not the returned collection should allow its values to be altered
File-backed Function Cache
As a file-based alternative to simple function caching (such as that provided by functools.lru_cache), miniutils.caching.FileCached provides caching of a function’s results using shelve as its storage backend. This is primarily intended for long-running file processing scripts, and as such it natively supports invalidating cache entries if relied-upon files have been modified after the cache entry was created.
There are several ways to use this cache. The simplest is to use it as a decorator, leveraging miniutils.caching.file_cached_decorator(). The following example stores the results of load_data in a cache at ./preprocessed, which gets automatically invalidated when /path/to/data.csv gets modified:
@file_cached_decorator('./preprocessed', files_used=['/path/to/data.csv'])
def load_data():
    df = pandas.read_csv('/path/to/data.csv')
    # Modify, clean, process data
    return df
This could also be accomplished on a function not defined in the user code, using miniutils.caching.FileCached directly:
data = FileCached(load_data, './preprocessed', files_used=['/path/to/data.csv'])
By offloading the generation of the cache to the caller code, it’s also possible to dynamically provide the list of files being used when they are arguments to the function:
def load_data(path):
    df = pandas.read_csv(path)
    # ...

data = FileCached(load_data, './preprocessed', files_used=[data_path])(data_path)
This use of miniutils.caching.FileCached is how it is meant to be used when attempting to store function results across multiple runs of a script. Each time the script is run, it will connect to the same persistent on-disk cache, update if function arguments or relied-upon files change, and synchronize any new function results back to disk before the program exits.
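The general idea of a persistent cache that is keyed on arguments and invalidated by file modification can be sketched with the standard library alone. The following is a simplified illustration, not miniutils.caching.FileCached itself: file_cached_sketch, the repr-based key, the stored "stamps" list, and count_lines are all assumptions made for this example.

```python
import os
import shelve
import tempfile


def file_cached_sketch(fn, cache_path, files_used=()):
    """Hypothetical shelve-backed cache; NOT miniutils.caching.FileCached."""
    def wrapper(*args):
        key = repr(args)  # shelve keys must be strings
        stamps = [os.path.getmtime(path) for path in files_used]
        with shelve.open(cache_path) as db:
            entry = db.get(key)
            if entry is not None and entry["stamps"] == stamps:
                return entry["result"]  # arguments and files unchanged
            result = fn(*args)
            db[key] = {"stamps": stamps, "result": result}
            return result
    return wrapper


calls = []  # illustration only: records each real invocation


def count_lines(path):
    calls.append(path)
    with open(path) as handle:
        return sum(1 for _ in handle)


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "data.csv")
with open(data_path, "w") as handle:
    handle.write("a,b\n1,2\n")

cached = file_cached_sketch(count_lines, os.path.join(workdir, "cache"), [data_path])
first = cached(data_path)   # computed and written to disk
second = cached(data_path)  # read back from the shelf

with open(data_path, "a") as handle:
    handle.write("3,4\n")
# Push the modification time forward so the change is visible on any filesystem.
os.utime(data_path, (0, os.path.getmtime(data_path) + 10))
third = cached(data_path)   # file changed, so the entry is recomputed
```

Opening the shelf on every call keeps the sketch short; a longer-lived cache object would more likely keep it open and synchronize on exit, as described above.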
If no explicit cache path is given, miniutils.caching.FileCached and its decorator form generate one from the function’s name. Avoid relying on this default if you wish to reuse the cache between runs of Python, since any change to the function’s name will invalidate the cache; it also breaks if you wish to cache multiple functions with the same name.
Warning
Note that shelve, and therefore miniutils.caching.FileCached, is not thread-safe or multiprocess-safe, so this cache will likely fail if being used in any parallel fashion. To use a data store in a parallel fashion, you should probably rely on a robust database system of some sort, such as MongoDB.
Warning
When purging a file cache, miniutils.caching.FileCached deletes all files matching its database’s filepath. Make sure that the file path given for the cache has no relation to any other code or data files used by your program.
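To see why this matters, the sketch below lists which files a prefix match on a cache path would catch. It assumes, purely for illustration, that "matching" behaves like a glob on the cache path followed by a wildcard; the exact matching rule used by the library is not stated here. The file names are hypothetical, and nothing is deleted.

```python
import glob
import os
import tempfile

workdir = tempfile.mkdtemp()
# Two files a dbm backend might create, plus an unrelated file with a similar name.
for name in ("preprocessed.dat", "preprocessed.dir", "preprocessed_notes.txt"):
    open(os.path.join(workdir, name), "w").close()

risky_prefix = os.path.join(workdir, "preprocessed")
matched = sorted(
    os.path.basename(path) for path in glob.glob(glob.escape(risky_prefix) + "*")
)

# Safer: keep the cache inside a directory that holds nothing else.
cache_dir = os.path.join(workdir, "cache")
os.makedirs(cache_dir)
safe_prefix = os.path.join(cache_dir, "preprocessed")
safe_matches = glob.glob(glob.escape(safe_prefix) + "*")
```

Under that assumption, the unrelated notes file is caught by the first prefix, while the dedicated directory has nothing else that could match.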
class miniutils.caching.FileCached(fn, cache_path=None, files_used=None, auto_purge=False)
Caches function results to a file to save re-computation of highly expensive calls.
Parameters:
- fn (function) – The function whose result should be cached
- cache_path (str) – No-extension file path where cache should be kept
- files_used (Iterable) – List of files that could affect the result of this function; cache results are invalidated if any of these files are updated since the last function call
- auto_purge (bool) – If True, deletes the file cache when this cache object passes out of scope
miniutils.caching.file_cached_decorator(*args, **kwargs)
A decorator version of FileCached.
Parameters:
- cache_path (str) – No-extension file path where cache should be kept
- files_used (Iterable) – List of files that could affect the result of this function; cache results are invalidated if any of these files are updated since the last function call
- auto_purge (bool) – If True, deletes the file cache when this cache object passes out of scope
Returns: A decorator for a function
Return type: function