
Prediction cache pattern

Usecase

- When the service receives repeated requests for the same data and identical inputs can be detected.
- To shorten prediction latency and offload the prediction server.

Architecture

In the prediction cache pattern, you store prediction results in a cache so that later requests for the same data can be served from it. The pattern pays off when your service receives requests for identical data and you can reliably detect that the inputs are the same.
The prediction server or proxy stores the input data as the cache key and the prediction as the value, if the key does not yet exist. Once the cache is populated, the cache lookup and the prediction run in parallel, and on a cache hit the cached value is returned without waiting for the prediction to complete. This shortens prediction latency and reduces load on the prediction server.
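This flow can be sketched in a few lines of Python. The example below is a minimal illustration, not the implementation from the sample repository: it assumes an in-process dictionary standing in for a real cache such as Redis, and a hypothetical predict() coroutine in place of actual model inference.

```python
import asyncio
import hashlib
import json
from typing import Any, Dict, Optional

# In-process stand-in for an external cache such as Redis (an assumption;
# the sample repository's actual cache backend may differ).
cache: Dict[str, str] = {}


def make_key(input_data: Any) -> str:
    # Identical inputs must serialize to the same key.
    serialized = json.dumps(input_data, sort_keys=True)
    return hashlib.sha256(serialized.encode()).hexdigest()


async def lookup_cache(key: str) -> Optional[str]:
    # Stand-in for an asynchronous cache GET.
    return cache.get(key)


async def predict(input_data: Any) -> str:
    # Hypothetical model inference; replace with a real prediction call.
    await asyncio.sleep(0.1)  # simulated inference latency
    return f"prediction for {input_data}"


async def predict_with_cache(input_data: Any) -> str:
    key = make_key(input_data)

    # Issue the cache lookup and the prediction in parallel.
    cache_task = asyncio.create_task(lookup_cache(key))
    predict_task = asyncio.create_task(predict(input_data))

    cached = await cache_task
    if cached is not None:
        # Cache hit: respond immediately, without waiting for the prediction.
        predict_task.cancel()
        return cached

    # Cache miss: wait for the prediction and store it for later requests.
    result = await predict_task
    cache[key] = result
    return result


if __name__ == "__main__":
    print(asyncio.run(predict_with_cache({"feature": 1.0})))  # miss: runs the model
    print(asyncio.run(predict_with_cache({"feature": 1.0})))  # hit: served from cache
```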
How much data to cache needs to be weighed against cost: cache space tends to cost more per unit than storage or a database, so it is recommended to define a policy for clearing the cache.
If prediction results change over time, old cache entries must be cleared to avoid responding with outdated predictions. If the service is under heavy load and the cache grows rapidly, a concrete eviction policy becomes essential. In many cases the cache is cleared based on elapsed time (a TTL) or on how frequently a key is requested.
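As one concrete example of such a policy, the sketch below implements time-based eviction in pure Python. The TTLCache class and its parameters are illustrative assumptions, not part of the sample repository; with a managed cache such as Redis, the same policy is usually expressed as a per-key expiry instead.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCache:
    """Time-based eviction: entries expire ttl_seconds after being set.

    Illustrative sketch; with Redis, the equivalent is a per-key expiry
    (e.g., SET key value EX 3600).
    """

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            # Entry is stale: evict it so outdated predictions are not served.
            del self._store[key]
            return None
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=1.0)
    cache.set("input-key", "prediction")
    print(cache.get("input-key"))  # "prediction" while fresh
    time.sleep(1.1)
    print(cache.get("input-key"))  # None after the TTL elapses
```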

Diagram

Prediction cache


Pros

- Faster responses on cache hits.
- Lower load on the prediction server.

Cons

- Additional cost for the cache server, whose space tends to be more expensive per unit than storage or a database.

Needs consideration

- A cache clearing policy (e.g., time-based or request-frequency-based) to control cost and avoid serving outdated predictions.

Sample

https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/prediction_cache_pattern