
Asynchronous pattern

Usecase

Architecture

The asynchronous pattern separates the prediction request from the retrieval of the prediction by placing a queue or cache between the client and the predictor. This lets the client continue without waiting for the inference latency. For the client to get the prediction, you have to add polling so that it pulls the result from the queue or cache. If the prediction result is retrieved by a resource other than the client, as in Diagram2, the client can proceed to its next step without waiting for the prediction at all.
In addition, in both Diagram1 and Diagram2, you can have the prediction server push the result to another component. Consider carefully whether your use case needs this, because it makes the system and its workflow considerably more complex.
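The flow in Diagram1 can be sketched in a few lines of Python. This is a minimal, single-process illustration and not the implementation in the linked sample: the in-memory `queue.Queue` and dictionary stand in for a real message queue and cache, and `predict`, `submit`, `worker`, and `poll` are hypothetical names chosen for this example.

```python
import queue
import threading
import time
import uuid

# Assumption: in-memory stand-ins for an external queue and cache.
request_queue = queue.Queue()
result_cache = {}
cache_lock = threading.Lock()


def predict(data):
    """Hypothetical model: sleeps to simulate inference latency."""
    time.sleep(0.1)
    return sum(data)


def submit(data):
    """Client side: enqueue a request and return a job ID immediately."""
    job_id = str(uuid.uuid4())
    request_queue.put((job_id, data))
    return job_id


def worker():
    """Predictor side: consume requests and write results to the cache."""
    while True:
        job_id, data = request_queue.get()
        if job_id is None:  # sentinel (None, None) stops the worker
            break
        result = predict(data)
        with cache_lock:
            result_cache[job_id] = result


def poll(job_id, interval=0.05, timeout=5.0):
    """Client side: poll the cache until the result for job_id appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with cache_lock:
            if job_id in result_cache:
                return result_cache.pop(job_id)
        time.sleep(interval)
    raise TimeoutError(f"no result for job {job_id} within {timeout}s")


if __name__ == "__main__":
    threading.Thread(target=worker, daemon=True).start()
    job_id = submit([1, 2, 3])   # returns without waiting for inference
    print("submitted", job_id)
    print("prediction:", poll(job_id))
```

Because `submit` returns as soon as the request is enqueued, the client is free to do other work before it calls `poll`. For the Diagram2 variant, a different component would call `poll` (or read the cache directly) with the job ID, and the original client would never wait on the result.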

Diagram

Diagram1

diagram1

Diagram2

diagram2

Pros

Cons

Needs consideration

Sample

https://github.com/shibuiwilliam/ml-system-in-actions/tree/main/chapter4_serving_patterns/asynchronous_pattern