- When the next step in the process does not depend on the prediction result.
- To separate the client that requests a prediction from the destination that receives the result.
The asynchronous pattern separates the prediction request from the retrieval of the result by placing a queue or cache between the client and the predictor. This lets the client continue without waiting for the inference latency. For the client to obtain the prediction, you have to add polling so that it pulls the result from the queue or cache. If you want the prediction result to be retrieved by a resource other than the client, as in Diagram2, the client can proceed to the next step without waiting for the prediction at all.
In addition, in either arrangement you can have the prediction server push the result to another component. Consider carefully whether the use case justifies this, because the system and workflow become considerably more complex.
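The request, queue, and polling flow described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a reference implementation: `predict`, `request_prediction`, and `poll_result` are hypothetical names, an in-process `queue.Queue` stands in for a real message queue, and a plain dictionary stands in for a result cache.

```python
import queue
import threading
import time
import uuid

request_queue = queue.Queue()    # FIFO queue between client and predictor
result_store = {}                # assumption: stands in for an external cache
result_lock = threading.Lock()


def predict(data):
    """Hypothetical model; a placeholder for real inference."""
    time.sleep(0.05)             # simulate inference latency
    return sum(data)


def predictor_worker():
    """Consume requests from the queue and store each prediction by job id."""
    while True:
        job = request_queue.get()
        if job is None:          # sentinel value: stop the worker
            break
        job_id, data = job
        prediction = predict(data)
        with result_lock:
            result_store[job_id] = prediction


def request_prediction(data):
    """Enqueue a request and return a job id immediately, without waiting."""
    job_id = str(uuid.uuid4())
    request_queue.put((job_id, data))
    return job_id


def poll_result(job_id, interval=0.01, timeout=2.0):
    """Poll the result store until the prediction appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with result_lock:
            if job_id in result_store:
                return result_store.pop(job_id)
        time.sleep(interval)
    return None                  # no result within the timeout


worker = threading.Thread(target=predictor_worker, daemon=True)
worker.start()

job_id = request_prediction([1.0, 2.0, 3.0])
# The client is free to do other work here instead of blocking on inference.
result = poll_result(job_id)

request_queue.put(None)          # shut the worker down
worker.join()
```

In a deployed system the queue and the result store would usually be separate services, and the component that calls `poll_result` need not be the one that called `request_prediction`.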
- You can decouple the client from the prediction.
- The client does not have to wait for the prediction latency.
- Requires a queue, a cache, or a similar kind of proxy.
- Not suitable for real-time use cases.
- How to trigger prediction:
  - Queue: requests are consumed in FIFO order.
  - Cache: prediction is triggered depending on whether an entry exists in the cache.
  - PubSub: the predictor subscribes to a topic and runs prediction on each message.
- How to handle prediction errors:
  - If you need to retry, consider either retrying inside the prediction server or returning the request to the queue.
  - If the error is caused by the data or by a programming bug, the request may keep retrying until you manually discard it.
- Since the pattern does not guarantee that predictions complete in order, you have to design the workflow accordingly if the use case needs a strict order of inputs or events.
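
One way to keep a failing request from retrying forever is to count attempts and move the request aside once a limit is reached. The sketch below assumes a hypothetical job format (a dictionary with `id`, `data`, and `attempts`) and a hypothetical `dead_letters` list for discarded requests; neither comes from the pattern itself, and the limit of three attempts is arbitrary.

```python
import queue

MAX_ATTEMPTS = 3  # assumption: an arbitrary retry limit for illustration


def process_queue(request_queue, predict, max_attempts=MAX_ATTEMPTS):
    """Drain the queue, returning failed jobs to it until the limit is hit."""
    results = {}
    dead_letters = []  # jobs that exhausted their retries, kept for inspection
    while not request_queue.empty():
        job = request_queue.get()
        try:
            results[job["id"]] = predict(job["data"])
        except Exception as exc:
            job["attempts"] += 1
            if job["attempts"] < max_attempts:
                request_queue.put(job)  # return the job to the queue for retry
            else:
                dead_letters.append((job, repr(exc)))  # stop retrying
    return results, dead_letters


def example_predict(data):
    """Hypothetical predictor that always fails on malformed input."""
    if data is None:
        raise ValueError("malformed input")
    return len(data)


q = queue.Queue()
q.put({"id": "ok", "data": [1, 2, 3], "attempts": 0})
q.put({"id": "bad", "data": None, "attempts": 0})

results, dead_letters = process_queue(q, example_predict)
```

Here the request with malformed data is attempted three times and then set aside, while the valid request is unaffected. Note that a retried job goes to the back of the queue, which is one reason completion order can differ from request order.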