Online bigsize pattern

Case

Situation

In any service, including web and batch, there is a requirement or expectation for time of completion for one task. For instance in a web service, the requirement might a mean time to respond for each request. For a batch system, completing processing of a hundred million records in 6 hours, midnight, would be a requirement. In any business and system, time is always limited and expensive that you cannot take everlasting time for one task. A standard and useful unit for cumulating time taken for task is average seconds taken for a prediction in machine learning system, and you have to design a model and prediction system to meet the requirement. Especially in a case of using a complex and gigantic deep learning model that may be time-consuming, you have to optimize somehow, or it will be impossible in web realtime services. You may try scale up or out to balance amount of process per server, though this can only be applied if unit inference responding time meets requirement. You may also try using an inference GPU, which may be costful than CPU. It is also a challenge, and might be the best choice, to make a simple model to meet the performance and latency.
There is no best answer for taking a balance of model evaluation and speed tradeoff. It is important to define service level of the evaluation and speed according to business and system needs.

Diagram

diagram

Pros

Cons

Work around