If your on-demand Serverless Inference endpoint receives no traffic for a while and then suddenly receives new requests, it can take some time for the endpoint to spin up the compute resources needed to process them. This is called a cold start. … A cold start can also occur if the number of concurrent requests exceeds the endpoint's current concurrent request usage, since the endpoint must then spin up additional compute resources to handle the extra requests.
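One way to observe a cold start is to time a few back-to-back requests and compare the first latency with the rest. The following is a minimal sketch, not an official measurement method: `time_invocations` and `make_fake_endpoint` are hypothetical helpers, and the fake endpoint only simulates a slow first call with `time.sleep` so the example runs without any AWS resources. The delay values are made up for illustration.

```python
import time
from typing import Callable, List


def time_invocations(invoke: Callable[[], object], count: int = 3) -> List[float]:
    """Call `invoke` `count` times in sequence and return each call's latency in seconds."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        invoke()
        latencies.append(time.perf_counter() - start)
    return latencies


def make_fake_endpoint(cold_delay: float = 0.2, warm_delay: float = 0.01) -> Callable[[], str]:
    """Hypothetical stand-in for an endpoint: the first call is slow, later calls are fast.

    The delays are illustrative assumptions, not real cold-start timings.
    """
    state = {"warm": False}

    def invoke() -> str:
        # Simulate the extra time needed to spin up compute on the first request.
        time.sleep(warm_delay if state["warm"] else cold_delay)
        state["warm"] = True
        return "ok"

    return invoke


if __name__ == "__main__":
    latencies = time_invocations(make_fake_endpoint())
    # The first entry should be noticeably larger than the others.
    print([round(seconds, 3) for seconds in latencies])
```

To try this against a real endpoint, you could pass a callable that wraps the `invoke_endpoint` call of a boto3 `sagemaker-runtime` client in place of the fake endpoint. The endpoint name and payload would be your own.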