Hey,
When generating responses with structured output through the non-streaming API, the response time varies a lot: sometimes it takes about 3s, sometimes 10-20s. I am sending these requests one after another while testing the app.
Is this by design, or is there somewhere I can learn more about what contributes to such variation?
Topic: Machine Learning & AI
SubTopic: Foundation Models