
🎥 Use Streaming for Real‑Time Responses

When interacting with large language models, you can process partial outputs as they arrive by enabling streaming. This reduces perceived latency and lets you display or process tokens in real time. Pass a proc (or any object that responds to call) as the stream parameter, and the client will invoke it with each chunk as it arrives.

require "ruby/openai"

client = OpenAI::Client.new

client.chat.completions(
parameters: {
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Explain streaming in Ruby." }],
stream: proc { |chunk| print chunk.dig("choices", 0, "delta", "content") }
}
)

This will print tokens as they arrive, allowing you to build a live UI or log partial data without waiting for the full response.
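
If you also need the complete text once streaming finishes, one approach is to accumulate the deltas in a buffer inside the callback. Here is a minimal sketch, assuming the same chunk structure as above; the full_response variable and the final puts are purely illustrative:

require "openai"

client = OpenAI::Client.new

# Illustrative buffer for the assembled response.
full_response = +""

client.chat(
  parameters: {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Explain streaming in Ruby." }],
    stream: proc do |chunk, _bytesize|
      delta = chunk.dig("choices", 0, "delta", "content")
      next if delta.nil? # some chunks (e.g. the final one) carry no content

      print delta             # live output as before
      full_response << delta  # accumulate for use after the stream ends
    end
  }
)

puts "\n---\nCaptured #{full_response.length} characters."

This pattern keeps the live display while also giving you the finished text to log, persist, or post-process once the stream completes.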