Peter Stuifzand

Streaming AI with htmx

Below is a simplified chat bubble for a conversation with an LLM.

This template is written in Templ and uses htmx to fill the chat bubble with incoming content. The content is delivered with Server-Sent Events (SSE), using v2 of the r3labs/sse server. This version supports SplitData, which makes it easier to send multi-line HTML to htmx. Wrapping every token in a <span></span> keeps the spaces in the tokens from getting lost.

templ StreamingChatBubble(url string, message string) {
    <div>
        <span>user</span>
        <div>{ message }</div>
    </div>
    <div
        hx-ext="sse"
        sse-connect={ url }
        sse-swap="done"
        hx-swap="outerHTML"
    >
        <span>assistant</span>
        <div
            sse-swap="message"
            hx-swap="beforeend"
        ></div>
    </div>
}

templ StreamingChatComplete(message string) {
    <div class="...">
        <span>assistant</span>
        <div>{ message }</div>
    </div>
}

The url argument is the SSE endpoint with a stream parameter. The Go web server uses that parameter to find the right channel for updates, and everything published to the channel is sent to the frontend. Each message has an event type. The partial LLM responses have the type message. When all partial responses have been sent, a done message follows with the complete chat bubble. Because that done event replaces the element holding the sse-connect attribute (hx-swap="outerHTML"), it also closes the SSE connection for this bubble.

event: message
data: <span> token1</span>

event: message
data: <span>token2</span>

event: message
data: <span> token3</span>

event: done
data: <div>
data:   <span>assistant</span>
data:   <div>token1token2 token3</div>
data: </div>
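
For the hx-ext="sse", sse-connect and sse-swap attributes to work, the page also has to load htmx together with its SSE extension. A minimal sketch, assuming htmx 1.x served from unpkg (the exact version and CDN are assumptions; the post does not show how the scripts are included):

<script src="https://unpkg.com/htmx.org@1.9.12"></script>
<script src="https://unpkg.com/htmx.org@1.9.12/dist/ext/sse.js"></script>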

The SSE server (simplified)

To use the SSE server, you need an endpoint for the server itself (called /events here). Next you need a way to start a new stream. In a chat application you have a textarea in a form that POSTs to the /streaming endpoint.

<form hx-post="/streaming" hx-trigger="submit" hx-target="#chat-messages" hx-swap="beforeend">
    <textarea name="prompt"></textarea>
    <button type="submit">Chat</button>
</form>
<div id="chat-messages"></div>

import (
    "bytes"
    "context"
    "fmt"
    "math/rand"
    "net/http"
    "strings"

    sse "github.com/r3labs/sse/v2"
)

// component contains the generated Templ components shown above;
// llm is the application's own LLM client package.

func main() {
    sseServer := sse.New()
    sseServer.SplitData = true

    http.Handle("/events", sseServer)

    http.HandleFunc("/streaming", func(w http.ResponseWriter, r *http.Request) {
        streamID := fmt.Sprintf("stream-%d", rand.Int63())
        sseServer.CreateStream(streamID)
        prompt := r.FormValue("prompt")
        go func() {
            defer sseServer.RemoveStream(streamID)
            // call LLM and get tokens
            res, err := llm.CreateCompletion(...)

            // publish every token, wrapped in a <span>
            contentBuilder := strings.Builder{}
            for ... {
                sseServer.Publish(streamID, &sse.Event{
                    Event: []byte("message"),
                    Data:  []byte("<span>" + token + "</span>"),
                })

                contentBuilder.WriteString(token)
            }

            // render the complete chat bubble into a buffer; use
            // context.Background() because the request context is canceled
            // once the handler has returned
            output := bytes.Buffer{}
            component.StreamingChatComplete(contentBuilder.String()).Render(context.Background(), &output)

            sseServer.Publish(streamID, &sse.Event{
                Event: []byte("done"),
                Data:  output.Bytes(),
            })
        }()

        // respond right away with the streaming bubble that connects to /events
        component.StreamingChatBubble(fmt.Sprintf("/events?stream=%s", streamID), prompt).Render(r.Context(), w)
    })
    http.ListenAndServe(":8080", nil)
}
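
The LLM call and the token loop are elided above. As one possible concretization (an assumption; the post does not name the LLM client), here is a hypothetical helper using the github.com/sashabaranov/go-openai streaming API that could do the work of the goroutine's loop. The names client, sseServer, streamID and prompt correspond to the variables in the handler.

import (
    "context"
    "errors"
    "io"
    "strings"

    sse "github.com/r3labs/sse/v2"
    openai "github.com/sashabaranov/go-openai"
)

// streamCompletion is a hypothetical sketch: it streams tokens from the LLM,
// publishes each one as a "message" event and returns the full response text.
func streamCompletion(client *openai.Client, sseServer *sse.Server, streamID, prompt string) (string, error) {
    stream, err := client.CreateChatCompletionStream(context.Background(), openai.ChatCompletionRequest{
        Model:  openai.GPT3Dot5Turbo,
        Stream: true,
        Messages: []openai.ChatCompletionMessage{
            {Role: openai.ChatMessageRoleUser, Content: prompt},
        },
    })
    if err != nil {
        return "", err
    }
    defer stream.Close()

    contentBuilder := strings.Builder{}
    for {
        resp, err := stream.Recv()
        if errors.Is(err, io.EOF) {
            break
        }
        if err != nil {
            return contentBuilder.String(), err
        }
        if len(resp.Choices) == 0 {
            continue
        }
        token := resp.Choices[0].Delta.Content

        // wrap each token in a <span> so the spaces survive the swap
        sseServer.Publish(streamID, &sse.Event{
            Event: []byte("message"),
            Data:  []byte("<span>" + token + "</span>"),
        })
        contentBuilder.WriteString(token)
    }
    return contentBuilder.String(), nil
}

The goroutine would then render StreamingChatComplete with the returned string and publish it as the done event, exactly as in the handler above.
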
© 2023 Peter Stuifzand