For a simple note-taking application I wanted to try out embeddings and similarity search to find notes with similar content. In this post I will show you how to use the gonum.org/v1/gonum/blas/blas32 package to do the calculations with those embeddings.
Get your embeddings
When I started I used the OpenAI embeddings and saved them in the SQLite database together with the notes themselves. At the moment I keep all embeddings in memory: the map is filled when the application starts, and new embeddings are added when notes are created or updated.
var embeddings map[int64][]float32

func getEmbedded(content string) ([]float32, error) {
	// ... fetch the embeddings from somewhere
}
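Loading the cached embeddings at startup looks roughly like this. This is a minimal sketch with an assumed schema: a notes table with an id column and an embedding BLOB holding the float32 values little-endian. Adjust it to whatever encoding you actually store.

import (
	"database/sql"
	"encoding/binary"
	"math"
)

// loadEmbeddings fills the in-memory map from the database at startup.
// The notes table and the little-endian float32 BLOB encoding are assumptions.
func loadEmbeddings(db *sql.DB) error {
	rows, err := db.Query(`SELECT id, embedding FROM notes`)
	if err != nil {
		return err
	}
	defer rows.Close()

	embeddings = make(map[int64][]float32)
	for rows.Next() {
		var id int64
		var blob []byte
		if err := rows.Scan(&id, &blob); err != nil {
			return err
		}
		// Decode the blob as little-endian float32 values.
		emb := make([]float32, len(blob)/4)
		for i := range emb {
			emb[i] = math.Float32frombits(binary.LittleEndian.Uint32(blob[i*4:]))
		}
		embeddings[id] = emb
	}
	return rows.Err()
}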
Search for similar objects
import "gonum.org/v1/gonum/blas/blas32"
type Score struct {
ID int64
Score float32
}
func getMatchingNotes(query blas32.Vector) ([]Score, error)
var scores []Score
for id, embedding := range embeddings {
emb := blas32.Vector{N: len(embedding), Inc: 1, Data: embedding}
similarity := blas32.Dot(query, emb)
scores = append(scores, Score{ID: id, Score: similarity})
}
return scores
}
func VectorSearch(query string) ([]Score, error) {
	qfs, err := getEmbedded(query)
	if err != nil {
		return nil, err
	}
	qemb := blas32.Vector{N: len(qfs), Inc: 1, Data: qfs}
	scores, err := getMatchingNotes(qemb)
	if err != nil {
		return nil, err
	}
	// Best matches first.
	sort.Slice(scores, func(i, j int) bool {
		return scores[i].Score > scores[j].Score
	})
	return scores, nil
}
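Putting it together, a short usage sketch (the query string and the top-5 cutoff are just examples, not part of the original code; it needs the fmt and log packages):

scores, err := VectorSearch("notes about gonum and embeddings")
if err != nil {
	log.Fatal(err)
}
// scores come back sorted by descending similarity; keep the five best.
if len(scores) > 5 {
	scores = scores[:5]
}
for _, s := range scores {
	fmt.Printf("note %d scored %.3f\n", s.ID, s.Score)
}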
In the past it seemed that vector stores were special things that you needed to find similar items, but at its core it boils down to a short loop that computes the dot product between a query vector and each document vector. By separating the storage from the algorithm it becomes more apparent that you can calculate similarity between more kinds of things: instead of a text query, you can use another note as the query, or even several notes combined by averaging their embeddings.
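A sketch of that last idea: average the embeddings of a few notes into a single query vector and renormalize it so the dot product stays a cosine similarity. The averageEmbedding helper is mine, not from the original code.

// averageEmbedding builds a query vector from several note embeddings.
// It assumes all embeddings have the same length.
func averageEmbedding(ids []int64) blas32.Vector {
	var avg []float32
	for _, id := range ids {
		emb := embeddings[id]
		if avg == nil {
			avg = make([]float32, len(emb))
		}
		for i, v := range emb {
			avg[i] += v
		}
	}
	for i := range avg {
		avg[i] /= float32(len(ids))
	}
	v := blas32.Vector{N: len(avg), Inc: 1, Data: avg}
	// Renormalize so the dot product against unit-length embeddings
	// remains a cosine similarity.
	if norm := blas32.Nrm2(v); norm > 0 {
		blas32.Scal(1/norm, v)
	}
	return v
}

The resulting vector can be passed straight to getMatchingNotes, just like a query embedding.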