Peter Stuifzand

Embeddings Similarity Search

For a simple note-taking application I wanted to try out embeddings and similarity search to find notes with similar content. In this post I will show you how to use the gonum.org/v1/gonum/blas/blas32 package to compare embeddings.

Get your embeddings

When I started, I used OpenAI embeddings and saved them in the SQLite database alongside the notes themselves. At the moment all embeddings are loaded into memory when the application starts, and the map is updated whenever a note is created or updated.

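// embeddings holds one embedding per note, keyed by note ID. It is filled
// at startup and updated when notes are created or changed. It is created
// with make because writing to a nil map panics.
var embeddings = make(map[int64][]float32)

// getEmbedded returns the embedding vector for content.
func getEmbedded(content string) ([]float32, error) {
  // ... fetch the embeddings from somewhere
}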
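The post stores the embeddings in the SQLite database next to the notes but does not say in what form. One common option, assumed here, is a BLOB column holding the raw float32 values. The hypothetical helpers encodeEmbedding and decodeEmbedding below sketch that conversion with only the standard library; little-endian byte order is also an assumption, and what matters is using the same order for writing and reading.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeEmbedding packs a float32 slice into little-endian bytes, 4 bytes
// per value, e.g. for a SQLite BLOB column (assumed storage format).
func encodeEmbedding(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

// decodeEmbedding reverses encodeEmbedding. It rejects blobs whose length
// is not a multiple of 4 bytes.
func decodeEmbedding(b []byte) ([]float32, error) {
	if len(b)%4 != 0 {
		return nil, fmt.Errorf("embedding blob has %d bytes, not a multiple of 4", len(b))
	}
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v, nil
}

func main() {
	blob := encodeEmbedding([]float32{0.25, -1.5, 3})
	back, err := decodeEmbedding(blob)
	fmt.Println(len(blob), back, err)
}
```

At startup, each decoded slice can be put into the embeddings map under its note ID.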

Search for similar objects

import (
  "sort"

  "gonum.org/v1/gonum/blas/blas32"
)

type Score struct {
  ID    int64
  Score float32
}

// getMatchingNotes scores every stored note by the dot product of its
// embedding with the query vector.
func getMatchingNotes(query blas32.Vector) ([]Score, error) {
  scores := make([]Score, 0, len(embeddings))

  for id, embedding := range embeddings {
    emb := blas32.Vector{N: len(embedding), Inc: 1, Data: embedding}
    similarity := blas32.Dot(query, emb)
    scores = append(scores, Score{ID: id, Score: similarity})
  }

  return scores, nil
}

// VectorSearch embeds the query text and returns all notes ordered from
// most to least similar.
func VectorSearch(query string) ([]Score, error) {
  qfs, err := getEmbedded(query)
  if err != nil {
    return nil, err
  }

  qemb := blas32.Vector{N: len(qfs), Inc: 1, Data: qfs}

  scores, err := getMatchingNotes(qemb)
  if err != nil {
    return nil, err
  }

  // Highest similarity first.
  sort.Slice(scores, func(i, j int) bool {
    return scores[i].Score > scores[j].Score
  })

  return scores, nil
}

It can seem that you need a special vector store to find similar items, but at its core the search is a short loop that computes the dot product between a query vector and each document vector. OpenAI embeddings are normalized to length 1, so this dot product is their cosine similarity. Separating the storage from the algorithm makes it clear that you can compute similarity between other things as well: instead of a text query, you can use another note as the query, or several notes at once by averaging their embeddings.
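Using several notes as the query can be sketched by averaging their embeddings element-wise. The helper averageEmbedding below is hypothetical and assumes every stored embedding has the same dimension. Its result can be wrapped in a blas32.Vector and passed to getMatchingNotes. The mean of unit vectors is usually shorter than length 1, but scaling the query by a positive constant multiplies every dot product by that same constant, so the ranking does not change and renormalizing is not needed for ranking.

```go
package main

import "fmt"

// averageEmbedding returns the element-wise mean of the embeddings of the
// given note IDs. It assumes all embeddings have the same dimension and
// skips IDs without an embedding; ok is false if none were found.
func averageEmbedding(embeddings map[int64][]float32, ids []int64) (avg []float32, ok bool) {
	n := 0
	for _, id := range ids {
		emb, found := embeddings[id]
		if !found {
			continue
		}
		if avg == nil {
			avg = make([]float32, len(emb))
		}
		for i, x := range emb {
			avg[i] += x
		}
		n++
	}
	if n == 0 {
		return nil, false
	}
	for i := range avg {
		avg[i] /= float32(n)
	}
	return avg, true
}

func main() {
	embeddings := map[int64][]float32{1: {1, 0, 0}, 2: {0, 1, 0}}
	q, ok := averageEmbedding(embeddings, []int64{1, 2, 99})
	fmt.Println(q, ok)
}
```

The notes used to build the query will usually score highest themselves, so you may want to filter their IDs out of the results.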

© 2023 Peter Stuifzand