Initializes a DuckDB database connection for storing embedded documents, with optional support for the experimental `vss` extension.

Arguments

db_path

Path to the DuckDB file. Use `":memory:"` to create an in-memory database.

overwrite

Logical; if `TRUE`, deletes any existing DuckDB file or table.

embedding_dim

Integer; the dimensionality of the vector embeddings to store.

load_vss

Logical; whether to load the experimental `vss` extension. This defaults to `TRUE`, but is forced to `FALSE` during CRAN checks.

Value

A live DuckDB connection object. Be sure to disconnect manually when you are done: `DBI::dbDisconnect(con, shutdown = TRUE)`
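Because the connection has to be closed by the caller, one option is to tie the disconnect to function exit. The sketch below is only an illustration: `with_duckdb()` is a hypothetical helper, not part of this package, and it opens a plain in-memory DuckDB connection through DBI as a stand-in for the connection returned by create_vectorstore().

```r
library(DBI)

# Hypothetical helper (not part of this package): run `f` against a DuckDB
# connection and always disconnect afterwards, even if `f` throws an error.
with_duckdb <- function(f, dbdir = ":memory:") {
  con <- dbConnect(duckdb::duckdb(), dbdir = dbdir)
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  f(con)
}

# Usage: the connection is closed as soon as the function returns.
n <- with_duckdb(function(con) {
  dbGetQuery(con, "SELECT 42 AS answer")$answer
})
```

The same `on.exit()` pattern should apply to any function that opens the vector store itself and wants the connection released when it finishes.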

Details

This function is part of the vector-store utilities for:

  • Embedding text via the OpenAI API

  • Storing and chunking documents in DuckDB

  • Building `HNSW` and `FTS` indexes

  • Running nearest-neighbour search over vector embeddings

Only `create_vectorstore()` is exported; helpers such as `insert_vectors()`, `build_vector_index()`, and `search_vectors()` are internal but designed to be composable.
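To make the nearest-neighbour step more concrete, here is a minimal sketch of a brute-force search over fixed-size embeddings in plain DuckDB, without the `vss` extension or an index. It assumes a DuckDB version that supports fixed-size `ARRAY` columns and the core `array_distance()` function. The table `docs_demo`, its columns, and the three-dimensional vectors are hypothetical and do not reflect the schema this package actually creates.

```r
library(DBI)

con <- dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Hypothetical table: the fixed-size array type plays the role of embedding_dim (3 here).
dbExecute(con, "CREATE TABLE docs_demo (id INTEGER, embedding FLOAT[3])")
dbExecute(con, "
  INSERT INTO docs_demo VALUES
    (1, [1.0, 0.0, 0.0]::FLOAT[3]),
    (2, [0.0, 1.0, 0.0]::FLOAT[3]),
    (3, [0.9, 0.1, 0.0]::FLOAT[3])
")

# Brute-force nearest neighbours: rank rows by Euclidean distance to the query vector.
hits <- dbGetQuery(con, "
  SELECT id, array_distance(embedding, [1.0, 0.0, 0.0]::FLOAT[3]) AS dist
  FROM docs_demo
  ORDER BY dist
  LIMIT 2
")

dbDisconnect(con, shutdown = TRUE)
```

An `HNSW` index built through the `vss` extension is meant to speed up this kind of ordered-by-distance query; the sketch skips it so that it runs without installing the extension.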

Examples

if (FALSE) { # \dontrun{
# Create vector store
con <- create_vectorstore("tests/testthat/test-data/my_vectors.duckdb", overwrite = TRUE)

# Assume `response` is the output of fetch_data()
docs <- data.frame(head(response))

# Insert documents with embeddings
insert_vectors(
  con = con,
  df = docs,
  embed_fun = embed_openai(),
  chunk_chars = 12000
)

# Build vector + FTS indexes
build_vector_index(con, type = c("vss", "fts"))

# Perform vector search (stored under a new name so `response` above is not overwritten)
results <- search_vectors(con, query_text = "Tell me about R?", top_k = 5)
} # }