Initializes a DuckDB database connection for storing embedded documents, with optional support for the experimental `vss` extension.
Arguments
- db_path
Path to the DuckDB file. Use `":memory:"` to create an in-memory database.
- overwrite
Logical; if `TRUE`, deletes any existing DuckDB file or table.
- embedding_dim
Integer; the dimensionality of the vector embeddings to store.
- load_vss
Logical; whether to load the experimental `vss` extension. This defaults to `TRUE`, but is forced to `FALSE` during CRAN checks.
Value
A live DuckDB connection object. Be sure to manually disconnect with:
DBI::dbDisconnect(con, shutdown = TRUE)
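One way to make sure the disconnect always happens, even when an error occurs mid-way, is to wrap the work in a function and register the cleanup with `on.exit()`. The sketch below is illustrative only: `with_duckdb()` is a hypothetical helper that is not part of this package, and it opens a plain in-memory DuckDB connection through `DBI` as a stand-in for the connection returned by `create_vectorstore()`.

```r
# Hypothetical helper (not part of this package): run `f(con)` against a
# DuckDB connection and always disconnect afterwards, even on error.
with_duckdb <- function(f, db_path = ":memory:") {
  con <- DBI::dbConnect(duckdb::duckdb(), dbdir = db_path)
  # Cleanup is registered immediately after the connection is opened
  on.exit(DBI::dbDisconnect(con, shutdown = TRUE), add = TRUE)
  f(con)
}

# Usage: the connection is closed as soon as the function returns
answer <- with_duckdb(function(con) {
  DBI::dbGetQuery(con, "SELECT 42 AS x")$x
})
```

The same pattern should apply to any live connection object: open it, register `on.exit()` right away, then do the work.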
Details
This function is part of a set of vector-store utilities for:
- Embedding text via the OpenAI API
- Chunking and storing documents in DuckDB
- Building `HNSW` and `FTS` indexes
- Running nearest-neighbour search over vector embeddings
Only `create_vectorstore()` is exported; helpers like `insert_vectors()`, `build_vector_index()`, and `search_vectors()` are internal but designed to be composable.
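To give a feel for what nearest-neighbour search over stored embeddings amounts to, here is a minimal sketch that uses only `DBI` and DuckDB's built-in fixed-size `ARRAY` type. It is not the package's implementation: the table name `demo_vectors`, its columns, and the 3-dimensional toy vectors are all assumptions made for illustration, and the query is a brute-force scan with `array_distance()` rather than an `HNSW` index lookup. It assumes a DuckDB version that supports fixed-size arrays (0.10 or later).

```r
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")

# Hypothetical table; the package's real schema is not documented here.
# FLOAT[3] is a fixed-size array, standing in for `embedding_dim = 3`.
DBI::dbExecute(con, "
  CREATE TABLE demo_vectors (
    id INTEGER,
    page_content VARCHAR,
    embedding FLOAT[3]
  )
")

# Toy embeddings, written by hand instead of coming from an embedding API
DBI::dbExecute(con, "
  INSERT INTO demo_vectors VALUES
    (1, 'about R',      [1.0, 0.0, 0.0]::FLOAT[3]),
    (2, 'about Python', [0.0, 1.0, 0.0]::FLOAT[3]),
    (3, 'about R too',  [0.9, 0.1, 0.0]::FLOAT[3])
")

# Brute-force search: order rows by Euclidean distance to the query vector
hits <- DBI::dbGetQuery(con, "
  SELECT id, page_content,
         array_distance(embedding, [1.0, 0.0, 0.0]::FLOAT[3]) AS dist
  FROM demo_vectors
  ORDER BY dist
  LIMIT 2
")

DBI::dbDisconnect(con, shutdown = TRUE)
```

An `HNSW` index from the `vss` extension is meant to speed up this kind of `ORDER BY ... LIMIT` query on large tables; on a handful of rows a full scan is enough.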
Examples
if (FALSE) { # \dontrun{
  # Create the vector store
  con <- create_vectorstore("tests/testthat/test-data/my_vectors.duckdb", overwrite = TRUE)

  # Assume `response` is the output of fetch_data()
  docs <- data.frame(head(response))

  # Insert documents with embeddings
  insert_vectors(
    con = con,
    df = docs,
    embed_fun = embed_openai(),
    chunk_chars = 12000
  )

  # Build vector + FTS indexes
  build_vector_index(con, type = c("vss", "fts"))

  # Perform vector search (stored under a new name so `response` is not overwritten)
  results <- search_vectors(con, query_text = "Tell me about R?", top_k = 5)

  # Disconnect when finished
  DBI::dbDisconnect(con, shutdown = TRUE)
} # }