Overview
RAGFlowChainR is an R package that brings Retrieval-Augmented Generation (RAG) capabilities to R, inspired by LangChain. It enables intelligent retrieval of documents from a local vector store (DuckDB), optional web search, and seamless integration with Large Language Models (LLMs).
Features include:
- 📂 Ingest files and websites
- 🔍 Semantic search using vector embeddings
- 🧠 RAG chain execution with conversational memory and dynamic prompt construction
- 🔌 Plug-and-play with OpenAI, Ollama, Groq, and Anthropic
Python version: RAGFlowChain, available on PyPI and GitHub.
Development version
To get the latest features or bug fixes, you can install the development version of RAGFlowChainR from GitHub:
# If needed
install.packages("remotes")
remotes::install_github("knowusuboaky/RAGFlowChainR")
See the full function reference or the package website for more details.
🔐 Environment Setup
RAGFlowChainR reads provider API keys from environment variables. Set them for the current session with Sys.setenv():
Sys.setenv(TAVILY_API_KEY = "your-tavily-api-key")
Sys.setenv(OPENAI_API_KEY = "your-openai-api-key")
Sys.setenv(GROQ_API_KEY = "your-groq-api-key")
Sys.setenv(ANTHROPIC_API_KEY = "your-anthropic-api-key")
To persist these across sessions, add them to your ~/.Renviron file.
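For example, a ~/.Renviron file with entries like the following (values are placeholders) is read automatically when R starts:

TAVILY_API_KEY=your-tavily-api-key
OPENAI_API_KEY=your-openai-api-key
GROQ_API_KEY=your-groq-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key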
Usage
1. Data Ingestion
library(RAGFlowChainR)
local_files <- c("tests/testthat/test-data/sprint.pdf",
"tests/testthat/test-data/introduction.pptx",
"tests/testthat/test-data/overview.txt")
website_urls <- c("https://www.r-project.org")
crawl_depth <- 1
response <- fetch_data(
local_paths = local_files,
website_urls = website_urls,
crawl_depth = crawl_depth
)
response
#> source title ...
#> 1 documents/sprint.pdf <NA> ...
#> 2 documents/introduction.pptx <NA> ...
#> 3 documents/overview.txt <NA> ...
#> 4 https://www.r-project.org R: The R Project for Statistical Computing ...
#> ...
cat(response$content[1])
#> Getting Started with Scrum\nCodeWithPraveen.com ...
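Because fetch_data() returns an ordinary data frame, you can inspect and filter it with base R before indexing. A small sketch using the source and content columns shown above:

# Keep only the documents crawled from the web
web_docs <- response[grepl("^https?://", response$source), ]

# Quick sanity checks before building the vector store
nrow(response)          # number of documents fetched
nchar(response$content) # content length per document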
2. Vector Store & Semantic Search
con <- create_vectorstore("tests/testthat/test-data/my_vectors.duckdb", overwrite = TRUE)
docs <- data.frame(head(response)) # reuse from fetch_data()
insert_vectors(
con = con,
df = docs,
embed_fun = embed_openai(),
chunk_chars = 12000
)
build_vector_index(con, type = c("vss", "fts"))
response <- search_vectors(con, query_text = "Tell me about R?", top_k = 5)
response
#> id page_content dist
#> 1 5 [Home]\nDownload\nCRAN\nR Project...\n... 0.2183
#> 2 6 [Home]\nDownload\nCRAN\nR Project...\n... 0.2183
#> ...
cat(response$page_content[1])
#> [Home]\nDownload\nCRAN\nR Project\nAbout R\nLogo\n...
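create_vectorstore() hands back a connection to the DuckDB file. Assuming it behaves as a standard DBI connection (an assumption, not stated above), you can release it once indexing and searching are done:

# Assumes `con` is a DBI/duckdb connection object
DBI::dbDisconnect(con, shutdown = TRUE)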
3. RAG Chain Querying
rag_chain <- create_rag_chain(
llm = call_llm,
vector_database_directory = "tests/testthat/test-data/my_vectors.duckdb",
method = "DuckDB",
embedding_function = embed_openai(),
use_web_search = FALSE
)
response <- rag_chain$invoke("Tell me about R")
response
#> $input
#> [1] "Tell me about R"
#>
#> $chat_history
#> [[1]] $role: "human", $content: "Tell me about R"
#> [[2]] $role: "assistant", $content: "R is a programming language..."
#>
#> $answer
#> [1] "R is a programming language and software environment commonly used for statistical computing and graphics..."
cat(response$answer)
#> R is a programming language and software environment commonly used for statistical computing and graphics...
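Because the chain keeps conversational memory, follow-up questions can refer back to earlier turns in the same session:

# Follow-up question; the chain resolves "it" against the chat history
followup <- rag_chain$invoke("What is it commonly used for?")
cat(followup$answer)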
LLM Support
The built-in call_llm() helper works across the supported providers; specify the provider, model, and sampling parameters directly:
call_llm(
prompt = "Summarize the capital of France.",
provider = "groq",
model = "llama3-8b",
temperature = 0.7,
max_tokens = 200
)
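Because create_rag_chain() accepts the LLM as a function, you can pin a provider and model by wrapping call_llm(). This sketch assumes the chain invokes the function with the assembled prompt as its first argument (an assumption based on the usage above; groq_llm is a hypothetical helper name):

# Hypothetical wrapper that fixes the provider and model
groq_llm <- function(prompt, ...) {
  call_llm(
    prompt      = prompt,
    provider    = "groq",
    model       = "llama3-8b",
    temperature = 0.2,
    max_tokens  = 500,
    ...
  )
}

rag_chain <- create_rag_chain(
  llm = groq_llm,
  vector_database_directory = "tests/testthat/test-data/my_vectors.duckdb",
  method = "DuckDB",
  embedding_function = embed_openai(),
  use_web_search = FALSE
)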
📦 Related Package: chatLLM
The chatLLM package (now available on CRAN 🎉) offers a modular interface for interacting with LLM providers, including OpenAI, Groq, and Anthropic.
install.packages("chatLLM")
Features:
- 🔁 Seamless provider switching (openai, groq, anthropic)
- ✍️ Prompt + system message templating
- 💬 Multi-message chat sessions
- 🔌 Native integration with RAGFlowChainR
- 🔐 .Renviron-based key management
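A minimal sketch of calling chatLLM directly, assuming it exposes a call_llm() interface mirroring the one above (verify the exact signature against the chatLLM reference):

library(chatLLM)

# Assumed interface; check the chatLLM documentation for details
chatLLM::call_llm(
  prompt   = "Explain Retrieval-Augmented Generation in one paragraph.",
  provider = "openai"
)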