Constructs a state-graph-based agent that recommends, generates, executes, fixes, and explains data wrangling transformations based on user instructions and the structure of the dataset. The resulting function accepts either a single data frame or a list of data frames and produces a cleaned dataset.
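To make the execute-and-fix cycle concrete, here is a minimal base R sketch. It is not the package's implementation: `run_with_retries`, `generate_code`, and the way failures trigger a new attempt are assumptions for illustration. Only the state fields (`data_raw`, `max_retries`, `retry_count`, `data_wrangled`, `data_wrangler_function`, `data_wrangler_error`) echo names used on this page.

```r
# Hypothetical sketch: run generated code, retrying until it succeeds
# or the retry budget is exhausted. `generate_code` stands in for an LLM call.
run_with_retries <- function(state, generate_code) {
  repeat {
    code <- generate_code(state)

    # Evaluate the generated code in a fresh environment that exposes data_raw
    env <- new.env(parent = globalenv())
    env$data_raw <- state$data_raw
    outcome <- tryCatch(
      list(value = eval(parse(text = code), envir = env), error = NULL),
      error = function(e) list(value = NULL, error = conditionMessage(e))
    )

    state$data_wrangler_function <- code
    if (is.null(outcome$error)) {
      state$data_wrangled <- outcome$value
      state$data_wrangler_error <- NULL
      return(state)
    }

    # Record the failure and stop once max_retries is reached
    state$data_wrangler_error <- outcome$error
    state$retry_count <- state$retry_count + 1
    if (state$retry_count >= state$max_retries) return(state)
  }
}

# Stub generator (assumption): fails on the first attempt, succeeds afterwards
stub_generator <- function(state) {
  if (state$retry_count == 0) "stop('simulated failure')" else "data_raw * 2"
}

state <- run_with_retries(
  list(data_raw = c(1, 2, 3), max_retries = 3, retry_count = 0),
  stub_generator
)
```

In this toy run the first attempt errors, the counter is incremented, and the second attempt populates `data_wrangled`.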

Arguments

model

A function that takes a prompt string and returns LLM-generated output.

human_validation

Logical; whether to enable a manual review step before code execution.

bypass_recommended_steps

Logical; whether to skip the initial recommendation of wrangling steps.

bypass_explain_code

Logical; whether to skip the final explanation step after wrangling.

verbose

Logical; whether to print progress messages (default: TRUE).
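The `model` argument only needs to be a function from a prompt string to generated text. The following is a minimal sketch that assumes no real LLM backend: `make_stub_model` is a hypothetical helper, not part of the package, that returns a canned reply so the surrounding plumbing can be tried offline. A real wrapper would call an LLM API in its place.

```r
# Hypothetical helper: build a model function that always returns `reply`
make_stub_model <- function(reply) {
  force(reply)
  function(prompt) {
    # The agent is documented as passing a single prompt string
    stopifnot(is.character(prompt), length(prompt) == 1)
    reply
  }
}

my_llm_wrapper <- make_stub_model("Step 1: join the tables on ID.")
response <- my_llm_wrapper("Merge the data frames on the ID column.")
```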

Value

A callable agent function that mutates a provided `state` list by populating:

- `data_wrangled`: the final cleaned data frame,
- `data_wrangler_function`: the code used,
- `data_wrangler_error`: any execution error (if one occurred),
- `wrangling_report`: LLM-generated explanation (if `bypass_explain_code = FALSE`).
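A short sketch of how the populated fields might be inspected after a run. `summarise_agent_state` is a hypothetical helper, not part of the package, and the `final_state` built here is a hand-made stand-in for a real agent result.

```r
# Hypothetical helper: report either the recorded error or the result's shape
summarise_agent_state <- function(state) {
  if (!is.null(state$data_wrangler_error)) {
    return(paste("Wrangling failed:", state$data_wrangler_error))
  }
  paste0(
    "Wrangled data: ", nrow(state$data_wrangled), " rows x ",
    ncol(state$data_wrangled), " columns"
  )
}

# Hand-made stand-in for the state list an agent run would return
final_state <- list(
  data_wrangled = data.frame(ID = 1:2, Name = c("John", "Jane")),
  data_wrangler_function = "identity",
  wrangling_report = "No changes were needed."
)

status <- summarise_agent_state(final_state)
failed <- summarise_agent_state(list(data_wrangler_error = "object not found"))
```

Checking `data_wrangler_error` first avoids reading `data_wrangled` from a state where no result was produced.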

Examples

if (FALSE) { # \dontrun{
# 1) Simulate multiple data frames with a common ID
df1 <- data.frame(
  ID = c(1, 2, 3, 4),
  Name = c("John", "Jane", "Jim", "Jill"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  ID = c(1, 2, 3, 4),
  Age = c(25, 30, 35, 40),
  stringsAsFactors = FALSE
)

df3 <- data.frame(
  ID = c(1, 2, 3, 4),
  Education = c("Bachelors", "Masters", "PhD", "MBA"),
  stringsAsFactors = FALSE
)

# 2) Combine into a list
data <- list(df1, df2, df3)

# 3) Create the agent (my_llm_wrapper is a user-supplied function:
#    prompt string in, LLM-generated text out)
data_wrangling_agent <- build_data_wrangling_agent(
  model = my_llm_wrapper,
  human_validation = FALSE,
  bypass_recommended_steps = FALSE,
  bypass_explain_code = FALSE,
  verbose = FALSE
)

# 4) Define the initial state
initial_state <- list(
  data_raw = data,
  user_instructions = "Merge the data frames on the ID column.",
  max_retries = 3,
  retry_count = 0
)

# 5) Run the agent
final_state <- data_wrangling_agent(initial_state)
} # }
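For orientation, the instruction in the example asks for roughly the following base R operation. This is a hand-written sketch of the kind of code the agent could generate, not output captured from it.

```r
# Same toy inputs as in the example above
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Jane", "Jim", "Jill"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(1, 2, 3, 4),
                  Age = c(25, 30, 35, 40),
                  stringsAsFactors = FALSE)
df3 <- data.frame(ID = c(1, 2, 3, 4),
                  Education = c("Bachelors", "Masters", "PhD", "MBA"),
                  stringsAsFactors = FALSE)

# Fold the list pairwise with an inner join on the shared ID column
merged <- Reduce(function(x, y) merge(x, y, by = "ID"), list(df1, df2, df3))
```

`Reduce` applies `merge` pairwise from left to right, so the same call works for any number of data frames that share the `ID` column.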