Splits documents into text units
Public fields
chunk_size
Target chunk size
chunk_overlap
Overlap size
by_sentence
Preserve sentences
Methods
Method new()
Create a new DocumentChunker
Usage
DocumentChunker$new(chunk_size = 1200, chunk_overlap = 100, by_sentence = TRUE)
Arguments
chunk_size
Target size
chunk_overlap
Overlap
by_sentence
Preserve sentences
Method chunk()
Chunk a document
Usage
DocumentChunker$chunk(text, document_id = NULL)
Arguments
text
Document text
document_id
Document ID
Returns
List of TextUnit objects
Method clone()
The objects of this class are cloneable with this method.
Usage
DocumentChunker$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.