Cleans a text string by converting it to lowercase and removing numbers, non-alphanumeric characters, extra whitespace, and stopwords for a specified language. It also transliterates the text to ASCII, splits it into words, and rejoins them into a clean string suitable for analysis.

Usage

prepare_text(text, stopwords = NULL)

Arguments

text

A character vector, or an object that can be coerced to character, containing the input text to be cleaned.

stopwords

A character vector specifying the stopwords to remove. When NULL (the default), the "spanish" stopword list from tm::stopwords() is used.

Value

A cleaned character string, with stopwords removed and text formatted for analysis.

Examples

# Example usage:
prepare_text("¡Hola! Esto es una prueba 123.")
#> [1] "hola prueba"
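
The cleaning steps named in the description can be sketched in base R as follows. This is only an illustration, not the package's actual implementation: the name prepare_text_sketch, the order of the steps, and the three-word stopword list are assumptions made for the example, and the sketch handles a single string rather than a vector.

```r
# Hypothetical sketch of the cleaning steps; NOT the package's implementation.
# The stopword list is a tiny stand-in for tm::stopwords("spanish").
prepare_text_sketch <- function(text, stopwords = c("esto", "es", "una")) {
  # Coerce to character and lowercase.
  text <- tolower(enc2utf8(as.character(text)))
  # Transliterate to ASCII. How //TRANSLIT maps a given character depends on
  # the platform; sub = "" drops anything that cannot be converted.
  text <- iconv(text, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")
  # Replace digits, punctuation, and any other non-letter with a space.
  text <- gsub("[^a-z ]", " ", text)
  # Split on whitespace, then drop empty tokens and stopwords.
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words) & !(words %in% stopwords)]
  # Rebuild a single space-separated string.
  paste(words, collapse = " ")
}

prepare_text_sketch("¡Hola! Esto es una prueba 123.")
```

Because stopwords are matched after transliteration in this sketch, accented entries in a real stopword list (for example "él") would need the same transliteration applied before they could match.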