This function computes a fuzzy similarity score between two strings based on the token set ratio methodology. It considers the intersection and differences between tokenized word sets from the input strings, and calculates a similarity score normalized by string lengths.
Value
A numeric similarity score between 0 and 100, representing the degree of similarity between the two input strings.
Details
This function performs the following steps:
Tokenizes the input strings.
Identifies intersecting and differing tokens between the two tokenized sets.
Computes the longest common subsequence (LCS) distance for differing tokens and normalizes it.
Calculates similarity ratios for intersecting tokens combined with differing token sets.
Returns the maximum of the normalized LCS distance and the two intersecting token ratios.
The function short-circuits to return 100 if one token set is a subset of the other. If either input string is empty, the function returns 0.
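
The steps above can be illustrated with a minimal Python sketch. It is not this function's actual implementation, and several details are assumptions made only for illustration: tokens are split on whitespace and lowercased, token groups are sorted and joined with single spaces before comparison, and the normalized LCS score is taken to be 2 * LCS length / (sum of the two string lengths), scaled to 0 to 100. The names `lcs_length`, `lcs_ratio`, and `token_set_ratio` are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a: str, b: str) -> float:
    """Assumed normalization: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 100.0 * 2 * lcs_length(a, b) / total


def token_set_ratio(s1: str, s2: str) -> float:
    """Sketch of a token set ratio following the documented steps."""
    # Empty input returns 0.
    if not s1 or not s2:
        return 0.0

    # Step 1: tokenize (assumption: lowercase, split on whitespace).
    tokens1, tokens2 = set(s1.lower().split()), set(s2.lower().split())
    if not tokens1 or not tokens2:
        return 0.0

    # Short-circuit: one token set is a subset of the other.
    if tokens1 <= tokens2 or tokens2 <= tokens1:
        return 100.0

    # Step 2: intersecting and differing tokens.
    common = " ".join(sorted(tokens1 & tokens2))
    diff1 = " ".join(sorted(tokens1 - tokens2))
    diff2 = " ".join(sorted(tokens2 - tokens1))

    # Step 3: normalized LCS score for the differing tokens.
    diff_score = lcs_ratio(diff1, diff2)

    # Step 4: intersection compared with intersection plus each difference.
    combined1 = f"{common} {diff1}".strip()
    combined2 = f"{common} {diff2}".strip()
    score1 = lcs_ratio(common, combined1)
    score2 = lcs_ratio(common, combined2)

    # Step 5: return the maximum of the three scores.
    return max(diff_score, score1, score2)
```

Under these assumptions, `token_set_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy bear")` takes the subset short-circuit, because every token of the second string appears in the first. A different tokenizer or normalization would change the scores for non-subset inputs.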