Priyanka P. Pattnaik

Playing with the funny library: fuzzywuzzy

This story is about one of my projects, in which I discovered this powerful library for approximate string matching against a dataset. Before finding it, I was hard-coding exact-match lookups of sequences in the database, which took more time and kept increasing the complexity of my code.

Basic string matching in Python is simply: str1 == str2

This comparison only succeeds if str1 and str2 are exactly identical, including letter case.
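A quick sketch of why plain equality falls short: `==` is all-or-nothing. Normalizing case with `str.casefold()` fixes case differences, but a single typo still breaks the match. The strings below are made-up examples, not data from the original project.

```python
# Plain equality is exact: any difference in case or spelling fails.
str1 = "New York Mets"
str2 = "new york mets"
str3 = "New Yrok Mets"  # hypothetical typo

print(str1 == str2)                        # False: case differs
print(str1.casefold() == str2.casefold())  # True: case normalized
print(str1.casefold() == str3.casefold())  # False: the typo still breaks the match
```

This is exactly the gap that approximate (fuzzy) matching fills.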

After finding this library, I found that with only a few lines of code I could find the matching sequence in the dataset, which greatly reduced the complexity of my work.

How does fuzzywuzzy work? Levenshtein distance.

Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.
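For reference, the definition above can be written as the standard recurrence (the same one Wikipedia gives, in slightly different notation). Let $D(i, j)$ be the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$, and let $[a_i \neq b_j]$ equal 1 when those characters differ and 0 otherwise. Then $\operatorname{lev}(a, b) = D(|a|, |b|)$, where

$$
D(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min \begin{cases}
D(i-1, j) + 1 & \text{(deletion)} \\
D(i, j-1) + 1 & \text{(insertion)} \\
D(i-1, j-1) + [a_i \neq b_j] & \text{(substitution or match)}
\end{cases} & \text{otherwise.}
\end{cases}
$$

For example, $\operatorname{lev}(\text{kitten}, \text{sitting}) = 3$: substitute k with s, substitute e with i, and insert g at the end.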

I found that Wikipedia gives the best explanation of Levenshtein distance, and the picture above illustrates it well. From a programmer's point of view, the distance can be calculated with the function below.
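A minimal sketch of such a function, using the classic dynamic-programming approach (the Wagner-Fischer algorithm). This is an illustrative implementation of the textbook recurrence, not necessarily the function from the original post; the name `levenshtein` is chosen here for illustration. It keeps only two rows of the table, so memory is proportional to the length of `b`.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (Wagner-Fischer DP)."""
    # prev[j] = distance between the previous prefix of a and b[:j];
    # initially a is empty, so turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("flaw", "lawn"))       # 2
```

The running time is proportional to `len(a) * len(b)`, which is why a ready-made, optimized library is convenient when matching against a large dataset.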

In Python, the best part is that we don't need to write this whole function ourselves. Existing libraries take care of that part; we only need to understand how it works.

Installation: pip install fuzzywuzzy

Here I show how it scores an exact match or a partial match, returning the similarity as a percentage ratio.

Brought to you by

COE-AI (CET-BBSR): an initiative by CET-BBSR, Tech Mahindra, and BPUT to provide solutions to real-world problems through ML and IoT