What is locality-sensitive hashing? A technique for performing an approximate nearest-neighbour search in high-dimensional spaces is called local…
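A minimal sketch of one LSH family, random-hyperplane hashing for cosine similarity, assuming vectors are plain Python lists. The class name `HyperplaneLSH` and its parameters (`num_tables`, `bits_per_table`, `seed`) are hypothetical. Similar vectors tend to land in the same bucket, so only items that share a bucket with the query are compared exactly.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Exact cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class HyperplaneLSH:
    """Hypothetical random-hyperplane LSH index for approximate cosine nearest neighbours."""

    def __init__(self, dim, num_tables=8, bits_per_table=8, seed=0):
        rng = random.Random(seed)
        # Each table gets its own set of random Gaussian hyperplane normals.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.vectors = {}

    @staticmethod
    def _signature(vec, planes):
        # Bit i records which side of hyperplane i the vector falls on.
        return tuple(int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes)

    def add(self, key, vec):
        self.vectors[key] = vec
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(vec, planes)].append(key)

    def query(self, vec, k=1):
        # Candidates are items sharing a bucket with `vec` in at least one table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(vec, planes), ()))
        # Only the candidates are re-ranked by exact cosine similarity.
        ranked = sorted(candidates, key=lambda key: cosine(vec, self.vectors[key]), reverse=True)
        return ranked[:k]


# Usage: index 200 random 50-dimensional vectors, then query a slightly perturbed copy.
rng = random.Random(1)
data = {i: [rng.gauss(0.0, 1.0) for _ in range(50)] for i in range(200)}
index = HyperplaneLSH(dim=50)
for key, vec in data.items():
    index.add(key, vec)
noisy = [x + rng.gauss(0.0, 0.01) for x in data[17]]
print(index.query(noisy, k=3))  # key 17 is the near-duplicate of the query
```

More tables raise recall (more chances to share a bucket); more bits per table make buckets smaller and candidate lists shorter.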
What is MinHash? MinHash is a technique for estimating the similarity (Jaccard index) between two sets. It was first introduced in information…
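A minimal sketch of MinHash, assuming the sets contain string tokens. Independent hash functions are simulated by prefixing each token with a seed before hashing it with `hashlib.blake2b`; this is an illustrative choice, not a particular library's scheme, and `minhash_signature`, `estimate_jaccard` and `num_perm` are hypothetical names. The underlying property: for a random hash function, the probability that two sets have the same minimum hash value equals their Jaccard similarity, so the fraction of matching signature positions estimates it.

```python
import hashlib


def _hash(token, seed):
    """Deterministic 64-bit hash of `token`; each seed acts as a separate hash function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=128):
    """For each of `num_perm` hash functions, keep the minimum hash over the set."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the minima agree, an estimate of |A ∩ B| / |A ∪ B|."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over the lazy cat".split())
print(exact_jaccard(a, b))  # 0.6
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))  # close to 0.6
```

The signature has a fixed length regardless of set size, which is what makes MinHash useful for comparing many large sets; a larger `num_perm` reduces the estimate's variance at the cost of more hashing.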
What is SimHash? SimHash is a technique for generating a fixed-length "fingerprint" or "hash" of a variable-length input, such as…
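A minimal sketch of the common weighted bit-voting SimHash construction, assuming whitespace tokens weighted by their frequency as features and a 64-bit `hashlib.blake2b` digest per token. The feature choice and the names `simhash` and `hamming` are assumptions for illustration. Each token votes +weight or -weight on every bit position according to its own hash, and the sign of each total becomes one fingerprint bit, so similar inputs yield fingerprints with a small Hamming distance.

```python
import hashlib
from collections import Counter

BITS = 64  # fingerprint length; matches the 8-byte digest below


def _hash64(token):
    """Deterministic 64-bit hash of a token."""
    return int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")


def simhash(text):
    """Fixed-length fingerprint of variable-length text (features: lowercase tokens, count-weighted)."""
    weights = Counter(text.lower().split())
    votes = [0] * BITS
    for token, weight in weights.items():
        h = _hash64(token)
        for i in range(BITS):
            # Add the weight where the token's hash bit is 1, subtract it where it is 0.
            votes[i] += weight if (h >> i) & 1 else -weight
    fingerprint = 0
    for i in range(BITS):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


doc = "the quick brown fox jumps over the lazy dog"
near = "the quick brown fox jumps over the lazy cat"
other = "completely different words appear in this sentence"
print(hamming(simhash(doc), simhash(near)))   # typically small
print(hamming(simhash(doc), simhash(other)))  # around 32 on average for unrelated text
```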
This article discusses one of the most valuable tools for analysing textual data in natural language processing: fuzzy string…
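Fuzzy string comparison is commonly built on an edit distance. Assuming that is the kind of measure meant here (the teaser is cut off), a minimal sketch of Levenshtein distance follows, together with a hypothetical `similarity_ratio` helper that normalises it to the range 0 to 1.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    """Hypothetical normalisation: 1.0 for identical strings, 0.0 for completely different ones."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


print(levenshtein("kitten", "sitting"))       # 3
print(similarity_ratio("kitten", "sitting"))  # about 0.571
```

Only two rows of the dynamic-programming table are kept, so memory grows with the shorter string rather than with the product of both lengths.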
Text similarity is a useful natural language processing (NLP) tool. It allows you to find similar pieces of text…
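One common way to score text similarity is the cosine similarity between bag-of-words term-count vectors. A minimal sketch, assuming a simple lowercase regex tokenizer (`tokenize` and `cosine_similarity` are hypothetical helpers); TF-IDF-weighted or embedding-based variants apply the same formula to different vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a deliberately simple, assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two texts' term-count vectors, in [0, 1]."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares no terms with anything
    return dot / (norm_a * norm_b)


print(round(cosine_similarity("the cat sat on the mat", "the cat sat"), 3))  # 0.816
print(cosine_similarity("the cat sat", "a dog ran"))                         # 0.0
```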