Bot essentials 12: The NLU deep dive – Stemming and lemmatization

So what exactly are stemming and lemmatization, and how are they used in machine learning? Both approaches deal with inflections, the variant forms a word takes (such as plurals and verb tenses), so that search, retrieval and response accuracy can be improved.

Stemming

When we prune a tree, we cut off the redundant branches to retain its core. Word stemming works similarly: we cut the redundant parts off a word to get to its core form, the stem. A widely used technique for stemming English is Porter's algorithm, a standard set of heuristic rules for stripping inflectional suffixes from English words.

A stemmer (as we call the algorithm) works by chopping off word endings, so that both berry and berries are reduced to berri. Note that the stem does not have to be a dictionary word. Applying a stemmer increases the chance of matching a word against its inflected forms.
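To make the chopping idea concrete, here is a minimal sketch in Python. It applies only a handful of Porter-style suffix rules (plural stripping and the final y to i rule), so it is a simplified illustration and not the full Porter algorithm. The function names simple_stem and has_vowel are our own, chosen for this example.

```python
VOWELS = "aeiou"

def has_vowel(stem):
    """Return True if the stem contains at least one vowel."""
    return any(ch in VOWELS for ch in stem)

def simple_stem(word):
    """Apply a few Porter-style suffix rules (NOT the full algorithm)."""
    w = word.lower()
    # Plural endings, in the spirit of Porter's first step
    if w.endswith("sses"):
        w = w[:-2]            # caresses -> caress
    elif w.endswith("ies"):
        w = w[:-2]            # berries -> berri
    elif w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]            # cats -> cat
    # Final y becomes i when the rest of the word contains a vowel
    if w.endswith("y") and has_vowel(w[:-1]):
        w = w[:-1] + "i"      # berry -> berri
    return w

for word in ["Berry", "berries", "cats", "caresses"]:
    print(word, "->", simple_stem(word))
```

Both Berry and berries end up as berri, which is why a search for one can match the other. In practice you would probably reach for a tested implementation, such as the Porter stemmer that ships with the NLTK library, rather than hand-written rules like these.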

Lemmatization

Lemmatization is generally considered more accurate than stemming. A lemmatization algorithm does not just chop off inflections; it uses a knowledge base, such as a vocabulary or dictionary, to obtain the correct base form (the lemma) of a word.
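The difference from stemming is easiest to see with a lookup. The sketch below uses a tiny hand-written table as a stand-in for the knowledge base. The table LEMMA_TABLE and the function lemmatize are hypothetical names for this example; a real lemmatizer would draw on a much larger vocabulary and usually on the part of speech as well.

```python
# Hypothetical, hand-written knowledge base: inflected form -> lemma.
# A real system would use a full dictionary, not a table like this.
LEMMA_TABLE = {
    "berries": "berry",
    "mice": "mouse",
    "went": "go",
    "going": "go",
    "better": "good",
}

def lemmatize(word):
    """Look the word up in the knowledge base; fall back to the word itself."""
    w = word.lower()
    return LEMMA_TABLE.get(w, w)

for word in ["Berries", "mice", "went", "bot"]:
    print(word, "->", lemmatize(word))
```

Here berries maps to the real word berry instead of the stem berri, and irregular forms such as went or mice, which no suffix rule could handle, resolve correctly because they are looked up and not chopped.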

Stemming and lemmatization are techniques for normalizing the different forms of a word. This helps us frame the intent and the context of the words used in a sentence. We use these techniques in the NLP engine of bot platforms like Engati to find the closest matching answers to the questions that people ask us.
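As a rough illustration of how normalization helps with matching, the sketch below scores a user question against stored FAQ questions by the overlap of their normalized words. It is only an assumption-laden toy, not how Engati's engine works: the FAQ entries are made up, the plural stripping is deliberately crude, and the names normalize and best_match are our own.

```python
import re

def normalize(text):
    """Lowercase, split into words, and crudely strip plural endings."""
    stems = set()
    for t in re.findall(r"[a-z]+", text.lower()):
        if len(t) > 4 and t.endswith("ies"):
            t = t[:-3] + "y"                      # queries -> query
        elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
            t = t[:-1]                            # orders -> order
        stems.add(t)
    return stems

def best_match(question, faq):
    """Return the stored FAQ question with the highest word overlap."""
    q = normalize(question)

    def score(candidate):
        c = normalize(candidate)
        union = q | c
        # Jaccard similarity: shared words divided by all words
        return len(q & c) / len(union) if union else 0.0

    return max(faq, key=score)

# Made-up FAQ data, for illustration only.
FAQ = {
    "How do I track my order": "Use the tracking link in your confirmation email.",
    "What are your delivery charges": "Delivery charges depend on your location.",
}

matched = best_match("Where can I track orders?", FAQ)
print(matched, "->", FAQ[matched])
```

Because orders is normalized to order, the user's question shares more words with the stored tracking question than it would on a raw string comparison.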


We couple the base tenets of Natural Language Understanding with these processing techniques. This lets us categorize each sentence that a user asks a bot in free form. Determining the context, intent and word meanings, and matching them with the available responses, forms the core of how a bot structures a response.


The aspects described in the last few blogs are only those needed for building a simple FAQ bot. This, however, is an important first step in the evolution of machine learning technology. It helps us build a knowledge set and even allows us to open it to the world to ask queries.

Try Engati today. It has one of the most finely tuned NLP engines to build a bot without the need for any programming. You can build and start teaching the bot to learn the responses you want it to provide without really delving into the depths of NLU and NLP techniques you have read about in this blog series.

Stay updated with Engati by scheduling a free demo.