In this blog we will be demonstrating the functionality of applying the full ML pipeline over a set of documents which in this case we are using 10 books from the internet.
So lets start with first thing first..
What is Clustering ?
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statisticaldata analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
Clustering when applied on the textual data , then it is known as Document Clustering.
It has applications in automatic…
View original post 650 more words