Saturday, March 07, 2009

The Data Mining (DM) Series - Introduction

Since data mining is the primary specialization of my master's degree, I am planning to blog a series on its basics.

What is data (knowledge) mining (DM)?
As I mentioned earlier, data mining refers to mining knowledge from large amounts of data. Example: Suppose you are an email service provider and want to filter out all spam mails. You can apply data mining to identify the patterns that characterize spam mails and filter out the messages that match them.
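
To make the spam example a little more concrete, here is a minimal sketch of one way such a pattern could be learned: a tiny naive Bayes classifier built on word counts. The training messages, the labels "spam" and "ham", and the function names train and classify are all made up for illustration; a real filter would use far more data and features.

```python
import math
from collections import Counter

# Hypothetical toy training data, invented purely for illustration.
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
    ("project status and agenda", "ham"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TRAINING)
print(classify("free money offer", word_counts, label_counts))
print(classify("monday meeting agenda", word_counts, label_counts))
```

The "pattern" here is simply which words show up more often in spam than in normal mail. This is classification, one of the functionalities covered in this series.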

Why do we need data mining?
To learn hidden knowledge that might be present in vast amounts of data. Example: If you own a big retail store and want to find the target customers for a specific product, you can mine the buying patterns in your customer database to find the list of potential buyers.
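
As a rough illustration of mining buying patterns, the sketch below counts which pairs of items are bought together often enough to be interesting. The transactions, the function name frequent_pairs and the min_support threshold are assumptions made up for this example; real frequent pattern mining algorithms are far more efficient than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase records, one set of items per customer visit.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions
    containing both items) is at least min_support."""
    counts = Counter()
    for basket in transactions:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

for pair, support in sorted(frequent_pairs(TRANSACTIONS, 0.6).items()):
    print(pair, support)
```

If a pair such as diapers and beer turns out to be frequent, customers who bought one of the two items are reasonable candidates for a promotion on the other.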

When we talk about DM, we most often talk about the functionalities of data mining. They represent the methodologies used to perform mining. They are:
1. Frequent Pattern Mining, Associations and Correlations
2. Classification and Prediction
3. Cluster Analysis
4. Outlier Analysis
5. Stream Mining
6. Sequence, Trend and Evolution Analysis
7. Graph Mining
8. Information Network Analysis
9. Web Mining

I'll elaborate on each of these functionalities in this series.

Applications of Data Mining:
There are many practical applications of data mining, and the full list would be too long to give here. So let me name just a few, along with the DM techniques they use:
1. Spam Filtering - uses Classification
2. Intrusion Detection System - uses Frequent Pattern Mining / Classification / Stream Mining
3. Fraud Detection - uses Outlier Analysis
4. Forecasting (weather / market etc) - uses Trend Analysis
5. Web Search - uses Web Mining
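
To give a feel for one of these, here is a minimal sketch of outlier analysis applied to fraud detection. It flags transaction amounts that lie far from the average, measured in standard deviations (a z-score). The amounts, the function name find_outliers and the threshold of 2.0 are assumptions chosen for illustration; real fraud detection systems use much richer models.

```python
import statistics

def find_outliers(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [x for x in amounts if abs(x - mean) / stdev > threshold]

# Hypothetical card transactions: mostly small, with one unusually large amount.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print(find_outliers(amounts))
```

Only the unusually large transaction is flagged here; the others all lie close to the mean.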

Math background:
The math concepts that are used heavily in data mining are:
1. Statistics
2. Probability
3. Linear Algebra
