TMNL-EMBA-BA-19
MBA(Exe.)-BA-2019-20: Term-IV

TEXT MINING AND NATURAL LANGUAGE PROCESSING
No. of Credits: 3
Course Description
Text has always been a crucial data source for knowledge discovery because of its relevance to the information needs of diverse individuals, communities and organizations. With the rapid growth of online applications that allow free expression of opinions and ideas, there has been tremendous growth in text data containing critical business intelligence. Detailed analysis of text data requires an understanding of a battery of techniques from information retrieval, natural language processing and data mining. This course aims to demonstrate to students the major techniques available for mining, analysing and visualizing text data.

Learning Outcomes
Upon completion of this course students will be able to:

· Discuss the various ways in which text can be analysed, and appropriate uses of each
· Use open source text analytic tools
· Develop insights about the applications of various text analysis tools
· Design a project for textual analysis suitable for a specific domain

No. of Sessions
Topics
Readings
1-3
Introduction to Text Analytics
Origin of Text Mining - Understanding Text - Applications - Information Visualization - Architecture for Text Mining Applications
Chapter 1 (R1)
Mathematics Background: Probability - Bayes’s Rule - Probability Distribution - Sampling Distribution - Matrices
Chapter 2 (R1)
4-7
Determining the vocabulary of terms
Parsing unstructured text I: keywords, n-grams
Markov Models and POS Tagging
Chapter 4 (R1)
Parsing unstructured text II: parse tree, stemming, lexicon and ontology
Application - Stanford NLP Kit
8
Treating Text as Data - Features
Chapter 6 (R2)
Scoring, term weighting and the vector space model
9-11
Exploratory analytics I: text clustering
Mathematics Background: Clustering techniques
Chapter 16 and 17 (R2)
Applications of Text clustering - Document clustering
Chapter 8 (R1)
12-14
Exploratory analytics II: topic modeling
Mathematics Background: Matrix decompositions, latent semantic indexing and Bayesian distributions
Chapter 18 (R2)
LSA, LDA and Word2Vec
Materials will be provided
15
Text Summarization
Chapter 10 (R1)
16-18
Predictive analytics: text classification
Mathematics Background: Supervised Learning Algorithms
Reference book 3
Applications of Supervised Learning Algorithms
19
Application: Sentiment analysis (unsupervised learning methods)
20
Project presentation
Readings
R1: Text Mining Application Programming by Manu Konchady
R2: Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan
Reference
1. Foundations of Statistical Natural Language Processing (The MIT Press) by Christopher Manning
2. Data and Text Mining - A Business Applications Approach by Thomas W. Miller, Prentice Hall, Second impression, 2011
3. Data Mining and Predictive Analytics, 2nd Edition by Daniel T. Larose, Chantal D. Larose
Evaluation
1. Attendance and class participation: 10%
2. Mid-term: 25%
3. Individual projects: 25%
4. End-term: 40%
Created By: Alora Kar on 12/11/2019 at 04:33 PM
Category: MBA(Exe.)BA-2019-20 T-IV Doctype: Document
