IS450

Text Mining and Language Processing

SCIS Sch of Computing & Info Sys

Course (UG/PG)

Undergraduate

Offering Unit/Department

Course Description

Given the dominance of text on the Internet, mining high-quality information from text is increasingly critical. The actionable knowledge extracted from text data supports a broad spectrum of areas, including business intelligence, information acquisition, behavior analysis and decision making. In this course, we will cover important topics in text mining, including document representation, text categorization and clustering, sentiment analysis, probabilistic topic models and text visualization. Text mining techniques adopt models from research areas such as statistics, natural language processing (NLP) and linguistics. We will also cover basic natural language processing techniques, language parsing and analysis, and evaluation techniques.

Course Learning Outcomes

1. To understand the vector representation of documents and apply cosine similarity to measure how similar two documents are

2. To understand TF and IDF weighting and gain hands-on experience with vector space models

3. To understand how the naïve Bayes classifier works for text classification

4. To gain some basic knowledge about other classification algorithms including linear classifiers and neural networks

5. To apply APIs for text classification and document clustering

6. To understand why topic modeling is useful and apply the Gensim API to derive topics from a corpus

7. To understand the basic approaches to some typical problems in sentiment analysis and apply a supervised approach to sentiment polarity classification

8. To gain some basic understanding of natural language processing

9. To understand Information Extraction (IE), its techniques and its applications

10. To understand Named Entity Recognition (NER) and gain knowledge about the techniques for NER

11. To understand the definitions of accuracy, precision, recall and F-measure

12. To be aware of evaluation methods for text clustering

13. To understand advanced text analytics tasks such as text summarization and question answering, and apply techniques for such tasks

14. To apply deep learning models and large language models (LLMs) for text processing and mining tasks
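As an illustration of the first two outcomes, the following is a minimal sketch of TF-IDF weighting and cosine similarity in plain Python. It is not taken from the course materials: the function names `tf_idf_vectors` and `cosine`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the three sample documents are all assumptions chosen for brevity. Libraries used in practice typically apply smoothing and normalization that differ from this.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document.

    Assumption: raw term counts for TF and unsmoothed log(N / df) for IDF.
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # A zero vector has no direction, so treat it as dissimilar.
    return dot / (norm_u * norm_v)


# Hypothetical sample corpus, for illustration only.
docs = [
    "text mining extracts knowledge from text",
    "topic models summarize a text corpus",
    "stock prices fell sharply today",
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # The two documents share the term "text".
sim_02 = cosine(vecs[0], vecs[2])  # The two documents share no terms.
```

In this sketch the first two documents receive a positive similarity because they share a term, while the first and third share none and score zero. A term that appears in every document gets an IDF of zero under this weighting, which is one reason practical implementations often smooth the IDF.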

Discipline-Specific Competencies

Data Analytics, Business Innovation, Pattern Recognition Systems, Research, Text Analytics and Processing

SMU Graduate Learning Outcomes

Disciplinary Knowledge, Critical thinking & problem solving, Innovation and enterprising skills, Collaboration and leadership, Communication, Self-directed learning

Grading Basis

GRD - Graded

Course Units

1