Module Handbook

Module INF-24-52-M-6

Information Retrieval and Data Mining (M, 4.0 CP)

Module Identification

Module Number | Module Name | CP (Effort)
INF-24-52-M-6 | Information Retrieval and Data Mining | 4.0 CP (120 h)

Basedata

CP, Effort 4.0 CP = 120 h
Position of the semester 1 Sem. irreg. SuSe
Level [6] Master (General)
Language [EN] English
Module Manager
Lecturers
Area of study [INF-INSY] Information Systems
Reference course of study [INF-88.79-SG] M.Sc. Computer Science
Lifecycle-State [NORM] Active

Courses

Type/SWS | Course Number | Title | Choice in Module-Part | Presence-Time / Self-Study | SL | SL is required for exam | PL | CP | Sem.
2V+1U | INF-24-52-K-6 | Information Retrieval and Data Mining | P | 42 h / 78 h | U-Schein | yes | PL1 | 4.0 | irreg. SuSe
  • About [INF-24-52-K-6]: Title: "Information Retrieval and Data Mining"; Presence-Time: 42 h; Self-Study: 78 h
  • About [INF-24-52-K-6]: The study achievement "[U-Schein] proof of successful participation in the exercise classes (ungraded)" must be obtained.
    • It is a prerequisite for the examination for PL1.

Examination achievement PL1

  • Form of examination: oral examination (20-60 min)
  • Examination Frequency: Examination only within the course
  • Examination number: 62452 ("Information Retrieval and Data Mining")

Evaluation of grades

The grade of the module examination is also the module grade.


Contents

  • Boolean Information Retrieval (IR), TF-IDF
  • Evaluation Models (Precision, Recall, MAP, NDCG)
  • Probabilistic IR, BM25
  • Hypothesis testing
  • Statistical language models
  • Latent topic models (LSI, pLSI, LDA)
  • Relevance feedback, novelty & diversity
  • PageRank, HITS
  • Spam detection, social networks
  • Inverted lists
  • Index compression, top-k query processing
  • Frequent itemsets & association rules
  • Hierarchical, density-based, and co-clustering
  • Decision trees and Naive Bayes
  • Support vector machines

Competencies / intended learning achievements

After successfully completing the module, students will be able to:
  • explain how modern information retrieval systems are realized,
  • assess the performance of information retrieval systems in terms of user-perceived quality and also with respect to statistical significance,
  • handle unstructured textual information, accounting for human-made typos, synonymy, polysemy, etc., as well as novelty among documents,
  • apply core data mining approaches such as frequent itemset mining, decision trees, k-means clustering, and Bayesian classification to build data analytics solutions, for instance for smart decision making (concepts that are becoming increasingly important in the Big Data era).

Literature

  • Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
  • Larry Wasserman. All of Statistics, Springer, 2004.
  • Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines.
  • Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011.
  • Supplementary literature references will be given in the lecture.

Requirements for attendance of the module (informal)

None

Requirements for attendance of the module (formal)

None

References to Module / Module Number [INF-24-52-M-6]

Course of Study | Section | Choice/Obligation
[INF-88.79-SG] M.Sc. Computer Science | [Specialisation] Specialization 1 | [WP] Compulsory Elective

Module-Pool | Name
[INF-SIAK-DT-CS-MPOOL-6] | SIAK Certificate "Digital Transformation" - Modules INF "Computer Science"