Section: Detailed program | 340SM - ALGORITHMIC DATA MINING 2023

Section outline

Amortized analysis: aggregate analysis and accounting method. Chapters 17.1 and 17.2 of Cormen's book.
Red-black trees: definition and properties. Rotations and insertion procedure. Chapters 13.1, 13.2 and 13.3 of Cormen's book.
Hash tables: definition and properties of hash tables and hash functions. Collision resolution by chaining. Analysis of hashing with chaining. Dynamic hash tables and their analysis. Chapters 11.1, 11.2, 11.3 (for the latter, only the paragraph "What makes a good hash function?") and 17.4 of Cormen's book.
Exact pattern matching on strings: definition of string matching.The naive algorithm. The Knuth-Morris-Pratt algorithm and its analysis. The Boyer-Moore-Galil algorithm. Chapter 32 of Cormen's book, excluding 32.2 and 32.3; Chapter 2.2 of Gusfield's book (please find it in Teams: Materiale del corso)
Multiple exact pattern matching: definition of suffix tree and suffix links. Representation of the branches of the suffix tree and implied time and space complexities. Use of the suffix tree for (multiple) exact pattern matching. Definition of suffix array and its use for exact pattern matching. Chapters 5, 6.1 (for the latter, only the definition of suffix links), 6.5, 7.1, 7.14 (until 7.14.4) of Gusfield's book; see also Ben Langmead's slides on tries, suffix trees and suffix arrays and the "survey_suffix_tree" paper.
Approximate pattern matching: definition of Hamming distance. Definition of k-mismatch problem. Definition of Longest common extensions. Solution to k-mismatch via longest common extensions. Definition of Edit distance. Dynamic programming algorithm for computing the edit distance. Chapters 9.1, 9.4, 11.2 and 11.3 of Gusfield's book.
Frequent pattern mining: definition of support, maximal frequent itemset, and the frequent itemset mining problem. The support monotonicity and the downward closure property (aka Apriori property). The Apriori algorithm. Definition and use of the enumeration tree. TreeProjection and DepthProject methods. Chapters 4.1, 4.2, 4.3, 4.4.1, 4.4.2 (excluding 4.4.2.1), 4.4.3 (until p.110 excluding p.109) of C. Aggarwal, Data Mining - The Textbook
Graph mining: definition of graph mining. Algorithm based on the enumeration tree. Chapter 11 of Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Machine Learning: Fundamental Concepts and Algorithms
Social networks analysis: properties of social networks. The betweenness measure. The community detection problem. The Girvan–Newman Algorithm for community detection. Chapters 19.1, 19.2.1, 19.2.2, 19.2.3, 19.2.5.3, 19.3 (intro), 19.3.2 of C. Aggarwal, Data Mining - The Textbook
Data stream model: The membership problem: Bloom filter and its analysis. The counting problem: Count-min sketch and its analysis. The cardinality problem: the k-bottom algorithm and its analysis. Finding similar items: definition of Jaccard distance between sets. K-shingling for text documents. Similarity-Preserving Summaries of Sets. Minhashing and Minhash signatures. Locality-sensitive hashing and its analysis. Chapters 12.2.2 (intro), 12.2.2.1, 12.2.2.2 of C. Aggarwal, Data Mining - The Textbook; Chapters 3.1.1, 3.2.1, 3.2.3, 3.3, 3.4 of J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets. See also Ben Langmead's slides.