340SM - ALGORITHMIC DATA MINING 2023
Schema della sezione
-
Dear students, please find below a SAMPLE FOR THE WRITTEN PART of the exam.
-
Sample exam File PDF
-
Amortized analysis: aggregate analysis and accounting method. Chapters 17.1 and 17.2 of Cormen's book.
Red-black trees: definition and properties. Rotations and insertion procedure. Chapters 13.1, 13.2 and 13.3 of Cormen's book.
Hash tables: definition and properties of hash tables and hash functions. Collision resolution by chaining. Analysis of hashing with chaining. Dynamic hash tables and their analysis. Chapters 11.1, 11.2, 11.3 (for the latter, only the paragraph "What makes a good hash function?") and 17.4 of Cormen's book.
Exact pattern matching on strings: definition of string matching.The naive algorithm. The Knuth-Morris-Pratt algorithm and its analysis. The Boyer-Moore-Galil algorithm. Chapter 32 of Cormen's book, excluding 32.2 and 32.3; Chapter 2.2 of Gusfield's book (please find it in Teams: Materiale del corso)
Multiple exact pattern matching: definition of suffix tree and suffix links. Representation of the branches of the suffix tree and implied time and space complexities. Use of the suffix tree for (multiple) exact pattern matching. Definition of suffix array and its use for exact pattern matching. Chapters 5, 6.1 (for the latter, only the definition of suffix links), 6.5, 7.1, 7.14 (until 7.14.4) of Gusfield's book; see also Ben Langmead's slides on tries, suffix trees and suffix arrays and the "survey_suffix_tree" paper.
Approximate pattern matching: definition of Hamming distance. Definition of k-mismatch problem. Definition of Longest common extensions. Solution to k-mismatch via longest common extensions. Definition of Edit distance. Dynamic programming algorithm for computing the edit distance. Chapters 9.1, 9.4, 11.2 and 11.3 of Gusfield's book.
Frequent pattern mining: definition of support, maximal frequent itemset, and the frequent itemset mining problem. The support monotonicity and the downward closure property (aka Apriori property). The Apriori algorithm. Definition and use of the enumeration tree. TreeProjection and DepthProject methods. Chapters 4.1, 4.2, 4.3, 4.4.1, 4.4.2 (excluding 4.4.2.1), 4.4.3 (until p.110 excluding p.109) of C. Aggarwal, Data Mining - The Textbook
Graph mining: definition of graph mining. Algorithm based on the enumeration tree. Chapter 11 of Mohammed J. Zaki, Wagner Meira, Jr., Data Mining and Machine Learning: Fundamental Concepts and Algorithms
Social networks analysis: properties of social networks. The betweenness measure. The community detection problem. The Girvan–Newman Algorithm for community detection. Chapters 19.1, 19.2.1, 19.2.2, 19.2.3, 19.2.5.3, 19.3 (intro), 19.3.2 of C. Aggarwal, Data Mining - The Textbook
Data stream model: The membership problem: Bloom filter and its analysis. The counting problem: Count-min sketch and its analysis. The cardinality problem: the k-bottom algorithm and its analysis. Finding similar items: definition of Jaccard distance between sets. K-shingling for text documents. Similarity-Preserving Summaries of Sets. Minhashing and Minhash signatures. Locality-sensitive hashing and its analysis. Chapters 12.2.2 (intro), 12.2.2.1, 12.2.2.2 of C. Aggarwal, Data Mining - The Textbook; Chapters 3.1.1, 3.2.1, 3.2.3, 3.3, 3.4 of J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets. See also Ben Langmead's slides.
-
Dear students,
Please find below a set of papers among which you can pick the one to present for the exam. When you choose a paper, please write the title and your name in this file Paper_choice.xlsx to reserve it for the date on which you intend to take the exam.
-
16.FM Index File PDF
-
affix array File PDF
-
CDAWGS File PDF
-
38.WhatsHap File PDF
-
19.Omnitigs File PDF