Information Retrieval
The Information Retrieval part of the course will combine in-person and online lectures. The first lecture, on October 9, 2020, will be held in person. Lectures are scheduled as follows:
- Monday, from 9am to 11am (online).
- Friday, from 9am to 11am, Room 5A, H2bis building (in-person).
The course now uses a full team on Microsoft Teams (instead of only a chat). To join it, use the following link: https://teams.microsoft.com/l/team/19%3adfb68332f938413996640dd43c306367%40thread.tacv2/conversations?groupId=a58f9ecb-d21b-4fc2-b2a5-621c0c0da6a5&tenantId=a54b3635-128c-460f-b967-6ded8df82e75
The course Team can be accessed here. The recordings of the lectures are available here:
Important information on the exam
As a general rule, the exam date can be personalised for each student.
However, to help with coordination, there are two fixed dates in December and January:
- December 18, 2020. Please submit the project (code or report) by email on or before December 15.
- January 15, 2021. Please submit the project (code or report) by email on or before January 12.
-
Select the project you would like to prepare for the IR part of the course. If two or three people select the same project, it must be split into two or three distinct projects (see the slides). To request a personalised project, send an email.
-
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Introduction to Information Retrieval
Cambridge University Press, 2008. This is the textbook for the course; the PDF and HTML versions of the book are available from the linked website.
-
Topics:
- Introduction to the course
- Exam info
- History of IR
- Terminology
- Boolean model
- Inverted Index
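As an illustration of the Boolean model and the inverted index, here is a minimal sketch (not the course's own notebook): each term maps to a sorted list of docIDs, and an AND query is answered by a linear merge of two postings lists. The tokenizer and the three toy documents are assumptions chosen for brevity.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters/digits (a deliberately simple normaliser)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted list of docIDs (its postings list)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Boolean AND: merge two sorted postings lists in O(len(p1) + len(p2))."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return answer


docs = [
    "Brutus killed Caesar",
    "Caesar was ambitious",
    "Brutus and Caesar in the Capitol",
]
index = build_inverted_index(docs)
print(intersect(index["brutus"], index["caesar"]))  # [0, 2]
```

Keeping postings sorted is what makes the merge linear; processing the shortest lists first is the usual optimisation for longer AND queries.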
-
Topics:
- Answering queries using the inverted index
- Stemming and lemmatization
- Stop words
- Answering phrase queries: biword index and positional posting
- Data structures for posting lists
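A positional index stores, for each term and document, the token offsets at which the term occurs; a phrase matches when consecutive query terms appear at consecutive positions. The sketch below assumes whitespace-like tokenization with no stemming or stop-word removal, and the function names are illustrative only.

```python
import re
from collections import defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    """term -> {doc_id: [positions]}, positions being token offsets."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def phrase_query(index, phrase):
    terms = tokenize(phrase)
    # `in` does not create keys in a defaultdict, so this check is side-effect free
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term of the phrase
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    result = []
    for doc_id in sorted(candidates):
        # A match starts at position p if the k-th term occurs at p + k for every k
        starts = set(index[terms[0]][doc_id])
        for k, t in enumerate(terms[1:], start=1):
            starts &= {p - k for p in index[t][doc_id]}
        if starts:
            result.append(doc_id)
    return result


docs = ["to be or not to be", "stanford university", "university of stanford"]
index = build_positional_index(docs)
print(phrase_query(index, "stanford university"))  # [1]
```

Document 2 contains both terms but in the wrong order, so it is rejected; a biword index would also reject it, but cannot handle longer phrases without false positives.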
-
Python Notebook and python source code of a simple Boolean Information Retrieval System.
The corpus is available at: http://www.cs.cmu.edu/~ark/personas/.
-
Topics:
- Data structures for dictionaries (hash tables, b-trees, tries)
- Postings list and dictionary compression
- Updating the index
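Postings compression typically stores gaps between consecutive docIDs rather than the docIDs themselves, then encodes each gap with a variable-length code. Below is a sketch of variable byte encoding in the style of the textbook, where each byte carries 7 payload bits and the high bit is set only on the last byte of each number. The function names are hypothetical.

```python
def vb_encode_number(n):
    """Encode a non-negative int in 7-bit groups; high bit marks the final byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # flag the terminating byte
    return out


def vb_encode(postings):
    """Turn a sorted postings list into gaps and VB-encode each gap."""
    encoded, prev = [], 0
    for doc_id in postings:
        encoded.extend(vb_encode_number(doc_id - prev))
        prev = doc_id
    return bytes(encoded)


def vb_decode(data):
    """Inverse of vb_encode: rebuild gaps, then prefix-sum them into docIDs."""
    postings, n, prev = [], 0, 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte
        else:
            n = 128 * n + (byte - 128)
            prev += n
            postings.append(prev)
            n = 0
    return postings


postings = [824, 829, 215406]          # gaps: 824, 5, 214577
code = vb_encode(postings)
print(list(code))                      # [6, 184, 133, 13, 12, 177]
print(vb_decode(code) == postings)     # True
```

Six bytes replace the twelve that three fixed 4-byte integers would need; the saving grows with frequent terms, whose gaps are small.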
-
Topics:
- Wildcard queries
- Spelling corrections
- Query optimization
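One standard approach to wildcard queries is the permuterm index: every rotation of `term + "$"` points back to the term, and a query `X*Y` is rotated to `Y$X*` so it becomes a prefix lookup. The sketch below handles exactly one `*` and uses a sorted list with `bisect` as a stand-in for the B-tree a real dictionary would use; the toy vocabulary is an assumption.

```python
import bisect


def rotations(term):
    """All rotations of term + '$': the permuterm entries for one vocabulary term."""
    t = term + "$"
    return [t[i:] + t[:i] for i in range(len(t))]


def build_permuterm(vocabulary):
    """Return (sorted rotation keys, rotation -> original term)."""
    rot_to_term = {}
    for term in vocabulary:
        for rot in rotations(term):
            rot_to_term[rot] = term
    return sorted(rot_to_term), rot_to_term


def wildcard_lookup(keys, rot_to_term, query):
    """Answer a query containing exactly one '*' (e.g. 'm*n')."""
    if query.count("*") != 1:
        raise ValueError("this sketch handles exactly one '*'")
    prefix, suffix = query.split("*")
    key = suffix + "$" + prefix  # rotate X*Y into Y$X*, then drop the '*'
    # All keys sharing this prefix are contiguous in sorted order
    i = bisect.bisect_left(keys, key)
    matches = set()
    while i < len(keys) and keys[i].startswith(key):
        matches.add(rot_to_term[keys[i]])
        i += 1
    return sorted(matches)


keys, rot_to_term = build_permuterm(["man", "mad", "moan", "moon", "lemon"])
print(wildcard_lookup(keys, rot_to_term, "m*n"))  # ['man', 'moan', 'moon']
```

The cost is dictionary size: each term contributes one entry per character plus one, which is why k-gram indexes are often preferred in practice.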
-
Code to implement spelling correction (with edit distance) in a simple Boolean IR system.
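The core of edit-distance spelling correction is the Levenshtein dynamic program. The sketch below is an independent illustration, not the course code: it keeps only two rows of the table and picks the closest vocabulary term by brute force, which is only reasonable for small vocabularies (a real system would first shortlist candidates, e.g. with a k-gram index).

```python
def edit_distance(s1, s2):
    """Levenshtein distance with unit costs, using two rows of the DP table."""
    m, n = len(s1), len(s2)
    prev = list(range(n + 1))  # distance from "" to each prefix of s2
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete s1[i-1]
                curr[j - 1] + 1,     # insert s2[j-1]
                prev[j - 1] + cost,  # substitute (or match when cost == 0)
            )
        prev = curr
    return prev[n]


def correct(term, vocabulary):
    """Closest vocabulary term; ties go to the alphabetically first term."""
    return min(sorted(vocabulary), key=lambda v: edit_distance(term, v))


print(edit_distance("kitten", "sitting"))                     # 3
print(correct("britus", ["caesar", "brutus", "calpurnia"]))   # brutus
```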
-
Topics:
- Ranked retrieval
- Zone scoring
- tf-idf weighting
- Vector space model
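To make tf-idf and the vector space model concrete, here is a small sketch that weights terms with (1 + log10 tf) · log10(N/df), applies the same weighting to the query, and ranks documents by cosine similarity. This particular weighting scheme is one common choice among several, and the toy documents are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_vectors(docs):
    """Sparse tf-idf vector per document, plus the idf table."""
    counts = [Counter(tokenize(d)) for d in docs]
    N = len(docs)
    df = Counter(term for tf in counts for term in tf)
    idf = {t: math.log10(N / n_t) for t, n_t in df.items()}
    vectors = [{t: (1 + math.log10(c)) * idf[t] for t, c in tf.items()} for tf in counts]
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Return (score, doc_id) pairs, best first."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(tokenize(query))
    # Query terms absent from the collection carry no weight
    q_vec = {t: (1 + math.log10(c)) * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, d), i) for i, d in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


docs = ["the cat sat on the mat", "the dog chased the cat", "dogs and cats living together"]
print([doc_id for _, doc_id in rank("cat mat", docs)])  # [0, 1, 2]
```

Without stemming, "cats" in document 2 does not match "cat", so that document scores zero.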
-
Topics:
- Evaluation of IR systems
- Relevance feedback
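The evaluation measures and relevance feedback listed above can be sketched in a few functions: set-based precision, recall and F1, average precision over a ranked list, and Rocchio query modification on sparse term-weight vectors. The Rocchio defaults (α = 1, β = 0.75, γ = 0.15) are commonly cited values, used here as assumptions; negative weights are clipped to zero.

```python
def precision_recall_f1(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def average_precision(ranking, relevant):
    """Mean of precision@k at each rank k holding a relevant doc;
    relevant docs that are never retrieved contribute 0."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def rocchio(query, relevant_vecs, nonrelevant_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards the relevant centroid and away from the non-relevant one."""
    terms = set(query)
    for v in relevant_vecs + nonrelevant_vecs:
        terms |= set(v)
    new_q = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant_vecs:
            w += beta * sum(v.get(t, 0.0) for v in relevant_vecs) / len(relevant_vecs)
        if nonrelevant_vecs:
            w -= gamma * sum(v.get(t, 0.0) for v in nonrelevant_vecs) / len(nonrelevant_vecs)
        if w > 0:
            new_q[t] = w
    return new_q


print(precision_recall_f1([1, 2, 3, 4], [2, 4, 5]))  # P = 0.5, R = 2/3, F1 = 4/7
print(rocchio({"cat": 1.0}, [{"cat": 1.0, "feline": 1.0}], [{"dog": 1.0}]))
```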
-
Code to rank the results using tf-idf weighting.
-
Topics:
- Probabilistic IR
- Binary Independence Model
- OKAPI weighting
- Bayesian networks
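Okapi weighting is usually presented as BM25. The sketch below follows the form with idf = log(N/df), term-frequency saturation controlled by k1, and document-length normalisation controlled by b; k1 = 1.2 and b = 0.75 are typical defaults, assumed here. Other BM25 variants use a different idf (for example, one with +0.5 smoothing), so scores will differ between implementations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    N = len(docs)
    avg_len = sum(len(t) for t in tokenized) / N
    df = Counter(term for toks in tokenized for term in set(toks))
    q_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in q_terms:
            if term not in tf:
                continue
            idf = math.log10(N / df[term])
            # Long documents (len > avg_len) get a larger denominator
            norm = k1 * ((1 - b) + b * len(toks) / avg_len) + tf[term]
            score += idf * (k1 + 1) * tf[term] / norm
        scores.append(score)
    return scores


docs = ["information retrieval", "information retrieval retrieval models", "cooking recipes"]
print(bm25_scores("retrieval models", docs))
```

Unlike raw tf, the tf component saturates at k1 + 1, so repeating a term many times yields diminishing returns.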
-
List of possible projects for the IR course. Two new types of projects have been added.
-
Topics:
- Structure of the web
- Crawling
- IR on the web
- Storage
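The basic crawling loop keeps a URL frontier and a set of URLs already seen, normalises each extracted link, and enqueues only new ones. The sketch below crawls a hypothetical in-memory "web" (`WEB` and `fetch_links` stand in for HTTP fetching and HTML parsing); a real crawler would also honour robots.txt, rate-limit requests per host, and detect duplicate content.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical in-memory web: page URL -> outlinks (possibly relative)
WEB = {
    "http://a.example/": ["/about", "http://b.example/"],
    "http://a.example/about": ["/"],
    "http://b.example/": ["http://a.example/about#team", "http://c.example/x"],
    "http://c.example/x": [],
}


def fetch_links(url):
    """Stand-in for downloading and parsing a page; unknown URLs have no links."""
    return WEB.get(url, [])


def normalize(base, link):
    """Resolve relative links and drop #fragments so equivalent URLs compare equal."""
    return urldefrag(urljoin(base, link)).url


def crawl(seeds, max_pages=100):
    frontier = deque(seeds)  # FIFO queue: breadth-first crawl order
    seen = set(seeds)        # URLs already enqueued, so none is fetched twice
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)
        for link in fetch_links(url):
            target = normalize(url, link)
            if urlparse(target).scheme in ("http", "https") and target not in seen:
                seen.add(target)
                frontier.append(target)
    return order


print(crawl(["http://a.example/"]))
```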
-
Lecture 10 (PDF)
Topics:
- PageRank
- HITS
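HITS assigns every page a hub score and an authority score: a page's authority is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to, with both vectors normalised on every iteration. The sketch below assumes the graph is a dict from page to its list of outlinks and uses a fixed iteration count rather than a convergence test.

```python
import math


def hits(graph, iterations=50):
    """graph: node -> list of linked nodes. Returns (hub, authority) dicts, L2-normalised."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority(v) = sum of hub scores of the pages linking to v
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # hub(u) = sum of authority scores of the pages u links to
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


graph = {"A": ["C"], "B": ["C", "D"], "C": [], "D": []}
hub, auth = hits(graph)
print(max(hub, key=hub.get), max(auth, key=auth.get))  # B C
```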
-
The TIME collection and a simple graph in JSON format. Both files will be used in Lecture 11.
-
A way to visualise vectors in the vector space model, and a simple PageRank implementation.
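For comparison with the course code, here is an independent PageRank sketch using power iteration on an adjacency dict. With probability `alpha` the surfer teleports to a uniformly random page; pages with no outlinks always teleport, which keeps the scores summing to 1. The default `alpha = 0.15` and the convergence tolerance are assumptions.

```python
def pagerank(graph, alpha=0.15, iterations=100, tol=1e-10):
    """graph: node -> list of outlinks. alpha = teleport probability."""
    nodes = sorted(set(graph) | {v for ts in graph.values() for v in ts})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Probability mass sitting on dangling pages is spread uniformly
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new = {u: alpha / n + (1 - alpha) * dangling / n for u in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            for v in targets:
                new[v] += (1 - alpha) * rank[u] / len(targets)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank


ranks = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
print(max(ranks, key=ranks.get))  # C
```

B has no inbound links, so it only receives teleport mass and ends up with the lowest score.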
-
Topics:
- Matrix factorisation
- Latent Semantic Indexing
- Recommender systems
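Latent Semantic Indexing factorises the term-document matrix with a truncated SVD; recommender systems often use a related idea, learning low-rank user and item factors from only the observed ratings. The sketch below shows the latter (stochastic gradient descent with L2 regularisation), not an SVD. The toy ratings, the rank `k`, and the learning-rate and regularisation values are all illustrative assumptions.

```python
import random


def factorise(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Learn P (users x k) and Q (items x k) so that P[u] . Q[i] approximates r
    for each observed (u, i, r); unobserved entries are never used in training."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# (user, item, rating): users 0-1 like items 0-1, users 2-3 like items 2-3
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 2, 1), (1, 3, 1),
           (2, 1, 1), (2, 2, 5), (2, 3, 4), (3, 0, 1), (3, 2, 4), (3, 3, 5)]
P, Q = factorise(ratings, n_users=4, n_items=4)
print(round(rmse(P, Q, ratings), 3))
print(round(predict(P, Q, 0, 2), 2))  # predicted rating for an unseen (user, item) pair
```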
-
Topics:
- Multimedia IR
- IR and Neural Networks
- Where to go next