Section outline

  • The Information Retrieval part of the course combines in-person and online lectures. The first lecture is on October 9, 2020, and will be held in person. The weekly schedule is:

    • Monday, from 9am to 11am (online).
    • Friday, from 9am to 11am, Room 5A, H2bis building (in-person).


    The course now has a full Team on Microsoft Teams (instead of only a chat). To join it, use the following link: https://teams.microsoft.com/l/team/19%3adfb68332f938413996640dd43c306367%40thread.tacv2/conversations?groupId=a58f9ecb-d21b-4fc2-b2a5-621c0c0da6a5&tenantId=a54b3635-128c-460f-b967-6ded8df82e75

    The course Team can be accessed here.

    The recordings of the lectures are available here:


    Important information on the exam

    As a general rule, the exam date can be personalised for each student.
    However, to help with coordination, there are two fixed dates, one in December and one in January:

    • December 18, 2020. Please submit the project (code or report) by email on or before December 15.
    • January 15, 2021. Please submit the project (code or report) by email on or before January 12.

    • Choose the project you would like to prepare for the IR part of the course. If two or three people choose the same project, it must be split into two or three distinct projects (see slides). To ask for a personalised project, send an email.

    • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
      Introduction to Information Retrieval
      Cambridge University Press. 2008.

      This is the textbook for the course. The PDF and HTML versions of the book are available from the linked website.


    • Topics:

      • Introduction to the course
      • Exam info
      • History of IR
      • Terminology
      • Boolean model
      • Inverted Index

    • Topics:

      • Answering queries using the inverted index
      • Stemming and lemmatization
      • Stop words
      • Answering phrase queries: biword indexes and positional postings
      • Data structures for posting lists
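    To make the phrase-query topic concrete, here is a minimal sketch of a positional index, assuming lowercase whitespace tokenisation; the function names are hypothetical and this is not the course code. Each term maps to the positions where it occurs in each document, and a phrase matches when its k-th term occurs exactly k positions after the first term.

```python
from collections import defaultdict


def build_positional_index(docs: list[str]) -> dict[str, dict[int, list[int]]]:
    """Map term -> {doc_id: [token positions of the term in that document]}."""
    index: dict[str, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        # Assumption: naive tokenisation (lowercase + whitespace split).
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def phrase_query(index: dict[str, dict[int, list[int]]], phrase: str) -> list[int]:
    """Return the sorted IDs of documents containing `phrase` as consecutive terms."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term of the phrase can match.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    result = []
    for doc_id in sorted(candidates):
        # The phrase starts at p if terms[k] occurs at position p + k for all k.
        starts = set(index[terms[0]][doc_id])
        for k, t in enumerate(terms[1:], start=1):
            starts &= {p - k for p in index[t][doc_id]}
        if starts:
            result.append(doc_id)
    return result


docs = ["to be or not to be", "be to or not", "not to be"]
index = build_positional_index(docs)
print(phrase_query(index, "to be"))  # [0, 2]
```

    Document 1 contains both "to" and "be" but not adjacently, which is exactly the case a plain Boolean AND query cannot distinguish.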

    • Python notebook and Python source code of a simple Boolean Information Retrieval system.

      The corpus is available at: http://www.cs.cmu.edu/~ark/personas/.
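    As a hedged illustration of what the core of such a system involves (not the course notebook itself), the sketch below builds an inverted index that maps each term to a sorted postings list of document IDs and answers an AND query by merging postings lists, shortest first. It assumes lowercasing and whitespace tokenisation only (no stemming or stop-word removal), and the names `build_index`, `intersect` and `and_query` are hypothetical.

```python
from collections import defaultdict


def build_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of IDs of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        # Assumption: naive tokenisation (lowercase + whitespace split).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Linear-time merge of two sorted postings lists (Boolean AND)."""
    i, j, result = 0, 0, []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def and_query(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Answer t1 AND t2 AND ..., processing the shortest postings lists first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = ["new home sales top forecasts",
        "home sales rise in july",
        "increase in home sales in july"]
index = build_index(docs)
print(and_query(index, ["home", "july"]))  # [1, 2]
```

    Starting from the shortest postings list keeps intermediate results small, since an intersection can never be longer than its shorter input.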

    • Topics:

      • Data structures for dictionaries (hash tables, b-trees, tries)
      • Postings list and dictionary compression
      • Updating the index
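    As one example of postings compression, here is a minimal sketch of variable byte (VB) encoding applied to the gaps between successive document IDs. It follows the convention described in Manning et al. (chapter 5), where each byte carries 7 payload bits and the high bit is set only on the last byte of a number; the function names are hypothetical.

```python
def vb_encode_number(n: int) -> bytes:
    """VB code of n >= 0: 7 payload bits per byte, high bit set on the last byte."""
    out = []
    while True:
        out.insert(0, n % 128)  # prepend the lowest 7 bits
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # mark the final byte of this number
    return bytes(out)


def encode_postings(doc_ids: list[int]) -> bytes:
    """Store a sorted postings list as VB-coded gaps between document IDs."""
    result = bytearray()
    prev = 0
    for doc_id in doc_ids:
        result += vb_encode_number(doc_id - prev)
        prev = doc_id
    return bytes(result)


def decode_postings(data: bytes) -> list[int]:
    """Invert encode_postings: decode the gaps, then take running sums."""
    doc_ids, n, prev = [], 0, 0
    for byte in data:
        if byte < 128:
            n = 128 * n + byte  # continuation byte
        else:
            n = 128 * n + (byte - 128)  # last byte of this gap
            prev += n
            doc_ids.append(prev)
            n = 0
    return doc_ids


encoded = encode_postings([824, 829, 215406])
print(list(encoded))  # [6, 184, 133, 13, 12, 177]
print(decode_postings(encoded))  # [824, 829, 215406]
```

    Storing gaps instead of raw IDs pays off because gaps in long postings lists are mostly small and fit in a single byte.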

    • Topics:

      • Wildcard queries
      • Spelling correction
      • Query optimization

    • Code to implement spelling correction (with edit distance) in a simple Boolean IR system.
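    The course code is not reproduced here. As a hedged illustration, the standard dynamic-programming Levenshtein distance (unit cost for insertion, deletion and substitution) and a naive corrector that returns the closest vocabulary term could look like this. `correct` and the toy vocabulary are hypothetical; a real system would first narrow the candidates (for example with a k-gram index) instead of scanning the whole vocabulary.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of unit-cost insertions,
    deletions and substitutions that turn s into t."""
    # prev[j] is the distance between the previous prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i in range(1, len(s) + 1):
        curr = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete s[i-1]
                          curr[j - 1] + 1,     # insert t[j-1]
                          prev[j - 1] + cost)  # substitute (or match)
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary term closest to `word` (ties: first in list order)."""
    return min(vocabulary, key=lambda term: edit_distance(word, term))


print(edit_distance("kitten", "sitting"))         # 3
print(correct("hom", ["home", "house", "july"]))  # home
```

    Keeping only the previous row of the dynamic-programming table reduces memory from O(|s|·|t|) to O(|t|).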

    • Topics:

      • Ranked retrieval
      • Zone scoring
      • tf-idf weighting
      • Vector space model

    • Topics:

      • Evaluation of IR systems
      • Relevance feedback
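    For the evaluation topic, a minimal sketch of the set-based measures for a single query: precision is the fraction of retrieved documents that are relevant, recall is the fraction of relevant documents that are retrieved, and F1 is their harmonic mean. The function name and the toy relevance judgements are hypothetical.

```python
def precision_recall_f1(retrieved: list[int], relevant: set[int]) -> tuple[float, float, float]:
    """Set-based precision, recall and balanced F-measure (F1) for one query."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # If hits > 0 then precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Documents 2 and 4 are both retrieved and relevant; document 5 is missed.
print(precision_recall_f1([1, 2, 3, 4], {2, 4, 5}))
# precision 0.5, recall ≈ 0.667, F1 ≈ 0.571
```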

    • Code to implement the ranking of the results via tf-idf.
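    As a hedged sketch (not the course code), tf-idf ranking can be implemented with a scheme in which a document's score is the sum, over query terms, of the log-frequency weight times the idf: score(q, d) = Σ (1 + log10 tf_{t,d}) · log10(N / df_t). This is one common variant among several; `tfidf_rank` is a hypothetical name, and tokenisation is again plain lowercasing and whitespace splitting.

```python
import math
from collections import Counter


def tfidf_rank(docs: list[str], query: str) -> list[tuple[int, float]]:
    """Rank documents by the sum over query terms of (1 + log10 tf) * log10(N / df)."""
    term_freqs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    # Document frequency: each term is counted once per document containing it.
    df = Counter(term for tf in term_freqs for term in tf)
    scores = []
    for doc_id, tf in enumerate(term_freqs):
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term] > 0:
                score += (1 + math.log10(tf[term])) * math.log10(n / df[term])
        if score > 0:  # documents with a zero score are omitted
            scores.append((doc_id, score))
    # Highest score first; ties broken by document ID.
    return sorted(scores, key=lambda x: (-x[1], x[0]))


docs = ["car insurance auto insurance", "best car", "auto repair"]
print([doc_id for doc_id, _ in tfidf_rank(docs, "insurance car")])  # [0, 1]
```

    Note that a term occurring in every document gets idf log10(1) = 0 and therefore contributes nothing to the ranking.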

    • Topics:

      • Probabilistic IR
      • Binary Independence Model
      • OKAPI weighting
      • Bayesian networks
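    A hedged sketch of Okapi BM25 scoring follows. The idf factor uses the simple log10(N / df_t) form, and k1 = 1.2, b = 0.75 are commonly used defaults; both are assumptions rather than values taken from the course slides. k1 controls how quickly repeated occurrences of a term saturate, and b controls how strongly scores are normalised by document length. `bm25_scores` is a hypothetical name.

```python
import math
from collections import Counter


def bm25_scores(docs: list[str], query: str, k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for a bag-of-words `query`."""
    tokenised = [d.lower().split() for d in docs]
    n = len(docs)
    avg_len = sum(len(toks) for toks in tokenised) / n
    df = Counter(term for toks in tokenised for term in set(toks))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        # Length normalisation: long documents need more occurrences for the same score.
        norm = k1 * ((1 - b) + b * len(toks) / avg_len)
        score = 0.0
        for term in set(query.lower().split()):
            if tf[term]:
                idf = math.log10(n / df[term])  # assumption: simple idf variant
                score += idf * (k1 + 1) * tf[term] / (norm + tf[term])
        scores.append(score)
    return scores


docs = ["information retrieval course", "retrieval retrieval models", "data visualization"]
print(bm25_scores(docs, "retrieval"))  # document 1 scores highest, document 2 scores 0.0
```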

    • List of possible projects for the IR course. Two new types of projects have been added.

    • Topics:

      • Structure of the web
      • Crawling
      • IR on the web
      • Storage

    • Topics:

      • PageRank 
      • HITS
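    A hedged sketch of HITS, assuming the link graph is given as a dictionary mapping each page to the pages it links to. Each iteration sets a page's authority score to the sum of the hub scores of the pages linking to it, then sets its hub score to the sum of the authority scores of the pages it links to, and L2-normalises both. The fixed iteration count is an assumption; a fuller implementation would test for convergence.

```python
import math


def hits(graph: dict[str, list[str]], iterations: int = 50) -> tuple[dict[str, float], dict[str, float]]:
    """Return (hub, authority) scores computed by iterative HITS updates."""
    nodes = sorted(set(graph) | {v for out in graph.values() for v in out})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of v: sum of the hub scores of the pages linking to v.
        auth = {v: 0.0 for v in nodes}
        for u, out in graph.items():
            for v in out:
                auth[v] += hub[u]
        # Hub of u: sum of the authority scores of the pages u links to.
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        # L2-normalise both score vectors (guarding against an all-zero vector).
        for scores in (auth, hub):
            norm = math.sqrt(sum(x * x for x in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth


graph = {"h1": ["a1", "a2"], "h2": ["a1"]}
hub, auth = hits(graph)
print(max(auth, key=auth.get), max(hub, key=hub.get))  # a1 h1
```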

    • The TIME collection and a simple graph in JSON format. Both files will be used in Lecture 11.

    • A way to show vectors in the vector space model and a simple PageRank implementation.
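    The course implementation is not reproduced here. As a hedged sketch, PageRank can be computed by power iteration over an adjacency-list dictionary; that dictionary format is an assumption (it is not necessarily the layout of the JSON graph file), and damping = 0.85 (a teleport probability of 0.15) is a common choice rather than a value from the course.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """PageRank by power iteration.

    graph maps each node to the list of nodes it links to. With probability
    `damping` the random surfer follows a random out-link; otherwise (and
    always from a dangling node with no out-links) it jumps to a uniformly
    random node.
    """
    nodes = sorted(set(graph) | {v for out in graph.values() for v in out})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, out in graph.items():
            for v in out:
                new[v] += damping * rank[u] / len(out)
        # Stop when the L1 change between iterations is negligible.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c
```

    Handling dangling nodes explicitly keeps the scores a probability distribution (they always sum to 1).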

    • Topics:

      • Matrix factorisation
      • Latent Semantic Indexing
      • Recommender systems

    • Topics:

      • Multimedia IR
      • IR and Neural Networks
      • Where to go next

  • The lectures on Data Visualization start on Friday, November 20. We will keep the established schedule, that is, Mondays and Fridays from 9:00 to 11:00. All lectures will be online only. You can access the Team here.

    All students who were members of the Information Retrieval Team are also members of the Data Visualization Team. If this is not the case for you, please let me know.