481SM - INFORMATION RETRIEVAL AND DATA VISUALIZATION 2020
-
The Information Retrieval part of the course combines in-person and online lectures. The first lecture, on October 9, 2020, will be in person.
- Monday, from 9am to 11am (online).
- Friday, from 9am to 11am, Room 5A, H2bis building (in-person).
The course has moved to a full Team (instead of only a chat). To join it, use the following link: https://teams.microsoft.com/l/team/19%3adfb68332f938413996640dd43c306367%40thread.tacv2/conversations?groupId=a58f9ecb-d21b-4fc2-b2a5-621c0c0da6a5&tenantId=a54b3635-128c-460f-b967-6ded8df82e75
The Team of the course can be accessed here.
The recordings of the lectures are available here:
Important information on the exam
As a general rule, the date of the exam can be personalised for each person.
However, to help with coordination, there are two fixed dates for December and January:
- December 18, 2020. Please submit the project (code or report) by email on or before December 15.
- January 15, 2021. Please submit the project (code or report) by email on or before January 12.
-
Select which project you would like to prepare for the IR part of the course. If two or three people select the same project, it must be split into two or three distinct projects (see slides). Write an email to ask for a personalised project.
-
Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
Introduction to Information Retrieval
Cambridge University Press, 2008.
This is the textbook for the course. The PDF and HTML versions of the book are available from the linked website.
-
Topics:
- Introduction to the course
- Exam info
- History of IR
- Terminology
- Boolean model
- Inverted Index
-
Topics:
- Answering queries using the inverted index
- Stemming and lemmatization
- Stop words
- Answering phrase queries: biword index and positional postings
- Data structures for postings lists
-
Python notebook and Python source code of a simple Boolean Information Retrieval system.
The corpus is available at: http://www.cs.cmu.edu/~ark/personas/.
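For orientation, the core of a Boolean retrieval system can be sketched in a few lines. This is only a minimal illustration and not the course notebook: the toy documents and the function names `build_index`, `intersect` and `and_query` are assumptions made for the example, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """Merge two sorted postings lists, keeping the IDs present in both (AND)."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def and_query(index, terms):
    """Answer a conjunctive query, intersecting the shortest postings first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result

# Hypothetical toy corpus standing in for the real one.
docs = [
    "a detective investigates a murder",
    "a young detective falls in love",
    "a love story set in Paris",
]
index = build_index(docs)
print(and_query(index, ["detective", "love"]))
```

Processing the shortest postings lists first keeps the intermediate results small, which is the usual heuristic for conjunctive queries.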
-
Topics:
- Data structures for dictionaries (hash tables, B-trees, tries)
- Postings list and dictionary compression
- Updating the index
-
Topics:
- Wildcard queries
- Spelling corrections
- Query optimization
-
Code to implement spelling correction (with edit distance) in a simple Boolean IR system.
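As a rough idea of what spelling correction with edit distance involves, the sketch below computes the Levenshtein distance with dynamic programming and picks the closest vocabulary term. It is not the course code: the names `edit_distance` and `correct` and the small vocabulary are assumptions, and a real system would restrict the candidates (for example with a k-gram index) instead of scanning the whole dictionary.

```python
def edit_distance(s, t):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    # prev[j] holds the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def correct(word, vocabulary):
    """Return the vocabulary term closest to word (ties broken alphabetically)."""
    return min(sorted(vocabulary), key=lambda term: edit_distance(word, term))

# Hypothetical vocabulary, for illustration only.
vocabulary = ["retrieval", "revival", "arrival"]
print(correct("retreival", vocabulary))
```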
-
Topics:
- Ranked retrieval
- Zone scoring
- tf-idf weighting
- Vector space model
-
Topics:
- Evaluation of IR systems
- Relevance feedback
-
Code to implement the ranking of the results via tf-idf
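One possible way to rank results with tf-idf is sketched below. It is an illustration under stated assumptions rather than the course code: it uses the log-scaled term frequency `1 + log10(tf)` and `idf = log10(N / df)`, scores a document by summing the weights of the query terms it contains, and the function name `tf_idf_rank` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_rank(docs, query):
    """Return the IDs of the documents with a positive score, best first."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for doc_id, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] > 0:
                weight_tf = 1 + math.log10(tf[term])
                idf = math.log10(n_docs / df[term])
                score += weight_tf * idf
        scores.append((score, doc_id))
    # Sort by decreasing score; ties are broken by document ID.
    ranked = sorted(scores, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]

# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "the cat and the other cat",
]
print(tf_idf_rank(docs, "cat dog"))
```

In this toy corpus the rare term "dog" carries more weight than the common term "cat", so the document containing both is ranked first.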
-
Topics:
- Probabilistic IR
- Binary Independence Model
- OKAPI weighting
- Bayesian networks
-
List of possible projects for the IR course. Two new types of projects have been added.
-
Topics:
- Structure of the web
- Crawling
- IR on the web
- Storage
-
Lecture 10 (PDF)
Topics:
- PageRank
- HITS
-
The TIME collection and a simple graph in JSON format. The two files will be used in Lecture 11.
-
A way to show vectors in the vector space model and a simple PageRank implementation.
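A simple PageRank implementation could look like the following power-iteration sketch. It is not the file distributed with the course: the graph representation (a dictionary from each node to the list of nodes it links to), the teleportation probability `alpha = 0.15`, the fixed number of iterations and the example graph are all assumptions. Every link target is assumed to appear as a key of the dictionary.

```python
def pagerank(graph, alpha=0.15, iterations=50):
    """Approximate PageRank by power iteration with teleportation."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {node: alpha / n for node in nodes}
        for node in nodes:
            out_links = graph[node]
            if out_links:
                # Split the remaining rank evenly among the out-links.
                share = (1 - alpha) * rank[node] / len(out_links)
                for target in out_links:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = (1 - alpha) * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Page C collects links from all the other pages and ends up with the highest score, while D, which nobody links to, keeps only the teleportation share.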
-
Topics:
- Matrix factorisation
- Latent Semantic Indexing
- Recommender systems
-
Topics:
- Multimedia IR
- IR and Neural Networks
- Where to go next
-
The lectures on Data Visualization start on Friday, November 20. We will continue with the established schedule, that is, Mondays and Fridays from 9:00 to 11:00. All lectures will be online only. You can access the Team here.
All students who were members of the Information Retrieval Team are also members of the Data Visualization Team. If this is not the case for you, let me know.
-
A collection of various freely available data sources
-
Course introduction
Foundations
- What is data visualization
- Why visualize data
- Historical visualizations
-
Foundations (PDF)
-
Foundations
- The three principles of good visualization design
Data abstraction
- Dataset types
-
Data abstraction
- Attribute types and semantics
Task abstraction
- Goals and tasks
- Actions and targets
Human visual perception
- Motivation
- Memory
- Visual encoding
- Channel accuracy
-
Human visual perception
- Channel discriminability
- Channel salience (pop-out)
- Channel separability
- Grouping
- Visual order
-
Human visual perception
- Color perception
- Color specification
-
Human visual perception
- Color use: color maps, semantics of color, considerations for color blind people, the importance of size and contrast
-
Human visual perception
- Color use: importance of background and surrounding color, choosing color
Visualization design
- The seven steps of visualization design
- Basic charts: line, bar and pie charts, dot and choropleth maps
-
Visualization design
- Basic charts: tile maps, node-link diagrams and adjacency matrices
- Visualizing multidimensional data
- Visualizing uncertain data
-
The description of the third assignment
-
The data for the third assignment
-
Visualization design
- Visualizing missing data
- Using interactivity for data adjustments and presentation adjustments
- Examples of interaction, animation and storytelling
- Visualization tools (D3, Tableau)
Examples of (un)trustworthy visualizations
- Visualizations using dubious data
-
Reviewing the results of the third assignment
Information about the exam
Examples of (un)trustworthy visualizations
- Ignoring conventions
- Abusing scales
-
Results of the third assignment
-
Exam (PDF)
Information about the exam
-
Python libraries needed for hands-on lessons (Plotly and Dash)
Examples of (un)trustworthy visualizations
- Improper categorization
- Oversimplifying
- Ignoring uncertainty
- Confirmation bias
Examples of (in)accessible visualizations
-
Test Plotly (IPYNB)
-
Test Dash (IPYNB)
-
How to create accessible visualizations
Visualizing COVID-19
Hands-on example in Python
- The Plotly library
- Recreating the interactive Gapminder bubble chart
-
Plotly Basics (IPYNB)
-
Hands-on example in Python
- Adding animation to the Gapminder bubble chart
- The Dash library
- Using dropdowns to select axes of the chart
-