481SM - INFORMATION RETRIEVAL AND DATA VISUALIZATION 2021
-
The Information Retrieval part of the course will consist of in-person lectures, which will be recorded.
There will be two lectures each week:
- Monday, from 11:00 to 13:00. Aula Morin, H2bis building.
- Thursday, from 9:00 to 11:00 (previously 10:00 to 12:00). Room 4C, H2bis building.
The first lecture will be on October 4, 2021.
The code to join the MS Teams team of the course is: t0i8w8d
Exam
The exam for the Information Retrieval part of the course consists of a project plus an oral exam.
The exam dates for December and January are:
- December 21, 2021. From 9:30 am in room 5B, H2bis building.
- January 14, 2022. From 9:30 am in room 4D (previously 5B), H2bis building. Check the MS Teams of the course for the schedule!
Please submit your project (code or report) by email at least 5 days before the exam.
If you need to take the exam remotely, please tell me when you submit your project, so that I can update the schedule.
If you need to take the exam earlier for scheduling reasons (e.g., in the previous week), please write to me so that, if enough people request it, I can set an additional date in December.
After January the exam will be by appointment.
-
Projects File PDF
-
Slides File PDF
-
Dataset File ALL
-
The lectures on Data Visualization will start on Wednesday, November 17. The lectures will take place in building H2bis, room 4C for three weeks as follows:
- on Wednesdays from 14:00 to 17:00
- on Thursdays from 9:00 to 13:00
The lectures will be recorded and available on Teams, but I encourage attendance in person whenever possible.
Note that the lessons on November 17 and 18 will be held online.
-
A collection of various freely available data sources
-
Introduction to the course
-
Foundations File PDF
Foundations of data visualization: definition, motivation and historical visualizations. The purposes of data visualization. The three principles of good visualization design: trustworthiness, accessibility and elegance. Differences between truth and trust and various ways to lie with visualizations. Guidelines for creating accessible visualizations. The importance of data-ink ratio and the reasons against uncommon charts. Inspiration for elegant visualizations.
-
Data abstraction: motivation, types of datasets, attribute types and semantics.
-
Task abstraction: motivation, goals and tasks, actions and targets, examples of efficient designs for particular visual tasks.
-
How vision works, the importance of attention and its implications for design. How memory works and its implications for presentations. Visual encoding and the expressiveness of the visual channels. Channel accuracy and its implications for visualization design. Channel discriminability, salience and separability, and their implications for visualization design. The Gestalt laws of grouping (proximity, similarity, connection, enclosure, closure and figure/ground), their hierarchy and their use in data visualization. How to obtain visual order.
-
List issues with the given visualization and improve it
-
The results of the first assignment
-
An installation guide to the Python libraries required for the hands-on lesson
Note that these instructions have been updated because the original ones weren't working for some of the students.
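A quick way to confirm that the installation worked is to check that the interpreter can find the libraries. The following is a minimal sketch: the package list (plotly, dash, pandas) is an assumption based on the notebooks used in the course, not the official list from the installation guide, and `check_packages` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed package list; adjust it to match the installation guide.
REQUIRED = ["plotly", "dash", "pandas"]


def check_packages(names):
    """Return a dict mapping each package name to True if it is importable."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any package reported as MISSING needs to be installed in the same environment that runs Jupyter.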
-
Test Plotly File IPYNB
A test of the Plotly Python library
-
Test Dash File IPYNB
A test of the Dash Python library
-
Color File PDF
The importance of color. Color perception: the anatomy and physiology of the human eye, the trichromatic theory and the opponent processing theory. Color specification with color spaces RGB and HSL, CIE Lab and HCL. Demonstrations using various color pickers. Intuitiveness and perceptual uniformity of each space. Use of color. Sequential color maps. Issues with the rainbow color map. Alternative sequential color maps (including cubehelix). Diverging and categorical color maps. The desired properties of univariate color maps. Bivariate color maps. Using established color maps (colorbrewer) or constructing new ones. The semantics of color. Considerations for colorblind people. The importance of size and contrast. The importance of background and surrounding colors. Advice on choosing colors.
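The point about perceptual uniformity can be illustrated with a small sketch using only the Python standard library. It compares the HSL lightness of two saturated colors with their sRGB relative luminance, which is used here as a rough stand-in for perceived lightness (an assumption of this example, not necessarily the measure used in the slides). The helper `relative_luminance` is a hypothetical name.

```python
import colorsys


def relative_luminance(r, g, b):
    """Relative luminance of an sRGB color with components in [0, 1]."""
    def linearize(c):
        # Undo the sRGB gamma encoding.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)


colors = {"yellow": (1.0, 1.0, 0.0), "blue": (0.0, 0.0, 1.0)}
for name, rgb in colors.items():
    h, l, s = colorsys.rgb_to_hls(*rgb)  # note the H, L, S return order
    lum = relative_luminance(*rgb)
    print(f"{name}: HSL lightness = {l:.2f}, relative luminance = {lum:.4f}")
```

Both colors have the same HSL lightness (0.5), yet their relative luminance differs by more than a factor of ten, which is one reason HSL is not considered perceptually uniform.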
-
The seven steps of visualization design. Different ways to acquire data. The importance of parsing and filtering data. Mining data for exploratory data analysis. Choosing the right representation for the given data and the given task. Online repositories of charts for various purposes. Refining the visualization and supporting interactivity.
The dos and don’ts of basic charts (line charts, bar charts). Stacked bar charts and pie charts. Visualizing geographical data with dot distribution maps and choropleth maps. Visualizing geographical data with tile maps. Visualizing networks and trees with node-link diagrams and adjacency matrices. Visualizing multidimensional data with Chernoff faces, bubble plots, the scatter plot matrix, parallel coordinates, radar charts, radial histograms, small multiples and horizon charts. Using principal component analysis and multidimensional scaling for visual exploratory data analysis.
Visualizing uncertain and missing data. Advantages and disadvantages of interactivity. Using interactivity for data adjustments (framing, navigating, animating, sequencing and contributing) and presentation adjustments (focusing, annotating and orientating). Examples of interaction, animation and storytelling. Examples of available visualization tools (D3, Observable, Tableau and Processing).
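The use of principal component analysis for visual exploratory analysis, mentioned above, can be sketched with NumPy as follows. This is an illustration under stated assumptions: the data are synthetic random numbers, `pca_project` is a hypothetical helper, and the lectures may use a different library for the same purpose.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    Xc = X - X.mean(axis=0)  # center each attribute
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
# Synthetic data: 100 items with 5 attributes, two of them strongly correlated.
X = rng.normal(size=(100, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

coords, explained = pca_project(X)
print(coords.shape, explained)
```

The two columns of `coords` can then be drawn as an ordinary scatter plot, with `explained` indicating how much of the total variance that two-dimensional view retains.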
-
Instructions for the second assignment
-
Exam File PDF
Information about the exam (in project form)
-
The results of the second assignment
-
Visualizations that lie by using dubious data, such as unrepresentative data and missing data. Using non-comparable data in comparisons. Using absolute instead of cumulative data (and vice versa). Using absolute instead of relative data on maps. Examples of ignoring conventions (unequal intervals, pie charts that do not add up to 100%) and abusing scales (bar charts with truncated axes, aspect ratio bias, dual axes, improper scaling of areas and pictograms). Examples of misrepresenting data by using unnecessary 3-D visualizations. Examples of improper categorization and oversimplification. Examples of cherry-picking data in order to hide (unfavorable) data or conceal existing patterns. Examples of visualizations suggesting patterns that are not there. Examples of misrepresenting or concealing uncertainty. Examples of erroneous interpretation of visualizations due to confirmation bias.
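The effect of a truncated axis on a bar chart can be checked with simple arithmetic. The sketch below compares the ratio of two values with the ratio of the bar heights actually drawn when the axis does not start at zero. The numbers are hypothetical and `bar_height_ratio` is an illustrative helper name.

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of two bars when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values, chosen only for illustration.
a, b = 50.0, 55.0
honest = bar_height_ratio(a, b)                   # axis starts at 0
truncated = bar_height_ratio(a, b, baseline=45)   # axis starts at 45
print(f"true ratio: {honest:.2f}, drawn ratio with truncated axis: {truncated:.2f}")
```

A 10% difference in the data is drawn as a bar twice as tall once the axis starts at 45, which is why bar charts are expected to start at zero.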
-
Examples of accessible visualizations: redesign of diversity of aging, plots in Excel, slope graphs, connected scatter plots, smoothed line charts, the importance of notations. Guidelines for creating accessible visualizations.
-
Entire Gapminder data
-
Additional information about the Gapminder data
-
A small snippet of the Gapminder data containing only the data for Italy and South Africa between 2016 and 2020
-
A Jupyter notebook that helps explain how the Plotly library constructs figures
-
Gapminder File IPYNB
A Jupyter notebook visualizing Gapminder data in an interactive way (using the Plotly and Dash libraries):
- Creating a basic scatter plot and styling it to meet the Gapminder style
- Adding animation to the Gapminder bubble chart, choosing chart axes with Dash dropdowns
-