Data Mining
Academic year 2025–2026
"The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing and retail to healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data. 'Data Scientist' is now the hottest job title in Silicon Valley." – Tim O'Reilly
The course will develop algorithms and statistical techniques for data analysis and mining, with emphasis on massive data sets. It will cover the main aspects of data mining.
Announcements
The first day of class is September 22; it will start a bit late, at 16.30.
You need to register for:
- The class mailing list.
- Google Classroom.
Instructors
Aris Anagnostopoulos, Sapienza University of Rome.
Luca Becchetti, Sapienza University of Rome.
When and where:
Monday 16.00–19.00, Room A5–A6
Thursday 12.00–14.00, Room A5–A6
Office hours
You can use the office hours for any question regarding the class material, past or current homework, general questions on data mining, the meaning of life, pretty much anything. The best resource is Google Classroom. If this is not enough, you can email the TAs and, if needed, the instructors to arrange a meeting.
Textbook and references
The main textbook is "Mining of Massive Datasets," by J. Leskovec, A. Rajaraman, and J. D. Ullman. The printed version has been updated, and you can download the latest version (currently version 3) from the book's web site.
In addition, we will use chapters from other textbooks, all available online:
- C. Aggarwal, "Data Mining: The Textbook," Springer (must be downloaded from Sapienza)
- M. J. Zaki and W. Meira, Jr., "Data Mining and Analysis: Fundamental Concepts and Algorithms," Cambridge University Press
- R. Zafarani, M. A. Abbasi, and H. Liu, "Social Media Mining: An Introduction," Cambridge University Press
- C. D. Manning, P. Raghavan, and H. Schütze, "Introduction to Information Retrieval," Cambridge University Press
- A. Blum, J. Hopcroft, and R. Kannan, "Foundations of Data Science," Cambridge University Press
The following book is not obligatory for the class, but it is very useful for the topic of feature engineering:
- Pablo Duboue, "The Art of Feature Engineering," Cambridge University Press
Finally, we will cover material from various sources, which we will post online as the course proceeds.
Examination format
There are two ways to pass the class:
- Option one: take part in a hackathon that we will set up at the end of the semester, and take a 1-hour written exam on the basic concepts that we have covered during the semester.
- Option two: take a more extended 2-hour written exam.
In addition, we will take into account participation during class.
Syllabus and lecture material
Make sure that you register for the class mailing list and for Google Classroom so that you can get this information. See the Announcements above.