Data Mining

Academic year 2025–2026

"The success of companies like Google, Facebook, Amazon, and Netflix, not to mention Wall Street firms and industries from manufacturing and retail to healthcare, is increasingly driven by better tools for extracting meaning from very large quantities of data. 'Data Scientist' is now the hottest job title in Silicon Valley."      – Tim O'Reilly

 

The course will develop algorithms and statistical techniques for data analysis and mining, with an emphasis on massive data sets. It will cover the main aspects of data mining.

 

Announcements

The first day of class is September 22; it will start a bit late, at 16.30.

You need to register for:

  • The class mailing list.
  • Google classroom.
Details will be given after the first day of class.

 

Instructors

Aris Anagnostopoulos, Sapienza University of Rome.

Luca Becchetti, Sapienza University of Rome.

 

When and where:

Monday 16.00–19.00, Room A5–A6

Thursday 12.00–14.00, Room A5–A6

 

Office hours

You can use the office hours for any question regarding the class material, past or current homework, general questions on data mining, the meaning of life, pretty much anything. The best resource is Google Classroom. If this is not enough, you can send an email to the TAs and, if needed, to the instructors to arrange a meeting.

 

Textbook and references

The main textbook is "Mining of Massive Datasets," by J. Leskovec, A. Rajaraman, and J. D. Ullman. The printed version has been updated, and you can download the latest version (currently 3) from the book's web site.

In addition, we will use chapters from other textbooks, all available online:

The following book is not obligatory for the class, but it is very useful for the topic of feature engineering:

Finally, we will cover material from various sources, which we will post online as the course proceeds.

 

Examination format

There are two ways to pass the class:

One is:
  • Take part in a hackathon that we will set up at the end of the semester.
  • Take a 1-hour written exam, where we will ask you about basic concepts that we have covered during the semester.
The second one is:
  • Take a more extended 2-hour written exam.

In addition, we will take into account participation during class.

 

Syllabus and lecture material

Make sure that you register for the class mailing list and Google Classroom to be able to get this information. See the Announcements above.