Data Mining

Academic year 2017–2018

Course Projects

You have a lot of freedom in choosing the type of project. It can be something that interests you: for example, a Facebook or Android application that you would like to design and that involves topics related to data mining. If you do research, the project may be related to your research. Data mining is a rather broad field, so you will probably find something of interest to you. If you work for a company, it could be related to your work there.

In addition, a very large collection of public datasets from various Italian government bodies is available. These bodies are very interested in seeing their data used and analyzed toward something useful.

The following sites are good starting points, as they link to many public sites that provide data:

http://www.dati.gov.it
http://www.opendatahub.it

You can search the various sites and see what datasets are available. As you will see, there are various sources and types of data: real-time traffic data, financing, health, culture, and more.

What can these data tell us about a problem, and how could we use them to do something useful? Explore the data and come up with a project topic that you find interesting. Topics that combine information from more than one dataset are of particular interest.

For instance, one possible topic is to look through the http://www.opencoesione.gov.it web site and find a way (possibly by combining it with other datasets) to evaluate whether public projects are successful, in which regions public funds are being used better or worse, and so on. There is particular interest in such a topic, and if the project is interesting there may be the possibility of an internship.

There is a huge number of datasets; search around and find an interesting problem.

Besides these public datasets, you may propose something else: a social-network application, an analysis of Twitter data addressing a particular problem, and so on.

Another option, if you have well-thought-out ideas that you think could improve the algorithms or techniques we have covered, is to discuss them with us and try to test them.

You are allowed, and actually encouraged, to use the technologies that we used during the tutorials, whether your own implementations or someone else's.

 

Research-oriented problems

If you are particularly interested in the topic and would like to work on a project that is more original, more challenging, and may lead to a thesis, let us know, and we can work on a problem together. Compared to standard projects, such projects require:

  • More effort/time
  • More theoretical skills
  • Study of relevant literature to identify and solve problems

Projects on pervasive systems (in collaboration with Andrea Vitaletti)

Another option for motivated students is to work in the area of pervasive systems, in collaboration with Andrea Vitaletti.

  1. Efficient data management for underwater networks
  2. Privacy preserving machine learning
  3. Privacy preserving data management
  4. Data Management for Medical Clinical Trials
  5. Data Integration from open data sources

Projects on the Internet of Things (in collaboration with Ioannis Chatzigiannakis)

Yet another option is to work on the Internet of Things, in collaboration with Ioannis Chatzigiannakis.

  1. Scalable Data Processing on Android Devices
  2. Scalable Data Processing on Raspberry Pi Cluster
  3. Machine Learning on Healthcare Applications
  4. Machine Learning on Network Intrusion Detection
  5. Data Analysis for Performance of Buildings


Procedure:

Projects are done individually. First you should find a topic. Then email your proposal to Aris. He will tell you whether it is OK or whether it is too easy or too hard, and give you suggestions; if needed, you can arrange a meeting to discuss it.

Finding the project topic is also part of the project. However, if you have trouble finding a concrete topic, you can contact the instructor with some general ideas, and together you can narrow them down to a specific problem.

Of course, if you are more interested in the research-oriented problems or in the projects in collaboration with the other instructors, let Aris know.

There is no deadline for the project. It must be ready before the exam session (appello) that you want to attend. During the exam you should hand in the code, the input and output data, and any additional files that you may have (e.g., a document, a presentation, etc.), and you have to present the project. How you present it is up to you. For example, you may want to give a PowerPoint presentation, or just walk through your application. You will also answer questions about the project. (Note that during the project presentation you may also be asked about the homework and the class material.)