Data Mining, 2015 - Projects

Data Mining

Academic year 2014–2015

Course Projects

For the course project there are three options.

Option 1.

Most students are expected to choose this option, which is to find a topic related to data mining and implement it.

You have a lot of freedom in choosing the type of project. It can be anything that interests you, for example a Facebook or Android application that you would like to design and that involves topics related to data mining. If you do research, it may be related to your research. If you work for a company, it could be something related to your work there. Data mining is quite a broad field, so you should be able to find something of interest to you.

In addition, a very large collection of public datasets is available from various Italian government bodies. They are very interested in seeing their data used and analyzed for something useful.

The following sites are good starting points, as they link to many public sites that host data:

http://www.dati.gov.it
http://www.opendatahub.it

You can search the various sites and see what datasets are available. As you will see, there are various sources and types of data: real-time traffic data, finance, health, culture, and so on.

What can these data tell us about a problem, and how could we use them to do something useful? Search through the data and come up with a project topic that you find interesting. Of particular interest are topics that combine information from more than one dataset.

For instance, one topic of interest would be to look through the http://www.opencoesione.gov.it web site and find a way (possibly in combination with other datasets) to evaluate whether public projects are successful, in which regions public funds are being used better or worse, and so on. There is in fact particular interest in such a topic, and if the project is interesting there may be the possibility of an internship.
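To give a rough idea of what combining two datasets can look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the column names (region, funding_eur, status, population), the two small tables, and all the numbers are invented and do not come from opencoesione.gov.it or any other real source. It joins a hypothetical project-funding table with a hypothetical population table on the region name and computes two simple indicators per region.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract of a project-funding dataset (invented values).
PROJECTS_CSV = """region,funding_eur,status
Lazio,1000000,completed
Lazio,500000,ongoing
Puglia,750000,completed
Puglia,250000,completed
"""

# Hypothetical second dataset: population per region (invented values).
POPULATION_CSV = """region,population
Lazio,5000000
Puglia,4000000
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def summarize(projects, population):
    """Join the two tables on 'region' and compute per-region indicators."""
    pop = {row["region"]: int(row["population"]) for row in population}
    totals = defaultdict(lambda: {"funding": 0, "projects": 0, "completed": 0})
    for row in projects:
        t = totals[row["region"]]
        t["funding"] += int(row["funding_eur"])
        t["projects"] += 1
        t["completed"] += row["status"] == "completed"
    summary = {}
    for region, t in totals.items():
        if region not in pop:
            continue  # skip regions that are missing from the second dataset
        summary[region] = {
            "funding_per_capita": t["funding"] / pop[region],
            "completion_rate": t["completed"] / t["projects"],
        }
    return summary


summary = summarize(load_rows(PROJECTS_CSV), load_rows(POPULATION_CSV))
for region, stats in sorted(summary.items()):
    print(region, stats)
```

With real data, most of the work is usually in the step this sketch skips: making the keys of the two datasets match (region names, codes, dates) and deciding which indicators actually say something about the success of a project.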

There is a huge number of datasets, so search around and try to find an interesting problem.

Besides these public data, you may propose something else: a social-network application, an analysis of Twitter data directed at a particular problem, and so on.

Another option, if you have some well-thought-out ideas that you think can improve the algorithms or techniques that we have covered, is to discuss them and try to test them.


Procedure:

First you should find a topic. Then email your proposal to the instructor. He will tell you whether it is OK, too easy, or too hard, and give you suggestions; if needed, you can arrange a meeting to discuss it.

If you have trouble finding a concrete topic, you can contact the instructor with some general ideas and narrow them down to a problem together.

 

Option 2.

The second type of project is for students who are interested in data mining beyond the class. It requires more effort (but is also more fun). The idea is that we will form teams to participate in some public challenges and try to win them. Some practical issues about this option:

  • It will be done in groups of 2 or 3 students (only exceptionally, for a hard challenge, can a team of 4 people be proposed), together with the instructor and maybe some Ph.D. students.
  • The idea is that the students and the instructor will try together to win the challenge (of course the actual work will be done by the students). The amount of work for this type of project is expected to be larger than for Option 1.
  • Winning some challenges brings money or fame.
  • The timing depends on the deadlines of the corresponding competitions.
Here is a list of potential projects. Check the individual pages for details, and ask the instructor if you have any questions. As other challenges come out, this list will be updated. Also, if you find an interesting challenge, you can propose it.
  1. ICDM 2015 challenge, Deadline: 24/8/2015
  2. Avito Context Ad Clicks, Deadlines: 21/7/2015 (first submission), 28/7/2015 (final submission)
  3. WSDM challenge: hopefully an interesting challenge will be published here.
  4. TIM Big data challenge, Deadlines: 28/7/2015 (registration), 5/9/2015 (submission). The idea here is to come up with some interesting problem or application that uses data offered by the organizers. Data become available over time; to see what data are available, you should register and browse the competition site. For example, as of the beginning of June, a description of the available data can be found here.


Procedure:

First, be sure that you are willing to invest the time needed to participate.

Email the instructor with the challenge that you are interested in and a proposed team.

The instructor will tell you whether this is fine and how to proceed. Note that each challenge will be assigned to at most one team.

 

Option 3.

This option is for people who are more interested in doing research in data mining and believe they would enjoy working more on this in the future. It is more research-oriented and is expected to be harder than Option 1, but it will give a better idea of what it is like to work on a new problem. Working on this option will require meetings every one or two weeks and will span several months. You should contact the instructor for a topic, and he will give you some papers to read. After that, you will discuss them and then fix a problem to work on. Typically, a problem will have both a theoretical and a programming part.