Our Lab

The Knowledge Discovery and Data Mining Laboratory (KDD Lab) is a joint research initiative of the ISTI Institute of CNR and the Department of Computer Science of the University of Pisa.

The objective of the research unit is the development of theory, techniques and systems for extracting and delivering useful knowledge out of large masses of data.

Today, knowledge discovery and data mining is both a technology that blends data analysis methods with sophisticated algorithms for processing large data sets, and an active research field that aims at developing new data analysis methods for novel forms of data. On the one hand, classification, clustering and pattern discovery tools are now part of mature data analysis and Business Intelligence systems and have been successfully applied to problems in various commercial and scientific domains. On the other hand, the increasing heterogeneity and complexity of the new forms of data – such as those arriving from medicine, biology, the Web, Earth observation systems, and the mobility data arriving from wireless networks – call for new forms of patterns and models, together with new algorithms to discover such patterns and models efficiently.

In this context, the mission of the KDD laboratory is to pursue fundamental research, strategic applications and higher education in the areas of:

Mobility Data Mining for Science of Cities

Ethical Data Mining

Social Network Analysis and Visual Analytics for Social Mining

Analytical Platforms and Data Infrastructures for Social Mining

Our Skills

It was 1999 when we first approached the data mining research field. Our exploration of the world of data is still continuing...

Data Mining

Analysis methods and tools to extract knowledge hidden in the data, including frequent patterns, clustering and classification.

Data Visualization

Visual representation coupled with advanced analytics to comprehend and understand complex and large data.

Privacy Risk Assessment

Study and design of methods for assessing privacy risks in data analytics.

Data Science

A combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and technological tools.

Big Data

Strategies to manage and analyse large data sets, and related tools such as MapReduce, Spark, Hive and Pig, as well as NoSQL databases.

Community Analysis

Identify hidden sub-structures within complex networks and exploit them to bound homophilic behaviors.

Mobility Data Analysis

Inferring human mobility information from location data sources such as GPS trajectories, mobile phone traces and social media.

Economic Complexity

Understand hidden features of products and customers by studying their position in the network built over the market.

Network Dynamics

Track, understand and forecast topological perturbations that affect complex networks as time goes by.

Privacy by Design

Building frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities.

Science of Success

Understanding the patterns of success in several fields: sports performance, popularity of artistic items, emergence of new technologies.

Sports Data Mining

Developing new methods of performance measurement by taking advantage of the huge growth of data collected during sport events.

Quantification

Design algorithms for estimating the distribution of a population across different classes, and for tracking the changes in this distribution.

Spreading, Diffusion and Innovation

Design and develop useful tools for understanding, monitoring and signaling diffusion phenomena.

Well-Being Indicators

Developing models to predict the well-being of territories based on Big Data on human behavior.

Discrimination Discovery and Prevention

Design algorithms for discovering discrimination in socially sensitive decision data and for enforcing fairness in data mining models.

Social Network Analysis

Extract meaningful knowledge from complex online and offline social contexts.

Multi-Dimensional Networks

Correlate multiple data sources to build, and thus understand, semantically enriched descriptions of real-world networked contexts.

The Data Incubator

Start:
2015/06/01 00:00 Europe/Rome
End:
2015/07/17 00:00 Europe/Rome
Location:
New York, Washington DC, San Francisco Bay Area, or electronically
Link:
The Data Incubator
Description:
The Data Incubator is an intensive 7-week fellowship that prepares the best scientists and engineers with advanced degrees to work as data scientists and quants. It identifies fellows who already have the difficult-to-learn 90% of the skills and equips them with the last 10%: the tools and technology stack that make them self-sufficient, productive contributors. The program is free for fellows. Employers pay a tuition fee only if they successfully hire. Fellows have the option to participate in the program either in person in NYC, Washington DC, or the San Francisco Bay Area, or electronically.

Lipari School on Computational Complex Systems "Mapping the World, from open data to crowdsourcing and bottom-up society: models, algorithms and applications"

Start:
2015/07/12 09:00 Europe/Rome
End:
2015/07/18 17:00 Europe/Rome
Location:
Lipari Island, Italy
Link:
Lipari Summer School 2015
Description:
This summer school will provide opportunities to gain experience with modern data analysis, in particular Big Data analytics. This includes subjects such as how to mine data on the Internet and from social media. Moreover, participants will be able to work with us on the Planetary Nervous System. The Planetary Nervous System is a large-scale distributed research platform that will provide real-time social mining services as a public good. Existing Big Data systems threaten social cohesion as they are designed to be closed, proprietary, privacy-intrusive and discriminatory.

DyNo: 1st International Workshop on Dynamics in Networks

Start:
2015/08/25 09:00 Europe/Rome
End:
2015/08/25 09:00 Europe/Rome
Location:
Paris, France
Link:
DyNo
Description:

In recent years we have witnessed a shift from static network analysis to dynamic network analysis, i.e., the study of networks whose structure changes over time. As time goes by, all the perturbations that occur in the network topology due to the rise and fall of nodes and edges have repercussions on the network phenomena we are used to observing. For example, the evolution over time of social interactions in a network can play an important role in the diffusion of an infectious disease.

MoKMaSD 2015

Start:
2015/09/08 09:00 Europe/Rome
End:
2015/09/08 09:00 Europe/Rome
Location:
York, UK
Link:
MoKMaSD 2015
Description:

MoKMaSD 2015 aims at bringing together practitioners and researchers from academia, industry and research institutions to present research results and exchange experience, ideas, and solutions for modelling and analysing complex systems and using knowledge management and discovery methodologies in various domain areas such as social systems, ecology, biology, medicine, smart cities, governance, education and social software engineering.

Clip from the BIG DATA IN ACTION event, Rome
Dino Pedreschi at SASForum 22/04/2015
Dino Pedreschi - Towards a Digital Time Machine fueled by Big Data and Social Mining
Watch Dogs and today's problems with Big Data and Data Mining - TVtech

Publications

Fun Facts


3 Gigabytes of data produced by a single person each year
3100 million Internet users
500 million Tweets sent per day
2300 Gigabytes of Internet traffic per day

Contacts

Need info? Want ideas? Write to us!

Address @ ISTI

KDD Lab
Istituto di Scienza e Tecnologie dell’Informazione
Area della Ricerca CNR
via G. Moruzzi 1
56124 Pisa, Italy

Address @ UniPi

KDD Lab
Dipartimento di Informatica
Università di Pisa
Largo B. Pontecorvo 3
56127 Pisa, Italy

Phone Number

Phone: +39 050 621 3013
Fax: +39 050 315 2040

Email

kddlab-info@isti.cnr.it
