FONDAMENTI DI ANALISI DATI E LABORATORIO
Module FONDAMENTI DI ANALISI DATI
Academic Year 2025/2026 - Teacher: ANTONINO FURNARI
Expected Learning Outcomes
- Knowledge and Understanding: The student will gain a solid understanding of the fundamental principles needed to collect, organize, model, analyze, and interpret data. This will be achieved through the presentation of a theoretical-mathematical framework and numerous examples of its application to real datasets. The student will develop a deep understanding of the conceptual foundations of data analysis.
- Applying Knowledge and Understanding: The student will acquire technical skills for constructing, managing, and analyzing real datasets, with the goal of building models and decision support systems. They will be able to apply the acquired knowledge to solve practical problems using tools and techniques for data analysis.
- Making Judgements: The student will be able to independently choose the most appropriate techniques for solving a data analysis problem, evaluating their pros and cons. They will be capable of justifying their choices and critically assessing various methodologies for data analysis and knowledge extraction.
- Communication Skills: The student will be trained to produce complete, rigorous, and visually appropriate reports that effectively and correctly communicate the results of data analysis and exploration. Conclusions will be clearly justified and communicated effectively to both technical and non-technical audiences.
- Learning Skills: The student will develop the necessary skills to update themselves independently on the use of techniques, software, and programming languages useful for data analysis, ensuring continuous learning even after the course ends.
Course Structure
Classroom lectures combine theoretical instruction with practical lab sessions. During these sessions, the techniques studied are demonstrated and applied through code examples and guided analyses on real-world datasets.
If the course is delivered in blended or remote mode, the arrangements described above may be adjusted as needed, while still covering the program set out in this syllabus.
Required Prerequisites
Basic skills in programming, calculus, and linear algebra are required.
Attendance of Lessons
Attending lectures is not mandatory, but strongly recommended.
Detailed Course Content
The course is divided into three main modules:
1. Data Analysis
2. Predictive Techniques
3. Data Representation
The following sections detail the contents of each module.
Data Analysis
• Overview of data analysis: main types, purposes, applications, and examples
• Types of data: nominal, ordinal, interval, and ratio
• Data collection techniques: surveys, experiments, observational studies, sampling
• Difference between sample and population
• Data preprocessing techniques: data cleaning, handling missing data, standardization, encoding categorical variables (dummy variables), noise reduction (filtering, outlier removal, normalization)
• Use of probability in data analysis: joint, marginal, and conditional probability; independence and conditional independence; Bayes’ theorem and its application; discrete, continuous, and cumulative probability distributions; notable distributions
• Measures of central tendency: mean, median, mode
• Measures of dispersion: variance, standard deviation, quartiles, interquartile range
• Covariance and correlation between variables
• Data visualization techniques: pie charts, histograms, boxplots, scatterplots, hexbin plots, density maps, contour plots, scatter matrix, regression plots
• Inferential analysis tools: confidence intervals, significance levels, statistical tests
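As an illustration of the descriptive measures listed in this module, the following minimal sketch computes central tendency, dispersion, and Pearson correlation using only the Python standard library. The `hours` and `scores` values and the helper `pearson_correlation` are hypothetical and are not taken from the course material.

```python
import statistics


def pearson_correlation(xs, ys):
    """Pearson correlation: sample covariance divided by the product
    of the two sample standard deviations."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    return cov / (statistics.stdev(xs) * statistics.stdev(ys))


# Hypothetical toy data: hours studied and a score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

summary = {
    "mean": statistics.mean(hours),
    "median": statistics.median(hours),
    "variance": statistics.variance(hours),  # sample variance (divides by n - 1)
    "stdev": statistics.stdev(hours),
}

# Quartiles and interquartile range (default "exclusive" method of the module).
q1, q2, q3 = statistics.quantiles(hours, n=4)
iqr = q3 - q1

# The two variables are perfectly linearly related, so r is close to 1.
r = pearson_correlation(hours, scores)

print(summary, iqr, r)
```

Different quartile conventions give slightly different values on small samples, so the interquartile range printed here depends on the method chosen.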
Predictive Techniques
• Core concepts: training, validation, and test sets; cross-validation; generative and discriminative algorithms; parameters and hyperparameters; parametric and non-parametric methods; overfitting and underfitting; bias and variance; linear and nonlinear models
• Regression techniques: evaluation metrics (mean squared error, mean absolute error); linear regression; model evaluation and statistical significance of coefficients; model selection techniques such as backward elimination
• Classification techniques: performance metrics (confusion matrix, precision, recall, F1 score); ROC curves for binary classification; K-Nearest Neighbor (KNN), logistic regression, multinomial and softmax regression; MAP and Naive Bayes
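The classification metrics named above can be sketched as follows for the binary case. This is a minimal illustration in plain Python, not the course's reference implementation; the function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score computed from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ground-truth labels and predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(tp, fp, fn, tn, precision, recall, f1)
```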
Data Representation
• Features, representation functions, feature spaces, metrics
• Clustering techniques: definitions and K-Means
• Gaussian fitting and Maximum Likelihood
• Non-parametric density estimation using kernel density estimation
• Principal Component Analysis (PCA)
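As a small example of Gaussian fitting by Maximum Likelihood, the sketch below estimates the mean and variance of a one-dimensional Gaussian from samples. It assumes the standard closed-form estimators (sample mean, and variance with division by n rather than n - 1); the sample values and function names are hypothetical.

```python
import math
import statistics


def fit_gaussian_mle(samples):
    """Maximum-likelihood estimates of a 1D Gaussian.

    The ML variance divides by n (biased estimator), unlike the
    unbiased sample variance, which divides by n - 1.
    """
    mu = statistics.fmean(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return mu, var


def gaussian_pdf(x, mu, var):
    """Density of a Gaussian with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical observations.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu, var = fit_gaussian_mle(samples)

# The fitted density peaks at the estimated mean.
peak = gaussian_pdf(mu, mu, var)
print(mu, var, peak)
```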
Textbook Information
Chapters from these books:
- Peck, Roxy, Chris Olsen, and Jay L. Devore. Introduction to statistics and data analysis. Cengage Learning, 2015.
- James, Gareth, et al. An introduction to statistical learning: with applications in Python, 2023. https://www.statlearning.com
- Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006. https://www.microsoft.com/en-us/research/publication/pattern-recognition-machine-learning/
- Hernán, Miguel A., and James M. Robins. Causal inference, 2010. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
- Knaflic, Cole Nussbaumer. Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons, 2025.
Teaching material shared by the teacher through Microsoft Teams (Team code: i87g4nb) and through the https://antoninofurnari.github.io/fadlecturenotes/ website.
Course Planning
No. | Subjects | Text References |
---|---|---|
1 | Introduction to the course | Teaching material provided by the teacher, specific chapters of the recommended books. |
2 | Main Data Analysis Concepts | Teaching material provided by the teacher, specific chapters of the recommended books. |
3 | Descriptive Statistics and Graphical Representation of data | Teaching material provided by the teacher, specific chapters of the recommended books. |
4 | Uncertainty and data as the observation of random events | Teaching material provided by the teacher, specific chapters of the recommended books. |
5 | Probability Distributions | Teaching material provided by the teacher, specific chapters of the recommended books. |
6 | Use of statistical inference in data analysis | Teaching material provided by the teacher, specific chapters of the recommended books. |
7 | Associations of Two Variables | Teaching material provided by the teacher, specific chapters of the recommended books. |
8 | Clustering & Density estimation | Teaching material provided by the teacher, specific chapters of the recommended books. |
9 | Dimensionality Reduction and Principal Component Analysis | Teaching material provided by the teacher, specific chapters of the recommended books. |
10 | Predictive Data Analysis | Teaching material provided by the teacher, specific chapters of the recommended books. |
11 | Probabilistic Models for Classification | Teaching material provided by the teacher, specific chapters of the recommended books. |
Learning Assessment
Learning Assessment Procedures
The exam is divided into the following tests:
- A written exam designed to assess the student's understanding of the topics covered in the course, from both a theoretical and a methodological perspective. The exam is graded on a scale of thirty.
- A project assigned by the instructor and carried out independently by the student, aimed at evaluating practical skills in data analysis and communication of results. The project is presented to the instructor through a presentation and graded on a scale of thirty.
Students with disabilities and/or specific learning disorders (DSA) must contact the teacher, the CInAP representative of the DMI (Prof. Daniele) and CInAP well in advance of the exam date to communicate that they intend to take the exam using the appropriate compensatory measures.
Two written midterm (in itinere) exams are scheduled during the course. Passing both grants exemption from the final written exam.
The final grade is the weighted average of the marks obtained in the two tests, with weights of 40% for the written exam and 60% for the project.
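The weighting can be illustrated with a short sketch. Only the 40%/60% weights come from this syllabus; the function name, the range check, and the absence of rounding are assumptions, and the handling of honors is not modeled.

```python
def final_grade(written, project, w_written=0.4, w_project=0.6):
    """Weighted average of the written-exam and project marks (scale of thirty).

    Rounding and honors are deliberately left out: the syllabus does not
    specify them, so this returns the raw weighted average.
    """
    for mark in (written, project):
        if not 0 <= mark <= 30:
            raise ValueError("marks are expected on a scale of thirty")
    return w_written * written + w_project * project


# Hypothetical marks: 26 in the written exam and 30 in the project.
example = final_grade(26, 30)
print(example)
```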
The assessment of learning can also be conducted remotely if the conditions require it.
The grading of each test is expressed on a scale of thirty points according to the following scheme:
Score 29-30 with honors
The student has a deep understanding of the concepts and techniques of data analysis. They can promptly analyze data analysis problems, identifying the most suitable data analysis techniques for the given problem independently and critically, and indicating the most suitable methodological practices for their application. They have excellent communication skills and language proficiency.
Score 26-28
The student has a good understanding of the concepts and techniques of data analysis. They can analyze data analysis problems, identifying appropriate data analysis techniques for the given problem and indicating suitable methodological practices for their application. They have good communication skills and language proficiency.
Score 22-25
The student has a fair knowledge of the concepts and techniques of data analysis, although it may be limited to the main topics. They can analyze data analysis problems, albeit not always in a linear manner, identifying suitable data analysis techniques for the given problem. They have fair communication skills and language proficiency.
Score 18-21
The student has minimal knowledge of the concepts and techniques of data analysis. They have limited ability to analyze data analysis problems. They have sufficient communication skills, although not always appropriate language proficiency.
Examination not passed
The student does not possess the minimum required knowledge of the main content of the course. Their ability to use specific language is very poor or nonexistent, and they are unable to independently apply the acquired knowledge.
Examples of frequently asked questions and/or exercises
The data analysis project is generally based on medium-to-large datasets that are publicly available online.
Examples of typical exam questions:
- Define the classification problem, discuss the differences with respect to the regression problem and give practical examples.
- Explain the K-NN algorithm for classification. Discuss the effect of parameter K on algorithm performance. Give graphical examples of how the algorithm works and the effect of K.
- Discuss evaluation measures for classification problems: accuracy, confusion matrix, precision, recall and F1 score. Discuss the pros and cons of each measure, also in relation to the characteristics of the test dataset.
- Illustrate the main techniques useful for studying the correlation between variables.
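To give a concrete sense of the K-NN question above, the following minimal sketch classifies a query point by majority vote among its K nearest neighbors under Euclidean distance. The data, the labels, and the function name `knn_predict` are hypothetical; the single "B" point placed near the "A" cluster plays the role of a noisy sample.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points closest to the query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 2D training set: cluster "A", cluster "B", and one noisy "B".
train_points = [
    (1.0, 1.0), (1.5, 2.0), (2.0, 1.5),   # class A
    (6.0, 6.0), (6.5, 7.0), (7.0, 6.5),   # class B
    (3.0, 3.0),                           # noisy B sample near cluster A
]
train_labels = ["A", "A", "A", "B", "B", "B", "B"]
query = (2.8, 2.8)

# With K=1 the prediction follows the single noisy neighbor;
# with K=3 the vote is dominated by the surrounding A points.
pred_k1 = knn_predict(train_points, train_labels, query, k=1)
pred_k3 = knn_predict(train_points, train_labels, query, k=3)
print(pred_k1, pred_k3)
```

Small values of K make the decision sensitive to individual noisy samples, while larger values smooth the decision boundary at the cost of blurring genuinely local structure.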