Bioinformatics

Academic Year 2025/2026 - Teacher: SALVATORE ALAIMO

Expected Learning Outcomes

Below, we report the general learning objectives of the course in terms of expected learning outcomes:

1. Knowledge and Understanding: The course aims to provide foundational knowledge and skills for the analysis, representation, and organization of bioinformatics data.

2. Applying Knowledge and Understanding: The student will acquire knowledge of models and algorithms for bioinformatics data analysis, such as sequence alignment and comparison, analysis of nucleic acid and protein structures, workflow construction, and analysis reproducibility.

3. Making Judgements: Through concrete examples and case studies, the student will be able to independently develop solutions to specific problems related to bioinformatics data analysis. The final part of the course will focus on case studies that allow students to apply the skills acquired.

4. Communication Skills: The student will acquire the necessary communication skills and appropriate use of technical language in the general field of bioinformatics data analysis.

5. Learning Skills: The course aims to provide students with the theoretical and practical methodologies needed to independently address and solve new problems encountered during professional activities. To this end, various topics will be presented by involving students in finding possible solutions to real-world problems using benchmarks from literature and case studies.

Course Structure

Traditional in-person lectures.
If teaching is delivered in mixed or remote mode, the arrangements described above may be adjusted as needed, while remaining consistent with the programme outlined in this syllabus.

Required Prerequisites

Programming
Data structures

Attendance of Lessons

Attendance is not mandatory, but strongly recommended.
Slides will be made available by the instructor to aid lesson comprehension.
Note: Slides are not a substitute for study. Students should study the provided materials and the textbook, and complete the exercises, to fully understand the course concepts.

Detailed Course Content

Module 1 – Introduction and Fundamentals

Introduction to Bioinformatics

Course objectives, structure, and assessment methods
Overview of bioinformatics: definition, applications
Types of non-omics biological data: sequences, structures, interactions
Introduction to the program and tools to be used in the course

Fundamentals of Probability, Statistics, Inference, Statistical Tests

Basic concepts of probability
Discrete and continuous probability distributions
Random variables and statistical independence
Bayes’ Theorem with bioinformatics applications
Descriptive statistics for biological data
Hypothesis testing, p-value, type I and II errors
Common statistical tests (t-test, chi-squared, ANOVA)
Regression and correlation models
Visualization: histograms, boxplots
Concept of statistical vs biological significance
Practical examples using biological data

Module 2 – Languages for Bioinformatics

Introduction to R for Bioinformatics Analysis

Data structures: vectors, data frames, lists
Basic functions
Use of the tidyverse packages
Bioinformatics packages in R (Bioconductor)
Statistical analysis in R
Graph creation in R: histograms, scatterplots, boxplots, heatmaps
Brief intro to ggplot2

Introduction to Python and Biopython

Fundamentals of Biopython
Sequence manipulation and access to biological databases
Sequence transcription and translation
GC-content calculation, reverse complement
Parsing of annotations and biological features
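
As an illustration of the sequence operations listed above, the following is a minimal sketch in plain Python. The function names are hypothetical and chosen only for this example; in class, the equivalent operations would normally be performed on Biopython `Seq` objects. The sketch assumes an unambiguous DNA alphabet (A, C, G, T).

```python
# Hypothetical helper names for illustration; this is not the Biopython API.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the resulting string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def transcribe(seq: str) -> str:
    """Convert a coding-strand DNA sequence to mRNA by replacing T with U."""
    return seq.upper().replace("T", "U")


if __name__ == "__main__":
    dna = "ATGC"
    print(gc_content(dna), reverse_complement(dna), transcribe(dna))
```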

Module 3 – Sequence Comparison and Analysis

Representation of Biological Sequences

File formats for sequences (FASTA, FASTQ, GenBank)
Properties of nucleotide and protein sequences
Biological databases
Importing and manipulating sequences

Sequence Alignment I

Basic concepts of sequence similarity: similarity, identity, and homology
Local vs global alignment
Global alignment algorithms (Needleman-Wunsch)
Substitution matrices (PAM, BLOSUM)
Alignment evaluation
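
To make the global alignment topic concrete, here is a minimal sketch of the Needleman-Wunsch dynamic programming recurrence. It computes only the optimal score, without traceback. The scoring values (match +1, mismatch -1, linear gap penalty -2) are assumptions chosen for the example; a real analysis would typically use a substitution matrix such as PAM or BLOSUM.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

The table has (len(a) + 1) × (len(b) + 1) cells, so both time and memory grow with the product of the two sequence lengths.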

Sequence Alignment II

Local alignment algorithms (Smith-Waterman)
Multiple sequence alignment
Alignment programs (BLAST, CLUSTAL)
Practical alignment exercises in Python and R
MSA interpretation and profile construction
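
For comparison with global alignment, the following sketch shows the Smith-Waterman local alignment recurrence. It returns only the best local score and keeps two rows of the table in memory. The scoring values (match +2, mismatch -1, linear gap penalty -2) are assumptions for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment restart anywhere: this is what
            # makes the alignment local rather than global.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTTACGTTT", "GGGACGGGG"))
```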

Pattern Search in Sequences

Exact and approximate pattern matching
Pattern search algorithms (Boyer-Moore, Knuth-Morris-Pratt)
Hidden Markov Models (HMM) for sequences
Applications in bioinformatics
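
As an example of exact pattern matching, the sketch below implements the Knuth-Morris-Pratt search. The function names are hypothetical; the search reports the 0-based start position of every occurrence, including overlapping ones.

```python
def build_failure_table(pattern: str) -> list[int]:
    """table[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = table[k - 1]  # continue, allowing overlapping matches
    return hits


if __name__ == "__main__":
    print(kmp_search("ACGTACGTAC", "ACG"))
```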

Molecular Phylogeny I

Basics of molecular evolution
Construction of phylogenetic trees
Distance and parsimony methods
Phylogenetic analysis software
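
Distance methods start from pairwise evolutionary distances between aligned sequences. As a minimal sketch, the code below computes the proportion of differing sites (p-distance) and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The function names are hypothetical, and the sequences are assumed to be already aligned, of equal length, and without gaps.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; defined only for p < 0.75."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    p = p_distance("ACGT", "ACGA")
    print(p, jukes_cantor(p))
```

The corrected distance is always at least as large as the p-distance, because it accounts for multiple substitutions at the same site.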

Molecular Phylogeny II

Maximum likelihood in phylogeny
Bayesian inference
Interpretation of phylogenetic results
Applications in bioinformatics

Module 4 – Structural Bioinformatics

Nucleic Acid Structures I

Structural properties of DNA and RNA
RNA secondary structure prediction
Algorithms for structure prediction
Visualization of nucleotide structures
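
One classic algorithm for RNA secondary structure prediction is base-pair maximization in the style of Nussinov. The sketch below is a simplified version that returns only the maximum number of nested pairs. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are assumptions for this example. Energy-based tools use considerably more detailed models.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, with at least min_loop
    unpaired bases enclosed by every pair."""
    n = len(rna)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence rna[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j is unpaired
            # case 2: position j pairs with some k, with j - k > min_loop
            for k in range(i, j - min_loop):
                if (rna[k], rna[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov_max_pairs("GGGAAACCC"))
```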

Nucleic Acid Structures II

3D structure analysis of nucleic acids
DNA-protein interactions
Non-canonical structures of nucleic acids

Protein Structure I

Structural organization levels of proteins
Secondary structure prediction
Statistical and machine learning methods for prediction
Protein structure databases

Protein Structure II

Tertiary structure prediction
Homology modeling
Prediction with AlphaFold and RNAfold
Result interpretation
Basic use of PyMOL and ChimeraX
Exploration of a protein with known active sites

Molecular Interactions

Thermodynamic principles of interactions
Molecular docking and scoring functions
Molecular dynamics simulations
Analysis of protein-protein and protein-DNA/RNA interactions

Module 5 – Workflow and Reproducibility

Bioinformatics Workflows I

Principles of bioinformatics pipelines
Workflow management
Reproducibility in bioinformatics
Workflow documentation
Manual examples (Bash/Python)

Bioinformatics Workflows II

Implementation of complete workflows
Introduction to Snakemake or Nextflow
Practical example: Snakemake/Nextflow for MSA
Management of large datasets

Containerization for Bioinformatics I

Introduction to Docker
Creating containers for bioinformatics tools
Best practices in containerization
Practical examples of containers for bioinformatics analyses

Containerization for Bioinformatics II

Container orchestration with Docker Compose
Sharing and distribution of containers
Integration of containers into bioinformatics workflows
Second homework: build a workflow to analyze FASTA files and produce a report

Case Studies and practical examples

Textbook Information

Recommended text: "Fondamenti di bioinformatica"
Authors: Manuela Helmer Citterich, Fabrizio Ferrè, Giulio Pavesi, Graziano Pesole, Chiara Romualdi
Publisher: Zanichelli (2018)

Other recommended texts:

· “Bioinformatics”
Authors: Andreas D. Baxevanis, Gary D. Bader, David S. Wishart
Publisher: Wiley (2020)

· “R Bioinformatics Cookbook: Utilize R packages for bioinformatics, genomics, data science, and machine learning”
Authors: Dan MacLean
Publisher: Packt Publishing (2023)

· “Mastering Python for Bioinformatics: How to Write Flexible, Documented, Tested Python Code for Research Computing”
Authors: Ken Youens-Clark
Publisher: O'Reilly Media (2021)

· “Bioinformatica: Dalla sequenza alla struttura delle proteine”
Authors: Stefano Pascarella, Alessandro Paiardini
Publisher: Zanichelli (2011)

Additional resources will be indicated by the instructor through slides used in class.

Course Planning

	Subjects	Text References
1	Introduction to bioinformatics
2	Probability foundations
3	Introduction to R
4	Introduction to Python and Biopython
5	Pairwise sequence alignment
6	Multiple sequence alignment
7	Molecular Phylogenetics
8	RNA Structure
9	Protein Structure
10	Prediction of molecular interactions
11	Workflow and Reproducibility

Learning Assessment

Learning Assessment Procedures

The final exam consists of a written test and an oral interview in which a project, agreed upon between student and instructor, will be discussed.

· The written test and oral interview will be graded out of 30. The final grade is a weighted average:

o Written test: 25% of the final grade

o Oral exam/project: 75% of the final grade

· The written test includes a theory question on course topics, where the student must demonstrate comprehensive understanding.

· Minimum passing score for written test: 18/30. Students must pass the written test to access the oral exam.

· The written exam can be reviewed with the instructor at any time.

· Minimum score to pass the full exam: 18/30.

· The project must be completed within one month of passing the written test. The project topic can be arranged with the instructor at any time.

· If the student rejects the written test score, the project score is retained for the entire academic year. If the final grade is rejected, both the written test and the project must be retaken.
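
As a worked example of the weighting described above, the following sketch computes the final grade from the two component scores. The function name is hypothetical, and the syllabus does not state how fractional results are rounded, so no rounding is applied here.

```python
def final_grade(written: float, project: float) -> float:
    """Weighted average on a 30-point scale:
    25% written test, 75% oral exam/project."""
    for score in (written, project):
        if not 0 <= score <= 30:
            raise ValueError("scores must be between 0 and 30")
    return 0.25 * written + 0.75 * project


if __name__ == "__main__":
    # Example: 24/30 on the written test and 28/30 on the project.
    print(final_grade(24, 28))
```

With 24/30 on the written test and 28/30 on the project, the result is 0.25 × 24 + 0.75 × 28 = 27.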

Exam logistics (time and location) will be announced through official university channels.

Notes:

Use of any hardware (calculators, tablets, smartphones, headphones, etc.) or personal documents during the written exam is prohibited.
Exam registration via the university student portal is mandatory.
Late registration via email is not allowed. Without registration, the exam result cannot be recorded.

Examples of frequently asked questions and/or exercises

Some examples of exam questions include:

What is pairwise sequence alignment? Describe the problem, its complexity, and its purpose, and illustrate an algorithm used to solve it.
What is multiple sequence alignment? Describe the problem, its complexity, and its purpose, and illustrate an algorithm used to solve it.
Describe the problem of RNA secondary structure and an algorithm for predicting it.

Other sample questions will be provided in class.

Please note that these examples are not binding for the final exam.

Degree Course in

Computer Science