Bioinformatics

Academic Year 2025/2026 - Teacher: SALVATORE ALAIMO

Expected Learning Outcomes


Below, we report the general learning objectives of the course in terms of expected learning outcomes:

1. Knowledge and Understanding: The course aims to provide foundational knowledge and skills for the analysis, representation, and organization of bioinformatics data.

2. Applying Knowledge and Understanding: The student will learn models and algorithms for bioinformatics data analysis, such as sequence alignment and comparison, analysis of nucleic acid and protein structures, workflow construction, and ensuring the reproducibility of analyses.

3. Making Judgements: Through concrete examples and case studies, the student will be able to independently develop solutions to specific problems in bioinformatics data analysis. The final part of the course will focus on case studies in which students apply the skills they have acquired.

4. Communication Skills: The student will acquire the communication skills and command of technical language needed in the field of bioinformatics data analysis.

5. Learning Skills: The course aims to provide students with the theoretical and practical methodologies needed to independently address and solve new problems encountered in professional practice. To this end, topics will be presented by involving students in devising solutions to real-world problems, using benchmarks from the literature and case studies.

Course Structure

Traditional in-person lectures.
Should teaching be delivered in blended mode or remotely, changes to the above may be necessary, in line with the programme planned and outlined in the syllabus.

Required Prerequisites

  • Programming
  • Data structures

Attendance of Lessons

Attendance is not mandatory but is strongly recommended.
The instructor will make slides available to support understanding of the lessons.
Note: Slides are not a substitute for study. To fully understand the course concepts, students should study the provided materials and the textbook, and complete the exercises.

Detailed Course Content

Module 1 – Introduction and Fundamentals

  • Introduction to Bioinformatics
    • Course objectives, structure, and assessment methods
    • Overview of bioinformatics: definition, applications
    • Types of non-omics biological data: sequences, structures, interactions
    • Introduction to the program and tools to be used in the course
  • Fundamentals of Probability, Statistics, Inference, Statistical Tests
    • Basic concepts of probability
    • Discrete and continuous probability distributions
    • Random variables and statistical independence
    • Bayes’ Theorem with bioinformatics applications
    • Descriptive statistics for biological data
    • Hypothesis testing, p-value, type I and II errors
    • Common statistical tests (t-test, chi-squared, ANOVA)
    • Regression and correlation models
    • Visualization: histograms, boxplots
    • Concept of statistical vs biological significance
    • Practical examples using biological data

Module 2 – Languages for Bioinformatics

  • Introduction to R for Bioinformatics Analysis
    • Data structures: vectors, data frames, lists
    • Basic functions
    • Use of the tidyverse packages
    • Bioinformatics packages in R (Bioconductor)
    • Statistical analysis in R
    • Plotting in R: histograms, scatterplots, boxplots, heatmaps
    • Brief introduction to ggplot2
  • Introduction to Python and Biopython
    • Fundamentals of Biopython
    • Sequence manipulation and access to biological databases
    • Sequence transcription and translation
    • GC-content calculation, reverse complement
    • Parsing of annotations and biological features

Module 3 – Sequence Comparison and Analysis

  • Representation of Biological Sequences
    • File formats for sequences (FASTA, FASTQ, GenBank)
    • Properties of nucleotide and protein sequences
    • Biological databases
    • Importing and manipulating sequences
  • Sequence Alignment I
    • Basic concepts: sequence similarity, identity, and homology
    • Local vs global alignment
    • Global alignment algorithms (Needleman-Wunsch)
    • Substitution matrices (PAM, BLOSUM)
    • Alignment evaluation
  • Sequence Alignment II
    • Local alignment algorithms (Smith-Waterman)
    • Multiple sequence alignment
    • Alignment programs (BLAST, CLUSTAL)
    • Practical alignment exercises in Python and R
    • MSA interpretation and profile construction
  • Pattern Search in Sequences
    • Exact and approximate pattern matching
    • Pattern search algorithms (Boyer-Moore, Knuth-Morris-Pratt)
    • Hidden Markov Models (HMM) for sequences
    • Applications in bioinformatics
  • Molecular Phylogeny I
    • Basics of molecular evolution
    • Construction of phylogenetic trees
    • Distance and parsimony methods
    • Phylogenetic analysis software
  • Molecular Phylogeny II
    • Maximum likelihood in phylogeny
    • Bayesian inference
    • Interpretation of phylogenetic results
    • Applications in bioinformatics

Module 4 – Structural Bioinformatics

  • Nucleic Acid Structures I
    • Structural properties of DNA and RNA
    • RNA secondary structure prediction
    • Algorithms for structure prediction
    • Visualization of nucleic acid structures
  • Nucleic Acid Structures II
    • 3D structure analysis of nucleic acids
    • DNA-protein interactions
    • Non-canonical structures of nucleic acids
  • Protein Structure I
    • Structural organization levels of proteins
    • Secondary structure prediction
    • Statistical and machine learning methods for prediction
    • Protein structure databases
  • Protein Structure II
    • Tertiary structure prediction
    • Homology modeling
    • Prediction with AlphaFold and RNAfold
    • Result interpretation
    • Basic use of PyMOL and ChimeraX
    • Exploration of a protein with known active sites
  • Molecular Interactions
    • Thermodynamic principles of interactions
    • Molecular docking and scoring functions
    • Molecular dynamics simulations
    • Analysis of protein-protein and protein-DNA/RNA interactions

Module 5 – Workflow and Reproducibility

  • Bioinformatics Workflows I
    • Principles of bioinformatics pipelines
    • Workflow management
    • Reproducibility in bioinformatics
    • Workflow documentation
    • Examples of manually scripted pipelines (Bash/Python)
  • Bioinformatics Workflows II
    • Implementation of complete workflows
    • Introduction to Snakemake or Nextflow
    • Practical example: Snakemake/Nextflow for MSA
    • Management of large datasets
  • Containerization for Bioinformatics I
    • Introduction to Docker
    • Creating containers for bioinformatics tools
    • Best practices in containerization
    • Practical examples of containers for bioinformatics analyses
  • Containerization for Bioinformatics II
    • Container orchestration with Docker Compose
    • Sharing and distribution of containers
    • Integration of containers into bioinformatics workflows
    • Second homework: build a workflow to analyze FASTA files and produce a report
  • Case Studies and Practical Examples

Textbook Information


Recommended text: "Fondamenti di bioinformatica"
Authors: Manuela Helmer Citterich, Fabrizio Ferrè, Giulio Pavesi, Graziano Pesole, Chiara Romualdi
Publisher: Zanichelli (2018)

Other recommended text:

·       “Bioinformatics”
Authors: Andreas D. Baxevanis, Gary D. Bader, David S. Wishart
Publisher: Wiley (2020)

·       “R Bioinformatics Cookbook: Utilize R packages for bioinformatics, genomics, data science, and machine learning”
Authors: Dan MacLean
Publisher: Packt Publishing (2023)

·       “Mastering Python for Bioinformatics: How to Write Flexible, Documented, Tested Python Code for Research Computing”
Author: Ken Youens-Clark
Publisher: O'Reilly Media (2021)

·       “Bioinformatica: Dalla sequenza alla struttura delle proteine”
Authors: Stefano Pascarella, Alessandro Paiardini
Publisher: Zanichelli (2011)

Additional resources will be indicated by the instructor in the slides used in class.

Course Planning

No. Subject                                  Text references
1   Introduction to bioinformatics
2   Probability foundations
3   Introduction to R
4   Introduction to Python and Biopython
5   Pairwise sequence alignment
6   Multiple sequence alignment
7   Molecular phylogenetics
8   RNA structure
9   Protein structure
10  Prediction of molecular interactions
11  Workflow and reproducibility

Learning Assessment

Learning Assessment Procedures


The final exam consists of a written test and an oral interview, during which the student discusses a project agreed upon with the instructor.

·       Both the written test and the oral interview are graded out of 30. The final grade is their weighted average:

o   Written test: 25% of the final grade

o   Oral exam/project: 75% of the final grade

·       The written test includes a theory question on the course topics, in which the student must demonstrate a thorough understanding.

·       Minimum passing score for written test: 18/30. Students must pass the written test to access the oral exam.

·       The written exam can be reviewed with the instructor at any time.

·       Minimum score to pass the full exam: 18/30.

·       The project must be completed within one month of passing the written test. The project itself can be agreed upon with the instructor at any time.

·       If the student rejects the written test score, the project score remains valid for the entire academic year. If the student rejects the final grade, both the written test and the project must be retaken.
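As an illustration of the weighted average, the grading rules above can be sketched in Python. This is a minimal, unofficial sketch: it assumes both scores are on the 30-point scale, applies the 18/30 thresholds stated above, and returns an unrounded value because the syllabus does not specify a rounding rule. The function and constant names are hypothetical.

```python
# Unofficial sketch of the grading rule; names are hypothetical.
WRITTEN_WEIGHT = 0.25  # written test: 25% of the final grade
PROJECT_WEIGHT = 0.75  # oral exam/project: 75% of the final grade
PASS_MARK = 18         # minimum score (out of 30) for the written test and the full exam


def final_grade(written: float, project: float) -> float:
    """Return the weighted average of two scores out of 30 (unrounded, by assumption)."""
    if written < PASS_MARK:
        # A written score below 18/30 does not give access to the oral exam.
        raise ValueError("written test not passed: oral exam not accessible")
    return WRITTEN_WEIGHT * written + PROJECT_WEIGHT * project


def exam_passed(written: float, project: float) -> bool:
    """True if the weighted final grade reaches the 18/30 pass mark."""
    return final_grade(written, project) >= PASS_MARK


print(final_grade(24, 28))  # 27.0
print(exam_passed(20, 17))  # False, since 5.0 + 12.75 = 17.75
```

For example, a written score of 24/30 and a project score of 28/30 give 0.25 × 24 + 0.75 × 28 = 6 + 21 = 27/30. Because the project carries three times the weight of the written test, a strong project can compensate for a written score close to the 18/30 threshold, but not the other way round.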

Exam logistics (time and location) will be announced through official university channels.

Notes:

  • The use of any electronic device (calculators, tablets, smartphones, headphones, etc.) or personal notes during the written exam is prohibited.
  • Exam registration via the university student portal is mandatory.
  • Late registration via email is not allowed. Without registration, the exam result cannot be recorded.


Examples of frequently asked questions and/or exercises

Some examples of exam questions include:

  • What is pairwise sequence alignment? Describe the problem, its complexity, and its purpose, and illustrate an algorithm used to solve it.
  • What is multiple sequence alignment? Describe the problem, its complexity, and its purpose, and illustrate an algorithm used to solve it.
  • Describe the RNA secondary structure prediction problem and an algorithm for solving it.

Other sample questions will be provided in class.

Please note that these examples are not binding for the final exam.
