Principles of Parallel Programming

Academic Year 2025/2026 - Teacher: EVA SCIACCA

Expected Learning Outcomes

Knowledge and Understanding:

Students will acquire fundamental knowledge of methodologies for designing parallel algorithms and using high-performance computing (HPC) architectures. Specifically, the course explores the concepts underlying code parallelization, the main parallel programming paradigms, and their architectural implications. It also provides the theoretical background needed to understand shared-memory and distributed-memory programming techniques, using dedicated languages and programming models such as OpenMP and MPI.


Applying Knowledge and Understanding:

Students will acquire the ability to solve simple problems requiring the design and analysis of algorithmic solutions in the field of parallel programming.


Making Judgments:

Students will be able to apply their acquired knowledge to solve computational problems requiring the design, implementation, and evaluation of algorithmic solutions in a parallel environment. Students will be able to develop simple parallel applications using specific programming paradigms and models, identifying the most appropriate architectural choices based on the context (shared or distributed memory) and evaluating their performance.


Communication skills:

Students will acquire adequate communication and language skills to describe clearly and coherently issues related to the design and analysis of parallel algorithms. They will be able to present the main technical and conceptual solutions related to parallel programming, even to non-specialists, using a communicative register appropriate to the context.


Learning skills:

Students will develop the ability to independently update and expand their knowledge, including in relation to new application or technological contexts in the field of parallel programming. They will be able to critically consult and understand relevant scientific and technical literature, maintaining a flexible and informed approach to the evolution of methodologies and paradigms in the field.

Course Structure

Lessons will take place in person, in a traditional lecture format. The instructor will present the theoretical content with the support of slides and the blackboard. Active student participation will be encouraged through questions and moments of classroom discussion.

The course will also include laboratory activities. Lectures will introduce theoretical concepts, while the lab will enable their practical application through exercises and case studies.

If the course is delivered in blended or online mode, the necessary adjustments may be introduced with respect to the above description in order to comply with the program outlined in the syllabus.

Required Prerequisites

Basic knowledge of computer architecture, operating systems, and algorithms, together with basic programming skills in C and/or C++.

Attendance of Lessons

For a thorough understanding of the topics covered and the methodologies presented, regular attendance at the lectures is strongly recommended.

Detailed Course Content

The course Principles of Parallel Programming provides a structured introduction to the main methodologies for the design and implementation of parallel algorithms, with an approach oriented toward computer science and programming. It is intended for students who already have a basic knowledge of programming and of computer architecture principles. The training path explores the fundamental concepts of High-Performance Computing (HPC), illustrating strategies to optimize code execution on modern architectures.

Particular attention is devoted to the analysis of parallelism models, the identification of computational bottlenecks, and the application of programming techniques on both shared and distributed memory. Students learn how to develop parallel programs using the OpenMP and MPI standards, and how to evaluate performance on real systems. The approach balances theoretical aspects with practical experimentation, keeping formal references accessible while emphasizing algorithmic intuition.

Textbook Information

[1] Hager, Georg, and Gerhard Wellein. Introduction to High Performance Computing for Scientists and Engineers. CRC Press, 2010.

[2] OpenMP: Multi-Platform Shared-Memory Parallel Programming in C/C++ and Fortran. https://www.openmp.org/

[3] Open MPI: An Open Source Implementation of the Message Passing Interface. https://www.open-mpi.org/

Course Planning

 #  | Subjects                                                                 | Text References
 1  | Introduction to High Performance Computing (HPC)                        | Lecturer's notes
 2  | Modern Architectures for Parallel Computing: Processors and HPC Systems | Chap. 1 + 4 from [1]
 3  | Software Stack for HPC Systems                                          | Chap. 4 from [1] / Lecturer's notes
 4  | Parallelism Models and Scalability                                      | Chap. 5 from [1]
 5  | Code Optimization: Introduction and Basic Concepts                      | Chap. 2 + 3 from [1]
 6  | Guidelines for Code Optimization and Memory Management                  | Chap. 3 from [1]
 7  | Techniques for Optimized Compilation and Profiling                      | Chap. 2 from [1]
 8  | Parallel Programming on Shared Memory: Fundamentals                     | Chap. 6 from [1]
 9  | OpenMP: Directives for Structured Parallelism                           | Chap. 6 from [1] and documents from [2]
 10 | OpenMP: Synchronization, Sections, and Workload Balancing               | Chap. 6 from [1] and documents from [2]
 11 | OpenMP: Shared and Private Data, Scheduling                             | Chap. 6 from [1] and documents from [2]
 12 | Parallel Programming on Distributed Memory: Fundamentals                | Chap. 9 from [1]
 13 | Introduction to MPI and Its Communication Model                         | Chap. 9 from [1]
 14 | Point-to-Point Communication in MPI                                     | Chap. 9 from [1] and documents from [3]
 15 | Collective Communication and Process Synchronization in MPI             | Chap. 9 from [1] and documents from [3]
 16 | Concepts of Topology and Efficient Communication in MPI                 | Chap. 9 from [1] and documents from [3]
 17 | Principles of Hybrid Programming with MPI and OpenMP                    | Chap. 11 from [1]
 18 | MPI/OpenMP: Context and Hierarchical Management                         | Lecturer's notes

Learning Assessment

Learning Assessment Procedures

The course exam is divided into two parts: an initial written test and a subsequent project evaluation.

These tests may take place remotely, should conditions require it. The project evaluation may be held on the same day as the written test or within a few days thereafter.

The purpose of the exam is to thoroughly assess the student’s preparation, analytical and reasoning skills regarding the topics covered during the course, as well as the appropriateness of the technical language used.

The project evaluation has an integrative value with respect to the written test and contributes inseparably to the determination of the final grade. It does not represent an opportunity to increase the score, but rather a necessary component for the overall evaluation of the student’s preparation.

The following criteria will generally be used for awarding the final grade:

  • Fail: the student has not acquired the basic concepts and is unable to carry out the exercises.

  • 18–23: the student demonstrates a minimal grasp of the basic concepts; their ability to explain and connect contents is modest, but they can solve simple exercises.

  • 24–27: the student demonstrates good command of the course content; their ability to explain and connect contents is good, and they solve exercises with few errors.

  • 28–30 with honors: the student has acquired all the course content and is able to present it comprehensively and connect it with critical insight; they solve the exercises completely and without errors.

Students with disabilities and/or specific learning disorders (SLD) must contact the lecturer, the CInAP contact person for the DMI (Prof. Daniele), and CInAP well in advance of the exam date to communicate their intention to take the exam with the appropriate compensatory measures.

To participate in the final exam, students must register on the SmartEdu portal. For any technical problems related to registration, they should contact the Teaching Office.

Examples of frequently asked questions and/or exercises

  1. What is the recommended strategy to implement a global sum across all ranks in MPI while minimizing synchronization overhead, and when would you prefer a nonblocking variant? (See the first sketch after this list.)

  2. Explain the difference between point-to-point blocking and nonblocking communication in MPI, and describe a common pattern to avoid deadlocks when exchanging halos in a stencil computation. (See the second sketch after this list.)

  3. What is the recommended OpenMP way to parallelize a sum over an array while avoiding races and minimizing contention? (See the third sketch after this list.)

  4. In shared-memory systems, what is the difference between UMA and NUMA?
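
First sketch (question 1): a minimal illustration of a global sum with a single MPI_Allreduce, plus the nonblocking MPI_Iallreduce (MPI-3), which lets independent work overlap the reduction before MPI_Wait. The one-value-per-rank setup and the variable names are assumptions made for the example, not material prescribed by the course.

    /* Global sum across all ranks with MPI_Allreduce; error checking omitted. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = (double)rank;   /* each rank contributes one value (illustrative) */
        double global = 0.0;

        /* One collective call: every rank obtains the sum of all contributions. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        /* Nonblocking variant: start the reduction, overlap independent work,
           then wait before using the result. */
        double global_nb = 0.0;
        MPI_Request req;
        MPI_Iallreduce(&local, &global_nb, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
        /* ... independent computation could run here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0)
            printf("global sum = %f (nonblocking: %f)\n", global, global_nb);

        MPI_Finalize();
        return 0;
    }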
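
Second sketch (question 2): a minimal 1-D halo (ghost-cell) exchange using MPI_Sendrecv, which pairs each send with a receive and so avoids the ordering deadlock that can occur when all ranks issue blocking MPI_Send first. The local array size and the 1-D decomposition are illustrative assumptions; a nonblocking MPI_Isend/MPI_Irecv pattern followed by MPI_Waitall is a common alternative.

    /* 1-D halo exchange between neighbouring ranks with MPI_Sendrecv. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024                      /* local interior points (assumption) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* u[0] and u[N+1] are ghost cells for the left and right neighbours. */
        double *u = calloc(N + 2, sizeof(double));

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send the rightmost interior value to the right neighbour while
           receiving the left ghost cell from the left neighbour, and vice versa.
           Pairing send and receive in one call prevents send/send deadlocks. */
        MPI_Sendrecv(&u[N],     1, MPI_DOUBLE, right, 0,
                     &u[0],     1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                     &u[N + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... stencil update on u[1..N] would follow here ... */

        free(u);
        MPI_Finalize();
        return 0;
    }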
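
Third sketch (question 3): a minimal OpenMP example in which a reduction clause gives each thread a private partial sum, combined once at the end of the loop; this avoids data races without the contention of updating a shared variable inside a critical section. The array size and its contents are assumptions made for the example.

    /* Array sum with an OpenMP reduction clause; compile with -fopenmp. */
    #include <stdio.h>

    #define N 1000000                   /* array size (assumption) */

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;                 /* fill with known values */

        /* Each thread accumulates into a private copy of 'sum';
           the copies are combined once when the loop ends. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);      /* expected: 1000000.0 */
        return 0;
    }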
