Principles of Parallel Programming

Academic Year 2025/2026 - Teacher: EVA SCIACCA

Expected Learning Outcomes

Knowledge and Understanding:

Students will acquire fundamental knowledge of methodologies for designing parallel algorithms and using high-performance computing (HPC) architectures. Specifically, the course explores the concepts underlying code parallelization, the main parallel programming paradigms, and their architectural implications. It also provides the theoretical background needed to understand shared-memory and distributed-memory programming techniques, using dedicated languages and programming models such as OpenMP and MPI.


Applying Knowledge and Understanding:

Students will acquire the ability to solve simple problems requiring the design and analysis of algorithmic solutions in the field of parallel programming.


Making Judgments:

Students will be able to apply their acquired knowledge to solve computational problems requiring the design, implementation, and evaluation of algorithmic solutions in a parallel environment. Students will be able to develop simple parallel applications using specific programming paradigms and models, identifying the most appropriate architectural choices based on the context (shared or distributed memory) and evaluating their performance.


Communication skills:

Students will acquire adequate communication and language skills to describe clearly and coherently issues related to the design and analysis of parallel algorithms. They will be able to present the main technical and conceptual solutions related to parallel programming, even to non-specialists, using a communicative register appropriate to the context.


Learning skills:

Students will develop the ability to independently update and expand their knowledge, including in relation to new application or technological contexts in the field of parallel programming. They will be able to critically consult and understand relevant scientific and technical literature, maintaining a flexible and informed approach to the evolution of methodologies and paradigms in the field.

Course Structure

Lessons will take place in person, in a traditional lecture format. The instructor will present the theoretical content with the support of slides and the blackboard. Active student participation will be encouraged through questions and moments of classroom discussion.

The course will also include laboratory activities. Lectures will introduce theoretical concepts, while the lab will enable their practical application through exercises and case studies.

If the course is delivered in blended or online mode, the necessary adjustments may be introduced with respect to the above description in order to comply with the program outlined in the syllabus.

Required Prerequisites

Basic knowledge of computer architecture, operating systems, and algorithms, together with basic programming skills in C and/or C++.

Attendance of Lessons

For a thorough understanding of the topics covered and the methodologies presented, regular attendance at the lectures is strongly recommended.

Detailed Course Content

The course Principles of Parallel Programming provides a structured introduction to the main methodologies for the design and implementation of parallel algorithms, with an approach oriented toward computer science and programming. It is intended for students who already have a basic knowledge of programming and of computer architecture principles. The training path explores the fundamental concepts of High-Performance Computing (HPC), illustrating strategies to optimize code execution on modern architectures.

Particular attention is devoted to the analysis of parallelism models, the identification of computational bottlenecks, and the application of programming techniques on both shared and distributed memory. Students learn how to develop parallel programs using the OpenMP and MPI standards, and how to evaluate performance on real systems. The approach balances theoretical aspects with practical experimentation, keeping formal references accessible while emphasizing algorithmic intuition.

Textbook Information

[1] Hager, Georg, and Gerhard Wellein. Introduction to High Performance Computing for Scientists and Engineers. CRC Press, 2010.

[2] OpenMP: Multi-Platform Shared-Memory Parallel Programming in C/C++ and Fortran. https://www.openmp.org/

[3] Open MPI: An Open Source Implementation of the Message Passing Interface. https://www.open-mpi.org/

Course Planning

 #  | Subjects                                                                 | Text References
 1  | Introduction to High Performance Computing (HPC)                        | Lecturer's notes
 2  | Modern Architectures for Parallel Computing: Processors and HPC Systems | Chap. 1 + 4 from [1]
 3  | Software Stack for HPC Systems                                          | Chap. 4 from [1] / Lecturer's notes
 4  | Parallelism Models and Scalability                                      | Chap. 5 from [1]
 5  | Code Optimization: Introduction and Basic Concepts                      | Chap. 2 + 3 from [1]
 6  | Guidelines for Code Optimization and Memory Management                  | Chap. 3 from [1]
 7  | Techniques for Optimized Compilation and Profiling                      | Chap. 2 from [1]
 8  | Parallel Programming on Shared Memory: Fundamentals                     | Chap. 6 from [1]
 9  | OpenMP: Directives for Structured Parallelism                           | Chap. 6 from [1] and documents from [2]
 10 | OpenMP: Synchronization, Sections, and Workload Balancing               | Chap. 6 from [1] and documents from [2]
 11 | OpenMP: Shared and Private Data, Scheduling                             | Chap. 6 from [1] and documents from [2]
 12 | Parallel Programming on Distributed Memory: Fundamentals                | Chap. 9 from [1]
 13 | Introduction to MPI and Its Communication Model                         | Chap. 9 from [1]
 14 | Point-to-Point Communication in MPI                                     | Chap. 9 from [1] and documents from [3]
 15 | Collective Communication and Process Synchronization in MPI             | Chap. 9 from [1] and documents from [3]
 16 | Concepts of Topology and Efficient Communication in MPI                 | Chap. 9 from [1] and documents from [3]
 17 | Principles of Hybrid Programming with MPI and OpenMP                    | Chap. 11 from [1]
 18 | MPI/OpenMP: Context and Hierarchical Management                         | Lecturer's notes

Learning Assessment

Learning Assessment Procedures

The course exam is divided into two parts: an initial written test and a subsequent project evaluation.

These tests may take place remotely, should conditions require it. The project evaluation may be held on the same day as the written test or within a few days thereafter.

The purpose of the exam is to thoroughly assess the student’s preparation, analytical and reasoning skills regarding the topics covered during the course, as well as the appropriateness of the technical language used.

The project evaluation has an integrative value with respect to the written test and contributes inseparably to the determination of the final grade. It does not represent an opportunity to increase the score, but rather a necessary component for the overall evaluation of the student’s preparation.

The following criteria will generally be used for awarding the final grade:

  • Fail: the student has not acquired the basic concepts and is unable to carry out the exercises.

  • 18–23: the student demonstrates a minimal grasp of the basic concepts; their ability to explain and connect contents is modest, but they can solve simple exercises.

  • 24–27: the student demonstrates good command of the course content; their ability to explain and connect contents is good, and they solve exercises with few errors.

  • 28–30 with honors: the student has acquired all the course content and is able to present it comprehensively and connect it with critical insight; they solve the exercises completely and without errors.

Students with disabilities and/or specific learning disorders (SLD) must contact the lecturer, the CInAP contact person for the DMI (Prof. Daniele), and CInAP well in advance of the exam date to communicate their intention to take the exam with the appropriate compensatory measures.

To participate in the final exam, students must register on the SmartEdu portal. For any technical problems related to registration, they should contact the Teaching Office.

Examples of frequently asked questions and/or exercises

  1. What is the recommended strategy to implement a global sum across all ranks in MPI while minimizing synchronization overhead, and when would you prefer a nonblocking variant? (See the first sketch after this list.)

  2. Explain the difference between point-to-point blocking and nonblocking communication in MPI, and describe a common pattern to avoid deadlocks when exchanging halos in a stencil computation. (See the second sketch after this list.)

  3. What is the recommended OpenMP way to parallelize a sum over an array while avoiding races and minimizing contention? (See the third sketch after this list.)

  4. In shared-memory systems, what is the difference between UMA and NUMA?
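
First sketch (question 1): a minimal illustration of a global sum with a single MPI_Allreduce, plus the nonblocking MPI_Iallreduce (MPI-3), which lets independent work overlap the reduction before MPI_Wait. The one-value-per-rank setup and the variable names are assumptions made for the example, not material prescribed by the course.

    /* Global sum across all ranks with MPI_Allreduce; error checking omitted. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = (double)rank;   /* each rank contributes one value (illustrative) */
        double global = 0.0;

        /* One collective call: every rank obtains the sum of all contributions. */
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        /* Nonblocking variant: start the reduction, overlap independent work,
           then wait before using the result. */
        double global_nb = 0.0;
        MPI_Request req;
        MPI_Iallreduce(&local, &global_nb, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD, &req);
        /* ... independent computation could run here ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0)
            printf("global sum = %f (nonblocking: %f)\n", global, global_nb);

        MPI_Finalize();
        return 0;
    }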
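
Second sketch (question 2): a minimal 1-D halo (ghost-cell) exchange using MPI_Sendrecv, which pairs each send with a receive and so avoids the ordering deadlock that can occur when all ranks issue blocking MPI_Send first. The local array size and the 1-D decomposition are illustrative assumptions; a nonblocking MPI_Isend/MPI_Irecv pattern followed by MPI_Waitall is a common alternative.

    /* 1-D halo exchange between neighbouring ranks with MPI_Sendrecv. */
    #include <mpi.h>
    #include <stdlib.h>

    #define N 1024                      /* local interior points (assumption) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* u[0] and u[N+1] are ghost cells for the left and right neighbours. */
        double *u = calloc(N + 2, sizeof(double));

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send the rightmost interior value to the right neighbour while
           receiving the left ghost cell from the left neighbour, and vice versa.
           Pairing send and receive in one call prevents send/send deadlocks. */
        MPI_Sendrecv(&u[N],     1, MPI_DOUBLE, right, 0,
                     &u[0],     1, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                     &u[N + 1], 1, MPI_DOUBLE, right, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* ... stencil update on u[1..N] would follow here ... */

        free(u);
        MPI_Finalize();
        return 0;
    }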
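
Third sketch (question 3): a minimal OpenMP example in which a reduction clause gives each thread a private partial sum, combined once at the end of the loop; this avoids data races without the contention of updating a shared variable inside a critical section. The array size and its contents are assumptions made for the example.

    /* Array sum with an OpenMP reduction clause; compile with -fopenmp. */
    #include <stdio.h>

    #define N 1000000                   /* array size (assumption) */

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;                 /* fill with known values */

        /* Each thread accumulates into a private copy of 'sum';
           the copies are combined once when the loop ends. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);      /* expected: 1000000.0 */
        return 0;
    }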
