AUDIO PROCESSING

Academic Year 2024/2025 - Teacher: DARIO ALLEGRA

Expected Learning Outcomes

The course is designed as an introduction to Audio Processing, with lectures on Acoustics, Psychoacoustics, Audio Digitization, Audio Compression, Audio Formats, and useful audio libraries for audio signal processing code.
During the course, additional topics will be explored in seminars.

General learning objectives in terms of expected learning outcomes.

Knowledge and understanding: The course aims to give students the knowledge needed to understand the theoretical and physical mechanisms underlying the human auditory system, the formation and processing of sound and audio signals, and the techniques used to improve audio signal quality.
Ability to apply knowledge and understanding: Students will develop the skills needed to acquire, edit, compress, and save an audio signal. In particular, part of the course provides an overview of specific software for applying this theoretical knowledge.
Making judgments: Through classroom examples, students will learn to assess whether the solutions they propose meet a given quality standard.
Communication skills: Students will acquire the communication skills and technical vocabulary needed in the field of computer music.
Learning Skills: The course aims to provide students with the theoretical and practical methods needed to tackle and solve new problems that arise in professional work. To this end, several topics will be addressed in class by involving students in the search for solutions to real problems.

Course Structure

Classroom lessons

Seminars

Should teaching be carried out in mixed mode or remotely, changes to the above may be necessary, in line with the programme planned and outlined in the syllabus.

Teaching materials provided by the instructor are available on MS Teams, in the "Audio Processing" Team, code: z93t1vp

All communications will take place through the official Telegram channel of the course, so students are requested to join it: https://t.me/+-T70U1uiNAUxNjBk.

Required Prerequisites

No specific requirements.

Attendance of Lessons

Attendance at classes is mandatory.

Detailed Course Content

Acoustics
- Differences between sound and audio
- Definitions of physical properties of waves
- Root Mean Square (RMS)
- Decibel
- Inverse square law
- Speed of sound
- Refraction, Reflection, Diffraction
- Octaves in the diatonic and tempered scales
- Introduction to Fourier Analysis
- Amplitude and Envelope
- Colored Noises
Psychoacoustics
- Physics and Cognition, physiology of hearing
- Perception parameters
- Fletcher-Munson Chart
- Tone
- Critical Bands
- Tonal and Non-Tonal Masking
- Localization through sound
Digitization
- Digital representation of sound
- SNR index
- Sampling and Aliasing
- Quantization, uniform and non-uniform
- SNR and SQNR
- Sound Coding
- PCM Coding
- ECC and parity bits
- Wave amplitude representations
- Processing on dynamic range
Compression
- Compression of silence
- Memory required
- μ-law and A-law codings
- Re-quantization
- DPCM and ADPCM
- Compression rates
- Perceptual entropy
- Companding technique
- Perceptual compression: Block Coding, Transform Coding, Sub-band Coding and Huffman Coding
Audio Formats
- MPEG formats and their most important variants
- MP1, MP2 and MP3
- Advanced audio formats
- FFmpeg
- MIDI protocols and messages
Useful Audio Libraries and related scripts
- Audio format conversion using FFmpeg
- Coding in Python

Seminars

Textbook Information

Lombardo, V., & Valle, A. (2014). Audio e Multimedia (IV ed.). Apogeo.
For international students: Kirk, R., & Hunt, A. (1999). Digital Sound Processing for Music and Multimedia. Focal Press.
Tarabella, L. (2014). Musica Informatica. Apogeo.
Rocchesso, D. (2003). Sound Processin

Course Planning

#	Subjects	Text References
1	Acoustics	Chapter 1 of "Audio e Multimedia"
2	Psychoacoustics	Chapter 2 of "Audio e Multimedia"
3	Digitization	Chapter 3 of "Audio e Multimedia"
4	Compression	Chapter 4 of "Audio e Multimedia"
5	MIDI	Chapter 6 of "Audio e Multimedia"
6	MPEG	Chapter 4 of "Audio e Multimedia"
7	Seminars on course topics	Online/Slides
8	FOR ERASMUS STUDENTS	Digital Sound Processing for Music and Multimedia, Ross Kirk, Andy Hunt

Learning Assessment

Learning Assessment Procedures

In order to take the exam, in accordance with the regulations, registration on the Smart Edu portal is MANDATORY, as is registration on any other platform (such as MS Forms) that the instructor requires to optimize logistics.

The exam is a single test divided into two inseparable phases:

Phase (1): Students will take a 10-question multiple-choice test on MS Teams. Students who answer at least 6 questions correctly will proceed to phase (2); otherwise, the exam will end with a failing grade and the student will have to retake it in the next session. The score obtained in this phase is referred to as A.
Phase (2): Students will take a short written test where they will be required to solve a few exercises. At the end of this phase, students will be awarded a score between 0 and 8 based on the quality of their responses. This score is referred to as B.
Conclusion: The grade for the theory part is calculated as A*3 + (B-4). If the result is 18 or higher, the theory part is passed with that grade; otherwise, the exam is deemed insufficient and the student will be required to retake it in the next session. The two phases form a single exam: they cannot be separated or taken in different sessions.
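The scoring rule A*3 + (B-4) can be sketched in Python as follows. This is an illustrative sketch, not an official tool: it assumes A is the number of correct answers in phase (1) (an integer from 0 to 10) and B an integer from 0 to 8; the function name `theory_grade` and the constants are hypothetical. The syllabus does not say how results above 30 are handled, so the raw value is returned uncapped.

```python
# Illustrative sketch of the grading rule A*3 + (B-4).
# Assumptions (not stated in the syllabus): A is the number of correct
# answers in phase (1), 0..10; B is the phase (2) score, 0..8.
# Results above 30 are returned as-is because capping is not specified.
from typing import Optional

PHASE1_QUESTIONS = 10
PHASE1_THRESHOLD = 6   # minimum correct answers to access phase (2)
PASS_THRESHOLD = 18    # minimum final result to pass
B_MAX = 8


def theory_grade(a: int, b: Optional[int]) -> Optional[int]:
    """Return the raw grade A*3 + (B-4), or None if the exam is failed."""
    if not 0 <= a <= PHASE1_QUESTIONS:
        raise ValueError("A must be between 0 and 10")
    if a < PHASE1_THRESHOLD:
        return None  # exam ends after phase (1) with a failing grade
    if b is None or not 0 <= b <= B_MAX:
        raise ValueError("B must be between 0 and 8")
    grade = a * 3 + (b - 4)
    return grade if grade >= PASS_THRESHOLD else None


if __name__ == "__main__":
    for a, b in [(5, None), (6, 4), (6, 3), (8, 6), (10, 8)]:
        print(a, b, theory_grade(a, b))
```

For example, a student with A = 6 needs B of at least 4 to reach 18, since 6*3 + (4-4) = 18, while 6*3 + (3-4) = 17 is insufficient.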

The examination is held in Italian, according to the rules described in the Italian version of this section.

Erasmus students and other non-Italian speakers may ask to take an oral exam.

For the assignment of grades, the following criteria are typically followed:

Fail: The student has not acquired the basic concepts and is unable to complete the exercises.
18-23: The student demonstrates a minimal mastery of the fundamental concepts; their ability to present and connect content is modest, and they can solve simple exercises.
24-27: The student shows a good grasp of the course content; their ability to present and connect the content is good, and they solve exercises with few errors.
28-30 with honors: The student has acquired all course content and can present it comprehensively with a critical perspective; they solve exercises completely and without errors.

Students with disabilities and/or learning disorders (DSA) must contact the instructor and the CInAP representative at DMI well in advance of the exam date to inform them of their intention to take the exam with the appropriate compensatory measures.

Examples of frequently asked questions and/or exercises

Why is it preferable to talk about "perceived" volume?
What are isophonic curves? How are they constructed?
What is a phon? How is it related to SPL decibels?
What is the perceived volume in phons of a sound with a frequency of 1 kHz and an amplitude of 200 dB SPL?
Does it make sense to perform data compression on audio by removing frequencies between 1 and 5 kHz in favor of the lows and highs? Justify.
What is frequency masking? Describe the phenomenon.
How does it differ from temporal masking? (No need to describe temporal masking in detail).
Why can both be used to compress an audio signal?


Degree Course in Computer Science