AUDIO PROCESSING

Academic Year 2023/2024 - Teacher: DARIO ALLEGRA

Expected Learning Outcomes

The course is designed as an introduction to Audio Processing, with lectures on Acoustics, Psychoacoustics, Audio Digitization, Audio Compression, Audio Formats, and useful audio libraries for writing audio signal processing code.
During the course, additional topics will also be touched on, such as workplace noise thresholds, Shannon's impact on the subject, and the FFmpeg tool for audio/video stream management.
Finally, building on the positive experience of previous editions of the course, further topics will be presented as seminars.

General learning objectives in terms of expected learning outcomes.

Knowledge and understanding: The course aims to give the student the knowledge needed to understand the theoretical and physical mechanisms underlying the human auditory system, the formation and processing of sound and audio signals, and the improvement of audio signal quality.
Ability to apply knowledge and understanding: The student will acquire the skills needed to capture, edit, compress, and save an audio signal. In particular, part of the course will give an overview of specific software for applying this theoretical knowledge.
Making judgments: Through classroom examples, the student will learn to assess whether the solutions they propose meet a given level of quality.
Communication skills: The student will acquire the communication skills and technical vocabulary needed in the computer music field.
Learning skills: The course aims to provide the student with the theoretical and practical methods needed to tackle and solve new problems that arise in professional work. To this end, several topics will be addressed in class by involving the student in the search for possible solutions to real problems.

Course Structure

Classroom lessons

Self-assessment tests

Laboratory lessons

Seminars

If teaching is delivered in mixed mode or remotely, changes to the arrangements above may be necessary in order to follow the programme outlined in the syllabus.

Required Prerequisites

No specific requirements.

Attendance of Lessons

Attending the course is strongly recommended.

Detailed Course Content

Acoustics
- Differences between sound and audio
- Definitions of physical properties of waves
- Root Mean Square (RMS)
- Decibel
- Inverse square law
- Speed of sound
- Refraction, Reflection, Diffraction
- Octaves in the diatonic and tempered scales
- Introduction to Fourier Analysis
- Amplitude and Envelope
- Colored Noises
Psychoacoustics
- Physics and Cognition, physiology of hearing
- Noise tolerance thresholds while working
- Perception parameters
- Fletcher-Munson Chart
- Tone
- Critical Bands
- Tonal and Non-Tonal Masking
- Localization through sound
Digitization
- Digital representation of sound
- SNR index
- Sampling and Aliasing
- Quantization
- SNR and SQNR
- Sound Coding
- PCM Coding
- ECC and parity bits
- Wave amplitude representations
- Graphic and Parametric Equalizer
- Filters: HPF, LPF, Shelving, Peaking, Telephone, Walkie-Talkie, etc.
- Processing on dynamic range
Compression
- Compression of silence
- Memory required
- μ-law and A-law coding
- Re-quantization
- DPCM and ADPCM
- Compression rates
- Perceptual entropy
- Companding technique
- Perceptual compression: Block Coding, Transform Coding, Sub-band Coding and Huffman Coding
Audio Formats
- MPEG formats and their most important variants
- MP1, MP2 and MP3
- Advanced audio formats
- FFmpeg
- MIDI protocols and messages
Useful Audio Libraries and Related Scripts
- Audio format conversion using FFmpeg
- Coding in Python
- Reading, Conversion, Processing and Writing of an audio file
Seminars

Textbook Information

Lombardo, V., & Valle, A. (2014). Audio e Multimedia (IV ed.). Apogeo.
For international students: Kirk, R., & Hunt, A. (1999). Digital Sound Processing for Music and Multimedia. Focal Press.
Tarabella, L. (2014). Musica Informatica. Apogeo.
Rocchesso, D. (2003). Sound Processin

Course Planning

	Subjects	Text References
1	Acoustics	Chapter 1 of "Audio e Multimedia"
2	Psychoacoustics	Chapter 2 of "Audio e Multimedia"
3	Digitization	Chapter 3 of "Audio e Multimedia"
4	Compression	Chapter 4 of "Audio e Multimedia"
5	MIDI	Chapter 6 of "Audio e Multimedia"
6	MPEG	Chapter 4 of "Audio e Multimedia"
7	Seminars on course topics	Online/Slides
8	FOR ERASMUS STUDENTS	Digital Sound Processing for Music and Multimedia, Ross Kirk, Andy Hunt

Learning Assessment

Learning Assessment Procedures

The examination is held in Italian, according to the rules described in the Italian version of this section.

Erasmus students and other non-Italian speakers may ask to take an oral exam.

For the assignment of grades, the following criteria are typically followed:

Fail: The student has not acquired the basic concepts and is unable to complete the exercises.

18-23: The student demonstrates a minimal mastery of the fundamental concepts; their ability to present and connect content is modest, and they can solve simple exercises.

24-27: The student shows a good grasp of the course content; their ability to present and connect the content is good, and they solve exercises with few errors.

28-30 with honors: The student has acquired all the course content and can present it comprehensively and with a critical perspective; they solve exercises completely and without errors.

Examples of frequently asked questions and / or exercises

Why is it preferable to talk about "perceived" volume?
What are isophonic curves? How are they constructed?
What is a phon? How is it related to SPL decibels?
What is the perceived volume in phons of a sound with a frequency of 1 kHz and an amplitude of 200 dB SPL?
Does it make sense to compress audio data by removing frequencies between 1 and 5 kHz in favor of the lows and highs? Justify your answer.
What is frequency masking? Describe the phenomenon.
How does it differ from temporal masking? (No need to describe temporal masking in detail).
Why can both be used to compress an audio signal?

Degree Course in

Computer Science