NATURAL LANGUAGE PROCESSING

Academic Year 2024/2025 - Teacher: MISAEL MONGIOVÌ

Expected Learning Outcomes

In accordance with the Dublin descriptors, by the end of the course students will acquire:

1) Knowledge and understanding:

The student will have a solid understanding of the basic principles and tools for natural language processing.

They will be able to follow the most recent advancements in the state of the art of natural language processing.

2) Ability to apply knowledge and understanding:

The student will be able to analyze a text to extract relevant syntactic and semantic information.

They will be able to use specific tools effectively to address the main tasks in natural language processing.

They will be able to navigate the landscape of natural language processing techniques and propose innovative solutions to tackle practical problems in this field.

3) Judgment skills:

The student will be able to evaluate the techniques and tools available in the field of natural language processing and select those most suited to solve specific problems.

4) Communication skills:

The student will have acquired the standard lexicon of natural language processing and will be able to use its terms correctly and unambiguously, facilitating communication both with other experts in the field and with non-specialists.

5) Learning skills:

The student will possess both theoretical and practical methodological skills to face and solve new challenges in the field of natural language processing.

They will also have solid autonomy in their studies, allowing them to delve into specific topics and stay up to date with the latest developments in the field.

Course Structure

The course will alternate between lectures and guided exercises with the aim of providing a clear connection between theoretical concepts and their practical application.

Required Prerequisites

The course requires knowledge of programming, data structures, and elementary algorithms, as well as basic concepts of probability and statistics.

Attendance of Lessons

Attendance is mandatory.

Detailed Course Content

The initial part of the course will focus on basic tools for natural language processing, including text normalization, N-gram language models, and text document classification. The semantic representation of words and lexical resources will also be introduced.

A second part will be dedicated to neural language models, starting with simpler models based on feedforward neural networks and moving on to more advanced models such as those based on Transformers. The use of these models for sequence processing will be explored, particularly for Part-Of-Speech tagging and Named Entity Recognition.

A third, more advanced section will cover tools for syntactic and semantic parsing, followed by an introduction to Generative Pretrained Transformers (e.g., GPT-3, GPT-4, ChatGPT).

Throughout the course, various applications of language models will be introduced, including Sentiment Analysis, Question Answering, Machine Translation, and Summarization. Several exercises in Python will also be conducted using publicly available libraries, tools, and repositories (e.g., spaCy, PyTorch, Transformers, HuggingFace).
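As an indication of the kind of guided exercise referred to above, here is a minimal sketch, assuming spaCy's small English model (en_core_web_sm), the Hugging Face transformers sentiment-analysis pipeline with its default pretrained model, and a PyTorch backend; the example sentence is made up for illustration. It tags parts of speech, extracts named entities, and classifies sentiment.

    # Minimal sketch of a course-style exercise.
    # Assumes: pip install spacy transformers torch
    #          python -m spacy download en_core_web_sm
    import spacy
    from transformers import pipeline

    text = "Alan Turing laid the foundations of natural language processing."

    # Part-Of-Speech tagging and Named Entity Recognition with spaCy
    nlp = spacy.load("en_core_web_sm")
    doc = nlp(text)
    print([(token.text, token.pos_) for token in doc])    # POS tags
    print([(ent.text, ent.label_) for ent in doc.ents])   # named entities

    # Sentiment Analysis with a pretrained Transformer (default pipeline model)
    classifier = pipeline("sentiment-analysis")
    print(classifier(text))                               # label and confidence score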

Textbook Information

D. Jurafsky & J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (third edition), Prentice Hall

Course Planning

 #   Subjects                                                Text References
 1   Introduction to Natural Language Processing
 2   Text normalization
 3   N-gram language models
 4   Naive Bayes for text classification
 5   Logistic Regression for text classification
 6   Distributional semantics and word embeddings
 7   Lexical resources
 8   Neural language models
 9   Transformers
10   BERT
11   POS tagging and Named Entity Recognition
12   Syntactic parsers
13   Semantic parsers
14   GPT (Generative Pretrained Transformer) architectures
15   Large Language Models (LLMs) in practice
16   NLP applications

Learning Assessment

Learning Assessment Procedures

Knowledge will be assessed through a written test, followed by an oral examination that will evaluate the project and the ability to apply the acquired knowledge.

Students with disabilities and/or specific learning disorders (SLD) must contact the professor, the CInAP representative of the DMI (Prof. Daniele), and the CInAP well in advance of the exam date to communicate their intention to take the exam using appropriate compensatory measures.

Examples of frequently asked questions and/or exercises

  • Describe the smoothing techniques for N-gram-based language models, highlighting the advantages and limitations of various methods (a minimal worked sketch of Laplace smoothing follows this list).
  • Discuss the importance of word embeddings in understanding natural language. Explain the functioning of the Word2Vec model.
  • Explore the role of Transformers within neural language models. Illustrate the functioning of the BERT model.
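Relating to the first question above, the following is a minimal worked sketch of add-one (Laplace) smoothing for bigram probabilities; the toy corpus and all counts are hypothetical and serve only to illustrate the formula P(w | w_prev) = (count(w_prev, w) + 1) / (count(w_prev) + V).

    # Add-one (Laplace) smoothing on a hypothetical toy corpus
    from collections import Counter

    corpus = "the cat sat on the mat the cat ate".split()
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    V = len(unigrams)  # vocabulary size

    def p_laplace(w_prev, w):
        # P(w | w_prev) = (count(w_prev, w) + 1) / (count(w_prev) + V)
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)

    print(p_laplace("the", "cat"))  # seen bigram: 3/9, discounted from the raw 2/3
    print(p_laplace("cat", "on"))   # unseen bigram: 1/8, no longer zero

The point the smoothing question targets is visible here: probability mass is shifted from seen to unseen N-grams, avoiding zero probabilities at the cost of under-estimating frequent events.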