NATURAL LANGUAGE PROCESSING

Academic Year 2024/2025 - Teacher: MISAEL MONGIOVI'

Expected Learning Outcomes

Based on the Dublin descriptors, students will, at the end of the course, acquire:

1) Knowledge and understanding:

The student will have a solid understanding of the basic principles and tools for natural language processing.

They will be able to understand the most recent advancements in the state of the art in the field of natural language processing.

2) Ability to apply knowledge and understanding:

The student will be able to analyze a text to extract relevant syntactic and semantic information.

They will effectively use specific tools for addressing the main tasks in natural language processing.

They will be able to navigate the landscape of natural language processing techniques and propose innovative solutions to tackle practical problems in this field.

3) Judgment skills:

The student will be able to evaluate the techniques and tools available in the field of natural language processing and select those most suited to solve specific problems.

4) Communication skills:

The student will have acquired the typical lexicon of the field of natural language processing and will be able to use terms in a correct and unambiguous manner, facilitating communication with other experts in the field and with non-specialists.

5) Learning skills:

The student will possess both theoretical and practical methodological skills to face and solve new challenges in the field of natural language processing.

They will also be able to study autonomously, allowing them to explore specific topics in depth and stay up to date on the latest developments in the field.

Course Structure

The course will alternate between lectures and guided exercises with the aim of providing a clear connection between theoretical concepts and their practical application.

Required Prerequisites

The course requires knowledge of programming, data structures, and elementary algorithms, as well as basic concepts of probability and statistics.

Attendance of Lessons

Attendance is optional but highly recommended.

Detailed Course Content

The initial part of the course will focus on basic tools for natural language processing, including text normalization, N-gram-based language models, and text document classification. The semantic representation of words and lexical resources will be introduced. A second part will be dedicated to neural language models, starting with simpler models based on feedforward neural networks, followed by an introduction to more advanced models, such as those based on Transformers. The use of these models for sequence processing will be explored, particularly for part-of-speech tagging and Named Entity Recognition. A third, more advanced part will cover tools for syntactic and semantic parsing and will conclude with an introduction to Generative Pretrained Transformers (e.g., GPT-3, GPT-4, ChatGPT). Throughout the course, various applications of language models will be introduced, including Sentiment Analysis, Question Answering, Machine Translation, and Summarization. Several exercises in Python will also be conducted using publicly available libraries, tools, and repositories (e.g., spaCy, PyTorch, Transformers, Hugging Face).

Textbook Information

D. Jurafsky & J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (third edition), Prentice Hall

Course Planning

No.  Subjects                                                Text References
1    Introduction to Natural Language Processing
2    Text normalization
3    N-gram language models
4    Naive Bayes for text classification
5    Logistic Regression for text classification
6    Distributional semantics and word embeddings
7    Lexical resources
8    Neural language models
9    Transformers
10   BERT
11   POS-tagging and Named Entity Recognition
12   Syntactic parsers
13   Semantic parsers
14   GPT (Generative Pretrained Transformer) architectures
15   Large Language Models (LLMs) in practice
16   NLP applications

Learning Assessment

Learning Assessment Procedures

The knowledge will be assessed through a written test, followed by an oral examination that will evaluate the project and the ability to apply the acquired knowledge.

Students with disabilities and/or specific learning disorders (SLD) must contact the professor, the CInAP representative of the DMI (Prof. Daniele), and the CInAP well in advance of the exam date to communicate their intention to take the exam using appropriate compensatory measures.

Examples of frequently asked questions and/or exercises

  • Describe the smoothing techniques for N-gram-based language models, highlighting the advantages and limitations of various methods.
  • Discuss the importance of word embeddings in understanding natural language. Explain the functioning of the Word2Vec model.
  • Explore the role of Transformers within neural language models. Illustrate the functioning of the BERT model.