NATURAL LANGUAGE PROCESSING

Academic Year 2023/2024 - Teacher: MISAEL MONGIOVI'

Expected Learning Outcomes

Based on the Dublin descriptors, by the end of the course students will acquire:

1) Knowledge and understanding:

The student will have a solid understanding of the basic principles and tools for natural language processing.

They will be able to understand the most recent advancements in the state of the art in the field of natural language processing.

2) Ability to apply knowledge and understanding:

The student will be able to analyze a text to extract relevant syntactic and semantic information.

They will effectively use specific tools for addressing the main tasks in natural language processing.

They will be able to navigate the landscape of natural language processing techniques and propose innovative solutions to tackle practical problems in this field.

3) Judgment skills:

The student will be able to evaluate the techniques and tools available in the field of natural language processing and select those most suited to solve specific problems.

4) Communication skills:

The student will have acquired the typical lexicon of the field of natural language processing and will be able to use terms in a correct and unambiguous manner, facilitating communication with other experts in the field and with non-specialists.

5) Learning skills:

The student will possess both theoretical and practical methodological skills to face and solve new challenges in the field of natural language processing.

They will also have a solid autonomy in their studies, allowing them to delve into specific topics and stay updated on the latest developments and advancements in the sector.

Course Structure

The course will alternate between lectures and guided exercises with the aim of providing a clear connection between theoretical concepts and their practical application.

Required Prerequisites

The course requires knowledge of programming, data structures, and elementary algorithms, as well as basic concepts of probability and statistics.

Attendance of Lessons

Attendance is optional but highly recommended.

Detailed Course Content

Introduction to natural language processing. Lexical and syntactic analysis. N-gram-based language models. Word embeddings. Part-Of-Speech Tagging. Syntactic parsing. Semantic analysis. Word Sense Disambiguation. Lexical databases and semantic networks: WordNet, BabelNet. Named Entity Recognition. Entity linking. Semantic Role Labeling. Co-reference resolution. Neural language models: Transformers and contextual embedding. Text classification. Semantic text similarity. Knowledge extraction from textual corpora. Large language models. Applications: question answering and chatbots; machine translation; sentiment analysis and stance detection; automatic fact-checking. Libraries, tools, and repositories: OpenNLP, NLTK, spaCy, PyTorch, HuggingFace.
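As a taste of one of the topics above, an N-gram language model can be sketched in a few lines of plain Python. The snippet below is a minimal illustration (toy corpus and function names are chosen for this example; they are not course material): it counts bigrams over padded sentences and estimates conditional probabilities by maximum likelihood.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padding each sentence with <s> and </s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 1/2: "the" is followed by "cat" once out of two occurrences
```

Unseen bigrams get probability zero under this estimate, which motivates the smoothing techniques covered in the course.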

Textbook Information

D. Jurafsky & J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (third edition), Prentice Hall

Course Planning

 #   Subjects                                                 Text References
 1   Introduction to Natural Language Processing
 2   Text normalization
 3   N-gram language models
 4   Naive Bayes for text classification
 5   Logistic Regression for text classification
 6   Distributional semantics and word embeddings
 7   Lexical resources
 8   Neural language models
 9   Transformers
10   BERT
11   POS-tagging and Named Entity Recognition
12   Syntactic parsers
13   Semantic parsers
14   GPT (Generative Pretrained Transformer) architectures
15   Large Language Models (LLMs) in practice
16   NLP applications

Learning Assessment

Learning Assessment Procedures

Knowledge will be assessed through a written test, followed by an oral examination that evaluates the project and the ability to apply the acquired knowledge.

Examples of frequently asked questions and/or exercises

  • Describe the smoothing techniques for N-gram-based language models, highlighting the advantages and limitations of various methods.
  • Discuss the importance of word embeddings in understanding natural language. Explain the functioning of the Word2Vec model.
  • Explore the role of Transformers within neural language models. Illustrate the functioning of the BERT model.
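For the first question above, the simplest smoothing technique, add-one (Laplace) smoothing, can be illustrated with a short sketch. The counts and vocabulary below are a toy example chosen for illustration, not material from the course:

```python
from collections import Counter

def laplace_bigram_prob(bigrams, unigrams, vocab_size, prev, word):
    """Add-one smoothed estimate:
    P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy counts from a tiny corpus
unigrams = Counter({"the": 2, "cat": 1, "dog": 1, "sat": 2})
bigrams = Counter({("the", "cat"): 1, ("the", "dog"): 1,
                   ("cat", "sat"): 1, ("dog", "sat"): 1})
V = len(unigrams)  # vocabulary size

print(laplace_bigram_prob(bigrams, unigrams, V, "the", "cat"))  # (1+1)/(2+4) = 1/3
print(laplace_bigram_prob(bigrams, unigrams, V, "the", "sat"))  # unseen: (0+1)/(2+4) = 1/6
```

Note how probability mass is shifted from seen bigrams ("the cat") to unseen ones ("the sat"), at the cost of overestimating rare events; this trade-off is what more refined methods (e.g., Kneser-Ney) address.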