NATURAL LANGUAGE PROCESSING

Academic Year 2023/2024 - Teacher: MISAEL MONGIOVI'

Expected Learning Outcomes

Based on the Dublin descriptors, by the end of the course students will acquire:

1) Knowledge and understanding:

The student will have a solid understanding of the basic principles and tools for natural language processing.

They will be able to understand the most recent advancements in the state of the art in the field of natural language processing.

2) Ability to apply knowledge and understanding:

The student will be able to analyze a text to extract relevant syntactic and semantic information.

They will effectively use specific tools for addressing the main tasks in natural language processing.

They will be able to navigate the landscape of natural language processing techniques and propose innovative solutions to tackle practical problems in this field.

3) Judgment skills:

The student will be able to evaluate the techniques and tools available in the field of natural language processing and select those most suited to solve specific problems.

4) Communication skills:

The student will have acquired the typical lexicon of the field of natural language processing and will be able to use terms in a correct and unambiguous manner, facilitating communication with other experts in the field and with non-specialists.

5) Learning skills:

The student will possess both theoretical and practical methodological skills to face and solve new challenges in the field of natural language processing.

They will also have a solid autonomy in their studies, allowing them to delve into specific topics and stay updated on the latest developments and advancements in the sector.

Course Structure

The course will alternate between lectures and guided exercises with the aim of providing a clear connection between theoretical concepts and their practical application.

Required Prerequisites

The course requires knowledge of programming, data structures, and elementary algorithms, as well as basic concepts of probability and statistics.

Attendance of Lessons

Attendance is optional but highly recommended.

Detailed Course Content

Introduction to natural language processing. Lexical and syntactic analysis. N-gram-based language models. Word embeddings. Part-Of-Speech Tagging. Syntactic parsing. Semantic analysis. Word Sense Disambiguation. Lexical databases and semantic networks: WordNet, BabelNet. Named Entity Recognition. Entity linking. Semantic Role Labeling. Co-reference resolution. Neural language models: Transformers and contextual embedding. Text classification. Semantic text similarity. Knowledge extraction from textual corpora. Large language models. Applications: question answering and chatbots; machine translation; sentiment analysis and stance detection; automatic fact-checking. Libraries, tools, and repositories: OpenNLP, NLTK, spaCy, PyTorch, HuggingFace.
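As a taste of one of the topics above, an N-gram language model can be sketched in a few lines of plain Python. The snippet below is a minimal illustration (toy corpus and function names are chosen for this example; they are not course material): it counts bigrams over padded sentences and estimates conditional probabilities by maximum likelihood.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized sentences,
    padding each sentence with <s> and </s> markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_lm(corpus)
print(bigram_prob(uni, bi, "the", "cat"))  # 1/2: "the" is followed by "cat" once out of two occurrences
```

Unseen bigrams get probability zero under this estimate, which motivates the smoothing techniques covered in the course.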

Textbook Information

D. Jurafsky & J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (third edition), Prentice Hall

Course Planning

 #   Subjects                                                 Text References
 1   Introduction to Natural Language Processing
 2   Text normalization
 3   N-gram language models
 4   Naive Bayes for text classification
 5   Logistic Regression for text classification
 6   Distributional semantics and word embeddings
 7   Lexical resources
 8   Neural language models
 9   Transformers
10   BERT
11   POS-tagging and Named Entity Recognition
12   Syntactic parsers
13   Semantic parsers
14   GPT (Generative Pretrained Transformer) architectures
15   Large Language Models (LLMs) in practice
16   NLP applications

Learning Assessment

Learning Assessment Procedures

Knowledge will be assessed through a written test, followed by an oral examination that evaluates the project and the ability to apply the acquired knowledge.

Examples of frequently asked questions and/or exercises

  • Describe the smoothing techniques for N-gram-based language models, highlighting the advantages and limitations of various methods.
  • Discuss the importance of word embeddings in understanding natural language. Explain the functioning of the Word2Vec model.
  • Explore the role of Transformers within neural language models. Illustrate the functioning of the BERT model.
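For the first question above, the simplest smoothing technique, add-one (Laplace) smoothing, can be illustrated with a short sketch. The counts and vocabulary below are a toy example chosen for illustration, not material from the course:

```python
from collections import Counter

def laplace_bigram_prob(bigrams, unigrams, vocab_size, prev, word):
    """Add-one smoothed estimate:
    P(word | prev) = (count(prev, word) + 1) / (count(prev) + V)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy counts from a tiny corpus
unigrams = Counter({"the": 2, "cat": 1, "dog": 1, "sat": 2})
bigrams = Counter({("the", "cat"): 1, ("the", "dog"): 1,
                   ("cat", "sat"): 1, ("dog", "sat"): 1})
V = len(unigrams)  # vocabulary size

print(laplace_bigram_prob(bigrams, unigrams, V, "the", "cat"))  # (1+1)/(2+4) = 1/3
print(laplace_bigram_prob(bigrams, unigrams, V, "the", "sat"))  # unseen: (0+1)/(2+4) = 1/6
```

Note how probability mass is shifted from seen bigrams ("the cat") to unseen ones ("the sat"), at the cost of overestimating rare events; this trade-off is what more refined methods (e.g., Kneser-Ney) address.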