An Introduction to Modern NLP Techniques
Welcome to this workshop on Natural Language Processing (NLP) and Large Language Models (LLMs). The primary goal of this workshop is to provide a solid introduction to modern NLP techniques and demonstrate how these methods can be applied in research and business settings.
During the workshop, we will cover the following topics:
Binary Classification of Tweets: We’ll start by building a simple binary classifier for Twitter data. This will serve as an introduction to basic NLP concepts and techniques.
Few-Shot Classification and Data Curation: Next, we’ll move on to more challenging cases that were not easily solvable until autumn 2022. We’ll explore few-shot classification, in which a classifier is trained with only a few labeled examples per class. We’ll also learn about labeling tools used for data curation.
LLMs for Data Labeling: In this section, we’ll examine how LLMs can be used to label data in a zero-shot scenario, that is, without giving the model any labeled examples. This is particularly useful when no labeled data is available.
Topic Modeling with LLMs: We will then explore the use of LLMs alongside modern topic modeling with BERTopic. In this approach, an LLM acts as a domain expert and interprets the topics that BERTopic identifies.
Retrieval-Augmented Generation (RAG) with LangChain: Next, we’ll focus on RAG using LangChain. We will try out two approaches: one that retrieves data from a vector store based on semantic similarity, and a more advanced one that combines semantic retrieval with metadata filtering.
LLMs for Knowledge Extraction and Data Structuring: Finally, we’ll look into using LLMs to extract knowledge from text and turn it into structured data. We will examine a use case involving long-form text, and we will see how to generate synthetic datasets and fine-tune relatively small language models with instruction-based techniques.
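The difference between the two RAG retrieval strategies can be illustrated without any framework. The following is a minimal, framework-free sketch, not the LangChain code used in the workshop: the three documents, their hand-made three-dimensional "embedding" vectors, the metadata fields `year` and `source`, and the helper names `cosine`, `retrieve` and `build_prompt` are all hypothetical. In practice, the vectors would come from an embedding model and be stored in a vector store.

```python
import math

# Hypothetical toy corpus. Real systems would compute "vec" with an embedding model
# and keep documents in a vector store; here the vectors are hand-made for illustration.
DOCS = [
    {"text": "Quarterly revenue grew by 12%.", "vec": [0.9, 0.1, 0.0],
     "meta": {"year": 2023, "source": "report"}},
    {"text": "Revenue declined slightly last year.", "vec": [0.8, 0.2, 0.1],
     "meta": {"year": 2022, "source": "report"}},
    {"text": "The new office opened in Berlin.", "vec": [0.1, 0.9, 0.2],
     "meta": {"year": 2023, "source": "news"}},
]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, docs, k=2, where=None):
    """Return the k documents most similar to query_vec.

    where=None: plain semantic retrieval over all documents.
    where={...}: metadata filtering first (every key/value pair must match),
    then semantic ranking of the remaining candidates.
    """
    candidates = [
        d for d in docs
        if not where or all(d["meta"].get(key) == val for key, val in where.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question, docs):
    """Paste the retrieved passages into the prompt as context (the "augmented" step)."""
    context = "\n".join(f"- {d['text']}" for d in docs)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical embedding of the question "How did revenue develop in 2022?"
    query_vec = [1.0, 0.0, 0.0]
    hits = retrieve(query_vec, DOCS, k=2, where={"year": 2022})
    print(build_prompt("How did revenue develop in 2022?", hits))
```

Without `where`, the query above returns both revenue documents regardless of year; with `where={"year": 2022}`, only the 2022 document survives the filter. Filtering before ranking, rather than after, ensures that the top-k results all satisfy the metadata constraint. The resulting prompt would then be sent to an LLM, which this sketch leaves out.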
By the end of this workshop, you should have gained valuable insights into modern NLP techniques and their practical applications in research and business settings.
