Hugging Face Transformers: Getting Started

huggingface
transformers
nlp
bert
Introduction to using Hugging Face datasets and transformers for NLP tasks.
Author

Mohammed Adil Siraju

Published

September 26, 2025

This notebook demonstrates how to use Hugging Face’s datasets and transformers libraries to work with pre-trained models like BERT.

Installing Required Libraries

First, install the Hugging Face datasets library, which provides access to ready-made datasets.

%pip install datasets
Requirement already satisfied: datasets in /home/adil/miniconda3/envs/fastai/lib/python3.10/site-packages (4.1.1)

Loading a Dataset

Load the IMDB movie reviews dataset. It contains 50,000 reviews labeled as positive or negative sentiment, split evenly into train and test sets, plus 50,000 unlabeled reviews in the unsupervised split.

from datasets import load_dataset

ds = load_dataset('imdb')

print(ds)
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})

Converting to Pandas DataFrame

Convert the Hugging Face dataset to a pandas DataFrame for easier exploration and analysis.

import pandas as pd

df = pd.DataFrame(ds['train'])
df
text label
0 I rented I AM CURIOUS-YELLOW from my video sto... 0
1 "I Am Curious: Yellow" is a risible and preten... 0
2 If only to avoid making this type of film in t... 0
3 This film was probably inspired by Godard's Ma... 0
4 Oh, brother...after hearing about this ridicul... 0
... ... ...
24995 A hit at the time but now better categorised a... 1
24996 I love this movie like no other. Another time ... 1
24997 This film and it's sequel Barry Mckenzie holds... 1
24998 'The Adventures Of Barry McKenzie' started lif... 1
24999 The story centers around Barry McKenzie who mu... 1

25000 rows × 2 columns
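
Once the reviews are in a DataFrame, ordinary pandas operations apply. As a minimal sketch, assuming a made-up four-row frame named toy_df with the same text and label columns, the snippet below counts reviews per label and compares the average review length in words. On the real data the same lines would run on df.

```python
import pandas as pd

# Hypothetical stand-in rows; the real DataFrame has 25,000 of these.
toy_df = pd.DataFrame({
    "text": [
        "A dull, overlong film.",
        "Loved every minute of it.",
        "Not bad at all.",
        "Terrible acting.",
    ],
    "label": [0, 1, 1, 0],  # 0 = negative, 1 = positive
})

# How many reviews carry each label?
label_counts = toy_df["label"].value_counts().sort_index()

# Whitespace-separated word count per review, then the mean per label.
toy_df["n_words"] = toy_df["text"].str.split().str.len()
mean_words = toy_df.groupby("label")["n_words"].mean()

print(label_counts)
print(mean_words)
```

Checking the label balance and the length distribution early is useful because long reviews may need truncating before they are passed to a model.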

Installing Transformers Library

Install the transformers library to access pre-trained models and tokenizers.

%pip install transformers
Requirement already satisfied: transformers in /home/adil/miniconda3/envs/fastai/lib/python3.10/site-packages (4.56.2)

Loading Pre-trained BERT Model

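Load the bert-base-uncased model and its tokenizer. BERT (Bidirectional Encoder Representations from Transformers) is a Transformer encoder pre-trained on unlabeled text with masked language modeling. AutoModel loads the bare encoder, without a task-specific head.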

from transformers import AutoModel, AutoTokenizer

model_name = 'bert-base-uncased'
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
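
The architecture of a loaded model is described by its configuration, available as model.config. As an offline sketch, the snippet below builds a BertConfig from the library defaults instead of downloading anything. The assumption is that those defaults correspond to the base BERT architecture, so the values should match what model.config reports for bert-base-uncased.

```python
from transformers import BertConfig

# Built from library defaults; no weights or files are downloaded.
# Assumption: these defaults describe the base-sized BERT architecture.
config = BertConfig()

print(config.hidden_size)          # width of each token's hidden vector
print(config.num_hidden_layers)    # number of stacked encoder layers
print(config.num_attention_heads)  # attention heads per layer
print(config.vocab_size)           # size of the WordPiece vocabulary
```

The hidden size is the value that appears as the last dimension of the hidden states the model returns.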
Using the Model and Tokenizer

Tokenize a sample text and pass it through the BERT model to get embeddings. The printed shape of the last hidden state is (batch size, number of tokens, hidden size): one input, 7 tokens including the special [CLS] and [SEP] tokens, and a 768-dimensional vector per token.

inputs = tokenizer('Hello, Hugging Face!', return_tensors='pt')
outputs = model(**inputs)

print(outputs.last_hidden_state.shape)
torch.Size([1, 7, 768])

Summary

This notebook demonstrated the basics of using Hugging Face libraries:

- Loading datasets with datasets
- Working with pre-trained models using transformers
- Tokenizing text and generating embeddings

These are fundamental building blocks for many NLP tasks!