This notebook demonstrates how to use Hugging Face’s datasets and transformers libraries to work with pre-trained models like BERT.
Installing Required Libraries
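First, install the Hugging Face datasets library, which provides access to pre-built datasets. The later cells also assume that transformers and PyTorch are installed, since the tokenizer is asked for PyTorch tensors.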
from datasets import load_dataset

ds = load_dataset('imdb')
print(ds)
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})
Loading a Dataset
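Load the IMDB movie reviews dataset. It contains 50,000 reviews labeled as positive or negative sentiment, split evenly into 25,000 for training and 25,000 for testing, plus a further 50,000 reviews in the unsupervised split that are meant for use without sentiment labels.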
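A quick sanity check after loading is to count how many reviews carry each label. The sketch below is a minimal illustration, not part of the original notebook: `toy_train`, `LABEL_NAMES`, and `label_counts` are hypothetical names, and the records are hand-written stand-ins that mimic the `text`/`label` fields so the example runs without downloading anything. In IMDB, label 0 marks a negative review and label 1 a positive one.

```python
from collections import Counter

# Hypothetical stand-in records with the same fields as the IMDB splits.
# Label 0 is a negative review, label 1 is a positive review.
toy_train = [
    {"text": "A dull, overlong mess.", "label": 0},
    {"text": "Charming and funny throughout.", "label": 1},
    {"text": "I wanted my two hours back.", "label": 0},
    {"text": "One of the best films of the year.", "label": 1},
]

# Assumed mapping for readability; it only covers the two sentiment labels.
LABEL_NAMES = {0: "negative", 1: "positive"}


def label_counts(records):
    """Count how many records carry each sentiment label."""
    counts = Counter(record["label"] for record in records)
    return {LABEL_NAMES[label]: n for label, n in sorted(counts.items())}


balance = label_counts(toy_train)
print(balance)  # {'negative': 2, 'positive': 2}
```

Iterating over a split such as `ds['train']` should yield dictionaries with the same two keys, so the same function is expected to work on the real data, where the training split would show 12,500 reviews per label if it is balanced.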
Converting to Pandas DataFrame
Convert the Hugging Face dataset to a pandas DataFrame for easier exploration and analysis.
import pandas as pd

df = pd.DataFrame(ds['train'])
df
                                                    text  label
0      I rented I AM CURIOUS-YELLOW from my video sto...      0
1      "I Am Curious: Yellow" is a risible and preten...      0
2      If only to avoid making this type of film in t...      0
3      This film was probably inspired by Godard's Ma...      0
4      Oh, brother...after hearing about this ridicul...      0
...                                                  ...    ...
24995  A hit at the time but now better categorised a...      1
24996  I love this movie like no other. Another time ...      1
24997  This film and it's sequel Barry Mckenzie holds...      1
24998  'The Adventures Of Barry McKenzie' started lif...      1
24999  The story centers around Barry McKenzie who mu...      1

25000 rows × 2 columns
Loading Pre-trained BERT Model
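Load the bert-base-uncased model and its tokenizer. BERT (Bidirectional Encoder Representations from Transformers) is a Transformer encoder pre-trained on unlabeled text; the base uncased variant has 12 layers, a hidden size of 768, and roughly 110 million parameters, and it lowercases its input before tokenizing.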
from transformers import AutoModel, AutoTokenizer

model_name = 'bert-base-uncased'
model = AutoModel.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

inputs = tokenizer('Hello, Hugging Face!', return_tensors='pt')
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
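The sequence-length dimension of that printed shape depends on how the tokenizer splits the text. BERT's tokenizer uses WordPiece: each word is split greedily into the longest pieces found in the vocabulary, continuation pieces are prefixed with `##`, and the sequence is wrapped in `[CLS]` and `[SEP]`. The sketch below is a simplified toy version for illustration only. `TOY_VOCAB`, `wordpiece`, and `encode` are hypothetical names, the vocabulary is invented, and the real tokenizer also handles punctuation and normalization, which this version skips.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one lowercase word into pieces by greedy longest-match-first."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab):
    """Lowercase, split on whitespace, and add the special tokens."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Invented toy vocabulary; the real one has about 30,000 entries.
TOY_VOCAB = {"hello", "hug", "##ging", "face"}

tokens = encode("Hello hugging face", TOY_VOCAB)
print(tokens)  # ['[CLS]', 'hello', 'hug', '##ging', 'face', '[SEP]']
```

With this toy vocabulary, three words become six tokens, so the model would return six vectors for the sentence. The real split of a given word depends on the actual vocabulary and may differ from the one shown here.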
Summary
This notebook demonstrated the basics of using Hugging Face libraries:
- Loading datasets with datasets
- Working with pre-trained models using transformers
- Tokenizing text and generating embeddings
These are fundamental building blocks for many NLP tasks!
Using the Model and Tokenizer
Tokenize a sample text and pass it through the BERT model to get embeddings. The printed shape of last_hidden_state is (batch size, sequence length, hidden size): one 768-dimensional vector per token for bert-base-uncased.
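Because the model returns one vector per token, getting a single embedding for the whole sentence needs a pooling step. One common choice, which the notebook itself does not perform, is mean pooling: average the token vectors while skipping padded positions marked with 0 in the attention mask. The sketch below uses plain Python lists so it runs without any model; `mean_pool`, `toy_hidden`, and `toy_mask` are hypothetical names, and the numbers are made up, with a hidden size of 3 standing in for 768.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padded positions."""
    hidden_size = len(token_vectors[0])
    totals = [0.0] * hidden_size
    kept = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            kept += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if kept == 0:
        raise ValueError("attention_mask masks out every token")
    return [total / kept for total in totals]


# Toy stand-in for one sequence: 4 positions, hidden size 3.
toy_hidden = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
    [9.0, 9.0, 9.0],  # padding position, ignored by the mask
]
toy_mask = [1, 1, 1, 0]

sentence_vector = mean_pool(toy_hidden, toy_mask)
print(sentence_vector)  # [3.0, 4.0, 5.0]
```

With real model outputs the same idea is usually written with tensor operations on `outputs.last_hidden_state` and `inputs['attention_mask']` rather than Python loops; the loop form is used here only to make the arithmetic visible.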