Saba Shahrukh August 3, 2025

Part-of-Speech (POS) tagging is a fundamental technique in Natural Language Processing (NLP). This tutorial is a guide to POS tagging with two popular Python libraries, NLTK and spaCy. You’ll learn what POS tagging is, why it matters, and how to implement it with practical code examples.

Whether you’re a student, a data scientist, or an NLP enthusiast, this article will equip you with the knowledge to perform accurate POS tagging and leverage it for more advanced machine learning projects.


What is POS Tagging?

POS tagging, also known as grammatical tagging or word-category disambiguation, is the process of labeling each word in a sentence with its corresponding part of speech. This is a crucial step in NLP because the same word can have different meanings and functions depending on the context.

For instance, consider the word “fly”:

  • “Birds fly.” (Here, “fly” is a verb.)
  • “I swatted a fly.” (Here, “fly” is a noun.)

A POS tagger uses the surrounding words to choose the most likely tag for each word. The resulting tags give programs the grammatical context they need to interpret human language.

Part of Speech | NLTK Tag | Example
Noun (singular) | NN | cat
Verb (non-3rd person singular present) | VBP | run
Adjective | JJ | happy
Adverb | RB | quickly
Pronoun | PRP | she

Why POS Tagging is a Key NLP Task

POS tagging is more than a labeling exercise; it supports many downstream NLP applications. By providing grammatical context, it helps with:

  • Word Sense Disambiguation: Differentiating the meaning of words with multiple definitions (e.g., “bank” as a financial institution vs. a riverbank).
  • Syntactic Analysis: Parsing sentences to understand their grammatical structure, which is critical for machine translation and chatbots.
  • Named Entity Recognition (NER): Identifying and classifying entities like people, places, and organizations. A sequence of proper nouns (NNP) often indicates a name, such as a person or an organization.
  • Text-to-Speech: Determining the correct pronunciation of words that are spelled the same but pronounced differently (e.g., “record” as a noun vs. a verb).
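
As a rough illustration of the NER point above, the sketch below collects runs of consecutive proper-noun tags from a sentence that has already been tagged. The function name proper_noun_runs and the hand-tagged sample are assumptions made for this example; real NER systems use far richer signals than tag runs.

```python
from itertools import groupby

PROPER_NOUN_TAGS = {"NNP", "NNPS"}

def proper_noun_runs(tagged_tokens):
    """Return each maximal run of consecutive proper-noun tokens as one string."""
    runs = []
    # groupby splits the sentence into alternating proper / non-proper stretches
    for is_proper, group in groupby(tagged_tokens, key=lambda pair: pair[1] in PROPER_NOUN_TAGS):
        if is_proper:
            runs.append(" ".join(word for word, _ in group))
    return runs

# Hand-tagged sample (hypothetical tagger output using Penn Treebank tags)
tagged = [("Marie", "NNP"), ("Curie", "NNP"), ("worked", "VBD"),
          ("in", "IN"), ("Paris", "NNP"), (".", ".")]
print(proper_noun_runs(tagged))  # ['Marie Curie', 'Paris']
```

The function only needs (word, tag) pairs, so it can be fed the output of any tagger that uses Penn Treebank tags.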

POS Tagging with NLTK in Python 🐍

The Natural Language Toolkit (NLTK) is a popular library for NLP tasks in Python. It’s great for learning and research.

Step 1: Installation and Setup

First, install NLTK and download the necessary data.

Shell

pip install nltk

Then, in your Python script, download the pre-trained models.

Python

import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

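  • 'punkt' is used for tokenization (splitting text into sentences and words).
  • 'averaged_perceptron_tagger' is a pre-trained POS tagger model.

Note: NLTK 3.9 and later load these resources under the names 'punkt_tab' and 'averaged_perceptron_tagger_eng'. If you see a LookupError, download the resource named in the error message.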

Step 2: Tokenize and Tag a Sentence

Once the data is downloaded, you can easily tokenize your text and perform tagging.

Python

from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Define a sentence
sentence = "The quick brown fox jumps over the lazy dog."

# 1. Tokenize the sentence into words
tokens = word_tokenize(sentence)
print(f"Tokens: {tokens}")

# 2. Perform POS tagging on the tokens
pos_tags = pos_tag(tokens)
print(f"POS Tags: {pos_tags}")

Output:

Tokens: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', '.']
POS Tags: [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]

The output is a list of tuples, where each tuple contains a word and its corresponding Penn Treebank tag. For example, DT is a determiner, JJ is an adjective, and NN is a noun.
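
Because the result is an ordinary list of (word, tag) tuples, standard Python tools work on it directly. The sketch below groups the words by tag; it copies the tagged output shown above as a literal so that it runs without NLTK, and the name words_by_tag is just an illustrative choice.

```python
from collections import defaultdict

# Tagged output from the example above, copied as a literal
pos_tags = [('The', 'DT'), ('quick', 'JJ'), ('brown', 'JJ'), ('fox', 'NN'),
            ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'),
            ('dog', 'NN'), ('.', '.')]

# Collect the words that share each tag, preserving sentence order
words_by_tag = defaultdict(list)
for word, tag in pos_tags:
    words_by_tag[tag].append(word)

print(words_by_tag['JJ'])  # ['quick', 'brown', 'lazy']
print(words_by_tag['NN'])  # ['fox', 'dog']
```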

Step 3: Understand the Tags

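If you’re unsure about a tag, NLTK provides a helper function that prints its description. It relies on an extra resource, which you can fetch with nltk.download('tagsets'); on newer NLTK versions, download the resource named in the LookupError instead.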

Python

from nltk.help import upenn_tagset
upenn_tagset('NN')

Output:

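NN: noun, common, singular or mass
    common-carrier cabbage knuckle-duster Casino afghan shed thermostat
    investment slide humour falloff slick wind hyena override subhumanity
    machinist ...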

POS Tagging with spaCy in Python 🚀

spaCy is another leading Python library for NLP, known for its speed and efficiency. It’s often the preferred choice for production-ready applications.

Step 1: Installation and Setup

Install spaCy and download its pre-trained English model.

Shell

pip install spacy
python -m spacy download en_core_web_sm

Step 2: Process Text and Get Tags

spaCy simplifies the process by handling tokenization and tagging in a single step.

Python

import spacy

# Load the pre-trained English model
nlp = spacy.load("en_core_web_sm")

# Process the sentence
doc = nlp("The quick brown fox jumps over the lazy dog.")

# Iterate through the document and print the tags
for token in doc:
    print(f"Word: {token.text}, POS: {token.pos_}, Detailed Tag: {token.tag_}")

Output:

Word: The, POS: DET, Detailed Tag: DT
Word: quick, POS: ADJ, Detailed Tag: JJ
Word: brown, POS: ADJ, Detailed Tag: JJ
Word: fox, POS: NOUN, Detailed Tag: NN
Word: jumps, POS: VERB, Detailed Tag: VBZ
Word: over, POS: ADP, Detailed Tag: IN
Word: the, POS: DET, Detailed Tag: DT
Word: lazy, POS: ADJ, Detailed Tag: JJ
Word: dog, POS: NOUN, Detailed Tag: NN
Word: ., POS: PUNCT, Detailed Tag: .

spaCy provides both coarse-grained Universal POS tags (.pos_) and fine-grained tags (.tag_), which follow the Penn Treebank scheme for its English models, offering flexibility for your projects.
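
To get a feel for how detailed tags relate to broader categories, here is a minimal sketch that collapses Penn Treebank tags by their prefix. This is not how spaCy derives .pos_; coarse_category is a hypothetical helper, and the mapping is deliberately rough (for instance, it files proper nouns under "noun").

```python
def coarse_category(penn_tag):
    """Collapse a Penn Treebank tag to a rough category based on its prefix."""
    prefixes = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb"}
    for prefix, category in prefixes.items():
        if penn_tag.startswith(prefix):
            return category
    # Determiners, prepositions, punctuation, and everything else
    return "other"

for tag in ["NN", "NNS", "VBZ", "VBD", "JJR", "RB", "DT"]:
    print(tag, "->", coarse_category(tag))
```

A prefix check like this is handy when a rule only cares whether a word is a noun or a verb, not which inflected form it is.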


Real-World Applications of POS Tagging 💡

POS tagging is a foundational step in many NLP projects. Here are three practical project ideas to get you started.

🎯 Project 1: A Simple News Headline Summarizer

A rule-based summarizer can use POS tags to identify the most important words in a sentence, which are typically nouns and adjectives. By filtering out less important words like determiners and prepositions, we can create a concise summary.

How it works:

  1. Tokenize and tag the headline.
  2. Filter words based on a defined set of important POS tags (e.g., NN, JJ).
  3. Join the filtered words to form a summary.

Python Code (NLTK):

Python

import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

def simple_summarizer(sentence):
    tokens = word_tokenize(sentence)
    tagged_tokens = pos_tag(tokens)
    
    important_tags = {'NN', 'NNS', 'NNP', 'NNPS', 'JJ', 'JJR', 'JJS'}
    summary_words = [word for word, tag in tagged_tokens if tag in important_tags]
    
    return " ".join(summary_words)

# Example
headline = "A catastrophic oil spill threatens the unique marine ecosystem."
summary = simple_summarizer(headline)
print(f"Original: {headline}")
print(f"Summary: {summary}")

Output:

Original: A catastrophic oil spill threatens the unique marine ecosystem.
Summary: catastrophic oil spill unique marine ecosystem

💬 Project 2: A Customer Service Chatbot

POS tagging can help a chatbot understand user intent. By analyzing the structure and content of a user’s message, the chatbot can provide a more relevant response.

How it works:

  1. Tag the user’s message.
  2. Identify Intent: Use rules based on POS tags. For example, a word tagged WRB (a wh-adverb like “why” or “how”) or WP (a wh-pronoun like “what” or “who”) at the beginning of a sentence usually indicates a question.
  3. Analyze Sentiment: Combine with a simple sentiment analysis to classify the message as positive or negative.
  4. Generate a Response: Use the intent and sentiment to craft an appropriate, context-aware reply.

Python Code Snippet (NLTK):

Python

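from nltk import pos_tag
from nltk.tokenize import word_tokenize

def get_intent(tagged_tokens):
    # Guard against empty input, which would otherwise raise an IndexError
    if not tagged_tokens:
        return 'statement'
    _, first_tag = tagged_tokens[0]
    # Heuristic: a leading wh-word (WRB, WP) or verb suggests a question
    if first_tag in ['WRB', 'WP'] or first_tag.startswith('VB'):
        return 'question'
    # Any adjective suggests the user is describing an experience
    if any(tag.startswith('JJ') for _, tag in tagged_tokens):
        return 'feedback'
    return 'statement'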

def chatbot_response(user_input):
    tokens = word_tokenize(user_input)
    tagged_tokens = pos_tag(tokens)
    intent = get_intent(tagged_tokens)
    
    if intent == 'question':
        return "I can help with common questions. What would you like to know?"
    elif intent == 'feedback':
        return "Thank you for your feedback! We appreciate you sharing your thoughts."
    else:
        return "I'm not sure how to respond to that."
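
The snippet above covers intent only. For the sentiment step, one possible sketch is to score just the adjectives in the tagged message against a small word list. The POSITIVE and NEGATIVE sets and the adjective_sentiment function are hypothetical placeholders invented for this example; a real chatbot would rely on a proper sentiment lexicon or model.

```python
# Hypothetical mini-lexicon for illustration only
POSITIVE = {"great", "helpful", "fast", "excellent"}
NEGATIVE = {"slow", "broken", "terrible", "rude"}

def adjective_sentiment(tagged_tokens):
    """Classify a tagged message by counting positive and negative adjectives."""
    score = 0
    for word, tag in tagged_tokens:
        # Only adjectives (JJ, JJR, JJS) contribute to the score
        if tag.startswith("JJ"):
            lowered = word.lower()
            if lowered in POSITIVE:
                score += 1
            elif lowered in NEGATIVE:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hand-tagged sample message (hypothetical tagger output)
tagged = [("The", "DT"), ("delivery", "NN"), ("was", "VBD"),
          ("terrible", "JJ"), (".", ".")]
print(adjective_sentiment(tagged))  # negative
```

The returned label could then be combined with the detected intent, for example by sending an apology when feedback is negative.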


✍️ Project 3: A Basic Grammar and Style Checker

POS tagging can be used to identify common writing errors and provide suggestions for improvement. This demonstrates how you can use the sequence of tags as a set of grammar rules.

How it works:

  1. Tag a sentence.
  2. Define Rules: Create rules based on tag sequences to find errors.
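    • Passive Voice: Look for a form of “to be” (such as “is”, “was”, or “been”) followed by a past participle (VBN).
    • Redundant Adverbs: Find an adverb (RB) followed by an absolute adjective, as in “very unique”.
    • Unusual Order: Check for a noun (NN) immediately followed by an adjective (JJ). This can signal misplaced word order, but it also matches some valid phrases, so treat it as a hint rather than an error.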

Python Code Snippet (NLTK):

Python

from nltk import pos_tag, word_tokenize

# Forms of "to be" that can introduce a passive construction
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}

def check_passive_voice(tagged_tokens):
    # Flag a form of "to be" immediately followed by a past participle (VBN)
    for (word, _), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() in BE_FORMS and next_tag == 'VBN':
            return True
    return False

# Example
sentence = "The ball was thrown by the boy."
tokens = word_tokenize(sentence)
tagged_tokens = pos_tag(tokens)

if check_passive_voice(tagged_tokens):
    print("🚩 Potential passive voice found. Consider using active voice.")

Output:

🚩 Potential passive voice found. Consider using active voice.
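
The other two rules in the list can be sketched the same way, by walking over adjacent pairs of tagged tokens. The ABSOLUTE_ADJECTIVES set and both function names are assumptions made for illustration, and the samples are hand-tagged rather than produced by a tagger.

```python
# Hypothetical list of absolute (non-gradable) adjectives
ABSOLUTE_ADJECTIVES = {"unique", "perfect", "dead", "impossible", "final"}

def find_redundant_adverbs(tagged_tokens):
    """Return (adverb, adjective) pairs where an RB precedes an absolute adjective."""
    hits = []
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag == "RB" and next_tag == "JJ" and next_word.lower() in ABSOLUTE_ADJECTIVES:
            hits.append((word, next_word))
    return hits

def find_noun_adjective_pairs(tagged_tokens):
    """Return (noun, adjective) pairs where an NN is immediately followed by a JJ."""
    hits = []
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag == "NN" and next_tag == "JJ":
            hits.append((word, next_word))
    return hits

# Hand-tagged samples (hypothetical tagger output)
sample_one = [("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("very", "RB"),
              ("unique", "JJ"), ("design", "NN"), (".", ".")]
sample_two = [("She", "PRP"), ("bought", "VBD"), ("a", "DT"),
              ("car", "NN"), ("red", "JJ"), (".", ".")]

print(find_redundant_adverbs(sample_one))     # [('very', 'unique')]
print(find_noun_adjective_pairs(sample_two))  # [('car', 'red')]
```

Both checks return the matching word pairs instead of a boolean, which makes it easier to show the writer exactly what was flagged.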

This project shows how a sequence of POS tags can be used to analyze and critique the grammatical structure of text.
