
Tagging Use Cases (TAG)

Module Purpose: Extract and manage tags for documents using NER, keyword extraction, and custom taxonomies. This module contains 8 use cases.


Use Case Quick Reference

ID        Title                                 Priority
TAG-001   Extract Keywords                      P1
TAG-002   Run Named Entity Recognition          P1
TAG-003   Apply Auto-Tags                       P1
TAG-004   Add Manual Tags                       P2
TAG-005   Remove Tags                           P2
TAG-006   Create Custom Tag                     P2
TAG-007   Manage Tag Taxonomy                   P3
TAG-008   Suggest Tags Based on Similar Docs    P3

UC-TAG-001: Extract Keywords

Overview

Field Value
ID TAG-001
Title Extract Keywords
Actor Tagging Service
Priority P1 (MVP Phase 3)

Description

Extract important keywords and phrases from document text using statistical (TF-IDF, YAKE) or embedding-based (KeyBERT) methods.

Methods

Method    Tool           Best for
TF-IDF    scikit-learn   Corpus-level importance
YAKE      yake           Unsupervised, fast
KeyBERT   keybert        Semantic keywords

Steps

  1. Retrieve extracted text
  2. Run keyword extraction algorithm
  3. Filter by score threshold
  4. Deduplicate similar keywords
  5. Return top N keywords
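The steps above can be sketched with a small, dependency-free TF-IDF scorer. This is a minimal illustration, not the scikit-learn, YAKE, or KeyBERT implementation. It assumes a tiny hypothetical stopword list, unigram and bigram candidates, scores normalized so the best term gets 1.0, and a simple deduplication heuristic that drops a candidate whose words all appear in an already selected keyword. `extract_keywords`, `min_score`, and `STOPWORDS` are illustrative names.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real deployment would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "or", "the", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, max_n=2):
    """Return unigram and bigram candidates, skipping any n-gram that contains a stopword."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if not any(t in STOPWORDS for t in gram):
                grams.append(" ".join(gram))
    return grams


def extract_keywords(text, corpus, top_n=10, min_score=0.1):
    """Score candidates by TF-IDF against `corpus`, then threshold, deduplicate, keep top N."""
    counts = Counter(ngrams(tokenize(text)))
    if not counts:
        return []
    corpus_grams = [set(ngrams(tokenize(doc))) for doc in corpus]
    total = sum(counts.values())

    raw = {}
    for term, count in counts.items():
        df = sum(term in grams for grams in corpus_grams)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0  # smoothed IDF
        raw[term] = (count / total) * idf

    best = max(raw.values())
    # Highest score first; on ties prefer longer phrases so "payment terms" beats "payment".
    ranked = sorted(raw.items(), key=lambda item: (-item[1], -len(item[0].split()), item[0]))

    keywords = []
    for term, score in ranked:
        score /= best  # normalize to (0, 1]
        if score < min_score:  # step 3: threshold (ranked is descending, so stop here)
            break
        words = set(term.split())
        # Step 4 heuristic: skip a term whose words all appear in an already selected keyword.
        if any(words <= set(k["keyword"].split()) for k in keywords):
            continue
        keywords.append({"keyword": term, "score": round(score, 2)})
        if len(keywords) == top_n:  # step 5: top N
            break
    return keywords
```

Because IDF is computed against the supplied corpus, the scores for one document shift as the corpus changes; that dependence is what "corpus-level importance" in the Methods table refers to.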

Output

{
  "document_id": "doc_abc",
  "keywords": [
    {"keyword": "payment terms", "score": 0.85},
    {"keyword": "invoice", "score": 0.82},
    {"keyword": "net 30", "score": 0.78},
    {"keyword": "accounts payable", "score": 0.72}
  ]
}

Acceptance Criteria

  • Top 10 keywords extracted per document
  • Keywords are relevant to document content
  • Processing time under 1 s per document

UC-TAG-002: Run Named Entity Recognition

Overview

Field Value
ID TAG-002
Title Run Named Entity Recognition
Actor Tagging Service
Priority P1 (MVP Phase 3)

Description

Extract structured entities (names, orgs, dates, amounts) from document text.

Entity Types

Entity         Label    Examples
Person         PERSON   "John Smith", "Dr. Patel"
Organization   ORG      "Acme Corp", "HDFC Bank"
Date           DATE     "January 15, 2024", "Q1 2024"
Money          MONEY    "$5,000", "₹50,000"
Location       GPE      "Mumbai", "New York" (GPE = geopolitical entity)
Email          EMAIL    "john@acme.com" (custom label)
Phone          PHONE    "+1-555-1234" (custom label)

Output

{
  "document_id": "doc_abc",
  "entities": [
    {"text": "Acme Corporation", "label": "ORG", "start": 45, "end": 61},
    {"text": "January 15, 2024", "label": "DATE", "start": 120, "end": 136},
    {"text": "$5,000.00", "label": "MONEY", "start": 200, "end": 209}
  ]
}

Acceptance Criteria

  • Standard NER entities extracted
  • Custom entities (emails, phones) extracted
  • Entity character offsets (start, end) tracked
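Pretrained NER models typically do not emit the custom EMAIL and PHONE labels, so one option is a pattern-based pass whose output uses the same text/label/start/end shape as the Output example. The regular expressions below are hypothetical and intentionally simple: they will not cover every real-world email or phone format and may match other long digit runs. Standard labels such as PERSON or ORG are assumed to come from a separate NER model and are not shown; `extract_custom_entities` is an illustrative name.

```python
import re

# Hypothetical patterns for the custom labels; tune against real documents.
CUSTOM_PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # Optional "+", a digit, six or more digits/separators, and a final digit.
    "PHONE": re.compile(r"\+?\d[\d\s().-]{6,}\d"),
}


def extract_custom_entities(text):
    """Return custom entities with character offsets, ordered by position in the text."""
    entities = []
    for label, pattern in CUSTOM_PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "text": match.group(),
                "label": label,
                "start": match.start(),  # offset of the first character
                "end": match.end(),      # offset one past the last character
            })
    return sorted(entities, key=lambda e: e["start"])
```

Using half-open [start, end) offsets, as the Output example does, means `text[start:end]` recovers the entity exactly.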

UC-TAG-003: Apply Auto-Tags

Overview

Field Value
ID TAG-003
Title Apply Auto-Tags
Actor System
Priority P1 (MVP Phase 3)

Description

Automatically apply tags to documents based on extracted keywords, entities, and classification.

Tag Sources

Source          Example tags
Document type   type:invoice, type:contract
Category        category:finance, category:legal
Keywords        payment, agreement, quarterly
Entities        org:Acme Corp, person:John Smith
Rules           client:ABC (based on patterns)

Steps

  1. Collect keywords from TAG-001
  2. Collect entities from TAG-002
  3. Apply classification tags from CLS-002/003
  4. Match against custom rules
  5. Deduplicate and normalize tags
  6. Store document-tag associations
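The rule-matching, deduplication, and normalization steps above can be sketched as follows. The candidate shape (tag, source, confidence) mirrors the Output example. The rule format (a list of regex/tag pairs), the fixed confidence of 1.0 for rule matches, and the normalization policy (collapse whitespace, lowercase the namespace prefix, compare case-insensitively, keep the highest-confidence source) are assumptions, not requirements stated by this use case.

```python
import re


def normalize_tag(tag):
    """Collapse whitespace and lowercase the namespace prefix; the value keeps its case."""
    tag = " ".join(tag.split())
    if ":" in tag:
        namespace, value = tag.split(":", 1)
        tag = f"{namespace.strip().lower()}:{value.strip()}"
    return tag


def match_rules(text, rules):
    """Apply hypothetical (regex, tag) rules; rule hits get an assumed confidence of 1.0."""
    return [
        {"tag": tag, "source": "rule", "confidence": 1.0}
        for pattern, tag in rules
        if re.search(pattern, text)
    ]


def merge_auto_tags(document_id, candidates):
    """Normalize and deduplicate candidate tags, keeping the highest-confidence source."""
    best = {}
    for cand in candidates:
        tag = normalize_tag(cand["tag"])
        key = tag.casefold()  # "Payment" and "payment" count as the same tag
        if key not in best or cand["confidence"] > best[key]["confidence"]:
            best[key] = {"tag": tag, "source": cand["source"], "confidence": cand["confidence"]}
    auto_tags = sorted(best.values(), key=lambda t: -t["confidence"])
    return {"document_id": document_id, "auto_tags": auto_tags}
```

Keeping only the highest-confidence source per tag satisfies "no duplicate tags" while still recording one source for each tag.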

Output

{
  "document_id": "doc_abc",
  "auto_tags": [
    {"tag": "type:invoice", "source": "classification", "confidence": 0.94},
    {"tag": "org:Acme Corp", "source": "ner", "confidence": 0.92},
    {"tag": "payment", "source": "keyword", "confidence": 0.85}
  ]
}

Acceptance Criteria

  • Tags applied from all sources
  • Tag source and confidence tracked
  • No duplicate tags

UC-TAG-004: Add Manual Tags

Overview

Field Value
ID TAG-004
Title Add Manual Tags
Actor User
Priority P2

Description

Allow users to manually add tags to documents.

Steps

  1. User selects document(s)
  2. User enters tag(s) or selects from suggestions
  3. System validates tag format
  4. Tags are added with source="manual"
  5. Audit log entry created

Input

{
  "document_id": "doc_abc",
  "tags": ["urgent", "client:XYZ", "project:2024-Q1"]
}
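A sketch of steps 3 to 5, assuming an in-memory store that maps each document ID to a {tag: source} dictionary and a plain list as the audit log. The tag format enforced here (an optional lowercase namespace followed by a colon, then letters, digits, hyphens, or underscores, with no spaces) is a hypothetical rule chosen to accept the example input; this use case does not define the exact format. Tags already present on a document keep their original source.

```python
import re
from datetime import datetime, timezone

# Hypothetical format: optional "namespace:" prefix, then a value with no spaces.
TAG_PATTERN = re.compile(r"(?:[a-z][a-z0-9_-]*:)?[A-Za-z0-9][A-Za-z0-9_-]*")


def add_manual_tags(store, audit_log, document_ids, tags, user):
    """Validate `tags`, attach them to each document with source="manual", and log the change."""
    invalid = [t for t in tags if not TAG_PATTERN.fullmatch(t)]
    if invalid:  # reject the whole request before touching any document
        raise ValueError(f"invalid tag format: {invalid}")

    for doc_id in document_ids:
        doc_tags = store.setdefault(doc_id, {})
        added = []
        for tag in tags:
            if tag not in doc_tags:  # existing tags keep their original source
                doc_tags[tag] = "manual"
                added.append(tag)
        audit_log.append({
            "action": "add_tags",
            "document_id": doc_id,
            "tags": added,
            "user": user,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
```

Validating every tag before mutating anything keeps a multi-document request all-or-nothing.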

Acceptance Criteria

  • Multiple tags can be added at once
  • Tag autocomplete from existing tags
  • Manual tags distinguished from auto-tags

UC-TAG-005: Remove Tags

Overview

Field Value
ID TAG-005
Title Remove Tags
Actor User
Priority P2

Description

Allow users to remove tags from documents.

Acceptance Criteria

  • Individual tags can be removed
  • Bulk tag removal supported
  • Removal is logged
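A minimal sketch of individual and bulk removal with logging, under a hypothetical in-memory model (document ID mapped to a {tag: source} dictionary, list-based audit log). Passing one document and one tag covers individual removal; several of either covers bulk removal. Recording each removed tag's source in the log entry is an assumption.

```python
from datetime import datetime, timezone


def remove_tags(store, audit_log, document_ids, tags, user):
    """Remove `tags` from every document in `document_ids`; return how many links were removed."""
    removed_total = 0
    for doc_id in document_ids:
        doc_tags = store.get(doc_id, {})
        # Pop each tag that is present, remembering its source (manual, ner, ...).
        removed = {tag: doc_tags.pop(tag) for tag in tags if tag in doc_tags}
        if removed:  # only log removals that actually happened
            audit_log.append({
                "action": "remove_tags",
                "document_id": doc_id,
                "removed": removed,
                "user": user,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            })
            removed_total += len(removed)
    return removed_total
```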

UC-TAG-006: Create Custom Tag

Overview

Field Value
ID TAG-006
Title Create Custom Tag
Actor Admin
Priority P2

Description

Define new tags in the system taxonomy.

Tag Definition

Field          Description
name           Tag identifier (lowercase, no spaces)
display_name   Human-readable name
category       Grouping category
color          Display color (hex)
description    Usage description

Output

{
  "tag": {
    "id": "tag_123",
    "name": "priority:high",
    "display_name": "High Priority",
    "category": "priority",
    "color": "#FF0000"
  }
}

Acceptance Criteria

  • Custom tags can be created
  • Tag categories for organization
  • Tag names are unique
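A sketch of tag creation using the fields from the Tag Definition table. The validation rules are assumptions: name must be lowercase with no spaces (colons, digits, hyphens, and underscores allowed so that priority:high passes), color must be a six-digit hex code, and IDs come from a simple in-process counter. `TagRegistry` and `as_output` are hypothetical in-memory stand-ins for whatever store the system actually uses.

```python
import re
from dataclasses import asdict, dataclass
from itertools import count

NAME_RE = re.compile(r"[a-z0-9][a-z0-9_:-]*")  # lowercase, no spaces (assumed rule)
COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")     # e.g. "#FF0000"


@dataclass(frozen=True)
class TagDefinition:
    id: str
    name: str
    display_name: str
    category: str
    color: str
    description: str = ""


class TagRegistry:
    """In-memory registry that enforces unique tag names."""

    def __init__(self):
        self._by_name = {}
        self._next_id = count(1)

    def create(self, name, display_name, category, color, description=""):
        if not NAME_RE.fullmatch(name):
            raise ValueError(f"name must be lowercase with no spaces: {name!r}")
        if not COLOR_RE.fullmatch(color):
            raise ValueError(f"color must be a hex code like #FF0000: {color!r}")
        if name in self._by_name:
            raise ValueError(f"tag already exists: {name!r}")
        # IDs are only consumed after validation succeeds.
        tag = TagDefinition(f"tag_{next(self._next_id)}", name, display_name,
                            category, color, description)
        self._by_name[name] = tag
        return tag


def as_output(tag):
    """Wrap a definition in the {"tag": {...}} shape used by the Output example."""
    return {"tag": asdict(tag)}
```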

UC-TAG-007: Manage Tag Taxonomy

Overview

Field Value
ID TAG-007
Title Manage Tag Taxonomy
Actor Admin
Priority P3

Description

Organize tags into hierarchical categories and manage the tag structure.

Taxonomy Structure

tags/
├── type/
│   ├── invoice
│   ├── contract
│   └── report
├── category/
│   ├── finance
│   ├── legal
│   └── hr
├── org/
│   └── (dynamic entities)
└── custom/
    └── (user-defined)

Acceptance Criteria

  • Hierarchical tag organization
  • Tag merging (combine synonyms)
  • Tag deprecation
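A sketch of the three operations in the acceptance criteria: hierarchy (each tag records a parent, so the tree above becomes type → invoice), merging (a synonym becomes an alias that resolves to the canonical tag, and its children are re-parented), and deprecation (the tag still resolves for existing documents but can no longer be assigned). It assumes tag names are unique across the whole taxonomy; `TagTaxonomy` and its methods are hypothetical.

```python
class TagTaxonomy:
    """Hierarchical tags with synonym merging and deprecation (hypothetical in-memory model)."""

    def __init__(self):
        self.parent = {}        # tag -> parent tag, or None for a top-level category
        self.aliases = {}       # merged synonym -> tag it was merged into
        self.deprecated = set()

    def add(self, tag, parent=None):
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent tag: {parent!r}")
        self.parent[tag] = parent

    def resolve(self, tag):
        """Follow merge aliases to the canonical tag."""
        while tag in self.aliases:
            tag = self.aliases[tag]
        return tag

    def children(self, tag):
        return sorted(t for t, p in self.parent.items() if p == tag)

    def path(self, tag):
        """Slash-separated path from the top-level category, e.g. "type/invoice"."""
        parts = []
        node = self.resolve(tag)
        while node is not None:
            parts.append(node)
            node = self.parent[node]
        return "/".join(reversed(parts))

    def merge(self, synonym, canonical):
        """Fold `synonym` into `canonical`: re-parent its children and alias the old name."""
        canonical = self.resolve(canonical)
        node = canonical
        while node is not None:  # refuse merges that would create a cycle
            if node == synonym:
                raise ValueError("cannot merge a tag into itself or its descendant")
            node = self.parent.get(node)
        for tag, parent in list(self.parent.items()):
            if parent == synonym:
                self.parent[tag] = canonical
        self.parent.pop(synonym, None)
        self.aliases[synonym] = canonical

    def deprecate(self, tag):
        """Keep the tag resolvable for existing documents but block new assignments."""
        self.deprecated.add(self.resolve(tag))

    def assignable(self, tag):
        tag = self.resolve(tag)
        return tag in self.parent and tag not in self.deprecated
```

Storing merges as aliases rather than rewriting every document immediately lets old tag names keep resolving while a background job, if any, updates stored associations.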

UC-TAG-008: Suggest Tags Based on Similar Docs

Overview

Field Value
ID TAG-008
Title Suggest Tags Based on Similar Docs
Actor System
Priority P3

Description

Suggest tags based on tags applied to similar documents.

Steps

  1. Find similar documents (via embeddings)
  2. Collect tags from similar documents
  3. Rank by frequency and similarity
  4. Return as suggestions
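A dependency-free sketch of the four steps, assuming each document already has an embedding vector (how embeddings are produced is out of scope here). Similar documents are the k nearest neighbours by cosine similarity above a min_similarity cutoff. Each candidate tag is scored by the summed similarity of the neighbours that carry it, divided by the summed similarity of all neighbours, which blends frequency and similarity into one confidence value. The parameter values and this scoring formula are assumptions, and tags the document already has are excluded.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(doc_id, embeddings, doc_tags, k=20, min_similarity=0.5, top_n=5):
    """Suggest tags from the k most similar documents (hypothetical parameters)."""
    query = embeddings[doc_id]
    existing = doc_tags.get(doc_id, set())

    # Step 1: nearest neighbours by cosine similarity (linear scan for illustration).
    scored = sorted(
        ((cosine(query, vec), other) for other, vec in embeddings.items() if other != doc_id),
        reverse=True,
    )
    neighbours = [(sim, other) for sim, other in scored[:k] if sim >= min_similarity]
    total = sum(sim for sim, _ in neighbours)

    # Steps 2-3: collect tags, counting documents and summing similarity per tag.
    doc_count = defaultdict(int)
    weight = defaultdict(float)
    for sim, other in neighbours:
        for tag in doc_tags.get(other, ()):
            if tag not in existing:
                doc_count[tag] += 1
                weight[tag] += sim

    # Step 4: rank by similarity-weighted share, then by document count.
    suggestions = [
        {"tag": tag, "similar_docs": doc_count[tag], "confidence": round(weight[tag] / total, 2)}
        for tag in weight
    ]
    suggestions.sort(key=lambda s: (-s["confidence"], -s["similar_docs"], s["tag"]))
    return {"document_id": doc_id, "suggested_tags": suggestions[:top_n]}
```

The linear scan is only for illustration; with many documents, a vector index would typically replace step 1.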

Output

{
  "document_id": "doc_abc",
  "suggested_tags": [
    {"tag": "finance", "similar_docs": 23, "confidence": 0.85},
    {"tag": "quarterly-report", "similar_docs": 15, "confidence": 0.78}
  ]
}

Acceptance Criteria

  • Suggestions based on similar documents
  • Suggestions are relevant
