Tagging Use Cases (TAG)

Module Purpose: Extract and manage document tags using named entity recognition (NER), keyword extraction, and custom taxonomies. This module contains 8 use cases.

Use Case Quick Reference

ID	Title	Priority
TAG-001	Extract Keywords	P1
TAG-002	Run Named Entity Recognition	P1
TAG-003	Apply Auto-Tags	P1
TAG-004	Add Manual Tags	P2
TAG-005	Remove Tags	P2
TAG-006	Create Custom Tag	P2
TAG-007	Manage Tag Taxonomy	P3
TAG-008	Suggest Tags Based on Similar Docs	P3

UC-TAG-001: Extract Keywords

Overview

Field	Value
ID	TAG-001
Title	Extract Keywords
Actor	Tagging Service
Priority	P1 (MVP Phase 3)

Description

Extract important keywords and phrases from document text using statistical methods (TF-IDF, YAKE) or embedding-based methods (KeyBERT).

Methods

Method	Tool	Best For
TF-IDF	scikit-learn	Corpus-level importance
YAKE	yake	Unsupervised, fast
KeyBERT	keybert	Semantic keywords

Steps

Retrieve extracted text
Run keyword extraction algorithm
Filter by score threshold
Deduplicate similar keywords
Return top N keywords
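
The steps above can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the scorer is a simple relative-frequency stand-in for TF-IDF, YAKE, or KeyBERT, and the names `STOPWORDS`, `candidate_phrases`, `is_similar`, and `extract_keywords`, as well as the default threshold, are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text, max_len=2):
    """Yield 1..max_len word phrases that contain no stopwords and stay within one sentence."""
    for segment in re.split(r"[.!?;:\n]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", segment)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(t in STOPWORDS for t in gram):
                    yield " ".join(gram)


def is_similar(a, b):
    """Treat two keywords as duplicates when one's word set contains the other's."""
    wa, wb = set(a.split()), set(b.split())
    return wa <= wb or wb <= wa


def extract_keywords(text, top_n=10, min_score=0.5):
    counts = Counter(candidate_phrases(text))
    if not counts:
        return []
    max_count = max(counts.values())
    # Stand-in scorer: frequency relative to the most frequent candidate.
    scored = [(kw, c / max_count) for kw, c in counts.items()]
    # Filter by score threshold.
    scored = [(kw, s) for kw, s in scored if s >= min_score]
    # Highest score first; on ties prefer longer phrases, then alphabetical order.
    scored.sort(key=lambda p: (-p[1], -len(p[0]), p[0]))
    kept = []
    for kw, score in scored:
        # Deduplicate: skip keywords that overlap an already kept one.
        if any(is_similar(kw, k["keyword"]) for k in kept):
            continue
        kept.append({"keyword": kw, "score": round(score, 2)})
    # Return the top N keywords.
    return kept[:top_n]
```

Swapping the stand-in scorer for one of the listed tools would only change how `scored` is built; the threshold, deduplication, and top-N steps stay the same.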

Output

{
  "document_id": "doc_abc",
  "keywords": [
    {"keyword": "payment terms", "score": 0.85},
    {"keyword": "invoice", "score": 0.82},
    {"keyword": "net 30", "score": 0.78},
    {"keyword": "accounts payable", "score": 0.72}
  ]
}

Acceptance Criteria

Top 10 keywords extracted per document
Keywords are relevant to document content
Processing time under 1 s per document

UC-TAG-002: Run Named Entity Recognition

Overview

Field	Value
ID	TAG-002
Title	Run Named Entity Recognition
Actor	Tagging Service
Priority	P1 (MVP Phase 3)

Description

Extract structured entities (people, organizations, dates, monetary amounts, locations, and contact details) from document text.

Entity Types

Entity	Label	Examples
Person	PERSON	"John Smith", "Dr. Patel"
Organization	ORG	"Acme Corp", "HDFC Bank"
Date	DATE	"January 15, 2024", "Q1 2024"
Money	MONEY	"$5,000", "₹50,000"
Location	GPE	"Mumbai", "New York"
Email	EMAIL	"john@acme.com"
Phone	PHONE	"+1-555-1234"

Output

{
  "document_id": "doc_abc",
  "entities": [
    {"text": "Acme Corporation", "label": "ORG", "start": 45, "end": 61},
    {"text": "January 15, 2024", "label": "DATE", "start": 120, "end": 136},
    {"text": "$5,000.00", "label": "MONEY", "start": 200, "end": 209}
  ]
}
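
The custom entity types (EMAIL, PHONE) are usually not produced by a general-purpose NER model, so one option is to detect them with regular expressions and emit them in the same shape as the output above. The sketch below assumes that approach; the patterns are deliberately simple and the names `CUSTOM_PATTERNS` and `extract_custom_entities` are hypothetical. Standard labels such as PERSON or ORG would come from a separate NER model and are not covered here.

```python
import re

# Illustrative patterns (assumptions): intentionally simple, not exhaustive.
CUSTOM_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\-\s]{6,}\d"),
}


def extract_custom_entities(text):
    """Return regex-matched entities with character offsets, ordered by position."""
    entities = []
    for label, pattern in CUSTOM_PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "text": match.group(),
                "label": label,
                "start": match.start(),  # offset of the first character
                "end": match.end(),      # offset one past the last character
            })
    return sorted(entities, key=lambda e: e["start"])
```

The offsets follow Python slice semantics, so `text[start:end]` reproduces the entity text, which matches the `start`/`end` values in the sample output.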

Acceptance Criteria

Standard NER entities (PERSON, ORG, DATE, MONEY, GPE) extracted
Custom entities (EMAIL, PHONE) extracted
Entity positions (start and end character offsets) tracked

UC-TAG-003: Apply Auto-Tags

Overview

Field	Value
ID	TAG-003
Title	Apply Auto-Tags
Actor	System
Priority	P1 (MVP Phase 3)

Description

Automatically apply tags to documents based on extracted keywords, entities, classification results, and custom rules.

Tag Sources

Source	Example Tags
Document Type	`type:invoice`, `type:contract`
Category	`category:finance`, `category:legal`
Keywords	`payment`, `agreement`, `quarterly`
Entities	`org:Acme Corp`, `person:John Smith`
Rules	`client:ABC` (based on patterns)

Steps

Collect keywords from TAG-001
Collect entities from TAG-002
Apply classification tags from CLS-002/003
Match against custom rules
Deduplicate and normalize tags
Store document-tag associations
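
The "deduplicate and normalize" step is where candidates from the different sources meet. A minimal sketch follows, assuming each source has already produced candidates shaped like the `auto_tags` entries in this use case. The normalization rule (collapse whitespace, compare case-insensitively) and the tie-break (keep the highest-confidence candidate) are assumptions, and `merge_auto_tags` is a hypothetical name.

```python
def merge_auto_tags(candidates):
    """Merge tag candidates from all sources, keeping one entry per normalized tag."""
    best = {}
    for cand in candidates:
        # Normalize: trim and collapse internal whitespace (assumed rule).
        tag = " ".join(cand["tag"].split())
        # Compare case-insensitively so "Payment" and "payment" count as one tag.
        key = tag.casefold()
        current = best.get(key)
        if current is None or cand["confidence"] > current["confidence"]:
            best[key] = {
                "tag": tag,
                "source": cand["source"],
                "confidence": cand["confidence"],
            }
    # Most confident tags first; alphabetical order breaks ties.
    return sorted(best.values(), key=lambda t: (-t["confidence"], t["tag"]))
```

Keeping the source and confidence of the winning candidate preserves the provenance that the acceptance criteria ask for.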

Output

{
  "document_id": "doc_abc",
  "auto_tags": [
    {"tag": "type:invoice", "source": "classification", "confidence": 0.94},
    {"tag": "org:Acme Corp", "source": "ner", "confidence": 0.92},
    {"tag": "payment", "source": "keyword", "confidence": 0.85}
  ]
}

Acceptance Criteria

Tags applied from all sources
Tag source and confidence tracked
No duplicate tags

UC-TAG-004: Add Manual Tags

Overview

Field	Value
ID	TAG-004
Title	Add Manual Tags
Actor	User
Priority	P2

Description

Allow users to manually add tags to documents.

Steps

User selects document(s)
User enters tag(s) or selects from suggestions
System validates tag format
Tags are added with source="manual"
Audit log entry created

Input

{
  "document_id": "doc_abc",
  "tags": ["urgent", "client:XYZ", "project:2024-Q1"]
}
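
A sketch of how such a request might be handled, covering validation, the `source="manual"` marker, and the audit entry. The tag format is not specified in this document, so `TAG_PATTERN` is an assumed rule chosen to accept the sample input; the in-memory `store` and `audit_log` structures and the `user` parameter are likewise hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed format: optional lowercase "prefix:" followed by a value of up to 63 characters.
TAG_PATTERN = re.compile(r"(?:[a-z]+:)?[A-Za-z0-9][A-Za-z0-9 ._-]{0,62}")


def add_manual_tags(store, audit_log, document_id, tags, user):
    """Validate and attach tags to one document; return the tags actually added."""
    invalid = [t for t in tags if not TAG_PATTERN.fullmatch(t)]
    if invalid:
        # Reject the whole request so no partial update is stored.
        raise ValueError(f"Invalid tag format: {invalid}")
    doc_tags = store.setdefault(document_id, {})
    added = []
    for tag in tags:
        if tag not in doc_tags:  # leave tags that already exist untouched
            doc_tags[tag] = {"source": "manual"}
            added.append(tag)
    audit_log.append({
        "action": "add_tags",
        "document_id": document_id,
        "tags": added,
        "user": user,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return added
```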

Acceptance Criteria

Multiple tags can be added at once
Autocomplete suggests existing tags
Manual tags are distinguished from auto-tags (source="manual")

UC-TAG-005: Remove Tags

Overview

Field	Value
ID	TAG-005
Title	Remove Tags
Actor	User
Priority	P2

Description

Allow users to remove tags from documents.

Acceptance Criteria

Individual tags can be removed
Bulk tag removal supported
Removal is logged
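
One way to satisfy all three criteria with a single operation is to accept lists of documents and tags, so that individual removal is just the one-element case. The sketch below assumes a hypothetical in-memory `store` (document ID to tag mapping) and `audit_log` list; `remove_tags` is an illustrative name, not an existing API.

```python
def remove_tags(store, audit_log, document_ids, tags, user):
    """Remove the given tags from each document and log what was actually removed."""
    removed = {}
    for doc_id in document_ids:
        doc_tags = store.get(doc_id, {})
        hits = [t for t in tags if t in doc_tags]
        for tag in hits:
            del doc_tags[tag]
        if hits:
            removed[doc_id] = hits
            # One audit entry per affected document.
            audit_log.append({
                "action": "remove_tags",
                "document_id": doc_id,
                "tags": hits,
                "user": user,
            })
    return removed
```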

UC-TAG-006: Create Custom Tag

Overview

Field	Value
ID	TAG-006
Title	Create Custom Tag
Actor	Admin
Priority	P2

Description

Define new tags in the system taxonomy.

Tag Definition

Field	Description
name	Tag identifier (lowercase, no spaces)
display_name	Human-readable name
category	Grouping category
color	Display color (hex)
description	Usage description

Output

{
  "tag": {
    "id": "tag_123",
    "name": "priority:high",
    "display_name": "High Priority",
    "category": "priority",
    "color": "#FF0000"
  }
}
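
The field rules in the tag definition table lend themselves to validation at creation time. The following sketch assumes a name pattern that allows lowercase letters, digits, and the `:`, `_`, `-` separators (so that `priority:high` passes), a six-digit hex color, and a sequential ID scheme; `TagDefinition` and `TagRegistry` are hypothetical names, and the default color is an arbitrary choice for the example.

```python
import re
from dataclasses import asdict, dataclass

NAME_RE = re.compile(r"[a-z0-9:_-]+")        # lowercase, no spaces (assumed character set)
COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")    # six-digit hex color


@dataclass(frozen=True)
class TagDefinition:
    name: str
    display_name: str
    category: str
    color: str = "#808080"
    description: str = ""

    def __post_init__(self):
        if not NAME_RE.fullmatch(self.name):
            raise ValueError(f"Invalid tag name: {self.name!r}")
        if not COLOR_RE.fullmatch(self.color):
            raise ValueError(f"Invalid color: {self.color!r}")


class TagRegistry:
    """In-memory registry that enforces unique tag names."""

    def __init__(self):
        self._tags = {}

    def create(self, definition):
        if definition.name in self._tags:
            raise ValueError(f"Tag already exists: {definition.name}")
        tag_id = f"tag_{len(self._tags) + 1}"  # hypothetical ID scheme
        self._tags[definition.name] = (tag_id, definition)
        return {"tag": {"id": tag_id, **asdict(definition)}}
```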

Acceptance Criteria

Custom tags can be created
Tags can be grouped into categories
Tag names are unique

UC-TAG-007: Manage Tag Taxonomy

Overview

Field	Value
ID	TAG-007
Title	Manage Tag Taxonomy
Actor	Admin
Priority	P3

Description

Organize tags into hierarchical categories and manage the tag structure.

Taxonomy Structure

tags/
├── type/
│   ├── invoice
│   ├── contract
│   └── report
├── category/
│   ├── finance
│   ├── legal
│   └── hr
├── org/
│   └── (dynamic entities)
└── custom/
    └── (user-defined)

Acceptance Criteria

Hierarchical tag organization
Tag merging (combine synonyms)
Tag deprecation
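
A minimal sketch of a structure that supports the three criteria, assuming tags are addressed by slash-separated paths such as `type/invoice` (mirroring the tree in this use case). Merging is modeled as an alias from the synonym to the canonical tag, and deprecation as a flag that blocks new assignments; the `Taxonomy` class and its method names are hypothetical.

```python
class Taxonomy:
    def __init__(self):
        self.paths = set()       # every known tag path, including category nodes
        self.aliases = {}        # merged synonym path -> canonical path
        self.deprecated = set()  # paths that may no longer be assigned

    def add(self, path):
        """Register a tag path together with all of its ancestors."""
        parts = path.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self.paths.add("/".join(parts[:i]))

    def children(self, path):
        """Return the direct children of a category, sorted by path."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p for p in self.paths
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def merge(self, synonym, canonical):
        """Fold a synonym into a canonical tag; the synonym keeps resolving."""
        if canonical not in self.paths:
            raise KeyError(canonical)
        self.paths.discard(synonym)
        self.aliases[synonym] = canonical

    def resolve(self, path):
        """Follow aliases to the canonical path, guarding against cycles."""
        seen = set()
        while path in self.aliases and path not in seen:
            seen.add(path)
            path = self.aliases[path]
        return path

    def deprecate(self, path):
        if path not in self.paths:
            raise KeyError(path)
        self.deprecated.add(path)

    def is_assignable(self, path):
        path = self.resolve(path)
        return path in self.paths and path not in self.deprecated
```

Keeping merged synonyms as aliases rather than deleting them means documents that still carry the old tag continue to resolve to the canonical one.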

UC-TAG-008: Suggest Tags Based on Similar Docs

Overview

Field	Value
ID	TAG-008
Title	Suggest Tags Based on Similar Docs
Actor	System
Priority	P3

Description

Suggest tags based on tags applied to similar documents.

Steps

Find similar documents (via embeddings)
Collect tags from similar documents
Rank by frequency and similarity
Return as suggestions
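
The ranking step can be sketched as a similarity-weighted vote among the nearest neighbours. The example assumes document embeddings are plain lists of floats compared by cosine similarity, and it defines confidence as the share of total neighbour similarity carried by documents that have the tag. That formula, the `top_k` and `min_similarity` defaults, and the function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_tags(query_vec, corpus, top_k=5, min_similarity=0.5):
    """corpus: list of (embedding, tags) pairs for already-tagged documents."""
    # Find the most similar documents.
    scored = sorted(
        ((cosine(query_vec, vec), tags) for vec, tags in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:top_k]
    neighbours = [(sim, tags) for sim, tags in scored if sim >= min_similarity]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return []
    # Collect tags, tracking how many neighbours carry each and their summed similarity.
    stats = defaultdict(lambda: [0, 0.0])
    for sim, tags in neighbours:
        for tag in set(tags):
            stats[tag][0] += 1
            stats[tag][1] += sim
    suggestions = [
        {"tag": tag, "similar_docs": count, "confidence": round(weight / total, 2)}
        for tag, (count, weight) in stats.items()
    ]
    # Rank by confidence, then by number of supporting documents.
    suggestions.sort(key=lambda s: (-s["confidence"], -s["similar_docs"], s["tag"]))
    return suggestions
```

For a real corpus the linear scan over `corpus` would typically be replaced by a vector index; the aggregation and ranking logic would stay the same.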

Output

{
  "document_id": "doc_abc",
  "suggested_tags": [
    {"tag": "quarterly-report", "similar_docs": 15, "confidence": 0.78},
    {"tag": "finance", "similar_docs": 23, "confidence": 0.85}
  ]
}

Acceptance Criteria

Suggestions based on similar documents
Suggestions are relevant
