Usage Guide¶
This guide covers how to use CLAMS Vocabulary types in your Python code.
Best Practices¶
Use Unversioned Types: Import TimeFrame, not TimeFrame_v7, unless you need to pin a specific version.
Use Namespace Classes: Access types via AnnotationTypes.TimeFrame for discoverability and IDE support.
Canonical Property Names: Always use canonical field names in new code; aliases exist for backward compatibility only.
Type Comparison: Use fuzzy comparison (unversioned types) for flexibility, and strict comparison (versioned types) only when the version matters.
URI Resolution: Use from_str() to parse URIs from MMIF files, and direct imports to create new annotations.
Importing Types¶
Basic Import Patterns¶
Import specific types directly:
from clams_vocabulary import TimeFrame, TextDocument, Annotation
Import namespace classes for type discovery:
from clams_vocabulary import AnnotationTypes, DocumentTypes
Import base classes for type checking:
from clams_vocabulary import ClamsTypesBase, AnnotationTypesBase
Namespace Classes¶
The vocabulary provides two namespace classes for convenient access to all types:
AnnotationTypes¶
Contains all annotation types as class attributes:
from clams_vocabulary import AnnotationTypes
# Access latest version of each type
tf = AnnotationTypes.TimeFrame
span = AnnotationTypes.Span
bbox = AnnotationTypes.BoundingBox
DocumentTypes¶
Contains all document types as class attributes:
from clams_vocabulary import DocumentTypes
# Access document types
text_doc = DocumentTypes.TextDocument
video_doc = DocumentTypes.VideoDocument
Versioned Type Access¶
Access specific versions explicitly if needed:
from clams_vocabulary import AnnotationTypes
# Latest version (recommended)
tf_latest = AnnotationTypes.TimeFrame
# Specific version (advanced use)
tf_v5 = AnnotationTypes.TimeFrame_v5
tf_v6 = AnnotationTypes.TimeFrame_v6
Type Resolution¶
from_str() Method¶
Create type instances from URI strings:
from clams_vocabulary import ClamsTypesBase
# Parse from full URI
uri = "https://clams.ai/vocabulary/type/TimeFrame/v7"
type_instance = ClamsTypesBase.from_str(uri)
# Returns a TimeFrame_v7 instance
print(type_instance.shortname) # "TimeFrame"
print(type_instance.version) # "v7"
print(str(type_instance)) # Full URI
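To make the link between a type URI and the shortname and version attributes concrete, here is a minimal standalone sketch that splits a URI into those two parts. The helper split_type_uri and the TypeUriParts container are hypothetical names introduced for illustration; they are not part of clams_vocabulary, and the sketch assumes the URI ends in `<shortname>/<version>` as in the examples in this guide.

```python
from typing import NamedTuple


class TypeUriParts(NamedTuple):
    """Hypothetical container; not part of clams_vocabulary."""
    shortname: str
    version: str


def split_type_uri(uri: str) -> TypeUriParts:
    """Split a URI of the form .../<shortname>/<version> into its parts.

    Assumes the last two path segments are the shortname and the version,
    as in the example URIs in this guide.
    """
    segments = uri.rstrip("/").split("/")
    if len(segments) < 2 or not segments[-1].startswith("v"):
        raise ValueError(f"not a versioned type URI: {uri!r}")
    return TypeUriParts(shortname=segments[-2], version=segments[-1])


parts = split_type_uri("https://clams.ai/vocabulary/type/TimeFrame/v7")
print(parts.shortname)  # TimeFrame
print(parts.version)    # v7
```

In real code, prefer ClamsTypesBase.from_str(), which also resolves legacy URIs; this sketch only illustrates the URI layout.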
Legacy URI Support (alsoKnownAs)¶
The vocabulary maintains backward compatibility with old URIs via the
alsoKnownAs mechanism:
from clams_vocabulary import ClamsTypesBase
# Old MMIF vocabulary URI
old_uri = "http://mmif.clams.ai/vocabulary/TimeFrame/v5"
type_instance = ClamsTypesBase.from_str(old_uri)
# Returns TimeFrame_v5 instance
# Preserves original URI in initialized_from attribute
print(type_instance.initialized_from) # Original old_uri
print(type_instance.uri) # Canonical new URI
Round-Trip Fidelity¶
When deserializing MMIF files, original URIs are preserved:
from clams_vocabulary import ClamsTypesBase
# Input MMIF uses old URI format
type_instance = ClamsTypesBase.from_str("http://mmif.clams.ai/vocabulary/Span/v3")
# Serialization preserves original format
print(repr(type_instance)) # Uses initialized_from (old URI)
# Canonical URI available for comparisons
print(type_instance.uri) # Canonical new URI
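The round-trip behavior can be pictured with a small standalone model. This is only a sketch under stated assumptions: ToyType, LEGACY_TO_CANONICAL, and serialize() are hypothetical names, the alias table holds a single illustrative entry, and the canonical URI in it is assumed to follow the pattern used elsewhere in this guide. It is not the library's implementation.

```python
from dataclasses import dataclass, field

# Illustrative alias table (hypothetical). The canonical URI below is assumed
# to follow the https://clams.ai/vocabulary/type/<name>/<version> pattern.
LEGACY_TO_CANONICAL = {
    "http://mmif.clams.ai/vocabulary/TimeFrame/v5":
        "https://clams.ai/vocabulary/type/TimeFrame/v5",
}


@dataclass(frozen=True)
class ToyType:
    """Hypothetical stand-in for a resolved vocabulary type instance."""
    uri: str  # canonical URI, the only field used for equality
    initialized_from: str = field(compare=False)  # string the caller passed in

    @classmethod
    def from_str(cls, text: str) -> "ToyType":
        # Legacy URIs map to their canonical form; anything else is treated
        # as already canonical in this sketch.
        return cls(uri=LEGACY_TO_CANONICAL.get(text, text), initialized_from=text)

    def serialize(self) -> str:
        # Write back exactly what was read, so old files keep their URIs.
        return self.initialized_from


old = ToyType.from_str("http://mmif.clams.ai/vocabulary/TimeFrame/v5")
new = ToyType.from_str("https://clams.ai/vocabulary/type/TimeFrame/v5")
print(old == new)       # True
print(old.serialize())  # http://mmif.clams.ai/vocabulary/TimeFrame/v5
```

Keeping the original string separate from the canonical URI lets comparisons ignore the URI scheme in use while output stays faithful to the input file.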
Type Comparison¶
Class-Level Comparison¶
Compare types using classes:
from clams_vocabulary import AnnotationTypes, TimeFrame, TimeFrame_v7
# Class-to-class comparison
if TimeFrame == TimeFrame_v7:
    print("Fuzzy match!") # True - unversioned matches versioned
# Namespace class comparison
if TimeFrame == AnnotationTypes.TimeFrame:
    print("Same type!") # True
Instance-to-Class Comparison¶
Compare type instances with classes:
from clams_vocabulary import AnnotationTypes, ClamsTypesBase
# Create instance from URI
type_instance = ClamsTypesBase.from_str(
    "https://clams.ai/vocabulary/type/TimeFrame/v7"
)
# Compare with class
if type_instance == AnnotationTypes.TimeFrame:
    print("Match!") # True
# Compare with string URI
if type_instance == "https://clams.ai/vocabulary/type/TimeFrame/v7":
    print("String match!") # True
Fuzzy Equality¶
Versioned vs Unversioned Classes¶
The vocabulary supports two comparison modes:
Strict comparison (versioned classes):
from clams_vocabulary import TimeFrame_v5, TimeFrame_v6
# Strict - versions must match exactly
TimeFrame_v5 == TimeFrame_v6 # False
Fuzzy comparison (unversioned aliases):
from clams_vocabulary import TimeFrame, TimeFrame_v5, TimeFrame_v6
# Fuzzy - ignores version differences
TimeFrame == TimeFrame_v5 # True
TimeFrame == TimeFrame_v6 # True
TimeFrame_v5 == TimeFrame # True (symmetric)
Automatic Mode Detection¶
Comparison mode is determined by class name:
TimeFrame_v5 → Strict comparison (has version suffix)
TimeFrame → Fuzzy comparison (no version suffix)
If either operand uses fuzzy mode, fuzzy comparison is used.
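The mode-detection rule can be modeled in a few lines of plain Python. The following is a sketch of one way such behavior could be built, using a metaclass that inspects the class name; ToyTypeMeta and the toy classes are hypothetical, and nothing here is claimed to match how clams_vocabulary implements comparison internally.

```python
import re

# A class name ending in _v<digits> is treated as versioned (strict mode).
_VERSION_SUFFIX = re.compile(r"_v\d+$")


class ToyTypeMeta(type):
    """Hypothetical metaclass; not the clams_vocabulary implementation."""

    @property
    def base_name(cls) -> str:
        # Class name with any version suffix removed.
        return _VERSION_SUFFIX.sub("", cls.__name__)

    @property
    def is_fuzzy(cls) -> bool:
        # No version suffix means the class compares in fuzzy mode.
        return _VERSION_SUFFIX.search(cls.__name__) is None

    def __eq__(cls, other):
        if not isinstance(other, ToyTypeMeta):
            return NotImplemented
        if cls.base_name != other.base_name:
            return False
        if cls.is_fuzzy or other.is_fuzzy:
            return True  # either side fuzzy: ignore versions
        return cls.__name__ == other.__name__  # both strict: exact match

    def __hash__(cls):
        # Equal classes must hash equally, so hash on the shared base name.
        return hash(cls.base_name)


class TimeFrame_v5(metaclass=ToyTypeMeta):
    pass


class TimeFrame_v6(metaclass=ToyTypeMeta):
    pass


class TimeFrame(TimeFrame_v6):  # unversioned name, fuzzy mode
    pass


print(TimeFrame == TimeFrame_v5)     # True
print(TimeFrame_v5 == TimeFrame)     # True
print(TimeFrame_v5 == TimeFrame_v6)  # False
```

Note that fuzzy equality is not transitive: TimeFrame_v5 and TimeFrame_v6 both equal TimeFrame but not each other, which is worth keeping in mind when using types as dictionary keys or set members.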
Property Aliases¶
Deserialization Aliases¶
The vocabulary supports legacy property names via Pydantic field aliases:
from clams_vocabulary import TimeFrame
# Modern MMIF uses canonical name
data = {"label": "speech-segment"}
tf = TimeFrame(**data)
print(tf.label) # "speech-segment"
# Legacy MMIF uses old name (alias)
legacy_data = {"frameType": "speech-segment"}
tf = TimeFrame(**legacy_data)
print(tf.label) # "speech-segment" - automatically canonicalized
Serialization Behavior¶
Important: Serialization always uses canonical field names:
from clams_vocabulary import TimeFrame
# Input uses legacy name
tf = TimeFrame(frameType="speech-segment")
# Output uses canonical name
print(tf.model_dump()) # {"label": "speech-segment"}
This means reading and re-writing MMIF files will canonicalize property names.
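The effect of canonicalization on a property dictionary can be illustrated without the library. This sketch assumes a simple alias table; PROPERTY_ALIASES and canonicalize_properties are hypothetical names, and only the frameType to label pair is taken from this guide. Raising on conflicting values is a choice made for the sketch, not documented library behavior.

```python
# Hypothetical alias table: legacy property name -> canonical name.
# Only the frameType -> label pair comes from this guide.
PROPERTY_ALIASES = {"frameType": "label"}


def canonicalize_properties(props: dict) -> dict:
    """Return a copy of props with legacy keys renamed to canonical ones."""
    canonical = {}
    for key, value in props.items():
        name = PROPERTY_ALIASES.get(key, key)
        # Sketch-only policy: reject input that sets both the alias and the
        # canonical name to different values.
        if name in canonical and canonical[name] != value:
            raise ValueError(f"conflicting values for property {name!r}")
        canonical[name] = value
    return canonical


legacy = {"frameType": "speech-segment", "start": 0}
print(canonicalize_properties(legacy))
# {'label': 'speech-segment', 'start': 0}
```

A tool that must keep legacy property names in its output would need to record the original keys before this step, since the renaming is one-way.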
URI Registry¶
Global Type Registry¶
All vocabulary types are registered in a global URI-to-type mapping:
from clams_vocabulary import URI_TO_TYPE
# Lookup by canonical URI
uri = "https://clams.ai/vocabulary/type/TimeFrame/v7"
type_class = URI_TO_TYPE[uri]
# Lookup by legacy URI (alsoKnownAs)
old_uri = "http://mmif.clams.ai/vocabulary/TimeFrame/v7"
type_class = URI_TO_TYPE[old_uri] # Same class
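A registry of this kind can be sketched as a dictionary in which each class is stored under its canonical URI and under every legacy alias. The names TOY_URI_TO_TYPE, register, also_known_as, and ToyTimeFrame_v7 are hypothetical and used for illustration only; the real URI_TO_TYPE may be populated differently.

```python
# Hypothetical registry: every known URI (canonical or legacy) -> type class.
TOY_URI_TO_TYPE = {}


def register(type_class):
    """Class decorator that registers a type under all of its URIs."""
    for uri in (type_class.uri, *type_class.also_known_as):
        existing = TOY_URI_TO_TYPE.get(uri)
        if existing is not None and existing is not type_class:
            raise ValueError(f"URI already registered: {uri}")
        TOY_URI_TO_TYPE[uri] = type_class
    return type_class


@register
class ToyTimeFrame_v7:
    """Toy type carrying one canonical URI and one legacy alias."""
    uri = "https://clams.ai/vocabulary/type/TimeFrame/v7"
    also_known_as = ("http://mmif.clams.ai/vocabulary/TimeFrame/v7",)


canonical = TOY_URI_TO_TYPE["https://clams.ai/vocabulary/type/TimeFrame/v7"]
legacy = TOY_URI_TO_TYPE["http://mmif.clams.ai/vocabulary/TimeFrame/v7"]
print(canonical is legacy)  # True
```

Storing aliases as ordinary keys keeps lookup a single dictionary access, at the cost of one extra entry per legacy URI.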
Type Attributes¶
Every type class and instance provides standard attributes:
Class Attributes (ClassVar)¶
from clams_vocabulary import TimeFrame
print(TimeFrame.shortname) # "TimeFrame"
print(TimeFrame.version) # "v7"
print(TimeFrame.uri) # Full canonical URI
print(TimeFrame.description) # Human-readable description
Instance Attributes¶
from clams_vocabulary import ClamsTypesBase
tf = ClamsTypesBase.from_str("https://clams.ai/vocabulary/type/TimeFrame/v7")
print(tf.shortname) # "TimeFrame"
print(tf.version) # "v7"
print(str(tf)) # Full URI
print(tf.initialized_from) # Original input URI
Annotation ID Prefixes¶
The vocabulary provides automatic prefix generation for annotation IDs:
from clams_vocabulary import AnnotationTypes
# Get prefix for annotation ID generation
prefix = AnnotationTypes.TimeFrame.get_prefix()
print(prefix) # "tf"
# Use in annotation ID
annotation_id = f"{prefix}_1" # "tf_1"
Prefixes are auto-generated from type shortnames and are guaranteed to be collision-free.
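This guide does not describe the prefix algorithm, so the following is only one possible scheme, shown to illustrate what collision-free generation from shortnames can look like. The function make_prefixes is hypothetical: it takes the lowercased capital letters of each shortname and appends a counter when that candidate is already taken. Only the TimeFrame to "tf" mapping is taken from this guide.

```python
def make_prefixes(shortnames):
    """Assign a unique short prefix to each non-empty shortname, in order.

    Hypothetical scheme: lowercase initials of the CamelCase words, with a
    numeric suffix added on collision. Not the clams_vocabulary algorithm.
    """
    prefixes = {}
    taken = set()
    for name in shortnames:
        # Initials of CamelCase words; fall back to the first character.
        initials = "".join(ch for ch in name if ch.isupper()).lower()
        initials = initials or name[:1].lower()
        candidate = initials
        counter = 2
        while candidate in taken:
            candidate = f"{initials}{counter}"
            counter += 1
        taken.add(candidate)
        prefixes[name] = candidate
    return prefixes


result = make_prefixes(["TimeFrame", "TextDocument", "TextField"])
print(result)
# {'TimeFrame': 'tf', 'TextDocument': 'td', 'TextField': 'tf2'}
```

In this scheme the result depends on registration order, so a stable ordering of type names is needed for the prefixes to be reproducible.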