Overview
Introduction
The CLAMS Vocabulary defines the type system used in the CLAMS project, most notably for annotating and describing multimedia content serialized in the Multi-Media Interchange Format (MMIF). It provides:
Type Definitions - Pydantic-based models for annotation and document types
Runtime Validation - Automatic validation of serialized Annotation objects against type schemas
Versioning - Vocabulary types are versioned independently of the MMIF specification
Machine-Readable - JSON Schema generation and IDE support through Pydantic
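As a minimal sketch of how these pieces fit together, the snippet below defines one Pydantic model, validates a serialized payload against it, and derives a JSON Schema from the same class. It assumes Pydantic v2. The `ExampleSpan` class and its fields are hypothetical and are not taken from the CLAMS Vocabulary; only the Pydantic calls (`model_validate`, `model_json_schema`, `ValidationError`) are real API.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class ExampleSpan(BaseModel):
    """Hypothetical annotation type, invented for illustration only."""

    start: int = Field(ge=0, description="Start offset in milliseconds")
    end: int = Field(ge=0, description="End offset in milliseconds")
    label: Optional[str] = None


# A well-formed payload validates and yields a typed object.
span = ExampleSpan.model_validate({"start": 0, "end": 1500, "label": "speech"})

# A malformed payload raises ValidationError, which names the offending field.
try:
    ExampleSpan.model_validate({"start": -5, "end": 1500})
    errors = []
except ValidationError as exc:
    errors = exc.errors()

# The same class also produces a machine-readable JSON Schema.
schema = ExampleSpan.model_json_schema()
```

Because the model class is the only definition, the validation rules and the generated schema cannot drift apart.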
Purpose
MMIF is a JSON-LD-based format for representing multi-modal annotations on multimedia content. The CLAMS Vocabulary serves as the controlled vocabulary layer for “Annotation” objects in the MMIF serialization, defining:
Annotation Types - How to represent linguistic, visual, and audio annotations
Document Types - How to represent source documents (text, video, audio, images)
(planned) Controlled Vocabularies - Enumerated value sets for constrained properties
(planned) Task Types - Definitions of content analysis tasks (NLP, CV, etc.) and their general input/output relationships
Architecture
Pure Python Implementation
Earlier versions of the CLAMS Vocabulary were published as part of the MMIF specification in YAML format. The new implementation (as of 2026) uses pure Python with Pydantic models. Benefits include:
Single source of truth for types and validation
No separate schema generation step
IDE autocomplete and type checking
Runtime validation with clear error messages
Versioning Strategy
Each vocabulary type is versioned independently and exists in two forms:
Archetype - The current/desired state (human-edited)
Snapshots - Immutable historical versions (auto-generated)
When a type changes, the build system generates a new version snapshot, ensuring backward compatibility with older MMIF files.
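The archetype/snapshot relationship can be illustrated with a small standard-library sketch. This is not the project's build system: the function names, the dictionary layout, the content-hash comparison, and the `v1`, `v2` version labels are all assumptions made for illustration.

```python
import copy
import hashlib
import json


def fingerprint(definition: dict) -> str:
    """Hash a type definition in a key-order-independent way."""
    canonical = json.dumps(definition, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def snapshot_if_changed(archetype: dict, snapshots: list) -> list:
    """Return the history, extended by a frozen copy if the archetype changed."""
    digest = fingerprint(archetype)
    if snapshots and snapshots[-1]["digest"] == digest:
        return snapshots  # unchanged since the latest snapshot: no new version
    entry = {
        "version": "v{}".format(len(snapshots) + 1),  # hypothetical label scheme
        "digest": digest,
        "definition": copy.deepcopy(archetype),  # later edits cannot leak in
    }
    return snapshots + [entry]


# Hypothetical archetype: the human-edited, current state of one type.
archetype = {"name": "ExampleSpan", "properties": {"start": "integer", "end": "integer"}}

history = snapshot_if_changed(archetype, [])
history = snapshot_if_changed(archetype, history)  # no edit, so still one snapshot

archetype["properties"]["label"] = "string"  # a human edit to the archetype
history = snapshot_if_changed(archetype, history)

versions = [entry["version"] for entry in history]
```

Each snapshot holds a deep copy of the definition, so the first snapshot still describes the type as it was before the `label` property was added. That is what would let a file written against the older version keep validating.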
Using the Vocabulary
For detailed usage instructions, including import patterns, type resolution, comparison behavior, and property aliases, see the Usage Guide.
Integration with mmif-python SDK
The vocabulary integrates with mmif-python for MMIF serialization and deserialization. For integration details and migration guides, see the mmif-python documentation.