Getting Started

Overview

MultiMedia Interchange Format (MMIF) is a JSON(-LD)-based data format designed for reproducibility, transparency, and interoperability in customized computational analysis workflows. This documentation focuses on the Python implementation of MMIF. To learn more about the data format specification, please visit the MMIF website.

mmif-python is a public, open-source implementation of the MMIF data format. It supports serializing Python objects to MMIF and deserializing MMIF into Python objects, and it provides many helpers for navigating and manipulating MMIF objects.

Prerequisites

  • Python: the latest mmif-python requires Python 3.10 or newer.

Installation

The mmif-python package is distributed via the official PyPI. Use pip to install the latest release.

pip install mmif-python

This installs a package named mmif into your local Python library.
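To confirm that the installation succeeded, the installed release can be looked up by its distribution name. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of mmif-python.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of mmif-python.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# The distribution is named "mmif-python", while the importable package is "mmif".
print(installed_version("mmif-python"))
```

A result of None means pip did not install the distribution into the environment that is running the script.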

The MMIF format and specification evolve over time, and the mmif-python package is updated along with changes to the MMIF format.

Note

The MMIF format is not always backward-compatible. To find out more about the relation between MMIF specification versions and mmif-python versions, please take time to read our decision on the subject here. If you need to know which version of the Python SDK supports which specification version, see the Target MMIF Versions page.

MMIF Serialization

mmif.serialize.mmif.Mmif represents the top-level MMIF object. The subcomponents of the MMIF object (views, annotation objects, and the metadata for each object) and the MMIF object itself are all subclasses of mmif.serialize.model.MmifObject. To start from an existing MMIF string, simply instantiate a new Mmif object with that string.

from mmif import Mmif

mmif_str = """{
"metadata": {
  "mmif": "http://mmif.clams.ai/1.0.0"
},
"documents": [
  {
    "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
    "properties": {
      "id": "m1",
      "mime": "video/mp4",
      "location": "file:///var/archive/video-0012.mp4"
    }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
    "properties": {
      "id": "m2",
      "mime": "text/plain",
      "location": "file:///var/archive/video-0012-transcript.txt"
    }
  }
],
"views": []}"""

mmif_obj = Mmif(mmif_str)

A few notes:

  1. MMIF objects do not carry the primary source files themselves (although there are exceptions for text documents).

  2. MMIF objects specify the MMIF version at the top. As not all MMIF versions are backward-compatible, a given version of the mmif-python implementation might not be able to load an unsupported MMIF version.
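Because the version is recorded in the metadata, it can be inspected before the string is handed to the SDK. The following is a minimal sketch using only the standard library. It assumes the version is the last path segment of the metadata.mmif URL, as in the example above; the function name mmif_spec_version is hypothetical and not part of mmif-python.

```python
import json
import re


def mmif_spec_version(mmif_json: str) -> str:
    """Return the specification version named in the top-level metadata.

    Hypothetical helper; assumes a URL that ends in a MAJOR.MINOR.PATCH segment.
    """
    spec_url = json.loads(mmif_json)["metadata"]["mmif"]
    # Pick up the trailing "x.y.z" segment, e.g. http://mmif.clams.ai/1.0.0
    match = re.search(r"/(\d+\.\d+\.\d+)/?$", spec_url)
    if match is None:
        raise ValueError(f"no version found in {spec_url!r}")
    return match.group(1)


sample = '{"metadata": {"mmif": "http://mmif.clams.ai/1.0.0"}, "documents": [], "views": []}'
print(mmif_spec_version(sample))  # prints 1.0.0
```

The extracted version can then be compared against the versions that the installed SDK targets before attempting to load the file.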

When serializing back to str, call mmif.serialize.model.MmifObject.serialize() on the object.

To get subcomponents, you can use the various getters implemented in the subclasses. For example:

from mmif.vocabulary.document_types import DocumentTypes

for video in mmif_obj.get_documents_by_type(DocumentTypes.VideoDocument):
    # open the video file in binary read mode
    with open(video.location_path(), 'rb') as in_video:
        # do something with the video file
        pass

For a full list of available helper methods, please refer to the API documentation pages (see the left sidebar).

MMIF usage in CLAMS Workflows

In the context of CLAMS, a Workflow refers to the sequence of CLAMS applications that have been executed to generate the views and annotations within an MMIF file.

When using the mmif-python SDK, a unique identifier for a workflow (workflowId) is calculated based on the applications involved. This identifier is constructed by concatenating the application name, version, and a hash of the runtime parameters for each step in the sequence. This ensures that the identifier uniquely represents not just the apps used, but their specific configurations, aiding in reproducibility.
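The idea behind such an identifier can be sketched in a few lines. This is an illustration only, not the SDK's actual algorithm: the hash function (SHA-256), the 8-character truncation, the "/" separator, and the names param_hash and workflow_id are all assumptions made for this example.

```python
import hashlib
import json


def param_hash(params: dict) -> str:
    """Hash runtime parameters independently of key order (assumed scheme)."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def workflow_id(steps) -> str:
    """Concatenate name, version, and parameter hash for each step, in order.

    Hypothetical sketch; the separator and layout are assumptions.
    """
    parts = [f"{name}/{version}/{param_hash(params)}" for name, version, params in steps]
    return "/".join(parts)


# Hypothetical two-step workflow: (application name, version, runtime parameters)
steps = [
    ("example-app-a", "v1.0", {"threshold": 0.5}),
    ("example-app-b", "v2.1", {"lang": "en", "pretty": True}),
]
print(workflow_id(steps))
```

Under this scheme, two runs share an identifier only if they use the same applications, in the same order, at the same versions, and with the same parameters, which is the property that supports reproducibility.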