Developing plugins for MMIF Python SDK
Overview
As MMIF JSON files can carry information about media document files without actually carrying file contents, file accessing modules in mmif
Python package (distributed as mmif-python
on PyPI) are designed to be lightweight and flexible so that it can work with additional “plugin” Python packages that can handle concrete file access.
This documentation focuses on Python implementation of the MMIF. To learn more about the data format specification, please visit the MMIF website.
mmif-python
is a public, open source implementation of the MMIF data format. mmif-python
supports serialization/deserialization of MMIF objects from/to Python objects, as well as many navigation and manipulation helpers for MMIF objects.
Developer can write simple plugins that can provide additional functionalities to mmif-python
. For example, mmif-python
does not provide any file access functionality beyond a local file system (with file
scheme), but it can be extended with a plugin that can handle file access over different protocols (e.g. http
, s3
, ftp
, etc).
This document contains information about how to write plugins for mmif-python
.
Document Location Scheme Plugins
mmif.serialize.annotation.Document
class has various methods to access parts of location
property of the document. The location is in URI/IRI format (SCHEME://HOSTNAME/PATH
, in a nutshell) and it has to be resolved to a local file for CLAMS Apps to process the local file to analyze and extract information about the media and its contents. The core mmif-python
distribution only provides a default implementation that can handle file
scheme URIs.
To add a document location handler plugin, you need to implement a Python “package” (do not confuse with PYPI distribution), that meets these requirements;
the package must be named
mmif_docloc_<SCHEME>
. For example, to implement a handler fors3
scheme, the package name must bemmif_docloc_s3
. The prefix is important as it’s used in the plugin discovery process from the coremmif-python
modules.the top module of the package must have a function named
resolve
. The function must take a single argument, which is astr
of the document location URI. The function must return astr
of the local file path. For example, if the document location iss3://mybucket/myfile.mp4
, a Python user should be able to to something like this;
import mmif_docloc_s3
resolved = mmif_docloc_s3.resolve('s3://mybucket/myfile.mp4')
# then resolved must be a local file path that can be used to open the file
Here’s a minimal example codebase that you refer to when you develop a docloc
plugin.
(However, before you start writing your own plugin for a specific URI scheme, checking if there’s already a PyPI distribution for the scheme might be a good idea.)
$ tree .
.
├── mmif_docloc_dummy
│ └── __init__.py
├── pyproject.toml
└── setup.cfg
$ cat pyproject.toml
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
$ cat setup.cfg
[metadata]
name = mmif_docloc_dummy # this name is IMPORTANT
version = 0.0.1
description = a plugin to mmif-pyhon to handle `dummy` location scheme
And the plugin code.
# mmif_docloc_dummy/__init__.py
doc_types = {'video': 'mp4'}
def resolve(docloc):
scheme = 'dummy'
if docloc.startswith(f'{scheme}://'):
doc_id, doc_type = docloc.split('.')
return f'/path/to/{doc_id}.{doc_types[doc_type]}'
else:
raise ValueError(f'cannot handle document location scheme: {docloc}')
Bulit-in Document Location Scheme Plugins
At the moment, mmif-python
PyPI distribution ships a built-in docloc plugin that support both http
and https
schemes.
Take a look at mmif_docloc_http
module for details.