mmif shell command

``mmif-python`` comes with a command line interface (CLI) for handling MMIF files. Many of its commands are designed for working with MMIF files in the context of CLAMS workflows.

The CLI is installed as the ``mmif`` shell command. To see the available commands, run:

mmif --help

The following documentation is automatically generated from the CLI help messages.

Main Command

usage: mmif [-h] [-v] {} ...

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

sub-command:
  {}

describe

usage: mmif describe [-h] [-o OUTPUT] [-p] [MMIF_FILE]

provides CLI to describe the workflow specification from a MMIF file or a collection of MMIF files.

This command extracts workflow information from a single MMIF file or
summarizes a directory of MMIF files.

==========================
For a single MMIF file
==========================

Reads a MMIF file and extracts the workflow specification from it.

This function provides an app-centric summarization of the workflow. The
conceptual hierarchy is that a **workflow** is a sequence of **apps**,
and each **app** execution can produce one or more **views**. This function
groups views that share the same ``app`` and ``metadata.timestamp`` into
a single logical "app execution".

.. note::
    For MMIF files generated by ``clams-python`` <= 1.3.3, all views
    are independently timestamped. This means that even if multiple views
    were generated by a single execution of an app, their
    ``metadata.timestamp`` values will be unique. As a result, the grouping
    logic will treat each view as a separate app execution. The change
    that aligns timestamps for views from a single app execution is
    implemented in `clams-python PR #271
    <https://github.com/clamsproject/clams-python/pull/271>`_.
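
The grouping rule described above can be illustrated with a minimal sketch. This is not the code used by ``mmif describe``; the function name ``group_views_by_execution``, the example app URLs, and the view IDs are hypothetical, and views are represented as plain dictionaries rather than ``mmif-python`` objects.

```python
def group_views_by_execution(views):
    """Group view IDs by the (app, metadata.timestamp) pair they share.

    Hypothetical helper for illustration only; views are plain dicts here.
    """
    groups = {}
    for view in views:
        meta = view["metadata"]
        key = (meta["app"], meta["timestamp"])
        groups.setdefault(key, []).append(view["id"])
    return groups


# Two views written by one execution of app-a, then one view from app-b.
views = [
    {"id": "v_0", "metadata": {"app": "http://apps.example.org/app-a/v1",
                               "timestamp": "2024-01-01T00:00:00"}},
    {"id": "v_1", "metadata": {"app": "http://apps.example.org/app-a/v1",
                               "timestamp": "2024-01-01T00:00:00"}},
    {"id": "v_2", "metadata": {"app": "http://apps.example.org/app-b/v1",
                               "timestamp": "2024-01-01T00:05:00"}},
]
executions = group_views_by_execution(views)

# Same two app-a views, but independently timestamped (the older behavior
# described in the note): each view ends up in its own group.
legacy_views = [
    {"id": "v_0", "metadata": {"app": "http://apps.example.org/app-a/v1",
                               "timestamp": "2024-01-01T00:00:00"}},
    {"id": "v_1", "metadata": {"app": "http://apps.example.org/app-a/v1",
                               "timestamp": "2024-01-01T00:00:01"}},
]
legacy_executions = group_views_by_execution(legacy_views)
```

With aligned timestamps the three views collapse into two app executions; with independent timestamps the two views from the same app are treated as two separate executions.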

The output format is a dictionary with the following keys:

* ``workflowId``
    A unique identifier for the workflow, based on the
    sequence of app executions (app, version, parameter hashes). App
    executions with errors are excluded from this identifier. App
    executions with warnings are still considered successful for the purpose
    of this identifier.
* ``stats``
    A dictionary with the following keys:

    ``appCount``
        Total number of identified app executions.
    ``errorViews``
        A list of view IDs that reported errors.
    ``warningViews``
        A list of view IDs that reported warnings.
    ``emptyViews``
        A list of view IDs that contain no annotations.
    ``annotationCountByType``
        A dictionary mapping each annotation type to its count, plus a
        ``total`` key for the sum of all annotations across all app
        executions.
* ``apps``
    A list of objects, where each object represents one app
    execution. It includes metadata, profiling, and aggregated statistics
    for all views generated by that execution. A special entry for views
    that could not be assigned to an execution will be at the end of the list.
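
As a rough illustration of how the view-level entries under ``stats`` could be derived, the sketch below works on plain dictionaries. It assumes that a view signals an error through an ``error`` key in its metadata and warnings through a non-empty ``warnings`` list; the helper name ``summarize_views`` and the sample data are hypothetical, and the real command may classify views differently (for example, whether an errored view also counts as empty). ``appCount`` is left out because it depends on the app execution grouping.

```python
from collections import Counter


def summarize_views(views):
    """Compute view-level statistics from a list of view dicts (illustrative)."""
    error_views, warning_views, empty_views = [], [], []
    counts = Counter()
    for view in views:
        meta = view.get("metadata", {})
        annotations = view.get("annotations", [])
        if "error" in meta:            # assumed error marker
            error_views.append(view["id"])
        if meta.get("warnings"):       # assumed warning marker
            warning_views.append(view["id"])
        if not annotations:
            empty_views.append(view["id"])
        for annotation in annotations:
            counts[annotation["@type"]] += 1
    by_type = dict(counts)
    by_type["total"] = sum(counts.values())
    return {
        "errorViews": error_views,
        "warningViews": warning_views,
        "emptyViews": empty_views,
        "annotationCountByType": by_type,
    }


sample_views = [
    {"id": "v_0", "metadata": {},
     "annotations": [{"@type": "TimeFrame"}, {"@type": "TimeFrame"},
                     {"@type": "Token"}]},
    {"id": "v_1", "metadata": {"error": {"message": "failed"}},
     "annotations": []},
]
stats = summarize_views(sample_views)
```

Here ``stats["annotationCountByType"]`` holds one count per annotation type plus the ``total`` key, mirroring the layout described in the list above.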


===============================
For a directory of MMIF files
===============================

Reads all MMIF files in a directory and extracts a summarized workflow specification.

This function provides an overview of a collection of MMIF files, aggregating
statistics across multiple files.

The output format is a dictionary with the following keys:

* ``mmifCountByStatus``
    A dictionary summarizing the processing status of all MMIF files in the
    collection. It includes:

    ``total``
        Total number of MMIF files found.
    ``successful``
        Number of MMIF files processed without errors (may contain warnings).
    ``withErrors``
        Number of MMIF files containing app executions that reported errors.
    ``withWarnings``
        Number of MMIF files containing app executions that reported warnings.
    ``invalid``
        Number of files that failed to be parsed as valid MMIF.
* ``workflows``
    A list of "workflow" objects found in the "successful" MMIF files (files
    with errors are excluded), where each object contains:

    ``workflowId``
        The unique identifier for the workflow.
    ``apps``
        A list of app objects, each with ``app`` (name and version identifier),
        ``appConfiguration``, and ``appProfiling`` statistics (avg, min, max,
        stdev running times) aggregated per workflow.
    ``mmifs``
        A list of MMIF file basenames belonging to this workflow.
    ``mmifCount``
        The number of MMIF files in this workflow.
* ``annotationCountByType``
    A dictionary aggregating annotation counts across the entire collection.
    It includes a ``total`` key for the grand total, plus integer counts for
    each individual annotation type.
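
The per-workflow aggregation can be pictured with the following sketch. It is only an approximation under stated assumptions: the helper names, the sample workflow IDs, and the file names are hypothetical, running times are assumed to be plain numbers of seconds, and the sample standard deviation is used, which may or may not match what the command reports.

```python
import statistics


def aggregate_running_times(seconds):
    """Summarize running times of one app across the files of a workflow."""
    return {
        "avg": statistics.mean(seconds),
        "min": min(seconds),
        "max": max(seconds),
        # stdev needs at least two data points; fall back to 0.0 otherwise.
        "stdev": statistics.stdev(seconds) if len(seconds) > 1 else 0.0,
    }


def group_by_workflow(file_to_workflow):
    """Collect MMIF basenames under the workflow ID each file was assigned."""
    grouped = {}
    for basename, workflow_id in file_to_workflow.items():
        grouped.setdefault(workflow_id, []).append(basename)
    return [
        {"workflowId": wid, "mmifs": sorted(names), "mmifCount": len(names)}
        for wid, names in grouped.items()
    ]


profile = aggregate_running_times([10.0, 12.0, 14.0])
workflows = group_by_workflow({
    "a.mmif": "workflow-1",
    "b.mmif": "workflow-1",
    "c.mmif": "workflow-2",
})
```

Each entry of ``workflows`` carries the ``workflowId``, ``mmifs``, and ``mmifCount`` keys described above; the ``apps`` list is omitted to keep the sketch short.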


positional arguments:
  MMIF_FILE             input MMIF file, a directory of MMIF files, or STDIN if `-` or not
                        provided.

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file path, or STDOUT if not provided.
  -p, --pretty          Pretty-print JSON output

rewind

usage: mmif rewind [-h] [-o OUTPUT] [-p] [-n NUMBER] [-m {app,view}] [MMIF_FILE]

provides CLI to rewind a MMIF from a CLAMS workflow.

MMIF rewinder rewinds a MMIF by deleting the last N views.
N can be specified as a number of views, or a number of producer apps.
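
The difference between the two modes can be sketched on a plain list of view dictionaries. This is not the rewinder shipped with ``mmif-python``; the function ``rewind_views`` is hypothetical, and it assumes that trailing consecutive views with the same ``metadata.app`` value belong to one producer app.

```python
def rewind_views(views, number, mode="view"):
    """Return a copy of ``views`` with the last N views or apps removed."""
    if number <= 0:
        raise ValueError("number must be a positive integer")
    if mode == "view":
        # Drop the last N views (or all of them if N exceeds the length).
        return views[:max(len(views) - number, 0)]
    if mode == "app":
        remaining = list(views)
        for _ in range(number):
            if not remaining:
                break
            last_app = remaining[-1]["metadata"]["app"]
            # Drop every trailing view produced by the same app.
            while remaining and remaining[-1]["metadata"]["app"] == last_app:
                remaining.pop()
        return remaining
    raise ValueError("mode must be 'app' or 'view'")


views = [
    {"id": "v_0", "metadata": {"app": "app-a"}},
    {"id": "v_1", "metadata": {"app": "app-a"}},
    {"id": "v_2", "metadata": {"app": "app-b"}},
    {"id": "v_3", "metadata": {"app": "app-b"}},
    {"id": "v_4", "metadata": {"app": "app-c"}},
]
by_view = rewind_views(views, 2, mode="view")
by_app = rewind_views(views, 2, mode="app")
```

Rewinding by two views leaves ``v_0`` through ``v_2``, while rewinding by two apps removes the views of ``app-c`` and ``app-b`` and leaves only the two ``app-a`` views.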

positional arguments:
  MMIF_FILE             input MMIF file path, or STDIN if `-` or not provided.

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        output file path, or STDOUT if not provided.
  -p, --pretty          Pretty-print rewound MMIF
  -n NUMBER, --number NUMBER
                        Number of views or apps to rewind, must be a positive integer. If 0, the
                        user will be prompted to choose. (default: 0)
  -m {app,view}, --mode {app,view}
                        Choose to rewind by number of views or number of producer apps. (default:
                        view)

source

usage: mmif source [-h] [-p [PATH]] [-o OUTPUT] [-s [SCHEME]] documents [documents ...]

provides CLI to create a "source" MMIF json.

A source MMIF is a MMIF with a list of source documents but empty views.
It can be used as a starting point for a CLAMS workflow.

positional arguments:
  documents             This list of documents MUST be colon-delimited pairs of document types and
                        file locations. A document type can be one of `audio`, `video`, `text`,
                        `image`, or a MIME type string (such as video/mp4). The file locations
                        MUST be valid URI strings (e.g. `file:///path/to/file.mp4`, or URI scheme
                        part can be omitted, when `--scheme` flag is used). Note that when
                        `file://` scheme is used (default), locations MUST BE POSIX forms (Windows
                        forms are not supported). The output will be a MMIF file containing a
                        document for each of those file paths, with the appropriate ``@type`` and
                        MIME type (if given).

options:
  -h, --help            show this help message and exit
  -p [PATH], --prefix [PATH]
                        An absolute path to use as prefix for file paths (ONLY WORKS with the
                        default `file://` scheme, ignored otherwise. MUST BE a POSIX form, Windows
                        form is not supported). If prefix is set, document file paths MUST be
                        relative. Useful when creating source MMIF files from a system that's
                        different from the environment that actually runs the workflow (e.g. in a
                        container).
  -o OUTPUT, --output OUTPUT
                        output file path, or STDOUT if not provided.
  -s [SCHEME], --scheme [SCHEME]
                        A scheme to associate with the document location URI. When not given, the
                        default scheme is `file://`. (AVAILABLE ADDITIONAL SCHEMES) "http"
                        (location must be a URL string.)
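
To make the ``documents`` format concrete, here is a minimal sketch of how one colon-delimited argument might be split and turned into a location URI. It is an illustration under assumptions, not the parser used by ``mmif source``: the function ``parse_document_arg``, the returned dictionary layout, and the short ``@type`` names (used instead of full vocabulary URIs) are hypothetical, and validation is kept to a minimum.

```python
from pathlib import PurePosixPath

# Short names used for illustration; real MMIF uses full vocabulary URIs.
SHORT_TYPES = {
    "audio": "AudioDocument",
    "video": "VideoDocument",
    "text": "TextDocument",
    "image": "ImageDocument",
}


def parse_document_arg(arg, prefix=None, scheme="file"):
    """Split ``TYPE:LOCATION`` and build a document description (sketch)."""
    doc_type, sep, location = arg.partition(":")  # split on the first colon
    if not sep or not location:
        raise ValueError(f"expected TYPE:LOCATION, got {arg!r}")
    mime = doc_type if "/" in doc_type else None   # e.g. video/mp4
    major = doc_type.split("/", 1)[0]
    if major not in SHORT_TYPES:
        raise ValueError(f"unsupported document type: {doc_type!r}")
    if "://" not in location:                      # scheme part was omitted
        if scheme == "file":
            path = PurePosixPath(location)
            if prefix is not None:
                if path.is_absolute():
                    raise ValueError("with a prefix, paths must be relative")
                path = PurePosixPath(prefix) / path
            location = f"file://{path}"
        else:
            location = f"{scheme}://{location}"
    return {"@type": SHORT_TYPES[major], "mime": mime, "location": location}


plain = parse_document_arg("video:/data/clip.mp4")
prefixed = parse_document_arg("video/mp4:clip.mp4", prefix="/mnt/media")
remote = parse_document_arg("text:example.org/a.txt", scheme="http")
```

In this sketch ``plain`` resolves to ``file:///data/clip.mp4``, ``prefixed`` to ``file:///mnt/media/clip.mp4`` with the MIME type preserved, and ``remote`` to ``http://example.org/a.txt``.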