mmif.utils package¶
Package containing utility modules for handling different types of source documents, and general implementation of common data structures and algorithms.
Subpackages¶
- mmif.utils.cli package
- mmif.utils.summarizer package
  - argparser()
  - pp_args()
  - main()
  - Submodules
    - mmif.utils.summarizer.config module
    - mmif.utils.summarizer.graph module
    - mmif.utils.summarizer.nodes module
    - mmif.utils.summarizer.summary module
    - mmif.utils.summarizer.utils module
Submodules¶
mmif.utils.sequence_helper module¶
This module provides helpers for handling sequence labeling. Specifically, it provides
- a generalized label re-mapper for “post-binning” of labels
- conversion from a list of CLAMS annotations (with classification props) into a list of reals (scores by labels), which can be combined with the label re-mapper mentioned above
- mmif.utils.sequence_helper.smooth_outlying_short_intervals(): a simple smoothing algorithm that trims “short” outlier sequences
However, it DOES NOT provide
- direct conversion between CLAMS annotations. For example, it does not directly handle stitching of TimePoint into TimeFrames.
- support for multi-class scenarios, such as handling of competing subsequences or overlapping labels.
Some functions can use optional external libraries (e.g., numpy) for better performance.
Hence, if you see a warning about missing optional packages, you might want to install them by running pip install mmif-python[seq].
- mmif.utils.sequence_helper.smooth_outlying_short_intervals(scores: List[float], min_spseq_size: int, min_snseq_size: int, min_score: float = 0.5)[source]¶
Given a list of scores, a score threshold, and smoothing parameters, identify the intervals of “positive” scores by “trimming” the short positive sequences (“spseq”) and short negative sequences (“snseq”). To decide positivity, the first step is binarization of the scores by the min_score threshold. Given Sr as the “raw” input list of real-number scores, and min_score=0.5,
Sr: [0.3, 0.6, 0.2, 0.8, 0.2, 0.9, 0.8, 0.5, 0.1, 0.5, 0.8, 0.3, 1.0, 0.7, 0.5, 0.5, 0.5, 0.8, 0.3, 0.6]
the binarization is done by simply comparing each score to the threshold to get S, the list of binary scores:
raw: .3 .6 .2 .8 .2 .9 .8 .5 .1 .5 .8 .3 1. .7 .5 .5 .5 .8 .3 .6
S  :  0  1  0  1  0  1  1  0  0  0  1  0  1  1  0  1  1  1  0  1
Note that the size of a positive or negative sequence can be as small as 1.
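The binarization step can be sketched in a few lines of plain Python. This is an illustration only, not the library's code: the helper name binarize is hypothetical, and it assumes that a score equal to min_score counts as positive, following the min_score parameter description (only scores strictly less than the threshold are discarded).

```python
from typing import List


def binarize(scores: List[float], min_score: float = 0.5) -> List[int]:
    """Hypothetical helper: 1 where score >= min_score, otherwise 0."""
    # Assumption: equality with the threshold is treated as positive.
    return [1 if score >= min_score else 0 for score in scores]


binary = binarize([0.3, 0.6, 0.2, 0.8, 0.9])
print(binary)
```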
Then, here are examples of smoothing a list of binary scores into intervals, by trimming “very short” (under thresholds) sequences of positive or negative:
Note
legends:
- t is the unit index (e.g. time index)
- S is the list of binary scores (zeros and ones)
- I is the list of intervals after smoothing
with params min_spseq_size==1, min_snseq_size==4
t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
I: [0, 1--1--1--1--1--1--1--1--1--1--1--1, 0--0--0--0--0--0, 1]
Explanation: min_snseq_size is used to smooth short sequences of negative predictions. In this example, zeros from t[7:10] are smoothed into “one” I, while zeros from t[13:19] are kept as “zero” I. Note that the “short” snseqs at either end (t[0:1]) are never smoothed.
with params min_spseq_size==4, min_snseq_size==2
t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
I: [0, 1--1--1--1--1--1, 0--0--0--0--0--0--0--0--0--0--0--0--0]
Explanation: min_spseq_size is used to smooth short sequences of positive predictions. In this example, the spseqs of ones from both t[10:13] and t[19:20] are smoothed. Note that the “short” spseqs at either end (t[19:20]) are always smoothed.
with params min_spseq_size==4, min_snseq_size==4
t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
I: [0, 1--1--1--1--1--1--1--1--1--1--1--1--0--0--0--0--0--0--0]
Explanation: When the two threshold parameters work together, the algorithm prioritizes the smoothing of the snseqs over the smoothing of the spseqs. Thus, in this example, the snseq t[7:10] is first smoothed “up” before the spseq t[10:13] is smoothed “down”, resulting in a long final I.
with params min_spseq_size==4, min_snseq_size==4
t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
S: [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
I: [1--1--1--1--1--1--1, 0--0--0--0, 1--1--1--1--1--1--1--1--1]
Explanation: Since smoothing of snseqs is prioritized, short spseqs at the beginning or the end can be kept.
with params min_spseq_size==1, min_snseq_size==1
t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
S: [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
I: [0--0--0, 1--1--1--1, 0--0--0--0, 1--1--1--1, 0--0--0, 1--1]
Explanation: When both width thresholds are set to 1, the algorithm essentially works in “stitching”-only mode.
- Parameters:
scores – SORTED list of scores to be smoothed. The score list is assumed to exhaust the entire time or space of the underlying document segment. (Sorted by the start, and then by the end of anchors)
min_spseq_size – minimum size of a positive sequence not to be smoothed (greater or equal to)
min_snseq_size – minimum size of a negative sequence not to be smoothed (greater or equal to)
min_score – minimum threshold to use to discard low-scored units (strictly less than)
- Returns:
list of tuples of start(inclusive)/end(exclusive) indices of the “positive” sequences. Negative sequences (regardless of their size) are not included in the output.
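The smoothing behaviour shown in the examples above can be approximated with a two-pass sketch over an already binarized list: first merge positive runs separated by a negative gap shorter than min_snseq_size, then drop merged runs shorter than min_spseq_size. The function name smooth_binary is hypothetical and this is not the library's implementation; it is only a sketch that is consistent with the documented examples.

```python
from typing import List, Tuple


def smooth_binary(binary: List[int], min_spseq_size: int,
                  min_snseq_size: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch: return [start, end) intervals of positives."""
    # Pass 0: collect maximal runs of ones as (start, end) pairs.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, bit in enumerate(binary):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(binary)))

    # Pass 1: negative gaps between two runs are smoothed "up" first.
    # Leading and trailing negatives are never between two runs, so
    # they are left untouched.
    merged: List[Tuple[int, int]] = []
    for run_start, run_end in runs:
        if merged and run_start - merged[-1][1] < min_snseq_size:
            merged[-1] = (merged[-1][0], run_end)
        else:
            merged.append((run_start, run_end))

    # Pass 2: positive runs that are still too short are smoothed "down".
    return [(s, e) for s, e in merged if e - s >= min_spseq_size]


S = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(smooth_binary(S, min_spseq_size=4, min_snseq_size=4))
```

Running the negative-gap pass before the positive-run pass is what lets a short positive run survive when it is stitched to a neighbour, as in the fourth example.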
- mmif.utils.sequence_helper.validate_labelset(annotations: Iterable[Annotation]) List[str][source]¶
Simple check for a list of annotations to see if they have the same label set.
- Raise:
AttributeError if an element in the input list doesn’t have the labelset property
- Raise:
ValueError if different labelset values are found
- Returns:
a list of the common labelset value (list of label names)
- mmif.utils.sequence_helper.build_label_remapper(src_labels: List[str], dst_labels: Dict[str, str | int | float | bool | None]) Dict[str, str | int | float | bool | None][source]¶
Build a label remapper dictionary from source and destination labels.
- Parameters:
src_labels – a list of all labels on the source side
dst_labels – a dict from source labels to destination labels. Source labels not in this dict will be remapped to a negative label (-).
- Returns:
a dict that exhaustively maps source labels to destination labels
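The remapping rule described above amounts to a dictionary lookup with a default. The sketch below is an assumption-laden illustration rather than the library's code: remap_labels is a hypothetical name, the label names are made up, and the negative label is assumed to be the literal string "-".

```python
from typing import Dict, List

NEGATIVE_LABEL = "-"  # assumption: the negative label is the string "-"


def remap_labels(src_labels: List[str],
                 dst_labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical sketch: map every source label, defaulting to negative."""
    return {label: dst_labels.get(label, NEGATIVE_LABEL) for label in src_labels}


# Made-up labels: "B" and "S" are binned, everything else becomes negative.
remapper = remap_labels(["B", "S", "I", "N"], {"B": "bars", "S": "slate"})
print(remapper)
```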
- mmif.utils.sequence_helper.build_score_lists(classifications: ~typing.List[~typing.Dict], label_remapper: ~typing.Dict, score_remap_op: ~typing.Callable[[...], float] = <built-in function max>) Tuple[Dict[str, int], numpy.ndarray][source]¶
Build lists of scores indexed by the label names.
- Parameters:
classifications – list of dictionaries of classification results, taken from input annotation objects
label_remapper – a dictionary that maps source label names to destination label names (formerly “postbin”)
score_remap_op – a function to remap the scores from multiple source labels binned to a destination label; common choices are max, min, or sum
- Returns:
a dictionary that maps label names to their index in the score list
2-d numpy array of scores, of which rows are indexed by label map dict (first return value)
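To make the shape of the two return values concrete, here is a pure-Python sketch that uses nested lists in place of a numpy array. The name score_lists, the label names, and the row ordering (destination labels in first-seen order) are assumptions for illustration; the real function may order rows or treat the negative label differently.

```python
from typing import Callable, Dict, List, Tuple


def score_lists(classifications: List[Dict[str, float]],
                label_remapper: Dict[str, str],
                score_remap_op: Callable = max
                ) -> Tuple[Dict[str, int], List[List[float]]]:
    """Hypothetical sketch: rows are destination labels, columns are units."""
    # Assign a row index to each destination label in first-seen order.
    label_idx: Dict[str, int] = {}
    for dst in label_remapper.values():
        label_idx.setdefault(dst, len(label_idx))

    rows = [[0.0] * len(classifications) for _ in label_idx]
    for col, classification in enumerate(classifications):
        # Bin the source scores by their destination label.
        binned: Dict[str, List[float]] = {}
        for src, score in classification.items():
            binned.setdefault(label_remapper[src], []).append(score)
        # Collapse each bin to one score with the chosen operation.
        for dst, scores in binned.items():
            rows[label_idx[dst]][col] = score_remap_op(scores)
    return label_idx, rows


label_idx, rows = score_lists(
    [{"B": 0.7, "S": 0.2, "N": 0.1}, {"B": 0.1, "S": 0.3, "N": 0.6}],
    {"B": "chyron", "S": "chyron", "N": "-"},
)
print(label_idx, rows)
```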
mmif.utils.text_document_helper module¶
- mmif.utils.text_document_helper.slice_text(mmif_obj, start: int, end: int, unit: str = 'milliseconds') str[source]¶
Extracts text from tokens within a specified time range.
- Parameters:
mmif_obj – MMIF object to search for tokens
start – start time point
end – end time point
unit – time unit for start and end parameters (default: “milliseconds”)
- Returns:
space-separated string of token words found in the time range
mmif.utils.timeunit_helper module¶
- mmif.utils.timeunit_helper.convert(t: int | float | str, in_unit: str, out_unit: str, fps: float) int | float | str[source]¶
Converts time from one unit to another. Works with frames, seconds, milliseconds.
- Parameters:
t – time value to convert
in_unit – input time unit, one of frames, seconds, milliseconds
out_unit – output time unit, one of frames, seconds, milliseconds
fps – frames per second
- Returns:
converted time value
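The arithmetic behind such a conversion can be sketched by normalizing to seconds first. The function convert_time below is a hypothetical stand-in, not the library's convert(): it handles numeric input only and returns floats without the rounding or string handling the real function may apply.

```python
def convert_time(t: float, in_unit: str, out_unit: str, fps: float) -> float:
    """Hypothetical sketch: convert between frames, seconds, milliseconds."""
    # How many of each unit fit into one second.
    per_second = {"frames": fps, "seconds": 1.0, "milliseconds": 1000.0}
    if in_unit not in per_second or out_unit not in per_second:
        raise ValueError(f"unsupported unit: {in_unit!r} or {out_unit!r}")
    seconds = t / per_second[in_unit]
    return seconds * per_second[out_unit]


print(convert_time(90, "frames", "milliseconds", fps=30.0))
```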
mmif.utils.video_document_helper module¶
- class mmif.utils.video_document_helper.SamplingMode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases: Enum
Determines how timepoints are selected from a TimeFrame.
- REPRESENTATIVES = 'representatives'¶
- SINGLE = 'single'¶
- ALL = 'all'¶
- mmif.utils.video_document_helper.open_container(video_document: Document)[source]¶
Opens a video file and caches stream metadata on the document.
Reads time_base, start_time, duration, and average_rate from the first video stream and writes fps, frameCount, and duration (in ms) to the document as informational properties. These properties are informational only; seek and extraction use actual PTS read from decoded frames.
- Parameters:
video_document – Document holding a video document ("@type": ".../VideoDocument/...")
- Returns:
open PyAV av.container.InputContainer
- Return type:
av.container.InputContainer
- Raises:
ValueError – if video_document is missing or of the wrong type
- mmif.utils.video_document_helper.get_framerate(video_document: Document) float[source]¶
Gets the frame rate of a video document, first by checking the fps property of the document, then by opening the video via PyAV.
- Parameters:
video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
- Returns:
frames per second as a float, rounded to 2 decimal places
- mmif.utils.video_document_helper.extract_images_from_timepoints(video_document: Document, timepoints_ms: Iterable[int], as_PIL: bool = False)[source]¶
Extracts images at the given media-timeline timepoints (in milliseconds).
For each requested timepoint, returns the image whose actual presentation timestamp (PTS) is closest to it. Duplicate timepoints produce duplicate images at the same list positions as the input.
- Parameters:
video_document – Document holding a video document ("@type": ".../VideoDocument/...")
timepoints_ms – iterable of timepoint values in milliseconds
as_PIL – return PIL.Image.Image (RGB) instead of ndarray (BGR)
- Returns:
images in the same order (and with the same multiplicity) as timepoints_ms
- Return type:
list
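The "closest PTS" rule, including the handling of duplicate timepoints, can be illustrated without any video decoding. In this sketch nearest_pts is a hypothetical helper working on a pre-sorted list of frame timestamps in milliseconds; how the library actually seeks and decodes is not shown, and the tie-breaking toward the earlier frame is an assumption.

```python
from bisect import bisect_left
from typing import Iterable, List


def nearest_pts(frame_pts_ms: List[int], timepoints_ms: Iterable[int]) -> List[int]:
    """Hypothetical sketch: frame_pts_ms must be sorted and non-empty."""
    picked = []
    for tp in timepoints_ms:
        pos = bisect_left(frame_pts_ms, tp)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = frame_pts_ms[max(pos - 1, 0):pos + 1]
        # On a tie, min() keeps the earlier frame (an assumption of this sketch).
        picked.append(min(candidates, key=lambda pts: abs(pts - tp)))
    return picked


# Duplicate requests yield duplicate results at the same positions.
print(nearest_pts([0, 33, 67, 100, 133], [40, 40, 130, 1000]))
```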
- mmif.utils.video_document_helper.extract_images_by_count_with_sources(mmif: Mmif, annotation: Annotation, min_timepoints: int = 0, max_timepoints: int = 9223372036854775807, fraction: float = 1.0, as_PIL: bool = False) Tuple[List, List[str]][source]¶
Extracts images at a count-controlled subset of the timepoints listed in the targets property of an annotation, alongside the IDs of the selected target TPs.
The number of timepoints chosen is max(min_timepoints, int(num_targets * fraction)), clamped to max_timepoints and to the number of available targets. The chosen indices are spread evenly across the target list.
- Parameters:
mmif – Mmif instance
annotation – Annotation instance containing a targets property
min_timepoints – minimum number of timepoints to include
max_timepoints – maximum number of timepoints to include
fraction – fraction of targets to include (ideally)
as_PIL – return Image instead of ndarray
- Returns:
tuple of (list of images, list of selected target TP IDs); the two lists are parallel
- Return type:
tuple
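The count rule stated above can be written out directly. The sketch below computes only which target indices would be chosen; pick_indices is a hypothetical name, and the even-spacing formula is an assumption, since the source says only that indices are "spread evenly".

```python
from typing import List


def pick_indices(num_targets: int, min_timepoints: int = 0,
                 max_timepoints: int = 2**63 - 1,
                 fraction: float = 1.0) -> List[int]:
    """Hypothetical sketch of the count-controlled selection."""
    count = max(min_timepoints, int(num_targets * fraction))
    # Clamp to the maximum and to the number of available targets.
    count = min(count, max_timepoints, num_targets)
    if count <= 0:
        return []
    # Assumed spacing: one index from each of `count` equal slices.
    return [int(i * num_targets / count) for i in range(count)]


print(pick_indices(10, fraction=0.5))
```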
- mmif.utils.video_document_helper.extract_images_by_count(mmif: Mmif, annotation: Annotation, min_timepoints: int = 0, max_timepoints: int = 9223372036854775807, fraction: float = 1.0, as_PIL: bool = False) List[source]¶
Extracts images at a count-controlled subset of the timepoints listed in the targets property of an annotation. See extract_images_by_count_with_sources() for selection details and for a variant that also returns the IDs of the selected target TPs.
- Parameters:
mmif – Mmif instance
annotation – Annotation instance containing a targets property
min_timepoints – minimum number of timepoints to include
max_timepoints – maximum number of timepoints to include
fraction – fraction of targets to include (ideally)
as_PIL – return Image instead of ndarray
- Returns:
list of images
- Return type:
list
- mmif.utils.video_document_helper.extract_images_by_mode_with_sources(mmif: Mmif, time_frame: Annotation, mode: SamplingMode | None = None, as_PIL: bool = False) Tuple[List, List[str | int]][source]¶
Extracts images from a TimeFrame using a SamplingMode, alongside the per-image source: a TP annotation id (str) when the image was selected from a TP (a representative or a target), or the sampled timepoint in milliseconds (int) when a fallback path was used (SINGLE with no representatives, or ALL with no targets).
If mode is not specified, uses the context-level default (set via the _sampling_mode context variable).
- Parameters:
mmif – Mmif instance
time_frame – TimeFrame annotation to sample from
mode – SamplingMode, or None to use the context default
as_PIL – return PIL.Image.Image instead of ndarray
- Returns:
tuple of (list of images, list of sources); the two lists are parallel. May be ([], []) for REPRESENTATIVES mode when no representatives exist.
- Return type:
tuple
- mmif.utils.video_document_helper.extract_images_by_mode(mmif: Mmif, time_frame: Annotation, mode: SamplingMode | None = None, as_PIL: bool = False) List[source]¶
Extracts images from a TimeFrame using a SamplingMode. See extract_images_by_mode_with_sources() for the variant that also returns the per-image source IDs / timepoints.
If mode is not specified, uses the context-level default (set via the _sampling_mode context variable).
- Parameters:
mmif – Mmif instance
time_frame – TimeFrame annotation to sample from
mode – SamplingMode, or None to use the context default
as_PIL – return PIL.Image.Image instead of ndarray
- Returns:
list of images (may be empty for REPRESENTATIVES mode when no representatives exist)
- Return type:
list
- mmif.utils.video_document_helper.sample_timepoints(start_ms: int, end_ms: int, step_ms: int | float) List[int][source]¶
Samples timepoints (in ms) from a half-open time interval [start_ms, end_ms) with a fixed step.
- Parameters:
start_ms – start of the interval (inclusive), in ms
end_ms – end of the interval (exclusive), in ms
step_ms – step size between adjacent timepoints, in ms; may be fractional (e.g. 1000/fps), but emitted timepoints are always integer ms
- Returns:
list of integer timepoint values in ms
- Return type:
list
- Raises:
ValueError – if step_ms is not positive
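A minimal sketch of half-open sampling with a possibly fractional step follows. The name sample_ms is hypothetical, and truncating each position with int() is an assumption; the library may round differently when converting fractional positions to integer milliseconds.

```python
from typing import List, Union


def sample_ms(start_ms: int, end_ms: int, step_ms: Union[int, float]) -> List[int]:
    """Hypothetical sketch: sample [start_ms, end_ms) at a fixed step."""
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")
    points = []
    i = 0
    # Multiply instead of accumulating to avoid compounding float error.
    while start_ms + i * step_ms < end_ms:
        points.append(int(start_ms + i * step_ms))  # assumption: truncate
        i += 1
    return points


print(sample_ms(0, 1000, 250))
```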
- mmif.utils.video_document_helper.convert_timepoint(mmif: Mmif, timepoint: Annotation, out_unit: str) int | float | str[source]¶
Converts a time point included in an annotation to a different time unit. The input annotation must have a timePoint property.
- Parameters:
mmif – input MMIF to obtain fps and input timeunit
timepoint – Annotation instance with a timePoint property
out_unit – time unit to which the point is converted (frames, seconds, milliseconds)
- Returns:
frame number (integer) or second/millisecond (float) of input timepoint
- mmif.utils.video_document_helper.convert_timeframe(mmif: Mmif, time_frame: Annotation, out_unit: str) Tuple[int | float | str, int | float | str][source]¶
Converts start and end points in a TimeFrame annotation to a different time unit.
- Parameters:
mmif – Mmif instance
time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
out_unit – time unit to which the points are converted
- Returns:
tuple of frame numbers, seconds/milliseconds, or ISO notation of TimeFrame’s start and end
- mmif.utils.video_document_helper.capture(video_document: Document)[source]¶
Deprecated: Use open_container() instead. See issue #379.
Captures a video file using OpenCV and adds fps, frame count, and duration as properties to the document.
- Parameters:
video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
- Returns:
OpenCV VideoCapture object
- mmif.utils.video_document_helper.extract_frames_as_images(video_document: Document, framenums: Iterable[int], as_PIL: bool = False, record_ffmpeg_errors: bool = False)[source]¶
Deprecated: Use extract_images_from_timepoints() instead. See issue #379.
Extracts frames from a video document as a list of numpy.ndarray. Use with the sample_frames() function to get the list of frame numbers first.
- Parameters:
video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
framenums – iterable integers representing the frame numbers to extract
as_PIL – return PIL.Image.Image instead of ndarray
record_ffmpeg_errors – if True, records and warns about FFmpeg stderr output during extraction
- Returns:
frames as a list of ndarray or Image
Deprecated: Use extract_frames_by_mode() instead.
- mmif.utils.video_document_helper.extract_mid_frame(mmif: Mmif, time_frame: Annotation, as_PIL: bool = False)[source]¶
Deprecated: Use extract_frames_by_mode() instead.
Extracts the middle frame of a time interval annotation as a numpy ndarray.
- Parameters:
mmif – Mmif instance
time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
as_PIL – return Image instead of ndarray
- Returns:
frame as a numpy.ndarray or PIL.Image.Image
Deprecated: Use extract_frames_by_mode() instead.
Calculates the representative frame numbers from an annotation. To pick the representative frames, it first looks up the representatives property of the TimeFrame annotation. If it is not found, it will calculate the number of the middle frame.
- Parameters:
mmif – Mmif instance
time_frame – Annotation instance that holds a time interval annotation containing a representatives property ("@type": ".../TimeFrame/...")
- Returns:
representative frame number as an integer
Deprecated: Use extract_frames_by_mode() instead.
A thin wrapper around get_representative_framenums() to return a single representative frame number. Always returns the first frame number found.
- mmif.utils.video_document_helper.extract_representative_frame(mmif: Mmif, time_frame: Annotation, as_PIL: bool = False, first_only: bool = True)[source]¶
Deprecated: Use extract_frames_by_mode() instead.
Extracts the representative frame of an annotation as a numpy ndarray or PIL Image.
- Parameters:
mmif – Mmif instance
time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
as_PIL – return Image instead of ndarray
first_only – return the first representative frame only
- Returns:
frame as a numpy.ndarray or PIL.Image.Image
- mmif.utils.video_document_helper.sample_frames(start_frame: int, end_frame: int, sample_rate: float = 1) List[int][source]¶
Deprecated: Use sample_timepoints() instead. See issue #379.
Helper function to sample frames from a time interval. Can also be used as a “cutoff” function when used with start_frame==0 and sample_rate==1.
- Parameters:
start_frame – start frame of the interval
end_frame – end frame of the interval
sample_rate – sampling rate (or step) to configure how often to take a frame, default is 1, meaning all consecutive frames are sampled
- Returns:
list of frame numbers to extract
- mmif.utils.video_document_helper.get_annotation_property(mmif, annotation, prop_name)[source]¶
Deprecated since version 1.0.8: Will be removed in 2.0.0. Use the mmif.serialize.annotation.Annotation.get_property() method instead.
Get a property value from an annotation. If the property is not found in the annotation, it will look up the metadata of the annotation’s parent view and return the value from there.
- Parameters:
mmif – MMIF object containing the annotation
annotation – Annotation object to get property from
prop_name – name of the property to retrieve
- Returns:
the property value
Deprecated: Use convert() with ms/s directly. See issue #379.
Deprecated: Use convert() with ms/s directly. See issue #379.
Deprecated: Use extract_images_from_timepoints() or stay in the time domain. See issue #379.
Deprecated: Use extract_images_from_timepoints() or stay in the time domain. See issue #379.
- mmif.utils.video_document_helper.extract_timepoints_as_images(*args, **kwargs)[source]¶
Deprecated: Renamed to extract_images_from_timepoints().
- mmif.utils.video_document_helper.extract_target_frames(*args, **kwargs)[source]¶
Deprecated: Renamed to extract_images_by_count_with_sources(). For a bare-images variant, use extract_images_by_count().
- mmif.utils.video_document_helper.extract_frames_by_mode(*args, **kwargs)[source]¶
Deprecated: Renamed to extract_images_by_mode(). For per-image source IDs, use extract_images_by_mode_with_sources().
mmif.utils.workflow_helper module¶
- mmif.utils.workflow_helper.group_views_by_app(views: ViewsList) List[List[Any]][source]¶
Groups views into app executions based on app and timestamp.
An “app execution” is a set of views produced by the same app at the exact same timestamp.
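The grouping rule can be sketched with plain dictionaries standing in for views. Both the function name group_by_app_and_timestamp and the dict keys (id, app, timestamp) are hypothetical; the real function works on a ViewsList and may order or structure its groups differently.

```python
from typing import Dict, List, Tuple


def group_by_app_and_timestamp(views: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Hypothetical sketch: views sharing (app, timestamp) form one group."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = {}
    for view in views:
        key = (view["app"], view["timestamp"])
        groups.setdefault(key, []).append(view)
    # dicts preserve insertion order, so groups appear in first-seen order.
    return list(groups.values())


views = [
    {"id": "v_0", "app": "app-a", "timestamp": "2024-01-01T00:00:00"},
    {"id": "v_1", "app": "app-a", "timestamp": "2024-01-01T00:00:00"},
    {"id": "v_2", "app": "app-b", "timestamp": "2024-01-01T00:05:00"},
]
grouped = group_by_app_and_timestamp(views)
print([[v["id"] for v in group] for group in grouped])
```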
- mmif.utils.workflow_helper.generate_param_hash(params: dict) str[source]¶
Generate MD5 hash from a parameter dictionary.
Parameters are sorted alphabetically, joined as key=value pairs, and hashed using MD5. This is not for security purposes, only for generating consistent identifiers.
- Parameters:
params – Dictionary of parameters
- Returns:
MD5 hash string (32 hex characters)
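The hashing recipe described above (sort, join as key=value, MD5) can be sketched with the standard library. The name param_hash is hypothetical, and the separator between pairs is an assumption, since the source does not state which one is used; the resulting digests therefore need not match the library's.

```python
import hashlib


def param_hash(params: dict) -> str:
    """Hypothetical sketch: order-independent MD5 of key=value pairs."""
    # Assumption: pairs are joined with a comma.
    joined = ",".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Insertion order does not affect the result because keys are sorted.
print(param_hash({"b": 2, "a": 1}) == param_hash({"a": 1, "b": 2}))
```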
- mmif.utils.workflow_helper.generate_workflow_identifier(mmif_input: str | Path | Mmif, return_param_dicts: Literal[True]) Tuple[str, List[dict]][source]¶
- mmif.utils.workflow_helper.generate_workflow_identifier(mmif_input: str | Path | Mmif, return_param_dicts: Literal[False] = False) str
Generate a workflow identifier string from a MMIF file or object.
The identifier follows the storage directory structure format: source_composition/app_name/version/param_hash/app_name2/version2/param_hash2/…
The leading source_composition segment encodes the top-level document mix as Type-N pairs joined by - and sorted by type name (e.g. TextDocument-1-VideoDocument-1).
Uses view.metadata.parameters (raw user-passed values) for hashing to ensure reproducibility. Views with errors or warnings are excluded from the identifier; empty views are included.
- Parameters:
mmif_input – Path to MMIF file (str or Path) or a Mmif object
return_param_dicts – If True, also return the parameter dictionaries
- Returns:
Workflow identifier string, or tuple of (identifier, param_dicts) if return_param_dicts=True
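The source_composition segment described above can be reproduced with a counter over document type names. This is a sketch under the stated format only; source_composition here is a hypothetical helper that takes bare type names, whereas the library derives them from the MMIF documents themselves.

```python
from collections import Counter
from typing import Iterable


def source_composition(document_types: Iterable[str]) -> str:
    """Hypothetical sketch: Type-N pairs, sorted by type name, joined by '-'."""
    counts = Counter(document_types)
    return "-".join(f"{doc_type}-{counts[doc_type]}" for doc_type in sorted(counts))


print(source_composition(["VideoDocument", "TextDocument"]))
```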
- class mmif.utils.workflow_helper.SingleMmifStats(*, appCount: int, errorViews: ~typing.List[str] = <factory>, warningViews: ~typing.List[str] = <factory>, emptyViews: ~typing.List[str] = <factory>, annotationCountByType: ~typing.Dict[str, int] = <factory>)[source]¶
Bases: BaseModel
Aggregated statistics for a single MMIF file.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- app_count: int¶
- error_views: List[str]¶
- warning_views: List[str]¶
- empty_views: List[str]¶
- annotation_count_by_type: Dict[str, int]¶
- class mmif.utils.workflow_helper.AppProfiling(*, runningTimeMS: int | None = None)[source]¶
Bases: BaseModel
Profiling data for a single app execution.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- running_time_ms: int | None¶
- class mmif.utils.workflow_helper.AppExecution(*, app: str, viewIds: ~typing.List[str], appConfiguration: ~typing.Dict = <factory>, appProfiling: ~mmif.utils.workflow_helper.AppProfiling = <factory>, annotationCountByType: ~typing.Dict[str, int] = <factory>)[source]¶
Bases: BaseModel
Represents a single execution of an app, which may produce multiple views.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- app: str¶
- view_ids: List[str]¶
- app_configuration: Dict¶
- app_profiling: AppProfiling¶
- annotation_count_by_type: Dict[str, int]¶
- class mmif.utils.workflow_helper.SingleMmifDesc(*, workflowId: str, stats: SingleMmifStats, apps: List[AppExecution])[source]¶
Bases: BaseModel
Description of a workflow extracted from a single MMIF file.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- workflow_id: str¶
- stats: SingleMmifStats¶
- apps: List[AppExecution]¶
- mmif.utils.workflow_helper.describe_single_mmif(mmif_input: str | Path | Mmif) dict[source]¶
Reads a MMIF file or object and extracts the workflow specification from it.
This function provides an app-centric summarization of the workflow. The conceptual hierarchy is that a workflow is a sequence of apps, and each app execution can produce one or more views. This function groups views that share the same app and metadata.timestamp into a single logical “app execution”.
Note
For MMIF files generated by apps based on clams-python <= 1.3.3, all views are independently timestamped. This means that even if multiple views were generated by a single execution of an app, their metadata.timestamp values will be unique. As a result, the grouping logic will treat each view as a separate app execution. The change that aligns timestamps for views from a single app execution is implemented in clams-python PR #271.
The output is a serialized SingleMmifDesc object.
- Parameters:
mmif_input – Path to MMIF file (str or Path) or a Mmif object
- Returns:
A dictionary containing the workflow specification.
- class mmif.utils.workflow_helper.AppProfilingStats(*, avgRunningTimeMS: float | None = None, minRunningTimeMS: float | None = None, maxRunningTimeMS: float | None = None, stdevRunningTimeMS: float | None = None)[source]¶
Bases: BaseModel
Aggregated profiling statistics for an app across a workflow.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- avg_running_time_ms: float | None¶
- min_running_time_ms: float | None¶
- max_running_time_ms: float | None¶
- stdev_running_time_ms: float | None¶
- class mmif.utils.workflow_helper.WorkflowAppExecution(*, app: str, appConfiguration: ~typing.Dict = <factory>, appProfiling: ~mmif.utils.workflow_helper.AppProfilingStats = <factory>)[source]¶
Bases: BaseModel
Aggregated information about an app’s usage within a specific workflow across multiple files.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- app: str¶
- app_configuration: Dict¶
- app_profiling: AppProfilingStats¶
- class mmif.utils.workflow_helper.WorkflowCollectionEntry(*, workflowId: str, mmifs: List[str], mmifCount: int, apps: List[WorkflowAppExecution])[source]¶
Bases: BaseModel
Summary of a unique workflow found within a collection.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- workflow_id: str¶
- mmifs: List[str]¶
- mmif_count: int¶
- apps: List[WorkflowAppExecution]¶
- class mmif.utils.workflow_helper.MmifCountByStatus(*, total: int, successful: int, withErrors: int, withWarnings: int, invalid: int)[source]¶
Bases: BaseModel
Breakdown of MMIF files in a collection by their processing status.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- total: int¶
- successful: int¶
- with_errors: int¶
- with_warnings: int¶
- invalid: int¶
- class mmif.utils.workflow_helper.CollectionMmifDesc(*, mmifCountByStatus: ~mmif.utils.workflow_helper.MmifCountByStatus, workflows: ~typing.List[~mmif.utils.workflow_helper.WorkflowCollectionEntry], annotationCountByType: ~typing.Dict[str, int] = <factory>)[source]¶
Bases: BaseModel
Summary of a collection of MMIF files.
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}¶
Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.
- mmif_count_by_status: MmifCountByStatus¶
- workflows: List[WorkflowCollectionEntry]¶
- annotation_count_by_type: Dict[str, int]¶
- mmif.utils.workflow_helper.describe_mmif_collection(mmif_dir: str | Path) dict[source]¶
Reads all MMIF files in a directory and extracts a summarized workflow specification.
This function provides an overview of a collection of MMIF files, aggregating statistics across multiple files.
The output is a serialized CollectionMmifDesc object.
- Parameters:
mmif_dir – Path to the directory containing MMIF files.
- Returns:
A dictionary containing the summarized collection specification.