mmif.utils module
mmif.utils.video_document_helper module
- mmif.utils.video_document_helper.capture(video_document: Document)
- Captures a video file using OpenCV and adds fps, frame count, and duration as properties to the document.
- Parameters:
- video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
- Returns:
- OpenCV VideoCapture object 
 
- mmif.utils.video_document_helper.convert_timeframe(mmif: Mmif, time_frame: Annotation, out_unit: str) -> Tuple[int | float | str, int | float | str]
- Converts the start and end points of a TimeFrame annotation to a different time unit.
- Parameters:
- mmif – Mmif instance
- time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
- out_unit – time unit to which the points are converted
 
- Returns:
- tuple of frame numbers (integer) or seconds/milliseconds (float) of input start and end 
 
- mmif.utils.video_document_helper.convert_timepoint(mmif: Mmif, timepoint: Annotation, out_unit: str) -> int | float | str
- Converts a time point included in an annotation to a different time unit. The input annotation must have a timePoint property.
- Parameters:
- mmif – input MMIF from which the fps and the input time unit are obtained
- timepoint – Annotation instance with a timePoint property
- out_unit – time unit to which the point is converted (frames, seconds, milliseconds)
 
- Returns:
- frame number (integer) or second/millisecond (float) of input timepoint 
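The arithmetic behind these conversions can be sketched in plain Python. This is a minimal illustration, not the library's implementation: convert_unit is a hypothetical helper, fps is passed in explicitly instead of being read from the MMIF, and truncating to an integer frame number is an assumption.

```python
# Hypothetical helper (not part of mmif.utils): converts a value between
# the three units named above, given an explicit frame rate.
MS_PER_UNIT = {"milliseconds": 1.0, "seconds": 1000.0}
UNITS = ("frames", "seconds", "milliseconds")


def convert_unit(value, in_unit, out_unit, fps):
    if in_unit not in UNITS or out_unit not in UNITS:
        raise ValueError(f"unsupported unit: {in_unit!r} -> {out_unit!r}")
    if in_unit == out_unit:
        return value
    # Normalize everything to milliseconds first.
    if in_unit == "frames":
        ms = value / fps * 1000
    else:
        ms = value * MS_PER_UNIT[in_unit]
    # Then convert milliseconds to the requested unit.
    if out_unit == "frames":
        return int(ms / 1000 * fps)  # assumption: truncate to a whole frame
    return ms / MS_PER_UNIT[out_unit]


print(convert_unit(90, "frames", "seconds", fps=30.0))
print(convert_unit(1500, "milliseconds", "frames", fps=30.0))
```

Going through a single intermediate unit keeps the number of conversion branches small; the actual functions obtain fps and the input unit from the MMIF and the annotation.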
 
- mmif.utils.video_document_helper.extract_frames_as_images(video_document: Document, framenums: List[int], as_PIL: bool = False)
- Extracts frames from a video document as a list of numpy.ndarray. Use with the sample_frames() function to get the list of frame numbers first.
- Parameters:
- video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
- framenums – integers representing the frame numbers to extract
- as_PIL – return PIL.Image.Image instead of ndarray
 
- Returns:
- frames as a list of ndarray or Image
 
- mmif.utils.video_document_helper.extract_mid_frame(mmif: Mmif, time_frame: Annotation, as_PIL: bool = False)
- Extracts the middle frame of a time interval annotation as a numpy ndarray.
- Parameters:
- mmif – Mmif instance
- time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
- as_PIL – return Image instead of ndarray
 
- Returns:
- frame as a numpy.ndarray or PIL.Image.Image
 
- Converts a frame number to a millisecond value. 
- Converts a frame number to a second value. 
- mmif.utils.video_document_helper.get_annotation_property(mmif, annotation, prop_name)
- Deprecated since version 1.0.8: Use the mmif.serialize.annotation.Annotation.get_property() method instead.
- Gets a property value from an annotation. If the property is not found in the annotation, it looks up the metadata of the annotation’s parent view and returns the value from there.
- mmif.utils.video_document_helper.get_framerate(video_document: Document) -> float
- Gets the frame rate of a video document, first by checking the fps property of the document, then by capturing the video.
- Parameters:
- video_document – Document instance that holds a video document ("@type": ".../VideoDocument/...")
- Returns:
- frames per second as a float, rounded to 2 decimal places 
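The "property first, capture second" lookup described here can be sketched as below. This is only an illustration under stated assumptions: the document's properties are modeled as a plain dict, and probe is a hypothetical callable standing in for opening the video with OpenCV. Storing the probed value back into the properties follows the description of capture() above.

```python
from typing import Callable, Dict


def framerate_sketch(properties: Dict[str, float], probe: Callable[[], float]) -> float:
    """Return fps from cached properties, probing the video only if needed."""
    fps = properties.get("fps")
    if fps is None:
        fps = probe()            # expensive step: stands in for opening the video
        properties["fps"] = fps  # cache so later lookups skip the probe
    return round(fps, 2)


calls = []

def fake_probe() -> float:
    # Hypothetical stand-in for reading the frame rate from the video file.
    calls.append(1)
    return 29.97002997

props: Dict[str, float] = {}
print(framerate_sketch(props, fake_probe))
print(framerate_sketch(props, fake_probe), "probe calls:", len(calls))
```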
 
- Calculates the middle frame number of a time interval annotation.
- Parameters:
- mmif – Mmif instance
- time_frame – Annotation instance that holds a time interval annotation ("@type": ".../TimeFrame/...")
 
- Returns:
- middle frame number as an integer 
 
- Converts a millisecond value to a frame number. 
- mmif.utils.video_document_helper.sample_frames(start_frame: int, end_frame: int, sample_rate: float = 1) -> List[int]
- Helper function to sample frames from a time interval. Can also be used as a “cutoff” function when used with start_frame==0 and sample_rate==1.
- Parameters:
- start_frame – start frame of the interval 
- end_frame – end frame of the interval 
- sample_rate – sampling rate (or step) to configure how often to take a frame, default is 1, meaning all consecutive frames are sampled 
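As a rough sketch of what such sampling amounts to, the function below steps through the interval with range. It is not the library's code: treating end_frame as exclusive, truncating a fractional sample_rate to an integer step, and rejecting steps below 1 are all assumptions made for illustration.

```python
from typing import List


def sample_frames_sketch(start_frame: int, end_frame: int, sample_rate: float = 1) -> List[int]:
    step = int(sample_rate)  # assumption: fractional rates are truncated
    if step < 1:
        raise ValueError("sample_rate must be at least 1")
    # assumption: end_frame is exclusive, as in range()
    return list(range(start_frame, end_frame, step))


print(sample_frames_sketch(0, 10, 3))   # every third frame
print(sample_frames_sketch(0, 5))       # "cutoff" use: all frames before frame 5
```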
 
 
- Converts a second value to a frame number. 
mmif.utils.sequence_helper module
This module provides helpers for handling sequence labeling. Specifically, it provides
- a generalized label re-mapper for “post-binning” of labels 
- conversion from a list of CLAMS annotations (with classification props) into a list of reals (scores by labels), which can be combined with the label re-mapper mentioned above
- mmif.utils.sequence_helper.smooth_outlying_short_intervals(): a simple smoothing algorithm that trims “short” outlier sequences
However, it DOES NOT provide
- direct conversion between CLAMS annotations. For example, it does not directly handle stitching of TimePoint annotations into TimeFrame annotations.
- support for multi-class scenarios, such as handling of competing subsequences or overlapping labels.
Some functions can use optional external libraries (e.g., numpy) for better performance. 
Hence, if you see a warning about missing optional packages, you might want to install them by running pip install mmif-python[seq].
- mmif.utils.sequence_helper.build_label_remapper(src_labels: List[str], dst_labels: Dict[str, str | int | float | bool | None]) -> Dict[str, str | int | float | bool | None]
- Builds a label remapper dictionary from source and destination labels.
- Parameters:
- src_labels – a list of all labels on the source side
- dst_labels – a dict from source labels to destination labels. Source labels not in this dict will be remapped to a negative label (-).
 
- Returns:
- a dict that exhaustively maps source labels to destination labels 
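The exhaustive mapping can be pictured with a short sketch. The function name and the example labels below are hypothetical; only the fallback to the negative label "-" comes from the description above.

```python
from typing import Dict, List

NEGATIVE_LABEL = "-"


def build_remapper_sketch(src_labels: List[str], dst_labels: Dict[str, str]) -> Dict[str, str]:
    # Every source label gets an entry; unmapped ones fall back to "-".
    return {label: dst_labels.get(label, NEGATIVE_LABEL) for label in src_labels}


remapper = build_remapper_sketch(
    ["bars", "slate", "chyron", "credits"],   # hypothetical source labels
    {"bars": "B", "slate": "S"},              # only two of them are binned
)
print(remapper)
```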
 
- mmif.utils.sequence_helper.build_score_lists(classifications: List[Dict], label_remapper: Dict, score_remap_op: Callable[[...], float] = max) -> Tuple[Dict[str, int], numpy.ndarray]
- Builds lists of scores indexed by the label names.
- Parameters:
- classifications – list of dictionaries of classification results, taken from input annotation objects
- label_remapper – a dictionary that maps source label names to destination label names (formerly “postbin”)
- score_remap_op – a function to remap the scores from multiple source labels binned to a destination label; common choices are max, min, or sum
 
- Returns:
- a dictionary that maps label names to their index in the score list 
- 2-d numpy array of scores, whose rows are indexed by the label map dict (first return value)
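To make the shape of the two return values concrete, here is a dependency-free sketch. It deviates from the documented function in ways that are assumptions for illustration: it returns nested lists instead of a numpy array, it assumes each classification is a dict from source label to score, and it calls score_remap_op on the list of scores binned to one destination label.

```python
from typing import Callable, Dict, List, Tuple


def build_score_lists_sketch(
    classifications: List[Dict[str, float]],
    label_remapper: Dict[str, str],
    score_remap_op: Callable[[List[float]], float] = max,
) -> Tuple[Dict[str, int], List[List[float]]]:
    # Assign a row index to each destination label, in first-seen order.
    label_idx: Dict[str, int] = {}
    for dst in label_remapper.values():
        label_idx.setdefault(dst, len(label_idx))
    # One row per destination label, one column per classification.
    rows = [[0.0] * len(classifications) for _ in label_idx]
    for col, classification in enumerate(classifications):
        binned: Dict[str, List[float]] = {}
        for src, score in classification.items():
            binned.setdefault(label_remapper[src], []).append(score)
        for dst, scores in binned.items():
            rows[label_idx[dst]][col] = score_remap_op(scores)
    return label_idx, rows


label_idx, rows = build_score_lists_sketch(
    [{"bars": 0.7, "tone": 0.2, "slate": 0.1}, {"bars": 0.1, "tone": 0.3, "slate": 0.6}],
    {"bars": "B", "tone": "B", "slate": "S"},
)
print(label_idx)
print(rows)
```

Each row of the result can then be passed to the smoothing function as the scores list for one label.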
 
 
- mmif.utils.sequence_helper.smooth_outlying_short_intervals(scores: List[float], min_spseq_size: int, min_snseq_size: int, min_score: float = 0.5)
- Given a list of scores, a score threshold, and smoothing parameters, identifies the intervals of “positive” scores by “trimming” the short positive sequences (“spseq”) and short negative sequences (“snseq”). To decide positivity, the first step is binarization of the scores by the min_score threshold. Given Sr as the “raw” input list of real-number scores, and min_score=0.5,
    Sr: [0.3, 0.6, 0.2, 0.8, 0.2, 0.9, 0.8, 0.5, 0.1, 0.5, 0.8, 0.3, 1.0, 0.7, 0.5, 0.5, 0.5, 0.8, 0.3, 0.6]
- the binarization is done by simply comparing each score to the threshold to get S, the list of binary scores:
    1.0 :                                     |
    0.9 :                |                    |
    0.8 :          |     |  |           |     |              |
    0.7 :          |     |  |           |     |  |           |
    0.6 :    |     |     |  |           |     |  |           |     |
    0.5 :----+-----+-----+--+--+-----+--+-----+--+--+--+--+--+-----+-
    0.4 :    |     |     |  |  |     |  |     |  |  |  |  |  |     |
    0.3 : |  |     |     |  |  |     |  |  |  |  |  |  |  |  |  |  |
    0.2 : |  |  |  |  |  |  |  |     |  |  |  |  |  |  |  |  |  |  |
    0.1 : |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
    0.0 +------------------------------------------------------------
    raw :.3 .6 .2 .8 .2 .9 .8 .5 .1 .5 .8 .3 1. .7 .5 .5 .5 .8 .3 .6
    S   : 0  1  0  1  0  1  1  0  0  0  1  0  1  1  0  1  1  1  0  1
- Note that the size of a positive or negative sequence can be as small as 1.
- Then, here are examples of smoothing a list of binary scores into intervals, by trimming “very short” (under the thresholds) sequences of positives or negatives:
- Note: legends
- t is the unit index (e.g. time index)
- S is the list of binary scores (zeros and ones)
- I is the list of intervals after smoothing
- with params min_spseq_size==1, min_snseq_size==4
    t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    I: [0, 1--1--1--1--1--1--1--1--1--1--1--1, 0--0--0--0--0--0, 1]
- Explanation: min_snseq_size is used to smooth short sequences of negative predictions. In this example, the zeros from t[7:10] are smoothed into “one” I, while the zeros from t[13:19] are kept as “zero” I. Note that “short” snseqs at either end (t[0:1]) are never smoothed.
- with params min_spseq_size==4, min_snseq_size==2
    t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    I: [0, 1--1--1--1--1--1, 0--0--0--0--0--0--0--0--0--0--0--0--0]
- Explanation: min_spseq_size is used to smooth short sequences of positive predictions. In this example, the spseqs of ones from both t[10:13] and t[19:20] are smoothed. Note that “short” spseqs at either end (t[19:20]) are always smoothed.
- with params min_spseq_size==4, min_snseq_size==4
    t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    S: [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    I: [0, 1--1--1--1--1--1--1--1--1--1--1--1, 0--0--0--0--0--0--0]
- Explanation: When the two threshold parameters work together, the algorithm prioritizes the smoothing of the snseqs over the smoothing of the spseqs. Thus, in this example, the snseq t[7:10] is first smoothed “up” before the spseq t[10:13] is smoothed “down”, resulting in a long final I.
- with params min_spseq_size==4, min_snseq_size==4
    t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    S: [1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
    I: [1--1--1--1--1--1--1, 0--0--0--0, 1--1--1--1--1--1--1--1--1]
- Explanation: Since smoothing of snseqs is prioritized, short spseqs at the beginning or the end can be kept.
- with params min_spseq_size==1, min_snseq_size==1
    t: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    S: [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
    I: [0--0--0, 1--1--1--1, 0--0--0--0, 1--1--1--1, 0--0--0, 1--1]
- Explanation: When both width thresholds are set to 1, the algorithm essentially works in a “stitching”-only mode.
- Parameters:
- scores – SORTED list of scores to be smoothed. The score list is assumed to “exhaust” (cover) the entire time or space of the underlying document segment. (Sorted by the start, and then by the end of anchors)
- min_score – minimum threshold to use to discard low-scored units (strictly less than) 
- min_spseq_size – minimum size of a positive sequence not to be smoothed (greater or equal to) 
- min_snseq_size – minimum size of a negative sequence not to be smoothed (greater or equal to) 
 
- Returns:
- list of tuples of start(inclusive)/end(exclusive) indices of the “positive” sequences. Negative sequences (regardless of their size) are not included in the output. 
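One way to reproduce the five worked examples above is a two-pass procedure: first fill short negative runs that have positives on both sides, then drop positive runs that are still too short. The sketch below is an independent reading of the documented behavior, not the library's implementation, and the function name is hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def smooth_sketch(scores: List[float], min_spseq_size: int,
                  min_snseq_size: int, min_score: float = 0.5) -> List[Tuple[int, int]]:
    # Step 1: binarize; scores strictly below the threshold become 0.
    bits = [1 if s >= min_score else 0 for s in scores]

    # Step 2: run-length encode into (value, start, end) with end exclusive.
    runs, pos = [], 0
    for value, group in groupby(bits):
        length = len(list(group))
        runs.append((value, pos, pos + length))
        pos += length

    # Step 3: snseqs first. Fill short negative runs, but never the ones
    # at either end of the list (they are not flanked by positives).
    for i, (value, start, end) in enumerate(runs):
        interior = 0 < i < len(runs) - 1
        if value == 0 and interior and end - start < min_snseq_size:
            bits[start:end] = [1] * (end - start)

    # Step 4: re-encode and keep only positive runs that are long enough.
    intervals, pos = [], 0
    for value, group in groupby(bits):
        length = len(list(group))
        if value == 1 and length >= min_spseq_size:
            intervals.append((pos, pos + length))
        pos += length
    return intervals


S = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(smooth_sketch(S, min_spseq_size=1, min_snseq_size=4))
print(smooth_sketch(S, min_spseq_size=4, min_snseq_size=2))
print(smooth_sketch(S, min_spseq_size=4, min_snseq_size=4))
```

Running the negative pass before the positive pass is what gives snseq smoothing its priority: a short positive run that gets merged with its neighbors in step 3 is no longer short by the time step 4 examines it.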
 
- mmif.utils.sequence_helper.validate_labelset(annotations: Iterable[Annotation]) -> List[str]
- Simple check for a list of annotations to see if they have the same label set.
- Raise:
- AttributeError if an element in the input list doesn’t have the labelset property
- Raise:
- ValueError if different labelset values are found
- Returns:
- a list of the common labelset value (list of label names)
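The check and its two failure modes can be illustrated with plain dictionaries standing in for Annotation objects. This is a hedged sketch: the function name is hypothetical, comparing label sets without regard to order is an assumption, and so is returning an empty list for empty input.

```python
from typing import Dict, Iterable, List


def validate_labelset_sketch(annotations: Iterable[Dict]) -> List[str]:
    common = None
    for ann in annotations:
        if "labelset" not in ann:
            raise AttributeError("annotation has no 'labelset' property")
        labelset = list(ann["labelset"])
        if common is None:
            common = labelset                     # first annotation sets the reference
        elif set(labelset) != set(common):        # assumption: order is ignored
            raise ValueError("annotations carry different labelset values")
    return common if common is not None else []   # assumption: empty input gives []


print(validate_labelset_sketch([
    {"labelset": ["bars", "slate"]},
    {"labelset": ["bars", "slate"]},
]))
```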