mmif.serialize module
mmif.serialize.model module
The model
module contains the classes used to represent an
abstract MMIF object as a live Python object.
The MmifObject
class or one of its derivatives is subclassed by
all other classes defined in this SDK, except for MmifObjectEncoder
.
These objects are generally instantiated from JSON, either as a string or as an already-loaded Python dictionary. This base class provides the core functionality for deserializing MMIF JSON data into live objects and serializing live objects into MMIF JSON data. Specialized behavior for the different components of MMIF is added in the subclasses.
- class mmif.serialize.model.DataDict(mmif_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.model.MmifObject, Generic[T, S]
- class mmif.serialize.model.DataList(mmif_obj: Optional[Union[bytes, str, list]] = None)[source]
Bases:
mmif.serialize.model.MmifObject, Generic[T]
The DataList class is an abstraction that represents the various lists found in a MMIF file, such as documents, subdocuments, views, and annotations.
- Parameters
mmif_obj (Union[str, list]) – the data that the list contains
- class mmif.serialize.model.FreezableDataDict(*args, **kwargs)[source]
Bases:
mmif.serialize.model.FreezableMmifObject, mmif.serialize.model.DataDict[T, S]
- deep_freeze(*args, **kwargs) bool [source]
Deeply freezes this FreezableMmifObject, calling deep_freeze on all FreezableMmifObjects contained as attributes or members of iterable attributes.
Note: in general, this makes no promises about the mutability of non-FreezableMmifObject state within the object. However, if all attributes and members of iterable attributes are either Freezable or hashable, this method will return True. Note that whether an object is hashable is not a contract of immutability but merely a suggestion, as anyone can implement __hash__.
- Parameters
additional_containers – any names of attributes in the object that should have their contents frozen but not themselves. This is only used for FreezableDataList and FreezableDataDict classes to freeze their contents.
- Returns
True if all state is either Freezable or Hashable
- class mmif.serialize.model.FreezableDataList(*args, **kwargs)[source]
Bases:
mmif.serialize.model.FreezableMmifObject, mmif.serialize.model.DataList[T]
- deep_freeze(*args, **kwargs) bool [source]
Deeply freezes this FreezableMmifObject, calling deep_freeze on all FreezableMmifObjects contained as attributes or members of iterable attributes.
Note: in general, this makes no promises about the mutability of non-FreezableMmifObject state within the object. However, if all attributes and members of iterable attributes are either Freezable or hashable, this method will return True. Note that whether an object is hashable is not a contract of immutability but merely a suggestion, as anyone can implement __hash__.
- Parameters
additional_containers – any names of attributes in the object that should have their contents frozen but not themselves. This is only used for FreezableDataList and FreezableDataDict classes to freeze their contents.
- Returns
True if all state is either Freezable or Hashable
- class mmif.serialize.model.FreezableMmifObject(*args, **kwargs)[source]
Bases:
mmif.serialize.model.MmifObject
- deep_freeze(*additional_containers: str) bool [source]
Deeply freezes this FreezableMmifObject, calling deep_freeze on all FreezableMmifObjects contained as attributes or members of iterable attributes.
Note: in general, this makes no promises about the mutability of non-FreezableMmifObject state within the object. However, if all attributes and members of iterable attributes are either Freezable or hashable, this method will return True. Note that whether an object is hashable is not a contract of immutability but merely a suggestion, as anyone can implement __hash__.
- Parameters
additional_containers – any names of attributes in the object that should have their contents frozen but not themselves. This is only used for FreezableDataList and FreezableDataDict classes to freeze their contents.
- Returns
True if all state is either Freezable or Hashable
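The deep-freeze contract described above can be sketched with a small stand-in class. This is only an illustration under stated assumptions: the Freezable class, its use of a _frozen flag, and the choice to inspect the members of list and tuple attributes are all hypothetical and are not taken from the SDK's implementation.

```python
from collections.abc import Hashable


class Freezable:
    """Hypothetical stand-in for a freezable object (not the SDK class)."""

    def __init__(self, **attrs):
        object.__setattr__(self, "_frozen", False)
        for name, value in attrs.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        # Once frozen, any attribute assignment is rejected.
        if self._frozen:
            raise TypeError(f"cannot set {name!r}: object is frozen")
        object.__setattr__(self, name, value)

    def deep_freeze(self) -> bool:
        fully_frozen = True
        for name, value in vars(self).items():
            if name == "_frozen":
                continue
            # Assumption: lists and tuples count as "iterable attributes".
            members = value if isinstance(value, (list, tuple)) else [value]
            for member in members:
                if isinstance(member, Freezable):
                    fully_frozen &= member.deep_freeze()
                elif not isinstance(member, Hashable):
                    # Unhashable, non-freezable state may still be mutable.
                    fully_frozen = False
        object.__setattr__(self, "_frozen", True)
        return fully_frozen


inner = Freezable(label="x")
outer = Freezable(child=inner, tags=["a", "b"])
outer_result = outer.deep_freeze()   # every member is Freezable or hashable

loose = Freezable(data={"k": 1})
loose_result = loose.deep_freeze()   # a dict is neither Freezable nor hashable
```

Here outer_result is True and loose_result is False, which mirrors the point in the note: the return value reports whether everything reachable was Freezable or hashable, not whether the object is provably immutable.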
- class mmif.serialize.model.MmifObject(mmif_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
object
Abstract superclass for MMIF related key-value pair objects.
Any MMIF object can be initialized as an empty placeholder or an actual representation with a JSON formatted string or equivalent dict object argument.
This superclass has three specially designed instance variables, and these variable names cannot be used as attribute names for MMIF objects.
_unnamed_attributes: Must be either None or an empty dictionary. If it is set to None, the class will not accept any
Additional Attributes
in the JSON schema sense. If it is a dict, users can add arbitrary key-value pairs to the class, EXCEPT for the reserved two key names.
_attribute_classes: This is a dict from a key name to the specific Python class used to deserialize the value. Note that a key name in this dict does NOT have to be a named attribute, but it is recommended to be one.
_required_attributes: This is a simple list of names of attributes that are required in the object. When serializing, an object will skip its empty (e.g. zero-length, or None) attributes unless they are in this list. Otherwise, the serialized JSON string would have empty representations (e.g.
""
,[]
).
# TODO (krim @ 8/17/20): this dict is, however, a duplicate of the type hints in the class definition. Maybe there is a better way to utilize type hints (e.g. getting them programmatically), but for now developers should be careful to add types to the hints as well as to this dict.
Also note that those two special attributes MUST be set in __init__() before calling the super method, otherwise deserialization will not work.
In addition, a subclass that has one or more named attributes must set those attributes in __init__() before calling the super method. When serializing a MmifObject, all empty attributes are ignored, so optional named attributes must be left empty (len == 0), NOT None. Any None-valued named attributes will cause issues with the current implementation.
- Parameters
mmif_obj – JSON string or dict to initialize an object. If not given, an empty object will be initialized, sometimes with an ID value automatically generated, based on its parent object.
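The rule that empty attributes are skipped on serialization unless they are listed as required can be sketched as follows. This is a minimal illustration that works on plain dicts; the helper names is_empty_value and serialize_attributes are hypothetical, and the SDK's real serialization logic may differ in detail.

```python
import json


def is_empty_value(value) -> bool:
    # Assumption: "empty" means None, or anything with zero length.
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def serialize_attributes(attributes: dict, required: list, pretty: bool = False) -> str:
    # Keep an attribute if it is required or if it actually carries a value.
    kept = {
        key: value
        for key, value in attributes.items()
        if key in required or not is_empty_value(value)
    }
    return json.dumps(kept, indent=2 if pretty else None)


view_like = {"id": "v_0", "metadata": {}, "annotations": []}
compact = serialize_attributes(view_like, required=["id", "annotations"])
print(compact)  # {"id": "v_0", "annotations": []}
```

The empty metadata dict is dropped because it is not required, while the empty annotations list survives because it is.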
- deserialize(mmif_json: Union[str, dict]) None [source]
Takes a JSON-formatted string or a simple dict that’s json-loaded from such a string as an input and populates object’s fields with the values specified in the input.
- Parameters
mmif_json – JSON-formatted string or dict from such a string that represents a MMIF object
- disallow_additional_properties() None [source]
Call this method in
__init__()
to prevent the insertion of unnamed attributes after initialization.
- static is_empty(obj) bool [source]
Returns True if the obj is None or “empty”. Emptiness is first defined as having zero length, but objects that lack a __len__ method need an additional check.
- reserved_names: pyrsistent._pset.PSet = pset(['_parent_view_id', '_frozen', 'reserved_names', '_attribute_classes', '_unnamed_attributes', '_id_counts', '_required_attributes'])[source]
- serialize(pretty: bool = False) str [source]
Generates JSON representation of an object.
- Parameters
pretty – If True, returns string representation with indentation.
- Returns
JSON string of the object.
- set_additional_property(key: str, value: Any) None [source]
Method to set values in _unnamed_attributes.
- Parameters
key – the attribute name
value – the desired value
- Returns
None
- Raise
AttributeError if additional properties are disallowed by
disallow_additional_properties()
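The interaction between set_additional_property() and disallow_additional_properties() can be sketched with a stand-in class. This assumes, following the description of _unnamed_attributes above, that disallowing amounts to setting that variable to None; the PropertyBag class is hypothetical and is not the SDK's code.

```python
class PropertyBag:
    """Hypothetical stand-in showing the additional-property gate."""

    def __init__(self):
        # A dict means unnamed (additional) attributes are accepted.
        self._unnamed_attributes = {}

    def disallow_additional_properties(self) -> None:
        # Assumption: None signals that no additional attributes are taken.
        self._unnamed_attributes = None

    def set_additional_property(self, key: str, value) -> None:
        if self._unnamed_attributes is None:
            raise AttributeError(f"additional properties are disallowed: {key!r}")
        self._unnamed_attributes[key] = value


open_bag = PropertyBag()
open_bag.set_additional_property("note", "kept")

closed_bag = PropertyBag()
closed_bag.disallow_additional_properties()
try:
    closed_bag.set_additional_property("note", "rejected")
    rejected = False
except AttributeError:
    rejected = True
```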
- class mmif.serialize.model.MmifObjectEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]
Bases:
json.encoder.JSONEncoder
Encoder class to define behaviors of de-/serialization
- default(obj: mmif.serialize.model.MmifObject)[source]
Overrides default encoding behavior to prioritize
MmifObject.serialize()
.
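The idea of an encoder that prefers an object's own serialize() method can be sketched with the standard json module. The SerializeFirstEncoder and Point classes below are hypothetical, and the round trip through json.loads is an assumption made so that the nested object is embedded as JSON rather than as an escaped string; the SDK's encoder may be implemented differently.

```python
import json


class Point:
    """Hypothetical object that knows how to serialize itself."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def serialize(self) -> str:
        return json.dumps({"x": self.x, "y": self.y})


class SerializeFirstEncoder(json.JSONEncoder):
    def default(self, obj):
        # Prefer the object's own serialize() when it has one.
        if hasattr(obj, "serialize"):
            return json.loads(obj.serialize())
        # Otherwise fall back to the stock behavior (raises TypeError).
        return super().default(obj)


encoded = json.dumps({"origin": Point(0, 1)}, cls=SerializeFirstEncoder)
print(encoded)  # {"origin": {"x": 0, "y": 1}}
```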
mmif.serialize.mmif module
The mmif
module contains the classes used to represent a full MMIF
file as a live Python object.
See the specification docs and the JSON Schema file for more information.
- class mmif.serialize.mmif.Mmif(mmif_obj: Optional[Union[bytes, str, dict]] = None, *, validate: bool = True, frozen: bool = True)[source]
Bases:
mmif.serialize.model.MmifObject
MmifObject that represents a full MMIF file.
- Parameters
mmif_obj – the JSON data
validate – whether to validate the data against the MMIF JSON schema.
- add_document(document: mmif.serialize.annotation.Document, overwrite=False) None [source]
Appends a Document object to the documents list.
Fails if there is already a document with the same ID in the MMIF object.
- Parameters
document – the Document object to add
overwrite – if set to True, will overwrite an existing document with the same ID
- Returns
None
- add_view(view: mmif.serialize.view.View, overwrite=False) None [source]
Appends a View object to the views list.
Fails if there is already a view with the same ID in the MMIF object.
- Parameters
view – the View object to add
overwrite – if set to True, will overwrite an existing view with the same ID
- Returns
None
- freeze_documents() bool [source]
Deeply freezes the list of documents. Returns the result of the deep_freeze() call, signifying whether everything was fully frozen or not.
- freeze_views() bool [source]
Deeply freezes all of the existing views without freezing the list of views itself. Returns the conjunct of the returns of all of the deep_freeze() calls, signifying whether everything was fully frozen or not.
- get_alignments(at_type1: Union[str, mmif.vocabulary.base_types.TypesBase], at_type2: Union[str, mmif.vocabulary.base_types.TypesBase]) Dict[str, List[mmif.serialize.annotation.Annotation]] [source]
Finds views where alignments between two given annotation types occurred.
- Returns
a dict that is keyed by view IDs (str) and has lists of alignment Annotation objects as values.
- get_all_views_contain(at_types: Union[mmif.vocabulary.base_types.TypesBase, str, List[Union[str, mmif.vocabulary.base_types.TypesBase]]]) List[mmif.serialize.view.View] [source]
Returns the list of all views in the MMIF if given types are present in that view’s ‘contains’ metadata.
- Parameters
at_types – a list of types or just a single type to check for. When more than one type is given, all types must be found.
- Returns
the list of views that contain the type
- get_document_by_id(doc_id: str) mmif.serialize.annotation.Document [source]
Finds a Document object with the given ID.
- Parameters
doc_id – the ID to search for
- Returns
a reference to the corresponding document, if it exists
- Raises
Exception – if there is no corresponding document
- get_document_location(m_type: Union[mmif.vocabulary.document_types.DocumentTypes, str], path_only=False) Optional[str] [source]
Method to get the location of the first document of the given type.
- Parameters
m_type – the type to search for
- Returns
the value of the location field in the corresponding document
- get_documents_by_app(app_id: str) List[mmif.serialize.annotation.Document] [source]
Method to get all Document objects generated by the app with the given name.
- Parameters
app_id – the app name to search for
- Returns
a list of documents matching the requested app name, or an empty list if the app is not found
- get_documents_by_property(prop_key: str, prop_value: str) List[mmif.serialize.annotation.Document] [source]
Method to retrieve documents by an arbitrary key-value pair in the document properties objects.
- Parameters
prop_key – the metadata key to search for
prop_value – the metadata value to match
- Returns
a list of documents matching the requested metadata key-value pair
- get_documents_by_type(doc_type: Union[str, mmif.vocabulary.document_types.DocumentTypes]) List[mmif.serialize.annotation.Document] [source]
Method to get all documents where the type matches a particular document type, which should be one of the CLAMS document types.
- Parameters
doc_type – the type of documents to search for, must be one of the
Document
types defined in the CLAMS vocabulary.
- Returns
a list of documents matching the requested type, or an empty list if none found.
- get_documents_in_view(vid: Optional[str] = None) List[mmif.serialize.annotation.Document] [source]
Method to get all Document objects contained in the view with the given ID.
- Parameters
vid – the source view ID to search for
- Returns
a list of documents matching the requested source view ID, or an empty list if the view is not found
- get_documents_locations(m_type: Union[mmif.vocabulary.document_types.DocumentTypes, str], path_only=False) List[Optional[str]] [source]
This method returns the file paths of documents of given type. Only top-level documents have locations, so we only check them.
- Parameters
m_type – the type to search for
- Returns
a list of the values of the location fields in the corresponding documents
- get_view_by_id(req_view_id: str) mmif.serialize.view.View [source]
Finds a View object with the given ID.
- Parameters
req_view_id – the ID to search for
- Returns
a reference to the corresponding view, if it exists
- Raises
Exception – if there is no corresponding view
- get_view_contains(at_types: Union[mmif.vocabulary.base_types.TypesBase, str, List[Union[str, mmif.vocabulary.base_types.TypesBase]]]) Optional[mmif.serialize.view.View] [source]
Returns the last view appended that contains the given types in its ‘contains’ metadata.
- Parameters
at_types – a list of types or just a single type to check for. When more than one type is given, all types must be found.
- Returns
the view, or None if the type is not found
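The lookups performed by get_all_views_contain() and get_view_contains() can be sketched against plain MMIF-style dicts. This is a minimal illustration, assuming each view carries a metadata object whose contains dictionary is keyed by annotation type; the function names and the short type strings are placeholders, not SDK identifiers or real vocabulary URIs.

```python
from typing import List, Optional, Union


def all_views_containing(mmif: dict, at_types: Union[str, List[str]]) -> List[dict]:
    # Accept a single type or a list; every requested type must be present.
    wanted = [at_types] if isinstance(at_types, str) else list(at_types)
    return [
        view
        for view in mmif.get("views", [])
        if all(t in view.get("metadata", {}).get("contains", {}) for t in wanted)
    ]


def last_view_containing(mmif: dict, at_types: Union[str, List[str]]) -> Optional[dict]:
    # The most recently appended match wins; None when nothing matches.
    matches = all_views_containing(mmif, at_types)
    return matches[-1] if matches else None


mmif_like = {
    "views": [
        {"id": "v1", "metadata": {"contains": {"TimeFrame": {}}}},
        {"id": "v2", "metadata": {"contains": {"TimeFrame": {}, "Token": {}}}},
        {"id": "v3", "metadata": {"contains": {"Token": {}}}},
    ]
}
timeframe_ids = [v["id"] for v in all_views_containing(mmif_like, "TimeFrame")]
both = last_view_containing(mmif_like, ["TimeFrame", "Token"])
missing = last_view_containing(mmif_like, "Sentence")
```

With this sample data, timeframe_ids is ["v1", "v2"], both is the v2 view, and missing is None.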
- get_views_contain(at_types: Union[mmif.vocabulary.base_types.TypesBase, str, List[Union[str, mmif.vocabulary.base_types.TypesBase]]]) List[mmif.serialize.view.View] [source]
An alias to get_all_views_contain method.
- get_views_for_document(doc_id: str)[source]
Returns the list of all views that have annotations anchored on a particular document. Note that when the document is inside a view (generated during the pipeline’s running), doc_id must be prefixed with the view_id.
- new_view() mmif.serialize.view.View [source]
Creates an empty view with a new ID and appends it to the views list.
- Returns
a reference to the new View object
- static validate(json_str: Union[bytes, str, dict]) None [source]
Validates a MMIF JSON object against the MMIF Schema. Note that this method operates before processing by MmifObject._load_str, so it expects @ and not _ for the JSON-LD @-keys.
- Raises
jsonschema.exceptions.ValidationError – if the input fails validation
- Parameters
json_str – a MMIF JSON dict or string
- Returns
None
mmif.serialize.view module
The view
module contains the classes used to represent a MMIF view
as a live Python object.
In MMIF, views are created by apps in a pipeline that are annotating data that was previously present in the MMIF file.
- class mmif.serialize.view.Contain(*args, **kwargs)[source]
Bases:
mmif.serialize.model.FreezableMmifObject
Contain object that represents the metadata of a single annotation type in the
contains
metadata of a MMIF view.
- class mmif.serialize.view.View(view_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.model.FreezableMmifObject
View object that represents a single view in a MMIF file.
A view is identified by an ID, and contains certain metadata, a list of annotations, and potentially a JSON-LD
@context
IRI.
If
view_obj
is not provided, an empty View will be generated.
- Parameters
view_obj – the JSON data that defines the view
- add_annotation(annotation: mmif.serialize.annotation.Annotation, overwrite=False) mmif.serialize.annotation.Annotation [source]
Adds an annotation to the current view.
Fails if there is already an annotation with the same ID in the view, unless
overwrite
is set to True.
- Parameters
annotation – the
mmif.serialize.annotation.Annotation
object to add
overwrite – if set to True, will overwrite an existing annotation with the same ID
- Raises
KeyError – if
overwrite
is set to False and an annotation with the same ID exists in the view
- Returns
the same Annotation object passed in as
annotation
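The overwrite semantics of add_annotation() can be sketched with a dict keyed by annotation ID. This is an illustration only: the add_annotation function below operates on plain dicts, assumes the ID lives under properties.id as in MMIF JSON, and is not the SDK method itself.

```python
def add_annotation(annotations: dict, annotation: dict, overwrite: bool = False) -> dict:
    aid = annotation["properties"]["id"]
    # A duplicate ID is an error unless the caller explicitly opts in.
    if aid in annotations and not overwrite:
        raise KeyError(f"annotation {aid!r} already exists in the view")
    annotations[aid] = annotation
    return annotation  # the same object that was passed in


store = {}
first = {"@type": "Token", "properties": {"id": "t1", "text": "old"}}
second = {"@type": "Token", "properties": {"id": "t1", "text": "new"}}

add_annotation(store, first)
try:
    add_annotation(store, second)
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

returned = add_annotation(store, second, overwrite=True)
```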
- add_document(document: mmif.serialize.annotation.Document, overwrite=False) mmif.serialize.annotation.Annotation [source]
Appends a Document object to the annotations list.
Fails if there is already a document with the same ID in the annotations list.
- Parameters
document – the Document object to add
overwrite – if set to True, will overwrite an existing document with the same ID
- Returns
None
- get_annotations(at_type: Optional[Union[str, mmif.vocabulary.base_types.TypesBase]] = None, **properties) Generator[mmif.serialize.annotation.Annotation, None, None] [source]
Look for certain annotations in this view, specified by parameters
- Parameters
at_type – @type of the annotations to look for. When this is None, any @type will match.
properties – properties of the annotations to look for. When given more than one property, all properties must match. Note that annotation type metadata are specified in the contains view metadata, not in individual annotation objects.
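The filtering behavior of get_annotations() can be sketched as a generator over plain annotation dicts. This is a hedged illustration: the iter_annotations function and the sample data are hypothetical, and it assumes each annotation has an @type key and a properties object, as in MMIF JSON.

```python
from typing import Iterator, List, Optional


def iter_annotations(annotations: List[dict], at_type: Optional[str] = None,
                     **properties) -> Iterator[dict]:
    for annotation in annotations:
        # at_type=None matches any @type.
        if at_type is not None and annotation.get("@type") != at_type:
            continue
        props = annotation.get("properties", {})
        # Every requested property must be present with an equal value.
        if all(key in props and props[key] == value
               for key, value in properties.items()):
            yield annotation


sample = [
    {"@type": "TimeFrame", "properties": {"id": "tf1", "frameType": "speech"}},
    {"@type": "TimeFrame", "properties": {"id": "tf2", "frameType": "music"}},
    {"@type": "Token", "properties": {"id": "t1", "frameType": "speech"}},
]
speech_frames = [a["properties"]["id"]
                 for a in iter_annotations(sample, "TimeFrame", frameType="speech")]
any_speech = [a["properties"]["id"]
              for a in iter_annotations(sample, frameType="speech")]
```

Because the function is a generator, nothing is evaluated until the result is iterated, which matches the Generator return type in the signature above.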
- get_document_by_id(doc_id) mmif.serialize.annotation.Document [source]
- get_documents() List[mmif.serialize.annotation.Document] [source]
- new_annotation(at_type: Union[str, mmif.vocabulary.base_types.TypesBase], aid: Optional[str] = None, overwrite=False, **properties) mmif.serialize.annotation.Annotation [source]
Generates a new
mmif.serialize.annotation.Annotation
object and adds it to the current view.
Fails if there is already an annotation with the same ID in the view, unless
overwrite
is set to True.
- Parameters
at_type – the desired
@type
of the annotation.
aid – the desired ID of the annotation. When not given, the mmif SDK tries to automatically generate an ID based on the Annotation type and existing annotations in the view.
overwrite – if set to True, will overwrite an existing annotation with the same ID.
- Raises
KeyError – if
overwrite
is set to False and an annotation with the same ID exists in the view.
- Returns
the generated
mmif.serialize.annotation.Annotation
- new_contain(at_type: Union[str, mmif.vocabulary.base_types.TypesBase], **contains_metadata) Optional[mmif.serialize.view.Contain] [source]
Adds a new element to the
contains
metadata.
- Parameters
at_type – the
@type
of the annotation type being added
contains_metadata – any metadata associated with the annotation type
- Returns
the generated
Contain
object
- new_textdocument(text: str, lang: str = 'en', did: Optional[str] = None, overwrite=False, **properties) mmif.serialize.annotation.Document [source]
Generates a new
mmif.serialize.annotation.Document
object, specifically typed as TextDocument, and adds it to the current view.
Fails if there is already a text document with the same ID in the view, unless
overwrite
is set to True.
- Parameters
text – text content of the new document
lang – ISO 639-1 code of the language used in the new document
did – the desired ID of the document. When not given, the mmif SDK tries to automatically generate an ID based on the Annotation type and existing documents in the view.
overwrite – if set to True, will overwrite an existing document with the same ID
- Raises
KeyError – if
overwrite
is set to False and a document with the same ID exists in the view
- Returns
the generated
mmif.serialize.annotation.Document
- class mmif.serialize.view.ViewMetadata(viewmetadata_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.model.FreezableMmifObject
ViewMetadata object that represents the
metadata
object within a MMIF view.
- Parameters
viewmetadata_obj – the JSON data that defines the metadata
- new_contain(at_type: Union[str, mmif.vocabulary.base_types.TypesBase], **contains_metadata) Optional[mmif.serialize.view.Contain] [source]
Adds a new element to the
contains
dictionary.
- Parameters
at_type – the
@type
of the annotation type being added
contains_metadata – any metadata associated with the annotation type
- Returns
the generated
Contain
object
mmif.serialize.annotation module
The annotation
module contains the classes used to represent a
MMIF annotation as a live Python object.
In MMIF, annotations are created by apps in a pipeline as a part
of a view. For documentation on how views are represented, see
mmif.serialize.view
.
- class mmif.serialize.annotation.Annotation(anno_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.model.FreezableMmifObject
MmifObject that represents an annotation in a MMIF view.
- add_property(name: str, value: Union[str, int, float, None, List[Optional[Union[str, int, float]]], List[List[Optional[Union[str, int, float]]]]]) None [source]
Adds a property to the annotation’s properties.
- Parameters
name – the name of the property
value – the property’s desired value
- Returns
None
- class mmif.serialize.annotation.AnnotationProperties(mmif_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.model.FreezableMmifObject
AnnotationProperties object that represents the
properties
object within a MMIF annotation.
- Parameters
mmif_obj – the JSON data that defines the properties
- class mmif.serialize.annotation.Document(doc_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.annotation.Annotation
Document object that represents a single document in a MMIF file.
A document is identified by an ID, and contains certain attributes and potentially contains the contents of the document itself, metadata about how the document was created, and/or a list of subdocuments grouped together logically.
If
doc_obj
is not provided, an empty Document will be generated.
- Parameters
doc_obj – the JSON data that defines the document
- add_property(name: str, value: Union[str, int, float, None, List[Optional[Union[str, int, float]]]]) None [source]
Adds a property to the annotation’s properties.
- Parameters
name – the name of the property
value – the property’s desired value
- Returns
None
- property location: Optional[str][source]
location
property must be a legitimate URI. That is, should the document be a local file then the file:// scheme must be used. Returns None when no location is set.
- location_address() Optional[str] [source]
Retrieves the full address from the document location URI. Returns None when no location is set.
- location_path() Optional[str] [source]
Retrieves only path name of the document location (hostname is ignored). Useful to get a path of a local file. Returns None when no location is set.
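How a path can be pulled out of a location URI while ignoring the hostname can be sketched with the standard library. This is a minimal illustration using urllib.parse; the path_from_location function and the example URIs are hypothetical, and the SDK's own parsing may handle more cases.

```python
from typing import Optional
from urllib.parse import urlparse


def path_from_location(location: Optional[str]) -> Optional[str]:
    # No location set: mirror the documented None return.
    if not location:
        return None
    # Only the path component is kept; scheme and hostname are dropped.
    return urlparse(location).path


local_path = path_from_location("file:///var/archive/video.mp4")
remote_path = path_from_location("file://example.org/var/archive/video.mp4")
unset_path = path_from_location(None)
print(local_path)  # /var/archive/video.mp4
```

Both URIs yield the same path, since the hostname in the second one is ignored.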
- class mmif.serialize.annotation.DocumentProperties(mmif_obj: Optional[Union[bytes, str, dict]] = None)[source]
Bases:
mmif.serialize.annotation.AnnotationProperties
DocumentProperties object that represents the
properties
object within a MMIF document.
- Parameters
mmif_obj – the JSON data that defines the properties
- property location: Optional[str][source]
location
property must be a legitimate URI. That is, should the document be a local file then the file:// scheme must be used. Returns None when no location is set.
- location_address() Optional[str] [source]
Retrieves the full address from the document location URI. Returns None when no location is set.
- location_path() Optional[str] [source]
Retrieves only path name of the document location (hostname is ignored). Useful to get a path of a local file. Returns None when no location is set.