Advanced features
Class hierarchies
One of the main features of object oriented design is inheritance. If your objects can be categorised in classes and subclasses, then Python lets you code them like that, and YAtiML can read and write them.
For example, let’s add a description of the drawing to our Submission, in the form of a list of the shapes that it consists of. We’ll content ourselves with a somewhat crude representation consisting of circles and squares.
docs/examples/class_hierarchy.py
from typing import List, Union
import yatiml
# Create document classes
class Shape:
def __init__(self, center: List[float]) -> None:
self.center = center
class Circle(Shape):
def __init__(self, center: List[float], radius: float) -> None:
super().__init__(center)
self.radius = radius
class Square(Shape):
def __init__(self, center: List[float], width: float, height: float) -> None:
super().__init__(center)
self.width = width
self.height = height
class Submission(Shape):
def __init__(
self,
name: str,
age: Union[int, str],
drawing: List[Shape]
) -> None:
self.name = name
self.age = age
self.drawing = drawing
# Create loader
load = yatiml.load_function(Submission, Shape, Circle, Square)
# Load YAML
yaml_text = ('name: Janice\n'
'age: 6\n'
'drawing:\n'
' - center: [1.0, 1.0]\n'
' radius: 2.0\n'
' - center: [5.0, 5.0]\n'
' width: 1.0\n'
' height: 1.0\n')
doc = load(yaml_text)
print(doc.name)
print(doc.age)
print(doc.drawing)
Here, we have defined a class Shape
, and have added a list of Shapes as an
attribute to Submission
. Each shape has a location, its center, which is a
list of coordinates. Classes Circle
and Square
inherit from
Shape
, and have some additional attributes. All the classe are passed when
creating the load function, and that’s important, because only those classes
will be considered by YAtiML.
YAtiML will automatically recognize which subclass matches the object actually
specified in the list from the attributes that it has. If more than one subclass
matches, it will give an error message stating that the file being read is
ambiguous. If both a parent class and its child class match, YAtiML will prefer
the child class, and not consider it ambiguous. Abstract base classes (ones
inheriting from abc.ABC
, and/or with functions marked
@abc.abstractmethod
) never match, as they cannot be instantiated.
Note that the child classes include the parent’s class’s center
attribute in
their __init__
, and pass it on using super()
. This is required, as
otherwise YAtiML won’t accept the center
attribute for a subclass. Another
design option here would be to automatically merge the named attributes along
the inheritance path, and allow using a **kwargs
on __init__
to forward
additional attributes to the parent classes. The more explicit option is more
typing, but it also makes it easier to see what’s going on when reading the
code, and that’s very important for code maintainability. So that’s what YAtiML
does.
Enumerations
Enumerations, or enums, are types that are defined by listing a set of possible
values. In Python 3, they are made by creating a class that inherits from
enum.Enum
. YAML does not have enumerations, but strings work fine provided
that you have something like YAtiML to check that the string that the user put
in actually matches one of the values of the enum type, and return the correct
value from the enum class. Here’s how to add some colour to the drawings.
docs/examples/enums.py
import enum
from typing import List, Union
import yatiml
# Create document classes
class Color(enum.Enum):
red = 0xff0000
orange = 0xff8000
yellow = 0xffff00
green = 0x008000
blue = 0x00aeef
class Shape:
def __init__(self, center: List[float], color: Color) -> None:
self.center = center
self.color = color
class Circle(Shape):
def __init__(self, center: List[float], color: Color, radius: float) -> None:
super().__init__(center, color)
self.radius = radius
class Square(Shape):
def __init__(self, center: List[float], color: Color, width: float, height: float) -> None:
super().__init__(center, color)
self.width = width
self.height = height
class Submission(Shape):
def __init__(
self,
name: str,
age: Union[int, str],
drawing: List[Shape]
) -> None:
self.name = name
self.age = age
self.drawing = drawing
# Create loader
load = yatiml.load_function(Submission, Color, Shape, Circle, Square)
# Load YAML
yaml_text = ('name: Janice\n'
'age: 6\n'
'drawing:\n'
' - center: [1.0, 1.0]\n'
' color: red\n'
' radius: 2.0\n'
' - center: [5.0, 5.0]\n'
' color: blue\n'
' width: 1.0\n'
' height: 1.0\n')
doc = load(yaml_text)
print(doc.name)
print(doc.age)
print(doc.drawing[0].color)
Note that the labels that YAtiML looks for are the names of the enum members, not their values. In many existing standards, enums map to numerical values, or if you’re making something new, it’s often convenient to use the values for something else. The names are usually what you want to write though, so they’re probably easier for the users to write in the YAML file too. If you want something else though, you can season your enumerations. See below for a general explanation of seasoning, or look at Seasoning enumerations in the Recipes section for some examples.
User-Defined Strings
When defining file formats, you often find yourself with a need to define a
string with constraints. For example, Dutch postal codes consist of four digits,
followed by two uppercase letters. If you use a generic string type for a postal
code, you may end up accepting invalid values. A better solution is to define a
custom string type with a built-in constraint. In Python 3, you can do this
by deriving a class either from str
or from collections.UserString
. The
latter is easier, so that’s what we’ll use in this example. Let’s add the town
that our participant lives in to our YAML format, but insist that it be
capitalised.
docs/examples/user_defined_string.py
from collections import UserString
from typing import Any, Union
import yatiml
# Create document class
class TitleCaseString(UserString):
def __init__(self, seq: Any) -> None:
super().__init__(seq)
if not self.data.istitle():
raise ValueError('Invalid TitleCaseString \'{}\': Each word must'
' start with a capital letter'.format(self.data))
class Submission:
def __init__(self, name: str, age: Union[int, str],
town: TitleCaseString) -> None:
self.name = name
self.age = age
self.town = town
# Create loader
load_submission = yatiml.load_function(Submission, TitleCaseString)
# Load YAML
yaml_text = ('name: Janice\n'
'age: 6\n'
'town: Piedmont')
doc = load_submission(yaml_text)
print(type(doc))
print(doc.name)
print(doc.town)
Python’s UserString
provides an attribute data
containing the actual
string, so all we need to do is test that for validity in the constructor. If
you spell the town using only lowercase letters, you’ll get:
ValueError: Invalid TitleCaseString 'piedmont': Each word must start with a
capital letter
Note that you can’t make a TitleCaseString object containing ‘piedmont’ from Python either, so the object model and the YAML format are consistent.
Python’s UserString class tries very hard to look like a string by overloading
various special methods. Most of the time that’s fine, but sometimes you have a
class that’s really not much like a string on the Python side, but still should
be written to YAML as a string. In this case, you can add yatiml.String
as a base class. YAtiML will then expect a string on the YAML side, call
__init__
with that string as the sole argument, and when dumping use
str(obj)
to obtain the string representation to write to the YAML file
(the result is then passed to _yatiml_sweeten()
if you have it, so you can
still modify it if desired). Like classes derived from str
and
UserString
, such classes can be used as keys for dictionaries, but be sure
to implement __hash__()
and __eq__()
to make that work on the Python
side.
Seasoning your YAML
For users who are manually typing YAML files, it is usually nice to have some flexibility. For programmers processing the data read from such a file, it is very convenient if everything is rigidly defined, so that they do not have to take into account all sorts of corner cases. YAtiML helps you bridge this gap with its support for seasoning.
In programming languages, small features that make the language easier to type, but which do not add any real functionality are known as syntactic sugar. With YAtiML, you can add a bit of extra processing halfway through the dumping process to format your object in a nicer way. YAtiML calls this sweetening. When loading, you can convert back to the single representation that matches your class definition by savourising, savoury being the opposite of sweet. Together, sweetening and savourising are referred to as seasoning.
Let’s do another example. Having ages either as strings or as ints is not very convenient if you want to check which age category to file a submission under. So let’s add a savourising function to convert strings to int on loading:
docs/examples/savorizing.py
from typing import Optional, Union
import yatiml
# Create document class
class Submission:
def __init__(
self,
name: str,
age: Union[int, str],
tool: Optional[str]=None
) -> None:
self.name = name
self.age = age
self.tool = tool
@classmethod
def _yatiml_savorize(cls, node: yatiml.Node) -> None:
str_to_int = {
'five': 5,
'six': 6,
'seven': 7,
}
if node.has_attribute_type('age', str):
str_val = node.get_attribute('age').get_value()
if str_val in str_to_int:
node.set_attribute('age', str_to_int[str_val])
else:
raise yatiml.SeasoningError('Invalid age string')
# Create loader
load = yatiml.load_function(Submission)
# Load YAML
yaml_text = ('name: Janice\n'
'age: six\n')
doc = load(yaml_text)
print(doc.name)
print(doc.age)
print(doc.tool)
We have added a new _yatiml_savorize()
class method to our Submission class.
This method will be called by YAtiML after the YAML text has been parsed, but
before our Submission object has been generated. This method is passed the
node representing the mapping that will become the object. The node is of
type yatiml.Node
, which in turn is a wrapper for an internal
PyYAML object. Note that this method needs to be a classmethod, since
there is no object yet to call it with.
The yatiml.Node
class has a number of methods that you can use to
manipulate the node. In this case, we first check if there is an age
attribute at all, and if so, whether it has a string as its value. This is
needed, because we are operating on the freshly-parsed YAML input, before any
type checks have taken place. In other words, that node may contain anything.
Next, we get the attribute’s value, and then try to convert it to an int and set
it as the new value. If a string value was used that we do not know how to
convert, we raise a yatiml.SeasoningError
, which is the appropriate way
to signal an error during execution of _yatiml_savorize()
.
(At this point I should apologise for the language mix-up; the code uses North-American spelling because it’s rare to use British spelling in code and so it would confuse everyone, while the documentation uses British spelling because it’s what its author is used to.)
When saving a Submission, we may want to apply the opposite transformation, and
convert some ints back to strings. That can be done with a _yatiml_sweeten
classmethod:
docs/examples/sweetening.py
from typing import Optional, Union
import yatiml
# Create document class
class Submission:
def __init__(
self,
name: str,
age: Union[int, str],
tool: Optional[str]=None
) -> None:
self.name = name
self.age = age
self.tool = tool
@classmethod
def _yatiml_sweeten(cls, node: yatiml.Node) -> None:
int_to_str = {
5: 'five',
6: 'six',
7: 'seven'
}
int_val = int(node.get_attribute('age').get_value())
if int_val in int_to_str:
node.set_attribute('age', int_to_str[int_val])
# Create dumper
dumps = yatiml.dumps_function(Submission)
# Dump YAML
doc = Submission('Youssou', 7, 'pencils')
yaml_text = dumps(doc)
print(yaml_text)
The _yatiml_sweeten()
method has the same signature as
_yatiml_savorize()
but is called when dumping rather than when loading. It
gives you access to the YAML node that has been produced from a Submission
object before it is written out to the YAML output. Here, we use the same
functions as before to convert some of the int values back to strings. Since we
converted all the strings to ints on loading above, we can assume that the value
is indeed an int, and we do not have to check.
Indeed, if we run this example, we get:
name: Youssou
age: seven
tool: pencils
However, there is still an issue. We have now used the seasoning functionality
of YAtiML to give the user the freedom to write ages either as words or as
numbers, while always giving the programmer ints to work with. However, the
programmer could still accidentally put a string into the age field when
constructing a Submission directly in the code, as the type annotation allows
it. This would then crash the _yatiml_sweeten()
method when trying to dump
the object.
The solution, of course, is to change the type on the age
attribute of
__init__
to int
. Unfortunately, this breaks loading. If you try to run
the savourise example above with age
as type int
, then you will get
yatiml.exceptions.RecognitionError: in "<unicode string>", line 1, column 1:
name: Janice
^ (line: 1)
Type mismatch, expected a Submission
The reason we get the error above is that by default, YAtiML recognises objects
of custom classes by their attributes, checking both names and types.
With the type of the age
attribute now defined as int
, a mapping
containing an age
with a string value is now no longer recognised as a
Submission object. A potential solution would be to apply seasoning before
trying to recognise, but to know how to savorise a mapping we need to know which
type it is or should be, and for that we need to recognise it. The way to fix
this is to override the default recognition function with our own, and make that
recognise both int
and str
values for age
.
Customising recognition
Customising the recognition function is done by adding a
_yatiml_recognize()
method to your class, like this:
docs/examples/custom_recognition.py
from typing import Optional, Union
import yatiml
# Create document class
class Submission:
def __init__(
self,
name: str,
age: int,
tool: Optional[str]=None
) -> None:
self.name = name
self.age = age
self.tool = tool
@classmethod
def _yatiml_recognize(cls, node: yatiml.UnknownNode) -> None:
node.require_attribute('name', str)
node.require_attribute('age', Union[int, str])
@classmethod
def _yatiml_savorize(cls, node: yatiml.Node) -> None:
str_to_int = {
'five': 5,
'six': 6,
'seven': 7,
}
if node.has_attribute_type('age', str):
str_val = node.get_attribute('age').get_value()
if str_val in str_to_int:
node.set_attribute('age', str_to_int[str_val])
else:
raise yatiml.SeasoningError('Invalid age string')
# Create loader
load = yatiml.load_function(Submission)
# Load YAML
yaml_text = ('name: Janice\n'
'age: six\n')
doc = load(yaml_text)
print(doc.name)
print(doc.age)
print(doc.tool)
This is again a classmethod, with a single argument of type
yatiml.UnknownNode
representing the node. Like
yatiml.Node
, yatiml.UnknownNode
wraps a YAML node, but this
class has helper functions intended for writing recognition functions. Here,
we use require_attribute()
to list the required attributes and their
types. Since tool
is optional, it is not required, and not listed. The
age
attribute is specified with the Union type we used before. Now, any
mapping that is in a place where we expect a Submission will be recognised as a
Submission, as long as it has a name
attribute with a string value, and an
age
attribute that is either a string or an integer. If age
is a
string, the _yatiml_savorize()
method will convert it to an int, after
which a Submission object can be constructed without violating the type
constraint in the __init__()
method.
In fact, the _yatiml_recognize()
method here could be even simpler. In
every place in our document where a Submission can occur (namely the root),
only a Submission can occur. The Submission class does not have ancestors,
and it is never part of a Union. So there is never any doubt as to how to treat
the mapping, and in fact, the following will also work:
@classmethod
def _yatiml_recognize(cls, node: yatiml.UnknownNode) -> None:
pass
Now, if you try to read a document with, say, a float argument to age
, it
will be recognised as a Submission, the _yatiml_savorize()
method will do
nothing with it, and you’ll get an error message at the type check just before a
Submission is constructed.
This makes it clear that recognition is not a type check. Instead, its job is to distinguish between different possible types in places where the class hierarchy leaves some leeway to put in objects of different classes. If there is no such leeway, the recognition stage does not need to do anything. If there is some leeway, it just needs to do the minimum to exclude other possibilities.
However, since data models tend to evolve, it is usually a good idea to do a full check anyway, so that if this class ends up being used in a Union, or if you or someone else adds derived classes later, things will still work correctly and there won’t be any unnecessary ambiguity errors for the users.
Speaking of derived classes, note that while _yatiml_recognize()
is
inherited by derived classes like any other Python method, YAtiML will only use
it for the class on which it is defined; derived classes will use automatic
recognition unless they have their own _yatiml_recognize()
. The same goes
for _yatiml_savorize()
and `` _yatiml_sweeten()``.
Extra attributes
By default, YAtiML will match a mapping in a YAML file exactly: each required
attribute must be there, and any extraneous attributes give an error. However,
you may want to give your users the option of adding additional attributes. The
logical way for YAtiML to support this would be through having a **kwargs
attribute to the __init__
method, but unfortunately this would lose the
ordering information, since **kwargs
is a plain unordered dict (although
this is in the process of changing in newer versions of Python). Also, there
wouldn’t be an obvious way of saving such extra attributes again.
So, instead, extra attributes are sent to a _yatiml_extra
parameter of type
OrderedDict
on __init__
, if there is one. You put this value into a
_yatiml_extra
attribute, whose contents YAtiML will then dump appended to
the normal attributes. If you want to be able to add extra attributes when
constructing an object using keyword arguments, then you can add a **kwargs
parameter as well, and put any key-value pairs in it into self._yatiml_extra
in your favourite order yourself.
Here is an example:
docs/examples/extra_attributes.py
from collections import OrderedDict
from typing import Union
import yatiml
# Create document class
class Submission:
def __init__(
self,
name: str,
age: int,
_yatiml_extra: OrderedDict
) -> None:
self.name = name
self.age = age
self._yatiml_extra = _yatiml_extra
# Create loader
load = yatiml.load_function(Submission)
# Load YAML
yaml_text = ('name: Janice\n'
'age: 6\n'
'tool: crayons\n')
doc = load(yaml_text)
print(doc.name)
print(doc.age)
print(doc._yatiml_extra['tool'])
In this example, we use the tool
attribute again, but with this code, we
could add any attribute, and it would show up in _yatiml_extra
with no
errors generated.
Note that any explicit YAML tags on any mapping values of the extra attributes or anywhere beneath them in the YAML tree will be stripped, so that this tree will consist of plain lists and dicts. This is to avoid unexpected user-controlled object construction, for safety reasons. These tags are currently not added back on saving either, so it’s good if the extra data does not rely on them, better if it does not have any.
Hiding attributes
By default, YAtiML assumes that your classes have a public attribute
corresponding to each parameter of their __init__
method. If this
arrangement does not work for you, then you can override it by creating a
_yatiml_attributes()
method. This is not a classmethod, but an ordinary
method, because it is used for saving a particular instance of your class, to
which it needs access. If your custom class has a _yatiml_attributes()
method defined, YAtiML will call that method instead of looking for public
attributes. It should return an OrderedDict
with names and values of the
attributes.
So far, we have been printing the values of public attributes to see the results
of our work. It would be better encapsulation to use private attributes instead,
with a __str__
method to help printing. With _yatiml_attributes()
, that
can be done:
docs/examples/private_attributes.py
from collections import OrderedDict
from typing import Union
import yatiml
# Create document class
class Submission:
def __init__(self, name: str, age: Union[int, str]) -> None:
self.__name = name
self.__age = age
def __str__(self) -> str:
return '{}\n{}'.format(self.__name, self.__age)
def _yatiml_attributes(self) -> OrderedDict:
return OrderedDict([
('name', self.__name),
('age', self.__age)])
# Create loader
load = yatiml.load_function(Submission)
# Load YAML
yaml_text = ('name: Janice\n'
'age: 6\n')
doc = load(yaml_text)
print(doc)
Further reading
You’ve reached the end of this tutorial, which means that you have seen all the major features that YAtiML has. If you haven’t already started, now is the time to start making your awn YAML-based file format. You may want to have a look at the API Reference, and if you get stuck, there is the Problem solving section to help you out.