This article originally appeared as the chapter "Data Modeling with Python" in Programming Google App Engine, 2nd edition, published in 2012.
Newer versions of the book replaced this chapter with a chapter on
ndb, a newer Python datastore library intended as a replacement for
ext.db. The original text is offered here for developers still using the older library.
Data modeling is the process of translating the data requirements of your application to the features of your data storage technology. While the application deals in players, towns, weapons, potions, and gold, the datastore knows only entities, entity groups, keys, properties, and indexes. The data model describes how the data is stored and how it is manipulated. Entities represent players and game objects; properties describe the status of objects and the relationships between them. When an object changes location, the data is updated in a transaction, so the object cannot be in two places at once. When a player wants to know about the weapons in her inventory, the application performs a query for all weapon objects whose location is the player, possibly requiring an index.
In the last few chapters, we’ve been using the Python class
db.Expando to create and manipulate entities and their properties. Used this way, the class illustrates the flexible nature of the datastore. The datastore itself does not impose or enforce a structure on entities or their properties, giving the application control over how individual entities represent data objects. This flexibility is also an essential feature for scalability: changing the structure of millions of records is a large task, and the proper strategy for doing so is specific to the task and the application.
But structure is needed. Every player has a number of health points, and a
Player entity without a
health property, or with a
health property whose value is not an integer, is likely to confuse the battle system. The data ought to conform to a structure, or schema, to meet the expectations of the code. Because the datastore does not enforce this schema itself—the datastore is schemaless—it is up to the application to ensure that entities are created and updated properly.
App Engine includes a data modeling library for defining and enforcing data schemas in Python. This library resides in the
google.appengine.ext.db package. It includes several related classes for representing data objects, including db.Model, db.Expando, and db.PolyModel. To give structure to entities of a given kind, you create a subclass of one of these classes. The definition of the class specifies the properties for those objects, their allowed value types, and other requirements.
In this chapter, we’ll introduce the Python data modeling library and discuss how to use it to enforce a schema for the otherwise schemaless datastore. We’ll also discuss how the library works and how to extend it.
Models and Properties
The db.Model superclass lets you specify a structure for every entity of a kind. This structure can include the names of the properties, the types of the values allowed for those properties, whether a property is required or optional, and a default value. Here is a definition of a
Book class similar to the one we created in "Datastore Entities":
from google.appengine.ext import db
import datetime

class Book(db.Model):
    title = db.StringProperty(required=True)
    author = db.StringProperty(required=True)
    copyright_year = db.IntegerProperty()
    author_birthdate = db.DateProperty()

obj = Book(title='The Grapes of Wrath', author='John Steinbeck')
obj.copyright_year = 1939
obj.author_birthdate = datetime.date(1902, 2, 27)
obj.put()
The Book class inherits from
db.Model. In the class definition, we declare that all
Book entities have four properties, and we declare their value types: title and author are strings, copyright_year is an integer, and author_birthdate is a date. If someone tries to assign a value of the wrong type to one of these properties, the assignment raises a db.BadValueError.
We also declare that title and author are required properties. If someone tries to create a Book without these properties set as arguments to the Book constructor, the attempt raises a db.BadValueError. copyright_year and
author_birthdate are optional, so we can leave them unset on construction, and assign values to the properties later. If these properties are not set by the time the object is saved, the resulting entity will not have these properties—and that’s allowed by this model.
A property declaration ensures that the entity created from the object has a value for the property, possibly
None. As we’ll see in the next section, you can further specify what values are considered valid using arguments to the property declaration.
A model class that inherits from
db.Model ignores all attributes that are not declared as properties when it comes time to save the object to the datastore. In the resulting entity, all declared properties are set, and no others.
This is the sole difference between db.Model and db.Expando: a db.Model class ignores undeclared properties. A
db.Expando class saves all attributes of the object as properties of the corresponding entity. That is, a model using a
db.Expando class “expands” to accommodate assignments to undeclared properties.
You can use property declarations with
db.Expando just as with
db.Model. The result is a data object that validates the values of the declared properties, and accepts any values for additional undeclared properties.
The official documentation refers to properties with declarations as static properties and properties on a
db.Expando without declarations as dynamic properties. These terms have a nice correspondence with the notions of static and dynamic typing in programming languages. Property declarations implement a sort of runtime validated static typing for model classes, on top of Python’s own dynamic typing.
As we’ll see, property declarations are even more powerful than static typing, because they can validate more than just the type of the value.
With both db.Model and db.Expando, object attributes whose names begin with an underscore (
_) are always ignored. You can use these private attributes to attach transient data or functions to model objects. (It’s possible to create an entity with a property whose name starts with an underscore; this convention only applies to object attributes in the modeling API.)
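As a plain-Python illustration (not the actual db.Model implementation), the convention can be sketched as a filter over an object's instance attributes; properties_to_save and the Player stand-in class here are hypothetical:

```python
# Plain-Python sketch of the underscore convention: only attributes
# whose names don't start with '_' would be saved as properties.
def properties_to_save(obj):
    return dict((name, value) for name, value in vars(obj).items()
                if not name.startswith('_'))

class Player(object):
    pass

p = Player()
p.name = 'Alice'
p._damage_cache = 42  # transient; a model object would not save this

print(properties_to_save(p))  # {'name': 'Alice'}
```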
Because model objects also have attributes that are methods and other features, you cannot use certain names for properties in the Python model API. Some of the more pernicious reserved names are key, kind, and parent. The official documentation has a complete list of reserved names. In the next section, we'll see a way to use these reserved names for datastore properties even though they aren't allowed as attribute names in the API.
Beyond the model definition, db.Model and db.Expando have the same interface for saving, fetching, and deleting entities, and for performing queries and transactions. In fact, db.Expando is a subclass of db.Model.
Property Declarations
You declare a property for a model by assigning a property declaration object to an attribute of the model class. The name of the attribute is the name of the datastore property. The value is an object that describes the terms of the declaration. As discussed earlier, the
db.StringProperty object assigned to the
title class attribute says that the entity that an instance of the class represents can only have a string value for its
title property. The
required=True argument to the
db.StringProperty constructor says that the object is not valid unless it has a value for the title property.
This can look a little confusing if you’re expecting the class attribute to shine through as an attribute of an instance of the class, as it normally does in Python. Instead, the
db.Model class hooks into the attribute assignment mechanism so it can use the property declaration to validate a value assigned to an attribute of the object. In Python terms, the model uses property descriptors to enhance the behavior of attribute assignment.
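To make the mechanism concrete, here is a minimal, self-contained sketch of a validating descriptor in plain Python (3.6+ for __set_name__). It is not the ext.db implementation; TypedProperty and its behavior are illustrative stand-ins, but db.StringProperty and friends work on the same principle:

```python
# A minimal validating descriptor: assignment to the attribute is
# intercepted by __set__, which checks the value before storing it.
class TypedProperty(object):
    def __init__(self, value_type):
        self.value_type = value_type
        self.name = None

    def __set_name__(self, owner, name):
        self.name = name  # learn the attribute name from the class body

    def __get__(self, obj, owner):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # validate on assignment, like a property declaration does
        if not isinstance(value, self.value_type):
            raise TypeError('%s must be a %s'
                            % (self.name, self.value_type.__name__))
        obj.__dict__[self.name] = value

class Book(object):
    title = TypedProperty(str)

b = Book()
b.title = 'The Grapes of Wrath'  # accepted
try:
    b.title = 99                 # rejected by the descriptor
except TypeError:
    rejected = True
```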
Property declarations act as intermediaries between the application and the datastore. They can ensure that only values that meet certain criteria are assigned to properties. They can assign default values when constructing an object. They can even convert values between a data type used by the application and one of the datastore’s native value types, or otherwise customize how values are stored.
The db.StringProperty declaration has a feature that always trips me up, so I'm mentioning it here. By default, a string property value enforced by this declaration cannot contain newline characters. If you want to allow values with newline characters, specify the
multiline=True argument to the declaration:
prop = db.StringProperty(multiline=True)
This feature corresponds with a similar feature in the Django web application framework, which is used to help ensure that text fields in forms don't accidentally contain newline characters. This is not a restriction of the App Engine datastore; it is merely the default behavior of db.StringProperty.
Property Value Types
db.StringProperty is an example of a property declaration class. There are several property declaration classes included with the Python SDK, one for each native datastore type. Each one ensures that the property can only be assigned a value of the corresponding type:
class Book(db.Model):
    title = db.StringProperty()

b = Book()
b.title = 99                      # db.BadValueError, title must be a string
b.title = 'The Grapes of Wrath'  # OK
The following table lists the datastore native value types and their corresponding property declaration classes.
[Table: Datastore property value types and the corresponding property declaration classes]
| Data type | Python type | Property class |
| Unicode text string (up to 500 bytes, indexed) | unicode or str | db.StringProperty |
| Long Unicode text string (not indexed) | db.Text | db.TextProperty |
| Short byte string (up to 500 bytes, indexed) | db.ByteString | db.ByteStringProperty |
| Long byte string (not indexed) | db.Blob | db.BlobProperty |
| Boolean | bool | db.BooleanProperty |
| Integer (64-bit) | int or long | db.IntegerProperty |
| Float (double precision) | float | db.FloatProperty |
| Date and time | datetime.datetime | db.DateTimeProperty |
| Date | datetime.date | db.DateProperty |
| Time | datetime.time | db.TimeProperty |
| Entity key | db.Key or a model instance | db.ReferenceProperty |
| A Google account | users.User | db.UserProperty |
| A Blobstore key | blobstore.BlobKey | blobstore.BlobReferenceProperty |
You can customize the behavior of a property declaration by passing arguments to the declaration's constructor. We've already seen one example: the multiline argument to db.StringProperty.
All property declaration classes support the
required argument. If
True, the property is required and must not be
None. You must provide an initial value for each required property to the constructor when creating a new object. (You can provide an initial value for any property this way.)
class Book(db.Model):
    title = db.StringProperty(required=True)

b = Book()                             # db.BadValueError, title is required
b = Book(title='The Grapes of Wrath')  # OK
The datastore makes a distinction between a property that is not set and a property that is set to the null value (
None). Property declarations do not make this distinction, because all declared properties must be set (possibly to
None). Unless you say otherwise, the default value for declared properties is
None, so the
required validator treats the
None value as an unspecified property.
You can change the default value with the
default argument. When you create an object without a value for a property that has a default value, the constructor assigns the default value to the property.
A property that is required and has a default value uses the default if constructed without an explicit value. The value can never be None.
class Book(db.Model):
    rating = db.IntegerProperty(default=1)

b = Book()          # b.rating == 1
b = Book(rating=5)  # b.rating == 5
By default, the name of the class attribute is used as the name of the datastore property. If you wish to use a different name for the datastore property than is used for the attribute, specify a
name argument. This allows you to use names already taken by the API for class or instance attributes as datastore properties.
class Song(db.Model):
    song_key = db.StringProperty(name='key')

s = Song()
s.song_key = 'C# min'

# The song_key attribute is stored as the
# datastore property named 'key'.
s.put()
You can declare that a property should contain only one of a fixed set of values by providing a list of possible values as the
choices argument. If
None is not one of the choices, this acts as a more restrictive form of
required: the property must be set to one of the valid choices using a keyword argument to the constructor.
_KEYS = ['C', 'C min', 'C 7',
         'C#', 'C# min', 'C# 7',
         # ...
        ]

class Song(db.Model):
    song_key = db.StringProperty(choices=_KEYS)

s = Song(song_key='H min')   # db.BadValueError
s = Song()                   # db.BadValueError, None is not an option
s = Song(song_key='C# min')  # OK
All of these features validate the value assigned to a property, and raise a
db.BadValueError if the value does not meet the appropriate conditions. For even greater control over value validation, you can define your own validation function and assign it to a property declaration as the
validator argument. The function should take the value as an argument, and raise a
db.BadValueError (or an exception of your choosing) if the value should not be allowed.
def is_recent_year(val):
    if val < 1923:
        raise db.BadValueError

class Book(db.Model):
    copyright_year = db.IntegerProperty(validator=is_recent_year)

b = Book(copyright_year=1922)  # db.BadValueError
b = Book(copyright_year=1924)  # OK
In "Datastore Queries", we mentioned that you can set properties of an entity in such a way that they are available on the entity, but are considered unset for the purposes of indexes. In the Python API, you establish a property as nonindexed using a property declaration. If the property declaration is given an
indexed argument of
False, entities created with that model class will set that property as nonindexed.
class Book(db.Model):
    first_sentence = db.StringProperty(indexed=False)

b = Book()
b.first_sentence = "On the Internet, popularity is swift and fleeting."
b.put()

# Count the number of Book entities with
# an indexed first_sentence property...
c = Book.all().order('first_sentence').count(1000)

# c = 0
Several property declaration classes include features for setting values automatically.
The db.DateTimeProperty, db.DateProperty, and db.TimeProperty classes can populate the value automatically with the current date and time. To enable this behavior, you provide the auto_now or auto_now_add arguments to the property declaration.
If you set
auto_now=True, the declaration class overwrites the property value with the current date and time when you save the object. This is useful when you want to keep track of the last time an object was saved.
class Book(db.Model):
    last_updated = db.DateTimeProperty(auto_now=True)

b = Book()
b.put()  # last_updated is set to the current time
# ...
b.put()  # last_updated is set to the current time again
If you set
auto_now_add=True, the property is set to the current time only when the object is saved for the first time. Subsequent saves do not overwrite the value.
class Book(db.Model):
    create_time = db.DateTimeProperty(auto_now_add=True)

b = Book()
b.put()  # create_time is set to the current time
# ...
b.put()  # create_time stays the same
The db.UserProperty declaration class also includes an automatic value feature. If you provide the argument
auto_current_user=True, the value is set to the user accessing the current request handler if the user is signed in. If you provide
auto_current_user_add=True, the value is only set to the current user when the entity is saved for the first time, and left untouched thereafter. If the current user is not signed in, the value is set to None.
class BookReview(db.Model):
    created_by_user = db.UserProperty(auto_current_user_add=True)
    last_edited_by_user = db.UserProperty(auto_current_user=True)

br = BookReview()
br.put()  # created_by_user and last_edited_by_user set
# ...
br.put()  # last_edited_by_user set again
At first glance, it might seem reasonable to set a default for a
db.UserProperty this way:
from google.appengine.api import users

class BookReview(db.Model):
    created_by_user = db.UserProperty(
        default=users.get_current_user())  # WRONG
This would set the default value to be the user who is signed in when the class is imported. Subsequent requests handled by the instance of the application will use a previous user instead of the current user as the default.
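The underlying Python behavior is easy to demonstrate without App Engine: an expression in a class body is evaluated once, when the module is loaded, not each time an instance is created. The Record class here is a hypothetical stand-in:

```python
# Evaluated-once pitfall: the class-body expression runs at import
# time, so every instance sees the same stale value.
import time

class Record(object):
    created = time.time()  # evaluated once, when the class is defined

a = Record()
time.sleep(0.01)
b = Record()

# Both instances share the value computed at definition time.
print(a.created == b.created)  # True
```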
To guard against this mistake,
db.UserProperty does not accept the
default argument. You can use only
auto_current_user or auto_current_user_add to set an automatic value.
The data modeling API provides a property declaration class for multivalued properties, called
db.ListProperty. This class ensures that every value for the property is of the same type. You pass this type to the property declaration, like so:
class Book(db.Model):
    tags = db.ListProperty(basestring)

b = Book()
b.tags = ['python', 'app engine', 'data']
The type argument to the
db.ListProperty constructor must be the Python representation of one of the native datastore types. Refer back to the table of datastore property value types for a complete list.
The datastore does not distinguish between a multivalued property with no elements and no property at all. As such, an undeclared property on a
db.Expando object can’t store the empty list. If it did, when the entity is loaded back into an object, the property simply wouldn’t be there, potentially confusing code that’s expecting to find an empty list. To avoid confusion,
db.Expando disallows assigning an empty list to an undeclared property.
db.ListProperty declaration makes it possible to keep an empty list value on a multivalued property. The declaration interprets the state of an entity that doesn’t have the declared property as the property being set to the empty list, and maintains that distinction on the object. This also means that you cannot assign
None to a declared list property—but this isn’t of the expected type for the property anyway.
The datastore does distinguish between a property with a single value and a multivalued property with a single value. An undeclared property on a
db.Expando object can store a list with one element, and represent it as a list value the next time the entity is loaded.
The example above declares a list of string values. (basestring is the Python base type for str and unicode.) This case is so common that the API also provides db.StringListProperty.
You can provide a default value to
db.ListProperty using the
default argument. If you specify a nonempty list as the default, a shallow copy of the list value is made for each new object that doesn’t have an initial value for the property.
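The reason for the copy is the usual Python pitfall with shared mutable defaults: a declaration that handed out the same list object to every instance would let one object's mutations leak into the next. A sketch in plain Python, with hypothetical stand-in names:

```python
# Why db.ListProperty copies a nonempty default: without the copy,
# every instance would share (and mutate) one list object.
DEFAULT_TAGS = ['unread']

class Tags(object):
    def __init__(self, tags=None):
        # list(...) makes the shallow copy a declaration would make
        self.tags = list(tags) if tags is not None else list(DEFAULT_TAGS)

a = Tags()
a.tags.append('python')  # mutates only a's copy

b = Tags()
print(b.tags)  # ['unread'], unaffected by a
```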
db.ListProperty does not support the
required validator, since every list property technically has a list value (possibly empty). If you wish to disallow the empty list, you can provide your own
validator function that does so:
def is_not_empty(lst):
    if len(lst) == 0:
        raise db.BadValueError

class Book(db.Model):
    tags = db.ListProperty(basestring, validator=is_not_empty)

b = Book(tags=[])           # db.BadValueError
b = Book()                  # db.BadValueError, default "tags" is empty
b = Book(tags=['awesome'])  # OK
A db.ListProperty does not allow
None as an element in the list because it doesn’t match the required value type. It is possible to store
None as an element in a list for an undeclared property.
Models and Schema Migration
Property declarations prevent the application from creating an invalid data object, or assigning an invalid value to a property. If the application always uses the same model classes to create and manipulate entities, then all entities in the datastore will be consistent with the rules you establish using property declarations.
In real life, it is possible for an entity that does not fit a model to exist in the datastore. When you change a model class—and you will change model classes in the lifetime of your application—you are making a change to your application code, not the datastore. Entities created from a previous version of a model stay the way they are.
If an existing entity does not comply with the validity requirements of a model class, you’ll get a
db.BadValueError when you try to fetch the entity from the datastore. Fetching an entity gets the entity’s data, then calls the model class constructor with its values. This executes each property’s validation routines on the data.
Some model changes are “backward compatible” such that old entities can be loaded into the new model class and be considered valid. Whether it is sufficient to make a backward-compatible change without updating existing entities depends on your application. Changing the type of a property declaration or adding a required property are almost always incompatible changes. Adding an optional property will not cause a
db.BadValueError when an old entity is loaded, but if you have indexes on the new property, old entities will not appear in those indexes (and therefore won’t be results for those queries) until the entities are loaded and then saved with the new property’s default value.
The most straightforward way to migrate old entities to new schemas is to write a script that queries all of the entities and applies the changes. We’ll discuss how to implement this kind of batch operation in a scalable way using task queues, in "Task Chaining".
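One way to structure such a script, sketched here with a duck-typed query object so the idea stands on its own: migrate fetches a batch, applies save (for example, db.put), and resumes from a query cursor. The function and parameter names are hypothetical, not part of the ext.db API:

```python
# Hypothetical batch-migration helper. 'query' is assumed to support
# the db.Query cursor interface (fetch, cursor, with_cursor); 'save'
# would be db.put in a real migration.
def migrate(query, save, batch_size=100):
    migrated = 0
    batch = query.fetch(batch_size)
    while batch:
        save(batch)  # re-saving writes defaults for newly declared properties
        migrated += len(batch)
        query.with_cursor(query.cursor())  # resume after the last result
        batch = query.fetch(batch_size)
    return migrated
```

In a real migration, you might also transform each entity in the batch before saving, and drive the loop from a task queue handler so the work can be chained across requests.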
Modeling Relationships
You can model relationships between entities by storing entity keys as property values. The Python data modeling interface includes several powerful features for managing relationships.
The db.ReferenceProperty declaration describes a relationship between one model class and another. It stores the key of an entity as the property value. The first argument to the db.ReferenceProperty constructor is the model class of the kind of entity referenced by the property. If someone creates a relationship to an entity that is not of the appropriate kind, the assignment raises a db.BadValueError.
You can assign a data object directly to the property. The property declaration stores the key of the object as the property's value to create the relationship. You can also assign a db.Key directly:
class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()

class BookReview(db.Model):
    book = db.ReferenceProperty(Book, collection_name='reviews')

b = Book()
b.put()

br = BookReview()
br.book = b        # sets br's 'book' property to b's key
br.book = b.key()  # same thing
We’ll explain what
collection_name does in a moment.
The referenced object must have a “complete” key before it can be assigned to a reference property. A key is complete when it has all of its parts, including the string name or the system-assigned numeric ID. If you create a new object without a key name, the key is not complete until you save the object. When you save the object, the system completes the key with a numeric ID. If you create the object (or a
db.Key) with a key name, the key is already complete, and you can use it for a reference without saving it first.
b = Book()
br = BookReview()
br.book = b  # db.BadValueError, b's key is not complete

b.put()
br.book = b  # OK, b's key has system ID

b = Book(key_name='The_Grapes_of_Wrath')
br = BookReview()
br.book = b  # OK, b's key has a name

db.put([b, br])
A model class must be defined before it can be the subject of a
db.ReferenceProperty. To declare a reference property that can refer to another instance of the same class, you use a different declaration, db.SelfReferenceProperty:
class Book(db.Model):
    previous_edition = db.SelfReferenceProperty()

b1 = Book()
b2 = Book()
b2.previous_edition = b1
Reference properties have a powerful and intuitive syntax for accessing referenced objects. When you access the value of a reference property, the property fetches the entity from the datastore using the stored key, then returns it as an instance of its model class. A referenced entity is loaded “lazily”: it is not fetched from the datastore until the property is dereferenced.
br = db.get(book_review_key)  # br is a BookReview instance
title = br.book.title         # fetches book, gets its title property
This automatic dereferencing of reference properties occurs the first time you access the reference property. Subsequent uses of the property use the in-memory instance of the data object. This caching of the referenced entity is specific to the object with the property. If another object has a reference to the same entity, accessing its reference fetches the entity anew.
db.ReferenceProperty does another clever thing: it creates automatic back-references from a referenced object to the objects that refer to it. If a
BookReview class has a reference property that refers to the
Book class, the
Book class gets a special property whose name is specified by the
collection_name argument to the declaration (e.g.,
reviews). This property is special because it isn’t actually a property stored on the entity. Instead, when you access the back-reference property, the API performs a datastore query for all
BookReview entities whose reference property equals the key of the
Book. Since this is a single-property query, it uses the built-in indexes, and never requires a custom index.
b = db.get(book_key)  # b is a Book instance

for review in b.reviews:
    # review is a BookReview instance
    # ...
If you don’t specify a
collection_name, the name of the back-reference property is the name of the referring class followed by
_set. If a class has multiple reference properties that refer to the same class, you must provide a
collection_name to disambiguate the back-reference properties.
class BookReview(db.Model):
    # Book gets a BookReview_set special property.
    book = db.ReferenceProperty(Book)

    # Book gets a recommended_book_set special property.
    recommended_book = db.ReferenceProperty(
        Book, collection_name='recommended_book_set')
Because the back-reference property is implemented as a query, it incurs no overhead if you don’t use it.
As with storing
db.Key values as properties, neither the datastore nor the property declaration requires that a reference property refer to an entity that exists. Dereferencing a reference property that points to an entity that does not exist raises a
db.ReferencePropertyResolveError. Keys cannot change, so a relationship is only severed when the referenced entity is deleted from the datastore.
A reference property and its corresponding back-reference represent a one-to-many relationship between classes in your data model. The reference property establishes a one-way relationship from one entity to another, and the declaration sets up the back-reference mechanism on the referenced class. The back-reference uses the built-in query index, so determining which objects refer to the referenced object is reasonably fast. It’s not quite as fast as storing a list of keys on a property, but it’s easier to maintain.
A common use of one-to-many relationships is to model ownership. In the previous example, each BookReview was related to a single Book, and a Book could have many BookReviews; the reviews belong to the book.
You can also use a reference property to model a one-to-one relationship. The property declaration doesn’t enforce that only one entity can refer to a given entity, but this is easy to maintain in the application code. Because the performance of queries scales with the size of the result set and not the size of the data set, it’s usually sufficient to use the back-reference query to follow a one-to-one relationship back to the object with the reference.
If you’d prefer not to use a query to traverse the back-reference, you could also store a reference on the second object back to the first, at the expense of having to maintain the relationship in two places. This is tricky, because the class has to be defined before it can be the subject of a
ReferenceProperty. One option is to use
db.Expando and an undeclared property for one of the classes.
A one-to-one relationship can be used to model partnership. A good use of one-to-one relationships in App Engine is to split a large object into multiple entities to provide selective access to its properties. A player might have an avatar image up to 64 kilobytes in size, but the application probably doesn’t need the 64 KB of image data every time it fetches the
Player entity. You can create a separate
PlayerAvatarImage entity to contain the image, and establish a one-to-one relationship by creating a reference property from the
Player to the
PlayerAvatarImage. The application must know to delete the related PlayerAvatarImage entity when deleting a Player.
class PlayerAvatarImage(db.Model):
    image_data = db.BlobProperty()
    mime_type = db.StringProperty()

class Player(db.Model):
    name = db.StringProperty()
    avatar = db.ReferenceProperty(PlayerAvatarImage)

# Fetch the name of the player (a string) and a
# reference to the avatar image (a key).
p = db.get(player_key)

# Fetch the avatar image entity and access its
# image_data property.
image_data = p.avatar.image_data
A many-to-many relationship is a type of relationship between entities of two kinds where entities of either kind can have that relationship with many entities of the other kind, and vice versa. For instance, a player may be a member of one or more guilds, and a guild can have many members.
There are several ways to implement many-to-many relationships using the datastore. Let's consider two of them: "the key list method" and "the link model method."
The key list method
With the key list method, you store a list of entity keys on one side of the relationship, using a db.ListProperty of db.Key values. Such a declaration does not have any of the features of a db.ReferenceProperty, such as back-references or automatic dereferencing, because it does not involve that class. To model the relationship in the other direction, you can implement the back-reference feature yourself using a method and the Python @property decorator:
class Player(db.Model):
    name = db.StringProperty()
    guilds = db.ListProperty(db.Key)

class Guild(db.Model):
    name = db.StringProperty()

    @property
    def members(self):
        return Player.all().filter('guilds', self.key())

# Guilds to which a player belongs:
p = db.get(player_key)
guilds = db.get(p.guilds)  # batch get using list of keys
for guild in guilds:
    # ...

# Players that belong to a guild:
g = db.get(guild_key)
for player in g.members:
    # ...
Instead of manipulating the list of keys, you could implement automatic dereferencing using advanced Python techniques to extend how the values in the list property are accessed. A good way to do this is with a custom property declaration. We’ll consider this in a later section.
The key list method is best suited for situations where there are fewer objects on one side of the relationship than on the other, and the short list is small enough to store directly on an entity. In this example, many players each belong to a few guilds; each player has a short list of guilds, while each guild may have a long list of players. We put the list property on the
Player side of the relationship to keep the entity small, and use queries to produce the long list when it is needed.
The link model method
The link model method represents each relationship as an entity. The relationship entity has reference properties pointing to the related classes. You traverse the relationship by going through the relationship entity via the back-references.
class Player(db.Model):
    name = db.StringProperty()

class Guild(db.Model):
    name = db.StringProperty()

class GuildMembership(db.Model):
    player = db.ReferenceProperty(Player,
                                  collection_name='guild_memberships')
    guild = db.ReferenceProperty(Guild,
                                 collection_name='player_memberships')

p = Player()
g = Guild()
db.put([p, g])

gm = GuildMembership(player=p, guild=g)
db.put(gm)

# Guilds to which a player belongs:
for gm in p.guild_memberships:
    guild_name = gm.guild.name
    # ...

# Players that belong to a guild:
for gm in g.player_memberships:
    player_name = gm.player.name
    # ...
This technique is similar to how you’d use “join tables” in a SQL database. It’s a good choice if either side of the relationship may get too large to store on the entity itself. You can also use the relationship entity to store metadata about the relationship (such as when the player joined the guild), or model more complex relationships between multiple classes.
The link model method is more expensive than the key list method. It requires fetching the relationship entity to access the related object.
Remember that App Engine doesn’t support SQL-style join queries on these objects. You can achieve a limited sort of join by repeating information from the data objects on the link model objects, using code on the model classes to keep the values in sync. To do this with strong consistency, the link model object and the two related objects would need to be in the same entity group, which is not always possible or practical.
If eventual consistency would suffice, you could use task queues to propagate the information. See "Task Queues and Scheduled Tasks".
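As a rough illustration of the idea, here is a framework-free sketch in which plain Python objects stand in for entities and a simple list stands in for a task queue. None of these names come from the App Engine APIs; they only model the shape of the technique.

```python
# Toy model of eventually consistent denormalization. A plain list stands
# in for an App Engine task queue; plain objects stand in for entities.

class Player(object):
    def __init__(self, name):
        self.name = name

class GuildMembership(object):
    def __init__(self, player, player_name):
        self.player = player
        # Denormalized copy of the player's name, so listing a guild's
        # members does not require fetching each Player entity.
        self.player_name = player_name

task_queue = []  # stand-in for the real task queue service

def rename_player(player, new_name, memberships):
    player.name = new_name
    # Enqueue an update per membership instead of updating them in the
    # same transaction; the copies stay stale until the tasks run.
    for gm in memberships:
        task_queue.append((gm, new_name))

def run_pending_tasks():
    while task_queue:
        gm, name = task_queue.pop(0)
        gm.player_name = name
```

Until the queued tasks execute, a guild listing built from the membership records shows the old name; once they run, the copies converge, which is exactly the eventual-consistency trade-off described above.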
Model Inheritance
In data modeling, it’s often useful to derive new kinds of objects from other kinds. The game world may contain many different kinds of carryable objects, with shared properties and features common to all objects you can carry. Since you implement classes from the data model as Python classes, you’d expect to be able to use inheritance in the implementation to represent inheritance in the model. And you can, sort of.
If you define a class based on either
db.Model or
db.Expando, you can create other classes that inherit from that data class, like so:
class CarryableObject(db.Model):
    weight = db.IntegerProperty()
    location = db.ReferenceProperty(Location)

class Bottle(CarryableObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()
    is_closed = db.BooleanProperty()
The subclass inherits the property declarations of the parent class. A
Bottle has five property declarations: weight, location, contents, amount, and is_closed.
Objects based on the child class will be stored as entities whose kind is the name of the child class. The datastore has no notion of inheritance, and so by default will not treat
Bottle entities as if they are
CarryableObject entities. This is mostly significant for queries, and we have a solution for that in the next section.
If a child class declares a property already declared by a parent class, the class definition raises a
db.DuplicatePropertyError. The data modeling API does not support overriding property declarations in subclasses.
A model class can inherit from multiple classes, using Python’s own support for multiple inheritance:
class PourableObject(GameObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()
Each parent class must not declare a property with the same name as declarations in the other parent classes, or the class definition raises a
db.DuplicatePropertyError. However, the modeling API does the work to support “diamond inheritance,” where two parent classes themselves share a parent class:
class GameObject(db.Model):
    name = db.StringProperty()
    location = db.ReferenceProperty(Location)

class CarryableObject(GameObject):
    weight = db.IntegerProperty()

class PourableObject(GameObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()
In this example, both
CarryableObject and
PourableObject inherit two property declarations from
GameObject, and are both used as parent classes to
Bottle. The model API allows this because the two properties are defined in the same class, so there is no conflict.
Bottle gets its
name and
location declarations from
GameObject.
Queries and PolyModels
The datastore knows nothing of our modeling classes and inheritance. Instances of the
Bottle class are stored as entities of the kind
'Bottle', with no inherent knowledge of the parent classes. It’d be nice to be able to perform a query for
CarryableObject entities and get back
Bottle entities and others. That is, it’d be nice if a query could treat
Bottle entities as if they were instances of the parent classes, as Python does in our application code. We want polymorphism in our queries.
For this, the data modeling API provides a special base class:
db.PolyModel. Model classes using this base class support polymorphic queries. Consider the
Bottle class defined previously. Let’s change the base class of
GameObject to
db.PolyModel, like so:
from google.appengine.ext.db import polymodel

class GameObject(polymodel.PolyModel):
    # ...
We can now perform queries for any kind in the hierarchy, and get the expected results:
here = db.get(location_key)

q = CarryableObject.all()
q.filter('location', here)
q.filter('weight >', 100)

for obj in q:
    # obj is a carryable object that is here
    # and weighs more than 100 kilos.
    # ...
This query can return any
CarryableObject entities, including
Bottle entities. The query can use filters on any property of the specified class (such as
CarryableObject) or its parent classes (such as
GameObject).
Behind the scenes,
db.PolyModel does three clever things differently from its cousins:
- Objects of the class
GameObject or any of its child classes are all stored as entities of the kind
'GameObject'.
- All such objects are given a property named
class that represents the inheritance hierarchy starting from the root class. This is a multivalued property, where each value is the name of an ancestor class, in order.
- Queries for objects of any kind in the hierarchy are translated by the
db.PolyModel class into queries for the base class, with additional equality filters that compare the class being queried to the values of the
class property.
db.PolyModel stores information about the inheritance hierarchy on the entities, then uses it for queries to support polymorphism.
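The bookkeeping itself is internal to the SDK, but the idea can be sketched in plain Python: compute each class’s ancestor path, store it with the entity, and filter on membership. This is an illustration of the technique only, not PolyModel’s actual code.

```python
# Sketch of polymorphic queries via a multivalued 'class' property.
# Illustrative only; db.PolyModel's real implementation lives in the SDK.

class GameObject(object): pass
class CarryableObject(GameObject): pass
class PourableObject(GameObject): pass
class Bottle(CarryableObject, PourableObject): pass

def class_path(cls):
    # Names of the ancestor classes from the root down to cls, in order.
    path = [k.__name__ for k in cls.__mro__ if issubclass(k, GameObject)]
    path.reverse()
    return path

# Every object is stored under the root kind, carrying its class path.
entities = [{'kind': 'GameObject', 'class': class_path(c)}
            for c in (GameObject, CarryableObject, Bottle)]

# A "query for CarryableObject" becomes an equality filter against the
# multivalued property: it matches any entity whose path contains the name.
matches = [e for e in entities if 'CarryableObject' in e['class']]
```

Here `matches` picks up both the `CarryableObject` entity and the `Bottle` entity, while the plain `GameObject` entity is excluded, which is the polymorphic behavior the queries above rely on.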
Each model class that inherits directly from
db.PolyModel is the root of a class hierarchy. All objects from the hierarchy are stored as entities whose kind is the name of the root class. As such, your data will be easier to maintain if you use many root classes to form many class hierarchies, as opposed to putting all classes in a single hierarchy. That way, the datastore viewer and bulk loading tools can still use the datastore’s built-in notion of entity kinds to distinguish between kinds of objects.
Creating Your Own Property Classes
The property declaration classes serve several functions in your data model:
- Value validation
The model calls the class when a value is assigned to the property, and the class can raise an exception if the value does not meet its conditions.
- Type conversion
The model calls the class to convert from the value type used by the app to one of the core datastore types for storage, and back again.
- Default behavior
The model calls the class if no value was assigned to determine an appropriate default value.
Every property declaration class inherits from the
db.Property base class. This class implements features common to all property declarations, including support for the common constructor arguments (such as
indexed). Declaration classes override methods and members to specialize the validation and type conversion routines.
Validating Property Values
Here is a very simple property declaration class. It accepts any string value, and stores it as a datastore short string (the default behavior for Python string values).
from google.appengine.ext import db

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None and not isinstance(value, self.data_type):
            raise db.BadValueError('Property %s must be a %s.'
                                   % (self.name, self.data_type.__name__))
        return value
And here is how you would use the new property declaration:
class Player(db.Model):
    player_name = PlayerNameProperty()

p = Player()
p.player_name = 'Ned Nederlander'
p.player_name = 12345   # db.BadValueError
The
validate() method takes the value as an argument, and either returns the value, returns a different value, or raises an exception. The value returned by the method becomes the application-facing value for the attribute, so you can use the
validate() method for things like type coercion. In this example, the method raises a
db.BadValueError if the value is not a string or
None. The exception message can refer to the name of the property using
self.name.
The
data_type member is used by the base class. It represents the core datastore type the property uses to store the value. For string values, this is
basestring.
The
validate() method should call the superclass’s implementation before checking its own conditions. The base class’s validator supports the
required,
choices, and
validator arguments of the declaration constructor.
If the app does not provide a value for a property when it constructs the data object, the property starts out with a default value. This default value is passed to the
validate() method during the object constructor. If it is appropriate for your property declaration to allow a default value of
None, make sure your
validate() method allows it.
So far, this example doesn’t do much beyond
db.StringProperty. This by itself can be useful: it gives the property type a distinct class for future expansion. Let’s add a requirement that player names be between 6 and 30 characters in length by extending the
validate() method:
class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, self.data_type):
                raise db.BadValueError('Property %s must be a %s.'
                                       % (self.name, self.data_type.__name__))
            if len(value) < 6 or len(value) > 30:
                raise db.BadValueError(('Property %s must be between 6 and '
                                        '30 characters.') % self.name)
        return value
The new validation logic disallows strings with an inappropriate length:
p = Player()
p.player_name = 'Ned'              # db.BadValueError
p.player_name = 'Ned Nederlander'  # OK
p = Player(player_name='Ned')      # db.BadValueError
Marshaling Value Types
The datastore supports a fixed set of core value types for properties, listed in [the property types table]. A property declaration can support the use of other types of values in the attributes of model instances by marshaling between the desired type and one of the core datastore types. For example, the
db.ListProperty class converts between the empty list on the app side and the condition of being unset on the datastore side.
The
get_value_for_datastore() method converts the application value to the datastore value. Its argument is the complete model object, so you can access other aspects of the model when doing the conversion.
The
make_value_from_datastore() method does the reverse: it takes the datastore value and returns the value type to be used in the application.
Say we wanted to represent player name values within the application using a
PlayerName class instead of a simple string. Each player name has a surname and an optional first name. We can store this value as a single property, using the property declaration to convert between the application type (
PlayerName) and a core datastore type (such as
unicode):
class PlayerName(object):
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def is_valid(self):
        return (isinstance(self.first_name, unicode)
                and isinstance(self.surname, unicode)
                and len(self.surname) >= 6)

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, PlayerName):
                raise db.BadValueError('Property %s must be a PlayerName.'
                                       % self.name)
            # Let the data class have a say in validity.
            if not value.is_valid():
                raise db.BadValueError('Property %s must be a valid PlayerName.'
                                       % self.name)
            # Disallow the serialization delimiter in the surname field.
            if value.surname.find('|') != -1:
                raise db.BadValueError(('PlayerName surname in property %s '
                                        'cannot contain a "|".') % self.name)
        return value

    def get_value_for_datastore(self, model_instance):
        # Convert the data object's PlayerName to a unicode.
        player_name = getattr(model_instance, self.name)
        return player_name.surname + u'|' + player_name.first_name

    def make_value_from_datastore(self, value):
        # Convert a unicode to a PlayerName.
        i = value.find(u'|')
        return PlayerName(first_name=value[i+1:], surname=value[:i])
And here’s how you’d use it:
p = Player()
p.player_name = PlayerName(u'Ned', u'Nederlander')

p.player_name = PlayerName(u'Ned', u'Neder|lander')
# db.BadValueError, surname contains serialization delimiter

p.player_name = PlayerName(u'Ned', u'Neder')
# db.BadValueError, PlayerName.is_valid() == False, surname too short

p.player_name = PlayerName('Ned', u'Nederlander')
# db.BadValueError, PlayerName.is_valid() == False, first_name is not unicode
Here, the application value type is a
PlayerName instance, and the datastore value type is that value encoded as a Unicode string. The encoding format is the
surname field, followed by a delimiter, followed by the
first_name field. We disallow the delimiter character in the surname using the
validate() method. (Instead of disallowing it, we could also escape it in
get_value_for_datastore() and unescape it in
make_value_from_datastore().)
In this example,
PlayerName(u'Ned', u'Nederlander') is stored as this Unicode string:

Nederlander|Ned
The datastore value puts the surname first so that the datastore will sort
PlayerName values first by surname, then by first name. In general, you choose a serialization format that has the desired ordering characteristics for your custom property type. (The core type you choose also impacts how your values are ordered when mixed with other types, though if you’re modeling consistently this isn’t usually an issue.)
If the conversion from the application type to the datastore type may fail, put a check for the conversion failure in the
validate() method. This way, the error is caught when the bad value is assigned, instead of when the object is saved.
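The escaping alternative mentioned earlier can be sketched independently of the db API. The helper names below are hypothetical, and a backslash is assumed as the escape character:

```python
# Hypothetical escaping scheme for the '|' delimiter, as an alternative
# to rejecting it in validate(). Pure Python; not part of the db API.

def encode_name(surname, first_name):
    def esc(s):
        # Escape backslashes first, then the delimiter.
        return s.replace(u'\\', u'\\\\').replace(u'|', u'\\|')
    return esc(surname) + u'|' + esc(first_name)

def decode_name(value):
    fields, cur, i = [], [], 0
    while i < len(value):
        c = value[i]
        if c == u'\\':        # escaped character: take the next one literally
            cur.append(value[i + 1])
            i += 2
        elif c == u'|':       # unescaped delimiter: field boundary
            fields.append(u''.join(cur))
            cur = []
            i += 1
        else:
            cur.append(c)
            i += 1
    fields.append(u''.join(cur))
    return fields[0], fields[1]   # (surname, first_name)
```

A round trip through `encode_name()` and `decode_name()` preserves names that contain the delimiter, at the cost of slightly muddying the sort order for escaped values.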
Customizing Default Values
When the app constructs a data object and does not provide a value for a declared property, the model calls the property declaration class to determine a default value. The base class implementation sets the default value to
None, and allows the app to customize the default value in the model using the
default argument to the declaration.
A few of the built-in declaration classes provide more sophisticated default values. For instance, if a
db.DateTimeProperty was set with
auto_now_add=True, the default value is the current system date and time. (
db.DateTimeProperty overrides
get_value_for_datastore() to implement
auto_now=True, so the value is updated when the object is saved, whether or not it has a value.)
The default value passes through the validation logic after it is set. This allows the app to customize the validation logic and disallow the default value. This is what happens when
required=True: the base class’s validation logic disallows the
None value, which is the base class’s default value.
To specify custom default behavior, override the
default_value() method. This method takes no arguments and returns the desired default value.
Here’s a simple implementation of
default_value() for
PlayerNameProperty:
class PlayerNameProperty(db.Property):
    # ...

    def default_value(self):
        default = super(PlayerNameProperty, self).default_value()
        if default is not None:
            return default
        return PlayerName(u'', u'Anonymous')
In this example, we call the superclass
default_value() method to support the
default argument to the constructor, which allows the app to override the default value in the model. If that returns
None, we create a new
PlayerName instance to be the default value.
Without further changes, this implementation breaks the
required feature of the base class, because the value of the property is never
None (unless the app explicitly assigns a
None value). We can fix this by amending our validation logic to check
self.required and disallow the anonymous
PlayerName value if it’s
True.
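The amended check can be sketched like this, with plain Python standing in for the property API; `validate_player_name` and `ANONYMOUS` are illustrative names, not part of ext.db:

```python
# Sketch of validation that rejects the anonymous default when the
# property is required. Stand-ins only; not the db.Property API.

ANONYMOUS = (u'Anonymous', u'')   # plays the role of PlayerName(u'', u'Anonymous')

def validate_player_name(value, required=False):
    if value is None:
        value = ANONYMOUS         # what default_value() would supply
    if required and value == ANONYMOUS:
        # required=True must not be satisfied by the anonymous default.
        raise ValueError('a real player name is required')
    return value
```

With this check in place, a required property still gets the anonymous default as a placeholder, but the object cannot be saved until the app assigns a real name.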
Accepting Arguments
If you want the application to be able to control the behavior of your custom property declaration class using arguments, you override the
__init__() method. The method should call the superclass
__init__() method to enable the features of the superclass that use arguments (like
required and
default). The
Property API requires that the
verbose_name argument come first, but after that all
__init__() arguments are keyword arguments:
class PlayerNameProperty(db.Property):
    # ...

    def __init__(self, verbose_name=None, require_first_name=False, **kwds):
        super(PlayerNameProperty, self).__init__(verbose_name, **kwds)
        self.require_first_name = require_first_name

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            # ...
            if self.require_first_name and not value.first_name:
                raise db.BadValueError('Property %s PlayerName needs a first_name.'
                                       % self.name)
        # ...
You’d use this feature like this:
class Player(db.Model):
    player_name = PlayerNameProperty(require_first_name=True)

p = Player(player_name=PlayerName(u'Ned', u'Nederlander'))

p.player_name = PlayerName(u'', u'Charo')
# db.BadValueError, first name required

p = Player()
# db.BadValueError, default value PlayerName(u'', u'Anonymous') has empty first_name