Data Modeling with Python and ext.db

From Programming Google App Engine, 2nd edition, 2012.

This article originally appeared as the chapter "Data Modeling with Python" in Programming Google App Engine, 2nd edition, published in 2012.

Newer versions of the book replaced this chapter with a chapter on ndb, a newer Python datastore library intended as a replacement for ext.db. The original text is offered here for developers still using the older library.

Data modeling is the process of translating the data requirements of your application to the features of your data storage technology. While the application deals in players, towns, weapons, potions, and gold, the datastore knows only entities, entity groups, keys, properties, and indexes. The data model describes how the data is stored and how it is manipulated. Entities represent players and game objects; properties describe the status of objects and the relationships between them. When an object changes location, the data is updated in a transaction, so the object cannot be in two places at once. When a player wants to know about the weapons in her inventory, the application performs a query for all weapon objects whose location is the player, possibly requiring an index.

In the last few chapters, we’ve been using the Python class db.Expando to create and manipulate entities and their properties. As we’ve been using it, this class illustrates the flexible nature of the datastore. The datastore itself does not impose or enforce a structure on entities or their properties, giving the application control over how individual entities represent data objects. This flexibility is also an essential feature for scalability: changing the structure of millions of records is a large task, and the proper strategy for doing this is specific to the task and the application.

But structure is needed. Every player has a number of health points, and a Player entity without a health property, or with a health property whose value is not an integer, is likely to confuse the battle system. The data ought to conform to a structure, or schema, to meet the expectations of the code. Because the datastore does not enforce this schema itself—the datastore is schemaless—it is up to the application to ensure that entities are created and updated properly.

App Engine includes a data modeling library for defining and enforcing data schemas in Python. This library resides in the google.appengine.ext.db package. It includes several related classes for representing data objects, including db.Model, db.Expando and db.PolyModel. To give structure to entities of a given kind, you create a subclass of one of these classes. The definition of the class specifies the properties for those objects, their allowed value types, and other requirements.

In this chapter, we’ll introduce the Python data modeling library and discuss how to use it to enforce a schema for the otherwise schemaless datastore. We’ll also discuss how the library works and how to extend it.

Models and Properties

The db.Model superclass lets you specify a structure for every entity of a kind. This structure can include the names of the properties, the types of the values allowed for those properties, whether the property is required or optional, and a default value. Here is a definition of a Book class similar to the one we created in "Datastore Entities":

from google.appengine.ext import db
import datetime

class Book(db.Model):
    title = db.StringProperty(required=True)
    author = db.StringProperty(required=True)
    copyright_year = db.IntegerProperty()
    author_birthdate = db.DateProperty()

obj = Book(title='The Grapes of Wrath',
           author='John Steinbeck')
obj.copyright_year = 1939
obj.author_birthdate = datetime.date(1902, 2, 27)

obj.put()

This Book class inherits from db.Model. In the class definition, we declare that all Book entities have four properties, and we declare their value types: title and author are strings, copyright_year is an integer, and author_birthdate is a date (stored by the datastore as a date-time value). If someone tries to assign a value of the wrong type to one of these properties, the assignment raises a db.BadValueError.

We also declare that title and author are required properties. If someone tries to create a Book without these properties set as arguments to the Book constructor, the attempt raises a db.BadValueError. copyright_year and author_birthdate are optional, so we can leave them unset on construction, and assign values to the properties later. If these properties are not set by the time the object is saved, the resulting entity stores them with the null value (None), which this model allows.

A property declaration ensures that the entity created from the object has a value for the property, possibly None. As we’ll see in the next section, you can further specify what values are considered valid using arguments to the property declaration.

A model class that inherits from db.Model ignores all attributes that are not declared as properties when it comes time to save the object to the datastore. In the resulting entity, all declared properties are set, and no others.

This is the sole difference between db.Model and db.Expando. A db.Model class ignores undeclared properties. A db.Expando class saves all attributes of the object as properties of the corresponding entity. That is, a model using a db.Expando class “expands” to accommodate assignments to undeclared properties.

You can use property declarations with db.Expando just as with db.Model. The result is a data object that validates the values of the declared properties, and accepts any values for additional undeclared properties.

The official documentation refers to properties with declarations as static properties and properties on a db.Expando without declarations as dynamic properties. These terms have a nice correspondence with the notions of static and dynamic typing in programming languages. Property declarations implement a sort of runtime validated static typing for model classes, on top of Python’s own dynamic typing.

As we’ll see, property declarations are even more powerful than static typing, because they can validate more than just the type of the value.

For both db.Model and db.Expando, object attributes whose names begin with an underscore (_) are always ignored. You can use these private attributes to attach transient data or functions to model objects. (It’s possible to create an entity with a property whose name starts with an underscore; this convention only applies to object attributes in the modeling API.)
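
The difference between the two base classes at save time can be illustrated without App Engine. The following is a minimal plain-Python sketch, not the ext.db source: Declared, SketchModel, SketchExpando, and to_entity are hypothetical names invented for this illustration, and a dict stands in for the saved entity.

```python
class Declared(object):
    """Hypothetical marker for a declared property."""


class SketchModel(object):
    def to_entity(self):
        # Keep only the attributes declared on the class; an unset
        # declared attribute is saved as None.
        declared = [name for name, value in vars(type(self)).items()
                    if isinstance(value, Declared)]
        return {name: self.__dict__.get(name) for name in declared}


class SketchExpando(SketchModel):
    def to_entity(self):
        entity = SketchModel.to_entity(self)
        # Also keep undeclared attributes, except private ones.
        for name, value in self.__dict__.items():
            if not name.startswith('_'):
                entity[name] = value
        return entity


class ModelBook(SketchModel):
    title = Declared()


class ExpandoBook(SketchExpando):
    title = Declared()


m = ModelBook()
m.title = 'Dune'
m.shelf = 'A3'        # undeclared: ignored on save
m._cache = object()   # private: ignored on save

e = ExpandoBook()
e.title = 'Dune'
e.shelf = 'A3'        # undeclared: saved as a dynamic property
e._cache = object()   # private: still ignored

model_entity = m.to_entity()
expando_entity = e.to_entity()
```

Only the Expando-style object carries the undeclared shelf attribute into the saved record, and neither saves the underscore-prefixed attribute.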

Because model objects also have attributes that are methods and other features, you cannot use certain names for properties in the Python model API. Some of the more pernicious reserved names are key, kind, and parent. The official documentation has a complete list of reserved names. In the next section, we’ll see a way to use these reserved names for datastore properties even though they aren’t allowed as attribute names in the API.

Beyond the model definition, db.Model and db.Expando have the same interface for saving, fetching, and deleting entities, and for performing queries and transactions. db.Expando is a subclass of db.Model.

Property Declarations

You declare a property for a model by assigning a property declaration object to an attribute of the model class. The name of the attribute is the name of the datastore property. The value is an object that describes the terms of the declaration. As discussed earlier, the db.StringProperty object assigned to the title class attribute says that the entity that an instance of the class represents can only have a string value for its title property. The required=True argument to the db.StringProperty constructor says that the object is not valid unless it has a value for the title property.

This can look a little confusing if you’re expecting the class attribute to shine through as an attribute of an instance of the class, as it normally does in Python. Instead, the db.Model class hooks into the attribute assignment mechanism so it can use the property declaration to validate a value assigned to an attribute of the object. In Python terms, the model uses property descriptors to enhance the behavior of attribute assignment.
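
The descriptor mechanism itself is ordinary Python and can be tried outside App Engine. The following is a minimal sketch under stated assumptions, not the ext.db implementation: TypedProperty and this BadValueError are hypothetical stand-ins, and the sketch relies on __set_name__, which requires Python 3.6 or later.

```python
class BadValueError(Exception):
    """Hypothetical stand-in for db.BadValueError."""


class TypedProperty(object):
    """A data descriptor that validates values on assignment."""

    def __init__(self, value_type):
        self.value_type = value_type
        self.name = None

    def __set_name__(self, owner, name):
        # Called by Python when the owning class body is executed.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self   # accessed on the class: return the declaration
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if value is not None and not isinstance(value, self.value_type):
            raise BadValueError(
                '%s must be %s' % (self.name, self.value_type.__name__))
        instance.__dict__[self.name] = value


class Book(object):
    title = TypedProperty(str)


b = Book()
b.title = 'The Grapes of Wrath'   # passes validation

try:
    b.title = 99                  # wrong type, rejected by __set__
except BadValueError as e:
    error = str(e)
```

The class attribute remains the declaration object, while reads and writes on an instance are routed through __get__ and __set__.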

Property declarations act as intermediaries between the application and the datastore. They can ensure that only values that meet certain criteria are assigned to properties. They can assign default values when constructing an object. They can even convert values between a data type used by the application and one of the datastore’s native value types, or otherwise customize how values are stored.

The db.StringProperty declaration has a feature that always trips me up, so I’m mentioning it here. By default, a string property value enforced by this declaration cannot contain newline characters. If you want to allow values with newline characters, specify the multiline=True argument to the declaration:

    prop = db.StringProperty(multiline=True)

This feature corresponds with a similar feature in the Django web application framework, which is used to help ensure that text fields in forms don’t accidentally contain newline characters. This is not a restriction of the App Engine datastore; it is merely the default behavior of db.StringProperty.

Property Value Types

db.StringProperty is an example of a property declaration class. There are several property declaration classes included with the Python SDK, one for each native datastore type. Each one ensures that the property can only be assigned a value of the corresponding type:

class Book(db.Model):
    title = db.StringProperty()

b = Book()

b.title = 99  # db.BadValueError, title must be a string

b.title = 'The Grapes of Wrath'  # OK

The following table lists the datastore native value types and their corresponding property declaration classes.

[Table: Datastore property value types and the corresponding property declaration classes]

Data type | Python type | Property class
--- | --- | ---
Unicode text string (up to 500 bytes, indexed) | unicode or str (converted to unicode as ASCII) | db.StringProperty
Long Unicode text string (not indexed) | db.Text | db.TextProperty
Short byte string (up to 500 bytes, indexed) | db.ByteString | db.ByteStringProperty
Long byte string (not indexed) | db.Blob | db.BlobProperty
Boolean | bool | db.BooleanProperty
Integer (64-bit) | int or long (converted to 64-bit long) | db.IntegerProperty
Float (double precision) | float | db.FloatProperty
Date-time | datetime.date | db.DateProperty
Date-time | datetime.datetime | db.DateTimeProperty
Date-time | datetime.time | db.TimeProperty
Entity key | db.Key or a model instance | db.ReferenceProperty, db.SelfReferenceProperty
A Google account | users.User | db.UserProperty
A Blobstore key | blobstore.BlobKey | blobstore.BlobReferenceProperty

Property Validation

You can customize the behavior of a property declaration by passing arguments to the declaration’s constructor. We’ve already seen one example: the required argument.

All property declaration classes support the required argument. If True, the property is required and must not be None. You must provide an initial value for each required property to the constructor when creating a new object. (You can provide an initial value for any property this way.)

class Book(db.Model):
    title = db.StringProperty(required=True)

b = Book()  # db.BadValueError, title is required

b = Book(title='The Grapes of Wrath')  # OK

The datastore makes a distinction between a property that is not set and a property that is set to the null value (None). Property declarations do not make this distinction, because all declared properties must be set (possibly to None). Unless you say otherwise, the default value for declared properties is None, so the required validator treats the None value as an unspecified property.

You can change the default value with the default argument. When you create an object without a value for a property that has a default value, the constructor assigns the default value to the property.

A property that is required and has a default value uses the default if constructed without an explicit value. The value can never be None.

class Book(db.Model):
    rating = db.IntegerProperty(default=1)

b = Book()  # b.rating == 1

b = Book(rating=5)  # b.rating == 5

By default, the name of the class attribute is used as the name of the datastore property. If you wish to use a different name for the datastore property than is used for the attribute, specify a name argument. This allows you to use names already taken by the API for class or instance attributes as datastore properties.

class Song(db.Model):
    song_key = db.StringProperty(name='key')

s = Song()
s.song_key = 'C# min'

# The song_key attribute is stored as the
# datastore property named 'key'.
s.put()

You can declare that a property should contain only one of a fixed set of values by providing a list of possible values as the choices argument. If None is not one of the choices, this acts as a more restrictive form of required: the property must be set to one of the valid choices using a keyword argument to the constructor.

_KEYS = ['C', 'C min', 'C 7',
         'C#', 'C# min', 'C# 7',
         # ...
        ]

class Song(db.Model):
    song_key = db.StringProperty(choices=_KEYS)

s = Song(song_key='H min')  # db.BadValueError

s = Song()  # db.BadValueError, None is not an option

s = Song(song_key='C# min')  # OK

All of these features validate the value assigned to a property, and raise a db.BadValueError if the value does not meet the appropriate conditions. For even greater control over value validation, you can define your own validation function and assign it to a property declaration as the validator argument. The function should take the value as an argument, and raise a db.BadValueError (or an exception of your choosing) if the value should not be allowed.

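def is_recent_year(val):
    # An optional property defaults to None, so let None through.
    if val is not None and val < 1923:
        raise db.BadValueError('copyright_year must be 1923 or later')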

class Book(db.Model):
    copyright_year = db.IntegerProperty(validator=is_recent_year)

b = Book(copyright_year=1922)  # db.BadValueError

b = Book(copyright_year=1924)  # OK

Nonindexed Properties

In "Datastore Queries", we mentioned that you can set properties of an entity in such a way that they are available on the entity, but are considered unset for the purposes of indexes. In the Python API, you establish a property as nonindexed using a property declaration. If the property declaration is given an indexed argument of False, entities created with that model class will set that property as nonindexed.

class Book(db.Model):
    first_sentence = db.StringProperty(indexed=False)

b = Book()
b.first_sentence = "On the Internet, popularity is swift and fleeting."
b.put()

# Count the number of Book entities with
# an indexed first_sentence property...
c = Book.all().order('first_sentence').count(1000)

# c = 0

Automatic Values

Several property declaration classes include features for setting values automatically.

The db.DateProperty, db.DateTimeProperty, and db.TimeProperty classes can populate the value automatically with the current date and time. To enable this behavior, you provide the auto_now or auto_now_add arguments to the property declaration.

If you set auto_now=True, the declaration class overwrites the property value with the current date and time when you save the object. This is useful when you want to keep track of the last time an object was saved.

class Book(db.Model):
    last_updated = db.DateTimeProperty(auto_now=True)

b = Book()
b.put()  # last_updated is set to the current time

# ...

b.put()  # last_updated is set to the current time again

If you set auto_now_add=True, the property is set to the current time only when the object is saved for the first time. Subsequent saves do not overwrite the value.

class Book(db.Model):
    create_time = db.DateTimeProperty(auto_now_add=True)

b = Book()
b.put()  # create_time is set to the current time

# ...

b.put()  # create_time stays the same
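
The difference between the two arguments, as described above, can be sketched in plain Python. This is only an illustration of the described behavior, not the ext.db code: SketchRecord, now, and the fake integer clock are hypothetical.

```python
import itertools

_clock = itertools.count(100)   # fake clock: 100, 101, 102, ...


def now():
    return next(_clock)


class SketchRecord(object):
    """Hypothetical record with two automatic timestamps."""

    def __init__(self):
        self.create_time = None    # behaves like auto_now_add=True
        self.last_updated = None   # behaves like auto_now=True

    def put(self):
        stamp = now()
        if self.create_time is None:
            self.create_time = stamp   # only set on the first save
        self.last_updated = stamp      # overwritten on every save


r = SketchRecord()
r.put()
first = (r.create_time, r.last_updated)

r.put()
second = (r.create_time, r.last_updated)
```

After the second save, only last_updated has moved forward.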

The db.UserProperty declaration class also includes an automatic value feature. If you provide the argument auto_current_user=True, the value is set to the user accessing the current request handler if the user is signed in. If you provide auto_current_user_add=True, the value is only set to the current user when the entity is saved for the first time, and left untouched thereafter. If the current user is not signed in, the value is set to None.

class BookReview(db.Model):
    created_by_user = db.UserProperty(auto_current_user_add=True)
    last_edited_by_user = db.UserProperty(auto_current_user=True)

br = BookReview()
br.put()  # created_by_user and last_edited_by_user set

# ...

br.put()  # last_edited_by_user set again

At first glance, it might seem reasonable to set a default for a db.UserProperty this way:

from google.appengine.api import users

class BookReview(db.Model):
    created_by_user = db.UserProperty(
        default=users.get_current_user())
    # WRONG

This would set the default value to be the user who is signed in when the class is imported. Subsequent requests handled by the instance of the application will use a previous user instead of the current user as the default.

To guard against this mistake, db.UserProperty does not accept the default argument. You can use only auto_current_user or auto_current_user_add to set an automatic value.
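
The underlying pitfall is general Python behavior: an expression in a class body is evaluated once, when the class is defined. The following plain-Python sketch shows the effect; the _session dict and both classes are hypothetical, and this get_current_user is a stand-in, not the users API.

```python
_session = {'user': 'alice'}   # hypothetical stand-in for the signed-in user


def get_current_user():
    return _session['user']


class EagerDefault(object):
    # Evaluated once, when the class body runs (at import time).
    default_user = get_current_user()


class LazyDefault(object):
    @property
    def default_user(self):
        # Evaluated on each access, so it sees the current user.
        return get_current_user()


_session['user'] = 'bob'       # a later request on the same instance

eager = EagerDefault().default_user
lazy = LazyDefault().default_user
```

The eager default still reports the user who was current when the class was defined, while the lazy lookup reports the later one.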

List Properties

The data modeling API provides a property declaration class for multivalued properties, called db.ListProperty. This class ensures that every value for the property is of the same type. You pass this type to the property declaration, like so:

class Book(db.Model):
    tags = db.ListProperty(basestring)

b = Book()
b.tags = ['python', 'app engine', 'data']

The type argument to the db.ListProperty constructor must be the Python representation of one of the native datastore types. Refer back to [the property types table] for a complete list.

The datastore does not distinguish between a multivalued property with no elements and no property at all. As such, an undeclared property on a db.Expando object can’t store the empty list. If it did, when the entity is loaded back into an object, the property simply wouldn’t be there, potentially confusing code that’s expecting to find an empty list. To avoid confusion, db.Expando disallows assigning an empty list to an undeclared property.

The db.ListProperty declaration makes it possible to keep an empty list value on a multivalued property. The declaration interprets the state of an entity that doesn’t have the declared property as the property being set to the empty list, and maintains that distinction on the object. This also means that you cannot assign None to a declared list property—but this isn’t of the expected type for the property anyway.

The datastore does distinguish between a property with a single value and a multivalued property with a single value. An undeclared property on a db.Expando object can store a list with one element, and represent it as a list value the next time the entity is loaded.
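
The round trip described above can be modeled with plain dicts. This sketch assumes only what the text states (an empty multivalued property is indistinguishable from no property); save and load_declared_list are hypothetical helpers, not datastore calls.

```python
def save(properties):
    # An empty list cannot be stored, so it is dropped from the entity.
    return dict((name, value) for name, value in properties.items()
                if value != [])


def load_declared_list(entity, name):
    # A declared list property reads a missing property as the empty list.
    return list(entity.get(name, []))


entity = save({'title': 'Dune', 'tags': []})
has_tags = 'tags' in entity                    # the empty list did not survive
tags = load_declared_list(entity, 'tags')      # the declaration restores it

single = save({'tags': ['classic']})           # a one-element list is kept
```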

The example above declares a list of string values. (basestring is the Python base type for str and unicode.) This case is so common that the API also provides db.StringListProperty.

You can provide a default value to db.ListProperty using the default argument. If you specify a nonempty list as the default, a shallow copy of the list value is made for each new object that doesn’t have an initial value for the property.
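
To see why a copy matters, consider what would happen if every object shared the one default list. The following plain-Python sketch uses a hypothetical SketchListProperty class with a default_value method; it is not the ext.db class.

```python
import copy


class SketchListProperty(object):
    """Hypothetical declaration holding a default list."""

    def __init__(self, default=None):
        self.default = default if default is not None else []

    def default_value(self):
        # A shallow copy gives each object its own list.
        return copy.copy(self.default)


tags = SketchListProperty(default=['new'])

a_tags = tags.default_value()
b_tags = tags.default_value()
a_tags.append('python')   # changes only a_tags
```

Appending to one object's list leaves both the other object's list and the declared default untouched. Because the copy is shallow, the elements themselves are still shared.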

db.ListProperty does not support the required validator, since every list property technically has a list value (possibly empty). If you wish to disallow the empty list, you can provide your own validator function that does so:

def is_not_empty(lst):
    if len(lst) == 0:
        raise db.BadValueError

class Book(db.Model):
    tags = db.ListProperty(basestring, validator=is_not_empty)

b = Book(tags=[])  # db.BadValueError

b = Book()  # db.BadValueError, default "tags" is empty

b = Book(tags=['awesome'])  #  OK

db.ListProperty does not allow None as an element in the list because it doesn’t match the required value type. It is possible to store None as an element in a list for an undeclared property.

Models and Schema Migration

Property declarations prevent the application from creating an invalid data object, or assigning an invalid value to a property. If the application always uses the same model classes to create and manipulate entities, then all entities in the datastore will be consistent with the rules you establish using property declarations.

In real life, it is possible for an entity that does not fit a model to exist in the datastore. When you change a model class—and you will change model classes in the lifetime of your application—you are making a change to your application code, not the datastore. Entities created from a previous version of a model stay the way they are.

If an existing entity does not comply with the validity requirements of a model class, you’ll get a db.BadValueError when you try to fetch the entity from the datastore. Fetching an entity gets the entity’s data, then calls the model class constructor with its values. This executes each property’s validation routines on the data.

Some model changes are “backward compatible” such that old entities can be loaded into the new model class and be considered valid. Whether it is sufficient to make a backward-compatible change without updating existing entities depends on your application. Changing the type of a property declaration or adding a required property are almost always incompatible changes. Adding an optional property will not cause a db.BadValueError when an old entity is loaded, but if you have indexes on the new property, old entities will not appear in those indexes (and therefore won’t be results for those queries) until the entities are loaded and then saved with the new property’s default value.

The most straightforward way to migrate old entities to new schemas is to write a script that queries all of the entities and applies the changes. We’ll discuss how to implement this kind of batch operation in a scalable way using task queues, in "Task Chaining".
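
The shape of such a script can be sketched in plain Python. In this illustration, dicts stand in for stored entities, the schema change (a rating property with a default of 1) is a hypothetical example, and migrate_book and migrate_all are invented names; a real script would query and save through the datastore API.

```python
def migrate_book(entity):
    """Bring one stored record up to the new schema.

    Returns True if the record was changed.
    """
    if 'rating' not in entity:
        entity['rating'] = 1
        return True
    return False


def migrate_all(entities, batch_size=2):
    """Visit every record in fixed-size batches; return the number changed."""
    changed = 0
    for start in range(0, len(entities), batch_size):
        batch = entities[start:start + batch_size]
        changed += sum(1 for entity in batch if migrate_book(entity))
        # A real script would save the batch back to the datastore here.
    return changed


stored = [
    {'title': 'A'},                # written by the old model
    {'title': 'B', 'rating': 4},   # already up to date
    {'title': 'C'},                # written by the old model
]
changed = migrate_all(stored)
```

Making the per-entity step idempotent, as here, means the script can be rerun safely if it is interrupted partway through.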

Modeling Relationships

You can model relationships between entities by storing entity keys as property values. The Python data modeling interface includes several powerful features for managing relationships.

The db.ReferenceProperty declaration describes a relationship between one model class and another. It stores the key of an entity as the property value. The first argument to the db.ReferenceProperty constructor is the model class of the kind of entity referenced by the property. If someone creates a relationship to an entity that is not of the appropriate kind, the assignment raises a db.BadValueError.

You can assign a data object directly to the property. The property declaration stores the key of the object as the property’s value to create the relationship. You can also assign a db.Key directly.

class Book(db.Model):
    title = db.StringProperty()
    author = db.StringProperty()

class BookReview(db.Model):
    book = db.ReferenceProperty(Book, collection_name='reviews')

b = Book()
b.put()

br = BookReview()

br.book = b        # sets br's 'book' property to b's key

br.book = b.key()  # same thing

We’ll explain what collection_name does in a moment.

The referenced object must have a “complete” key before it can be assigned to a reference property. A key is complete when it has all of its parts, including the string name or the system-assigned numeric ID. If you create a new object without a key name, the key is not complete until you save the object. When you save the object, the system completes the key with a numeric ID. If you create the object (or a db.Key) with a key name, the key is already complete, and you can use it for a reference without saving it first.

b = Book()
br = BookReview()
br.book = b  # db.BadValueError, b's key is not complete

b.put()
br.book = b  # OK, b's key has system ID

b = Book(key_name='The_Grapes_of_Wrath')
br = BookReview()
br.book = b  # OK, b's key has a name

db.put([b, br])

A model class must be defined before it can be the subject of a db.ReferenceProperty. To declare a reference property that can refer to another instance of the same class, you use a different declaration, db.SelfReferenceProperty.

class Book(db.Model):
    previous_edition = db.SelfReferenceProperty()

b1 = Book()
b1.put()  # b1 needs a complete key before it can be referenced
b2 = Book()
b2.previous_edition = b1

Reference properties have a powerful and intuitive syntax for accessing referenced objects. When you access the value of a reference property, the property fetches the entity from the datastore using the stored key, then returns it as an instance of its model class. A referenced entity is loaded “lazily”: it is not fetched from the datastore until the property is dereferenced.

br = db.get(book_review_key)
# br is a BookReview instance

title = br.book.title  # fetches book, gets its title property

This automatic dereferencing of reference properties occurs the first time you access the reference property. Subsequent uses of the property use the in-memory instance of the data object. This caching of the referenced entity is specific to the object with the property. If another object has a reference to the same entity, accessing its reference fetches the entity anew.
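
The per-object caching described here can be modeled in a few lines of plain Python. This is a sketch of the described behavior, not the ext.db implementation: the DATASTORE dict, fetch, and SketchReview are hypothetical, and a counter records how many fetches occur.

```python
DATASTORE = {'book1': {'title': 'The Grapes of Wrath'}}   # hypothetical store
fetch_count = {'n': 0}


def fetch(key):
    fetch_count['n'] += 1
    return DATASTORE[key]


class SketchReview(object):
    def __init__(self, book_key):
        self._book_key = book_key   # only the key is stored
        self._book = None           # per-object cache, filled on first use

    @property
    def book(self):
        if self._book is None:
            self._book = fetch(self._book_key)
        return self._book


r1 = SketchReview('book1')
r2 = SketchReview('book1')
after_create = fetch_count['n']   # nothing fetched yet

r1.book['title']
r1.book['title']
after_r1 = fetch_count['n']       # one fetch; the second access was cached

r2.book['title']
after_r2 = fetch_count['n']       # r2 has its own cache, so it fetches again
```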

db.ReferenceProperty does another clever thing: it creates automatic back-references from a referenced object to the objects that refer to it. If a BookReview class has a reference property that refers to the Book class, the Book class gets a special property whose name is specified by the collection_name argument to the declaration (e.g., reviews). This property is special because it isn’t actually a property stored on the entity. Instead, when you access the back-reference property, the API performs a datastore query for all BookReview entities whose reference property equals the key of the Book. Since this is a single-property query, it uses the built-in indexes, and never requires a custom index.

b = db.get(book_key)
# b is a Book instance

for review in b.reviews:
    # review is a BookReview instance
    # ...

If you don’t specify a collection_name, the name of the back-reference property is the lowercased name of the referring class followed by _set. If a class has multiple reference properties that refer to the same class, you must provide a collection_name to disambiguate the back-reference properties.

class BookReview(db.Model):
    # Book gets a bookreview_set special property.
    book = db.ReferenceProperty(Book)

    # Book gets a recommended_book_set special property.
    recommended_book = db.ReferenceProperty(
        Book, collection_name='recommended_book_set')

Because the back-reference property is implemented as a query, it incurs no overhead if you don’t use it.
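
The idea of a back-reference that is computed rather than stored can be sketched in plain Python. Here a list of dicts stands in for the stored BookReview entities and a list comprehension stands in for the query; REVIEWS and SketchBook are hypothetical names, not part of the API.

```python
REVIEWS = [
    {'id': 'r1', 'book': 'b1'},
    {'id': 'r2', 'book': 'b2'},
    {'id': 'r3', 'book': 'b1'},
]   # hypothetical stored BookReview records


class SketchBook(object):
    def __init__(self, key):
        self.key = key

    @property
    def reviews(self):
        # Nothing is stored on the book; each access runs the "query".
        return [r for r in REVIEWS if r['book'] == self.key]


b = SketchBook('b1')
review_ids = [r['id'] for r in b.reviews]

REVIEWS.append({'id': 'r4', 'book': 'b1'})
review_ids_later = [r['id'] for r in b.reviews]   # sees the new review
```

Because the result is recomputed on each access, a review added later appears without any change to the book object.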

As with storing db.Key values as properties, neither the datastore nor the property declaration requires that a reference property refer to an entity that exists. Dereferencing a reference property that points to an entity that does not exist raises a db.ReferencePropertyResolveError. Keys cannot change, so a relationship is only severed when the referenced entity is deleted from the datastore.

One-to-Many Relationships

A reference property and its corresponding back-reference represent a one-to-many relationship between classes in your data model. The reference property establishes a one-way relationship from one entity to another, and the declaration sets up the back-reference mechanism on the referenced class. The back-reference uses the built-in query index, so determining which objects refer to the referenced object is reasonably fast. It’s not quite as fast as storing a list of keys on a property, but it’s easier to maintain.

A common use of one-to-many relationships is to model ownership. In the previous example, each BookReview was related to a single Book, and a Book could have many BookReviews. The BookReviews belong to the Book.

One-to-One Relationships

You can also use a reference property to model a one-to-one relationship. The property declaration doesn’t enforce that only one entity can refer to a given entity, but this is easy to maintain in the application code. Because the performance of queries scales with the size of the result set and not the size of the data set, it’s usually sufficient to use the back-reference query to follow a one-to-one relationship back to the object with the reference.

If you’d prefer not to use a query to traverse the back-reference, you could also store a reference on the second object back to the first, at the expense of having to maintain the relationship in two places. This is tricky, because the class has to be defined before it can be the subject of a db.ReferenceProperty. One option is to use db.Expando and an undeclared property for one of the classes.

A one-to-one relationship can be used to model partnership. A good use of one-to-one relationships in App Engine is to split a large object into multiple entities to provide selective access to its properties. A player might have an avatar image up to 64 kilobytes in size, but the application probably doesn’t need the 64 KB of image data every time it fetches the Player entity. You can create a separate PlayerAvatarImage entity to contain the image, and establish a one-to-one relationship by creating a reference property from the Player to the PlayerAvatarImage. The application must know to delete the related objects when deleting a Player:

class PlayerAvatarImage(db.Model):
    image_data = db.BlobProperty()
    mime_type = db.StringProperty()

class Player(db.Model):
    name = db.StringProperty()
    avatar = db.ReferenceProperty(PlayerAvatarImage)

# Fetch the name of the player (a string) and a
# reference to the avatar image (a key).
p = db.get(player_key)

# Fetch the avatar image entity and access its
# image_data property.
image_data = p.avatar.image_data

Many-to-Many Relationships

A many-to-many relationship is a type of relationship between entities of two kinds where entities of either kind can have that relationship with many entities of the other kind, and vice versa. For instance, a player may be a member of one or more guilds, and a guild can have many members.

There are several ways to implement many-to-many relationships using the datastore. Let’s consider two of them. The first method we’ll call “the key list method,” and the second we’ll call “the link model method.”

The key list method

With the key list method, you store a list of entity keys on one side of the relationship using a db.ListProperty. Such a declaration does not have any of the features of a db.ReferenceProperty such as back-references or automatic dereferencing, because it does not involve that class. To model the relationship in the other direction, you can implement the back-reference feature using a method and the Python @property decorator:

class Player(db.Model):
    name = db.StringProperty()
    guilds = db.ListProperty(db.Key)

class Guild(db.Model):
    name = db.StringProperty()

    @property
    def members(self):
        return Player.all().filter('guilds', self.key())

# Guilds to which a player belongs:
p = db.get(player_key)
guilds = db.get(p.guilds)  # batch get using list of keys
for guild in guilds:
    # ...

# Players that belong to a guild:
g = db.get(guild_key)
for player in g.members:
    # ...

Instead of manipulating the list of keys, you could implement automatic dereferencing using advanced Python techniques to extend how the values in the list property are accessed. A good way to do this is with a custom property declaration. We’ll consider this in a later section.

The key list method is best suited for situations where there are fewer objects on one side of the relationship than on the other, and the short list is small enough to store directly on an entity. In this example, many players each belong to a few guilds; each player has a short list of guilds, while each guild may have a long list of players. We put the list property on the Player side of the relationship to keep the entity small, and use queries to produce the long list when it is needed.

The link model method

The link model method represents each relationship as an entity. The relationship entity has reference properties pointing to the related classes. You traverse the relationship by going through the relationship entity via the back-references.

class Player(db.Model):
    name = db.StringProperty()

class Guild(db.Model):
    name = db.StringProperty()

class GuildMembership(db.Model):
    player = db.ReferenceProperty(Player, collection_name='guild_memberships')
    guild = db.ReferenceProperty(Guild, collection_name='player_memberships')

p = Player()
g = Guild()
db.put([p, g])

gm = GuildMembership(player=p, guild=g)
db.put(gm)

# Guilds to which a player belongs:
for gm in p.guild_memberships:
    guild_name = gm.guild.name
    # ...

# Players that belong to a guild:
for gm in g.player_memberships:
    player_name = gm.player.name
    # ...

This technique is similar to how you’d use “join tables” in a SQL database. It’s a good choice if either side of the relationship may get too large to store on the entity itself. You can also use the relationship entity to store metadata about the relationship (such as when the player joined the guild), or model more complex relationships between multiple classes.

The link model method is more expensive than the key list method. It requires fetching the relationship entity to access the related object.
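The extra hop, and the metadata it makes room for, can be seen in a small in-memory sketch. The dictionaries, the joined field, the dates, and the members_with_join_dates helper are all made up for illustration; they only mimic the shape of the GuildMembership model and are not ext.db calls.

```python
import datetime

# In-memory stand-ins for Player, Guild, and GuildMembership entities.
players = {'p1': {'name': u'Ned'}, 'p2': {'name': u'Lucky'}}
guilds = {'g1': {'name': u'Amigos'}}

# Each relationship is its own record and can carry metadata.
memberships = [
    {'player': 'p1', 'guild': 'g1', 'joined': datetime.date(2012, 1, 5)},
    {'player': 'p2', 'guild': 'g1', 'joined': datetime.date(2012, 3, 9)},
]


def members_with_join_dates(guild_key):
    """Return (player name, join date) pairs for one guild."""
    result = []
    for gm in memberships:                  # first hop: relationship records
        if gm['guild'] == guild_key:
            player = players[gm['player']]  # second hop: the related object
            result.append((player['name'], gm['joined']))
    return result


for name, joined in members_with_join_dates('g1'):
    print(name, joined.isoformat())
```

Every result costs a lookup of the membership record plus a lookup of the player, where the key list method needs only the player.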

Remember that App Engine doesn’t support SQL-style join queries on these objects. You can achieve a limited sort of join by repeating information from the data objects on the link model objects, using code on the model classes to keep the values in sync. To do this with strong consistency, the link model object and the two related objects would need to be in the same entity group, which is not always possible or practical.

If eventual consistency would suffice, you could use task queues to propagate the information. See "Task Queues and Scheduled Tasks".

Model Inheritance

In data modeling, it’s often useful to derive new kinds of objects from other kinds. The game world may contain many different kinds of carryable objects, with shared properties and features common to all objects you can carry. Since you implement classes from the data model as Python classes, you’d expect to be able to use inheritance in the implementation to represent inheritance in the model. And you can, sort of.

If you define a class based on either db.Model or db.Expando, you can create other classes that inherit from that data class, like so:

class CarryableObject(db.Model):
    weight = db.IntegerProperty()
    location = db.ReferenceProperty(Location)

class Bottle(CarryableObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()
    is_closed = db.BooleanProperty()

The subclass inherits the property declarations of the parent class. A Bottle has five property declarations: weight, location, contents, amount, and is_closed.

Objects based on the child class will be stored as entities whose kind is the name of the child class. The datastore has no notion of inheritance, and so by default will not treat Bottle entities as if they are CarryableObject entities. This is mostly significant for queries, and we have a solution for that in the next section.

If a child class declares a property already declared by a parent class, the class definition raises a db.DuplicatePropertyError. The data modeling API does not support overriding property declarations in subclasses.

A model class can inherit from multiple classes, using Python’s own support for multiple inheritance:

class PourableObject(db.Model):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()

Each parent class must not declare a property with the same name as declarations in the other parent classes, or the class definition raises a db.DuplicatePropertyError. However, the modeling API does the work to support “diamond inheritance,” where two parent classes themselves share a parent class:

class GameObject(db.Model):
    name = db.StringProperty()
    location = db.ReferenceProperty(Location)

class CarryableObject(GameObject):
    weight = db.IntegerProperty()

class PourableObject(GameObject):
    contents = db.StringProperty()
    amount = db.IntegerProperty()

class Bottle(CarryableObject, PourableObject):
    is_closed = db.BooleanProperty()

In this example, both CarryableObject and PourableObject inherit two property declarations from GameObject, and are both used as parent classes to Bottle. The model API allows this because the two properties are defined in the same class, so there is no conflict. Bottle gets its name and location declarations from GameObject.
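One way to picture why the diamond case is allowed is to track which class originally declared each property. The following is a simplified sketch of that idea using a metaclass; Property, Model, ModelMeta, and the _owners attribute are hypothetical stand-ins written for this illustration and are not how ext.db is actually implemented.

```python
class Property(object):
    """Stand-in for a property declaration (not the real db.Property)."""


class DuplicatePropertyError(Exception):
    """Stand-in for db.DuplicatePropertyError."""


class ModelMeta(type):
    def __init__(cls, name, bases, namespace):
        super(ModelMeta, cls).__init__(name, bases, namespace)
        owners = {}  # property name -> class that first declared it
        for base in bases:
            for prop, owner in getattr(base, '_owners', {}).items():
                # The same name arriving from two different declaring
                # classes is a conflict; the same declaring class is not.
                if owners.get(prop, owner) is not owner:
                    raise DuplicatePropertyError(prop)
                owners[prop] = owner
        for attr, value in namespace.items():
            if isinstance(value, Property):
                if attr in owners:
                    # A subclass may not redeclare an inherited property.
                    raise DuplicatePropertyError(attr)
                owners[attr] = cls
        cls._owners = owners


# Works in both Python 2 and Python 3.
Model = ModelMeta('Model', (object,), {})


class GameObject(Model):
    name = Property()
    location = Property()


class CarryableObject(GameObject):
    weight = Property()


class PourableObject(GameObject):
    contents = Property()
    amount = Property()


class Bottle(CarryableObject, PourableObject):
    is_closed = Property()


print(sorted(Bottle._owners))
```

Here name and location reach Bottle along two paths, but both paths lead back to GameObject, so the check passes.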

Queries and PolyModels

The datastore knows nothing of our modeling classes and inheritance. Instances of the Bottle class are stored as entities of the kind 'Bottle', with no inherent knowledge of the parent classes. It’d be nice to be able to perform a query for CarryableObject entities and get back Bottle entities and others. That is, it’d be nice if a query could treat Bottle entities as if they were instances of the parent classes, as Python does in our application code. We want polymorphism in our queries.

For this, the data modeling API provides a special base class: db.PolyModel (the PolyModel class in the google.appengine.ext.db.polymodel module). Model classes using this base class support polymorphic queries. Consider the Bottle class defined previously. Let’s change the base class of GameObject to db.PolyModel, like so:

from google.appengine.ext.db import polymodel

class GameObject(polymodel.PolyModel):
    # ...

We can now perform queries for any kind in the hierarchy, and get the expected results:

here = db.get(location_key)

q = CarryableObject.all()
q.filter('location', here)
q.filter('weight >', 100)

for obj in q:
    # obj is a carryable object that is here
    # and weighs more than 100 kilos.
    # ...

This query can return any CarryableObject, including Bottle entities. The query can use filters on any property of the specified class (such as weight from CarryableObject) or parent classes (such as location from GameObject).

Behind the scenes, db.PolyModel does three clever things differently from its cousins:

  • Objects of the class GameObject or any of its child classes are all stored as entities of the kind 'GameObject'.

  • All such objects are given a property named class that represents the inheritance hierarchy starting from the root class. This is a multivalued property, where each value is the name of an ancestor class, in order.

  • Queries for objects of any kind in the hierarchy are translated by the db.PolyModel class into queries for the base class, with additional equality filters that compare the class being queried to the class property’s values.

In short, db.PolyModel stores information about the inheritance hierarchy on the entities, then uses it for queries to support polymorphism.
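The three behaviors listed above can be imitated in a few lines of plain Python. This is only a sketch of the idea under simplifying assumptions (single inheritance, a list as the datastore); the class_path, put, and query helpers and the Fountain class are hypothetical and do not reflect the library's internals.

```python
class GameObject(object):
    @classmethod
    def class_path(cls):
        """Class names from the root of the hierarchy down to cls."""
        return [klass.__name__
                for klass in reversed(cls.__mro__)
                if issubclass(klass, GameObject)]


class CarryableObject(GameObject):
    pass


class Bottle(CarryableObject):
    pass


class Fountain(GameObject):
    pass


store = []  # stand-in for the datastore


def put(obj_class, **props):
    """Store every object under the root kind, plus its class path."""
    entity = dict(props)
    entity['kind'] = 'GameObject'
    entity['class'] = obj_class.class_path()
    store.append(entity)


def query(obj_class):
    """Query the root kind, filtering on the multivalued class property."""
    return [e for e in store
            if e['kind'] == 'GameObject'
            and obj_class.__name__ in e['class']]


put(Bottle, name=u'flask')
put(Fountain, name=u'fountain')
put(CarryableObject, name=u'rock')

print([e['name'] for e in query(CarryableObject)])
```

A query for CarryableObject matches the flask and the rock because both carry 'CarryableObject' in their class list; the fountain does not.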

Each model class that inherits directly from db.PolyModel is the root of a class hierarchy. All objects from the hierarchy are stored as entities whose kind is the name of the root class. As such, your data will be easier to maintain if you use many root classes to form many class hierarchies, as opposed to putting all classes in a single hierarchy. That way, the datastore viewer and bulk loading tools can still use the datastore’s built-in notion of entity kinds to distinguish between kinds of objects.

Creating Your Own Property Classes

The property declaration classes serve several functions in your data model:

Value validation

The model calls the class when a value is assigned to the property, and the class can raise an exception if the value does not meet its conditions.

Type conversion

The model calls the class to convert from the value type used by the app to one of the core datastore types for storage, and back again.

Default behavior

The model calls the class if no value was assigned to determine an appropriate default value.

Every property declaration class inherits from the db.Property base class. This class implements features common to all property declarations, including support for the common constructor arguments (such as required, name, and indexed). Declaration classes override methods and members to specialize the validation and type conversion routines.

Validating Property Values

Here is a very simple property declaration class. It accepts any string value, and stores it as a datastore short string (the default behavior for Python string values).

from google.appengine.ext import db

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None and not isinstance(value, self.data_type):
            raise db.BadValueError('Property %s must be a %s.' %
                                   (self.name, self.data_type.__name__))
        return value

And here is how you would use the new property declaration:

class Player(db.Model):
    player_name = PlayerNameProperty()

p = Player()
p.player_name = 'Ned Nederlander'

p.player_name = 12345  # db.BadValueError

The validate() method takes the value as an argument, and either returns the value, returns a different value, or raises an exception. The value returned by the method becomes the application-facing value for the attribute, so you can use the validate() method for things like type coercion. In this example, the method raises a db.BadValueError if the value is not a string or None. The exception message can refer to the name of the property using self.name.

The data_type member is used by the base class. It represents the core datastore type the property uses to store the value. For string values, this is basestring.

The validate() method should call the superclass’s implementation before checking its own conditions. The base class’s validator supports the required, choices, and validator arguments of the declaration constructor.

If the app does not provide a value for a property when it constructs the data object, the property starts out with a default value. This default value is passed to the validate() method during the object constructor. If it is appropriate for your property declaration to allow a default value of None, make sure your validate() method allows it.

So far, this example doesn’t do much beyond db.StringProperty. This by itself can be useful to give the property type a class for future expansion. Let’s add a requirement that player names be between 6 and 30 characters in length by extending the validate() method:

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, self.data_type):
                raise db.BadValueError('Property %s must be a %s.' %
                                       (self.name, self.data_type.__name__))
            if (len(value) < 6 or len(value) > 30):
                raise db.BadValueError(('Property %s must be between 6 and ' +
                                        '30 characters.') % self.name)

        return value

The new validation logic disallows strings with an inappropriate length:

p = Player()
p.player_name = 'Ned'    # db.BadValueError
p.player_name = 'Ned Nederlander'    # OK

p = Player(player_name = 'Ned')  # db.BadValueError

Marshaling Value Types

The datastore supports a fixed set of core value types for properties, listed in [the property types table]. A property declaration can support the use of other types of values in the attributes of model instances by marshaling between the desired type and one of the core datastore types. For example, the db.ListProperty class converts between the empty list on the app side and the condition of being unset on the datastore side.

The get_value_for_datastore() method converts the application value to the datastore value. Its argument is the complete model object, so you can access other aspects of the model when doing the conversion.

The make_value_from_datastore() method takes the datastore value and converts it to the type to be used in the application. It takes the datastore value and returns the desired object attribute value.

Say we wanted to represent player name values within the application using a PlayerName class instead of a simple string. Each player name has a surname and an optional first name. We can store this value as a single property, using the property declaration to convert between the application type (PlayerName) and a core datastore type (such as unicode).

class PlayerName(object):
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def is_valid(self):
        return (isinstance(self.first_name, unicode)
                and isinstance(self.surname, unicode)
                and len(self.surname) >= 6)

class PlayerNameProperty(db.Property):
    data_type = basestring

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            if not isinstance(value, PlayerName):
                raise db.BadValueError('Property %s must be a PlayerName.' %
                                       (self.name))

            # Let the data class have a say in validity.
            if not value.is_valid():
                raise db.BadValueError('Property %s must be a valid PlayerName.' %
                                       self.name)

            # Disallow the serialization delimiter in the first field.
            if value.surname.find('|') != -1:
                raise db.BadValueError(('PlayerName surname in property %s cannot ' +
                                        'contain a "|".') % self.name)
        return value

    def get_value_for_datastore(self, model_instance):
        # Convert the data object's PlayerName to a unicode.
        value = getattr(model_instance, self.name)
        if value is None:
            # Nothing to serialize for an unset property.
            return None
        return value.surname + u'|' + value.first_name

    def make_value_from_datastore(self, value):
        # Convert a unicode to a PlayerName.
        if value is None:
            return None
        i = value.find(u'|')
        return PlayerName(first_name=value[i+1:],
                          surname=value[:i])

And here’s how you’d use it:

p = Player()
p.player_name = PlayerName(u'Ned', u'Nederlander')

p.player_name = PlayerName(u'Ned', u'Neder|lander')
    # db.BadValueError, surname contains serialization delimiter

p.player_name = PlayerName(u'Ned', u'Neder')
    # db.BadValueError, PlayerName.is_valid() == False, surname too short

p.player_name = PlayerName('Ned', u'Nederlander')
    # db.BadValueError, PlayerName.is_valid() == False, first_name is not unicode

Here, the application value type is a PlayerName instance, and the datastore value type is that value encoded as a Unicode string. The encoding format is the surname field, followed by a delimiter, followed by the first_name field. We disallow the delimiter character in the surname using the validate() method. (Instead of disallowing it, we could also escape it in get_value_for_datastore() and unescape it in make_value_from_datastore().)

In this example, PlayerName(u'Ned', u'Nederlander') is stored as this Unicode string:

Nederlander|Ned

The datastore value puts the surname first so that the datastore will sort PlayerName values first by surname, then by first name. In general, you choose a serialization format that has the desired ordering characteristics for your custom property type. (The core type you choose also impacts how your values are ordered when mixed with other types, though if you’re modeling consistently this isn’t usually an issue.)
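A quick way to check the ordering a serialization format produces is to sort a few sample values locally. The sketch below uses Python's sorted(), which compares strings by code point; it assumes the datastore orders these stored strings the same way, and the names are sample data only.

```python
# Serialized PlayerName values in the surname|first_name format.
stored = [
    u'Nederlander|Ned',
    u'Bottoms|Dusty',
    u'Nederlander|Amy',
    u'Day|Lucky',
]

# Sorted by surname first, then by first name within a surname.
ordered = sorted(stored)
for value in ordered:
    print(value)

# The delimiter takes part in the comparison too. u'|' has a higher
# code point than the ASCII letters, so a surname that is a prefix of
# another surname sorts after the longer one.
prefix_case = sorted([u'Neder|Zed', u'Nederlander|Amy'])
print(prefix_case)
```

If that last behavior matters for your data, a delimiter with a lower code point than any allowed name character is one possible adjustment.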

If the conversion from the application type to the datastore type may fail, put a check for the conversion failure in the validate() method. This way, the error is caught when the bad value is assigned, instead of when the object is saved.
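As for the escaping alternative mentioned earlier, here is a minimal sketch of one possible scheme, written as plain functions so it can be tried outside the SDK. The function names encode_name and decode_name and the choice of backslash as the escape character are assumptions for illustration; the same logic would sit inside the two marshaling methods.

```python
def encode_name(surname, first_name):
    """Escape backslashes and delimiters in the surname, then join."""
    escaped = surname.replace(u'\\', u'\\\\').replace(u'|', u'\\|')
    return escaped + u'|' + first_name


def decode_name(value):
    """Split at the first unescaped delimiter and unescape the surname.

    Returns a (surname, first_name) tuple.
    """
    surname_chars = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == u'\\' and i + 1 < len(value):
            # An escaped character: keep the character that follows.
            surname_chars.append(value[i + 1])
            i += 2
        elif ch == u'|':
            # First unescaped delimiter: the rest is the first name.
            return u''.join(surname_chars), value[i + 1:]
        else:
            surname_chars.append(ch)
            i += 1
    raise ValueError('no delimiter found in %r' % (value,))


encoded = encode_name(u'Neder|lander', u'Ned')
print(encoded)
print(decode_name(encoded))
```

Only the surname needs escaping, because everything after the first unescaped delimiter is taken as the first name verbatim. Note that escaping changes the stored characters, so it can also change how such values sort.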

Customizing Default Values

When the app constructs a data object and does not provide a value for a declared property, the model calls the property declaration class to determine a default value. The base class implementation sets the default value to None, and allows the app to customize the default value in the model using the default argument to the declaration.

A few of the built-in declaration classes provide more sophisticated default values. For instance, if a db.DateTimeProperty was set with auto_now_add=True, the default value is the current system date and time. (db.DateTimeProperty uses get_value_for_datastore() to implement auto_now=True, so the value is updated whether or not it has a value.)

The default value passes through the validation logic after it is set. This allows the app to customize the validation logic and disallow the default value. This is what happens when required=True: the base class’s validation logic disallows the None value, which is the base class’s default value.

To specify custom default behavior, override the default_value() method. This method takes no arguments and returns the desired default value.

Here’s a simple implementation of default_value() for PlayerNameProperty:

class PlayerNameProperty(db.Property):
    # ...

    def default_value(self):
        default = super(PlayerNameProperty, self).default_value()
        if default is not None:
            return default

        return PlayerName(u'', u'Anonymous')

In this example, we call the superclass default_value() method to support the default argument to the constructor, which allows the app to override the default value in the model. If that returns None, we create a new PlayerName instance to be the default value.

Without further changes, this implementation breaks the required feature of the base class, because the value of the property is never None (unless the app explicitly assigns a None value). We can fix this by amending our validation logic to check self.required and disallow the anonymous PlayerName value if it’s True.
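The amended validation could look roughly like the following. To keep the sketch runnable without the SDK, StubProperty and BadValueError are small hypothetical stand-ins for db.Property and db.BadValueError that model only the required and default arguments, and PlayerName gains an __eq__ method (not in the earlier listing) so the anonymous value can be recognized.

```python
class BadValueError(Exception):
    """Stand-in for db.BadValueError."""


class PlayerName(object):
    def __init__(self, first_name, surname):
        self.first_name = first_name
        self.surname = surname

    def __eq__(self, other):
        return (isinstance(other, PlayerName)
                and self.first_name == other.first_name
                and self.surname == other.surname)


ANONYMOUS = PlayerName(u'', u'Anonymous')


class StubProperty(object):
    """Tiny stand-in for db.Property: only required and default."""

    def __init__(self, required=False, default=None):
        self.required = required
        self.default = default
        self.name = 'player_name'

    def default_value(self):
        return self.default

    def validate(self, value):
        if self.required and value is None:
            raise BadValueError('Property %s is required.' % self.name)
        return value


class PlayerNameProperty(StubProperty):
    def default_value(self):
        default = super(PlayerNameProperty, self).default_value()
        if default is not None:
            return default
        return ANONYMOUS

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        # Treat the anonymous placeholder like a missing value when
        # the property is required.
        if self.required and value == ANONYMOUS:
            raise BadValueError('Property %s is required.' % self.name)
        return value


optional = PlayerNameProperty()
print(optional.validate(optional.default_value()).surname)
```

With required=True, validating the default value now raises the error, which restores the behavior the base class provides for None.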

Accepting Arguments

If you want the application to be able to control the behavior of your custom property declaration class using arguments, you override the __init__() method. The method should call the superclass __init__() method to enable the features of the superclass that use arguments (like required). The Property API requires that the verbose_name argument come first, but after that all __init__() arguments are keyword values.

class PlayerNameProperty(db.Property):
    # ...

    def __init__(self, verbose_name=None,
                 require_first_name=False, **kwds):
        super(PlayerNameProperty, self).__init__(verbose_name, **kwds)
        self.require_first_name = require_first_name

    def validate(self, value):
        value = super(PlayerNameProperty, self).validate(value)
        if value is not None:
            # ...

            if self.require_first_name and not value.first_name:
                raise db.BadValueError('Property %s PlayerName needs a first_name.' %
                                       self.name)

        # ...

You’d use this feature like this:

class Player(db.Model):
    player_name = PlayerNameProperty(require_first_name=True)

p = Player(player_name=PlayerName(u'Ned', u'Nederlander'))

p.player_name = PlayerName(u'', u'Nederlander')
# db.BadValueError, first name required
# (the surname is long enough to pass is_valid(), so only the
# first-name check fails)

p = Player()
# db.BadValueError, default value PlayerName(u'', u'Anonymous') has empty first_name