Any initial training on RavenDB or any other document database will talk a lot about a concept from Domain-Driven Design called an aggregate. An aggregate is a set of domain objects that can all be contained within one other domain object, called the aggregate root. The standard modeling advice for a document database is to identify the aggregate roots in your domain model, and use those to create your documents.
Armed with this standard advice, we set out to find all the aggregates in our domain, and we made some mistakes. One of the first mistakes that we made was creating an aggregate out of what were really two entities, thereby creating a gigantic document.
To illustrate this, let’s use a standard order tracking system as an example. In an order tracking system, one probably has a set of customers and a set of orders. Each order must be associated with a customer. Because an order cannot exist without a customer, we might be tempted to model a document in a way that looks like this:
```csharp
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    // Every order the customer has ever placed lives inside the customer document.
    public List<Order> Orders { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
    public List<Item> Items { get; set; }
}

public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}
```
This code kind of makes sense. But it totally breaks down the minute we want to find an order by ID. First of all, Raven doesn’t include a way to generate an ID for a collection inside of a document, so to guarantee the uniqueness of the Id on the order, we’re forced to go to a GUID. This is just inconvenient from a usability standpoint. No customer wants to call in and have to rattle off a 32-character order ID just to get shipping status!
Second, there is no way to get information about an order without querying an index on the customer document. Now, instead of just calling _session.Load<Order>(Id), we’ve been forced to make an index on customer and query it just to grab something we already know the identity of!
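To make the lookup problem concrete, here is a minimal in-memory sketch. It does not use the Raven client at all; `NestedOrderLookup` and `FindOrder` are hypothetical names invented for this illustration, and the order Id is a GUID string as described above. It only shows the shape of the work: knowing the order's identity is not enough, because every customer has to be searched.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Order
{
    // GUID string, because a nested order gets no generated ID of its own.
    public string Id { get; set; }
    public decimal Amount { get; set; }
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; } = new List<Order>();
}

public static class NestedOrderLookup
{
    // Hypothetical helper: with orders embedded in customers, the order ID
    // cannot be used as a key. We have to look through the customers
    // (in Raven, that means querying an index over the customer documents).
    // Returns null when no customer contains the order.
    public static Order FindOrder(IEnumerable<Customer> customers, string orderId)
    {
        return customers
            .SelectMany(c => c.Orders)
            .FirstOrDefault(o => o.Id == orderId);
    }
}
```

Compare this with a plain load by key, which needs no search at all.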
Finally, our document now has the potential for unbounded growth. A loyal customer’s document might end up having thousands of orders attached over the course of several years, and grow to be several megabytes. This would be neither fun to pass across the wire using HTTP, nor fun to hold in memory.
The fundamental problem here was that we put so much effort into finding aggregates that we forgot to recognize an entity that was staring us in the face!
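One rough way to get a feel for this growth is to serialize a customer with a varying number of embedded orders and compare the sizes. The sketch below assumes System.Text.Json is available and uses made-up data; `DocumentGrowth` and `SerializedSize` are hypothetical names, and the numbers it produces only indicate the trend, not what Raven would actually store.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

public class Order
{
    public string Id { get; set; }
    public decimal Amount { get; set; }
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; } = new List<Order>();
}

public static class DocumentGrowth
{
    // Builds one customer with the requested number of embedded orders
    // (all data is invented) and returns the size in bytes of its JSON form.
    public static int SerializedSize(int orderCount, int itemsPerOrder)
    {
        var customer = new Customer
        {
            Id = 1,
            Name = "Loyal Customer",
            Orders = Enumerable.Range(1, orderCount).Select(_ => new Order
            {
                Id = Guid.NewGuid().ToString("N"),
                Amount = 9.99m * itemsPerOrder,
                Items = Enumerable.Range(1, itemsPerOrder).Select(i => new Item
                {
                    Id = i,
                    Description = "Item " + i,
                    Price = 9.99m
                }).ToList()
            }).ToList()
        };

        return JsonSerializer.SerializeToUtf8Bytes(customer).Length;
    }
}
```

Calling `DocumentGrowth.SerializedSize` with, say, 10 orders and then 5,000 orders shows the single customer document growing along with the order history.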
Remember, if an object has an identity that stays the same as the object changes, the object is an entity. Another way to put it is that an entity is identified by its Id, and not by its attributes. An entity should not exist as part of an aggregate whose root is another entity. The example given here is obvious (and not our real situation), but it turns out that this mistake is actually easy to make.
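For the order tracking example, one way to act on this is to treat Order as its own document that merely refers to its customer. The sketch below is an assumption about how that might look, not our actual model: the `CustomerId` property, the `SplitModelSketch` class, and the dictionary standing in for a document store are all invented for illustration and are not part of the Raven API.

```csharp
using System.Collections.Generic;

// Sketch only: Customer and Order are each meant to be stored as their own document.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int Id { get; set; }
    // Assumed reference back to the owning customer, replacing the nested list.
    public int CustomerId { get; set; }
    public decimal Amount { get; set; }
    // Items still live inside the order document.
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

public static class SplitModelSketch
{
    // In-memory stand-in for loading a document by its key. With Order as a
    // document of its own, this is a direct lookup rather than a search.
    // Returns null when no order has the given ID.
    public static Order LoadOrder(IDictionary<int, Order> ordersById, int id)
    {
        return ordersById.TryGetValue(id, out var order) ? order : null;
    }
}
```

With this shape, the customer document stays small no matter how many orders accumulate, and an order whose identity is already known can be fetched by that identity alone, which is what a call like _session.Load<Order>(Id) expects.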
It is worth noting that although this is a common mistake in Raven, it is a general mistake that can be made with any document database.