Category: Raven Mistakes

RavenDB Mistake 3 – Giant Documents

Any initial training on RavenDB or any other document database will talk a lot about a concept in Domain Driven Design called aggregate. An aggregate is a set of domain objects that can all be contained within one other domain object, called the aggregate root. The standard modeling advice for a document database is to identify the aggregate roots in your domain model, and use those to create your documents.

Armed with this standard advice, we set out to find all the aggregates in our domain, and we made some mistakes. One of the first mistakes that we made was creating an aggregate out of what was really two entities, thereby creating a gigantic document.

To illustrate this, let’s use a standard order tracking system as an example. In an order tracking system, one probably has a set of customers and a set of orders. Each order must be associated with a customer. Because an order cannot exist without a customer, we might think to model a document in a way that looks like this:

public class Customer
{
	public int Id { get; set; }
	public string Name { get; set; }
	// Every order the customer has ever placed lives inside the customer document.
	public List<Order> Orders { get; set; }
}

public class Order
{
	public int Id { get; set; }
	public decimal Amount { get; set; }
	public List<Item> Items { get; set; }
}

public class Item
{
	public int Id { get; set; }
	public string Description { get; set; }
	public decimal Price { get; set; }
}

This code does kind of make sense, but it totally breaks down the minute we want to find an order by ID. First of all, Raven doesn’t include a way to generate an ID for an item in a collection inside of a document, so to guarantee the uniqueness of the Id on the order, we’re forced to go to a GUID. This is inconvenient from a usability standpoint. No customer wants to call in and have to rattle off a 32 character order id just to get shipping status!
Worse, there is no way to get information about an order without querying an index on the customer document. Instead of just calling _session.Load<Order>(Id), we’ve been forced to make an index on customer and query it just to grab something we already know the identity of!

In addition, our document now has the potential for boundless growth. A loyal customer’s document might end up having thousands of orders attached over the course of several years, and grow to be several megabytes. This would be neither fun to pass across the wire using HTTP, nor fun to hold in memory.

The fundamental problem here was that we put so much effort into finding aggregates, that we forgot to recognize an entity that was staring us in the face!

Remember, if an object has an identity that stays the same as the object changes, the object is an entity. Another way to put it is that an entity is identified by its Id, and not by its attributes. An entity should not exist as part of an aggregate whose root is another entity. The example given here is obvious (and not our real situation), but it turns out that this mistake is actually easy to make.
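As a rough sketch of where that reasoning leads for the example above, the order can become its own document that points back at its customer by id. This is only one possible shape, not what we actually shipped: the CustomerId property, the string ids in the "orders/42" style, and the decision to drop the Id from Item are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Customer and Order are now separate documents.
public class Customer
{
    public string Id { get; set; }          // assumed id style, e.g. "customers/1"
    public string Name { get; set; }
}

public class Order
{
    public string Id { get; set; }          // assumed id style, e.g. "orders/42"
    public string CustomerId { get; set; }  // hypothetical reference to the owning customer
    public decimal Amount { get; set; }

    // Line items stay embedded: in this sketch they have no identity outside the order.
    public List<Item> Items { get; set; } = new List<Item>();
}

public class Item
{
    public string Description { get; set; }
    public decimal Price { get; set; }
}
```

With a shape like this, an order can be fetched directly with _session.Load<Order>(id), and the customer document no longer grows with every purchase.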

It is worth noting that while this is a common mistake in Raven, it is really a general mistake that can be made with any document database.

Eventual Consistency – A Raven Specific Follow Up

After an awesome conversation with a RavenDB employee at Twin Cities Code Camp, I wanted to add a quick follow up to my eventual consistency blog post.

I illustrated the issue that eventual consistency can cause in certain user interfaces. Because I was trying to remain platform agnostic in my code camp talk, I didn’t point out a feature of RavenDB that helps a lot (and that we use at work often enough).

RavenQueryStatistics stats;
var lists = session.Query<ToDoList>()
    .Statistics(out stats)
    .Where(x => x.Name == "Maggie")
    .Customize(x => x.WaitForNonStaleResults(TimeSpan.FromSeconds(5)))
    .ToList();

As you can see, it’s possible to ask Raven to wait for non-stale results on a query.

This can solve the UI problem I illustrated. It was omitted from my talk yesterday because I was trying to avoid giving a pure Raven talk, but I think it belongs here :-).

I will give this a caveat, which is that I’ve seen some LONG index build times in Raven before. I don’t jump to this solution because the timeout puts us in a situation where no data may be returned. In the eventual consistency philosophy of ‘better stale than none’, I’d rather rework my UI to better handle eventual consistency.
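If you do want to try the waiting query anyway, one way to keep the ‘better stale than none’ behavior is to fall back to a query that accepts stale results when the wait runs out. The helper below is a minimal sketch and not part of the Raven client: StaleTolerantQuery and FreshOrStale are hypothetical names, and it assumes the waiting query reports the timeout by throwing a TimeoutException, which you should verify against the client version you run.

```csharp
using System;

public static class StaleTolerantQuery
{
    // Tries the query that waits for non-stale results first.
    // If that attempt times out, runs the query that accepts stale results instead.
    // Assumption: the timeout surfaces as a TimeoutException.
    public static T FreshOrStale<T>(Func<T> waitForFresh, Func<T> allowStale)
    {
        try
        {
            return waitForFresh();
        }
        catch (TimeoutException)
        {
            return allowStale();
        }
    }
}
```

The two delegates would wrap the same query, once with the Customize call shown earlier and once without it. The cost is that a slow index makes the user wait for the full timeout before seeing the stale list.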

RavenDB Common Mistake 2 – Not Respecting Eventual Consistency

I had an interesting issue come up at work last week. I got to have a discussion with my designer about how my persistence layer wouldn’t allow his design to work.

The scenario was pretty basic. We were making a task tracking system as part of a single page angular app. The UI looked something like this amazing rendition:

Tasks are ordered by due date, ascending. Not all tasks appear at once, but clicking ‘load more tasks’ loads another 10 tasks until there are no more. Hitting the add task button brings up a dialog where a task is added. This is where our eventual consistency problem lies. Once that task is added, it should of course show up in the list. If it had a due date of, say, 11/15, then it wouldn’t show up on this screen, but the user would expect it to show up once the ‘load more tasks’ button was pushed.

We have eventual consistency though. What that means is if the user adds the task, and then immediately pushes that ‘load more tasks’ button, the task may not yet be in the index, so it may not show up, even if it theoretically should.

So, the solution is easy, right? Just append the task to the list from the client.

The problem is, where does it belong in the list? It might come in on the next click of the button, or it might not come in for three more clicks, depending on how many tasks are between 10/26 and 11/15. For all of the new loads until the task is either returned from the server, or added to the list from the client, we have to track on the client whether the task is in the list, and whether it SHOULD be in the list. Then we have to make the client side determination whether to add it.
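To give a feel for how much logic that is, here is a rough sketch of the decision the client would have to make after every load. It is written in C# to match the rest of this blog even though our real client was Angular, and every name in it (TaskItem, PendingTaskMerger, Merge, moreOnServer) is hypothetical. It also ignores edits, deletes, and more than one pending task at a time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class TaskItem
{
    public string Id { get; set; }
    public DateTime DueDate { get; set; }
}

public static class PendingTaskMerger
{
    // loaded:       tasks fetched so far, sorted by due date ascending.
    // pending:      the task the user just added, which the index may not contain yet.
    // moreOnServer: true while 'load more tasks' can still return results.
    public static List<TaskItem> Merge(
        IReadOnlyList<TaskItem> loaded, TaskItem pending, bool moreOnServer)
    {
        // The server has caught up and already returned the task.
        if (loaded.Any(t => t.Id == pending.Id))
            return loaded.ToList();

        // While more pages remain, a task due after everything loaded so far
        // belongs to a later page, so it stays hidden for now.
        bool belongsLater = moreOnServer &&
            (loaded.Count == 0 || pending.DueDate > loaded[loaded.Count - 1].DueDate);
        if (belongsLater)
            return loaded.ToList();

        // Otherwise it falls inside the visible range: insert it in due date order.
        return loaded.Concat(new[] { pending }).OrderBy(t => t.DueDate).ToList();
    }
}
```

Even this simplified version has to be re-run after every click of the button, and it has to keep the pending task around until the server finally returns it.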

This is a lot of business logic! I sent this one back to my designer to think about.

Many document database advocates downplay eventual consistency by saying things like ‘Of course your search results can be stale!’. You know what, that’s right, my search results CAN be stale, no big deal. However, when I have a paged, ordered list like this, staleness is a killer.

RavenDB Common Mistake 1 – Nested List Indexes

Suppose you were making an app that tracks mobile device usage in families. You might end up with an object model that looks something like this:

public class Family
{
    public string Id { get; set; }
    public IList<Person> FamilyMembers { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; }
}

public class Device
{
    public string DeviceName { get; set; }
    public bool IsPhone { get; set; }
    public bool IsTablet { get; set; }
    public decimal UsageMinutes { get; set; }
}

Suppose you wanted to query all people with a phone. You would probably write an index something like this to start:

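using System.Linq;
using Raven.Client.Indexes;

public class Person_Phone : AbstractIndexCreationTask<Family>
{
    public Person_Phone()
    {
        // Two nested collections: one entry comes out per (person, phone) pair.
        Map = families => from family in families
                          from person in family.FamilyMembers
                          from device in person.Devices
                          where device.IsPhone == true
                          select new {
                              person.FirstName,
                              person.LastName
                          };
    }
}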

This is what the team at RavenDB dubs a fan-out index. It is an index that produces multiple entries for one document.
These indexes cause some issues inside RavenDB that can result in the server running itself out of memory. This problem has been mitigated in Raven 3 by preventing the server from generating more than a maximum number of results for an index. However, depending on your situation that might just result in your having *gasp* totally wrong data!
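To make the fan-out concrete, the sketch below runs the same query shape as plain in-memory LINQ and counts how many entries a single family produces. It does not touch RavenDB at all and says nothing about how the server builds indexes internally. The FanOut class and CountEntries method are hypothetical names, and the model classes are trimmed-down copies so the snippet stands on its own.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Device
{
    public string DeviceName { get; set; }
    public bool IsPhone { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; } = new List<Device>();
}

public class Family
{
    public IList<Person> FamilyMembers { get; set; } = new List<Person>();
}

public static class FanOut
{
    // Same shape as the map above: one result per (person, phone) pair,
    // so a person with two phones contributes two identical entries.
    public static int CountEntries(Family family)
    {
        return (from person in family.FamilyMembers
                from device in person.Devices
                where device.IsPhone
                select new { person.FirstName, person.LastName }).Count();
    }
}
```

The count grows with people times phones inside one document, so a handful of large families can multiply the number of entries well past the number of documents.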

In this case, there isn’t a great way to eliminate the fan out. There will always be multiple people in one document. That said, you CAN drastically improve the performance of this operation. We had an index like this in our code at work, and we got the server to stop running out of resources and play nice by simply denormalizing some data within the document.
By changing the person class to the following:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; }
    public bool HasPhone { get; set; }
    public bool HasTablet { get; set; }
}

And maintaining the denormalization of HasPhone and HasTablet, we are able to change our index definition to:

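using System.Linq;
using Raven.Client.Indexes;

public class Person_Phone : AbstractIndexCreationTask<Family>
{
    public Person_Phone()
    {
        // Only one nested collection now: at most one entry per person.
        Map = families => from family in families
                          from person in family.FamilyMembers
                          where person.HasPhone == true
                          select new {
                              person.FirstName,
                              person.LastName
                          };
    }
}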

The removal of the single level of nesting is a game-changer for RavenDB and it allowed us to get our server back on the move!
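The price of this approach is keeping HasPhone and HasTablet in step with the Devices list. One simple way to do that, sketched below, is to recompute both flags in one place and call it before the family document is saved. RefreshDeviceFlags is a hypothetical method name, and this is an illustration rather than the code we run at work.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Device
{
    public string DeviceName { get; set; }
    public bool IsPhone { get; set; }
    public bool IsTablet { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; } = new List<Device>();
    public bool HasPhone { get; set; }
    public bool HasTablet { get; set; }

    // Recomputes the denormalized flags from the device list.
    // Assumption: callers invoke this after changing Devices and before saving.
    public void RefreshDeviceFlags()
    {
        HasPhone = Devices.Any(d => d.IsPhone);
        HasTablet = Devices.Any(d => d.IsTablet);
    }
}
```

If a code path forgets to call it, the index will quietly disagree with the data, so it is worth funnelling every device change through one method.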

The RavenDB User Story

I’m happy to say I’m back home after an excellent week at “That Conference” in Wisconsin Dells, and then a short trip to Chicago with my husband.

While I was at “That Conference” I hosted a small open spaces talk (basically a discussion group for those who aren’t familiar), about RavenDB. I met quite a few users of the technology, and it felt like everyone had the same story. It went like this:

When I first started using RavenDB I loved the .NET client. It was so easy to use. Then I put it in production and an indexing job consumed all the resources on my server. I was told I shouldn’t have made an index that way, but how was I supposed to know?

This story, and other similar “how could I have known that” moments seem to be a huge part of the RavenDB user experience. This causes some people to abandon the product, and others to be much less happy with it than they could be.

Personally, I think the real issue at hand is not a poor product. Instead, the issue is a young product that hasn’t had enough adoption to really have a set of best practices come up around it yet.

I can’t claim to be an expert in RavenDB – I’ve used it for about a year and a half, and after making quite a few mistakes with it, I can say that we are having success. That said, one thing I CAN do, is share my learning experiences so that the next person doesn’t have to go through the same thing. As such, I am starting a series of blog posts on places where I went wrong with RavenDB, and where others can avoid my mistakes.