RavenDB Mistake 3 – Giant Documents

Any initial training on RavenDB or any other document database will talk a lot about a concept in Domain-Driven Design called an aggregate. An aggregate is a set of domain objects that can all be contained within one other domain object, called the aggregate root. The standard modeling advice for a document database is to identify the aggregate roots in your domain model, and use those to create your documents.

Armed with this standard advice, we set out to find all the aggregates in our domain, and we made some mistakes. One of the first mistakes that we made was creating an aggregate out of what was really two entities, thereby creating a gigantic document.

To illustrate this, let’s use a standard order tracking system as an example. In an order tracking system, one probably has a set of customers, and a set of orders. Each order must be associated with a customer. Because an order cannot exist without a customer, we might be tempted to model a document in a way that looks like this:

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public List<Order> Orders { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
    public List<Item> Items { get; set; }
}

public class Item
{
    public int Id { get; set; }
    public string Description { get; set; }
    public decimal Price { get; set; }
}

This code kind of makes sense. But it totally breaks down the minute we want to find an order by ID. First of all, Raven doesn’t include a way to generate an ID for an item in a collection inside of a document, so to guarantee the uniqueness of the Id on the order, we’re forced to go to a GUID. This is just inconvenient from a usability standpoint. No customer wants to call in and have to rattle off a 32 character order id just to get shipping status!
On top of that, there is no way to get information about an order without querying an index on the customer document. Now instead of just calling _session.Load<Order>(Id) we’ve been forced to make an index on customer and query it just to grab something we already know the identity of!

Finally, our document now has the potential for boundless growth. A loyal customer’s document might end up having thousands of orders attached over the course of several years, and grow to be several megabytes. This would be neither fun to pass across the wire using HTTP, nor fun to hold in memory.

The fundamental problem here was that we put so much effort into finding aggregates, that we forgot to recognize an entity that was staring us in the face!

Remember, if an object has an identity that stays the same as the object changes, the object is an entity. Another way to put it is that an entity is identified by its Id, and not by its attributes. An entity should not exist as part of an aggregate, the root of which is another entity. The example given here is obvious (and not our real situation), but it turns out that this mistake is actually easy to make.
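As a rough sketch of the alternative (again, not our real model), the order can become its own document that points back to its customer by id. The string ids and the CustomerId property are assumptions made for illustration, and line items are assumed to have no identity outside their order, so they stay embedded.

```csharp
using System.Collections.Generic;

// Customer is its own document and no longer carries its orders.
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

// Order is an entity with its own identity, so it gets its own document.
public class Order
{
    public string Id { get; set; }

    // Hypothetical reference back to the owning customer document.
    public string CustomerId { get; set; }

    public decimal Amount { get; set; }
    public List<Item> Items { get; set; } = new List<Item>();
}

// Assumed to have no identity outside its order, so it stays embedded.
public class Item
{
    public string Description { get; set; }
    public decimal Price { get; set; }
}
```

With a shape like this, an order can be loaded directly by its own id, and a customer document stays small no matter how many orders pile up. Finding all of a customer’s orders becomes a query on CustomerId.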

It is worth noting that while this is a common mistake in Raven, it isn’t specific to Raven; it can be made with any document database.

Eventual Consistency – A Raven Specific Follow Up

After an awesome conversation with a RavenDB employee at Twin Cities Code Camp, I wanted to add a quick follow up to my eventual consistency blog post.

I illustrated the issue that eventual consistency can cause in certain user interfaces. Because I was trying to remain platform agnostic in my code camp talk, I didn’t point out a feature of RavenDB that helps a lot (and that we use at work often enough).

RavenQueryStatistics stats;
var lists = session.Query<ToDoList>()
    .Statistics(out stats)
    .Where(x => x.Name == "Maggie")
    .Customize(x => x.WaitForNonStaleResults(TimeSpan.FromSeconds(5)))
    .ToList();

As you can see, it’s possible to ask Raven to wait for non-stale results on a query.

This can solve the UI problem I illustrated. It was omitted from my talk yesterday because I was trying to avoid giving a pure Raven talk, but I think it belongs here :-).

I will give this a caveat, which is that I’ve seen some LONG index build times in Raven before. I don’t jump to this solution because the timeout puts us in a situation where no data may be returned. In the eventual consistency philosophy of ‘better stale than none’, I’d rather rework my UI to better handle eventual consistency.
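If you do want to try for fresh results first, one possible middle ground is to fall back to a stale query when the wait times out. This is only a sketch: the helper name and both delegates are hypothetical, and it assumes the client surfaces the timeout as a TimeoutException, which is worth confirming for your client version.

```csharp
using System;
using System.Collections.Generic;

public static class StaleTolerantQuery
{
    // Runs freshQuery first; if it times out, runs staleQuery instead
    // and reports through isStale that the results may be out of date.
    public static IList<T> PreferFresh<T>(
        Func<IList<T>> freshQuery,
        Func<IList<T>> staleQuery,
        out bool isStale)
    {
        try
        {
            isStale = false;
            return freshQuery();
        }
        catch (TimeoutException)
        {
            // Assumption: waiting for non-stale results past the timeout
            // throws TimeoutException rather than returning an empty list.
            isStale = true;
            return staleQuery();
        }
    }
}
```

Here freshQuery would wrap the WaitForNonStaleResults query above and staleQuery the same query without the customization. The isStale flag gives the UI a chance to tell the user the list may be behind.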

Twin Cities Code Camp 19

Thanks to anybody who came to see me talk at Twin Cities Code Camp 19 today.

Document databases talk slides:

Got documents? from Maggie Pint

It Depends – Database Administration:

It Depends from Maggie Pint

RavenDB Common Mistake 2 – Not Respecting Eventual Consistency

I had an interesting issue come up at work last week. I got to have a discussion with my designer about how my persistence layer wouldn’t allow his design to work.

The scenario was pretty basic. We were making a task tracking system as part of a single page angular app. The UI looked something like this amazing rendition:

Tasks are ordered by date due, ascending. Not all tasks will appear at once, but clicking load more tasks will load another 10 tasks until there are no more. Hitting the add task button brings up a dialog where a task is added. This is where our eventual consistency problem lies. Once that task is added, it should of course show up in the list. If it had a due date of say 11/15, then it wouldn’t show up on this screen, but the user would expect it to show up if the load more tasks button was pushed.

We have eventual consistency though. What that means is if the user adds the task, and then immediately pushes that ‘load more tasks’ button, the task may not yet be in the index, so it may not show up, even if it theoretically should.

So, the solution is easy, right? Just append the task to the list from the client.

The problem is, where does it belong in the list? It might come in on the next click of the button, or it might not come in for three more clicks, depending on how many tasks are between 10/26 and 11/15. For all of the new loads until the task is either returned from the server, or added to the list from the client, we have to track on the client whether the task is in the list, and whether it SHOULD be in the list. Then we have to make the client side determination whether to add it.

This is a lot of business logic! I sent this one back to my designer to think about.

Many document database advocates downplay eventual consistency by saying things like ‘Of course your search results can be stale!’. You know what, that’s right, my search results CAN be stale, no big deal. However, when I have a paged ordered list like this, staleness is a killer.

MDC 2015

It was awesome to see everyone for my talk at MDC 2015 today. The slides from the talk can be found on slideshare:

It Depends from Maggie Pint

Thanks everyone for coming!

Maggie

Angular Translate Custom Loaders

Breaking from my RavenDB theme, I wanted to touch on an issue I encountered with Angular Translate today.

For those who aren’t familiar, Angular Translate is a library for AngularJS that is used to change the text on your website to the user’s language of choice. It’s a library that I’m generally quite happy with, and that has worked well for our app.

Our app has an unusual requirement that almost all text in the app be customizable on a per-customer basis, along with being translatable. Because of this, we are compelled to keep all text in a database, and load it into the app via HTTP. Our app is single-instance, so this can only be done after users authenticate.

Angular Translate uses a provider called a ‘loader’ to get translation data into the app asynchronously. There are several pre-made loaders that can be used to get data from a specific URL, an API endpoint, or a file on the file system.

Because our data needed to use an HTTP request, we first tried using the Angular Translate partial loader, which is designed to make HTTP requests to store parts of a translation set. However, we encountered an interesting issue. When we called $translate.refresh() to clear all the translations and fetch the new ones for the next authenticated user, the translations did not change. Why? As it turns out, the translations were cleared from the core Angular Translate tables as expected, but they were not similarly cleared from the translate partial loader. They were still cached in that provider, waiting to be used.

The solution? Our own custom loader for Angular Translate. As it turns out, this is very easy to do.

Angular Translate expects a custom loader to be a factory which returns a function that takes one options parameter. Because anything can be injected into that factory, we were able to concentrate all of our loading logic into one factory that:

Returns the customer specific translations if the user is authenticated
Returns the generic translation set from a configuration in angular if the user is not authenticated
Clears all data and starts over when a call is made to $translate.refresh()

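    function translationLoader($http, $q, staticTranslations, appContext) {

        // Angular Translate calls the returned function with an options object;
        // options.key is the language key being requested.
        return function (options) {
            var deferred = $q.defer();

            if (!appContext.apiBasePath) {
                // No API base path (user not authenticated): use the generic set from configuration.
                deferred.resolve(staticTranslations[options.key]);
            } else {
                // Authenticated: fetch the customer specific labels over HTTP.
                $http.get(appContext.apiBasePath + '/translations/' + options.key + '/labels').then(function (res) {
                    deferred.resolve(res.data);
                }, function () {
                    // Reject so a failed request doesn't leave the promise pending forever.
                    deferred.reject(options.key);
                });
            }

            return deferred.promise;
        };

    }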

I will have to extend this if I wish to load partial translation sets instead of all of the app’s translations at the same time, but given the openness of this interface, that shouldn’t be hard to do.

Moral of the story: sometimes things don’t work as expected, and when that happens, great extensibility points in the code base make up for it.

RavenDB Common Mistake 1 – Nested List Indexes

Suppose you were making an app that tracks mobile device usage in families. You might end up with an object model that looks something like this:

public class Family
{
    public string Id { get; set; }
    public IList<Person> FamilyMembers { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; }
}

public class Device
{
    public string DeviceName { get; set; }
    public bool IsPhone { get; set; }
    public bool IsTablet { get; set; }
    public decimal UsageMinutes { get; set; }
}

Suppose you wanted to query all people with a phone. You would probably write an index something like this to start:

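using System.Linq;
using Raven.Client.Indexes; // RavenDB 3.x client namespace

public class Person_Phone : AbstractIndexCreationTask<Family>
{
    public Person_Phone()
    {
        // Walks both nested lists, so one family document can emit many entries.
        Map = families => from family in families
                          from person in family.FamilyMembers
                          from device in person.Devices
                          where device.IsPhone
                          select new
                          {
                              person.FirstName,
                              person.LastName
                          };
    }
}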

This is what the team at RavenDB dubs a fan-out index. It is an index that produces multiple entries for one document.
These indexes cause some issues inside RavenDB that can result in the server running itself out of memory. This problem has been mitigated in Raven 3 by preventing the server from generating more than a maximum number of index entries from a single document. However, depending on your situation that might just result in your having *gasp* totally wrong data!

In this case, there isn’t a great way to eliminate the fan out. There will always be multiple people in one document. That said, you CAN drastically improve the performance of this operation. We had an index like this in our code at work, and we got the server to stop running out of resources and play nice by simply denormalizing some data within the document.
By changing the person class to the following:

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; }
    public bool HasPhone { get; set; }
    public bool HasTablet { get; set; }
}

And maintaining the denormalization of HasPhone and HasTablet, we are able to change our index definition to:

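using System.Linq;
using Raven.Client.Indexes; // RavenDB 3.x client namespace

public class Person_Phone : AbstractIndexCreationTask<Family>
{
    public Person_Phone()
    {
        // Only one nested list is walked now; Devices is never touched.
        Map = families => from family in families
                          from person in family.FamilyMembers
                          where person.HasPhone
                          select new
                          {
                              person.FirstName,
                              person.LastName
                          };
    }
}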

The removal of the single level of nesting is a game-changer for RavenDB and it allowed us to get our server back on the move!
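The catch is that HasPhone and HasTablet have to be kept in sync with the Devices list. One way to do that, sketched here with a hypothetical helper (the DeviceFlags name and the idea of calling it right before storing are my assumptions), is to recompute both flags from the list itself:

```csharp
using System.Collections.Generic;
using System.Linq;

public class Device
{
    public string DeviceName { get; set; }
    public bool IsPhone { get; set; }
    public bool IsTablet { get; set; }
    public decimal UsageMinutes { get; set; }
}

public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public IList<Device> Devices { get; set; }
    public bool HasPhone { get; set; }
    public bool HasTablet { get; set; }
}

// Hypothetical helper: not part of RavenDB.
public static class DeviceFlags
{
    // Recompute the denormalized flags from the source of truth (Devices).
    public static void Refresh(Person person)
    {
        var devices = person.Devices ?? new List<Device>();
        person.HasPhone = devices.Any(d => d.IsPhone);
        person.HasTablet = devices.Any(d => d.IsTablet);
    }
}
```

Calling something like this for every family member before the family document is stored means the index never has to look inside Devices.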

The RavenDB User Story

I’m happy to say I’m back home after an excellent week at “That Conference” in Wisconsin Dells, and then a short trip to Chicago with my husband.

While I was at “That Conference” I hosted a small open spaces talk (basically a discussion group for those who aren’t familiar), about RavenDB. I met quite a few users of the technology, and it felt like everyone had the same story. It went like this:

When I first started using RavenDB I loved the .NET client. It was so easy to use. Then I put it in production and an indexing job consumed all the resources on my server. I was told I shouldn’t have made an index that way, but how was I supposed to know?

This story, and other similar “how could I have known that” moments seem to be a huge part of the RavenDB user experience. This causes some people to abandon the product, and others to be much less happy with it than they could be.

Personally, I think the real issue at hand is not a poor product. Instead, the issue is a young product that hasn’t had enough adoption to really have a set of best practices come up around it yet.

I can’t claim to be an expert in RavenDB – I’ve used it for about a year and a half, and after making quite a few mistakes with it, I can say that we are having success. That said, one thing I CAN do is share my learning experiences so that the next person doesn’t have to go through the same thing. As such, I am starting a series of blog posts on places where I went wrong with RavenDB, and where others can avoid my mistakes.