Repository and unit of work ARE useful

The repository and unit of work design patterns are useful. This is a statement that I can get behind in any debate when this topic comes up. Don't worry, I'm not dogmatic; I have a couple of really good arguments (IMHO) as to why. But I'm very tired of having to reiterate them, so I decided to share them with the public. This also gives me a chance to distill my own thoughts about these patterns. I understand that this topic is controversial and can generate heated arguments, and that's fine. If you feel like it, leave a comment below.

What are these patterns?

So first, let's look at what the repository pattern is. There is a lot of literature on the subject; I think going with Martin Fowler is a choice everyone can accept :) So let's start off with the definition:

Mediates between the domain and data mapping layers using a collection-like interface for accessing domain objects.

I think it's quite clear, but there is an aspect of this definition that I would like to point out. Based on this definition, this thing is not part of the business logic; it mediates between the domain and data mapping layers. The domain layer is what I'll be calling the business logic layer. So this means that anything that comes from your domain spec or business spec has absolutely no place in the repository. If you have products in the database and you have to display the ones that are low on stock, you cannot put this filtering into the repository, because it is a business requirement. Of course, if you have queries that do not come from business logic, you can put them here (I don't really see what queries those could be, but whatever).
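
To make the low-on-stock example concrete, here is a minimal sketch of where the filter could live. Everything in it is hypothetical and only there for illustration: the Product class, the IProductRepository interface, the in-memory implementation, the StockService class, and the threshold of 5. The repository only offers collection-like access, and the business rule sits in the business logic.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain object, for illustration only.
public class Product
{
    public string Name { get; set; } = "";
    public int UnitsInStock { get; set; }
}

// Collection-like repository interface: it knows nothing about business rules.
public interface IProductRepository
{
    IQueryable<Product> Query();
    void Add(Product product);
    void Remove(Product product);
}

// In-memory stand-in so the sketch runs without a database.
public class InMemoryProductRepository : IProductRepository
{
    private readonly List<Product> items = new List<Product>();

    public IQueryable<Product> Query() => items.AsQueryable();
    public void Add(Product product) => items.Add(product);
    public void Remove(Product product) => items.Remove(product);
}

// Business logic: the "low on stock" rule lives here, not in the repository.
public class StockService
{
    // Assumed threshold; a real value would come from the business spec.
    private const int LowStockThreshold = 5;

    private readonly IProductRepository products;

    public StockService(IProductRepository products)
    {
        this.products = products;
    }

    public IReadOnlyList<Product> GetLowOnStock() =>
        products.Query().Where(p => p.UnitsInStock < LowStockThreshold).ToList();
}
```

If the threshold changes, only StockService changes; the repository and whatever sits behind it stay untouched.
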
Now bear in mind that this definition comes from a time when ORM libraries were not commonly used (or, in fact, existing) tools, so the next parts of the definition can be confusing. They are to me, and there really might be something there that could change my mind about the whole thing, but the last part is something that (to me, at least) is very clear and very important.

Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.

And this gives it away: a repository is a representation of the domain objects and the non-business operations that you can do with them. It also sneaks in the most important, core benefit of the repository: achieving a one-way dependency between the domain and data layers. If you apply the Dependency Inversion principle right, this will yield so many benefits, I can't even count them (actually, I can; this is partially what this blog post will be about).

Now, onto the unit of work:

Maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems.

The definition of the repository didn't explicitly say it, but it is obviously a good idea to create separate repositories for separate domain object types (after all, single responsibility, separation of concerns etc.). Again, obviously, a business process usually involves more than one type of business object, so we need something to coordinate the repositories and handle transactions. And that's the unit of work. The unit of work can be seen as a necessary complement to the repository pattern.
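
As a rough sketch of that coordination, the snippet below shows one unit of work owning two repositories and writing their changes out together. All names (Order, Invoice, IRepository<T>, IUnitOfWork, InMemoryUnitOfWork) are made up for this illustration, and the in-memory "store" only stands in for a real database; it does not deal with concurrency at all.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical domain objects, for illustration only.
public class Order { public int Id { get; set; } }
public class Invoice { public int OrderId { get; set; } }

public interface IRepository<T>
{
    void Add(T item);
    IReadOnlyCollection<T> GetAll();
}

// The unit of work exposes the repositories and commits their changes together.
public interface IUnitOfWork
{
    IRepository<Order> Orders { get; }
    IRepository<Invoice> Invoices { get; }
    int SaveChanges();
}

public class InMemoryUnitOfWork : IUnitOfWork
{
    // Writes registered by any repository wait here until SaveChanges is called.
    private readonly List<Action> pending = new List<Action>();
    private readonly InMemoryRepository<Order> orders;
    private readonly InMemoryRepository<Invoice> invoices;

    public InMemoryUnitOfWork()
    {
        orders = new InMemoryRepository<Order>(pending);
        invoices = new InMemoryRepository<Invoice>(pending);
    }

    public IRepository<Order> Orders => orders;
    public IRepository<Invoice> Invoices => invoices;

    // Applies every pending write and returns how many were applied.
    public int SaveChanges()
    {
        int count = pending.Count;
        foreach (var write in pending)
        {
            write();
        }
        pending.Clear();
        return count;
    }

    private sealed class InMemoryRepository<T> : IRepository<T>
    {
        private readonly List<T> store = new List<T>();
        private readonly List<Action> pending;

        public InMemoryRepository(List<Action> pending)
        {
            this.pending = pending;
        }

        // Adding only records the change; nothing is stored until SaveChanges.
        public void Add(T item) => pending.Add(() => store.Add(item));

        public IReadOnlyCollection<T> GetAll() => store.AsReadOnly();
    }
}
```

Until SaveChanges() is called, neither the order nor the invoice is visible in the store, which is the "list of objects affected by a business transaction" idea in miniature.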

Why do people think that they are not useful? Act 1

I have been part of many debates about the usefulness of these patterns. You can even find one on Stack Overflow, where I tried to defend my position. The argument usually goes like this:

Entity Framework (or substitute your favorite ORM here) already encapsulates how queries are created and executed, so it's superfluous to implement these patterns.

Let's dissect this argument. Entity Framework really does implement these patterns, if you think about it. The IDbSet<T> is the repository interface, only in generic flavor. The DbContext is the unit of work that handles transactions (SaveChanges()) and coordinates the repositories (i.e. the context has DbSet<T> properties). I don't know many other ORMs, but I'd assume the same reasoning can be applied to them as, after all, EVERY ORM IS LIKE THIS.

One problem with ORMs is that they are too big. Yes, they implement the repository and unit of work pattern, but they also serve as the Data Mapper. Let's go back to the definition of the repository a little bit again:

A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper (165), that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated.

Now when you look at the argument above, it is obvious why it is false. It is not the responsibility of the repository and unit of work patterns to implement query building, so saying that implementing them is superfluous because EF can already do query building is a flawed argument. It is true that EF also serves as a Data Mapper, so there is no need to implement a Data Mapper. And we usually don't :)

Why do people think they are not useful? Act 2

OK, so now that we have straightened out the logical errors in the argumentation, let's try to generalize it a bit. I mean, in the previous section I did declare that EF implements these patterns, so why would you implement it again? Probably because they are implemented wrong...

Again in the definition:

Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.

Why is this good? Because if you have a one-way dependency and a clean separation, then you can introduce the proper interfaces to further support dependency inversion. Depending on how far you can push this, you can arrive at the ideal state where the business logic does not depend on the actual implementation (i.e. Entity Framework) at all, but only on these interfaces. After all, this is what clean separation means. As a final step, you can refactor your solution to use dependency injection, which makes it easier to mock the data access components when you have to unit test your business logic (because you do unit test your business logic, right?). Sounds simple enough. So let's see how that works in practice. I will first refactor to use dependency injection and then add the interfaces to support proper dependency inversion.

You start off with something like this:

public class MyBusinessLogicComponent
{
   public MyBusinessLogicComponent()
   {
   }

   public IEnumerable<BusinessObject> DoBusiness()
   {
     // The business logic creates (and is tightly coupled to) the EF context itself.
     using (var context = new ApplicationContext())
     {
        return context.BusinessObjects.Where(b => !b.Property).ToList();
     }
   }
}

Next, you introduce dependency injection (and hopefully a container to manage lifetime outside the business logic, but that's out of scope here; pun intended):

public class MyBusinessLogicComponent
{   
   private readonly ApplicationContext context;
   public MyBusinessLogicComponent(ApplicationContext context)
   {     
     this.context = context;
   }
   
   public IEnumerable<BusinessObject> DoBusiness() => context.BusinessObjects.Where(b=>!b.Property).ToList();       
}

Next, you have to add the proper interfaces to the data access components. You start with your context (I'm assuming Code First, because it's 2018):

public class ApplicationContext : DbContext
{
  public DbSet<BusinessObject> BusinessObjects { get; set; }
}

First, you define an interface for the context:

public interface IApplicationContext
{
  DbSet<BusinessObject> BusinessObjects { get; }
  int SaveChanges();
  Task<int> SaveChangesAsync();
}

You add this to your original ApplicationContext and then inject it into the business logic:

public class MyBusinessLogicComponent
{   
   private readonly IApplicationContext context;
   public MyBusinessLogicComponent(IApplicationContext context)
   {     
     this.context = context;
   }
   
   public IEnumerable<BusinessObject> DoBusiness() => context.BusinessObjects.Where(b=>!b.Property).ToList();       
}

But this is not OK. The separation is not perfect, because the interface contains a reference to the DbSet<T> type. Since this is an EF-specific type, you cannot use it in your interface. You have to use a general .NET interface. You could try IQueryable<T>. This would change the interface and the implementation:

public class ApplicationContext : DbContext, IApplicationContext
{
  public IQueryable<BusinessObject> BusinessObjects { get; set; }
}

public interface IApplicationContext
{
  IQueryable<BusinessObject> BusinessObjects { get; }
  int SaveChanges();
  Task<int> SaveChangesAsync();
}

This seems OK, but there is a problem with it: EF only recognizes property types of IDbSet<T> as mapped tables. So you're screwed. You need the DbSet<T> for EF to work, but the IQueryable<T> for the interface to work. You have to use a dirty trick like explicit interface implementation to make this work:

public class ApplicationContext : DbContext, IApplicationContext
{
  public DbSet<BusinessObject> BusinessObjects { get; set; }
  IQueryable<BusinessObject> IApplicationContext.BusinessObjects => BusinessObjects;
  int IApplicationContext.SaveChanges() => this.SaveChanges();
  Task<int> IApplicationContext.SaveChangesAsync() => this.SaveChangesAsync();
}

(Actually, I have to look into how properties are mapped and see if I can somehow make EF recognize properties of other design-time types as well; but for now, I don't think this is possible).

And this works. Now the interface has no EF dependencies, so you have separated EF completely from your business logic. But I have to be honest, I don't really like explicit interface implementation. It circumvents polymorphism, and I mostly see it in code only to support backward compatibility (like implementing the generic and non-generic version of the same interface).
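
To illustrate the unit testing benefit of an EF-free interface, here is a minimal sketch of a hand-written fake. FakeApplicationContext is a hypothetical name, and the shape of BusinessObject (a single bool called Property) is only guessed from how it is used in the snippets above. The interface and the business logic component are repeated so that the sketch stands on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Assumed shape of the domain object, inferred from the !b.Property filter.
public class BusinessObject
{
    public bool Property { get; set; }
}

// The EF-free interface from above.
public interface IApplicationContext
{
    IQueryable<BusinessObject> BusinessObjects { get; }
    int SaveChanges();
    Task<int> SaveChangesAsync();
}

public class MyBusinessLogicComponent
{
    private readonly IApplicationContext context;

    public MyBusinessLogicComponent(IApplicationContext context)
    {
        this.context = context;
    }

    public IEnumerable<BusinessObject> DoBusiness() =>
        context.BusinessObjects.Where(b => !b.Property).ToList();
}

// Hypothetical fake for tests: backed by a plain list, no EF and no database.
public class FakeApplicationContext : IApplicationContext
{
    private readonly List<BusinessObject> data;

    public FakeApplicationContext(IEnumerable<BusinessObject> data)
    {
        this.data = data.ToList();
    }

    public IQueryable<BusinessObject> BusinessObjects => data.AsQueryable();

    // Nothing is persisted, so both methods report zero affected rows.
    public int SaveChanges() => 0;
    public Task<int> SaveChangesAsync() => Task.FromResult(0);
}
```

A test can now hand a FakeApplicationContext with a few known objects to MyBusinessLogicComponent and check what DoBusiness() returns, without ever touching Entity Framework.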

So you have a dirty trick to make it work, but that's the only way you can achieve full separation. And since you don't own the code of DbSet<T> like you do that of your context, you cannot do anything about it. All you can do is wrap these components into new components whose source code you control, add interfaces yourself and then inject those. And that's why you need the Repository and Unit of work patterns.

Q.E.D. :)
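
For completeness, this is roughly what such wrappers could look like. Treat it as a sketch under assumptions, not as the one true implementation: it assumes Entity Framework 6 (the System.Data.Entity namespace, so the EntityFramework package has to be referenced), and the names IRepository<T>, IUnitOfWork, EfRepository<T> and EfUnitOfWork are my own inventions. The Id key on BusinessObject is also assumed, because EF needs a key to map the entity.

```csharp
using System.Data.Entity; // assumption: EF6
using System.Linq;

public class BusinessObject
{
    public int Id { get; set; }        // assumed key so EF can map the entity
    public bool Property { get; set; }
}

public class ApplicationContext : DbContext
{
    public DbSet<BusinessObject> BusinessObjects { get; set; }
}

// EF-free interfaces: this is all the business logic gets to see.
public interface IRepository<T> where T : class
{
    IQueryable<T> Query();
    void Add(T entity);
    void Remove(T entity);
}

public interface IUnitOfWork
{
    IRepository<BusinessObject> BusinessObjects { get; }
    int SaveChanges();
}

// Thin wrapper around DbSet<T>; we own this code, so we can put an interface on it.
public class EfRepository<T> : IRepository<T> where T : class
{
    private readonly DbSet<T> set;

    public EfRepository(DbContext context)
    {
        set = context.Set<T>();
    }

    public IQueryable<T> Query() => set;
    public void Add(T entity) => set.Add(entity);
    public void Remove(T entity) => set.Remove(entity);
}

// Thin wrapper around the context: coordinates the repositories and the commit.
public class EfUnitOfWork : IUnitOfWork
{
    private readonly ApplicationContext context;

    public EfUnitOfWork(ApplicationContext context)
    {
        this.context = context;
        BusinessObjects = new EfRepository<BusinessObject>(context);
    }

    public IRepository<BusinessObject> BusinessObjects { get; }

    public int SaveChanges() => context.SaveChanges();
}
```

The business logic would take an IUnitOfWork in its constructor, and neither of the two interfaces mentions a single EF type, so no explicit interface implementation is needed.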

Of course, first you have to decide if you need full separation. But there are two benefits to full separation that I'd urge you to consider:

  • Clean code. I mean, this has no direct advantage, but if you are a craftsman, you should appreciate it.
  • Interchangeable ORM. If you later on decide to change your ORM completely, you do not have to change anything in your business logic. This means you shouldn't have to re-test it either. Of course, how often do you switch ORMs? Not often. Unless of course the ORM that you use has a new, lightweight, cross-platform, open-source and more efficient edition that you can use... :)