Generate index for TPH discriminator with Entity Framework Code First

When it comes to mapping inheritance hierarchies to database tables using an ORM, there are a lot of different options. Though it definitely has drawbacks, one of the most popular method is 'table-per-hierarhy' (TPH). TPH maps all the entities in the inheritance hierarchy to one table that contains all the properties of all the classes as columns, with an additional column whose purpose is to identify the actual entity type in the record. When only entities of a specific type are queried from the database (e.g. using OfType<T>()), the query translates to an additional filter on this discrimnator column.

As the discriminator plays a very important role when querying entities, it would make sense to have an index automatically built on it. Even though TPH is the default for Entity Framework and the only one supported by Entity Framework Core, neither of them adds this index. DB-savvy programmers usually add the index manually — defeating the whole purpose of having an ORM. How cool would it be to have this automatically generated, right?

From a high-level perspective, this is not a big deal. You have to do two things: find the discriminator column and then create custom migration sql generator. The second is not a big a deal, I have written custom SQL generators for other tasks before (like this or this). But getting the discriminator is not as easy as it might sound at first glance...

Getting the discriminator columns

To get the discriminator column, you have to dive deep into the model. Since the discriminator might not even be part of your model (usually it isn't, after all, that would be a violation of every basic OO principle there is), you have to look for it in the store schema definition layer and move from that layer upwards to connect it to a mapping. This little piece of code gets all the entity sets from the conceptual schema definition layer and all the types from the store schema defintion layer to start with.

var metadata = ((IObjectContextAdapter)context).ObjectContext.MetadataWorkspace;
var entitySets = metadata.GetItems<EntityContainer>(DataSpace.CSpace).Single().EntitySets;
var entityTypes = metadata.GetItems<EntityType>(DataSpace.SSpace);

Next, you have to iterate the entityTypes from the SSpace (those represent the actual tables, after all) and get the corresponding item from the CSpace:

var discriminators = new List<(string, string)>(entityTypes.Count);
foreach (var entitySet in entitySets)
{
   var mapping = metadata.GetItems<EntityContainerMapping(DataSpace.CSSpace).Single().EntitySetMappings.Single(s => s.EntitySet == entitySet);
   // ...
}

The next part is the key: if the mapping contains mappings for more than one entity, then this is potentially a hierachical mapping. Not necessarily, though (think of table splitting or other inheritance hierarchies, like TPC), but luckily we can check it easily if this is a hierachy mapping with a flag:

if (mapping.EntityTypeMappings.Count > 0)
{
  var hierarchyMapping = mapping.EntityTypeMappings.SingleOrDefault(etm => etm.IsHierarchyMapping);
  if (hierarchyMapping == null)
    continue;

If this is indeed a hierarchy mapping, we have to check for the mapping condition. If all the mappings are conditional on the same column having a different value, this is indeed a TPH mapping and that column is the discriminator:

var conditions = mapping.EntityTypeMappings.SelectMany(etm => etm.Fragments.Single().Conditions).OfType<ValueConditionMapping>().ToList();
if (conditions.Select(cc => cc.Column).Distinct().Count() > 1)
{
  Debug.WriteLine($"{mapping.EntitySet.Name} has multiple mappings one of them being a hierachy mapping, but the fragments' conditions refer more than one distinct edm property");
  continue;
}

if (conditions.Select(cc => cc.Column).Distinct().Count() < 1)
{
  Debug.WriteLine($"{mapping.EntitySet.Name} has a hierachy mapping, but none of the fragments' conditions are ValueConditionMappings");
  continue;
}

And finally, you need the table's name that can be queried from any of the entity mappings of the mapping:

var table = hierarchyMapping.Fragments.Single().StoreEntitySet;
string tableName = $"{table.Schema}.{(string)table.MetadataProperties["Table"].Value ?? table.Name}";
discriminators.Add((tableName, conditions.First().Column.Name));

Not sure how precise this solution is, but I've tested it with the default and custom-specified discriminators and it worked for both cases. Also tried to fool it with TPC, TPT, entity splitting and table splitting, but it was immune to my machinations, so I guess it works :) Of course, it's not perfect. Yet.

Drawbacks

There are drawbacks to this solution. One of them is that if you have any database initializer set up, accessing the object context will run the initializer. That's not ideal, but it's not a big deal, because the whole thing was designed to help me create a migration generator, and you can just disable the initializer.

The one thing that bothers me is that unfortunately, even if you disable the initializer this does open a connection to the database to do part of the preflight analysis of the database. Specifically, it opens a connection and queries the version of the database engine, so you have to be aware of this connection and remember to close the context, if you use it ahead-of-time in production. For a migration generator, this might be fine, since it's only part of the development, but still, it bothered me.

If you know the version of the database (and again, at design time you usually do and have a minimum version set up in the application spec) you can disable this part of the preflight check, by creating your own version of the service that queries the database version and return it manually:

public class SqlManualManifestTokenResolver : IManifestTokenResolver
{
  public string ResolvedVersion { get; }
  public SqlManualManifestTokenResolver(string version)
  {
    this.ResolvedVersion = version;
  }
  public string ResolveManifestToken(DbConnection connection) => ResolvedVersion;
}

And then apply it manually to your configuration:

public class TestContextConfiguration : DbConfiguration
  {
    public TestContextConfiguration()
    {
      SetDatabaseInitializer<TestContext>(null);
      SetManifestTokenResolver(new SqlManualManifestTokenResolver("2012"));
    }
}

If you are unsure of the manifest token that you have to specify, simply create a DefaultManifestTokenResolver in your own interface implementation and delegate the method call to it. Put a breakpoint in your method and check the value ;)

So now that we have this also disabled (again, might not bother you, but try to close the connection every time), the last thing to implement is the custom migration sql generator.

Creating a custom migration sql generator

This is fairly simple: we need the discriminators and if the column if a discriminator, we need to add an index to the script:

public class IndexedDiscriminatorGenerator : SqlServerMigrationSqlGenerator
{
  private List<(string tableName, string discriminatorColumName)> discriminatorList;

  public IndexedDiscriminatorGenerator(List<(string tableName, string discriminatorColumName)> discriminatorList)
  {
    this.discriminatorList = discriminatorList;
  }

  protected override void Generate(CreateTableOperation createTableOperation)
  {
    base.Generate(createTableOperation);            
    if (discriminatorList.Any(l => l.tableName == createTableOperation.Name))
    {
      CreateDiscriminatorIndex(discriminatorList.Single(l => l.tableName == createTableOperation.Name));
     }
  }

  private void CreateDiscriminatorIndex((string tableName, string columnName) discriminatorInfo)
  {
    using (var writer = Writer())
    {
      var str = 
               $"IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE name = N'IX_Discriminator' AND object_id = OBJECT_ID(N'{discriminatorInfo.tableName}')) "+
               $"EXECUTE('CREATE  NONCLUSTERED INDEX [IX_Discriminator] ON {discriminatorInfo.tableName} ({discriminatorInfo.tableName})')";
      writer.WriteLine(str);
      Statement(writer);
    }
  }
}   

And last but not least, don't forget to specify this as the migration generator (and don't forget to close the context for good measure either) in the constructor of your migration configuration class:

public Configuration()
{
  AutomaticMigrationsEnabled = false;
  using (var context = new TestContext())
  {
    this.SetSqlGenerator("System.Data.SqlClient", new IndexedDiscriminatorGenerator(context.GetDiscriminators()));
  }
}

So there you have it. I took me a lot of time to figure this one out, but I think it's awesome. Go ahead and check out the whole solution with a sample context on Github.