Extending Linq2Objects GroupBy

Akos Nagy
Nov 15, 2017

GroupBy() is one of the most versatile and underrated Linq standard query operators there is. You probably know the "base" version, which takes a single parameter in the form of an expression (actually a Func, but that's not the point):

public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector);

This gives you an IEnumerable of IGrouping; a list of lists, if you like. The outer list contains exactly as many elements as there are distinct key values in the original dataset. Each of those elements (the inner lists) is an IGrouping: it has a Key property that holds one of the key values from the dataset, and its elements are the items from the original dataset that belong to that key value.
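To make the "list of lists" picture concrete, here is a minimal sketch. The class and method names (BasicGroupByDemo, Describe) and the idea of grouping words by length are made up for illustration; only GroupBy() and IGrouping come from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BasicGroupByDemo
{
    // Groups words by their length and describes each group as "key: members".
    public static List<string> Describe(IEnumerable<string> words)
    {
        var lines = new List<string>();
        foreach (IGrouping<int, string> grouping in words.GroupBy(w => w.Length))
        {
            // grouping.Key is the shared key; enumerating the grouping yields its members.
            lines.Add(grouping.Key + ": " + string.Join(", ", grouping));
        }
        return lines;
    }
}
```

Calling Describe with "cat", "dog" and "horse" gives two entries, "3: cat, dog" and "5: horse": one group per distinct length, each carrying its Key and its members.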

But this is not the only way GroupBy() can be used. There is another overload that takes a transformer function, which lets you specify what goes into the inner lists. The grouping still happens on the original dataset, but instead of the original elements, the transformed elements are placed into the inner lists.

public static IEnumerable<IGrouping<TKey, TElement>> GroupBy<TSource, TKey, TElement>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TSource, TElement> elementSelector);
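A small sketch of the elementSelector overload in action. The names ElementSelectorDemo and LengthsByInitial are hypothetical, and the example assumes that every word is non-empty.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ElementSelectorDemo
{
    // Groups words by their first letter, but stores only each word's length in the groups.
    // Assumes non-empty words, since w[0] throws on an empty string.
    public static Dictionary<char, List<int>> LengthsByInitial(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w[0], w => w.Length)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

For "apple", "avocado" and "banana", the group for 'a' holds 5 and 7 and the group for 'b' holds 6: the keys come from the original words, the group contents from the transformer.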

Another overload gives you even more control: instead of transforming the elements that are placed into the inner lists, you can transform each key together with its group into a result of your choosing, so the output is no longer a sequence of IGroupings.

public static IEnumerable<TResult> GroupBy<TSource, TKey, TResult>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TKey, IEnumerable<TSource>, TResult> resultSelector);
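Here is a minimal sketch of the resultSelector overload, where each key and its group are collapsed into a single string. The names ResultSelectorBasicsDemo and CountByLength are made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ResultSelectorBasicsDemo
{
    // Turns every (key, group) pair into one summary string.
    public static List<string> CountByLength(IEnumerable<string> words)
    {
        return words
            .GroupBy(w => w.Length, (length, items) => length + " letters: " + items.Count())
            .ToList();
    }
}
```

For "cat", "dog" and "horse" the result is "3 letters: 2" followed by "5 letters: 1". Note that the resultSelector receives the key itself, which is exactly why the type of the key matters in what follows.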

And of course, there is one that combines the elementSelector and resultSelector versions into one. This is all nice and shiny, but...

Extending GroupBy()

... but this uses the default equality semantics of the key's type. If the key is an int, that's fine, but if the key is something more complex (because you want to use it in the resultSelector part of one of the overloads), you are screwed: a class that doesn't override Equals() and GetHashCode() is compared by reference. Of course, each overload has a counterpart that also takes an implementation of IEqualityComparer<TKey>, which takes over the responsibility of comparing keys and computing their hash codes.
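A quick way to see the problem, as a sketch. Owner and Car are hypothetical model types (the post only ever mentions cars, an Owner and a Name), and DefaultEqualityDemo is made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types, used only for illustration.
public class Owner { public string Name { get; set; } }
public class Car { public Owner Owner { get; set; } }

public static class DefaultEqualityDemo
{
    public static int CountGroups()
    {
        // Two distinct Owner instances that stand for the same person.
        var cars = new List<Car>
        {
            new Car { Owner = new Owner { Name = "Alice" } },
            new Car { Owner = new Owner { Name = "Alice" } },
        };

        // Owner does not override Equals/GetHashCode, so reference equality applies
        // and each instance becomes its own key.
        return cars.GroupBy(c => c.Owner).Count();
    }
}
```

CountGroups() returns 2, even though a human would say both cars belong to the same owner.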

The problem is that this might require quite a lot of boilerplate: create a new class, implement the interface (which usually boils down to comparing a single property), and then pass in an instance of this new class. Instead of this, how cool would it be to just pass in a func, and use that func for equality and hash code calculation?
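For reference, this is roughly what that boilerplate looks like. It is only a sketch: Owner, Car, OwnerNameComparer and HandWrittenComparerDemo are hypothetical names, and comparing names with ordinal string equality is an assumption made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types, used only for illustration.
public class Owner { public string Name { get; set; } }
public class Car { public Owner Owner { get; set; } }

// A whole class just to say "compare owners by Name".
public class OwnerNameComparer : IEqualityComparer<Owner>
{
    public bool Equals(Owner x, Owner y)
    {
        return string.Equals(x?.Name, y?.Name);
    }

    public int GetHashCode(Owner obj)
    {
        return obj?.Name?.GetHashCode() ?? 0;
    }
}

public static class HandWrittenComparerDemo
{
    public static int CountGroups(IEnumerable<Car> cars)
    {
        // Built-in overload: keySelector plus an IEqualityComparer<TKey>.
        return cars.GroupBy(c => c.Owner, new OwnerNameComparer()).Count();
    }
}
```

With two cars whose owners are separate instances that are both named "Alice", CountGroups returns 1. It works, but every new key type needs another class like this.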

It would be very cool. At least I think so, so I made it happen.
First, I created a special implementation of the IEqualityComparer interface that stores a func and calls it for equality checking and hash code calculation:

public class FuncEqualityComparer<T, TValue> : IEqualityComparer<T>
{
    private readonly Func<T, TValue> extractorFunc;
    public FuncEqualityComparer(Func<T, TValue> extractorFunc)
    {
        this.extractorFunc = extractorFunc;
    }
    // Two items are equal when the values extracted from them are equal.
    public bool Equals(T x, T y)
    {
        return extractorFunc(x).Equals(extractorFunc(y));
    }
    // Hash the extracted value, so that equal items get equal hash codes.
    public int GetHashCode(T obj)
    {
        return extractorFunc(obj).GetHashCode();
    }
}
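One thing to keep in mind with the comparer above: it calls Equals() and GetHashCode() directly on the extracted value, so it throws if the extractor returns null. A possible variant, sketched under the assumption that null values should simply count as equal to each other, goes through EqualityComparer<TValue>.Default instead. The name NullSafeFuncEqualityComparer is made up for this sketch and is not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of FuncEqualityComparer; the name is made up for this sketch.
public class NullSafeFuncEqualityComparer<T, TValue> : IEqualityComparer<T>
{
    private readonly Func<T, TValue> extractorFunc;
    private readonly IEqualityComparer<TValue> valueComparer = EqualityComparer<TValue>.Default;

    public NullSafeFuncEqualityComparer(Func<T, TValue> extractorFunc)
    {
        if (extractorFunc == null) throw new ArgumentNullException(nameof(extractorFunc));
        this.extractorFunc = extractorFunc;
    }

    // The default comparer treats two null values as equal instead of throwing.
    public bool Equals(T x, T y)
    {
        return valueComparer.Equals(extractorFunc(x), extractorFunc(y));
    }

    // The default comparer returns 0 for a null value.
    public int GetHashCode(T obj)
    {
        return valueComparer.GetHashCode(extractorFunc(obj));
    }
}
```

This only covers null values coming out of the extractor. If the item itself can be null, the extractor func still has to cope with that on its own.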

Then I created new overloads for the GroupBy() methods that take a func as a parameter instead of the IEqualityComparer, wrap the func in an instance of FuncEqualityComparer, and finally call the overload that actually requires the IEqualityComparer:

public static class IEnumerableExtensions
{
  public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey, TExtractor>(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector, Func<TKey, TExtractor> extractorFunc)
    => source.GroupBy(keySelector, new FuncEqualityComparer<TKey, TExtractor>(extractorFunc));
}

And now, you can write something like this:

cars.GroupBy(c=>c.Owner, o=>o.Name);

Now this might not be that awesome (because you can simply write c=>c.Owner.Name as the first parameter), but if you want to use the key in the resultSelector overload, it makes a huge difference whether it is just the Name or the entire object :)
And now you can use the Owner as the key, which leaves you free to use it in the resultSelector overload, while simply specifying how to compare two Owners:

cars.GroupBy(c=>c.Owner, (owner,ownedCars) => new { Owner = owner, NoOfCars=ownedCars.Count() }, o=>o.Name);
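The call above needs a resultSelector overload that the snippets in this post don't show. Here is a minimal sketch of what it could look like, together with a complete usage example. The exact shape of the overload is an assumption, and the version in the linked source code may differ. Car, Owner and ResultSelectorDemo are hypothetical types made up for the example, and the comparer is repeated only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model types, used only for illustration.
public class Owner { public string Name { get; set; } }
public class Car { public Owner Owner { get; set; } }

// The comparer from the post, repeated so the sketch is self-contained.
public class FuncEqualityComparer<T, TValue> : IEqualityComparer<T>
{
    private readonly Func<T, TValue> extractorFunc;

    public FuncEqualityComparer(Func<T, TValue> extractorFunc)
    {
        this.extractorFunc = extractorFunc;
    }

    public bool Equals(T x, T y)
    {
        return extractorFunc(x).Equals(extractorFunc(y));
    }

    public int GetHashCode(T obj)
    {
        return extractorFunc(obj).GetHashCode();
    }
}

public static class IEnumerableExtensions
{
    // Assumed shape of the resultSelector overload; the real one may differ.
    public static IEnumerable<TResult> GroupBy<TSource, TKey, TResult, TExtractor>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TKey, IEnumerable<TSource>, TResult> resultSelector,
        Func<TKey, TExtractor> extractorFunc)
    {
        // Delegate to the built-in overload that takes an IEqualityComparer<TKey>.
        return Enumerable.GroupBy(source, keySelector, resultSelector,
            new FuncEqualityComparer<TKey, TExtractor>(extractorFunc));
    }
}

public static class ResultSelectorDemo
{
    public static List<string> CarsPerOwner(IEnumerable<Car> cars)
    {
        // The key stays a full Owner object; only the comparison uses the Name.
        return cars
            .GroupBy(c => c.Owner, (owner, ownedCars) => owner.Name + ": " + ownedCars.Count(), o => o.Name)
            .ToList();
    }
}
```

Given three cars owned by "Alice", "Bob" and a second, separate "Alice" instance, CarsPerOwner returns "Alice: 2" and "Bob: 1". The resultSelector still receives the whole Owner, so any of its other properties remain available there.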

I think that's nice :) Source code on GitHub, have fun ;)
