Implementing the IFormatter interface

One of the courses I teach is the standard Microsoft 70-483 exam preparation course. For some reason, the exam still has questions about how the BinaryFormatter and SoapFormatter components work (I'm not even going to try to guess why), so the training material also discusses the related concepts in detail.

The course goes on at length about how these formatter-serializers work. It covers the usual attributes like SerializableAttribute and NonSerializedAttribute, then the ISerializable interface, serializer surrogates and binders, the serialization lifecycle attributes like OnSerializingAttribute, and finally IFormatter, the base interface for all formatters. And then comes an example that implements the IFormatter interface for the ini format, without the smallest regard for the concepts and algorithm discussed just 10 minutes earlier in the course. No checking for the attribute, no checking for the ISerializable interface, no lifecycle methods, nothing. Just a basic implementation using FormatterServices. This doesn't help the students understand the role of the discussed concepts at all. Worse, it actually confuses them by omitting these from the process.

So I decided to create a more comprehensive implementation to show how the related API works. The full code can be accessed as one of my gists. Here's a breakdown of how it works.

Implementing the IFormatter

Implementing the IFormatter at a high level is actually pretty easy: you need the binder, the streaming context, the surrogate selector, and a method to serialize and another to deserialize:

using System;
using System.IO;
using System.Runtime.Serialization;

public class IniFormatter : IFormatter
{
    public ISurrogateSelector SurrogateSelector { get; set; }
    public StreamingContext Context { get; set; }
    public SerializationBinder Binder { get; set; }

    public void Serialize(Stream serializationStream, object graph)
    {
        throw new NotImplementedException();
    }

    public object Deserialize(Stream serializationStream)
    {
        throw new NotImplementedException();
    }
}

Of course, the tricky part is implementing the methods. I haven't had the time yet to implement the deserialization properly, but the serialization works pretty well.

Implementing serialization

So the basic algorithm to implement serialization is not that hard, but you have to account for all the possible APIs that can do the actual serialization.

  1. Check to see if there is a surrogate defined in the formatter for the type. If so, use that.
  2. Check to see if the object to be serialized is serializable and implements ISerializable. If so, call its own implementation through that.
  3. Check to see if the object to be serialized is serializable. If so, use FormatterServices.
  4. Don't forget to call the lifecycle methods.

I actually looked into the source code of BinaryFormatter to see how these different options are prioritized. In the end, the high-level code looks something like this:

public void Serialize(Stream serializationStream, object graph)
{
  var objectType = graph.GetType();
  // A surrogate registered for the type takes priority over everything else.
  var serializationSurrogate = SurrogateSelector?.GetSurrogate(objectType, Context, out var _);
  if (serializationSurrogate != null)
    SerializeWithSurrogate(serializationStream, graph, objectType, serializationSurrogate);
  else if (graph is ISerializable serializable)
    SerializeAsISerializable(serializationStream, graph, objectType, serializable);
  else
    SerializeWithFormatterServices(serializationStream, graph, objectType);

  // Runs after every branch, not only after the last one.
  GetCallbackDelegate(objectType, typeof(OnSerializedAttribute))
    ?.DynamicInvoke(graph, Context);
}

Serializing using a surrogate

So the first option that the algorithm checks is the surrogate-based serialization mechanism. This is actually very simple: you create a SerializationInfo object and pass it to the surrogate, which fills it with the values to serialize.

private void SerializeWithSurrogate(Stream serializationStream, object graph, Type objectType, ISerializationSurrogate serializationSurrogate)
{
  var serializationInfo = new SerializationInfo(objectType, new FormatterConverter());
  serializationSurrogate.GetObjectData(graph, serializationInfo, Context);
  SerializeFromSerializationInfo(serializationStream, graph, serializationInfo);
}
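
To make the surrogate side of this more concrete, here is a minimal sketch of what a surrogate and its registration could look like. The Point type, the PointSurrogate class and the SurrogateDemo helper are hypothetical names made up for this illustration; they are not part of the gist.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical type that is not marked [Serializable] and that we cannot change.
public class Point
{
    public int X;
    public int Y;
}

// Hypothetical surrogate that knows how to read and restore a Point.
public class PointSurrogate : ISerializationSurrogate
{
    // Called by the formatter during serialization: copy the state into the info.
    public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
    {
        var point = (Point)obj;
        info.AddValue("X", point.X);
        info.AddValue("Y", point.Y);
    }

    // Called by the formatter during deserialization: copy the state back.
    public object SetObjectData(object obj, SerializationInfo info, StreamingContext context, ISurrogateSelector selector)
    {
        var point = (Point)obj;
        point.X = info.GetInt32("X");
        point.Y = info.GetInt32("Y");
        return point;
    }
}

public static class SurrogateDemo
{
    // Builds a selector with PointSurrogate registered for the Point type.
    public static SurrogateSelector CreateSelector()
    {
        var selector = new SurrogateSelector();
        selector.AddSurrogate(
            typeof(Point),
            new StreamingContext(StreamingContextStates.All),
            new PointSurrogate());
        return selector;
    }
}
```

Assigning the result of SurrogateDemo.CreateSelector() to the formatter's SurrogateSelector property should be enough for the first branch of Serialize to pick up the surrogate for Point objects.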

When the info is ready, the serialization itself is pretty easy. First, you have to write the type's name to the stream. To do this, you use your serialization binder, whose job is to map the type to the names of its assembly and type:

private void WriteTypeName(Type objectType, StreamWriter sw)
{
  Binder.BindToName(objectType, out var assemblyName, out var typeName);
  sw.WriteLine(typeName);
  sw.WriteLine(assemblyName);
}

The binder itself in this case is a specially implemented binder for the ini formatter, implemented as a nested class:

public class IniTypeBinder : SerializationBinder
{
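  // Split on the first '=' only: assembly-qualified names contain '=' themselves (Version=..., Culture=...).
  public override Type BindToType(string assemblyName, string typeName) => Type.GetType(typeName.Split(new[] { '=' }, 2)[1]);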
  
  public override void BindToName(Type serializedType, out string assemblyName, out string typeName)
  {
    assemblyName = $"{IniFormatter.AssemblyNameKey}={serializedType.Assembly.FullName}";
    typeName = $"{IniFormatter.ClassNameKey}={serializedType.AssemblyQualifiedName}";
  }
}

With the binder in place, you can write out the type info and the actual data from the object:

GetCallbackDelegate(graph.GetType(), typeof(OnSerializingAttribute))?.DynamicInvoke(graph, Context);
using (var sw = new StreamWriter(serializationStream))
{
  WriteTypeName(graph.GetType(), sw);
  foreach (var item in serializationInfo)
  {
    // Interpolating the value directly also copes with null values.
    sw.WriteLine($"{item.Name}={item.Value}");
  }
}

This is where one of the lifecycle methods should be called (discussed later in this post). The object is then serialized by writing the type name and the values from the SerializationInfo object as key-value pairs. This is the part that you have to define yourself as part of your actual serialization protocol.

Serializing as an ISerializable implementation

If there are no surrogates but the type implements ISerializable, the algorithm is basically the same. The only difference is that instead of the surrogate, it is the actual object that fills the SerializationInfo:

private void SerializeAsISerializable(Stream serializationStream, object graph, Type objectType, ISerializable serializable)
{
  if (!objectType.IsSerializable)
    throw new SerializationException($"Type {objectType} is not serializable");

  var serializationInfo = new SerializationInfo(objectType, new FormatterConverter());
  serializable.GetObjectData(serializationInfo, Context);
  SerializeFromSerializationInfo(serializationStream, graph, serializationInfo);
}

Note that you do have to check if the type is serializable here as well (if you want to mimic BinaryFormatter).
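
For reference, this is roughly what the other side of that call looks like. The sketch below shows a hypothetical Temperature type (not part of the gist) that implements ISerializable: it decides in GetObjectData which names and values end up in the SerializationInfo, and it has the special constructor that formatters conventionally use when deserializing such a type.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical type that controls its own serialization.
[Serializable]
public class Temperature : ISerializable
{
    public double Celsius { get; }

    public Temperature(double celsius) => Celsius = celsius;

    // Deserialization constructor: there is no interface for it,
    // formatters look for this signature by convention.
    protected Temperature(SerializationInfo info, StreamingContext context)
    {
        Celsius = info.GetDouble("celsius");
    }

    // Called by the formatter; the object chooses the stored names and values.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("celsius", Celsius);
    }
}
```

Passed through the SerializeAsISerializable method above, an object like this would contribute a single "celsius" entry to the SerializationInfo.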

Serializing using FormatterServices

If there is no explicit information available in either the formatter or the type about how to serialize the object, you have to extract it yourself using the FormatterServices class. This has the power to look through the type and get all the serializable data members and their values.

private void SerializeWithFormatterServices(Stream serializationStream, object graph, Type objectType)
{
  if (!objectType.IsSerializable)
    throw new SerializationException($"Type {objectType} is not serializable");

  // Give the object a chance to prepare its state before the field values are read.
  GetCallbackDelegate(objectType, typeof(OnSerializingAttribute))?.DynamicInvoke(graph, Context);
  var members = FormatterServices.GetSerializableMembers(objectType, this.Context);
  var memberData = FormatterServices.GetObjectData(graph, members);
  using (var sw = new StreamWriter(serializationStream))
  {
    WriteTypeName(objectType, sw);
    // members[i] describes the field, memberData[i] holds its current value.
    for (var i = 0; i < members.Length; i++)
    {
      sw.WriteLine($"{members[i].Name}={memberData[i]}");
    }
  }
}
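
To see what FormatterServices returns on its own, here is a small standalone sketch. The Account type and the FormatterServicesDemo helper are hypothetical and only serve as an illustration; the point is that the field marked with NonSerializedAttribute is left out of the member list.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical serializable type with one field excluded from serialization.
[Serializable]
public class Account
{
    public string Owner = "Alice";
    public int Balance = 100;

    [NonSerialized]
    public string CachedLabel = "not stored";
}

public static class FormatterServicesDemo
{
    // Returns one "name=value" line for every serializable field of the object.
    public static string[] Describe(object graph)
    {
        MemberInfo[] members = FormatterServices.GetSerializableMembers(graph.GetType());
        object[] values = FormatterServices.GetObjectData(graph, members);
        return members.Select((member, i) => $"{member.Name}={values[i]}").ToArray();
    }
}
```

Calling FormatterServicesDemo.Describe(new Account()) should produce lines for Owner and Balance only. The two arrays are parallel, so the value at index i belongs to the member at index i.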

Calling the serialization lifecycle methods

It's important to know that during deserialization, most formatter implementations don't call the type's constructor, but instead use FormatterServices again to create the object. That's why it's extremely important to remember that the code in your constructor is not necessarily called, and that you have to manage your object's lifecycle in a different way. This is what the lifecycle methods are for. In the code snippets above you can see me calling GetCallbackDelegate with the typeof(OnSerializingAttribute) and typeof(OnSerializedAttribute) arguments, depending on whether the code runs before or after the serialization. The code to get the method is pretty simple, though you do have to check the runtime rules for applying these attributes:

private Delegate GetCallbackDelegate(Type objectType, Type methodAttribute)
{
  // Look for the single method declared on this type that carries the requested attribute.
  var callbackMethod = objectType
            .GetMethods(BindingFlags.DeclaredOnly |
                       BindingFlags.Instance |
                       BindingFlags.NonPublic |
                       BindingFlags.Public)
            .SingleOrDefault(m => m.GetCustomAttribute(methodAttribute) != null);
  if (callbackMethod == null)
    return null;
  // The method must return void, take a single StreamingContext and be non-virtual.
  if (!(callbackMethod.ReturnType == typeof(void) &&
        callbackMethod.GetParameters().Length == 1 &&
        callbackMethod.GetParameters()[0].ParameterType == typeof(StreamingContext) &&
        !callbackMethod.IsVirtual))
    throw new InvalidOperationException($"Method {callbackMethod.Name} found with {methodAttribute}, but method is not compliant with the requirements of this attribute");

  // Open instance delegate: the first argument is the object the method is invoked on.
  var funcType = typeof(Action<,>).MakeGenericType(objectType, typeof(StreamingContext));
  return Delegate.CreateDelegate(funcType, callbackMethod);
}
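
For completeness, here is a sketch of what a type with such lifecycle methods could look like. The Order type is hypothetical, and CallbackDemo.Invoke is a deliberately simplified stand-in for the lookup above (it skips the signature checks), included only so that the example can be run on its own.

```csharp
using System;
using System.Linq;
using System.Reflection;
using System.Runtime.Serialization;

// Hypothetical type that uses the serialization lifecycle attributes.
[Serializable]
public class Order
{
    public decimal Net = 100m;
    public decimal Gross;

    [NonSerialized]
    public bool WasSerialized;

    // Meant to run before the formatter reads the fields: refresh derived state.
    [OnSerializing]
    private void BeforeSerialize(StreamingContext context) => Gross = Net * 1.25m;

    // Meant to run after the formatter has written the object.
    [OnSerialized]
    private void AfterSerialize(StreamingContext context) => WasSerialized = true;
}

public static class CallbackDemo
{
    // Simplified stand-in for a formatter: finds the method with the attribute and calls it.
    public static void Invoke(object graph, Type attributeType, StreamingContext context)
    {
        var method = graph.GetType()
            .GetMethods(BindingFlags.DeclaredOnly | BindingFlags.Instance |
                        BindingFlags.NonPublic | BindingFlags.Public)
            .SingleOrDefault(m => m.IsDefined(attributeType, false));
        method?.Invoke(graph, new object[] { context });
    }
}
```

Both methods are private, non-virtual, return void and take a single StreamingContext, which are the rules that the validation in GetCallbackDelegate checks.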

And that's the last of the serialization process.

Implementing deserialization

Unfortunately, I didn't have the time to hammer out the details of the deserialization process. It should be basically the same as the serialization, only backwards (as one of my old climbing trainers said: "If you want to get down from the wall, you use the same moves as upwards, only backwards").

There are some differences, of course. One key difference is that the SerializationBinder comes into play to decide which type to use for the object being deserialized. Another is the inverse use of FormatterServices to create an object without calling its constructor. There are also minor details, for example that there's no IDeserializable interface: types implementing ISerializable are restored through a special constructor that takes a SerializationInfo and a StreamingContext.
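
As an illustration of where the binder fits in, the following sketch reads the two header lines in the order WriteTypeName writes them (type name first, then assembly name) and asks a binder for the type. IniHeaderReader and KeyValueTypeBinder are hypothetical helpers, not code from the gist, and the "ClassName" and "AssemblyName" keys used in the usage note are assumptions, since the actual key constants are not shown here.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Simplified stand-in for the ini binder: each header line has the form "key=value".
public class KeyValueTypeBinder : SerializationBinder
{
    public override Type BindToType(string assemblyName, string typeName)
    {
        // Split on the first '=' only; assembly-qualified names contain '=' themselves.
        string qualifiedName = typeName.Split(new[] { '=' }, 2)[1];
        return Type.GetType(qualifiedName);
    }
}

public static class IniHeaderReader
{
    // Reads the type header and lets the binder decide which Type to create.
    public static Type ReadType(TextReader reader, SerializationBinder binder)
    {
        string typeLine = reader.ReadLine();
        string assemblyLine = reader.ReadLine();
        if (typeLine == null || assemblyLine == null)
            throw new SerializationException("The stream does not start with a type header.");

        Type type = binder.BindToType(assemblyLine, typeLine);
        if (type == null)
            throw new SerializationException($"Cannot resolve a type from '{typeLine}'.");
        return type;
    }
}
```

Given a reader positioned at a header such as "ClassName=" followed by an assembly-qualified type name and then an "AssemblyName=" line, ReadType returns the resolved type and leaves the reader at the first data line.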

The most important part I do have figured out: using the FormatterServices to deserialize the object:

private object DeserializeWithFormatterServices(StreamReader serializationReader, Type objectType)
{
  // Create the instance without running any constructor.
  var result = FormatterServices.GetUninitializedObject(objectType);
  var members = FormatterServices.GetSerializableMembers(objectType, this.Context);
  var serializationData = new Dictionary<string, object>();
  while (!serializationReader.EndOfStream)
  {
    var data = serializationReader.ReadLine();
    // Split on the first '=' only, so values that contain '=' stay intact.
    var splitData = data.Split(new[] { '=' }, 2);
    serializationData.Add(splitData[0], splitData[1]);
  }
  // Every value was read as a string; convert each one to the type of its field.
  var correctedTypes = new List<object>(members.Length);
  for (int i = 0; i < members.Length; i++)
  {
    var f = (FieldInfo)members[i];
    correctedTypes.Add(Convert.ChangeType(serializationData[f.Name], f.FieldType));
  }
  FormatterServices.PopulateObjectMembers(result, members, correctedTypes.ToArray());
  return result;
}

Note how FormatterServices.GetUninitializedObject() is used to create the object instead of a full reflection-based solution.
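
The effect of skipping the constructor is easy to demonstrate in isolation. In the sketch below, Session and UninitializedDemo are hypothetical names used only for this illustration.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical type whose constructor and field initializer leave visible traces.
public class Session
{
    public bool ConstructorRan;
    public int Retries = 3; // field initializers run as part of the constructor

    public Session() => ConstructorRan = true;
}

public static class UninitializedDemo
{
    // Allocates a Session without running its constructor or field initializers.
    public static Session CreateWithoutConstructor() =>
        (Session)FormatterServices.GetUninitializedObject(typeof(Session));
}
```

An object created this way has ConstructorRan set to false and Retries set to 0, while new Session() gives true and 3. This is exactly why state that must be set up after deserialization belongs in a lifecycle method rather than in the constructor.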

Terms of use :)

So this is it; I uploaded the code as a gist to download.

Please note that this code is far from complete and is not field-tested. It is only meant to be a teaching aid and nothing more. If you have the time and want to contribute, any ideas and comments are welcome. You can implement the missing parts of the deserialization, or improve the serialization.

If you want to use it as a teaching aid, go ahead and be my guest. Of course, reasonable attribution is always nice and comments about the experiences are welcome.

If you think a course with quality material like this might be something you, your team or colleagues would be interested in, don't hesitate to contact me.