Adding custom parts to an xlsx workbook

Akos Nagy
Feb 2, 2017

A while back I came across an interesting task. Basically there was an xml document which we used in one of our applications. It was a special resource file: the structure was maintained by us, but the contents were maintained by the customer.

One day we decided that it would be much easier for our user if they could edit an Excel-workbook instead of a clumsy xml. The problem was that if we converted the xml to xlsx, there was no way to convert the modified xlsx back to xml — it was kinda like lossy compression.

So there I was with an impossible task. The usual approach could be the put the xml next to xlsx, the customer can modify the xlsx and then use the modified xlsx and the original xml for the conversion. Can work, but the customer has to take care of two files instead of one — and they are gonna lose it or it is gonna get corrupted. And it is a clumsy solution. Another thing that could have worked if I somehow embedded the values that were lost during conversion into the xlsx document somehow (maybe some hidden columns or something). Still risky and clumsy. What I needed was a solution that was totally transparent to the customer.

Custom parts to the rescue

Then it dawned on me: the xlsx format is just a zip file! So theoretically I should be able to add my original xml into the xlsx after conversion, just like I can add an item to a zip archive.

The idea was good, but the implementation was not straightforward.

Adding an item to xlsx manually

So I renamed the xlsx extension to zip, added a file to the root of the archive, then rezipped the whole thing and renamed it back to xlsx.
This approach was not working unfortunately: if you open the xlsx in Excel, edit it and then save it, the custom content is lost. There was time to dig in to the xlsx specs.

The Open Office XML File Formats

So just in case you are interested, here are the specs:
It was way to long to go over in detail for me then (and I haven't since), but the gist of it is that every Open Office format (not just xlsx, but like docx or pptx) is a collection of parts - a part is just that, a part of the whole document. To make up the document itself, relations are defined between the parts. And there you have it, you have the xlsx file.
After some digging it turned out that .NET knows about this mechanism and actually has a way to add custom parts to xlsx packages. The code is fairly simple (you need to add a reference to WindowsBase):

string excelFile = "sample.xlsx";
string xmlFile = "sample.xml";
XmlDocument partContent = new XmlDocument();

using (Package package = Package.Open(excelFile, FileMode.Open, FileAccess.ReadWrite))
  Uri uriPartTarget = new Uri("/customXml/item1.xml", UriKind.Relative);
  PackagePart customXml = package.CreatePart(uriPartTarget, "application/vnd.openxmlformats-officedocument.customXmlProperties+xml");
  using (Stream partStream = customXml.GetStream(FileMode.Create, FileAccess.ReadWrite))

Now unfortunately, this did not work for me either. I might have done something wrong, but I couldn't get this to work. The custom part was added to the xlsx, but after editing the document in Excel and saving changes, the custom content still went away.
Again some digging, and it turns out that you can do this via Excel interop. Again, the code is fairly simple:

string excelFile = @"..\sample.xlsx";
string xmlFile = @"sample.xml";

var xlApp = new Application();
Workbook wb = xlApp.Workbooks.Open(excelFile);
wb.CustomXMLParts.Add(File.ReadAllText(xmlFile), Type.Missing);

There is one big difference between compared to the other one: this actually worked as expected...

Finding the best solution

At this point I was pretty frustrated and I was totally dedicated to give the best possible solution. Excel interop is NEVER the best possible solution: first off, you need Excel for that to work. This was supposed to be included in our automated build process, and our build servers have no Excel (and why should they?). And then there's the problem with the excessive amount of different DLLs that you can choose from (and let's be honest, none of us have any clue what's the difference between them). And finally, the whole thing is based on dynamic types and a couple other, not-so-neat tricks.
Determined not to give up, I decided to compare the contents of the xlsx file that were outputted by the two solutions. Obviously, one of them was working and the other one was not — so all I had to do was look at both of the xlsx files as zip packages and compare the contents.
The differences that I have found:

  • My solution put the custom part into the document, and added a reference to it into the [Content_Types.xml] file. The interop version put two custom parts into the document, only to one of which was a reference in the [Content_Types.xml] file.
  • There was this whole other part, which I had to create and add to the document.
  • There was a relationship between the workbook part and this custom part.

And there it is: all I had to do was to add another custom part and add the proper relationships. Sadly, if you add a custom part, it is automatically added into the [Content_Types.xml] file, so I had to do some manual tweaking as the last step.

And there you have it: this adds the custom xml document to the package as the custom xml part, and it is kept after editing and saving the workbook in Excel. And no Excel-interop is used. The only problem that still remains is that unfortunately, whatever I do, the custom part is renamed to item1 after saving it from inside Excel.
Now please be advised that I have absolutely no idea what's going on and why is this working :) If I'm doing something wrong or someone can offer some actual insight about what's happening, please do so in the comments.

You can check out the source code at Github.

Akos Nagy
Posted in .NET C#