Awesome Northwind

Whenever I give a talk at a meetup or teach at a course or just want to try a demo about a data driven technology, my biggest problem (and the biggest nag) is always to get some real(istic) sample data. Thankfully there are sample databases available on the internet that eerybody can download and use. Microsoft has a lot of them: you might have seen AdventureWorks (that's the latest one), but before that there was Northwind, and you might also have heard about Contoso.

I started out a very long time ago (I'm getting old fast), and when I did, the most popular of these databases was the Northwind-sample. So I got used to it; I know the structure and I also know the data. Don't get me wrong, I like new stuff (live on the bleeding edge), but for sample data, this doesn't seem that important. I mean, products are products and users are users, right?

So I was very said to see that Microsoft has all but abandoned Northwind in favor of AdventureWorks; it's getting harder to find a fully working, up-to-date init script. Well, not on my watch :) I decided to create and curate a collection of Northwind sample databases for different database engines. An awesome collection of Northwind databases. If you are interested in the scripts, go to the Github repo. Here you can read about the different ports.

Northwind for SQL Server

Not much to tell about this one, this is almost the same as the original database. I did add some minor modifications on my own, though.

Right at the beginning of the script, thereare two calls to the sp_dboption stored procedure. Well, this has been deprecated for a while in newer versions of SQL Server, so if you run the script, you get warnings. It's not a big deal, but who likes warnings? So I changed the call to the stored procedure to the corresponding ALTER DATABASE calls:

ALTER DATABASE Northwind SET RECOVERY SIMPLE
ALTER DATABASE Northwind SET RECOVERY BULK_LOGGED

The other thing that I changed was the structure of the joining tables (OrderDetail and EmployeeTerritory). These had composite keys composed of the two foreign keys that they contained. I hate composite keys and any keys that are naturally part of the record. There are a couple of reasons (data cannot be changed, indices become fragmented and you have to defragment ). So I added a single new identity field with auto-increment values to these tables and populated the values. Oh, and I also removed the whitespace from the "Order Details" table (I mean, come on).

Northwind for SQL Azure

Azure SQL doesn't support a lot of features of the on-premise SQL Server, so the script above cannot be used. I removed everything from the script corresponding to unavailable features or options, but it's essentially the same as the one for the on-premise server.

Northwind for SQLite

A lot of time I teach at companies where the IT is very... overwhelmed with other tasks, so they don't have time to setup SQL Server for the students. For the courses about SQL Server or Entity Framework, this can be problematic :)

But for WPF or ASP.NET courses where you just need a data source to display something on the pages SQLite can be a real life-saver (or an XML file, but that's very hard to handle when it comes to inserting, updating or modifying data, and of course a cloud-based service is not an option either because in these cases usually network traffic is also blocked :) ). So I created a port for SQLite as well. Of course some of the data types are not supported (like money, or datetime are stored as strings) and unfortunately SQLite has no concept of stored procedures, so those are gone, sorry. And the best thing: the whole database is just a file :)

Northwind for Azure CosmosDB

This is the one that I'm very proud of :) When you create an Azure SQL database, you have the option to populate the database with sample data (AdventureWorks of course), but you have no such option for CosmosDB. And this makes it very inconvenient to get started with this technology. So I decided to create a port for CosmosDB as well. Since this is not a relational database, I had to make a couple of changes, though.

Of course I could have just created one collection for each table and just migrate the data as json, but that would have been a cop out — not to mention expensive as hell. So first I compressed the schema to be a little more non-relational:

  • Shippers and suppliers have their own entity
  • Categories have their own entity, and products are embedded
  • Customers have their own entity, orders are embedded in customers and order details are embedded in orders, while order details reference products with a "foreign key"
  • Regions have their own entity and territories are embedded
  • EmployeeTerritories have their own entities and reference territories and employees with "foreign keys"

This makes it a little more NoSQL-like. But to place every entity to a different collection would still be very expensive. This is a common problem with CosmosDB and comes up very often. The solution that Microsoft recommends is to put as many "entities" as possible to the same collection and add an extra field that denotes the entity type. Since every field is indexed by default, you can efficiently query based on this field (or you can later even partition based on this). So everything goes into one collection :)

Stored procedures are also migrated (except for one, because I was lazy). Again, stored procedures are inherently different in CosmosDB, so I had to make a lot of changes (a lot of things are not supported, unfortunately). Views are not supported, so those are gone, sorry.

To populate the database, you can use the migration tool and the json files from the Github repo. To create the stored procedures, you can use the text file and copy the bodies of the stored procedures to the Azure portal, or use the C# project that I added to the repository to add each of them automatically.

So go ahead and check out the Github repo. If you have any ideas to improve these databases or have a new database engine to port the database to, feel free to create an issue or submit the PR.