Migrating MySQL to SQL Azure with model mismatch

We have home grown web ERP system which is operating since 2004. It is built upon the LAMP stack. The ERP started for internal use but after many years of usage and polishing we decided to offer it for free to several other companies outside our organization. It was a success. We decided to rebuild it from the ground up as a SaaS product on .NET/IIS/MVC/SqlAzure/Blobs/Tables/ServiceBus/etc – MS stack. In the new version we have corrected all the design flaws we knew of, made it modular, implemented all the desired features, etc. One of the problems we have to solve is to migrate all the companies from the old system to the new one which means all the data – operational data, users with passwords etc. One of the problems that appear is with the data mismatch – the new core domain entity hierarchy is changed a bit and the other entities are changed too. To say it another way – domain model/entities impedance mismatch. Transfer of files is another story left for another post.

We have to somehow (1) easily ‘cast’ our old model into the new model module by module (2) and synchronize it bi-directionally with the new database (3) all the time.

Options

As stated at one MSDN page “Azure SQL General Guidelines and Limitations” here we have 4 options.

During my research I found the following additional options for MySQL/SQL to SQL Azure data migration and to sum everything up here there are :

  1. SQL Server 2008 Integration Services (SSIS)
  2. The bulk copy utility (BCP.exe)
  3. System.Data.SqlClient.SqlBulkCopy class
  4. Scripts that use INSERT statements to load data into the database
  5. SSMA
  6. SQL Azure Data Sync
  7. Custom with Sync Framework
  8. Custom code with some libraries like .NET connector, Entity Framework 6 and WCF.

Considerations, Limitations, Requirements

I will enumerate all the considerations, limitations and requirements at hand starting with requirements :

Migration requirements :

  1. Old companies must be transferred from MySQL to SqlAzure.
  2. Old users must be transferred and consolidated – in the legacy system the clients had one user for each company. In the new system they share the same user over all companies and now we have separate db for users and other stuff.
  3. Some entities have new properties (tables have new columns) and some have lost properties (columns).
  4. Some entities form new hierarchy and it must be created during the migration – Splits, Merges, Parent-Child relations, etc.
  5. A report must be generated and e-mailed with a list of all migration errors.
  6. The migration must support two types of triggers – scheduled and manual.
  7. *Change to the Sql Azure db (for each company) must reflect to the old MySQL db.
  8. **Change to the old MySQL db (for each company) must reflect to Sql Azure db.
  9. Lower the maintenance/developer cost for the long run. Migrations are gradual/by module and do not happen in one jump or for one week.
  10. The interference between the migration solution and the actual system must be minimal.
  11. The migration solution must have the option to be turned off by configuration when it is not needed anymore.
  12. The solution must support the migration of all modules which are developed by different developers on different schedule.
  13. The work on the migration solution must be traceable.

* Nice-to-have but easy to implement

**Nice-to-have but requires more effort

** + * = bi-directional which is a “nice to have” too in this case and not a requirement.

SSMA

Pros :

  1. Ready, by MS, been out there for a long time – expect no bugs and most of the cases covered.
  2. Can map schemas.
  3. Has support for script files with witch you can make type mapping – can map entities with different properties; can split/merge entities and create the desired hierarchies.
  4. Good documentation and examples here and here.

Cons :

  1. Uses SQL scripts which raises the bar for the developers needed.
  2. We need to transfer users from old db to two new dbs as required. SSMA does not support such scenario.
  3. Maintenance burden for a small start-up company.
  4. Not testable from developers perspective.
  5. Not plain and simple to deploy and schedule.

SQL Azure Data Sync

Pros :

  1. Ready, by MS – expect no bugs and lots of scenarios covered.
  2. Bi-directional
  3. Configurable schedule
  4. Has everything needed out of the box except custom mappings or scripts.

Cons :

  1. Not (easlily) applicable in my case because it supports SQL-SQLAzure only and requires SQL 2008 with some additional SQL stuff. In theory it is possible to first make SSMA from MySQL to SQL and then use the bi-directional SQL Azure Data Sync but I see this as a maintenance burden and a source for many possible mishaps – you have to setup another server, maintain it, setup two additional tools and have someone to run them and maintain their scripts in sync. Not impossible but this solution does not provide “low maintenance cost”.  If there was an option for SQL Azure Data Sync To MySQL with entity mappings/scripts – more flexible sync rules – it would solve the problem.

Custom code with Sync Framework

Not even considered for now because of the apparent (by NHO) complexity level and no open source MySQL sync provider. Such solution looks to me like sweat and tears – Sql Azure Data Sync to MySQL. Throwing another .net framework in the solution will not bring maintenance costs down.

Custom code with some libraries like .NET connector and Entity Framework 6

Pros :

  1. MySQL connector for free
  2. With Entity Framework 6 we can generate Code First Model from database
  3. Only .net – LINQ to Entities – knowledge is required to make a script – no SQL.
  4. Simple to configure deploy and redeploy on Azure Worker Role.
  5. Everything is testable
  6. Every migration script is version controlled.
  7. Compile time checks – if you change a type in EF mapping you will be prompted right away by the compiler.

Cons :

  1. Need to write code for everything.

I will not elaborate on all other options because they are way out of scope.

Final decision

I decided to go for the last option – Custom code with .NET connector and EF 6 deployed on Azure Worker Role. Why?

  1. Because different developers work on and are responsible for different modules the data migration scripts can be assigned across the team.
  2. Because there is no plain SQL there is no need for any team member to know SQL – less context switch + less needed knowledge + compile time checks = lower maintenance cost.

Implementation

With the data tool that VS 2013 provides it was easy to create POCO object model from the MySQL db :

using System.Data.Common;
using System.Data.Entity;

namespace Migrator.Model.Users
{
    public partial class UsersModel : DbContext
    {
        public UsersModel()
          : base("name=UsersModel")
        {
        }

        public UsersModel(string connString)
            : base(connString)
        {

        }

        public UsersModel(DbConnection connection)
            : base(connection, true)
        {

        }

        public virtual DbSet<user> users { get; set; }
        public virtual DbSet<contractingpartyuser> contractingpartyusers { get; set; }

...
    }
}

Then start using the model :

foreach(var company in companies)
{
    using (UsersModel usersModel = new     UsersModel(connectionProvider.GetDbConnection(Company.DbName)))
    {
        var oldUsers = usersModel.users.Where(...)
        foreach(var oldUser is oldUsers)
        {    
            User u = new User(){
            p1 = oldUser.p1,
            ....
            }
        }
        userService.Insert(u);
// Here goes the script that transforms the data from this model to our new model. This operation is not as simple as it looks. 
    }
}

Too simple solution you may say. It looks simple but be sure do not overlook it. For the requirements and considerations – schema mismatch, single to multiple db, atomic transfer operations, many developers non of them on the same level, ALM, maintainability, source control, deployment, reporting of the data tranfer/translation/migration etc. –  this is the intersection point. There is a lot of work to be done for the scripts : bad data, corner cases, handling datetime because MySQL allows null datetime, datetime to datetime2, logging, report generation and more. This solution involves more code but gives you a lot of space for maneuvers and AFAIK there is nothing that can not be done this way.

This approach can be used to migrate data between systems, data centers, etc. Remember that DbContext implements UnitOfWork pattern and EF 6 has good transaction support.

Results

All of the DBs of the ERP were happily migrated.

Resources

  1. http://msdn.microsoft.com/en-us/library/azure/ee336245.aspx#dlaps
  2. http://msdn.microsoft.com/en-us/library/hh313129%28v=sql.110%29.aspx – SSMA
  3. http://msdn.microsoft.com/en-us/library/hh313108%28v=sql.110%29.aspx – SSMA Scripts
  4. http://blogs.msdn.com/b/ssma/archive/2011/02/07/mysql-to-sql-server-migration-how-to-use-ssma.aspx – SSMA Tutorial
  5. http://sqlxpertise.com/2011/06/13/migrating-mysql-databases-to-sql-azure-using-ssmapart-1/ – SSMA Tutorial
  6. http://sqlxpertise.com/tag/mysql-to-sql-azure-migration/– SSMA Full Tutorial
  7. http://azure.microsoft.com/en-us/documentation/articles/sql-database-get-started-sql-data-sync/ – SQL Azure Data Sync Walkthrough
  8. http://msdn.microsoft.com/en-us/sync/bb736753.aspx – Sync Framework
  9. http://dev.mysql.com/downloads/connector/net/ – .NET connector
  10. https://www.nuget.org/packages/MySQL.Data.Entities/ – Connector/Net is a fully-managed ADO.NET driver for MySQL. Entity Framework 6.0 support package
  11. http://dev.mysql.com/doc/connector-net/en/connector-net-entityframework60.html – Entity Framework 6 documentation
  12. http://forums.mysql.com/read.php?174,601264,601264 -Database First and Model First quick steps. Get the isntaller from the link below before trying these steps. Otherwise it is possible that you do not see the MySQL connection provider in the wizard.
  13. http://dev.mysql.com/tech-resources/articles/mysql-installer-for-windows.html – Use MySQL Installer in order to have MySQL data provider available when trying to add MySQL database from Visual Studio.
  14. http://dev.mysql.com/doc/refman/5.0/es/connector-net-examples.html – .Net Connector examples