Can I just say I have a problem with this kind of SQL migration tool. For efficiency and flexibility we write our schemas in SQL, and these migration tools would have us stagger our schema across a bunch of files, making it very hard to see the current state of things, especially after a few years of evolution. I generally prefer to write my schema as-it-should-be instead of writing diffs. I know this is the industry standard, and I'm not picking on refinery; this is a general complaint. Please don't say "use an ORM".
I was always under the impression that you would both have diffs, and also a full schema dump. I can't imagine you're expected to start from scratch and apply every diff in sequence every time you want to spin up a test box for the DB, so it makes sense to have a full dump done and saved after every diff. Occasionally, you might want to actually apply all the migration stuff from scratch to see if someone screwed up, but I see no reason you wouldn't also always have a full dump of what's the expected current schema.
I have worked in a Django shop where spinning up a new instance involved starting with the initial version of the database schema and applying all the migrations.
Don't know if that's standard practice, but I don't have any reason to think that it isn't.
It's a good way to be sure that the migrations lead to the same state as whatever you consider "current". Otherwise you risk diverging without noticing. Though you could probably safely sum up older migrations periodically: run them, grab the CREATEs and such from the resulting DB, prove the output is identical, and do it only once you're very certain no copies of code older than that point are still running in the wild.
I disagree. If you have test environments, tracking only "current" and "desired" doesn't hold up: you would need different diffs for each environment. Making sure they all apply the same migrations in the same order is important for avoiding drift and ensuring consistency. It's also an issue if you have multiple production databases, e.g. with sharding. When you have lots of migrations and you're sure all environments are up to date past a certain point, you can "squash" the initial ones to make them apply faster.
I don't see the problem. Each branch has its own desired state as necessary. A different database version is going to need a different migration script no matter what your approach; the only difference is whether you generate those scripts by direct comparison with prod or from a crude reconstruction of it.
Having worked with Django a bunch, I'd recommend squashing your migrations once you have a lot of them. It will speed up the time it takes to make a new database, including testing time if you're testing your database.
Well, one reason it might not be is that it's somewhat equivalent to checking out the initial version of the code, and applying all diffs to get to the current version every time you want to see the latest commit/HEAD.
On one hand, it's good to know; on the other hand, beyond the last few steps needed to roll back a change, I'd argue it's much more important to make sure that the full schema represented in dev/testing (including static/dictionary tables) matches the current live schema as closely as possible when the production branch is tested. And if you're already doing a schema dump and diff, you might as well save that as well.
When talking strictly about DDL (managing schemas, not row data migrations) the third bullet doesn't really provide much benefit, IMO. As long as your tooling knows how to properly transition from any live schema state to the new desired state, the actual migration / specific ALTER statement is basically unnecessary.
Facebook has been using the declarative approach (repo of CREATE statements, pure SQL) company-wide for nearly a decade and it works extremely well. Conceptually, this approach allows you to treat database schemas just like code: request a schema change by pull request; CI validates and sanity-checks; review the PR within your own team just like code; merge to kick off the change using a CD pipeline.
I liked this approach so much that I started an open source project in early 2016 to help enable this workflow for MySQL and MariaDB, https://www.skeema.io. It has a number of validations and safety checks built in, including the one described in your 4th bullet.
If I had tooling which I trusted to be able to transition from any schema state I might want to use to any other such state, I would happily use it.
I've never found any, though it's a few years since I last looked. (I use Postgres, which has a fairly rich set of possible schema objects, and adds new bits frequently; I'd be reluctant to rely on a project that wasn't actively maintained and starting from 100% support).
The principle here is that in this case (as often), it's much easier to teach a computer to check that you've done manual work correctly than it is to teach the computer to solve the general problem.
This has been working well for me for many years, but I'm typically only dealing with a few dozen schema changes (in production) a year; I can imagine that Facebook sees the world just slightly differently.
Any advice on how to implement that last bullet point? I've got an experimental project that currently does the first two things. I'm getting close to a point where it actually needs to be deployed and I have to figure out the last two. Keeping both a complete up-to-date schema and a set of migrations seems like a good idea in theory, but I don't have any idea how I would automate a step to make sure they stay in sync in my CI workflow.
Use VCS tags to identify schema versions, and a naming convention including those tags to track the migration files.
Have a script that validates a migration in the obvious sort of way (check out the project that contains the schema at tag a; build it to a scratch database; add some data; apply the migration; dump the schema; compare that against a dump made directly from tag b).
Maintain a store of migrations that have been validated in that way; the way to add a migration to that store goes through the checking script.
Store the tag corresponding to each database's current schema in the database itself (and have something automated to check it regularly if you like; that's basically the same code as the migration-checking script).
Make the only way to change an important database's schema be to run a script that takes the migration from that store, reading the starting tag from the database.
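For the checking script in the middle of those steps, here's a minimal sketch, assuming SQLite as the scratch engine and schema/migration SQL already exported at tags a and b (all names and schemas here are illustrative, not any particular tool's design):

```python
import sqlite3

def schema_dump(conn):
    """Return a normalized dump of every object's CREATE statement."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    # Collapse whitespace so formatting differences don't cause mismatches.
    return "\n".join(" ".join(r[0].split()) for r in rows)

def validate_migration(schema_a, migration, schema_b):
    """Build tag a's schema in a scratch DB, apply the candidate
    migration, and compare the result against a DB built directly
    from tag b's schema."""
    scratch = sqlite3.connect(":memory:")
    scratch.executescript(schema_a)    # "check out tag a; build it"
    scratch.executescript(migration)   # "apply the migration"

    reference = sqlite3.connect(":memory:")
    reference.executescript(schema_b)  # "a dump made directly from tag b"

    return schema_dump(scratch) == schema_dump(reference)

# Example: migrating a one-column table to a two-column table.
a = "CREATE TABLE users (id INTEGER PRIMARY KEY);"
b = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);"
good = "ALTER TABLE users ADD COLUMN email TEXT;"
print(validate_migration(a, good, b))  # True
```

A real version would also load seed data before migrating, as the parent suggests, to catch migrations that are schema-correct but data-destroying.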
Rails ActiveRecord kind of already does this. To update the database schema, you write migrations, either in Ruby pseudo-SQL or in actual SQL as desired. When you run the migration, it auto-generates a schema file that is also checked into the repo. Ideally, any commit with a migration should also have the updated schema file from applying that migration.
For extra credit, have the last step involve the version control system, so that each migration contains a commit of the last version which contains the old schema, and each version contains a (database | data seed) sufficient to verify the validity of the migration step.
Given two snapshots of the schema, there are multiple paths from one to the other. This makes it impossible to generate perfect migrations from the snapshots, you always end up needing manual changes.
You could make it easier to see the current schema by having a pre-commit hook or similar which runs all the migrations and then saves a snapshot of the final database state to a file.
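Such a hook could be a small script along these lines, assuming plain .sql migration files applied in filename order against a scratch SQLite database (paths and names are illustrative):

```python
import sqlite3
from pathlib import Path

def snapshot_schema(migrations_dir, out_file):
    """Apply every migration in filename order to a scratch DB,
    then write the resulting schema to a file for committing."""
    conn = sqlite3.connect(":memory:")
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        conn.executescript(path.read_text())
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    Path(out_file).write_text(";\n".join(r[0] for r in rows) + ";\n")
```

Run from a pre-commit hook, the committed snapshot file then always reflects the sum of the migrations, so "what does the schema look like now?" is a one-file read.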
Yes, we need to move away from the notion of a migration "chain", it's not a very useful way to think about schema changes. Most people have an ideal schema in mind - they should be free to simply craft this directly, with tooling to make the changes necessary to evolve the schema accordingly.
I wrote a tool for exactly this purpose, and there are an increasing number of similar tools to support working this way.
Another common problem is migration tooling tied to an ORM as rails/django do (it looks like refinery avoids this, which is good). Working with migrations directly on the database level means you can support multiple (or zero) ORMs and apps with less lock-in, support more database features, and confirm the structure of the database directly.
> with tooling to make the changes necessary to evolve the schema accordingly
Isn't that exactly what most migration tools do? How do you get from schema version 1 to schema version 2 to schema version N without a "chain" of transformations?
While Rails' Migrations are part of/use parts of ActiveRecord, nothing about them forces you to use AR (or any other ORM) for application code. Heck, I've used Rails Migrations in Node.js projects because they work so well.
Yeah most migration tools work this way, although refinery doesn't.
My preferred alternative to reconstructing prod from a historical chain of changes is to compare the preferred state to the production state directly. In most cases you can generate the script you need completely automatically from a diff tool. Direct comparison also allows you to directly test that your changes have been applied correctly and that prod schema explicitly matches.
If it's possible to use rails migrations without an ORM that's great, although that's always going to be secondary to the ORM use case. Django migrations are extremely heavily coupled to the ORM, and it's almost impossible to use one without the other.
Generate a script from the diff of _what_ exactly, the old and new schemas? How did you get to the "new" schema without some sort of migration?
SQLAlchemy does something like what I think you're describing (and is tightly coupled with the application-level ORM): you update attributes on the model classes, and then it generates a migration for you. But I've always had to manually edit those migrations by hand.
The Rails "migration first, model later" method feels much cleaner to me, but maybe that's just because I learned it that way.
Migrations are one of the most dangerous things I run on prod, so I'm always interested in better/safer ways of doing them.
Yeah you compare your intended state for prod to what you currently have on prod, and generate the differences.
It's still a "migration", the difference is it's generated from a direct comparison to production rather than a crude reconstruction of the database's history.
Checking production directly is much more robust (more testable, and changes made outside the context of the migration tooling are no problem) and simpler, because you don't need to maintain a historical migration chain - there's only ever one migration script to consider.
And using a diff tool directly on the database means you can support database features that aren't directly supported by your ORM.
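A toy version of such a diff for a single table, emitting MySQL-style ALTER statements (real tools like Skeema handle far more object types and edge cases; the function and names here are purely illustrative):

```python
def diff_table(table, live, desired):
    """Toy schema diff: compare a table's live columns against the
    desired state and emit the ALTERs needed to converge.
    `live` and `desired` map column name -> column type."""
    stmts = []
    for col, typ in desired.items():
        if col not in live:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
        elif live[col] != typ:
            stmts.append(f"ALTER TABLE {table} MODIFY COLUMN {col} {typ};")
    for col in live:
        if col not in desired:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return stmts

live = {"id": "INT", "fullname": "TEXT"}
desired = {"id": "BIGINT", "fullname": "TEXT", "email": "TEXT"}
print(diff_table("users", live, desired))
```

Because the input is the live schema itself, the generated script is correct regardless of how prod got into its current state.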
> In most cases you can generate the script you need completely automatically from a diff tool.
I'd like to know what "most cases" is here. A diff tool isn't going to know whether adding one column and removing another is a rename (assuming compatible types) or whether they really are different columns. A diff tool isn't going to know what the value should be for your new non-NULL, non-DEFAULT column (nor that, for a nullable column, existing data does need a value but new data needs a different one, if there is even a single value it can use).
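The rename ambiguity is easy to demonstrate: two migration paths, one data-preserving and one not, end at identical schemas, so no snapshot-level diff can tell which was intended. A sketch using SQLite (RENAME COLUMN and DROP COLUMN need a reasonably recent SQLite; the table is illustrative):

```python
import sqlite3

def columns_after(statements):
    """Apply statements to a scratch DB, return the final column list."""
    conn = sqlite3.connect(":memory:")
    for stmt in statements:
        conn.execute(stmt)
    return [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(users)")]

base = "CREATE TABLE users (id INTEGER, fullname TEXT)"

# Path 1: a rename, which preserves existing row data.
rename = [base, "ALTER TABLE users RENAME COLUMN fullname TO name"]

# Path 2: drop + add, which silently loses the old column's data.
drop_add = [base,
            "ALTER TABLE users DROP COLUMN fullname",
            "ALTER TABLE users ADD COLUMN name TEXT"]

# Both paths end at the same schema, so a diff of before/after
# snapshots cannot tell which migration was intended.
print(columns_after(rename) == columns_after(drop_add))  # True
```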
Sorry if I'm misunderstanding, but if the only thing you have is the "current state" SQL file, how do you make automated changes to the schema? The point of migration tools is that making a change is simply a matter of making a new file that lays out that change, and then running the tool. Do you alter the DB manually?
You can derive the current state (in CREATE statements and such) as part of your pre-commit build process if you really want it, no? Or interrogate any active copy of the DB to get it at any time.
That's a bit like retrieving source code by pretty-printing a minimised form. It's not terrible, but it's not going to be as pleasant as the original source.
If you maintain your schema like source code, you can use comments, organise it into files in a way that makes sense to you, maybe do a bit of light templating, and so on.
I find the format of code that comes from tools deriving CREATEs from live databases to be totally acceptable, usually. Then again I'm the kind of person who prefers team-wide auto-formatting and a strict style linter, so maybe that's just me.
> If you maintain your schema like source code, you can use comments, organise it into files in a way that makes sense to you, maybe do a bit of light templating, and so on.
As usual—for the comments part, at least—PostgreSQL has your back.
I use SQL-level COMMENTs and comments in the schema source for different things.
The former can be made visible to people using the database; the latter are for people maintaining the schema.
In any case, I don't think I'd get on with an auto-formatter that thought « COMMENT ON TABLE foo IS » was a friendly comment-introducing string. « -- » takes up rather less space.
The docs seem a bit light on the details of what this is doing. It seems like it's keeping metadata about applied migrations with their versions and checksums... somewhere. What is this doing under the hood?
Also, the docs seem a bit light on the usage details. Can you apply particular migrations? How about a facility to roll back a migration (doesn't seem like it?)
I was going to suggest that, but couldn't find it myself...
If this is the same rust migration tool that my Rust friend was telling me about last week, he was saying that it's probably inspired by Rails, as the author came from a Rails background. I'm not certain it is the same one though, as he was describing a system where each migration is both an "up" and a "down" operation, described in separate files, and I don't see that in any of these examples.
Rails stores the list of applied migrations in a table like this one. (It could be inspired by Rails, but I'm not seeing any suggestive evidence of that as I'm browsing through the author's other published github repos...)
It could also be that I'm misattributing the original idea, and Rails migrations are inspired by migrations from another framework, in another language (maybe Django?)
AFAIK the reason one does this is that one can (in most non-shitty databases, cough cough) apply most or all schema transformations inside a transaction, and write the new version inside that same transaction. If the transaction completes, the risk that the version you think the DB is on and the version it's actually on don't match is minimized, if not eliminated. You can't really get that unless you tie it to the transaction at the DB level, which means writing a version somewhere in the DB, in some fashion.
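A miniature of that idea, sketched with SQLite (one of the databases whose DDL is transactional; the `schema_version` table and function names are illustrative, not any particular tool's design):

```python
import sqlite3

def apply_migration(db_path, version, statements):
    """Run a migration's DDL and record its version in the SAME
    transaction, so the recorded version can never drift from the
    actual schema state: either both land, or neither does."""
    # isolation_level=None puts sqlite3 in autocommit mode so we can
    # manage the transaction explicitly.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        conn.execute("COMMIT")
    except Exception:
        # DDL and the version row roll back together.
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

In a database without transactional DDL (e.g. older MySQL), a failure mid-migration can leave the schema half-changed while the version row says otherwise, which is exactly the mismatch the parent describes.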
> How about a facility to roll back a migration (doesn't seem like it?)
Ack, I assumed it was building the rollback based on an understanding of what was applied. If it can't roll back (unconfirmed as yet), that's a problem. Bummer.
Personally, beyond development, I haven't found rollback migrations particularly useful. Once you've run migrations on a live database, it's often much simpler operationally to treat any rollback as a new migration, to keep things append-only.
From that perspective, while rollbacks are nice, the technical investment needed to auto-generate sound rollbacks for all DDL operations is probably vastly outsized compared to the benefit, so I can see why it wouldn't be a high priority, especially if targeting multiple databases. If you're writing things by hand, there's not a whole lot of difference between the two.
Hi!
To roll back a migration you have to build a new one which undoes what you want to roll back. Refinery is heavily inspired by Flyway, which only recently started offering undo migrations, but still explains why that is problematic in its/our vision: https://flywaydb.org/documentation/command/undo#important-no...
The design looks quite similar to Flyway[0]. I can confirm that this design has been working quite well for us in a large-scale production environment.
Barrel and Refinery are intended to specify and to execute DDL respectively, while Diesel is intended to specify and execute queries and DML. They target complementary portions of the SQL problem space and are not alternatives to each other.
Rust doesn't seem to lend any advantages for this kind of tool, but I'm sure it took weeks to write (months?) rather than just a couple of hours. What can this do that bash scripts couldn't? This migration tool uses plain sql files. I'm not hating but questioning the use of a complicated language to do no more than what native shell scripts can, with ease.
Just based on a quick skim of the readme, it looks to me like it supports, but does not require sql files. Presumably this tool works better for someone who already has a Rust application and wants to write Rust and not deal with SQL directly if possible.
Who doesn't? Sure, not everyone runs the same software on different premises or shards by customers, but even just staging/production does benefit from it. And these days, each dev having their own db is quite common.