Can I just say I have a problem with this kind of SQL migration tool. For efficiency and flexibility we write our schemas in SQL, and these migration tools would have us stagger our schema across a bunch of files, making it very hard to see the current state of things, especially after a few years of evolution. I generally prefer to write my schema as-it-should-be instead of writing diffs. I know this is the industry standard, and I'm not picking on refinery; this is a general complaint. Please don't say "use an ORM".
I was always under the impression that you would both have diffs, and also a full schema dump. I can't imagine you're expected to start from scratch and apply every diff in sequence every time you want to spin up a test box for the DB, so it makes sense to have a full dump done and saved after every diff. Occasionally, you might want to actually apply all the migration stuff from scratch to see if someone screwed up, but I see no reason you wouldn't also always have a full dump of what's the expected current schema.
I have worked in a Django shop where spinning up a new instance involved starting with the initial version of the database schema and applying all the migrations.
Don't know if that's standard practice, but I don't have any reason to think that it isn't.
It's a good way to be sure that the migrations lead to the same state as whatever you consider "current". Otherwise you risk diverging without noticing. Though you could probably safely sum up older migrations periodically: run them, grab the CREATEs and such from the resulting DB, prove the output is identical, and do it only once you're very certain no copies of code older than that point are still running in the wild.
I disagree. If you have test environments, tracking only "current" and "desired" doesn't hold up: you would need different diffs for each environment. Making sure they all apply the same migrations in the same order is important for avoiding drift and ensuring consistency. It's also an issue if you have multiple production databases, e.g. with sharding. When you have lots of migrations and you're sure all environments are up to date past a certain point, you can "squash" the initial ones to make them apply faster.
I don't see the problem. Each branch has its own desired state as necessary. A different database version is going to need a different migration script no matter what your approach; the only difference is whether you generate those scripts by direct comparison with prod or from a crude reconstruction of it.
Having worked with Django a bunch, I'd recommend squashing your migrations once you have a lot of them. It will speed up the time it takes to make a new database, including testing time if you're testing your database.
Well, one reason it might not be is that it's somewhat equivalent to checking out the initial version of the code, and applying all diffs to get to the current version every time you want to see the latest commit/HEAD.
On one hand, it's good to know; on the other hand, beyond the last few steps needed to roll back a change, I'd argue it's much more important to make sure that the full schema represented in dev/testing (including static/dictionary tables) matches the current live schema as closely as possible when the production branch is tested. And if you're already doing a schema dump and diff, you might as well save that as well.
When talking strictly about DDL (managing schemas, not row data migrations) the third bullet doesn't really provide much benefit, IMO. As long as your tooling knows how to properly transition from any live schema state to the new desired state, the actual migration / specific ALTER statement is basically unnecessary.
Facebook has been using the declarative approach (repo of CREATE statements, pure SQL) company-wide for nearly a decade and it works extremely well. Conceptually, this approach allows you to treat database schemas just like code: request a schema change by pull request; CI validates and sanity-checks; review the PR within your own team just like code; merge to kick off the change using a CD pipeline.
I liked this approach so much that I started an open source project in early 2016 to help enable this workflow for MySQL and MariaDB, https://www.skeema.io. It has a number of validations and safety checks built in, including the one described in your 4th bullet.
If I had tooling which I trusted to be able to transition from any schema state I might want to use to any other such state, I would happily use it.
I've never found any, though it's a few years since I last looked. (I use Postgres, which has a fairly rich set of possible schema objects, and adds new bits frequently; I'd be reluctant to rely on a project that wasn't actively maintained and starting from 100% support).
The principle here is that in this case (as often), it's much easier to teach a computer to check that you've done manual work correctly than it is to teach the computer to solve the general problem.
This has been working well for me for many years, but I'm typically only dealing with a few dozen schema changes (in production) a year; I can imagine that Facebook sees the world just slightly differently.
Any advice on how to implement that last bullet point? I've got an experimental project that currently does the first two things. I'm getting close to a point where it actually needs to be deployed and I have to figure out the last two. Keeping both a complete up-to-date schema and a set of migrations seems like a good idea in theory, but I don't have any idea how I would automate a step to make sure they stay in sync in my CI workflow.
Use VCS tags to identify schema versions, and a naming convention including those tags to track the migration files.
Have a script that validates a migration in the obvious sort of way (check out the project that contains the schema at tag a; build it to a scratch database; add some data; apply the migration; dump the schema; compare that against a dump made directly from tag b).
Maintain a store of migrations that have been validated in that way; the way to add a migration to that store goes through the checking script.
Store the tag corresponding to each database's current schema in the database itself (and have something automated to check it regularly if you like; that's basically the same code as the migration-checking script).
Make the only way to change an important database's schema be to run a script that takes the migration from that store, reading the starting tag from the database.
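For the checking script in the middle of those steps, here's a minimal sketch, assuming SQLite as the scratch engine and schema/migration SQL already exported at tags a and b (all names and schemas here are illustrative, not any particular tool's design):

```python
import sqlite3

def schema_dump(conn):
    """Return a normalized dump of every object's CREATE statement."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    # Collapse whitespace so formatting differences don't cause mismatches.
    return "\n".join(" ".join(r[0].split()) for r in rows)

def validate_migration(schema_a, migration, schema_b):
    """Build tag a's schema in a scratch DB, apply the candidate
    migration, and compare the result against a DB built directly
    from tag b's schema."""
    scratch = sqlite3.connect(":memory:")
    scratch.executescript(schema_a)    # "check out tag a; build it"
    scratch.executescript(migration)   # "apply the migration"

    reference = sqlite3.connect(":memory:")
    reference.executescript(schema_b)  # "a dump made directly from tag b"

    return schema_dump(scratch) == schema_dump(reference)

# Example: migrating a one-column table to a two-column table.
a = "CREATE TABLE users (id INTEGER PRIMARY KEY);"
b = "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);"
good = "ALTER TABLE users ADD COLUMN email TEXT;"
print(validate_migration(a, good, b))  # True
```

A real version would also load seed data before migrating, as the parent suggests, to catch migrations that are schema-correct but data-destroying.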
Rails ActiveRecord kind of already does this. To update the database schema, you write migrations, either in Ruby pseudo-SQL or in actual SQL as desired. When you run the migration, it auto-generates a schema file that is also checked into the repo. Ideally, any commit with a migration should also have the updated schema file from applying that migration.
For extra credit, have the last step involve the version control system, so that each migration contains a commit of the last version which contains the old schema, and each version contains a (database | data seed) sufficient to verify the validity of the migration step.
Given two snapshots of the schema, there are multiple paths from one to the other. This makes it impossible to generate perfect migrations from the snapshots, you always end up needing manual changes.
You could make it easier to see the current schema by having a pre-commit hook or similar which runs all the migrations and then saves a snapshot of the final database state to a file.
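Such a hook could be a small script along these lines, assuming plain .sql migration files applied in filename order against a scratch SQLite database (paths and names are illustrative):

```python
import sqlite3
from pathlib import Path

def snapshot_schema(migrations_dir, out_file):
    """Apply every migration in filename order to a scratch DB,
    then write the resulting schema to a file for committing."""
    conn = sqlite3.connect(":memory:")
    for path in sorted(Path(migrations_dir).glob("*.sql")):
        conn.executescript(path.read_text())
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    Path(out_file).write_text(";\n".join(r[0] for r in rows) + ";\n")
```

Run from a pre-commit hook, the committed snapshot file then always reflects the sum of the migrations, so "what does the schema look like now?" is a one-file read.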
Yes, we need to move away from the notion of a migration "chain", it's not a very useful way to think about schema changes. Most people have an ideal schema in mind - they should be free to simply craft this directly, with tooling to make the changes necessary to evolve the schema accordingly.
I wrote a tool for exactly this purpose, and there are an increasing number of similar tools to support working this way.
Another common problem is migration tooling tied to an ORM as rails/django do (it looks like refinery avoids this, which is good). Working with migrations directly on the database level means you can support multiple (or zero) ORMs and apps with less lock-in, support more database features, and confirm the structure of the database directly.
> with tooling to make the changes necessary to evolve the schema accordingly
Isn't that exactly what most migration tools do? How do you get from schema version 1 to schema version 2 to schema version N without a "chain" of transformations?
While Rails' Migrations are part of/use parts of ActiveRecord, nothing about them forces you to use AR (or any other ORM) for application code. Heck, I've used Rails Migrations in Node.js projects because they work so well.
Yeah most migration tools work this way, although refinery doesn't.
My preferred alternative to reconstructing prod from a historical chain of changes is to compare the preferred state to the production state directly. In most cases you can generate the script you need completely automatically from a diff tool. Direct comparison also allows you to directly test that your changes have been applied correctly and that prod schema explicitly matches.
If it's possible to use rails migrations without an ORM that's great, although that's always going to be secondary to the ORM use case. Django migrations are extremely heavily coupled to the ORM, and it's almost impossible to use one without the other.
Generate a script from the diff of _what_ exactly, the old and new schemas? How did you get to the "new" schema without some sort of migration?
SQLAlchemy does something like what I think you're describing (and is tightly coupled with the application-level ORM): you update attributes on the model classes, and then it generates a migration for you. But I've always had to manually edit those migrations by hand.
The Rails "migration first, model later" method feels much cleaner to me, but maybe that's just because I learned it that way.
Migrations are one of the most dangerous things I run on prod, so I'm always interested in better/safer ways of doing them.
Yeah you compare your intended state for prod to what you currently have on prod, and generate the differences.
It's still a "migration", the difference is it's generated from a direct comparison to production rather than a crude reconstruction of the database's history.
Checking production directly is much more robust (more testable, and changes made outside the context of the migration tooling are no problem) and simpler, because you don't need to maintain a historical migration chain - there's only ever one migration script to consider.
And using a diff tool directly on the database means you can support database features that aren't directly supported by your ORM.
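A toy version of such a diff for a single table, emitting MySQL-style ALTER statements (real tools like Skeema handle far more object types and edge cases; the function and names here are purely illustrative):

```python
def diff_table(table, live, desired):
    """Toy schema diff: compare a table's live columns against the
    desired state and emit the ALTERs needed to converge.
    `live` and `desired` map column name -> column type."""
    stmts = []
    for col, typ in desired.items():
        if col not in live:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
        elif live[col] != typ:
            stmts.append(f"ALTER TABLE {table} MODIFY COLUMN {col} {typ};")
    for col in live:
        if col not in desired:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return stmts

live = {"id": "INT", "fullname": "TEXT"}
desired = {"id": "BIGINT", "fullname": "TEXT", "email": "TEXT"}
print(diff_table("users", live, desired))
```

Because the input is the live schema itself, the generated script is correct regardless of how prod got into its current state.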
> In most cases you can generate the script you need completely automatically from a diff tool.
I'd like to know what "most cases" is here. A diff tool isn't going to know whether adding one column and removing another is a rename (assuming compatible types) or whether they really are different columns. A diff tool isn't going to know what the value should be for your new non-NULL, non-DEFAULT column (nor that, for a nullable column, existing data does need a value but new data needs a different one, if there is even a single value it can use).
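The rename ambiguity is easy to demonstrate: two migration paths, one data-preserving and one not, end at identical schemas, so no snapshot-level diff can tell which was intended. A sketch using SQLite (RENAME COLUMN and DROP COLUMN need a reasonably recent SQLite; the table is illustrative):

```python
import sqlite3

def columns_after(statements):
    """Apply statements to a scratch DB, return the final column list."""
    conn = sqlite3.connect(":memory:")
    for stmt in statements:
        conn.execute(stmt)
    return [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(users)")]

base = "CREATE TABLE users (id INTEGER, fullname TEXT)"

# Path 1: a rename, which preserves existing row data.
rename = [base, "ALTER TABLE users RENAME COLUMN fullname TO name"]

# Path 2: drop + add, which silently loses the old column's data.
drop_add = [base,
            "ALTER TABLE users DROP COLUMN fullname",
            "ALTER TABLE users ADD COLUMN name TEXT"]

# Both paths end at the same schema, so a diff of before/after
# snapshots cannot tell which migration was intended.
print(columns_after(rename) == columns_after(drop_add))  # True
```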
Sorry if I'm misunderstanding, but if the only thing you have is the "current state" SQL file, how do you make automated changes to the schema? The point of migration tools is that making a change is simply a matter of making a new file that lays out that change, and then running the tool. Do you alter the DB manually?
You can derive the current state (in CREATE statements and such) as part of your pre-commit build process if you really want it, no? Or interrogate any active copy of the DB to get it at any time.
That's a bit like retrieving source code by pretty-printing a minimised form. It's not terrible, but it's not going to be as pleasant as the original source.
If you maintain your schema like source code, you can use comments, organise it into files in a way that makes sense to you, maybe do a bit of light templating, and so on.
I find the format of code that comes from tools deriving CREATEs from live databases to be totally acceptable, usually. Then again I'm the kind of person who prefers team-wide auto-formatting and a strict style linter, so maybe that's just me.
> If you maintain your schema like source code, you can use comments, organise it into files in a way that makes sense to you, maybe do a bit of light templating, and so on.
As usual—for the comments part, at least—PostgreSQL has your back.
I use SQL-level COMMENTs and comments in the schema source for different things.
The former can be made visible to people using the database; the latter are for people maintaining the schema.
In any case, I don't think I'd get on with an auto-formatter that thought « COMMENT ON TABLE foo IS » was a friendly comment-introducing string. « -- » takes up rather less space.
The docs seem a bit light on the details of what this is doing. It seems like it's keeping metadata about applied migrations with their versions and checksums... somewhere. What is this doing under the hood?
Also, the docs seem a bit light on the usage details. Can you apply particular migrations? How about a facility to roll back a migration (doesn't seem like it?)
I was going to suggest that, but couldn't find it myself...
If this is the same rust migration tool that my Rust friend was telling me about last week, he was saying that it's probably inspired by Rails, as the author came from a Rails background. I'm not certain it is the same one though, as he was describing a system where each migration is both an "up" and a "down" operation, described in separate files, and I don't see that in any of these examples.
Rails stores the list of applied migrations in a table like this one. (It could be inspired by Rails, but I'm not seeing any suggestive evidence of that as I'm browsing through the author's other published github repos...)
It could also be that I'm misattributing the original idea, and Rails migrations are inspired by migrations from another framework, in another language (maybe Django?)
AFAIK the reason one does this is that one can (in most non-shitty databases, cough cough) apply most or all schema transformations inside a transaction, and write the new version inside that same transaction. If the transaction completes, the risk that the version you think the DB is on and the version it's actually on don't match is minimized, if not eliminated. You can't really get that unless you tie it to the transaction at the DB level, which means writing a version somewhere in the DB, in some fashion.
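A miniature of that idea, sketched with SQLite (one of the databases whose DDL is transactional; the `schema_version` table and function names are illustrative, not any particular tool's design):

```python
import sqlite3

def apply_migration(db_path, version, statements):
    """Run a migration's DDL and record its version in the SAME
    transaction, so the recorded version can never drift from the
    actual schema state: either both land, or neither does."""
    # isolation_level=None puts sqlite3 in autocommit mode so we can
    # manage the transaction explicitly.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
        conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
        conn.execute("COMMIT")
    except Exception:
        # DDL and the version row roll back together.
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.close()
```

In a database without transactional DDL (e.g. older MySQL), a failure mid-migration can leave the schema half-changed while the version row says otherwise, which is exactly the mismatch the parent describes.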
> How about a facility to roll back a migration (doesn't seem like it?)
Ack, I assumed it was building the rollback based on an understanding of what was applied. If it can't roll back (unconfirmed as yet), that's a problem. Bummer.
Personally, beyond development, I haven't found rollback migrations particularly useful. Once you've run migrations on a live database, it's often much simpler operationally to treat any rollback as a new migration, to keep things append-only.
From that perspective, while rollbacks are nice, the technical investment needed to auto-generate sound rollbacks for all DDL operations is probably vastly outsized compared to the benefit, so I can see why it wouldn't be a high priority, especially if targeting multiple databases. If you're writing things by hand, there's not a whole lot of difference between the two.
Hi!
To roll back a migration you have to build a new one which undoes what you want to roll back. Refinery is heavily inspired by Flyway, which only recently started offering undo migrations, but still explains why that is problematic in its/our vision: https://flywaydb.org/documentation/command/undo#important-no...
The design looks quite similar to Flyway[0]. I can confirm that this design has been working quite well for us in a large-scale production environment.
Barrel and Refinery are intended to specify and to execute DDL respectively, while Diesel is intended to specify and execute queries and DML. They target complementary portions of the SQL problem space and are not alternatives to each other.
Rust doesn't seem to lend any advantages for this kind of tool, but I'm sure it took weeks to write (months?) rather than just a couple of hours. What can this do that bash scripts couldn't? This migration tool uses plain sql files. I'm not hating but questioning the use of a complicated language to do no more than what native shell scripts can, with ease.
Just based on a quick skim of the readme, it looks to me like it supports, but does not require sql files. Presumably this tool works better for someone who already has a Rust application and wants to write Rust and not deal with SQL directly if possible.
Who doesn't? Sure, not everyone runs the same software on different premises or shards by customers, but even just staging/production does benefit from it. And these days, each dev having their own db is quite common.