RDBMS/database dump and restore script #4178
base: master
Conversation
Implements a comprehensive data migration script that can export and import all database tables via stdin/stdout in JSONL format. The script handles:
- Batch processing of large tables (1000 records per batch)
- Proper serialization of complex data types (YAML, JSON)
- LOB handling for Oracle databases
- Foreign key dependency ordering
- Sequence/auto-increment fixes for Oracle, PostgreSQL, and MySQL
- Support for composite primary keys and STI models
- Callback-free imports to preserve data integrity

Usage:
RAILS_LOG_TO_STDOUT=false bundle exec rails runner dump_data.rb export > data.jsonl
bundle exec rails runner dump_data.rb import < data.jsonl

🤖 Generated with [Claude Code](https://claude.com/claude-code)
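For context, a minimal sketch of what the batched JSONL export described above could look like (illustrative only; the method and constant names here are assumptions, not the script's actual API):

```ruby
# Illustrative sketch of a batched JSONL export in a Rails runner context.
require "json"

BATCH_SIZE = 1000

def export_table(model, io = $stdout)
  # Stream records in fixed-size batches to keep memory usage flat.
  model.unscoped.in_batches(of: BATCH_SIZE) do |batch|
    batch.each do |record|
      # One JSON object per line, tagged with its table so the importer can route it.
      io.puts({ "table" => model.table_name, "attributes" => record.attributes }.to_json)
    end
  end
end
```

In a `rails runner` context ActiveRecord is already loaded and connected; the real script additionally has to deal with LOBs, serialized columns and STI, as listed above.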
Only a single test because calling the script is slow, so one chain of calls should be preferred. Locally it takes some 50 seconds.
About tests:
- I'm reluctant to introduce a new test to the suite that takes almost a minute to run; can't it really be optimized?
- I think it would be better to have a unit test suite to test particular methods of the script, and an integration test suite like this one, but with one or more tests for each command: `import`, `export`, `truncate-all` and `fix-sequences`.
- I executed the tests and the import process failed for me (MySQL):
-> Reset auto_increment for proxy_rules to 4
-> Reset auto_increment for referrer_filters to 2
-> Reset auto_increment for service_tokens to 5
-> Reset auto_increment for services to 4
-> Reset auto_increment for settings to 3
W, [2025-11-27T12:24:53.019002 #107472] WARN -- : Creating SystemOperation defaults
-> ERROR importing system_operations on line 80: ActiveRecord::RecordNotUnique: Mysql2::Error: Duplicate entry '1' for key 'system_operations.PRIMARY'
-> Failed record: {"id"=>1, "ref"=>"user_signup", "name"=>"New user signup", "description"=>nil, "created_at"=>"2025-11-27 11:24:27 UTC", "updated_at"=>"2025-11-27 11:24:27 UTC", "pos"=>nil, "tenant_id"=>nil}
-> Backtrace:
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `_query'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `block in query'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `handle_interrupt'
-> Exiting due to error
.
Expected: 0
Actual: 1
test/integration/dump_data_test.rb:56:in `block in <class:DumpDataTest>'
...
Strange because it worked in CI. Which MySQL version are you running?
I'm reluctant to introduce a new test to the suite that takes almost a minute to run, can't it really be optimized?
The issue is the startup of the script. For testing the bare minimum, one has to dump data, clear data, and import data: three invocations take time just to load all the classes; otherwise the operations are not so slow. That's why I included only a single long test instead of many targeted ones. I'm not sure whether it could run faster as a rake task. Also, at some point some monkey patching was done to prevent callbacks; I have to check whether the final version uses safe monkey-patching. Otherwise it may well be wiser to keep it as a script.
The monkey patching, I think, should be safe, so a rake task can be done; I just don't want to spend time on it now.
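For comparison, ActiveRecord already offers a bulk-insert path that skips callbacks and validations, so a rake-task version might not need monkey patching at all. A minimal sketch, assuming plain `insert_all` is acceptable for this data (not what the script currently does):

```ruby
# Sketch: bulk-importing rows without triggering model callbacks or validations.
# insert_all (Rails 6+) writes straight to SQL, so callbacks never run; rows that
# conflict on a unique index are silently skipped (insert_all! raises instead).
def import_rows(model, rows)
  rows.each_slice(1000) do |slice|
    model.insert_all(slice)
  end
end

# e.g. import_rows(SystemOperation, parsed_jsonl_rows)
```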
It failed with MySQL 8.0.
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff (master vs. #4178):
- Coverage: 87.81% -> 87.78% (-0.03%)
- Files: 1783 -> 1783
- Lines: 44690 -> 44690
- Branches: 686 -> 686
- Hits: 39243 -> 39231 (-12)
- Misses: 5421 -> 5433 (+12)
- Partials: 26 -> 26

View full report in Codecov by Sentry.
jlledom left a comment:
A few comments:
- The script doesn't support triggers; we need triggers to ensure tenant_id integrity, and probably sequences on Oracle.
- Instead of outputting data to stdout and printing messages to stderr, I think it would be better to read/write data from files and use stdout and stderr as usual.
- Instead of a script, this would probably be better as a rake task (see the sketch after this list).
- At some point (in the task name, description, comments in the code, etc.) I think we should mention that the only advantage of this script is moving data between database systems; if you want to dump/restore within the same system, better use the standard Rails tasks `db:data:dump` and `db:data:load`.
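To illustrate the rake-task suggestion, a hypothetical wrapper could look roughly like this (the file path, task names, and the `DataDump` class are assumptions; the PR currently ships a standalone script):

```ruby
# lib/tasks/data_dump.rake (hypothetical; DataDump would be an extraction of the
# script's export/import logic, which does not exist as a class in this PR)
namespace :db do
  namespace :data do
    desc "Export all tables as JSONL to stdout (only useful across RDBMS; " \
         "for same-RDBMS copies prefer db:data:dump / db:data:load)"
    task jsonl_export: :environment do
      DataDump.new.export($stdout)
    end

    desc "Import JSONL from stdin, skipping model callbacks"
    task jsonl_import: :environment do
      DataDump.new.import($stdin)
    end
  end
end
```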
# Export/Import all data from/to database via stdin/stdout
# Usage:
#   RAILS_LOG_TO_STDOUT=false bundle exec rails runner dump_data.rb export > data.jsonl
I tried this command and it sent logs to the output file anyway. So something is failing.
On the other hand, why set RAILS_LOG_TO_STDOUT at all? If we are sure we'll never want logs on stdout, couldn't we just set the env variable from inside the script, or use whatever other approach gets the same result?
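One possible approach (an assumption about how it could be done, not what the script does today) is to reassign the loggers at the top of the script so Rails output goes to stderr and stdout stays pure JSONL:

```ruby
# Route Rails/ActiveRecord logging to stderr so stdout carries only data.
# This would remove the need to pass RAILS_LOG_TO_STDOUT=false on the command line.
stderr_logger = ActiveSupport::Logger.new($stderr)
Rails.logger = stderr_logger
ActiveRecord::Base.logger = stderr_logger
```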
It's a few log lines and it does not break it.
Can the import work with the traces in the jsonl file?
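If stray log lines do end up in the dump, a tolerant reader is one way to cope (illustrative only; whether the script behaves like this is exactly the open question above):

```ruby
require "json"

# Skip anything on stdin that is not valid JSON, warning on stderr,
# so a few leaked log lines don't abort the whole import.
$stdin.each_line.with_index(1) do |line, lineno|
  row = begin
    JSON.parse(line)
  rescue JSON::ParserError
    warn "-> Skipping non-JSON input on line #{lineno}"
    next
  end
  import_row(row) # hypothetical handler for one decoded record
end
```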
# Export/Import all data from/to database via stdin/stdout
# Usage:
#   RAILS_LOG_TO_STDOUT=false bundle exec rails runner dump_data.rb export > data.jsonl
#   bundle exec rails runner dump_data.rb import < data.jsonl
I tried this command, exporting from MySQL and importing to PostgreSQL. It failed for me:
With the DB containing seed data:
-> Reset sequence proxy_rules_id_seq to 6
-> Reset sequence service_tokens_id_seq to 6
Warning: Error on line 73: PG::ForeignKeyViolation: ERROR: update or delete on table "services" violates foreign key constraint "fk_rails_e4d18239f1" on table "api_docs_services"
DETAIL: Key (id)=(2) is still referenced from table "api_docs_services".
-> Reset sequence settings_id_seq to 53
W, [2025-11-27T13:17:26.802839 #158128] WARN -- : Creating SystemOperation defaults
-> ERROR importing system_operations on line 76: ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "system_operations_pkey"
DETAIL: Key (id)=(1) already exists.
-> Failed record: {"id"=>1, "ref"=>"user_signup", "name"=>"New user signup", "description"=>nil, "created_at"=>"2025-11-26 12:26:27 UTC", "updated_at"=>"2025-11-26 12:26:27 UTC", "pos"=>nil, "tenant_id"=>nil}
-> Backtrace:
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql_adapter.rb:894:in `exec_params'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql_adapter.rb:894:in `block (2 levels) in exec_no_cache'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract_adapter.rb:1027:in `block in with_raw_connection'
-> Exiting due to error
With an empty DB (no data and no schema):
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `exec': PG::UndefinedTable: ERROR: relation "web_hooks" does not exist (ActiveRecord::StatementInvalid)
LINE 10: WHERE a.attrelid = '"web_hooks"'::regclass
^
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:19:in `block (2 levels) in query'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract_adapter.rb:1027:in `block in with_raw_connection'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/concurrency/null_lock.rb:9:in `synchronize'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract_adapter.rb:999:in `with_raw_connection'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:18:in `block in query'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activesupport-7.1.5.2/lib/active_support/notifications/instrumenter.rb:58:in `instrument'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract_adapter.rb:1142:in `log'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql/database_statements.rb:17:in `query'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql_adapter.rb:1074:in `column_definitions'
from /home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract/schema_statements.rb:109:in `columns'
...
You need a clean database state; there is a script command to wipe all tables when given a dump.
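For reference, a wipe step along these lines could be as small as the sketch below (an assumption about the approach, not the script's actual command; note that `disable_referential_integrity` needs elevated privileges on PostgreSQL):

```ruby
# Truncate every application table so the import starts from a clean state.
# schema_migrations / ar_internal_metadata are kept so the schema stays usable.
conn = ActiveRecord::Base.connection
conn.disable_referential_integrity do
  (conn.tables - %w[schema_migrations ar_internal_metadata]).each do |table|
    conn.truncate(table)
  end
end
```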
I tried these commands now:
$ bundle exec rails db:drop db:create db:schema:load
$ bundle exec rails runner script/dump_data.rb import < db/completedb.jsonl
With MySQL 8.4:
...
-> Reset auto_increment for service_tokens to 6
-> Reset auto_increment for services to 6
-> Reset auto_increment for settings to 53
W, [2025-11-28T09:18:06.532753 #45031] WARN -- : Creating SystemOperation defaults
-> ERROR importing system_operations on line 76: ActiveRecord::RecordNotUnique: Mysql2::Error: Duplicate entry '1' for key 'system_operations.PRIMARY'
-> Failed record: {"id"=>1, "ref"=>"user_signup", "name"=>"New user signup", "description"=>nil, "created_at"=>"2025-11-26 12:26:27 UTC", "updated_at"=>"2025-11-26 12:26:27 UTC", "pos"=>nil, "tenant_id"=>nil}
-> Backtrace:
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `_query'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:151:in `block in query'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/mysql2-0.5.6/lib/mysql2/client.rb:150:in `handle_interrupt'
With PSQL 15:
...
-> Reset sequence service_tokens_id_seq to 6
-> Reset sequence services_id_seq to 6
-> Reset sequence settings_id_seq to 53
W, [2025-11-28T09:36:52.310116 #69186] WARN -- : Creating SystemOperation defaults
-> ERROR importing system_operations on line 76: ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "system_operations_pkey"
DETAIL: Key (id)=(1) already exists.
-> Failed record: {"id"=>1, "ref"=>"user_signup", "name"=>"New user signup", "description"=>nil, "created_at"=>"2025-11-26 12:26:27 UTC", "updated_at"=>"2025-11-26 12:26:27 UTC", "pos"=>nil, "tenant_id"=>nil}
-> Backtrace:
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql_adapter.rb:894:in `exec_params'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/postgresql_adapter.rb:894:in `block (2 levels) in exec_no_cache'
/home/jlledom/.asdf/installs/ruby/3.1.5/lib/ruby/gems/3.1.0/gems/activerecord-7.1.5.2/lib/active_record/connection_adapters/abstract_adapter.rb:1027:in `block in with_raw_connection'
-> Exiting due to error
tables = get_tables_to_process

# Tables with foreign key constraints that should be imported last
# Based on foreign keys defined in db/oracle_schema.rb
Why oracle_schema.rb in particular?
claude artifact
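Since the hard-coded list came from Claude, one alternative worth considering is deriving the import order from the live schema at runtime. A sketch, assuming the adapter implements `#foreign_keys` (the MySQL, PostgreSQL and Oracle enhanced adapters do):

```ruby
require "tsort"

# Topologically sort tables by their foreign keys so referenced (parent)
# tables are imported before the tables that point at them.
class TableOrder
  include TSort

  def initialize(connection)
    @connection = connection
    @tables = connection.tables
  end

  def tsort_each_node(&block)
    @tables.each(&block)
  end

  def tsort_each_child(table, &block)
    @connection.foreign_keys(table)
               .map(&:to_table)
               .select { |t| @tables.include?(t) }
               .each(&block)
  end
end

import_order = TableOrder.new(ActiveRecord::Base.connection).tsort
# Note: raises TSort::Cyclic if the schema has circular foreign keys.
```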
if sequence_name.nil?
  raise "sequence not found (tried: #{full_sequence_name}, #{shortened_sequence_name}). " \
        "Table may be using a different auto-increment mechanism. " \
        "Oracle Enhanced Adapter supports: :sequence (default), :trigger, :identity, or :autogenerated."
end
I don't get this:
- If we are importing the DB, is the `user_sequences` table supposed to be imported too?
- Is it auto-generated?
- Would it be returned by `ActiveRecord::Base.connection.tables` in the export process?
- Won't this auto-increment method require triggers to be imported also?
Triggers are out of scope; the schemas between the Rails instances should already be in a working, compatible state. We don't do DDL except for fixing up the auto-increment sequences.
OK. Then it would be good to add a comment to the script explaining that.
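To make that concrete, a hedged sketch of what the sequence fix-up could look like for MySQL and PostgreSQL (this mirrors the log messages above rather than the script's actual code; Oracle is omitted because its sequences require DDL):

```ruby
# Bring the auto-increment counter back in line with the imported ids.
def reset_auto_increment(model)
  conn = ActiveRecord::Base.connection
  case conn.adapter_name
  when /postgresql/i
    # Resets the table's primary-key sequence via setval.
    conn.reset_pk_sequence!(model.table_name)
  when /mysql/i
    next_id = model.unscoped.maximum(model.primary_key).to_i + 1
    conn.execute("ALTER TABLE #{conn.quote_table_name(model.table_name)} AUTO_INCREMENT = #{next_id}")
  end
end
```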
STDERR.puts " bundle exec rails runner dump_data.rb fix-sequences"
STDERR.puts ""
STDERR.puts " # Pipe directly between databases"
STDERR.puts " bundle exec rails runner dump_data.rb export | bundle exec rails runner dump_data.rb import"
How is it possible this example would work? At least we should set DATABASE_URL on one of the two sides.
Well, you must have a working database config file on both sides; it's not necessary to set variables. Although in our usage we always do, I think.
But the command as it is now:
bundle exec rails runner dump_data.rb export | bundle exec rails runner dump_data.rb import
It runs both sides of the pipeline from the same folder, with the same config. It will dump and reload the data from/to the same DB.
> (quoting the review comment above about test runtime, per-command tests, and the MySQL import failure)
It works, so I don't really want to spend more time on it for minor reasons. We can see later if changes are needed.
Initially I didn't intend to commit it to the repo, but then I thought it might be useful. Now it will require more work that I'm not really sure is worth investing.
I didn't notice that. Your points are generally good. If you think it is not suitable for merging as is, I will have to think about whether to put more time into it or drop it altogether. I think it is worth having as a starting point, although not ideal. But let me know your thoughts.
I think using …
Considering the points above and the addition of one minute to the integration tests, I don't think it's worth merging. The only advantage it has over …
With this script one can dump and restore 3scale data from one database instance to another. Possibly even between database types, e.g. from oracle to postgres. But I have only tested oracle -> oracle.
After migration, one has to:
Maybe needs a test, let me know. But it would only be within the current test database, not between types.
Implements a comprehensive data migration script that can export and import all database tables via stdin/stdout in JSONL format. The script handles:
- Batch processing of large tables (1000 records per batch)
- Proper serialization of complex data types (YAML, JSON)
- LOB handling for Oracle databases
- Foreign key dependency ordering
- Sequence/auto-increment fixes for Oracle, PostgreSQL, and MySQL
- Support for composite primary keys and STI models
- Callback-free imports to preserve data integrity

Usage:
RAILS_LOG_TO_STDOUT=false bundle exec rails runner dump_data.rb export > data.jsonl
bundle exec rails runner dump_data.rb import < data.jsonl
🤖 Generated with Claude Code