Conversation
4399e6d to
acf14fa
Compare
fix go tests remove pry changes changes fix tests
aaa1229 to
d83ec42
Compare
changes
8192736 to
031bd4c
Compare
|
You imported some svg and png files. Is this intended? |
Yeah, these files depict the performance graphs for offcpu and and cpu profiles in case anyone wants to take a look. Will delete it once the PR is ready to merge. |
|
You can just paste it directly in either a comment here or the PR description, would make it easier for people to see. |
ferry.go
Outdated
| }() | ||
| } | ||
|
|
||
| schemaFingerVerifierPrintWg := &sync.WaitGroup{} |
There was a problem hiding this comment.
Could probably just reuse the supportingServicesWg instead of having another one here.
ferry.go
Outdated
| var ( | ||
| binlogVerifyStore *BinlogVerifyStore = nil | ||
| ) |
There was a problem hiding this comment.
Question for my own knowledge: what prompted this change? A linting error?
There was a problem hiding this comment.
Yeah, seems like it. Will change it back.
schema_fingerprint_verifier.go
Outdated
|
|
||
| type SchemaFingerPrintVerifier struct { | ||
| SourceDB *sql.DB | ||
| TableRewrites map[string]string |
There was a problem hiding this comment.
Don't you need DatabaseRewrites as well?
There was a problem hiding this comment.
I was a bit confused as to what the purpose of TableRewrites is. Actually I only DatabaseRewrites.
schema_fingerprint_verifier.go
Outdated
| return schemaFingerPrints, err | ||
| } | ||
|
|
||
| _, isIgnored := table.IgnoredColumnsForVerification[string(rowData[3].([]byte))] |
There was a problem hiding this comment.
Wait. We ignore certain columns for verification for a very specific reason (see below). We shouldn't ignore it here as we can corrupt data if that column definition changed.
Lines 688 to 699 in e605f11
|
|
||
| schemaData := [][]interface{}{} | ||
| for rows.Next() { | ||
| rowData, err := ScanGenericRow(rows, 21) |
There was a problem hiding this comment.
Nit: can you write a comment about this magic 21? I think this is the number of columns in information_schema.columns, but if you have a comment here guiding the reader, then the reader doesn't have to go look on their own.
schema_fingerprint_verifier.go
Outdated
| dbSet[table.Schema] = struct{}{} | ||
|
|
||
| query := fmt.Sprintf("SELECT * FROM information_schema.columns WHERE table_schema = '%s' ORDER BY table_name, column_name", table.Schema) | ||
| rows, err := sf.SourceDB.Query(query) |
There was a problem hiding this comment.
Shouldn't we do this for the target DB too, if we want to be generic?
There was a problem hiding this comment.
You're correct, we should check both source and target.
schema_fingerprint_verifier.go
Outdated
| } | ||
|
|
||
| hash := md5.Sum([]byte(schemaDataInBytes)) | ||
| schemaFingerPrints[table.Schema] = hex.EncodeToString(hash[:]) |
There was a problem hiding this comment.
Can we store a single hash, instead of an array of hash? This map is stored into the state dump. If we have a lot of tables, it can be pretty big. A single hash is most likely good enough and save a lot of space.
ferry.go
Outdated
| fingerPrint := map[string]string{} | ||
| if f.StateToResumeFrom != nil && f.StateToResumeFrom.SchemaFingerPrint != nil { | ||
| fingerPrint = f.StateToResumeFrom.SchemaFingerPrint | ||
| } |
There was a problem hiding this comment.
Reading this code gives me a bit of pause. There's currently a race condition I think:
- Ghostferry starts copying rows
- within 1 second, the table schema changed
- The schema finger print verifier reads the schema and thinks the schema is unchanged.
Did I miss something and this situation is not possible?
There was a problem hiding this comment.
So for clarity currently the following happens when you run the migration with copydb
- ghostferry creates table and dbs on the target before running the ferry - https://github.com/Shopify/ghostferry/blob/master/copydb/cmd/main.go#L127
- the ferry starts running and that's when the schema_fingerprint is started in background.
Also just to avoid confusion by table schema I mean the information we retrieve from information_schema.columns table from mysql and not the TableSchema struct we use internally in ghostferry.
I am not sure I understand your point about after ghostferry starts copying rows why would the table schema would have changed. Copying rows shouldn't alter the schema structure of the table. Here is what an example output is when we try to retrieve the schema of a table.
mysql> SELECT * FROM information_schema.columns WHERE table_schema = 'sakila' ORDER BY table_name, column_name LIMIT 1\G
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: sakila
TABLE_NAME: actor
COLUMN_NAME: actor_id
ORDINAL_POSITION: 1
COLUMN_DEFAULT: NULL
IS_NULLABLE: NO
DATA_TYPE: smallint
CHARACTER_MAXIMUM_LENGTH: NULL
CHARACTER_OCTET_LENGTH: NULL
NUMERIC_PRECISION: 5
NUMERIC_SCALE: 0
DATETIME_PRECISION: NULL
CHARACTER_SET_NAME: NULL
COLLATION_NAME: NULL
COLUMN_TYPE: smallint unsigned
COLUMN_KEY: PRI
EXTRA: auto_increment
PRIVILEGES: select,insert,update,references
COLUMN_COMMENT:
GENERATION_EXPRESSION:
SRS_ID: NULL
1 row in set (0.00 sec)
Have posted the CPU profile, but since the off CPU profile (basically flamegraph) is in svg format, I had to attach it as a file. |
What I usually do is either convert it into a rasterized format like png, or take a screenshot of the SVG and paste it in. |
fixes more fixes
7fa52dc to
2afb0c3
Compare
Introduction
Introducing Schema Verifier
This componnent currently collects source and target schema fingerprints periodically (1 minute by default) and verifies that the schema has not been changed on the source and target while the migration is still going on.
Technical details
I don't think there is anything special about the code written, most if it is self-explanotary.
Performace
Collecting Schema fingerprint every 1 minute should not affect performance in any sort for the migration as the interval is just too high, although I did some performance testing by setting the
PeriodicallyVerifySchemaFingerprintInterval = 50ms, just for curiosity on how would the performance affect.I collected for 30s cpuprofile (using
pprof) and offcpuprofile using (offcputime-bpfcc), but didn't notice any performance issues at all.offcpu profile
cpu profile