There are quite a few variables that could be derived in GCP, which would reduce both the amount of R code and the raw data downloaded. The variables I'm thinking about in particular are the following (contained in the clean_data.R file):
d_827220437 -> apply hospital labels to this in GCP
sex -> this relies on 2 variables, state_d_706256705 and state_d_435027713; simple if statements in BQ would handle the definition of this
biocol_type -> first, rename this to something that makes sense. Second, this variable relies on if statements over the following 3 variables: d_878865966, d_167958071, d_684635302. Very easily implemented in BQ.
Msrv_complt -> same situation as biocol_type, except with a few other variables
income -> this just needs labels for the income brackets.
race -> for invited participants only
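
As a rough illustration of the "simple if statements in BQ" idea for sex, here is a minimal sketch in BigQuery standard SQL. Only the two column names come from the list above. The inline participants rows, the codes 1 and 2, the labels, and the choice to prefer state_d_706256705 over state_d_435027713 are all placeholders I made up for the example; the real mapping should be copied from clean_data.R.

```sql
-- Sketch only. The sample rows, the codes (1, 2), the labels, and the
-- precedence between the two source columns are assumptions, not the
-- actual definition used in clean_data.R.
WITH participants AS (
  SELECT 'A' AS id, 1 AS state_d_706256705, CAST(NULL AS INT64) AS state_d_435027713
  UNION ALL SELECT 'B', NULL, 2
  UNION ALL SELECT 'C', NULL, NULL
)
SELECT
  id,
  CASE
    -- Use the first column when it is populated, otherwise fall back to the second.
    WHEN COALESCE(state_d_706256705, state_d_435027713) = 1 THEN 'Female'
    WHEN COALESCE(state_d_706256705, state_d_435027713) = 2 THEN 'Male'
    ELSE 'Unknown'
  END AS sex
FROM participants;
```

The same CASE pattern would cover biocol_type and Msrv_complt (conditions over their source variables) and the label-only cases like d_827220437 and income, so the R side could just read the finished columns.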