
Databricks hackathon/migrating to databricks page #263

Open
Lsnaathorst1 wants to merge 9 commits into main from databricks-hackathon/migrating-to-databricks-page

Conversation

Lsnaathorst1 (Contributor) commented Jun 1, 2026

Overview of changes

Adding a new Migrating to Databricks page to the Databricks and ADA section.

Why are these changes being made?

This is to help teams understand their options and the different approaches they can take when migrating, and to point them to relevant guidance in other sections of the Analysts Guide (e.g., connecting to Databricks from RStudio, or Databricks notebooks).

Detailed description of changes

Adding the Migrating to Databricks page after discussion within the SDT team. We decided to include:

  • Introduction explaining there are multiple approaches, each has pros and cons and it is down to the analyst to decide which is most suitable for their process.
  • Connecting to Databricks from RStudio approach with pros and cons
  • Coding in Databricks approach with pros and cons
  • Added 'What if I have both R and SSMS code in my current process?' as lots of teams will be in this position.

Databricks fundamentals

  • Removing the 'What this means for existing code' section from Databricks fundamentals and reusing some of its detail where relevant in the sections above
  • Removing the diagram after a team discussion, as it is publication-focused and doesn't show all options.

Issue ticket number/s and link

Resolves issue #184

Checklist before requesting a review

  • I have checked the contributing guidelines
  • I have checked for and linked any relevant issues that this may resolve
  • I have checked that these changes build locally
  • I understand that if merged into main, these changes will be publicly available

…ndex

- Removing content from the fundamentals page where it is now covered by, or made redundant by, the new page. Also includes removing the diagram after team discussion.
- Moving the workflows section up so it is still covered in fundamentals but sits outside the 'What Databricks means for existing code' section
Lsnaathorst1 marked this pull request as draft June 1, 2026 13:02
Lsnaathorst1 marked this pull request as ready for review June 1, 2026 15:03

laragarbett (Contributor) left a comment

Thank you @Lsnaathorst1 for setting up the new page of recommendations! It's got a nice, clear structure and the two approaches are well laid out :)
I've added various suggestions for wording changes, as well as a couple of restructuring/layout changes. Do push back on any you disagree with!
It's also highlighted I should think about RStudio references in my own PR too.

Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread index.qmd Outdated
Comment thread ADA/databricks_fundamentals.qmd Outdated
Lsnaathorst1 (Contributor Author)

Hey @laragarbett, thank you for the detailed feedback! I think this is now back with you, with most changes implemented and just a few unresolved conversations to look at above.

Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd
Comment thread ADA/migrating_to_databricks.qmd Outdated
Comment thread ADA/migrating_to_databricks.qmd Outdated
laragarbett (Contributor)

Thank you @Lsnaathorst1 for addressing my comments - there are a couple of unresolved ones from my first review but they're only small things.

I then had another review of the page now that you resolved my first round of comments, as it's such an important one for us to get right. I did pick up on some more things I think we should adjust. I've added some more comments, mainly around structuring but a few more wording suggestions. :)

… a bit smaller than previously, so you can tell they are smaller than H4, and then reducing H4 headers in size for the same reason.

Also adding in formatting lines for consistency with other pages
Lsnaathorst1 (Contributor Author)

Hey @laragarbett, hopefully all of those changes are now reflected and this is back with you for re-review :)


------------------------------------------------------------------------

## Guidance on SQL code

laragarbett (Contributor), Jun 18, 2026

Sorry, I know I'm going back on a previously requested change, but seeing this section rendered, I feel it's actually a bit confusing because we first point people to Approach 2 for SQL-only code, then here we say that Approach 1 is good for complex SQL code, which we talk about in the last section as one of the "hybrid" options anyway.

I suggest we:

  • Delete the "Guidance on SQL code" heading and paragraph under it
  • Put the "Translating T-SQL..." heading and paragraph into a callout box, so it's not a section but more of a note which appears after we've explained the 2 approaches
  • In the Approach 2 section, under the 3 option bullet points, add a sentence saying "SQL Editor is recommended for short, ad hoc SQL queries. For longer or more complex SQL analysis, consider using notebooks."


------------------------------------------------------------------------

This approach is a useful short-term or transitional approach when you want to reuse existing SQL code with minimal changes. It keeps SQL and R closely linked by embedding SQL queries within an R workflow. Any SQL code would first need updating from T-SQL to Spark SQL, where it could then be passed via R code using wrapper functions to run in Databricks, whilst R controls execution.You can run SQL from R by creating a reusable wrapper function that uses a Databricks connection (e.g., via the DBI and odbc packages) and executes queries with a function like dbGetQuery()

Contributor

We should put "DBI", "odbc" and "dbGetQuery()" in apostrophes here, and this para also needs a full stop at the end :)
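As a rough illustration of the wrapper-function approach described in the quoted paragraph, the R sketch below defines a hypothetical `run_sql()` helper that passes a Spark SQL string, or the contents of a `.sql` file, to `DBI::dbGetQuery()`. It assumes the `DBI` and `odbc` packages are installed and that an ODBC data source for your SQL Warehouse already exists; the DSN name `databricks_sql_warehouse` and all helper names are placeholders, not part of any existing guidance.

```r
# Minimal sketch of a reusable wrapper for running Spark SQL on Databricks
# from R. Assumes the DBI and odbc packages are installed and that an ODBC
# data source (DSN) for your SQL Warehouse has already been set up.
# The DSN name and all function names below are hypothetical.

# Open a connection to the SQL Warehouse through a named ODBC data source.
connect_databricks <- function(dsn = "databricks_sql_warehouse") {
  DBI::dbConnect(odbc::odbc(), dsn = dsn)
}

# Read a .sql file into a single query string (base R only).
read_sql_file <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}

# Run one Spark SQL query, given either as a string or as a .sql file,
# and return the result as a data frame.
run_sql <- function(con, query = NULL, sql_file = NULL) {
  if (is.null(query) == is.null(sql_file)) {
    stop("Supply exactly one of `query` or `sql_file`.")
  }
  if (!is.null(sql_file)) {
    query <- read_sql_file(sql_file)
  }
  DBI::dbGetQuery(con, query)
}

# Example usage (commented out because it needs a live connection):
# con <- connect_databricks()
# results <- run_sql(con, sql_file = "sql/01_extract.sql")  # hypothetical file
# DBI::dbDisconnect(con)
```

Keeping the translated Spark SQL in separate `.sql` files, rather than pasting it into R strings, may make it easier to move the same queries into a Databricks notebook or the SQL Editor later.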


------------------------------------------------------------------------

This approach is most suitable where your code is written primarily in R and supports a quick and low-disruption migration. If you have an existing pipeline set up using RStudio / Positron / another IDE, there is no expectation that you must migrate your existing code or scripts into the Databricks platform (although there's no reason you shouldn't if you'd like to!).

Code that reads or writes data from or to SSMS databases will need to be redirected to your Databricks catalog. To do this, you'll need to manually set up a connection to a Databricks compute. The best compute option for this is an SQL Warehouse. You can find more information about setting up a connection to an SQL Warehouse on our [set up Databricks SQL Warehouse with RStudio](../ADA/databricks_rstudio_sql_warehouse.html) page.

laragarbett (Contributor), Jun 18, 2026

Given we say this (or near enough this) at 4 different points on the page, I think it would be best to state it once at the bottom of the page and reference it with an asterisk or something at those 4 places.

So at the bottom of the page:
*For all processes that run SQL code or read from / write to the Databricks catalog from outside Databricks, you’ll need to manually set up a connection to a Databricks compute resource. The best compute option for this is an SQL Warehouse. You can find more information about setting up a connection to an SQL Warehouse on our Databricks SQL Warehouse with RStudio page.

And then here just say "Code that reads or writes data from or to SSMS databases will need to be redirected to your Databricks catalog*."


------------------------------------------------------------------------

This approach is suitable when your existing SQL code is complex, well-tested or often reused and you want to keep it as SQL code. It involves running the SQL code directly in Databricks to create intermediate or final tables, which are then written to the Databricks catalog. The SQL processing happens entirely in Databricks, after which you connect to the Databricks catalog from RStudio, Positron or another IDE to read in the created tables and continue the process with R code.

Contributor

If we have the note on connecting to Databricks with a SQL warehouse at the bottom of the page, then we'd put an asterisk here:
"and continue the process with R code.*"

and delete the "For all processes...." paragraph
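For the hybrid approach where the SQL runs in Databricks first, reading the resulting tables back into R could look roughly like the sketch below. The helper names and the `my_catalog` / `my_schema` / `my_table` identifiers are hypothetical, and `con` is assumed to be an existing DBI connection to a SQL Warehouse (for example one opened with `DBI::dbConnect(odbc::odbc(), ...)`). Names are wrapped in backticks, which is how Spark SQL quotes identifiers.

```r
# Hypothetical helper: build a backtick-quoted catalog.schema.table name
# for Spark SQL. Any backtick inside a name is escaped by doubling it.
quote_table_name <- function(catalog, schema, table) {
  parts <- c(catalog, schema, table)
  paste0("`", gsub("`", "``", parts, fixed = TRUE), "`", collapse = ".")
}

# Hypothetical helper: read a table created in Databricks into an R data
# frame. `con` is assumed to be an open DBI connection to a SQL Warehouse.
read_catalog_table <- function(con, catalog, schema, table) {
  query <- paste("SELECT * FROM", quote_table_name(catalog, schema, table))
  DBI::dbGetQuery(con, query)
}

# Example usage (commented out because it needs a live connection):
# results <- read_catalog_table(con, "my_catalog", "my_schema", "my_table")
```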


------------------------------------------------------------------------

This approach is appropriate when your team primarily works in R and your SQL logic is not particularly complex or lengthy. It involves translating all existing SQL logic into R so that all your code is in the same language. The R code would then be run from RStudio / Positron / another IDE and would connect to the Databricks catalog to access the data as in Approach 1 above.

Contributor

If we have the note on connecting to Databricks with a SQL warehouse at the bottom of the page, then we'd put an asterisk here:
"as in Approach 1 above.*"

and delete the "For all processes...." paragraph
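To illustrate what translating SQL logic into R might involve, the sketch below rewrites a simple SQL aggregation using `dplyr` verbs. It assumes the `dplyr` package is installed (plus `dbplyr` when working against a Databricks table), and the table and column names are hypothetical. With `dbplyr`, the verbs are translated to Spark SQL and run in Databricks until `collect()` brings the result into R; the same function also works on an ordinary local data frame.

```r
# Hypothetical example of translating SQL logic into R with dplyr.
#
# Original SQL, for comparison (table and column names are hypothetical):
#   SELECT la_code, COUNT(*) AS n
#   FROM my_catalog.my_schema.pupils
#   WHERE academic_year = 2024
#   GROUP BY la_code

# Filter to one academic year, count rows per local authority code and
# return the result as a data frame in R.
count_pupils_by_la <- function(pupils, target_year) {
  pupils |>
    dplyr::filter(academic_year == !!target_year) |>
    dplyr::count(la_code) |>
    dplyr::collect()
}

# Example usage (commented out because it needs a live connection; assumes
# a dbplyr version that accepts I() for a fully qualified table name):
# pupils <- dplyr::tbl(con, I("my_catalog.my_schema.pupils"))
# result <- count_pupils_by_la(pupils, 2024)
```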


------------------------------------------------------------------------

This approach is a useful short-term or transitional approach when you want to reuse existing SQL code with minimal changes. It keeps SQL and R closely linked by embedding SQL queries within an R workflow. Any SQL code would first need updating from T-SQL to Spark SQL, where it could then be passed via R code using wrapper functions to run in Databricks, whilst R controls execution.You can run SQL from R by creating a reusable wrapper function that uses a Databricks connection (e.g., via the DBI and odbc packages) and executes queries with a function like dbGetQuery()

Contributor

If we have the note on connecting to Databricks with a SQL warehouse at the bottom of the page, then we'd put an asterisk here:
"...function like dbGetQuery().*"

and delete the "For all processes...." paragraph


------------------------------------------------------------------------

This approach is a useful short-term or transitional approach when you want to reuse existing SQL code with minimal changes. It keeps SQL and R closely linked by embedding SQL queries within an R workflow. Any SQL code would first need updating from T-SQL to Spark SQL, where it could then be passed via R code using wrapper functions to run in Databricks, whilst R controls execution.You can run SQL from R by creating a reusable wrapper function that uses a Databricks connection (e.g., via the DBI and odbc packages) and executes queries with a function like dbGetQuery()

Contributor

Need to add space between "execution." and "You"

laragarbett (Contributor)

Almost there @Lsnaathorst1 ! Just a handful of comments now, most very small!


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants