diff --git a/Instructions/Labs/04-ingest-pipeline.md b/Instructions/Labs/04-ingest-pipeline.md
index 30334efa..82309084 100644
--- a/Instructions/Labs/04-ingest-pipeline.md
+++ b/Instructions/Labs/04-ingest-pipeline.md
@@ -18,7 +18,7 @@ Fabric also supports Apache Spark, enabling you to write and run code to process
This lab will take approximately **45** minutes to complete.
-> [!Note]
+> [!Note]
> You need access to a [Microsoft Fabric tenant](https://learn.microsoft.com/fabric/get-started/fabric-trial) to complete this exercise.
## Create a workspace
@@ -26,7 +26,7 @@ This lab will take approximately **45** minutes to complete.
Before working with data in Fabric, create a workspace with the Fabric trial enabled.
1. Navigate to the [Microsoft Fabric home page](https://app.fabric.microsoft.com/home?experience=fabric-developer) at `https://app.fabric.microsoft.com/home?experience=fabric-developer` in a browser and sign in with your Fabric credentials.
-1. In the menu bar on the left, select **Workspaces** (the icon looks similar to 🗇).
+1. In the navigation pane on the left, select **Workspaces** (the icon looks similar to 🗇).
1. Create a new workspace with a name of your choice, selecting a licensing mode in the **Advanced** section that includes Fabric capacity (*Trial*, *Premium*, or *Fabric*).
1. When your new workspace opens, it should be empty.
@@ -36,7 +36,7 @@ Before working with data in Fabric, create a workspace with the Fabric trial ena
Now that you have a workspace, it's time to create a data lakehouse into which you will ingest data.
-1. On the menu bar on the left, select **Create**. In the *New* page, under the *Data Engineering* section, select **Lakehouse**. Give it a unique name of your choice. Make sure the "Lakehouse schemas (Public Preview)" option is disabled.
+1. On the navigation pane on the left, select **Create**, and choose **Lakehouse**. Give it a unique name of your choice. Make sure the "Lakehouse schemas (Public Preview)" option is disabled.
>**Note**: If the **Create** option is not pinned to the sidebar, you need to select the ellipsis (**...**) option first.
@@ -48,68 +48,80 @@ Now that you have a workspace, it's time to create a data lakehouse into which y
A simple way to ingest data is to use a **Copy Data** activity in a pipeline to extract the data from a source and copy it to a file in the lakehouse.
-1. On the **Home** page for your lakehouse, select **Get data** and then select **New copy job**, and create a new data pipeline named `Ingest Sales Data`.
-1. If the **Copy Job** wizard doesn't open automatically, select **From any source to any destination** in the pipeline editor page.
-1. In the **Copy Job** wizard, on the **Choose data source** page, type HTTP in the search bar and then select **HTTP** in the **New sources** section.
+> **Note**: A *Copy Job* and a *Copy Data* activity are different methods for moving data in Fabric. A Copy Job is a standalone, simplified data movement tool and doesn't require a pipeline. A Copy Data activity is configured within a pipeline and supports orchestration with other activities. In this exercise, you use a **Copy Data** activity in a pipeline.
- 
+1. In the navigation pane on the left, select the name of your workspace.
+1. In the workspace, select **New item**, search for **Pipeline**, and create a new pipeline named `Ingest Sales Data`.
+1. In the pipeline editor canvas, select **Add pipeline activity** and then select **Copy activity**. A **Copy Data** activity is added to the pipeline canvas.
-1. In the **Connect to data source** pane, enter the following settings for the connection to your data source:
+ 
+
+### Configure the source
+
+1. Select the **Copy Data** activity on the canvas, and then in the pane below the canvas select the **Source** tab.
+1. In the **Connection** drop-down, select **Browse all**.
+1. In the **New connection** dialog, search for **HTTP** and select it, then select **Continue**.
+1. Configure the following settings and then select **Connect**:
- **URL**: `https://raw.githubusercontent.com/MicrosoftLearning/dp-data/main/sales.csv`
- - **Connection**: Create new connection
- **Connection name**: *Specify a unique name*
- **Data gateway**: (none)
- **Authentication kind**: Anonymous
-1. Select **Next**. Then ensure the following settings are selected:
+1. Back on the **Source** tab, configure the following source settings:
- **Relative URL**: *Leave blank*
- - **Request method**: GET
- - **Additional headers**: *Leave blank*
- - **Binary copy**: Unselected
- - **Request timeout**: *Leave blank*
- - **Max concurrent connections**: *Leave blank*
-1. Select **Next**, and wait for the data to be sampled and then ensure that the following settings are selected:
- - **File format**: DelimitedText
+ - **File format**: Select **DelimitedText** from the drop-down
+
+ 
+
+1. Select the **Settings** button next to the **File format** drop-down. In the **File format settings** dialog, ensure the following settings are configured and then select **OK**:
+ - **Compression type**: No compression
- **Column delimiter**: Comma (,)
- **Row delimiter**: Line feed (\n)
- - **First row as header**: Selected
- - **Compression type**: None
-1. Select **Preview data** to see a sample of the data that will be ingested. Then close the data preview and select **Next**.
-1. On the **Connect to data destination** page, set the following data destination options, and then select **Next**:
+ - **First row as header**: *Selected*
+
+ 
+
+1. Select **Test connection** to verify the connection works.
+1. *Optional*: Select **Preview data** to confirm the data looks correct.
+
+### Configure the destination
+
+1. Select the **Destination** tab. Then in the **Connection** drop-down, select **Browse all**.
+1. In the **New connection** dialog box, find and select your lakehouse in the *OneLake Catalog* section.
+1. After the connection is created, return to the **Destination** tab and configure the following settings:
+ - **Connection**: *Your newly created lakehouse connection*
+ - **Lakehouse**: *Select the lakehouse you created earlier*
- **Root folder**: Files
- - **Folder path name**: new_data
- - **File name**: sales.csv
- - **Copy behavior**: None
-1. Set the following file format options and then select **Next**:
- - **File format**: DelimitedText
- - **Column delimiter**: Comma (,)
- - **Row delimiter**: Line feed (\n)
- - **Add header to file**: Selected
- - **Compression type**: None
-1. On the **Copy summary** page, review the details of your copy operation and then select **Save + Run**.
+ - **File path**: *Directory*: new_data / *File name*: sales.csv
+1. Leave all other settings at their default values.
- A new pipeline containing a **Copy Data** activity is created, as shown here:
+
- 
+### Run the pipeline
+
+1. On the **Home** tab, use the **🖫** (*Save*) icon to save the pipeline. Then use the **▷ Run** button to run the pipeline.
-1. When the pipeline starts to run, you can monitor its status in the **Output** pane under the pipeline designer. Use the **↻** (*Refresh*) icon to refresh the status, and wait until it has succeeeded.
-1. In the menu bar on the left, select your lakehouse.
+1. When the pipeline starts to run, you can monitor its status in the **Output** pane under the pipeline designer. Use the **↻** (*Refresh*) icon to refresh the status, and wait until it has succeeded.
+
+1. In the navigation pane on the left, select your lakehouse.
1. On the **Home** page, in the **Explorer** pane, expand **Files** and select the **new_data** folder to verify that the **sales.csv** file has been copied.
+
+
## Create a notebook
1. On the **Home** page for your lakehouse, in the **Open notebook** menu, select **New notebook**.
After a few seconds, a new notebook containing a single *cell* will open. Notebooks are made up of one or more cells that can contain *code* or *markdown* (formatted text).
-2. Select the existing cell in the notebook, which contains some simple code, and then replace the default code with the following variable declaration.
+1. Select the existing cell in the notebook, which contains some simple code, and then replace the default code with the following variable declaration.
```python
table_name = "sales"
```
-3. In the **...** menu for the cell (at its top-right) select **Toggle parameter cell**. This configures the cell so that the variables declared in it are treated as parameters when running the notebook from a pipeline.
+1. In the **...** menu for the cell (at its top-right) select **Toggle parameter cell**. This configures the cell so that the variables declared in it are treated as parameters when running the notebook from a pipeline.
-4. Under the parameters cell, use the **+ Code** button to add a new code cell. Then add the following code to it:
+1. Under the parameters cell, use the **+ Code** button to add a new code cell. Then add the following code to it:
```python
from pyspark.sql.functions import *
@@ -132,66 +144,66 @@ A simple way to ingest data is to use a **Copy Data** activity in a pipeline to
This code loads the data from the sales.csv file that was ingested by the **Copy Data** activity, applies some transformation logic, and saves the transformed data as a table - appending the data if the table already exists.
-5. Verify that your notebooks looks similar to this, and then use the **▷ Run all** button on the toolbar to run all of the cells it contains.
-
- 
+1. Verify that your notebook looks similar to this, and then use the **▷ Run all** button on the toolbar to run all of the cells it contains.
> **Note**: Since this is the first time you've run any Spark code in this session, the Spark pool must be started. This means that the first cell can take a minute or so to complete.
-6. When the notebook run has completed, in the **Explorer** pane on the left, in the **...** menu for **Tables** select **Refresh** and verify that a **sales** table has been created.
-7. In the notebook menu bar, use the ⚙️ **Settings** icon to view the notebook settings. Then set the **Name** of the notebook to `Load Sales` and close the settings pane.
-8. In the hub menu bar on the left, select your lakehouse.
-9. In the **Explorer** pane, refresh the view. Then expand **Tables**, and select the **sales** table to see a preview of the data it contains.
+1. When the notebook run has completed, in the **Explorer** pane on the left, in the **...** menu for **Tables** select **Refresh** and verify that a **sales** table has been created.
+1. In the notebook menu bar, use the **Settings** icon to view the notebook settings. Then set the **Name** of the notebook to `Load Sales` and close the settings pane.
+1. In the hub menu bar on the left, select your lakehouse.
+1. In the **Explorer** pane, refresh the view. Then expand **Tables**, and select the **sales** table to see a preview of the data it contains.
+
+ 
## Modify the pipeline
Now that you've implemented a notebook to transform data and load it into a table, you can incorporate the notebook into a pipeline to create a reusable ETL process.
1. In the hub menu bar on the left select the **Ingest Sales Data** pipeline you created previously.
-2. On the **Activities** tab, in the **All activities** list, select **Delete data**. Then position the new **Delete data** activity to the left of the **Copy data** activity and connect its **On completion** output to the **Copy data** activity, as shown here:
+1. On the **Activities** tab, in the **All activities** list, select **Delete data**. Then position the new **Delete data** activity to the left of the **Copy data** activity and connect its **On completion** output to the **Copy data** activity, as shown here:

-3. Select the **Delete data** activity, and in the pane below the design canvas, set the following properties:
+1. Select the **Delete data** activity, and in the pane below the design canvas, set the following properties:
- **General**:
- **Name**: `Delete old files`
- **Source**
- - **Connection**: *Your lakehouse*
+     - **Connection**: Select **Browse all**, and then select your lakehouse
- **File path type**: Wildcard file path
- **Folder path**: Files / **new_data**
- - **Wildcard file name**: `*.csv`
+ - **Wildcard file name**: `*.csv`
- **Recursively**: *Selected*
- **Logging settings**:
- **Enable logging**: *Unselected*
These settings will ensure that any existing .csv files are deleted before copying the **sales.csv** file.
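
The **Wildcard file name** setting uses standard glob-style matching. The following is a minimal Python sketch of the `*.csv` semantics only, using hypothetical file names; it is not how Fabric implements the **Delete data** activity:

```python
from fnmatch import fnmatch

# Hypothetical file names that might exist in the new_data folder
files = ["sales.csv", "sales_old.csv", "readme.txt"]

# "*.csv" matches any file name ending in .csv, so both CSV files
# would be selected for deletion, while readme.txt is left alone
to_delete = [f for f in files if fnmatch(f, "*.csv")]
print(to_delete)  # ['sales.csv', 'sales_old.csv']
```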
-4. In the pipeline designer, on the **Activities** tab, select **Notebook** to add a **Notebook** activity to the pipeline.
-5. Select the **Copy data** activity and then connect its **On Completion** output to the **Notebook** activity as shown here:
+1. In the pipeline designer, on the **Activities** tab, select **Notebook** to add a **Notebook** activity to the pipeline.
+1. Select the **Copy data** activity and then connect its **On Completion** output to the **Notebook** activity as shown here:

-6. Select the **Notebook** activity, and then in the pane below the design canvas, set the following properties:
+1. Select the **Notebook** activity, and then in the pane below the design canvas, set the following properties:
- **General**:
- **Name**: `Load Sales notebook`
- **Settings**:
- - **Notebook**: Load Sales
+ - **Notebook**: Select your *Load Sales* notebook
- **Base parameters**: *Add a new parameter with the following properties:*
-
+
| Name | Type | Value |
| -- | -- | -- |
| table_name | String | new_sales |
The **table_name** parameter will be passed to the notebook and override the default value assigned to the **table_name** variable in the parameters cell.
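
Conceptually, the override works because the pipeline-supplied values are applied after the parameters cell runs, so the last assignment wins. The following plain-Python sketch illustrates that ordering only; it is not Fabric's actual parameter-injection mechanism:

```python
# Parameters cell: the default value, used when the notebook
# is run interactively
table_name = "sales"

# Value supplied by the pipeline's Base parameters; it is applied
# after the parameters cell, so it overrides the default
table_name = "new_sales"

print(table_name)  # new_sales
```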
-7. On the **Home** tab, use the **🖫** (*Save*) icon to save the pipeline. Then use the **▷ Run** button to run the pipeline, and wait for all of the activities to complete.
+1. On the **Home** tab, use the **🖫** (*Save*) icon to save the pipeline. Then use the **▷ Run** button to run the pipeline, and wait for all of the activities to complete.

> Note: In case you receive the error message *Spark SQL queries are only possible in the context of a lakehouse. Please attach a lakehouse to proceed*: Open your notebook, select the lakehouse you created on the left pane, select **Remove all Lakehouses** and then add it again. Go back to the pipeline designer and select **▷ Run**.
-8. In the hub menu bar on the left edge of the portal, select your lakehouse.
-9. In the **Explorer** pane, expand **Tables** and select the **new_sales** table to see a preview of the data it contains. This table was created by the notebook when it was run by the pipeline.
+1. In the hub menu bar on the left edge of the portal, select your lakehouse.
+1. In the **Explorer** pane, expand **Tables** and select the **new_sales** table to see a preview of the data it contains. This table was created by the notebook when it was run by the pipeline.
In this exercise, you implemented a data ingestion solution that uses a pipeline to copy data to your lakehouse from an external source, and then uses a Spark notebook to transform the data and load it into a table.
@@ -203,4 +215,4 @@ If you've finished exploring your lakehouse, you can delete the workspace you cr
1. In the bar on the left, select the icon for your workspace to view all of the items it contains.
1. Select **Workspace settings** and in the **General** section, scroll down and select **Remove this workspace**.
-1. Select **Delete** to delete the workspace.
\ No newline at end of file
+1. Select **Delete** to delete the workspace.
diff --git a/Instructions/Labs/Images/copy-data-destination.png b/Instructions/Labs/Images/copy-data-destination.png
new file mode 100644
index 00000000..a3038778
Binary files /dev/null and b/Instructions/Labs/Images/copy-data-destination.png differ
diff --git a/Instructions/Labs/Images/copy-data-pipeline.png b/Instructions/Labs/Images/copy-data-pipeline.png
index 56be97eb..e1b4e3cf 100644
Binary files a/Instructions/Labs/Images/copy-data-pipeline.png and b/Instructions/Labs/Images/copy-data-pipeline.png differ
diff --git a/Instructions/Labs/Images/copy-data-source-tab.png b/Instructions/Labs/Images/copy-data-source-tab.png
new file mode 100644
index 00000000..10b810de
Binary files /dev/null and b/Instructions/Labs/Images/copy-data-source-tab.png differ
diff --git a/Instructions/Labs/Images/file-format-settings.png b/Instructions/Labs/Images/file-format-settings.png
new file mode 100644
index 00000000..130cc7f8
Binary files /dev/null and b/Instructions/Labs/Images/file-format-settings.png differ
diff --git a/Instructions/Labs/Images/notebook.png b/Instructions/Labs/Images/notebook.png
index 7aa151f3..b6efa1c7 100644
Binary files a/Instructions/Labs/Images/notebook.png and b/Instructions/Labs/Images/notebook.png differ
diff --git a/Instructions/Labs/Images/pipeline-file-loaded.png b/Instructions/Labs/Images/pipeline-file-loaded.png
new file mode 100644
index 00000000..f13c0e4a
Binary files /dev/null and b/Instructions/Labs/Images/pipeline-file-loaded.png differ