diff --git a/CHANGELOG.md b/CHANGELOG.md index 07f2df07..bd3e0cfb 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,15 @@ ## lifebit-ai/cloudos-cli: changelog +## v2.92.0 (2026-05-28) + +### Feat: + +- Moves `cloudos link` into the `interactive-session` module as `cloudos interactive-session link` +- File Explorer paths now infer the project name from the first path segment (e.g. `my-project/Data/folder`); standard top-level folder names (`Data`, `AnalysesResults`, `Analyses_Results`, `Analyses-Results`, `Cohorts`) are treated as relative to the profile project +- Removes `--mount` from `cloudos interactive-session create` +- Introduces `--copy` as an optional flag of `--link` in `cloudos interactive-session create` to copy data into the session + + ## v2.91.0 (2026-05-28) ### Feat: @@ -7,12 +17,6 @@ - Implements linking of files in interactive session creation - Implements linking of files in `cloudos link` - Removes support for linking while resuming a paused interactive session -- Enforces a maximum of 100 linked items per interactive session -- Adds clearer, actionable error messages when mounts fail (e.g. translates "prefix does not exist" / "access denied" into workspace-permission guidance) - -### Breaking: - -- `cloudos link` and `cloudos datasets link`: File Explorer paths must now be RELATIVE to `--project-name` (do NOT prepend the project name). Previously the leading `/` segment was advertised but produced confusing errors; it is now rejected up front with a clear message pointing to the correct form. `cloudos interactive-session create --link` still uses `/` format — see each command's `--help` for the explicit cross-reference. 
## v2.90.2 (2026-05-07) diff --git a/README.md b/README.md index 36471980..cd60e2d0 100644 --- a/README.md +++ b/README.md @@ -2147,9 +2147,9 @@ cloudos interactive-session create \ - `--shutdown-in`: Auto-shutdown duration (e.g., `8h`, `2d`, `30m`, default: `12h`) **Data & Storage Management:** -- `--mount`: Mount a data file into the session. Supports both Lifebit Platform datasets and S3 files (AWS only). Format: `project_name/dataset_path` (e.g., `leila-test/Data/file.csv`) or `s3://bucket/path/to/file` (e.g., `s3://my-bucket/data/file.csv`). Can be used multiple times. -- `--link`: Link a file or folder into the session for read access (AWS only). Supports S3 files/folders (e.g., `s3://my-bucket/data/file.csv`, `s3://my-bucket/data/`) and File Explorer files/folders (e.g., `my-project/Data/file.csv`, `my-project/Data/results`). S3 paths whose last segment contains a `.` are treated as files; paths ending with `/` or without an extension are treated as folders. Multiple items can be specified using multiple `--link` flags or as comma-separated paths in a single `--link` argument. -**Note:** Linking is not supported on Azure. Use Lifebit Platform File Explorer for data access. +- `--link`: Link a file or folder into the session for read access (AWS only). Supports S3 files/folders (e.g., `s3://my-bucket/data/file.csv`, `s3://my-bucket/data/`) and File Explorer files/folders (e.g., `my-project/Data/file.csv`, `my-project/Data/results`). S3 paths whose last segment contains a `.` are treated as files; paths ending with `/` or without an extension are treated as folders. Multiple items can be specified using multiple `--link` flags or as comma-separated paths in a single `--link` argument. Use `--copy` to copy data into the session instead. +- `--copy`: Copy data into the session instead of linking for read access. When specified, the paths provided by `--link` are copied into the session's data volume. 
Supports Lifebit Platform datasets (`project_name/Data/file.csv`) and S3 files (`s3://bucket/path/to/file`). AWS only for S3 files. +**Note:** Linking is not supported on Azure. Use `--link --copy` to copy Lifebit Platform data into the session on Azure. **Backend-Specific:** - `--r-version`: R version for RStudio (options: `4.4.2`, `4.5.2`) - **optional for rstudio** (default: `4.4.2`) @@ -2162,20 +2162,21 @@ cloudos interactive-session create \ **Data Management** CloudOS CLI supports multiple ways to access data in interactive sessions, depending on your execution platform: -- **Mount files** (`--mount`): Files are copied into the session's mounted-data volume. Supports CloudOS File Explorer files and S3 files (AWS only). +- **Copy files** (`--link --copy`): Files are copied into the session's data volume. Supports Lifebit Platform File Explorer files and S3 files (AWS only). - **Link files/folders** (`--link`): Files and folders are mounted as read-accessible items in the session (AWS only). Supports S3 files, S3 folders, and Lifebit Platform File Explorer files and folders. Linked items appear with unique mount names based on the item name. Maximum 100 items per session. -**Data Mounting Examples** +**Data Management Examples** -Mount a file from File Explorer: +Copy a file from File Explorer into the session: ```bash cloudos interactive-session create \ --profile my_profile \ --name "Data Analysis" \ --session-type jupyter \ - --mount "my_project/training_data.csv" + --link "my_project/training_data.csv" \ + --copy ``` Link an S3 folder: @@ -2578,7 +2579,114 @@ All configuration parameters are optional. If not specified, the session resumes - `--cost-limit ` - Update compute cost limit (-1 for unlimited) - `--shutdown-in ` - Update auto-shutdown time (e.g., 8h, 2d) -> To link or mount data to a running session, use `cloudos link` or `cloudos datasets link` after the session has resumed. 
+> To link or copy data to a running session, use `cloudos interactive-session link` after the session has resumed. + +### Link + +The `cloudos interactive-session link` command provides a unified interface for linking files and folders to interactive analysis sessions. It consolidates functionality previously available through separate commands (`cloudos job results --link`, `cloudos job workdir --link`, `cloudos job logs --link`, and `cloudos datasets link`) into a single, intuitive interface. + +#### Link Files and Folders to Interactive Analysis + +Link job-related folders or custom S3/File Explorer paths (files and folders) to your interactive analysis sessions for direct access to data without needing to copy files. + +**Two modes of operation:** + +1. **Job-based linking** (`--job-id`): Links folders from a completed or running job + - By default, links results, workdir, and logs folders + - Use `--results`, `--workdir`, or `--logs` flags to link only specific folders + +2. **Direct path linking** (PATH argument): Links specific S3 or File Explorer paths (files or folders). Supports a single path or comma-separated multiple paths. + - S3 paths whose last segment contains a `.` are treated as files (e.g., `s3://bucket/data/file.csv`) + - S3 paths ending with `/` or without an extension are treated as folders + - File Explorer paths can point to either files or folders — the CLI detects the type automatically + - If the first path segment is a standard top-level folder name (`Data`, `AnalysesResults`, `Analyses_Results`, `Analyses-Results`, `Cohorts`), the path is resolved against the profile project. Otherwise the first segment is treated as the project name (e.g. `other-project/Data/file.csv`). 
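The direct-path rules listed above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `split_paths` and `classify_path` are hypothetical and are not part of the CLI. The File Explorer branch follows the same first-segment rule as the `_normalize_file_explorer_path` helper that appears in this diff; the S3 branch illustrates the documented "last segment contains a `.`" rule and is not the CLI's own `parse_link_path`, which is not shown here.

```python
# Hypothetical helpers for illustration only; not part of cloudos-cli.

# Standard top-level folder names, compared case-insensitively.
ROOT_FOLDERS = {"data", "analysesresults", "analyses_results",
                "analyses-results", "cohorts"}


def split_paths(arg):
    """Split a comma-separated PATH argument, dropping empty entries."""
    return [p.strip() for p in arg.split(",") if p.strip()]


def classify_path(path, profile_project):
    """Return (kind, project, path) for a single PATH entry.

    kind is 's3_file', 's3_folder' or 'file_explorer'.
    project is None for S3 paths.
    """
    if path.startswith("s3://"):
        # Last segment with a '.' is treated as a file; a trailing '/'
        # (empty last segment) or no extension is treated as a folder.
        last_segment = path.rsplit("/", 1)[-1]
        kind = "s3_file" if "." in last_segment else "s3_folder"
        return kind, None, path

    first, sep, rest = path.partition("/")
    if not sep or first.lower() in ROOT_FOLDERS:
        # Path starts at a standard folder: resolve against the profile project.
        return "file_explorer", profile_project, path
    # Otherwise the first segment is taken as the project name.
    return "file_explorer", first, rest


if __name__ == "__main__":
    arg = "Data/MultiQC,other-project/Data/file.csv,s3://bucket/results/"
    for entry in split_paths(arg):
        print(classify_path(entry, "my-project"))
```

Whether a File Explorer item is a file or a folder is not decided here; as noted above, the CLI detects that automatically when it resolves the item.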
+ +**Basic usage:** + +```bash +# Link all job folders (results, workdir, logs) - default behavior +cloudos interactive-session link --job-id --session-id --profile my_profile + +# Link only specific folders from a job +cloudos interactive-session link --job-id --session-id --results --profile my_profile +cloudos interactive-session link --job-id --session-id --workdir --logs --profile my_profile + +# Link a single S3 folder +cloudos interactive-session link s3://bucket/folder/ --session-id --profile my_profile + +# Link a single S3 file +cloudos interactive-session link s3://bucket/data/file.csv --session-id --profile my_profile + +# Link multiple S3 paths (comma-separated, files and folders mixed) +cloudos interactive-session link s3://bucket1/data/,s3://bucket2/results/file.csv --session-id --profile my_profile + +# Link a File Explorer folder from the profile project +cloudos interactive-session link Data/MyFolder --session-id --profile my_profile + +# Link a File Explorer file from a different project +cloudos interactive-session link other-project/Data/file.csv --session-id --profile my_profile + +# Mix paths from the profile project, another project, and S3 +cloudos interactive-session link Data/MultiQC,other-project/Data/file.csv,s3://bucket/results/ --session-id --profile my_profile +``` + +**Command options:** + +- `PATH`: S3 or File Explorer path(s) to link (positional argument, required if `--job-id` is not provided). 
Supports comma-separated multiple paths for batch linking (e.g., `s3://bucket1/path1,s3://bucket2/path2`) +- `--apikey` / `-k`: Your Lifebit Platform API key (required) +- `--cloudos-url` / `-c`: The Lifebit Platform URL (default: https://cloudos.lifebit.ai) +- `--workspace-id`: The specific Lifebit Platform workspace ID (required) +- `--session-id`: The specific Lifebit Platform interactive session ID (required) +- `--job-id`: The job ID in Lifebit Platform (links results, workdir, and logs by default) +- `--project-name`: Lifebit Platform project name (used as fallback for job-based linking) +- `--results`: Link only results folder (only works with `--job-id`) +- `--workdir`: Link only working directory (only works with `--job-id`) +- `--logs`: Link only logs folder (only works with `--job-id`) +- `--verbose`: Print detailed information messages +- `--disable-ssl-verification`: Disable SSL certificate verification +- `--ssl-cert`: Path to your SSL certificate file +- `--profile`: Profile to use from the config file + +**Examples:** + +```bash +# Link all folders from a completed job +cloudos interactive-session link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --profile my_profile + +# Link only results from a job +cloudos interactive-session link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --results --profile my_profile + +# Link workdir and logs (but not results) +cloudos interactive-session link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --workdir --logs --profile my_profile + +# Link a single S3 bucket folder +cloudos interactive-session link s3://my-bucket/analysis-results/2024 --session-id abc123 --profile my_profile + +# Link multiple S3 folders in one command +cloudos interactive-session link s3://bucket1/data,s3://bucket2/results,s3://bucket3/final-output --session-id abc123 --profile my_profile + +# Link File Explorer paths from the profile project +cloudos interactive-session link Data/MultiQC --session-id abc123 --profile 
my_profile + +# Link File Explorer paths from multiple projects in one command +cloudos interactive-session link leila-test/Data/MultiQC,Daniel_Test_Files/Data/20131219.populations.tsv --session-id abc123 --profile my_profile +``` + +**Error handling:** + +The command provides clear error messages for common scenarios: +- Job not completed (for results linking) +- Folders not available or deleted +- Job still initializing +- Invalid paths or permissions + +> [!NOTE] +> If running the CLI inside a Jupyter session, the pre-configured CLI installation will have the session ID already configured and only the `--apikey` needs to be added. + +> [!NOTE] +> Azure Blob Storage paths (az://) are not supported for linking in Azure environments. + +--- ### Datasets @@ -2772,113 +2880,6 @@ cloudos datasets rm --profile my_profile --- -### Link - -The `cloudos link` command provides a unified interface for linking files and folders to interactive analysis sessions. This command consolidates functionality previously available through separate commands (`cloudos job results --link`, `cloudos job workdir --link`, `cloudos job logs --link`, and `cloudos datasets link`) into a single, intuitive interface. - -#### Link Files and Folders to Interactive Analysis - -Link job-related folders or custom S3/File Explorer paths (files and folders) to your interactive analysis sessions for direct access to data without needing to copy files. - -**Two modes of operation:** - -1. **Job-based linking** (`--job-id`): Links folders from a completed or running job - - By default, links results, workdir, and logs folders - - Use `--results`, `--workdir`, or `--logs` flags to link only specific folders - -2. **Direct path linking** (PATH argument): Links specific S3 or File Explorer paths (files or folders). Supports a single path or comma-separated multiple paths. 
- - S3 paths whose last segment contains a `.` are treated as files (e.g., `s3://bucket/data/file.csv`) - - S3 paths ending with `/` or without an extension are treated as folders - - File Explorer paths can point to either files or folders — the CLI detects the type automatically - -**Basic usage:** - -```bash -# Link all job folders (results, workdir, logs) - default behavior -cloudos link --job-id --session-id --profile my_profile - -# Link only specific folders from a job -cloudos link --job-id --session-id --results --profile my_profile -cloudos link --job-id --session-id --workdir --logs --profile my_profile - -# Link a single S3 folder -cloudos link s3://bucket/folder/ --session-id --profile my_profile - -# Link a single S3 file -cloudos link s3://bucket/data/file.csv --session-id --profile my_profile - -# Link multiple S3 paths (comma-separated, files and folders mixed) -cloudos link s3://bucket1/data/,s3://bucket2/results/file.csv --session-id --profile my_profile - -# Link a File Explorer folder (path is RELATIVE to --project-name; do NOT prepend the project) -cloudos link "Data/MyFolder" --project-name my-project --session-id --profile my_profile - -# Link a File Explorer file (path is RELATIVE to --project-name) -cloudos link "Data/file.csv" --project-name my-project --session-id --profile my_profile - -# Link several File Explorer items at once (all in the same project) -cloudos link "Data/MyFolder,Data/file.csv,Results/run-1" --project-name my-project --session-id --profile my_profile -``` - -> [!IMPORTANT] -> **`cloudos link` is single-project for File Explorer paths.** All File Explorer items linked in one invocation must belong to the project named in `--project-name`. The path must be relative to that project — prepending the project name to the path (e.g. `my-project/Data/file.csv`) is rejected. To link items from a different project, run `cloudos link` again with a different `--project-name`. 
- -**Command options:** - - -- `PATH`: S3 or File Explorer path(s) to link (positional argument, required if `--job-id` is not provided). Supports comma-separated multiple paths for batch linking (e.g., `s3://bucket1/path1,s3://bucket2/path2`) -- `--apikey` / `-k`: Your Lifebit Platform API key (required) -- `--cloudos-url` / `-c`: The Lifebit Platform URL (default: https://cloudos.lifebit.ai) -- `--workspace-id`: The specific Lifebit Platform workspace ID (required) -- `--session-id`: The specific Lifebit Platform interactive session ID (required) -- `--job-id`: The job ID in Lifebit Platform (links results, workdir, and logs by default) -- `--project-name`: Lifebit Platform project name. Required when any PATH is a File Explorer path. All FE paths in one invocation must belong to this project and must be RELATIVE to it (do not prepend the project name) -- `--results`: Link only results folder (only works with `--job-id`) -- `--workdir`: Link only working directory (only works with `--job-id`) -- `--logs`: Link only logs folder (only works with `--job-id`) -- `--verbose`: Print detailed information messages -- `--disable-ssl-verification`: Disable SSL certificate verification -- `--ssl-cert`: Path to your SSL certificate file -- `--profile`: Profile to use from the config file - -**Examples:** - -```bash -# Link all folders from a completed job -cloudos link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --profile my_profile - -# Link only results from a job -cloudos link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --results --profile my_profile - -# Link workdir and logs (but not results) -cloudos link --job-id 62c83a1191fe06013b7ef355 --session-id abc123 --workdir --logs --profile my_profile - -# Link a single S3 bucket folder -cloudos link s3://my-bucket/analysis-results/2024 --session-id abc123 --profile my_profile - -# Link multiple S3 folders in one command -cloudos link s3://bucket1/data,s3://bucket2/results,s3://bucket3/final-output 
--session-id abc123 --profile my_profile - -# Mix different S3 prefixes from the same or different buckets -cloudos link s3://lifebit-datasets/pipelines/vep/,s3://lifebit-datasets/pipelines/phewas/,s3://my-results/output/ --session-id abc123 --profile my_profile -``` - -**Error handling:** - -The command provides clear error messages for common scenarios: -- Job not completed (for results linking) -- Folders not available or deleted -- Job still initializing -- Invalid paths or permissions - -> [!NOTE] -> If running the CLI inside a Jupyter session, the pre-configured CLI installation will have the session ID already configured and only the `--apikey` needs to be added. - -> [!NOTE] -> Azure Blob Storage paths (az://) are not supported for linking in Azure environments. - ---- - ### Procurement Lifebit Platform supports procurement functionality to manage and list images associated with organizations within a given procurement. This feature is useful for administrators and users who need to view available container images across different organizations in their procurement. 
diff --git a/cloudos_cli/__main__.py b/cloudos_cli/__main__.py index 58236fd0..267f3678 100644 --- a/cloudos_cli/__main__.py +++ b/cloudos_cli/__main__.py @@ -24,7 +24,6 @@ from cloudos_cli.procurement.cli import procurement from cloudos_cli.datasets.cli import datasets from cloudos_cli.configure.cli import configure -from cloudos_cli.link.cli import link from cloudos_cli.interactive_session.cli import interactive_session @@ -63,7 +62,6 @@ def run_cloudos_cli(ctx): run_cloudos_cli.add_command(procurement) run_cloudos_cli.add_command(datasets) run_cloudos_cli.add_command(configure) -run_cloudos_cli.add_command(link) run_cloudos_cli.add_command(interactive_session) if __name__ == '__main__': diff --git a/cloudos_cli/_version.py b/cloudos_cli/_version.py index 1271f796..363dce345 100644 --- a/cloudos_cli/_version.py +++ b/cloudos_cli/_version.py @@ -1 +1 @@ -__version__ = '2.91.0' +__version__ = '2.92.0' diff --git a/cloudos_cli/datasets/cli.py b/cloudos_cli/datasets/cli.py index a3514bc3..b9407cb8 100644 --- a/cloudos_cli/datasets/cli.py +++ b/cloudos_cli/datasets/cli.py @@ -4,7 +4,7 @@ import csv import sys from cloudos_cli.datasets import Datasets -from cloudos_cli.link import Link +from cloudos_cli.interactive_session.link import Link from cloudos_cli.utils.resources import ssl_selector, format_bytes from cloudos_cli.configure.configure import with_profile_config, CLOUDOS_URL from cloudos_cli.logging.logger import update_command_context_from_click @@ -326,8 +326,8 @@ def move_files(ctx, source_path, destination_path, apikey, cloudos_url, workspac if folder_type in ("VirtualFolder", "Folder"): target_kind = "Folder" elif folder_type == "S3Folder": - raise ValueError(f"Unable to move item '{source_item_name}' to '{destination_path}'. " + - "The destination is an S3 folder, and only virtual folders can be selected as valid move destinations.") + raise ValueError(f"Unable to move item '{source_item_name}' to '{destination_path}'. 
" + "The destination is an S3 folder, and only virtual folders can be selected as valid move destinations.") elif isinstance(folder_type, bool) and folder_type: # legacy dataset structure target_kind = "Dataset" else: @@ -335,8 +335,8 @@ def move_files(ctx, source_path, destination_path, apikey, cloudos_url, workspac except Exception as e: raise ValueError(f"Could not resolve destination path '{destination_path}'. {str(e)}") - print(f"Moving {source_kind} '{source_item_name}' to '{destination_path}' " + - f"in project '{destination_project_name} ...") + print(f"Moving {source_kind} '{source_item_name}' to '{destination_path}' " + f"in project '{destination_project_name} ...") # === Perform Move === try: response = source_client.move_files_and_folders( @@ -756,11 +756,10 @@ def link(ctx, """ Link a file or folder (S3 or File Explorer) to an active interactive analysis. - PATH [path]: the full path to the S3 file/folder, or a path RELATIVE to - the project named in --project-name for File Explorer items. Do NOT - prepend the project name to File Explorer paths. + PATH [path]: the full path to the S3 file/folder or relative path in File Explorer + (relative to the project specified by --project-name). E.g.: 's3://bucket-name/folder/subfolder', 's3://bucket/data/file.csv', - 'Data/Downloads', 'Data/file.csv'. + 'Data/Downloads', 'Data', or 'Data/file.csv'. """ if not path.startswith("s3://") and project_name is None: raise click.UsageError("When using File Explorer paths '--project-name' needs to be defined") @@ -777,13 +776,8 @@ def link(ctx, ) try: - succeeded = link_p.link_folder(path, session_id) + success = link_p.link_folder(path, session_id) except Exception as e: raise ValueError(f"Could not link item. {e}") - - if not succeeded: - click.secho( - "Linking did not complete successfully. 
See errors above.", - fg='red', err=True, - ) - raise SystemExit(1) + if not success: + raise click.ClickException("Linking failed: mount verification did not reach 'mounted' status.") diff --git a/cloudos_cli/interactive_session/__init__.py b/cloudos_cli/interactive_session/__init__.py index 1e1d8298..68c8e048 100644 --- a/cloudos_cli/interactive_session/__init__.py +++ b/cloudos_cli/interactive_session/__init__.py @@ -1 +1,5 @@ """CloudOS interactive session module.""" + +from .link import Link + +__all__ = ['Link'] diff --git a/cloudos_cli/interactive_session/cli.py b/cloudos_cli/interactive_session/cli.py index a1c9b9b2..b60bbfd7 100644 --- a/cloudos_cli/interactive_session/cli.py +++ b/cloudos_cli/interactive_session/cli.py @@ -4,8 +4,7 @@ import json import time from cloudos_cli.clos import Cloudos -from cloudos_cli.datasets import Datasets -from cloudos_cli.link import Link +from cloudos_cli.interactive_session.link import Link from cloudos_cli.utils.errors import BadRequestException from cloudos_cli.utils.resources import ssl_selector from cloudos_cli.interactive_session.interactive_session import ( @@ -18,7 +17,6 @@ parse_link_path, build_session_payload, format_session_creation_table, - resolve_data_file_id, validate_session_id, validate_instance_type, get_interactive_session_status, @@ -37,16 +35,52 @@ from cloudos_cli.utils.cli_helpers import pass_debug_to_subcommands -def _check_duplicate_mount_name(mount_name, link_path, seen): - """Register mount_name in seen, or exit cleanly if already present. +_PROJECT_ROOT_FOLDERS = {'data', 'analysesresults', 'analyses_results', 'analyses-results', 'cohorts'} + + +def _normalize_file_explorer_path(path, project_name): + """Resolve (folder_path, resolved_project_name) for a File Explorer path. + + If the first path segment is a known top-level folder name (Data, + AnalysesResults, Analyses_Results, Analyses-Results, Cohorts) the path is + treated as relative to the profile project (project_name). 
Otherwise the + first segment is treated as the project name and the remainder as the path. + S3 / Azure paths are returned unchanged with project_name=None. - Delegates duplicate detection to Link._raise_if_duplicate_mount so the - error wording stays consistent between this command and `cloudos link`. + Returns (normalized_path, resolved_project_name). """ - try: - Link._raise_if_duplicate_mount(mount_name, link_path, seen) - except ValueError as e: - click.secho(f"Error: {e}", fg='red', err=True) + if path.startswith('s3://') or path.startswith('az://'): + return path, None + if '/' not in path: + return path, project_name + first_segment, _ = path.split('/', 1) + if first_segment.lower() in _PROJECT_ROOT_FOLDERS: + return path, project_name + inferred_project, folder_path = path.split('/', 1) + return folder_path, inferred_project + + +def _make_link_client(cloudos_url, apikey, workspace_id, project_name, verify_ssl): + """Instantiate a Link client for the given project.""" + return Link( + cloudos_url=cloudos_url, + apikey=apikey, + cromwell_token=None, + workspace_id=workspace_id, + project_name=project_name, + verify=verify_ssl + ) + + +def _check_duplicate_mount_name(mount_name, link_path, seen): + """Raise SystemExit(1) if mount_name already exists in seen, otherwise register it.""" + if mount_name in seen: + click.secho( + f"Error: Duplicate mount name '{mount_name}' detected. " + f"The items '{seen[mount_name]}' and '{link_path}' " + f"would both be mounted with the same name. 
Please use items with unique names.", + fg='red', err=True + ) raise SystemExit(1) seen[mount_name] = link_path @@ -151,7 +185,7 @@ def list_sessions(ctx, raise ValueError('Please use a positive integer (>= 1) for the --page parameter') # Validate table columns if specified - valid_columns = {'id', 'name', 'status', 'type', 'instance', 'cost', 'owner', 'project', + valid_columns = {'id', 'name', 'status', 'type', 'instance', 'cost', 'owner', 'project', 'created_at', 'runtime', 'saved_at', 'resources', 'backend', 'version', 'spot', 'cost_limit', 'time_left'} selected_columns = table_columns @@ -192,9 +226,10 @@ def list_sessions(ctx, pagination_metadata = result.get('pagination_metadata', None) # Create callback function for fetching additional pages - fetch_page = lambda page_num: fetch_interactive_session_page( - cl, workspace_id, page_num, limit, filter_status, filter_only_mine, archived, verify_ssl - ) + def fetch_page(page_num): + return fetch_interactive_session_page( + cl, workspace_id, page_num, limit, filter_status, filter_only_mine, archived, verify_ssl + ) # Handle empty results if len(sessions) == 0: @@ -217,7 +252,7 @@ def list_sessions(ctx, with open(outfile, 'w') as o: o.write(json.dumps(sessions, indent=2)) print(f'\tInteractive session list collected with a total of {len(sessions)} sessions on this page.') - print(f'\tInteractive session list saved to {outfile}') + print(f'\tInteractive session list saved to {outfile}') else: raise ValueError('Unrecognised output format. Please use one of [stdout|csv|json]') @@ -293,27 +328,12 @@ def list_sessions(ctx, @click.option('--shutdown-in', help='Auto-shutdown duration (e.g., 8h, 2d). Default=12h.', default='12h') -@click.option('--mount', - multiple=True, - help='Mount a data file into the session. Supports both Lifebit Platform datasets and S3 files. Format: project_name/dataset_path (e.g., leila-test/Data/file.csv) or s3://bucket/path/to/file (e.g., s3://my-bucket/data/file.csv). 
Can be used multiple times.') @click.option('--link', multiple=True, - help=( - 'Link a file or folder into the session for read access. Supports ' - 'S3 files and folders (e.g. s3://bucket/path/file.csv or ' - 's3://bucket/path/) and File Explorer files and folders ' - '(project-name/path/to/item — must include project name). S3 paths ' - 'whose last segment contains a "." are treated as files; paths ending ' - 'with "/" or without an extension are treated as folders. Both S3 and ' - 'File Explorer items can be combined. Provide multiple paths as ' - 'comma-separated values or use --link multiple times. ' - 'Examples: --link s3://bucket/data/file.csv,my-project/Data/results ' - 'OR --link s3://bucket1/path/ --link my-project/Data/file.csv. ' - 'NOTE: format is `/` — the project is part of ' - 'the path, so a single command can link items from multiple projects. ' - 'This differs from `cloudos link`, where the project comes from ' - '--project-name and must NOT appear in the path.' - )) + help='Link a file or folder into the session for read access. Supports S3 files/folders (s3://bucket/path/) and File Explorer files/folders. File Explorer paths can be given in two forms: (1) include the project name explicitly (e.g. my-project/Data/results) or (2) start with a known root folder (Data/, AnalysesResults/, Cohorts/, etc.) and --project-name or a profile project will be used to resolve the project. Both S3 and File Explorer types can be combined. Provide multiple paths as comma-separated values or use --link multiple times. Use --copy to copy data into the session instead. Examples: --link s3://bucket/data/,my-project/Data/results OR --link s3://bucket1/path/ --link Data/results') +@click.option('--copy', + is_flag=True, + help='Copy data into the session instead of linking for read access. When specified, the paths provided by --link are copied into the session\'s data volume. 
Supports Lifebit Platform datasets (project_name/Data/file.csv) and S3 files (s3://bucket/path/to/file).') @click.option('--r-version', type=click.Choice(['4.5.2', '4.4.2'], case_sensitive=False), help='R version for RStudio. Options: 4.5.2 (default), 4.4.2.', @@ -357,8 +377,8 @@ def create_session(ctx, shared, cost_limit, shutdown_in, - mount, link, + copy, r_version, spark_master, spark_core, @@ -370,7 +390,7 @@ def create_session(ctx, verbose): """Create a new interactive session.""" - verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert) + verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert) # Default execution_platform to 'aws' if not specified by user or profile if execution_platform is None: execution_platform = 'aws' @@ -439,22 +459,35 @@ def create_session(ctx, click.secho(f'Error: Invalid shutdown duration: {str(e)}', fg='red', err=True) raise SystemExit(1) - # Parse and resolve mounted data files (both Lifebit Platform and S3) + # Flatten comma-separated paths within --link options + all_link_paths = [] + for link_entry in link: + paths = [p.strip() for p in link_entry.split(',') if p.strip()] + all_link_paths.extend(paths) + parsed_data_files = [] - parsed_link_items = [] # Items go into FUSE mounts (S3 folders/files + File Explorer folders/files) - if mount: + parsed_s3_mounts = [] # S3 folders/files go into FUSE mounts + _data_file_display_meta = [] # Parallel list: display metadata per entry in parsed_data_files + + # When --copy is set, copy data into the session (dataItems) instead of linking + if copy and all_link_paths: try: - for df in mount: - parsed = parse_data_file(df) + for link_path in all_link_paths: + if not link_path.startswith('s3://') and not link_path.startswith('az://'): + norm_path, resolved_project = _normalize_file_explorer_path(link_path, project_name) + if resolved_project is None: + raise click.UsageError( + f"--project-name is required for File Explorer paths that start with a known " + f"top-level folder 
name (Data, AnalysesResults, Cohorts, etc.). Got: '{link_path}'" + ) + link_path = f"{resolved_project}/{norm_path}" + parsed = parse_data_file(link_path) if parsed['type'] == 's3': - # S3 files are only supported on AWS if execution_platform != 'aws': - click.secho(f'Error: S3 mounts are only supported on AWS. Use Lifebit Platform file explorer paths for Azure.', fg='red', err=True) + click.secho(f'Error: S3 files are only supported on AWS. Use Lifebit Platform file explorer paths for Azure.', fg='red', err=True) raise SystemExit(1) - # S3 file: add to dataItems as S3File type if verbose: - print(f'\tMounting S3 file: s3://{parsed["s3_bucket"]}/{parsed["s3_prefix"]}') - # Use the full path as the name + print(f'\tCopying S3 file: s3://{parsed["s3_bucket"]}/{parsed["s3_prefix"]}') s3_file_item = { "type": "S3File", "data": { @@ -464,46 +497,56 @@ def create_session(ctx, } } parsed_data_files.append(s3_file_item) + _data_file_display_meta.append(None) if verbose: - print(f'\t ✓ Added S3 file to mount') + print(f'\t ✓ Added S3 file to copy') else: # type == 'cloudos' - # Lifebit Platform dataset file: resolve via Datasets API data_project = parsed['project_name'] dataset_path = parsed['dataset_path'] if verbose: - print(f'\tResolving dataset: {data_project}/{dataset_path}') - # Create a Datasets API instance for this specific project - datasets_api = Datasets( - cloudos_url=cloudos_url, - apikey=apikey, - workspace_id=workspace_id, - project_name=data_project, - verify=verify_ssl, - cromwell_token=None - ) - resolved = resolve_data_file_id(datasets_api, dataset_path) + print(f'\tCopying dataset: {data_project}/{dataset_path}') + fe_link = _make_link_client(cloudos_url, apikey, workspace_id, data_project, verify_ssl) + resolved = fe_link._parse_file_explorer_item(dataset_path)["dataItem"] parsed_data_files.append(resolved) + _data_file_display_meta.append({ + "is_file_explorer": True, + "original_path": f"{data_project}/{dataset_path}" + }) if verbose: - print(f'\t ✓ 
Resolved to file ID: {resolved["item"]}') + print(f'\t ✓ Resolved to ID: {resolved["item"]}') + except SystemExit: + raise except Exception as e: - click.secho(f'Error: Failed to resolve dataset files: {str(e)}', fg='red', err=True) + click.secho(f'Error: Failed to resolve data files for copy: {str(e)}', fg='red', err=True) raise SystemExit(1) - # Parse and add linked items from --link (S3 or CloudOS, files or folders) - # Flatten comma-separated paths within --link options - all_link_paths = [] - for link_entry in link: - paths = [p.strip() for p in link_entry.split(',') if p.strip()] - all_link_paths.extend(paths) + data_files_for_display = [] + for df, meta in zip(parsed_data_files, _data_file_display_meta or [None] * len(parsed_data_files)): + if meta is not None: + display_df = df.copy() + display_df['_isFileExplorer'] = meta['is_file_explorer'] + display_df['_originalPath'] = meta['original_path'] + data_files_for_display.append(display_df) + else: + data_files_for_display.append(df) + # Parse and add linked items from --link (S3 or CloudOS, files or folders) mount_names_seen = {} # Track mount names to detect duplicates - link_display_info = {} # Track File Explorer paths for display (not sent to API) - for link_path in all_link_paths: + s3_mount_display_info = {} # Track File Explorer paths for display (not sent to API) + for link_path in all_link_paths if not copy else []: try: # Block all linking on Azure platforms if execution_platform == 'azure': - click.secho(f'Error: Linking is not supported on Azure. Please use `cloudos interactive-session create --mount` to load your data in the session.', fg='red', err=True) + click.secho(f'Error: Linking is not supported on Azure. 
Use `--copy` flag with `--link` to copy data into the session instead.', fg='red', err=True) raise SystemExit(1) + if not link_path.startswith('s3://') and not link_path.startswith('az://'): + norm_path, resolved_project = _normalize_file_explorer_path(link_path, project_name) + if resolved_project is None: + raise click.UsageError( + f"--project-name is required for File Explorer paths that start with a known " + f"top-level folder name (Data, AnalysesResults, Cohorts, etc.). Got: '{link_path}'" + ) + link_path = f"{resolved_project}/{norm_path}" parsed = parse_link_path(link_path) if parsed['type'] == 's3': if execution_platform != 'aws': @@ -539,7 +582,7 @@ def create_session(ctx, "s3Prefix": parsed["s3_prefix"] } } - parsed_link_items.append(s3_mount_item) + parsed_s3_mounts.append(s3_mount_item) if verbose: print(f'\t ✓ Linked S3: {mount_name}') @@ -549,20 +592,11 @@ def create_session(ctx, if verbose: print(f'\tLinking Lifebit Platform item: {folder_project}/{folder_path}') try: - fe_link = Link( - cloudos_url=cloudos_url, - apikey=apikey, - workspace_id=workspace_id, - project_name=folder_project, - cromwell_token=None, - verify=verify_ssl - ) - fe_item = fe_link.parse_file_explorer_item(folder_path) + fe_link = _make_link_client(cloudos_url, apikey, workspace_id, folder_project, verify_ssl) + fe_item = fe_link._parse_file_explorer_item(folder_path) item_kind = fe_item["dataItem"]["kind"] item_id = fe_item["dataItem"]["item"] mount_name = fe_item["dataItem"]["name"] - except ValueError: - raise except Exception as e: error_msg = str(e) if "404" in error_msg or "not found" in error_msg.lower(): @@ -580,9 +614,9 @@ def create_session(ctx, "item": item_id, "name": mount_name } - parsed_link_items.append(cloudos_mount_item) + parsed_s3_mounts.append(cloudos_mount_item) - link_display_info[mount_name] = { + s3_mount_display_info[mount_name] = { "is_file_explorer": True, "original_path": f"{folder_project}/{folder_path}" } @@ -594,18 +628,18 @@ def 
create_session(ctx, click.secho(f'Error: Failed to link item: {str(e)}', fg='red', err=True) raise SystemExit(1) - # Create display version of link items with File Explorer markers - link_items_for_display = [] - for mount in parsed_link_items: + # Create display version of s3_mounts with File Explorer markers + s3_mounts_for_display = [] + for mount in parsed_s3_mounts: # FE items use kind/item/name; S3 items use type/data mount_name = mount.get('name') or mount.get('data', {}).get('name', '') - if mount_name in link_display_info: + if mount_name in s3_mount_display_info: display_mount = mount.copy() - display_mount['_isFileExplorer'] = link_display_info[mount_name]['is_file_explorer'] - display_mount['_originalPath'] = link_display_info[mount_name]['original_path'] - link_items_for_display.append(display_mount) + display_mount['_isFileExplorer'] = s3_mount_display_info[mount_name]['is_file_explorer'] + display_mount['_originalPath'] = s3_mount_display_info[mount_name]['original_path'] + s3_mounts_for_display.append(display_mount) else: - link_items_for_display.append(mount) + s3_mounts_for_display.append(mount) # Build the session payload payload = build_session_payload( @@ -620,7 +654,7 @@ def create_session(ctx, shutdown_at=shutdown_at_parsed, project_id=project_id, data_files=parsed_data_files, - s3_mounts=parsed_link_items if execution_platform == 'aws' else [], + s3_mounts=parsed_s3_mounts if execution_platform == 'aws' else [], r_version=r_version, spark_master_type=spark_master, spark_core_type=spark_core, @@ -645,8 +679,8 @@ def create_session(ctx, spark_master=spark_master, spark_core=spark_core, spark_workers=spark_workers, - data_files=parsed_data_files, - s3_mounts=link_items_for_display, # Use display version with markers + data_files=data_files_for_display, + s3_mounts=s3_mounts_for_display, # Use display version with markers shutdown_in=shutdown_in ) # Output session link in greppable format for CI/automation @@ -972,7 +1006,7 @@ def 
pause_session(ctx, click.secho(f'Error: Cannot pause session - the session is already paused.', fg='red', err=True) click.secho(f'Tip: Check the session status with: cloudos interactive-session status --session-id {session_id}', fg='yellow', err=True) raise SystemExit(1) - elif api_status == 'aborting': + elif api_status == 'aborting': click.secho(f'Error: Cannot pause session - the session is already being paused.', fg='red', err=True) click.secho(f'Tip: Wait a moment and check status with: cloudos interactive-session status --session-id {session_id}', fg='yellow', err=True) raise SystemExit(1) @@ -1233,7 +1267,7 @@ def resume_session(ctx, click.secho(f'Tip: Terminated sessions cannot be resumed. Please create a new session instead.', fg='yellow', err=True) else: click.secho(f'Tip: Wait for the session to reach "paused" status, or check: cloudos interactive-session status --session-id {session_id}', fg='yellow', err=True) - except: + except Exception: # Fallback if we can't fetch status click.secho(f'Error: Cannot resume session - it is not in a resumable status.', fg='red', err=True) click.secho(f'Only sessions with status "paused" can be resumed.', fg='yellow', err=True) @@ -1256,3 +1290,209 @@ def resume_session(ctx, click.secho(f'Error: Failed to resume session: {str(e)}', fg='red', err=True) raise SystemExit(1) + +@interactive_session.command('link') +@click.argument('path', required=False) +@click.option('-k', + '--apikey', + help='Your Lifebit Platform API key', + required=True) +@click.option('-c', + '--cloudos-url', + help=(f'The Lifebit Platform url you are trying to access to. Default={CLOUDOS_URL}.'), + default=CLOUDOS_URL, + required=True) +@click.option('--workspace-id', + help='The specific Lifebit Platform workspace id.', + required=True) +@click.option('--session-id', + help='The specific Lifebit Platform interactive session id.', + required=True) +@click.option('--job-id', + help='The job id in Lifebit Platform. 
When provided, links results, workdir and logs by default.', + required=False) +@click.option('--project-name', + help='Fallback Lifebit Platform project name for File Explorer paths that start with a known root folder (Data/, AnalysesResults/, Cohorts/, etc.). Not needed when PATH includes the project as the first segment (e.g. my-project/Data/file.csv).', + required=False) +@click.option('--results', + help='Link only results folder (only works with --job-id).', + is_flag=True) +@click.option('--workdir', + help='Link only working directory (only works with --job-id).', + is_flag=True) +@click.option('--logs', + help='Link only logs folder (only works with --job-id).', + is_flag=True) +@click.option('--verbose', + help='Whether to print information messages or not.', + is_flag=True) +@click.option('--disable-ssl-verification', + help=('Disable SSL certificate verification. Please, remember that this option is ' + + 'not generally recommended for security reasons.'), + is_flag=True) +@click.option('--ssl-cert', + help='Path to your SSL certificate file.') +@click.option('--profile', help='Profile to use from the config file', default=None) +@click.pass_context +@with_profile_config(required_params=['apikey', 'workspace_id', 'session_id']) +def link_session(ctx, + path, + apikey, + cloudos_url, + workspace_id, + session_id, + job_id, + project_name, + results, + workdir, + logs, + verbose, + disable_ssl_verification, + ssl_cert, + profile): + """ + Link files or folders to an interactive analysis session. + + This command links S3 or File Explorer items (files and folders) to an active + interactive analysis session for direct read access. + + PATH: Optional path(s) to link (S3 or File Explorer). + Required if --job-id is not provided. + Supports comma-separated list for multiple paths. + + File Explorer path formats: + + - project-name/Data/folder — project is inferred from the first path segment. + --project-name is not needed. 
+ + - Data/folder — path starts with a known top-level folder name (Data, + AnalysesResults, Cohorts, etc.). --project-name must be supplied so the + CLI knows which project to look in. + + Two modes of operation: + + 1. Job-based linking (--job-id): Links job-related folders. + By default, links results, workdir, and logs folders. + Use --results, --workdir, or --logs flags to link only specific folders. + + 2. Direct path linking (PATH argument): Links specific path(s). + Supports S3 files/folders and Lifebit Platform File Explorer files/folders. + Both S3 and File Explorer paths can be combined. + S3 paths ending with '/' or without a file extension are treated as folders. + S3 paths whose last segment contains a '.' are treated as files. + + Examples: + + # Link all job folders (results, workdir, logs) + cloudos interactive-session link --job-id 12345 --session-id abc123 + + # Link a single S3 folder + cloudos interactive-session link s3://bucket/folder/ --session-id abc123 + + # Link a single S3 file + cloudos interactive-session link s3://bucket/data/file.csv --session-id abc123 + + # Link multiple S3 paths (comma-separated, files and folders mixed) + cloudos interactive-session link s3://bucket1/folder1/,s3://bucket2/data/file.csv --session-id abc123 + + # Link a File Explorer folder (project inferred from first path segment) + cloudos interactive-session link my-project/Data/folder --session-id abc123 + + # Link a File Explorer folder whose path starts with a top-level folder name + cloudos interactive-session link Data/folder --session-id abc123 --project-name my-project + + # Combine S3 and File Explorer paths + cloudos interactive-session link s3://bucket/data/file.csv,my-project/Data/results --session-id abc123 + + """ + verify_ssl = ssl_selector(disable_ssl_verification, ssl_cert) + + if not job_id and not path: + raise click.UsageError("Either --job-id or PATH argument must be provided.") + + if job_id and path: + raise click.UsageError("Cannot use 
both --job-id and PATH argument. Please provide only one.") + + if (results or workdir or logs) and not job_id: + raise click.UsageError("--results, --workdir, and --logs flags can only be used with --job-id.") + + if job_id and not (results or workdir or logs): + results = True + workdir = True + logs = True + + if verbose: + print('Using the following parameters:') + print(f'\tLifebit Platform url: {cloudos_url}') + print(f'\tWorkspace ID: {workspace_id}') + print(f'\tSession ID: {session_id}') + if job_id: + print(f'\tJob ID: {job_id}') + print(f'\tLink results: {results}') + print(f'\tLink workdir: {workdir}') + print(f'\tLink logs: {logs}') + else: + print(f'\tPath: {path}') + + try: + if job_id: + link_client = _make_link_client(cloudos_url, apikey, workspace_id, project_name, verify_ssl) + print(f'Linking folders from job {job_id} to interactive session {session_id}...\n') + + if results: + link_client.link_job_results(job_id, workspace_id, session_id, verify_ssl, verbose) + + if workdir: + link_client.link_job_workdir(job_id, workspace_id, session_id, verify_ssl, verbose) + + if logs: + link_client.link_job_logs(job_id, workspace_id, session_id, verify_ssl, verbose) + + else: + paths = [p.strip() for p in path.split(',') if p.strip()] + + if len(paths) == 0: + raise click.UsageError("No valid paths provided.") + + # Normalize paths and group by resolved project name. + # S3/Azure paths are keyed under None and sent as their own batch. + groups = {} + for p in paths: + norm_path, resolved = _normalize_file_explorer_path(p, project_name) + if resolved is None and not p.startswith('s3://') and not p.startswith('az://'): + raise click.UsageError( + f"--project-name is required for File Explorer paths that start with a known " + f"top-level folder name (Data, AnalysesResults, Cohorts, etc.). 
Got: '{p}'" + ) + groups.setdefault(resolved, []).append(norm_path) + + if len(paths) == 1: + print(f'Linking path to interactive session {session_id}...\n') + else: + print(f'Linking {len(paths)} paths to interactive session {session_id}...\n') + + all_succeeded = True + try: + committed = 0 + for grp_project, grp_paths in groups.items(): + client = _make_link_client(cloudos_url, apikey, workspace_id, grp_project, verify_ssl) + if not client.link_folders_batch(grp_paths, session_id, committed_count=committed): + all_succeeded = False + committed += len(grp_paths) + if all_succeeded: + print('\nLinking operation completed successfully!') + else: + click.secho('\nLinking operation completed with errors. See details above.', fg='red', err=True) + raise SystemExit(1) + except SystemExit: + raise + except Exception as e: + click.secho(f'\n✗ Failed: {str(e)}', fg='red', err=True) + raise SystemExit(1) + + except BadRequestException as e: + click.secho(f'Error: Request failed: {str(e)}', fg='red', err=True) + raise SystemExit(1) + except Exception as e: + click.secho(f'Error: Failed to link: {str(e)}', fg='red', err=True) + raise SystemExit(1) diff --git a/cloudos_cli/interactive_session/interactive_session.py b/cloudos_cli/interactive_session/interactive_session.py index 3de7042a..02138e35 100644 --- a/cloudos_cli/interactive_session/interactive_session.py +++ b/cloudos_cli/interactive_session/interactive_session.py @@ -747,125 +747,6 @@ def parse_data_file(data_file_str): } -def resolve_data_file_id(datasets_api, dataset_path: str) -> dict: - """Resolve nested dataset path to actual file ID. - - Searches across all datasets in the project to find the target file. - This allows paths like 'Data/file.txt' to work even if 'Data' is a folder - within a dataset (not a dataset name itself). 
- - Parameters - ---------- - datasets_api : Datasets - Initialized Datasets API instance (with correct project_name) - dataset_path : str - Nested path to file within the project (e.g., 'Data/file.txt' or 'Folder/subfolder/file.txt') - Can start with a dataset name or a folder name within any dataset. - - Returns - ------- - dict - Data item object with resolved file ID: - {"kind": "File", "item": "", "name": ""} - - Raises - ------ - ValueError - If file not found in any dataset/folder - """ - try: - path_parts = dataset_path.strip('/').split('/') - file_name = path_parts[-1] - # First, try the path as-is (assuming first part is a dataset name) - try: - result = datasets_api.list_folder_content(dataset_path) - # Check if it's in the files list - for file_item in result.get('files', []): - if file_item.get('name') == file_name: - return { - "kind": "File", - "item": file_item.get('_id'), - "name": file_item.get('name') - } - # If we got here, quick path didn't work, continue to search - except (Exception): - # First path attempt failed, try searching across all datasets - pass - # If the quick path didn't work, search across all datasets - # This handles the case where the first part is a folder, not a dataset name - project_content = datasets_api.list_project_content() - datasets = project_content.get('folders', []) - if not datasets: - raise ValueError(f"No datasets found in project. 
Cannot locate path '{dataset_path}'") - # Try to find the file in each dataset - found_files = [] - for dataset in datasets: - dataset_name = dataset.get('name') - try: - # Try with the dataset name prepended to the path - full_path = f"{dataset_name}/{dataset_path}" - result = datasets_api.list_folder_content(full_path) - # Check files list - for file_item in result.get('files', []): - if file_item.get('name') == file_name: - found_files.append({ - "kind": "File", - "item": file_item.get('_id'), - "name": file_item.get('name') - }) - # Return first match (most direct path) - return found_files[0] - except Exception: - # This dataset doesn't contain the path, continue - continue - # Also try searching without dataset prefix (path is from root of datasets) - for dataset in datasets: - try: - dataset_name = dataset.get('name') - # List what's in this dataset at the top level - dataset_content = datasets_api.list_datasets_content(dataset_name) - # Check if the target file is directly in this dataset's files - for file_item in dataset_content.get('files', []): - if file_item.get('name') == file_name: - found_files.append({ - "kind": "File", - "item": file_item.get('_id'), - "name": file_item.get('name') - }) - # Check folders and navigate if needed - for folder in dataset_content.get('folders', []): - if folder.get('name') == path_parts[0]: - # This dataset has the target folder - full_path = f"{dataset_name}/{dataset_path}" - try: - result = datasets_api.list_folder_content(full_path) - for file_item in result.get('files', []): - if file_item.get('name') == file_name: - return { - "kind": "File", - "item": file_item.get('_id'), - "name": file_item.get('name') - } - except Exception: - continue - except Exception: - continue - # If we found files, return the first one - if found_files: - return found_files[0] - # Nothing found - provide helpful error message - available_datasets = [d.get('name') for d in datasets] - raise ValueError( - f"File at path '{dataset_path}' 
not found in any dataset. " - f"Available datasets: {available_datasets}. " - f"Try using 'cloudos datasets ls' to explore your data structure." - ) - except ValueError: - raise - except Exception as e: - raise ValueError(f"Error resolving dataset file at path '{dataset_path}': {str(e)}") - - def parse_link_path(link_path_str): """Parse link path format: supports S3, Lifebit Platform, or legacy colon format. @@ -1132,6 +1013,8 @@ def build_resume_payload( Resume payload for API request """ payload = { + # dataItems is intentionally empty: linking during resume is not supported. + # The API requires the field to be present; omitting it causes a 400. "dataItems": [], "fileSystemIds": [] # Always empty (deprecated) } @@ -1231,20 +1114,26 @@ def format_session_creation_table(session_data, instance_type=None, storage_size # Display mounted data files if data_files: - mounted_files = [] + mounted_items = [] for df in data_files: if isinstance(df, dict): - # Handle Lifebit Platform dataset files - if df.get('kind') == 'File': - name = df.get('name', 'Unknown') - mounted_files.append(name) - # Handle S3 files + if df.get('_isFileExplorer'): + original_path = df.get('_originalPath', '') + if original_path: + mounted_items.append(f"File Explorer: {original_path}") elif df.get('type') == 'S3File': data = df.get('data', {}) - name = data.get('name', 'Unknown') - mounted_files.append(f"{name} (S3)") - if mounted_files: - table.add_row("Mounted Data", ", ".join(mounted_files)) + bucket = data.get('s3BucketName', '') + key = data.get('s3ObjectKey', '') + if bucket and key: + mounted_items.append(f"s3://{bucket}/{key}") + elif bucket: + mounted_items.append(f"s3://{bucket}/") + elif df.get('kind') in ('File', 'Folder'): + name = df.get('name', 'Unknown') + mounted_items.append(name) + if mounted_items: + table.add_row("Mounted Data", "\n".join(mounted_items)) # Display linked S3 buckets and File Explorer items (files and folders) if s3_mounts: diff --git a/cloudos_cli/link/link.py 
b/cloudos_cli/interactive_session/link.py similarity index 80% rename from cloudos_cli/link/link.py rename to cloudos_cli/interactive_session/link.py index fc5ef4fb..ade69499 100644 --- a/cloudos_cli/link/link.py +++ b/cloudos_cli/interactive_session/link.py @@ -2,17 +2,19 @@ This is the main class for linking files to interactive sessions. """ +import json +import time from dataclasses import dataclass from typing import Union, List, Dict -from cloudos_cli.clos import Cloudos -from cloudos_cli.utils.requests import retry_requests_post, retry_requests_get -from cloudos_cli.utils.errors import JoBNotCompletedException, BadRequestException -from cloudos_cli.datasets import Datasets from urllib.parse import urlparse -import json -import time + import rich_click as click +from cloudos_cli.clos import Cloudos +from cloudos_cli.utils.requests import retry_requests_post, retry_requests_get +from cloudos_cli.utils.errors import JoBNotCompletedException +from cloudos_cli.utils.array_job import generate_datasets_for_project + @dataclass class Link(Cloudos): @@ -71,8 +73,9 @@ def link_folder(self, return self.link_folders_batch([folder], session_id) def link_folders_batch(self, - folders: list, - session_id: str) -> bool: + folders: list, + session_id: str, + committed_count: int = 0) -> bool: """Link multiple folders/files (S3 or File Explorer) to an interactive session in one request. Attempts to use API v2 (which supports multiple items per request) first, @@ -84,6 +87,10 @@ def link_folders_batch(self, List of folder/file paths to link. session_id : str The interactive session ID. + committed_count : int, optional + Number of items already submitted in earlier batches during this CLI invocation + but not yet visible in the session status. Added to the current count when + enforcing the 100-item limit. 
Raises ------ @@ -93,9 +100,9 @@ def link_folders_batch(self, if not folders: raise ValueError("No paths provided") - # Check 100-item limit against already-linked items + # Check 100-item limit against already-linked items plus any in-flight batches current_items = self.get_fuse_filesystems_status(session_id) - current_count = len(current_items) + current_count = len(current_items) + committed_count if current_count + len(folders) > 100: raise ValueError("Cannot link more than 100 items") @@ -112,14 +119,10 @@ def link_folders_batch(self, # v2 failed or not available, fall back to v1 status_code = self._fallback_mount_v1(folder_info, session_id) - # Verify mount completion for all items. Any 2xx is treated as - # "request accepted" and we still verify; anything else is an error. - if status_code is not None and 200 <= status_code < 300: + # Verify mount completion for all items + if status_code == 204: return self._verify_all_mounts(folder_info, session_id) - raise ValueError( - f"Unexpected response from mount API: HTTP {status_code}. " - "The mount request did not succeed; nothing has been verified." - ) + return True def _parse_items_to_data_items(self, folders: list, existing_mount_names: set = None) -> tuple: """Parse and validate folders/files, extracting data items for API payload. 
@@ -159,18 +162,18 @@ def _parse_items_to_data_items(self, folders: list, existing_mount_names: set = parsed = self.parse_s3_file_path(folder) else: parsed = self.parse_s3_path(folder) - source_type = "S3" mount_name = parsed["dataItem"]["data"]["name"] + self._raise_if_duplicate_mount(mount_name, folder, mount_names_seen) + mount_names_seen[mount_name] = folder + data_items.append(parsed["dataItem"]) + folder_info.append({"path": folder, "type": "S3", "data": parsed["dataItem"]}) else: - parsed = self.parse_file_explorer_item(folder) - source_type = "File Explorer" + parsed = self._parse_file_explorer_item(folder) mount_name = parsed["dataItem"]["name"] - - self._raise_if_duplicate_mount(mount_name, folder, mount_names_seen) - mount_names_seen[mount_name] = folder - - data_items.append(parsed["dataItem"]) - folder_info.append({"path": folder, "type": source_type, "data": parsed["dataItem"]}) + self._raise_if_duplicate_mount(mount_name, folder, mount_names_seen) + mount_names_seen[mount_name] = folder + data_items.append(parsed["dataItem"]) + folder_info.append({"path": folder, "type": "File Explorer", "data": parsed["dataItem"]}) return data_items, folder_info @@ -223,7 +226,7 @@ def _try_mount_v2(self, data_items: list, session_id: str) -> int: If v2 fails for reasons other than unavailability. 
""" v2_payload = {"dataItems": data_items} - + try: status_code = self.mount_fuse_filesystem_v2( session_id=session_id, @@ -239,11 +242,11 @@ def _try_mount_v2(self, data_items: list, session_id: str) -> int: # Session-not-found errors should propagate immediately if "Session not found" in error_str: raise # Re-raise session-not-found errors immediately - + should_fallback = ( "404" in error_str or "Not Found" in error_str or "not found" in error_str.lower() ) - + if should_fallback: return None # Trigger v1 fallback else: @@ -283,7 +286,7 @@ def _fallback_mount_v1(self, folder_info: list, session_id: str) -> int: status_code = None mounted_folders = [] - + for folder_data in folder_info: try: status_code = self._mount_single_folder_v1(folder_data, session_id) @@ -318,7 +321,7 @@ def _mount_single_folder_v1(self, folder_data: dict, session_id: str) -> int: If the mount request fails. """ v1_payload = {"dataItem": folder_data["data"]} - + url = ( f"{self.cloudos_url}/api/v1/" f"interactive-sessions/{session_id}/fuse-filesystem/mount" @@ -328,10 +331,10 @@ def _mount_single_folder_v1(self, folder_data: dict, session_id: str) -> int: "Content-type": "application/json", "apikey": self.apikey } - + try: r = retry_requests_post(url, headers=headers, json=v1_payload, verify=self.verify) - + if r.status_code >= 400: # Handle v1 errors using consolidated error handling if r.status_code == 403: @@ -351,16 +354,16 @@ def _mount_single_folder_v1(self, folder_data: dict, session_id: str) -> int: raise ValueError(f"Bad request (400): Unable to parse error response") else: raise ValueError(f"Failed to mount item: HTTP {r.status_code}") - + return r.status_code - + except ValueError: # Re-raise ValueError as-is raise except Exception as v1_error: raise ValueError(f"Failed to mount {folder_data['type']} item: {str(v1_error)}") - def _verify_all_mounts(self, folder_info: list, session_id: str) -> bool: + def _verify_all_mounts(self, folder_info: list, session_id: str): """Verify 
mount completion status for all items (files and folders). Parameters @@ -385,7 +388,8 @@ def _verify_all_mounts(self, folder_info: list, session_id: str) -> bool: mount_name = item_data['name'] item_kind = "file" if folder_data['data'].get('type') == 'S3File' else "folder" else: - full_path = folder_data["path"] + folder_path = folder_data["path"] + full_path = f"{self.project_name}/{folder_path.lstrip('/')}" if self.project_name else folder_path mount_name = folder_data['data']['name'] item_kind = "file" if folder_data['data'].get('kind') == 'File' else "folder" @@ -431,7 +435,7 @@ def _translate_mount_error(self, error_msg: str) -> str: return error_msg def _handle_mount_error(self, error: Exception, type_folder: str): - """Handle and convert mount errors to user-friendly messages. + """Translate a raw mount exception into a user-friendly ValueError. Parameters ---------- @@ -448,32 +452,26 @@ def _handle_mount_error(self, error: Exception, type_folder: str): error_str = str(error) error_lower = error_str.lower() - def matches(*tokens): - """True if any token appears in the original or lowercased error text.""" - return any(t in error_lower or t in error_str for t in tokens) - - if matches('403', 'forbidden'): - if "already exists" in error_lower or "mounted" in error_lower: - raise ValueError( - f"Provided {type_folder} item already exists with 'mounted' status" - ) - raise ValueError("Interactive Analysis session is not active or access denied") + if '403' in error_str or 'forbidden' in error_lower: + if 'already exists' in error_lower or 'mounted' in error_lower: + raise ValueError(f"Provided {type_folder} item already exists with 'mounted' status") + raise ValueError('Interactive Analysis session is not active or access denied') - if matches('401', 'unauthorized'): - raise ValueError("Unauthorized. Invalid API key or insufficient permissions.") + if '401' in error_str or 'unauthorized' in error_lower: + raise ValueError('Forbidden. 
Invalid API key or insufficient permissions.') - if matches('400', 'bad request'): - if "invalid supported dataitem foldertype" in error_lower: + if '400' in error_str or 'bad request' in error_lower: + if 'invalid supported dataitem foldertype' in error_lower: raise ValueError( f"Invalid Supported DataItem '{type_folder}' folderType. " - "Virtual folders cannot be linked." + 'Virtual folders cannot be linked.' ) - raise ValueError(f"Cannot link item: {error_str}") + raise ValueError(f'Cannot link item: {error_str}') - if matches('404', 'not found'): - raise ValueError("Session not found or endpoint not available") + if '404' in error_str or 'not found' in error_lower: + raise ValueError('Session not found or endpoint not available') - raise ValueError(f"Failed to mount {type_folder} item: {error_str}") + raise ValueError(f'Failed to mount {type_folder} item: {error_str}') def parse_s3_path(self, s3_url): """ @@ -505,21 +503,21 @@ def parse_s3_path(self, s3_url): parsed = urlparse(s3_url) bucket = parsed.netloc - prefix = parsed.path.lstrip('/') # Remove leading slash + prefix = parsed.path.lstrip('/') # Remove leading slash if not prefix: raise ValueError("S3 URL must include a key after the bucket") parts = prefix.rstrip('/').split('/') - base = parts[-1] # Last segment (file or folder) + base = parts[-1] # Last segment (file or folder) return { "dataItem": { - "type": "S3Folder", - "data": { - "name": base, - "s3BucketName": bucket, - "s3Prefix": prefix - } + "type": "S3Folder", + "data": { + "name": base, + "s3BucketName": bucket, + "s3Prefix": prefix + } } } @@ -571,16 +569,13 @@ def parse_s3_file_path(self, s3_url: str) -> dict: key = parsed.path.lstrip('/') if not bucket: - raise ValueError( - f"Invalid S3 URL '{s3_url}': bucket name is empty. " - "Expected 's3:///'." - ) + raise ValueError("Invalid S3 URL: bucket name is empty. 
Expected: s3://bucket/path/to/file") if not key: raise ValueError("S3 URL must include a key after the bucket") - if key.endswith('/'): + if s3_url.endswith('/'): raise ValueError( - f"Invalid S3 file URL '{s3_url}': key ends with '/' which is folder-like. " - "Drop the trailing slash for a file link, or use the folder linking path." + f"S3 URL '{s3_url}' looks folder-like (trailing slash). " + "Use s3://bucket/path/to/file for files, or use --link for folders." ) name = key.split('/')[-1] @@ -595,14 +590,6 @@ def parse_s3_file_path(self, s3_url: str) -> dict: } } - def parse_file_explorer_item(self, path: str) -> dict: - """Public alias for _parse_file_explorer_item. - - Use this from code outside the Link class. The underscore version is - retained for internal callers but both behave identically. - """ - return self._parse_file_explorer_item(path) - def _parse_file_explorer_item(self, path: str) -> dict: """Auto-detect whether a File Explorer path is a file or folder and return the data item. @@ -611,10 +598,7 @@ def _parse_file_explorer_item(self, path: str) -> dict: Parameters ---------- path : str - The path RELATIVE to the project (e.g., 'Data/results' or - 'Data/file.csv'). Do NOT include the project name as the leading - segment — the project is taken from ``self.project_name`` (set - via ``--project-name``). + The path within the project (e.g., 'Data/results' or 'Data/file.csv'). Returns ------- @@ -624,74 +608,17 @@ def _parse_file_explorer_item(self, path: str) -> dict: Raises ------ ValueError - If ``self.project_name`` is not set, if the path starts with the - project name, or if the item is not found / is a virtual folder. + If the item is not found or is a virtual folder. """ - if not self.project_name: - raise ValueError( - "Cannot resolve File Explorer path without a project. " - "Pass --project-name (or set it in your profile)." 
- ) - stripped = path.strip("/") parts = stripped.split("/") - - # Reject paths that include the project name as the first segment. - # The project comes from --project-name only; prepending it in the - # path is a common mistake that otherwise produces a confusing - # "Folder '' not found in project ''" error. - if parts[0] == self.project_name: - relative = "/".join(parts[1:]) or "" - raise ValueError( - f"File Explorer path '{path}' must NOT include the project name. " - f"The project is supplied via --project-name ('{self.project_name}'). " - f"Use '{relative}' instead." - ) - item_name = parts[-1] parent_path = "/".join(parts[:-1]) if len(parts) > 1 else "" - # Instantiate Datasets directly (instead of going through - # generate_datasets_for_project) so that "project not found" / - # "forbidden" surface as ValueError here rather than terminating - # the process via sys.exit(1) deep inside the helper. - try: - ds = Datasets( - cloudos_url=self.cloudos_url, - apikey=self.apikey, - workspace_id=self.workspace_id, - project_name=self.project_name, - verify=self.verify, - cromwell_token=None, - ) - except ValueError as e: - raise ValueError( - f"Cannot resolve project '{self.project_name}': {e}" - ) - except BadRequestException as e: - if 'Forbidden' in str(e): - raise ValueError( - "Forbidden when accessing the project. Check your API key, " - "workspace access, and any Airlock restrictions." - ) - raise ValueError(f"Failed to access project '{self.project_name}': {e}") - - # list_folder_content can itself raise BadRequestException (401/403/etc.). - # Wrap it so callers see a clean ValueError with actionable guidance. - try: - contents = ds.list_folder_content(parent_path) - except BadRequestException as e: - msg = str(e) - if 'Forbidden' in msg or '403' in msg or '401' in msg: - raise ValueError( - f"Not authorised to list '{parent_path or '[project root]'}' " - f"in project '{self.project_name}'. 
" - "Check your API key and workspace access (Airlock may also be restricting you)." - ) - raise ValueError( - f"Failed to list '{parent_path or '[project root]'}' " - f"in project '{self.project_name}': {e}" - ) + ds = generate_datasets_for_project( + self.cloudos_url, self.apikey, self.workspace_id, self.project_name, self.verify + ) + contents = ds.list_folder_content(parent_path) for item in contents.get("folders", []): if item.get("name") == item_name: @@ -772,7 +699,7 @@ def get_fuse_filesystems_status(self, session_id: str) -> List[Dict]: r = retry_requests_get(url, headers=headers, verify=self.verify) if r.status_code == 401: - raise ValueError("Unauthorized. Invalid API key or insufficient permissions.") + raise ValueError("Forbidden. Invalid API key or insufficient permissions.") elif r.status_code == 404: raise ValueError( f"Interactive session {session_id} not found. " @@ -805,8 +732,8 @@ def get_fuse_filesystems_status(self, session_id: str) -> List[Dict]: return all_items - def wait_for_mount_completion(self, session_id: str, mount_name: str, - timeout: int = 360, check_interval: int = 2) -> Dict: + def wait_for_mount_completion(self, session_id: str, mount_name: str, + timeout: int = 360, check_interval: int = 2) -> Dict: """Wait for a specific mount to complete and return its final status. Parameters @@ -816,7 +743,7 @@ def wait_for_mount_completion(self, session_id: str, mount_name: str, mount_name : str The name of the mount to check. timeout : int, optional - Maximum time to wait in seconds (default: 60). + Maximum time to wait in seconds (default: 360). check_interval : int, optional Time between status checks in seconds (default: 2). 
@@ -876,9 +803,7 @@ def link_job_results(self, job_id: str, workspace_id: str, session_id: str, veri if verbose: print('\tFetching job results...') - # Create a temporary Cloudos client for API calls - cl = Cloudos(self.cloudos_url, self.apikey, None) - results_path = cl.get_job_results(job_id, workspace_id, verify_ssl) + results_path = self.get_job_results(job_id, workspace_id, verify_ssl) if results_path: print('\tLinking results directory...') @@ -924,9 +849,7 @@ def link_job_workdir(self, job_id: str, workspace_id: str, session_id: str, veri if verbose: print('\tFetching job working directory...') - # Create a temporary Cloudos client for API calls - cl = Cloudos(self.cloudos_url, self.apikey, None) - workdir_path = cl.get_job_workdir(job_id, workspace_id, verify_ssl) + workdir_path = self.get_job_workdir(job_id, workspace_id, verify_ssl) if workdir_path: print('\tLinking working directory...') @@ -970,9 +893,7 @@ def link_job_logs(self, job_id: str, workspace_id: str, session_id: str, verify_ if verbose: print('\tFetching job logs...') - # Create a temporary Cloudos client for API calls - cl = Cloudos(self.cloudos_url, self.apikey, None) - logs_dict = cl.get_job_logs(job_id, workspace_id, verify_ssl) + logs_dict = self.get_job_logs(job_id, workspace_id, verify_ssl) if logs_dict: # Extract the parent logs directory from any log file path @@ -993,4 +914,3 @@ def link_job_logs(self, job_id: str, workspace_id: str, session_id: str, verify_ click.secho(f'\tCannot link logs: {error_msg}', fg='red') else: click.secho(f'\tFailed to link logs: {error_msg}', fg='red') - diff --git a/cloudos_cli/jobs/cli.py b/cloudos_cli/jobs/cli.py index 5ebda4e0..d86767ad 100644 --- a/cloudos_cli/jobs/cli.py +++ b/cloudos_cli/jobs/cli.py @@ -15,7 +15,7 @@ from cloudos_cli.cost.cost import CostViewer from cloudos_cli.related_analyses.related_analyses import related_analyses from cloudos_cli.configure.configure import with_profile_config, CLOUDOS_URL -from cloudos_cli.link import Link 
+from cloudos_cli.interactive_session.link import Link from cloudos_cli.constants import ( JOB_COMPLETED, REQUEST_INTERVAL_CROMWELL, diff --git a/cloudos_cli/link/__init__.py b/cloudos_cli/link/__init__.py deleted file mode 100755 index 3706bf65..00000000 --- a/cloudos_cli/link/__init__.py +++ /dev/null @@ -1,8 +0,0 @@ -""" -Functions and classes related to datasets. -""" - -from .link import Link - - -__all__ = ['link'] diff --git a/cloudos_cli/link/cli.py b/cloudos_cli/link/cli.py index fcd84c22..9a2b83aa 100644 --- a/cloudos_cli/link/cli.py +++ b/cloudos_cli/link/cli.py @@ -1,5 +1,5 @@ import rich_click as click -from cloudos_cli.link.link import Link +from cloudos_cli.interactive_session.link import Link from cloudos_cli.utils.resources import ssl_selector from cloudos_cli.configure.configure import with_profile_config, CLOUDOS_URL from cloudos_cli.utils.errors import BadRequestException diff --git a/tests/test_datasets/test_link.py b/tests/test_datasets/test_link.py index c35d7239..6f77d6a7 100644 --- a/tests/test_datasets/test_link.py +++ b/tests/test_datasets/test_link.py @@ -1,6 +1,6 @@ import pytest from unittest import mock -from cloudos_cli.link.link import Link +from cloudos_cli.interactive_session.link import Link from cloudos_cli.utils.requests import retry_requests_post import responses @@ -213,7 +213,7 @@ def test_link_folder_204_file_explorer(capsys, link_instance_test_response, monk } responses.add(responses.GET, status_url, json=mock_response, status=200) - # Patch _parse_file_explorer_item (replaces parse_file_explorer_path in batch path) + # Patch _parse_file_explorer_item monkeypatch.setattr(link_instance_test_response, "_parse_file_explorer_item", lambda x: { "dataItem": { "kind": "Folder", @@ -224,10 +224,10 @@ def test_link_folder_204_file_explorer(capsys, link_instance_test_response, monk link_instance_test_response.link_folder("/home/user/data", "sessionABC") captured = capsys.readouterr() - assert "Successfully mounted File Explorer 
folder: /home/user/data" in captured.out + assert "Successfully mounted File Explorer folder: test_project/home/user/data" in captured.out -@responses.activate +@responses.activate def test_get_fuse_filesystems_status_success(link_instance_test_response): """Test successful retrieval of fuse filesystem status.""" status_url = f"https://lifebit.ai/api/v1/interactive-sessions/sessionABC/fuse-filesystems?teamId=team123&limit=100&page=1" @@ -382,7 +382,7 @@ def test_link_folder_v2_file_explorer(capsys, link_instance_test_response, monke } responses.add(responses.GET, status_url, json=mock_response, status=200) - # Patch _parse_file_explorer_item (replaces parse_file_explorer_path in batch path) + # Patch _parse_file_explorer_item monkeypatch.setattr(link_instance_test_response, "_parse_file_explorer_item", lambda x: { "dataItem": { "kind": "Folder", @@ -393,8 +393,7 @@ def test_link_folder_v2_file_explorer(capsys, link_instance_test_response, monke link_instance_test_response.link_folder("/home/user/data", "sessionABC") captured = capsys.readouterr() - assert "Successfully mounted File Explorer folder: /home/user/data" in captured.out - + assert "Successfully mounted File Explorer folder: test_project/home/user/data" in captured.out @responses.activate diff --git a/tests/test_datasets/test_link_files.py b/tests/test_datasets/test_link_files.py index e60651b3..d3176a79 100644 --- a/tests/test_datasets/test_link_files.py +++ b/tests/test_datasets/test_link_files.py @@ -2,7 +2,7 @@ import pytest from unittest import mock -from cloudos_cli.link.link import Link +from cloudos_cli.interactive_session.link import Link import responses CLOUDOS_URL = "https://lifebit.ai" @@ -89,7 +89,6 @@ def test_trailing_slash_key_raises(self, link_instance): with pytest.raises(ValueError, match="folder-like"): link_instance.parse_s3_file_path("s3://bucket/folder/") - # --------------------------------------------------------------------------- # _parse_file_explorer_item (auto-detect) # 
--------------------------------------------------------------------------- @@ -109,7 +108,7 @@ def test_detects_folder(self, link_instance, monkeypatch): folders=[{"name": "results", "_id": "folder_id_1", "folderType": "S3Folder"}] ) monkeypatch.setattr( - "cloudos_cli.link.link.Datasets", + "cloudos_cli.interactive_session.link.generate_datasets_for_project", lambda *a, **kw: ds ) result = link_instance._parse_file_explorer_item("Data/results") @@ -122,7 +121,7 @@ def test_detects_file(self, link_instance, monkeypatch): files=[{"name": "data.csv", "_id": "file_id_99"}] ) monkeypatch.setattr( - "cloudos_cli.link.link.Datasets", + "cloudos_cli.interactive_session.link.generate_datasets_for_project", lambda *a, **kw: ds ) result = link_instance._parse_file_explorer_item("Data/data.csv") @@ -135,7 +134,7 @@ def test_virtual_folder_raises(self, link_instance, monkeypatch): folders=[{"name": "vfolder", "_id": "vf_id", "folderType": "VirtualFolder"}] ) monkeypatch.setattr( - "cloudos_cli.link.link.Datasets", + "cloudos_cli.interactive_session.link.generate_datasets_for_project", lambda *a, **kw: ds ) with pytest.raises(ValueError, match="Virtual folders cannot be linked"): @@ -144,7 +143,7 @@ def test_virtual_folder_raises(self, link_instance, monkeypatch): def test_not_found_raises(self, link_instance, monkeypatch): ds = self._make_ds_mock() monkeypatch.setattr( - "cloudos_cli.link.link.Datasets", + "cloudos_cli.interactive_session.link.generate_datasets_for_project", lambda *a, **kw: ds ) with pytest.raises(ValueError, match="not found"): @@ -276,7 +275,7 @@ def test_fe_file_linked_via_v2(self, link_instance, capsys, monkeypatch): link_instance.link_folders_batch(["Data/observations.csv"], "sessionABC") captured = capsys.readouterr() - assert "Successfully mounted File Explorer file: Data/observations.csv" in captured.out + assert "Successfully mounted File Explorer file: test_project/Data/observations.csv" in captured.out # 
--------------------------------------------------------------------------- @@ -348,48 +347,6 @@ def test_folder_linking_unchanged(self, link_instance, capsys, monkeypatch): assert "Successfully mounted S3 folder: s3://b/path/myfolder/" in captured.out -# --------------------------------------------------------------------------- -# _parse_file_explorer_item guards (new in 2.91.0) -# --------------------------------------------------------------------------- - -class TestParseFileExplorerItemGuards: - """Validate the two defensive checks added at the top of _parse_file_explorer_item.""" - - def test_missing_project_name_raises_clear_error(self): - link = Link( - cloudos_url=CLOUDOS_URL, apikey=APIKEY, workspace_id=WORKSPACE_ID, - project_name=None, cromwell_token=None, verify=False, - ) - with pytest.raises(ValueError, match="without a project"): - link._parse_file_explorer_item("Data/file.csv") - - def test_path_starting_with_project_name_is_rejected(self, link_instance): - # link_instance.project_name == 'test_project' - with pytest.raises(ValueError, match="must NOT include the project name"): - link_instance._parse_file_explorer_item("test_project/Data/file.csv") - - def test_rejection_message_quotes_the_correct_relative_form(self, link_instance): - try: - link_instance._parse_file_explorer_item("test_project/Data/file.csv") - except ValueError as e: - assert "Use 'Data/file.csv' instead." 
in str(e) - - def test_public_wrapper_matches_private(self, link_instance, monkeypatch): - # parse_file_explorer_item should be a thin alias for _parse_file_explorer_item - ds = mock.MagicMock() - ds.list_folder_content.return_value = { - "folders": [{"name": "results", "_id": "rid", "folderType": "S3Folder"}], - "files": [], - } - monkeypatch.setattr( - "cloudos_cli.link.link.Datasets", - lambda *a, **kw: ds - ) - public = link_instance.parse_file_explorer_item("Data/results") - private = link_instance._parse_file_explorer_item("Data/results") - assert public == private - - # --------------------------------------------------------------------------- # _translate_mount_error # --------------------------------------------------------------------------- @@ -436,7 +393,6 @@ def test_v1_fallback_rejects_s3_file(self, link_instance, monkeypatch): status_url = f"{CLOUDOS_URL}/api/v1/interactive-sessions/sessionABC/fuse-filesystems?teamId={WORKSPACE_ID}&limit=100&page=1" responses.add(responses.GET, status_url, json={"fuseFileSystems": []}, status=200) - # v2 returns 404 to trigger v1 fallback url_v2 = f"{CLOUDOS_URL}/api/v2/interactive-sessions/sessionABC/fuse-filesystem/mount?teamId={WORKSPACE_ID}" responses.add(responses.POST, url_v2, status=404, json={"message": "Not Found"}) @@ -459,8 +415,7 @@ def test_v1_fallback_rejects_fe_file(self, link_instance, monkeypatch): url_v2 = f"{CLOUDOS_URL}/api/v2/interactive-sessions/sessionABC/fuse-filesystem/mount?teamId={WORKSPACE_ID}" responses.add(responses.POST, url_v2, status=404, json={"message": "Not Found"}) - # Bypass _parse_file_explorer_item so we land directly in the v1-fallback file check - monkeypatch.setattr(link_instance, "parse_file_explorer_item", lambda path: { + monkeypatch.setattr(link_instance, "_parse_file_explorer_item", lambda path: { "dataItem": {"kind": "File", "item": "id1", "name": "data.csv"} }) @@ -468,40 +423,6 @@ def test_v1_fallback_rejects_fe_file(self, link_instance, monkeypatch): 
link_instance.link_folders_batch(["Data/data.csv"], "sessionABC") -# --------------------------------------------------------------------------- -# Direct Datasets construction (no more sys.exit via helper) -# --------------------------------------------------------------------------- - -class TestDatasetsConstructionErrors: - """The Datasets() call inside _parse_file_explorer_item must surface as a - plain ValueError — never as sys.exit(1) — so callers can handle it.""" - - def test_project_not_found_raises_clean_value_error(self, link_instance, monkeypatch): - def boom(*args, **kwargs): - raise ValueError("Project 'no-such-project' was not found in workspace 'ws'") - - monkeypatch.setattr("cloudos_cli.link.link.Datasets", boom) - with pytest.raises(ValueError, match="Cannot resolve project 'test_project'"): - link_instance._parse_file_explorer_item("Data/file.csv") - - def test_forbidden_raises_clean_value_error(self, link_instance, monkeypatch): - from cloudos_cli.utils.errors import BadRequestException - - class _FakeResp: - status_code = 403 - content = b'Forbidden' - - def json(self): - return {"message": "Forbidden"} - - def boom(*args, **kwargs): - raise BadRequestException(_FakeResp()) - - monkeypatch.setattr("cloudos_cli.link.link.Datasets", boom) - with pytest.raises(ValueError, match="Forbidden when accessing the project"): - link_instance._parse_file_explorer_item("Data/file.csv") - - # --------------------------------------------------------------------------- # Duplicate-mount message names both colliding paths # --------------------------------------------------------------------------- @@ -511,10 +432,8 @@ class TestDuplicateMountMessage: def test_batch_collision_mentions_both_paths(self, link_instance): seen = {} - # First registration succeeds (returns None) link_instance._raise_if_duplicate_mount("foo", "/first/path", seen) seen["foo"] = "/first/path" - # Second one with the same mount name should mention BOTH paths with 
pytest.raises(ValueError) as excinfo: link_instance._raise_if_duplicate_mount("foo", "/second/path", seen) msg = str(excinfo.value) @@ -522,36 +441,11 @@ def test_batch_collision_mentions_both_paths(self, link_instance): assert "/second/path" in msg def test_session_collision_mentions_path_and_session(self, link_instance): - # Pre-existing session item → value is None seen = {"foo": None} with pytest.raises(ValueError, match="already mounted in the session"): link_instance._raise_if_duplicate_mount("foo", "/new/path", seen) -# --------------------------------------------------------------------------- -# list_folder_content errors are wrapped as ValueError, not raw BadRequestException -# --------------------------------------------------------------------------- - -class TestListFolderContentErrors: - - def test_forbidden_list_call_becomes_value_error(self, link_instance, monkeypatch): - from cloudos_cli.utils.errors import BadRequestException - - class _Resp: - status_code = 403 - content = b'Forbidden' - - def json(self): - return {"message": "Forbidden"} - - ds = mock.MagicMock() - ds.list_folder_content.side_effect = BadRequestException(_Resp()) - monkeypatch.setattr("cloudos_cli.link.link.Datasets", lambda *a, **kw: ds) - - with pytest.raises(ValueError, match="Not authorised to list"): - link_instance._parse_file_explorer_item("Data/file.csv") - - # --------------------------------------------------------------------------- # get_fuse_filesystems_status paginates correctly # --------------------------------------------------------------------------- @@ -572,7 +466,6 @@ def test_single_page_no_pagination_metadata(self, link_instance): @responses.activate def test_multi_page_pagination_collects_all_items(self, link_instance): - # Page 1: 2 items, total 3 → fetch page 2 url_p1 = f"{CLOUDOS_URL}/api/v1/interactive-sessions/sY/fuse-filesystems?teamId={WORKSPACE_ID}&limit=100&page=1" url_p2 = 
f"{CLOUDOS_URL}/api/v1/interactive-sessions/sY/fuse-filesystems?teamId={WORKSPACE_ID}&limit=100&page=2" responses.add( diff --git a/tests/test_interactive_session/test_create_session.py b/tests/test_interactive_session/test_create_session.py index 470c066f..ac8b983c 100644 --- a/tests/test_interactive_session/test_create_session.py +++ b/tests/test_interactive_session/test_create_session.py @@ -56,7 +56,7 @@ def test_interactive_session_create_has_optional_configuration_options(self): assert '--shared' in result.output assert '--cost-limit' in result.output assert '--shutdown-in' in result.output - assert '--mount' in result.output + assert '--copy' in result.output assert '--link' in result.output assert '--r-version' in result.output assert '--spark-master' in result.output @@ -119,27 +119,19 @@ def test_create_session_jupyter_basic(self, mock_config, mock_cloudos): # Command should execute (may fail at config loading but not at argument parsing) assert 'Error' not in result.output or result.exit_code == 0 - @patch('cloudos_cli.interactive_session.cli.resolve_data_file_id') - @patch('cloudos_cli.interactive_session.cli.Datasets') @patch('cloudos_cli.interactive_session.cli.Cloudos') @patch('cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data') - def test_create_session_with_all_options(self, mock_config, mock_cloudos, mock_datasets, mock_resolve): + def test_create_session_with_all_options(self, mock_config, mock_cloudos): """Test creating a session with all options specified.""" runner = CliRunner() - + mock_config.return_value = { 'apikey': 'test_key', 'cloudos_url': 'http://test.com', 'workspace_id': 'test_team', 'project_name': 'my_project' } - - # Mock Datasets API for resolving mounted files - mock_resolve.return_value = { - 'type': 'CloudOSFile', - 'item': 'file_id_123' - } - + mock_cloudos_instance = MagicMock() mock_cloudos.return_value = mock_cloudos_instance mock_cloudos_instance.create_interactive_session.return_value = 
{ @@ -147,7 +139,7 @@ def test_create_session_with_all_options(self, mock_config, mock_cloudos, mock_d 'name': 'Advanced Session', 'status': 'provisioning' } - + result = runner.invoke(run_cloudos_cli, [ 'interactive-session', 'create', '--apikey', 'test_key', @@ -162,9 +154,8 @@ def test_create_session_with_all_options(self, mock_config, mock_cloudos, mock_d '--shared', '--cost-limit', '50.0', '--shutdown-in', '8h', - '--mount', 'MyDataset/datafile.csv' ]) - + # Command should be invoked without syntax errors assert result.exit_code == 0 @@ -426,12 +417,6 @@ def test_parse_data_file_format(self): assert result5['s3_bucket'] == 'my-bucket' assert result5['s3_prefix'] == 'file.txt' - def test_resolve_data_file_id_function_exists(self): - """Test that resolve_data_file_id function exists.""" - from cloudos_cli.interactive_session.interactive_session import resolve_data_file_id - - assert callable(resolve_data_file_id) - def test_build_session_payload_function_exists(self): """Test that build_session_payload function exists.""" from cloudos_cli.interactive_session.interactive_session import build_session_payload @@ -478,5 +463,113 @@ def test_format_session_creation_table_output(self): assert isinstance(result, (str, type(None))) or hasattr(result, '__str__') +class TestCreateSessionCopyFlag: + """Tests for the --copy flag in create_session.""" + + @pytest.fixture + def base_args(self): + return [ + 'interactive-session', 'create', + '--apikey', 'test_key', + '--cloudos-url', 'http://test.com', + '--workspace-id', 'test_team', + '--project-name', 'my_project', + '--name', 'Copy Session', + '--session-type', 'jupyter', + ] + + @patch('cloudos_cli.interactive_session.cli._make_link_client') + @patch('cloudos_cli.interactive_session.cli.parse_data_file') + @patch('cloudos_cli.interactive_session.cli.Cloudos') + @patch('cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data') + def test_copy_with_file_explorer_path(self, mock_config, 
mock_cloudos, mock_parse, mock_link_client, base_args): + """--copy with a project-prefixed FE path resolves and copies the item.""" + runner = CliRunner() + mock_config.return_value = { + 'apikey': 'test_key', + 'cloudos_url': 'http://test.com', + 'workspace_id': 'test_team', + 'project_name': 'my_project', + } + mock_parse.return_value = { + 'type': 'cloudos', + 'project_name': 'my_project', + 'dataset_path': 'Data/file.csv', + } + fe_link = MagicMock() + fe_link._parse_file_explorer_item.return_value = { + 'dataItem': {'item': 'item_id_123', 'name': 'file.csv', 'kind': 'File'} + } + mock_link_client.return_value = fe_link + mock_cloudos_instance = MagicMock() + mock_cloudos.return_value = mock_cloudos_instance + mock_cloudos_instance.create_interactive_session.return_value = { + '_id': 'sess_001', 'name': 'Copy Session', 'status': 'scheduled' + } + + result = runner.invoke(run_cloudos_cli, base_args + ['--copy', '--link', 'my_project/Data/file.csv']) + + assert result.exit_code == 0 + mock_parse.assert_called_once_with('my_project/Data/file.csv') + fe_link._parse_file_explorer_item.assert_called_once_with('Data/file.csv') + + @patch('cloudos_cli.interactive_session.cli._make_link_client') + @patch('cloudos_cli.interactive_session.cli.parse_data_file') + @patch('cloudos_cli.interactive_session.cli.Cloudos') + @patch('cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data') + def test_copy_with_known_root_folder_and_no_project_is_error(self, mock_config, mock_cloudos, mock_parse, mock_link_client, base_args): + """--copy with a bare known-root-folder path and no --project-name raises an error.""" + runner = CliRunner() + mock_config.return_value = { + 'apikey': 'test_key', + 'cloudos_url': 'http://test.com', + 'workspace_id': 'test_team', + 'project_name': None, + } + + args_no_project = [ + 'interactive-session', 'create', + '--apikey', 'test_key', + '--cloudos-url', 'http://test.com', + '--workspace-id', 'test_team', + '--name', 
'Copy Session', + '--session-type', 'jupyter', + '--copy', '--link', 'Data/file.csv', + ] + result = runner.invoke(run_cloudos_cli, args_no_project) + + assert result.exit_code != 0 + assert 'project-name' in result.output.lower() or 'project_name' in result.output.lower() or 'Error' in result.output + + @patch('cloudos_cli.interactive_session.cli._make_link_client') + @patch('cloudos_cli.interactive_session.cli.parse_data_file') + @patch('cloudos_cli.interactive_session.cli.Cloudos') + @patch('cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data') + def test_copy_with_s3_path_on_azure_is_error(self, mock_config, mock_cloudos, mock_parse, mock_link_client, base_args): + """--copy with an S3 path on Azure execution platform is rejected.""" + runner = CliRunner() + mock_config.return_value = { + 'apikey': 'test_key', + 'cloudos_url': 'http://test.com', + 'workspace_id': 'test_team', + 'project_name': 'my_project', + } + mock_parse.return_value = { + 'type': 's3', + 's3_bucket': 'my-bucket', + 's3_prefix': 'data/file.csv', + } + mock_cloudos_instance = MagicMock() + mock_cloudos.return_value = mock_cloudos_instance + + result = runner.invoke(run_cloudos_cli, base_args + [ + '--copy', '--link', 's3://my-bucket/data/file.csv', + '--execution-platform', 'azure', + ]) + + assert result.exit_code != 0 + assert 'S3' in result.output or 'azure' in result.output.lower() or 'Azure' in result.output + + if __name__ == '__main__': pytest.main([__file__, '-v']) diff --git a/tests/test_interactive_session/test_link_error_handling.py b/tests/test_interactive_session/test_link_error_handling.py new file mode 100644 index 00000000..d634dd9c --- /dev/null +++ b/tests/test_interactive_session/test_link_error_handling.py @@ -0,0 +1,258 @@ +"""Unit tests for _handle_mount_error, _translate_mount_error, and link_job_* methods.""" + +import pytest +from unittest import mock +from cloudos_cli.interactive_session.link import Link +from cloudos_cli.utils.errors 
import JoBNotCompletedException + +CLOUDOS_URL = "https://lifebit.ai" +APIKEY = "testapikey" +WORKSPACE_ID = "team123" +PROJECT_NAME = "test_project" + + +@pytest.fixture +def link_instance(): + return Link( + cloudos_url=CLOUDOS_URL, + apikey=APIKEY, + workspace_id=WORKSPACE_ID, + project_name=PROJECT_NAME, + cromwell_token=None, + verify=False, + ) + + +# --------------------------------------------------------------------------- +# _translate_mount_error +# --------------------------------------------------------------------------- + +class TestTranslateMountError: + def test_prefix_does_not_exist(self, link_instance): + result = link_instance._translate_mount_error("prefix does not exist in bucket") + assert "prefix does not exist in bucket" in result + assert "workspace may not have permission" in result + + def test_key_does_not_exist(self, link_instance): + result = link_instance._translate_mount_error("key does not exist") + assert "key does not exist" in result + assert "Verify the path is correct" in result + + def test_access_denied(self, link_instance): + result = link_instance._translate_mount_error("Access Denied") + assert "Access Denied" in result + assert "workspace does not have permission" in result + + def test_forbidden(self, link_instance): + result = link_instance._translate_mount_error("Forbidden response from S3") + assert "workspace does not have permission" in result + + def test_unknown_error_returned_unchanged(self, link_instance): + result = link_instance._translate_mount_error("some unexpected error") + assert result == "some unexpected error" + + +# --------------------------------------------------------------------------- +# _handle_mount_error +# --------------------------------------------------------------------------- + +class TestHandleMountError: + def test_403_already_mounted(self, link_instance): + with pytest.raises(ValueError, match="already exists with 'mounted' status"): + 
link_instance._handle_mount_error(Exception("403 already mounted item"), "S3") + + def test_403_not_active(self, link_instance): + with pytest.raises(ValueError, match="not active or access denied"): + link_instance._handle_mount_error(Exception("403 Forbidden access"), "S3") + + def test_401_unauthorized(self, link_instance): + with pytest.raises(ValueError, match="Invalid API key"): + link_instance._handle_mount_error(Exception("401 unauthorized"), "S3") + + def test_400_virtual_folder(self, link_instance): + with pytest.raises(ValueError, match="Virtual folders cannot be linked"): + link_instance._handle_mount_error( + Exception("400 Invalid Supported DataItem folderType"), "S3" + ) + + def test_400_generic(self, link_instance): + with pytest.raises(ValueError, match="Cannot link item"): + link_instance._handle_mount_error(Exception("400 bad request"), "S3") + + def test_404_not_found(self, link_instance): + with pytest.raises(ValueError, match="Session not found"): + link_instance._handle_mount_error(Exception("404 not found"), "S3") + + def test_unknown_error(self, link_instance): + with pytest.raises(ValueError, match="Failed to mount S3 item"): + link_instance._handle_mount_error(Exception("connection timeout"), "S3") + + def test_type_folder_appears_in_message(self, link_instance): + with pytest.raises(ValueError, match="File Explorer"): + link_instance._handle_mount_error(Exception("connection reset"), "File Explorer") + + +# --------------------------------------------------------------------------- +# link_job_results +# --------------------------------------------------------------------------- + +class TestLinkJobResults: + def test_links_successfully(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock(return_value="s3://bucket/results/") + link_instance.link_folder = mock.Mock(return_value=True) + + link_instance.link_job_results("job1", "ws1", "sess1", True) + + 
link_instance.link_folder.assert_called_once_with("s3://bucket/results/", "sess1") + out = capsys.readouterr().out + assert "Linking results" in out + + def test_no_results_path(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock(return_value=None) + link_instance.link_job_results("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "No results found" in err.out + err.err + + def test_mount_returns_false(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock(return_value="s3://bucket/results/") + link_instance.link_folder = mock.Mock(return_value=False) + link_instance.link_job_results("job1", "ws1", "sess1", True) + # Should not raise; message printed + link_instance.link_folder.assert_called_once() + + def test_job_not_completed_exception(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock( + side_effect=JoBNotCompletedException("job1", "running") + ) + link_instance.link_job_results("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Cannot link results" in err.out + err.err + + def test_results_not_available_exception(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock( + side_effect=Exception("Results are not available") + ) + link_instance.link_job_results("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Cannot link results" in err.out + err.err + + def test_generic_exception(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock(side_effect=Exception("network error")) + link_instance.link_job_results("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Failed to link results" in err.out + err.err + + def test_verbose_prints_path(self, link_instance, capsys): + link_instance.get_job_results = mock.Mock(return_value="s3://bucket/results/") + link_instance.link_folder = mock.Mock(return_value=True) + link_instance.link_job_results("job1", "ws1", "sess1", True, verbose=True) + out = 
capsys.readouterr().out + assert "s3://bucket/results/" in out + + +# --------------------------------------------------------------------------- +# link_job_workdir +# --------------------------------------------------------------------------- + +class TestLinkJobWorkdir: + def test_links_successfully(self, link_instance, capsys): + link_instance.get_job_workdir = mock.Mock(return_value="s3://bucket/workdir/") + link_instance.link_folder = mock.Mock(return_value=True) + + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + + link_instance.link_folder.assert_called_once_with("s3://bucket/workdir/", "sess1") + out = capsys.readouterr().out + assert "Linking working directory" in out + + def test_no_workdir(self, link_instance, capsys): + link_instance.get_job_workdir = mock.Mock(return_value=None) + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "No working directory found" in err.out + err.err + + def test_mount_returns_false(self, link_instance, capsys): + link_instance.get_job_workdir = mock.Mock(return_value="s3://bucket/workdir/") + link_instance.link_folder = mock.Mock(return_value=False) + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + link_instance.link_folder.assert_called_once() + + def test_not_available_exception(self, link_instance, capsys): + link_instance.get_job_workdir = mock.Mock( + side_effect=Exception("workdir not yet available") + ) + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Cannot link workdir" in err.out + err.err + + def test_generic_exception(self, link_instance, capsys): + link_instance.get_job_workdir = mock.Mock(side_effect=Exception("network error")) + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Failed to link workdir" in err.out + err.err + + def test_workdir_stripped_of_whitespace(self, link_instance, capsys): + link_instance.get_job_workdir = 
mock.Mock(return_value=" s3://bucket/workdir/ ") + link_instance.link_folder = mock.Mock(return_value=True) + link_instance.link_job_workdir("job1", "ws1", "sess1", True) + link_instance.link_folder.assert_called_once_with("s3://bucket/workdir/", "sess1") + + +# --------------------------------------------------------------------------- +# link_job_logs +# --------------------------------------------------------------------------- + +class TestLinkJobLogs: + def test_links_successfully(self, link_instance, capsys): + logs_dict = {"stdout": "s3://bucket/logs/stdout.txt"} + link_instance.get_job_logs = mock.Mock(return_value=logs_dict) + link_instance.link_folder = mock.Mock(return_value=True) + + link_instance.link_job_logs("job1", "ws1", "sess1", True) + + link_instance.link_folder.assert_called_once_with("s3://bucket/logs", "sess1") + out = capsys.readouterr().out + assert "Linking logs directory" in out + + def test_no_logs(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock(return_value=None) + link_instance.link_job_logs("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "No logs found" in err.out + err.err + + def test_empty_logs_dict(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock(return_value={}) + link_instance.link_job_logs("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "No logs found" in err.out + err.err + + def test_mount_returns_false(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock( + return_value={"stdout": "s3://bucket/logs/stdout.txt"} + ) + link_instance.link_folder = mock.Mock(return_value=False) + link_instance.link_job_logs("job1", "ws1", "sess1", True) + link_instance.link_folder.assert_called_once() + + def test_not_available_exception(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock( + side_effect=Exception("logs not yet available") + ) + link_instance.link_job_logs("job1", "ws1", "sess1", True) + err = 
capsys.readouterr() + assert "Cannot link logs" in err.out + err.err + + def test_generic_exception(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock(side_effect=Exception("connection reset")) + link_instance.link_job_logs("job1", "ws1", "sess1", True) + err = capsys.readouterr() + assert "Failed to link logs" in err.out + err.err + + def test_verbose_prints_logs_dir(self, link_instance, capsys): + link_instance.get_job_logs = mock.Mock( + return_value={"stdout": "s3://bucket/logs/stdout.txt"} + ) + link_instance.link_folder = mock.Mock(return_value=True) + link_instance.link_job_logs("job1", "ws1", "sess1", True, verbose=True) + out = capsys.readouterr().out + assert "s3://bucket/logs" in out diff --git a/tests/test_interactive_session/test_link_session.py b/tests/test_interactive_session/test_link_session.py new file mode 100644 index 00000000..3d45bb27 --- /dev/null +++ b/tests/test_interactive_session/test_link_session.py @@ -0,0 +1,280 @@ +"""Tests for the interactive-session link CLI command and path-normalisation helpers.""" + +import pytest +from click.testing import CliRunner +from unittest.mock import patch, MagicMock + +from cloudos_cli.__main__ import run_cloudos_cli +from cloudos_cli.interactive_session.cli import _normalize_file_explorer_path, _check_duplicate_mount_name + + +# --------------------------------------------------------------------------- +# _normalize_file_explorer_path +# --------------------------------------------------------------------------- + +class TestNormalizeFileExplorerPath: + + def test_s3_path_returned_unchanged(self): + path, project = _normalize_file_explorer_path("s3://bucket/prefix/", "my-project") + assert path == "s3://bucket/prefix/" + assert project is None + + def test_azure_path_returned_unchanged(self): + path, project = _normalize_file_explorer_path("az://container/blob", "my-project") + assert path == "az://container/blob" + assert project is None + + def 
test_known_root_folder_data_uses_profile_project(self): + path, project = _normalize_file_explorer_path("Data/results", "profile-project") + assert path == "Data/results" + assert project == "profile-project" + + def test_known_root_folder_case_insensitive(self): + path, project = _normalize_file_explorer_path("analysesresults/report.html", "p") + assert project == "p" + + def test_cohorts_root_folder(self): + path, project = _normalize_file_explorer_path("Cohorts/my-cohort", "workspace-project") + assert path == "Cohorts/my-cohort" + assert project == "workspace-project" + + def test_project_inferred_from_first_segment(self): + path, project = _normalize_file_explorer_path("my-project/Data/folder", None) + assert path == "Data/folder" + assert project == "my-project" + + def test_project_inferred_overrides_supplied_project(self): + # The first segment wins over the profile project when it is not a known root folder + path, project = _normalize_file_explorer_path("other-project/Data/file.csv", "profile-project") + assert path == "Data/file.csv" + assert project == "other-project" + + def test_bare_path_no_slash_uses_profile_project(self): + # A path with no slash is treated as a top-level item on the profile project + path, project = _normalize_file_explorer_path("Data", "my-project") + assert path == "Data" + assert project == "my-project" + + def test_bare_path_no_slash_no_project_returns_none(self): + path, project = _normalize_file_explorer_path("Data", None) + assert path == "Data" + assert project is None + + +# --------------------------------------------------------------------------- +# _check_duplicate_mount_name +# --------------------------------------------------------------------------- + +class TestCheckDuplicateMountName: + + def test_new_name_is_registered(self): + seen = {} + _check_duplicate_mount_name("folder", "Data/folder", seen) + assert seen["folder"] == "Data/folder" + + def test_duplicate_raises_system_exit(self): + seen = {"folder": 
"Data/folder"} + with pytest.raises(SystemExit): + _check_duplicate_mount_name("folder", "OtherProject/Data/folder", seen) + + +# --------------------------------------------------------------------------- +# link_session CLI command — structural checks +# --------------------------------------------------------------------------- + +class TestLinkSessionCommand: + + def test_command_exists(self): + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, ["interactive-session", "link", "--help"]) + assert result.exit_code == 0 + assert "--session-id" in result.output + assert "--job-id" in result.output + + def test_requires_session_id(self): + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "s3://bucket/folder/", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + ]) + assert result.exit_code != 0 + + def test_path_and_job_id_are_mutually_exclusive(self): + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "s3://bucket/folder/", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + "--job-id", "job456", + ]) + assert result.exit_code != 0 + assert "Cannot use both" in result.output + + def test_results_flag_requires_job_id(self): + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "s3://bucket/folder/", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + "--results", + ]) + assert result.exit_code != 0 + assert "--results" in result.output or "job-id" in result.output.lower() + + def test_neither_path_nor_job_id_is_error(self): + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + 
assert result.exit_code != 0 + + @patch("cloudos_cli.__main__.get_shared_config", return_value={}) + @patch("cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data") + def test_top_level_folder_path_without_project_name_is_error(self, mock_config, _mock_shared): + mock_config.return_value = { + "apikey": "key", + "cloudos_url": "http://test.com", + "workspace_id": "ws", + "project_name": None, + } + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "Data/my-folder", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + assert result.exit_code != 0 + assert "--project-name" in result.output + + @patch("cloudos_cli.interactive_session.cli._make_link_client") + @patch("cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data") + def test_s3_path_calls_link_folders_batch(self, mock_config, mock_make_client): + mock_config.return_value = { + "apikey": "key", + "cloudos_url": "http://test.com", + "workspace_id": "ws", + "project_name": "proj", + } + mock_client = MagicMock() + mock_client.link_folders_batch.return_value = True + mock_make_client.return_value = mock_client + + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "s3://bucket/folder/", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + + mock_client.link_folders_batch.assert_called_once() + call_args = mock_client.link_folders_batch.call_args + assert call_args[0][0] == ["s3://bucket/folder/"] + assert call_args[0][1] == "sess123" + + @patch("cloudos_cli.interactive_session.cli._make_link_client") + @patch("cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data") + def test_file_explorer_path_infers_project(self, mock_config, mock_make_client): + mock_config.return_value = { + "apikey": "key", 
+ "cloudos_url": "http://test.com", + "workspace_id": "ws", + "project_name": None, + } + mock_client = MagicMock() + mock_client.link_folders_batch.return_value = True + mock_make_client.return_value = mock_client + + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "my-project/Data/folder", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + + mock_make_client.assert_called_once() + call_args = mock_make_client.call_args[0] + assert call_args[:4] == ("http://test.com", "key", "ws", "my-project") + call_args = mock_client.link_folders_batch.call_args + assert call_args[0][0] == ["Data/folder"] + + @patch("cloudos_cli.interactive_session.cli._make_link_client") + @patch("cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data") + def test_multi_project_paths_grouped_correctly(self, mock_config, mock_make_client): + mock_config.return_value = { + "apikey": "key", + "cloudos_url": "http://test.com", + "workspace_id": "ws", + "project_name": None, + } + mock_client = MagicMock() + mock_client.link_folders_batch.return_value = True + mock_make_client.return_value = mock_client + + runner = CliRunner() + result = runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "proj-a/Data/folder1,proj-b/Data/folder2", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + + # Two separate project groups → two client instantiations + assert mock_make_client.call_count == 2 + projects_called = {call[0][3] for call in mock_make_client.call_args_list} + assert projects_called == {"proj-a", "proj-b"} + + @patch("cloudos_cli.interactive_session.cli._make_link_client") + @patch("cloudos_cli.configure.configure.ConfigurationProfile.load_profile_and_validate_data") + def test_committed_count_accumulates_across_groups(self, mock_config, mock_make_client): + 
"""committed_count passed to second group reflects paths already submitted.""" + mock_config.return_value = { + "apikey": "key", + "cloudos_url": "http://test.com", + "workspace_id": "ws", + "project_name": None, + } + mock_client = MagicMock() + mock_client.link_folders_batch.return_value = True + mock_make_client.return_value = mock_client + + runner = CliRunner() + runner.invoke(run_cloudos_cli, [ + "interactive-session", "link", + "proj-a/Data/f1,proj-a/Data/f2,proj-b/Data/f3", + "--apikey", "key", + "--cloudos-url", "http://test.com", + "--workspace-id", "ws", + "--session-id", "sess123", + ]) + + calls = mock_client.link_folders_batch.call_args_list + # First group (proj-a, 2 paths): committed_count=0 + # Second group (proj-b, 1 path): committed_count=2 + committed_counts = [c[1].get("committed_count", c[0][2] if len(c[0]) > 2 else 0) + for c in calls] + assert 0 in committed_counts + assert 2 in committed_counts diff --git a/tests/test_interactive_session/test_normalize_path.py b/tests/test_interactive_session/test_normalize_path.py new file mode 100644 index 00000000..add343ca --- /dev/null +++ b/tests/test_interactive_session/test_normalize_path.py @@ -0,0 +1,102 @@ +"""Unit tests for _normalize_file_explorer_path helper.""" + +import pytest +from cloudos_cli.interactive_session.cli import _normalize_file_explorer_path + + +class TestNormalizeFileExplorerPath: + """Tests for _normalize_file_explorer_path.""" + + # --- S3 / Azure paths are returned unchanged --- + + def test_s3_path_returned_unchanged(self): + path, project = _normalize_file_explorer_path("s3://bucket/prefix/", "my-project") + assert path == "s3://bucket/prefix/" + assert project is None + + def test_s3_file_path_returned_unchanged(self): + path, project = _normalize_file_explorer_path("s3://bucket/data/file.csv", "my-project") + assert path == "s3://bucket/data/file.csv" + assert project is None + + def test_s3_path_no_project_name_returns_none(self): + path, project = 
_normalize_file_explorer_path("s3://bucket/prefix/", None) + assert path == "s3://bucket/prefix/" + assert project is None + + def test_azure_path_returned_unchanged(self): + path, project = _normalize_file_explorer_path("az://container/blob/", "my-project") + assert path == "az://container/blob/" + assert project is None + + # --- Paths with no slash are treated as relative to project_name --- + + def test_single_segment_uses_project_name(self): + path, project = _normalize_file_explorer_path("Results", "my-project") + assert path == "Results" + assert project == "my-project" + + def test_single_segment_no_project_name(self): + path, project = _normalize_file_explorer_path("Results", None) + assert path == "Results" + assert project is None + + # --- Known root folders are treated as relative to project_name --- + + def test_data_folder_uses_project_name(self): + path, project = _normalize_file_explorer_path("Data/Downloads", "my-project") + assert path == "Data/Downloads" + assert project == "my-project" + + def test_data_folder_case_insensitive(self): + path, project = _normalize_file_explorer_path("data/Downloads", "my-project") + assert path == "data/Downloads" + assert project == "my-project" + + def test_analysesresults_folder_uses_project_name(self): + path, project = _normalize_file_explorer_path("AnalysesResults/run-1", "my-project") + assert path == "AnalysesResults/run-1" + assert project == "my-project" + + def test_analyses_results_underscore_uses_project_name(self): + path, project = _normalize_file_explorer_path("Analyses_Results/run-1", "my-project") + assert path == "Analyses_Results/run-1" + assert project == "my-project" + + def test_analyses_results_hyphen_uses_project_name(self): + path, project = _normalize_file_explorer_path("Analyses-Results/run-1", "my-project") + assert path == "Analyses-Results/run-1" + assert project == "my-project" + + def test_cohorts_folder_uses_project_name(self): + path, project = 
_normalize_file_explorer_path("Cohorts/cohort-a", "my-project") + assert path == "Cohorts/cohort-a" + assert project == "my-project" + + def test_known_root_deep_path_uses_project_name(self): + path, project = _normalize_file_explorer_path("Data/folder/subfolder/file.csv", "my-project") + assert path == "Data/folder/subfolder/file.csv" + assert project == "my-project" + + # --- Paths whose first segment is the project name --- + + def test_explicit_project_name_extracted(self): + path, project = _normalize_file_explorer_path("my-project/Data/file.csv", "other-project") + assert path == "Data/file.csv" + assert project == "my-project" + + def test_explicit_project_name_no_profile_project(self): + path, project = _normalize_file_explorer_path("my-project/Data/file.csv", None) + assert path == "Data/file.csv" + assert project == "my-project" + + def test_explicit_project_with_deep_path(self): + path, project = _normalize_file_explorer_path("proj/AnalysesResults/run-1/output.csv", None) + assert path == "AnalysesResults/run-1/output.csv" + assert project == "proj" + + def test_unknown_root_segment_treated_as_project(self): + """A first segment that is not a known root folder is inferred as project name.""" + path, project = _normalize_file_explorer_path("custom-folder/subfolder", "profile-project") + assert path == "subfolder" + assert project == "custom-folder"
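Review note: the tests above pin down the path-normalisation rules fairly completely, so it may help to see them as one function. The following is a minimal sketch inferred only from the assertions in these tests; it is not the actual `_normalize_file_explorer_path` in `cloudos_cli.interactive_session.cli`. The names `normalize_path_sketch` and `KNOWN_ROOT_FOLDERS` are hypothetical, and the folder set is assumed from the changelog entry and the test names.

```python
from typing import Optional, Tuple

# Assumed list of standard top-level folders, lowercased for case-insensitive
# matching. Hypothetical constant, not taken from the real module.
KNOWN_ROOT_FOLDERS = {
    "data",
    "analysesresults",
    "analyses_results",
    "analyses-results",
    "cohorts",
}


def normalize_path_sketch(
    path: str, profile_project: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (path, project) following the rules the tests assert."""
    # Cloud storage URIs pass through unchanged and carry no project.
    if path.startswith(("s3://", "az://")):
        return path, None

    first, sep, rest = path.partition("/")

    # A single segment, or a path rooted at a known top-level folder,
    # is treated as relative to the profile project (which may be None).
    if not sep or first.lower() in KNOWN_ROOT_FOLDERS:
        return path, profile_project

    # Otherwise the first segment is taken as the project name and
    # stripped from the returned path.
    return rest, first
```

One consequence worth noting, and one that `test_unknown_root_segment_treated_as_project` makes explicit: under these rules any unrecognised first segment is read as a project name, so a custom top-level folder cannot be addressed relative to the profile project without prefixing the project name.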