`cuML` Barnes-Hut t-SNE collapses to a straight line on homogeneous embeddings

> [`cuML`’s t-SNE](https://docs.rapids.ai/api/cuml/nightly/api/generated/cuml.manifold.tsne/#tsne) supports three algorithms: the original exact algorithm (O(N^2)), the Barnes-Hut approximation and the fast Fourier transform interpolation approximation (O(N log N)). The latter two are derived from CannyLabs’ open-source CUDA code and produce extremely fast embeddings when n_components = 2. The exact algorithm is more accurate, but too slow to use on large datasets.

In the `embed_explore` / `precalculated` apps, the t-SNE projection with the cuML (GPU) backend renders as a straight 45° line for the Darwin's-finches BioCLIP 2 embeddings. PCA and UMAP are fine, and switching the backend to sklearn produces a correct t-SNE, so the problem appears specific to cuML's Barnes-Hut t-SNE rather than to the data or our pipeline.

The app uses `cuML` t-SNE's default `method='barnes_hut'`, which collapses the two output dimensions onto a single line on this data. `method='exact'` fixes it. **Barnes-Hut's degeneracy is data-dependent.** The finch embeddings are extremely homogeneous, with near-uniform pairwise structure after L2 normalization.
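A minimal sketch of the workaround, assuming the app selects its backend with a string. The helper names `tsne_kwargs` and `run_tsne` and the backend labels `"cuml"` / `"sklearn"` are hypothetical and not taken from the app's code. `method` is passed explicitly so the behaviour does not depend on whatever default the installed cuML version ships with.

```python
def tsne_kwargs(backend, n_components=2):
    """Build t-SNE constructor kwargs (hypothetical helper for illustration)."""
    kwargs = {"n_components": n_components}
    if backend == "cuml":
        # Workaround from this issue: avoid Barnes-Hut on the cuML backend.
        kwargs["method"] = "exact"
    elif backend != "sklearn":
        raise ValueError(f"unknown backend: {backend!r}")
    return kwargs


def run_tsne(embeddings, backend="cuml"):
    """Project embeddings to 2D with the chosen backend."""
    kwargs = tsne_kwargs(backend)
    if backend == "cuml":
        # Imported lazily: needs a RAPIDS install and a GPU.
        from cuml.manifold import TSNE
    else:
        from sklearn.manifold import TSNE
    return TSNE(**kwargs).fit_transform(embeddings)
```

As the cuML docs quoted above note, the exact algorithm is O(N^2), so this is probably only practical for smaller collections such as the finch set.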

**Degenerate / collapsed embedding:** the 2D output is not a real spread but lies on a single line. One direction in the output plane carries nearly all the variance, and the two coordinates become almost perfectly correlated.
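The definition above can be turned into a quick check on any 2D projection. This is a sketch only: the function names and the `minor_tol` threshold are assumptions for illustration, not part of the app. It computes the absolute correlation between the two coordinates and the fraction of total variance on the minor principal axis; a value near zero for the latter means the points lie on a line.

```python
import math


def collapse_score(points):
    """Return (abs correlation, minor-axis variance fraction) for 2D points."""
    xs = [float(p[0]) for p in points]
    ys = [float(p[1]) for p in points]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Smaller eigenvalue of the covariance matrix in closed form.
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(trace ** 2 / 4 - det, 0.0))
    lam_min = max(trace / 2 - disc, 0.0)
    corr = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    minor_fraction = lam_min / trace if trace > 0 else 0.0
    return abs(corr), minor_fraction


def is_collapsed(points, minor_tol=1e-3):
    """True when almost no variance is left off the main axis (assumed threshold)."""
    return collapse_score(points)[1] < minor_tol
```

The variance fraction is used for the decision rather than the correlation because the correlation is undefined for a perfectly horizontal or vertical line, while the minor-axis fraction still drops to zero there.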

### `cuML` t-SNE result
<img width="1308" height="686" alt="Image" src="https://github.com/user-attachments/assets/1c7c5439-1fd8-486d-93ab-095c9022aca3" />

### `cuML` PCA result
<img width="1308" height="764" alt="Image" src="https://github.com/user-attachments/assets/38db8c49-2499-44e3-92ad-eac9756a954e" />

### `cuML` UMAP result
<img width="1330" height="734" alt="Image" src="https://github.com/user-attachments/assets/53cbbe08-af7f-4a8a-becf-ad8963ccdd5a" />

### `sklearn` t-SNE result
<img width="1419" height="752" alt="Image" src="https://github.com/user-attachments/assets/a53b57f5-f69d-4a71-a892-b24602854c9e" />
