Training T-SNE Clustering

T-SNE is mostly useful for data visualization.

Data format

T-SNE uses the CSV data format; see the relevant CSV data section above.
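As a quick sanity check before training, the header and class balance of a CSV file can be inspected with the Python standard library alone. The sketch below assumes the file has a header row with a `label` column, as the MNIST plotting example in this section does. `count_labels` is a hypothetical helper written for illustration, not part of the DD platform.

```python
import csv
import io
from collections import Counter

def count_labels(csv_file, label_column="label"):
    """Count rows per class in an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    if label_column not in (reader.fieldnames or []):
        raise ValueError(f"no {label_column!r} column in header: {reader.fieldnames}")
    return Counter(row[label_column] for row in reader)

# Tiny in-memory stand-in for a real file
# (assumed layout: a label column followed by pixel columns).
sample = io.StringIO("label,pixel0,pixel1\n1,0,255\n7,12,0\n1,3,4\n")
counts = count_labels(sample)
print(counts)
```

For a file on disk, the same helper can be called as `count_labels(open(path, newline=""))`, where `path` points to your own CSV file.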

Training for a T-SNE visualization

Using the DD platform, from a JupyterLab notebook, start from the code below.

T-SNE notebook snippet:

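tsne_mnist = TSNE_CSV(
    training_repo = '',   # location of the CSV data
    iterations = 5000,    # maximum number of iterations
    perplexity = 30       # related to the number of nearest neighbors
)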

Building a T-SNE plot after training has completed:


Displaying the T-SNE plot with per-class colours:

import pandas as pd
df_orig = pd.read_csv("/path/to/mnist_train.csv")
tsne_mnist.plot(s=10, marker='^', c=df_orig.label, cmap='jet')

The T-SNE notebook snippet above runs a T-SNE compression job with the following parameters:

  • tsne_mnist is the example job name.
  • training_repo specifies the location of the data.
  • iterations specifies the maximum number of iterations.
  • perplexity is related to the number of nearest neighbors used to learn the underlying manifold.
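To make the link between perplexity and neighbor count concrete, here is a minimal sketch based on the standard T-SNE definition, where the perplexity of a point's neighbor distribution P is 2 raised to its Shannon entropy in bits. The `perplexity` function and the two example distributions are illustrative assumptions. They do not show how the DD platform computes anything internally.

```python
import math

def perplexity(probs):
    """Perplexity 2**H(P) of a discrete distribution, with H in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

# A point that weights 30 neighbors equally has perplexity of about 30.
uniform_30 = [1 / 30] * 30

# A point dominated by a single neighbor has a much lower perplexity,
# even though 30 neighbors still carry some probability mass.
peaked = [0.9] + [0.1 / 29] * 29

print(perplexity(uniform_30))
print(perplexity(peaked))
```

Read this way, perplexity = 30 asks the algorithm to treat each point as if it had roughly 30 effective neighbors, so larger values favor broader structure and smaller values favor very local structure.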

Once training has completed, the plotting snippets above can be used to generate the plot below:

T-SNE plot