Questions tagged [umap]

8 questions
2
votes
1 answer

How to improve the preservation of the global data structure in UMAP?

I have a dataset where the features consist of points arranged in a regular grid on a simplex. Each of these points is defined as follows: A point $\mathbf{x}$ on the simplex can be represented as a vector in $\mathbb{R}^n$ such…
1
vote
2 answers

What are the fastest dimensionality reduction techniques to use out of the box?

I am working on an ML project where we would like to visualize movements in a high-dimensional but sparse vector space (e.g. a 1x75 vector where most of the entries are either one-hot encoded binary or modulo 3). Since the visualization is mainly to…
1
vote
0 answers

Using UMAP on text data (euclidean distance on jaccard distance matrix)

I am checking the capabilities of the UMAP dimensionality reduction algorithm, but I am not sure whether the approach I am using is valid and does not violate the rules/limitations of this algorithm. Purpose: visualization (and subsequent grouping) of…
rkabuk
1
vote
2 answers

When visualizing graph nodes, should I apply PCA to node2vec embeddings?

I am trying to visualize graph nodes using node2vec embeddings. The node2vec embeddings have 50 to 100 dimensions. I have two plans: (1) use UMAP to project the node2vec embeddings to 2D space; (2) use PCA to project the node2vec embeddings to a slightly…
Sijie Chen
0
votes
0 answers

With multiple identical datapoints, should I use UMAP min_dist = 0?

Most (if not all) implementations/examples of UMAP dimensionality reduction I have seen use a min_dist value slightly above zero in order to avoid overly tight clustering of points. It makes sense, but I noticed that I have a significant number of…
Christoph
0
votes
1 answer

How do I interpret low-dimensional embeddings of high-dimensional embeddings?

I am trying to understand what I am supposed to learn about a problem when using dimensionality reduction methods. In particular, I am referring to methods like t-SNE and UMAP. For the most part I am told that I should be using these methods to…
0
votes
2 answers

Why is UMAP used in combination with other clustering algorithms?

I've noticed that UMAP is often used in combination with other clustering algorithms, such as K-means, DBSCAN, HDBSCAN. However, from what I've understood, UMAP can be used for clustering tasks. So why do I see people using it primarily as a…
coelidonum
0
votes
1 answer

Mel vs. linear spectrograms for bioacoustics machine learning

I don't have a background in bioacoustics but am working on a data science project in bioacoustics. I am working with animal vocalizations recorded at a sampling rate of 250000. The animals are bats, which are known to produce high-frequency sounds. In…