
Clustering

Let's continue by clustering the sites according to the distances calculated in the previous exercise.

  1. Cluster the sites based on their community dissimilarities. Successively compute 6 clusterings by combining each of two distances (Hellinger and Bray-Curtis) with each of three clustering algorithms:
    1. single linkage (nearest neighbour)
    2. complete linkage (farthest neighbour)
    3. average linkage (UPGMA)
  2. Plot the 6 dendrograms.
  3. To assess how well each clustering preserves the original distances, calculate the correlation between the cophenetic and the original distances and plot them against each other. Which clustering best preserves the original distances?
  4. Since single linkage clustering leads to strong chaining of sites, which prevents an easy interpretation of the clusters, we will stick to the results of the other two algorithms: complete linkage and UPGMA. For each of these clusterings, find the number of clusters needed to obtain approximately 4 reasonable clusters, i.e. find the distance level at which about 4 clusters form. Ignore singleton clusters that contain just one sample.
  5. Identify which samples belong to which clusters using cutree().
  6. Use table() to compare the groupings of sites.
  7. Which groupings are the most similar and which are the most different?
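Steps 1 and 3 can be sketched in plain Python (standard library only) instead of the R the exercise assumes. This is a naive sketch for small tables. The Hellinger distance is taken as the Euclidean distance between square-rooted relative abundances, and Bray-Curtis as summed absolute differences over summed abundances. Cluster-to-cluster distances are recomputed from all member-to-member distances at every step, ties go to the first pair found, and every site is assumed to have at least one non-zero abundance. The site table is hypothetical. In R the same work is normally done with `hclust()` and `cophenetic()`.

```python
import math
from itertools import combinations


def hellinger(x, y):
    # Euclidean distance between square roots of relative abundances
    sx, sy = sum(x), sum(y)
    return math.sqrt(sum((math.sqrt(a / sx) - math.sqrt(b / sy)) ** 2
                         for a, b in zip(x, y)))


def bray_curtis(x, y):
    # summed absolute differences divided by the summed abundances of both sites
    return (sum(abs(a - b) for a, b in zip(x, y))
            / sum(a + b for a, b in zip(x, y)))


def distance_matrix(sites, dist):
    # symmetric n x n matrix of pairwise distances between sites (rows)
    n = len(sites)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = dist(sites[i], sites[j])
    return d


# distance between two clusters, computed from all member-to-member distances
LINKAGE = {
    "single": min,                            # nearest neighbour
    "complete": max,                          # farthest neighbour
    "average": lambda ds: sum(ds) / len(ds),  # UPGMA
}


def cluster(d, method):
    """Naive agglomerative clustering; returns (merges, cophenetic matrix)."""
    link = LINKAGE[method]
    n = len(d)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    while len(clusters) > 1:
        # find the pair of clusters with the smallest linkage distance
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            h = link([d[i][j] for i in clusters[a] for j in clusters[b]])
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        # every pair of sites first joined by this merge gets cophenetic distance h
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = h
        merges.append((sorted(clusters[a]), sorted(clusters[b]), h))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges, coph


def cophenetic_correlation(d, coph):
    # Pearson correlation between original and cophenetic distances (upper triangle)
    pairs = list(combinations(range(len(d)), 2))
    x = [d[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# hypothetical site-by-species abundance table (rows = sites, columns = species)
sites = [[10, 0, 5, 1], [8, 1, 6, 0], [0, 12, 1, 7], [1, 9, 0, 8], [4, 4, 4, 4]]
for name, dist in [("hellinger", hellinger), ("bray-curtis", bray_curtis)]:
    d = distance_matrix(sites, dist)
    for method in LINKAGE:
        merges, coph = cluster(d, method)
        print(name, method, round(cophenetic_correlation(d, coph), 3))
```

The loop prints one cophenetic correlation per distance and linkage combination, which is the quantity step 3 asks you to compare. Plotting `x` against `y` from `cophenetic_correlation()` gives the corresponding scatter plot.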
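Steps 4 to 6 can be sketched in the same plain-Python style (standard library only). The merge-history format, a list of `(members_a, members_b, height)` tuples in increasing height order, is a hypothetical simplification (R's `hclust()` stores its merge matrix differently), and both example trees are invented for illustration. `cut_tree()` plays the role of `cutree()`, `cut_height_range()` gives the distance interval in which a cut leaves k groups, and `cross_table()` is a minimal analogue of `table()` applied to two groupings.

```python
import math
from collections import Counter


def cut_tree(n, merges, k):
    """Assign n samples to k groups by replaying the first n - k merges."""
    # merges: list of (members_a, members_b, height), ordered by increasing height
    group = list(range(n))  # every sample starts in its own group
    for members_a, members_b, _ in merges[: n - k]:
        target = group[members_a[0]]
        for i in members_b:
            group[i] = target
    # relabel groups as 1..k in order of first appearance among the samples
    labels = {}
    return [labels.setdefault(g, len(labels) + 1) for g in group]


def cut_height_range(n, merges, k):
    """Heights (lower, upper); a cut with lower < h < upper leaves k groups."""
    lower = merges[n - k - 1][2] if k < n else 0.0
    upper = merges[n - k][2] if k > 1 else math.inf
    return lower, upper


def cross_table(labels_a, labels_b):
    """Contingency table of two groupings (rows: labels_a, columns: labels_b)."""
    counts = Counter(zip(labels_a, labels_b))
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    return [[counts[(r, c)] for c in cols] for r in rows]


# hypothetical merge histories of 6 samples under two linkage methods
complete = [([0], [1], 0.2), ([2], [3], 0.25), ([4], [5], 0.3),
            ([0, 1], [2, 3], 0.6), ([0, 1, 2, 3], [4, 5], 0.9)]
upgma = [([0], [1], 0.2), ([2], [3], 0.25), ([4], [5], 0.3),
         ([2, 3], [4, 5], 0.5), ([0, 1], [2, 3, 4, 5], 0.7)]

g_complete = cut_tree(6, complete, 2)
g_upgma = cut_tree(6, upgma, 2)
print(g_complete, Counter(g_complete))  # group sizes reveal singleton clusters
print(cut_height_range(6, complete, 2))
print(cross_table(g_complete, g_upgma))  # → [[2, 2], [0, 2]]
```

Because cluster numbers are arbitrary, two groupings agree when each row of the table has its counts concentrated in a single column, not necessarily on the diagonal. The group sizes from `Counter` make singleton clusters easy to spot and ignore.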
anadat/cs/exercises/cv2.txt · Last modified: 2017/04/15 12:04 by vitek