Determine the most stable (optimal) number of clusters via the Progeny Clustering algorithm.

The is_pclust() function checks whether an object is of class pclust. See inherits().

Usage

progeny_cluster(
  data,
  clust_iter = 2:10L,
  reps = 10L,
  verbose = interactive(),
  ...
)

is_pclust(x)

# S3 method for class 'pclust'
plot(x, ...)

# S3 method for class 'pclust'
print(x, ...)

Arguments

data

An (n x p) data matrix containing n samples and p features. Can also be a data frame in which each row corresponds to a sample (observation) and each column to a feature (variable).

clust_iter

integer(n). The span of cluster numbers (k) to interrogate.

reps

integer(1). The number of repeat iterations to perform. Particularly useful if error bars during plotting are desired.

verbose

logical(1). Print the progress of the clustering repeats to the console. Defaults to interactive().

...

Additional parameters passed to the internal progeny_k(), typically iter = and size =. For the plot() method, arguments passed to the corresponding graphics device.

x

A pclust class object (or an object to be tested for one).
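
The argument types above can be illustrated with base R alone. This is a minimal sketch, not part of the package: the names `x_mat` and `k_span` are hypothetical, and it only prepares inputs of the documented shapes without calling progeny_cluster().

```r
# Hypothetical input preparation using only base R and the built-in iris data.
# `x_mat` and `k_span` are illustrative names, not part of the package.
x_mat  <- as.matrix(iris[, -5L])  # drop the factor column: n = 150 samples, p = 4 features
k_span <- 2:10L                   # `:` already yields an integer vector, same as 2:10

stopifnot(is.numeric(x_mat), is.integer(k_span))
dim(x_mat)      # rows are samples, columns are features
length(k_span)  # number of k values that would be interrogated
```

Objects shaped like these could then be supplied as `data` and `clust_iter`.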

Value

A pclust class object, a list containing:

scores

A matrix of stability scores, with one row per repeat iteration and one column per cluster k

mean_scores

The mean stability scores for each cluster k

ci95_scores

The 95% confidence interval (2.5% and 97.5% bounds) of the stability scores for each cluster k

random_scores

The reference (random) scores for each iteration at each clustering level (k)

mean_random_scores

The mean of the reference (random) data set, i.e. column means of random_scores

D_max

The distance between the mean stability scores and the mean reference scores for each cluster k

D_gap

The "gap" distance metric, based on differences between neighboring values of k. See the original paper (References) for details.

clust_iter

The integer sequence of k clusters interrogated

repeats

The number of repeat iterations performed

iter

The number of progeny iterations performed

size

The progeny size used in each iteration

call

The call made to progeny_cluster()
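
To make the relationship between these components concrete, here is a minimal sketch on toy matrices. Only the definition of mean_random_scores as the column means of random_scores comes from this page. The use of quantile() for the confidence bounds and a signed difference for D_max are assumptions made for illustration and may not match the package internals. All object names and values are hypothetical.

```r
# Toy stability scores: 3 repeats (rows) x 3 values of k (columns). Values are made up.
k_names <- paste0("k=", 2:4)
scores <- matrix(c(3, 4, 5,  20, 21, 22,  12, 13, 14),
                 nrow = 3, dimnames = list(NULL, k_names))
random_scores <- matrix(c(5, 6, 7,  7, 8, 9,  12, 13, 14),
                        nrow = 3, dimnames = list(NULL, k_names))

mean_scores        <- colMeans(scores)         # mean stability score per k
mean_random_scores <- colMeans(random_scores)  # column means, as documented above

# Assumption: empirical 2.5% and 97.5% quantiles of each column
ci95_scores <- apply(scores, 2, quantile, probs = c(0.025, 0.975))

# Assumption: signed difference between observed and reference means
d_max <- mean_scores - mean_random_scores

names(which.max(d_max))  # the k with the largest distance in this toy example
```

On a real pclust object, the corresponding list elements can be inspected directly rather than recomputed.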

is_pclust() returns a single logical value (TRUE or FALSE).
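
Because the class test defers to inherits(), its behavior can be sketched in one line. The function `is_pclust_sketch()` and the mock object below are hypothetical stand-ins for illustration; the actual implementation in the package may differ.

```r
# Hypothetical re-implementation of the class test; not the package's own code.
is_pclust_sketch <- function(x) inherits(x, "pclust")

# A mock object that only carries the class attribute, for demonstration.
mock <- structure(list(scores = matrix(0, 1, 1)), class = c("pclust", "list"))

is_pclust_sketch(mock)    # TRUE: "pclust" is among the object's classes
is_pclust_sketch(list())  # FALSE: a plain list has no "pclust" class
```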

References

Hu, C.W., Kornblau, S.M., Slater, J.H. and A.A. Qutub (2015). Progeny Clustering: A Method to Identify Biological Phenotypes. Scientific Reports, 5:12894. http://www.nature.com/articles/srep12894

Author

Stu Field

Examples

# `iter=` and `size=` are passed to `progeny_k()`
pclust <- withr::with_seed(1234,
  progeny_cluster(progeny_data, clust_iter = 2:9L, iter = 20L, size = 6)
)
pclust
#> ══ Progeny Cluster Object ═════════════════════════════════════════════
#>    Call                     progeny_cluster(data = progeny_data, clust_iter = 2:9L, iter = 20L, size = 6)
#>    Progeny Size             6
#>    No. of Iterations        20
#>    K Iterations             2 3 4 5 6 7 8 9
#> 
#> ── Mean & CI95 Stability Scores ───────────────────────────────────────
#>        k=2  k=3   k=4  k=5  k=6  k=7  k=8 k=9*
#> 2.5%  2.66 15.4  8.24 12.8 10.5 16.4 14.7 19.6
#>       3.63 21.0 12.70 15.5 13.5 18.2 20.0 24.6
#> 97.5% 5.49 33.1 15.89 21.2 16.7 20.4 23.1 29.8
#> 
#> ── Maximum Distance Scores ────────────────────────────────────────────
#>    k=2   k=3*    k=4    k=5    k=6    k=7    k=8    k=9 
#> -2.073 13.031 -0.526  3.361 -3.404 -2.778 -7.524 -9.303 
#> 
#> ── Gap Distance Scores ────────────────────────────────────────────────
#>    k=2   k=3*    k=4    k=5    k=6    k=7    k=8    k=9 
#> -25.76  25.76 -11.17   4.89  -6.82   3.05  -2.93   2.93 
#> ═══════════════════════════════════════════════════════════════════════

# Test progeny clustering on iris data set
# Doesn't work quite as well as on the simulated data set
clust_iris <- withr::with_seed(99,
  progeny_cluster(iris[, -5L], clust_iter = 2:5L, size = 6L, iter = 50)
)
#> Warning: did not converge in 20 iterations
#> Warning: did not converge in 20 iterations
clust_iris    # true n clusters = 3
#> ══ Progeny Cluster Object ═════════════════════════════════════════════
#>    Call                     progeny_cluster(data = iris[, -5L], clust_iter = 2:5L, size = 6L, iter = 50)
#>    Progeny Size             6
#>    No. of Iterations        50
#>    K Iterations             2 3 4 5
#> 
#> ── Mean & CI95 Stability Scores ───────────────────────────────────────
#>       k=2*   k=3  k=4  k=5
#> 2.5%   299  51.2 21.9 41.0
#>        299  82.9 27.6 51.4
#> 97.5%  299 194.9 39.4 72.8
#> 
#> ── Maximum Distance Scores ────────────────────────────────────────────
#>  k=2*   k=3   k=4   k=5 
#> 292.4  74.5  17.9  39.5 
#> 
#> ── Gap Distance Scores ────────────────────────────────────────────────
#> k=2*  k=3  k=4  k=5 
#>  176 -176  -79   79 
#> ═══════════════════════════════════════════════════════════════════════

# Test for class `pclust`
is_pclust(pclust)
#> [1] TRUE

# S3 plot method
plot(pclust)


plot(clust_iris)