Multivariate classification & analysis
Classification
Classification Society of North America (CSNA)
Metasite with links to classification meetings, journals,
discussion groups, commercial and on-line software.
CLUSFIND: DAISY, PAM, CLARA, FANNY, AGNES, DIANA & MONA
Collection of multivariate clustering techniques implemented in
the core R package. DAISY computes
dissimilarities between objects with different types of
variables. Partitioning Around Medoids (PAM) partitions the
dataset using the k-medoid method, which is robust against outliers.
Clustering Large Applications (CLARA) partitions large data sets.
Fuzzy Analysis (FANNY) gives a fuzzy partitioning. Agglomerative
clustering (AGNES) and divisive clustering (DIANA) give hierarchical
structures. Monothetic Analysis (MONA) uses binary variables.
This site gives stand-alone Fortran implementations. From the
book Finding Groups in Data: An
Introduction to Cluster Analysis by L. Kaufman and P. J.
Rousseeuw (1987).
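As a rough illustration of the k-medoid idea behind PAM, the sketch below picks the k observations (medoids) that minimize the total dissimilarity of every object to its nearest medoid. It is not the CLUSFIND Fortran code, and it does not use PAM's build-and-swap heuristic: it searches all candidate medoid sets exhaustively, which is only practical for tiny datasets. The function names, the Manhattan dissimilarity, and the sample data are assumptions made for the example.

```python
from itertools import combinations

def manhattan(a, b):
    """City-block dissimilarity between two points (an assumed choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def total_cost(points, medoid_idx, dist):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoid_idx) for p in points)

def k_medoids_exhaustive(points, k, dist=manhattan):
    """Return the indices of the k medoids with the lowest total cost.

    Exhaustive search over all k-subsets: a sketch, not PAM itself.
    """
    best = min(combinations(range(len(points)), k),
               key=lambda idx: total_cost(points, idx, dist))
    return list(best)

# Two tight groups plus one isolated point at (12, 12).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),
        (12.0, 12.0)]
medoids = k_medoids_exhaustive(data, 2)
print([data[i] for i in medoids])
```

Because medoids must be actual observations, the isolated point adds to the cost but cannot pull a cluster center toward itself the way it would shift a mean.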
Normal mixture models
Several codes are available that classify and characterize multivariate
datasets as mixtures of Gaussian populations via likelihood methods,
often using the EM Algorithm and Bayesian principles. Snob uses the
minimum message length method of machine learning.
EMMIX by
G. McLachlan of University of Queensland
MCLUST by
C. Fraley and A. Raftery of University of Washington
AutoClass C by P. Cheeseman of NASA's Ames Research Center
Snob by
D. Dowe of Monash University
FastEM by the Auton Lab (CMU) and the PiCA Collaboration
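To make the likelihood approach shared by these codes concrete, here is a minimal sketch of the EM Algorithm for a two-component Gaussian mixture in one dimension. It is not taken from EMMIX, MCLUST, AutoClass, Snob, or FastEM; the function names, starting values, variance floor, and data are all assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_two_gaussians(xs, mu, var, weight=0.5, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture by EM.

    mu, var: starting means and variances (two values each);
    weight: starting mixing proportion of component 0.
    """
    mu, var = list(mu), list(var)
    n = len(xs)
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp = []
        for x in xs:
            p0 = weight * normal_pdf(x, mu[0], var[0])
            p1 = (1.0 - weight) * normal_pdf(x, mu[1], var[1])
            resp.append(p0 / (p0 + p1))
        # M-step: responsibility-weighted means, variances, and proportion.
        n0 = sum(resp)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, xs)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n1
        var[0] = max(sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / n0, min_var)
        var[1] = max(sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / n1, min_var)
        weight = n0 / n
    return mu, var, weight

xs = [-0.3, -0.1, 0.0, 0.1, 0.3, 9.7, 9.9, 10.0, 10.1, 10.3]
mu, var, weight = em_two_gaussians(xs, mu=[1.0, 8.0], var=[1.0, 1.0])
print(mu, var, weight)
```

The variance floor `min_var` is a simple guard against a component collapsing onto a single point; production codes handle this and the choice of the number of components far more carefully.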
Weka Knowledge Explorer
Machine learning algorithms for data mining including
multivariate classifiers, decision trees, neural nets, GUIs, resampling
and more. Written in Java.
Machine Learning Library in C++ (MLC++)
Data mining and multivariate classification package including
data manipulation, a variety of categorizers (on attributes, thresholds,
nearest neighbor, perceptron, decision tree), induction algorithms,
and visualization tools of data and trees. From Silicon Graphics
Inc.
GRB Tool Shed
Interactive environment for the analysis of astronomical
gamma-ray bursts from NASA's BATSE experiment. Emphasizes
multivariate classification including supervised decision trees,
K* nearest neighbor, Naive Bayes, normal mixtures using the EM
Algorithm, K means, COBWEB, backpropagation neural networks, and
Kohonen networks. Based on the Weka machine learning
package. By Jon Hakkila (College of Charleston) and colleagues.
Feasible solution algorithms
Algorithms for the common high breakdown estimation criteria, and
to find the minimum volume ellipsoid in multivariate datasets. By D.
Hawkins, University of Minnesota, and distributed by Statlib.
Oblique classifier 1 (OC1)
Partitioning of multivariate datasets using oblique and
axis-parallel hyperplanes. Written in C by S. Salzberg of Johns Hopkins
University.
Software for clustering and multivariate analysis
Metasite with descriptions of on-line programs and
packages. From Fionn Murtagh (Univ. London)
Dysect
Clustering algorithm based on dynamic altering of hierarchies.
Fast Algorithm for Classification Trees
Tree-structured classification similar to CART.
Cluster
Library of several dozen subroutines from NIST for multivariate
clustering algorithms from the 1975 monograph by J. A. Hartigan.
Cluster analysis
Six programs computing dissimilarities, partitioning using
medoids, k-medoid clustering, fuzzy clustering, agglomerative and
divisive hierarchical clustering, clustering of binary data.
CLUSBAS
Average-linkage hierarchical clustering.
Hierarchical clustering
Algorithm for agglomerative clustering using various criteria
(Ward's minimum variance, single linkage, average linkage, complete
linkage, McQuitty's method, median method, centroid method).
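The agglomerative criteria listed above differ only in how the distance between two clusters is defined. The sketch below is a naive illustration covering three of them (single, complete, and average linkage); it is not the listed algorithm, and the function names and data are assumptions for the example.

```python
import math

def agglomerate(points, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain.

    Naive implementation (recomputes all inter-cluster distances at
    every step), intended only to show how the criteria differ.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(a, b):
        ds = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # closest pair of members
            return min(ds)
        if linkage == "complete":    # farthest pair of members
            return max(ds)
        if linkage == "average":     # mean over all member pairs
            return sum(ds) / len(ds)
        raise ValueError("unknown linkage: " + linkage)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        a, b = min(pairs, key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (30.0,)]
result = agglomerate(pts, 3, linkage="single")
print(result)
```

Recording the merge distance at each step, rather than stopping at a fixed number of clusters, would give the full dendrogram.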
Hierarchical clustering
Algorithm for single-linkage and minimum intra-cluster variance
clustering. Applied Statistics algorithm #58.
k-means clustering
k-means clustering minimizing intra-cluster variance.
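A minimal sketch of k-means as an iterative reduction of intra-cluster variance follows. It uses the common Lloyd-style alternation of assignment and centroid updates, which is an assumption: the listed code may use a different update scheme. Function names, starting centers, and data are illustrative only.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, centers, n_iter=100):
    """Alternate nearest-center assignment and centroid update."""
    centers = [tuple(c) for c in centers]
    groups = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

def within_ss(centers, groups):
    """Total within-cluster sum of squares, the quantity being reduced."""
    return sum(sq_dist(p, c) for c, g in zip(centers, groups) for p in g)

pts = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
       (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
centers, groups = k_means(pts, centers=[(0.0, 0.0), (12.0, 12.0)])
print(centers, within_ss(centers, groups))
```

The result depends on the starting centers, so practical codes usually run several starts and keep the partition with the smallest within-cluster sum of squares.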
Multivariate analysis
R Package
Package in Pascal developed for ecological spatio-temporal
multivariate datasets based on monograph by L. & P. Legendre
(1983). Functionalities include autocorrelation using correlograms
(Moran's I and Geary's c indices), hierarchical agglomerative
clustering, k-means clustering, chronological clustering for
multivariate time series, analysis of variance, geometrical connectors
(nearest neighbor, Gabriel's connection, Delaunay triangulation),
Mantel's two-sample statistic, multidimensional scaling by principal
coordinates analysis, univariate periodogram. [This package
should not be confused with the enormous R statistical package modeled
after S-Plus.]
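For readers unfamiliar with the two spatial autocorrelation indices named in this entry, the sketch below computes Moran's I and Geary's c from their standard textbook definitions for a set of values and a spatial weight matrix. It is not code from the Pascal package; the function names, the binary neighbor weights, and the four-site example are assumptions.

```python
def moran_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i is the deviation from the mean and W the sum of weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)

def geary_c(values, weights):
    """Geary's c: ((n - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    w_total = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2.0 * w_total)) * sq_diff / sum((v - mean) ** 2 for v in values)

# Four sites along a line; w_ij = 1 for adjacent sites, 0 otherwise.
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(moran_i(x, w), geary_c(x, w))
```

For this smooth gradient both indices signal positive autocorrelation: Moran's I is above its no-autocorrelation expectation of -1/(n - 1), and Geary's c is below 1. A correlogram repeats the calculation with weight matrices defined for successive distance classes.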
ADE-4
Large multivariate analysis and graphical display package
designed for ecologists and geographers. Includes principal components
analysis with instrumental variables, correspondence analysis,
coinertia analysis, contingency tables, discriminant analysis, fuzzy
correspondence analysis, Rao's diversity coefficient, Moran's I and
Geary's c randomization tests for spatial autocorrelation, Wartenberg's
multivariate spatial
correlation analysis, partial triadic analysis of k-tables. From
the bioinformatic group at Universite de Lyon for Macintosh
and Windows 95 platforms.
Fast Minimum Covariance Determinant (MCD)
This is a highly robust estimator of multivariate location and
scatter based on the subset of points whose covariance matrix has the
lowest determinant. Efficient method for large datasets. By
P. Rousseeuw and K. Van Driessen of University of Antwerp.
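The MCD criterion itself can be illustrated with a brute-force sketch for bivariate data: among all subsets of h points, keep the one whose sample covariance matrix has the smallest determinant, and use its mean and covariance as the robust estimates. This is deliberately not the Fast MCD algorithm, whose purpose is to avoid exactly this exhaustive search; the function names, the choice h = 6, and the data are assumptions for the example.

```python
from itertools import combinations

def mean_and_cov(pts):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mcd_exhaustive(points, h):
    """Return (determinant, subset indices, location, covariance) for the
    h-subset with the smallest covariance determinant.

    Exhaustive search: feasible only for very small datasets.
    """
    best = None
    for idx in combinations(range(len(points)), h):
        loc, (sxx, sxy, syy) = mean_and_cov([points[i] for i in idx])
        det = sxx * syy - sxy ** 2
        if best is None or det < best[0]:
            best = (det, idx, loc, (sxx, sxy, syy))
    return best

# Six well-behaved points followed by two gross outliers.
data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
        (0.5, 0.5), (0.5, 0.2), (10.0, 10.0), (-9.0, 12.0)]
det, subset, location, cov = mcd_exhaustive(data, h=6)
print(subset, location)
```

The selected subset excludes both outliers, so the location estimate stays near the bulk of the data, whereas the ordinary mean of all eight points would be pulled well away from it.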
Minimum Volume Ellipsoid (MINVOL)
Computes highly robust location and scatter matrix. By P.
Rousseeuw of University of Antwerp.
Multivariate data analysis software
Collection of subroutines for principal components analysis,
partitioning, hierarchical clustering, discriminant analyses (linear,
multiple, k-nearest neighbors), correspondence analysis,
multidimensional scaling, Sammon mapping, Kohonen self-organizing
feature map. From Fionn Murtagh (Univ. London).
MicrOsiris
Self-contained data management and analysis system well-adapted
to very large multivariate datasets. Includes fast searches
and data mining, ANOVA, linear modeling, clustering, life table
analysis. For Windows.
IPP
Interactive Projection Pursuit, providing 1- and 2-dimensional
projections of multivariate data for interactive discovery of
structure. The user chooses and graphically investigates interesting
projections. From Case Western Reserve University. C and Fortran
algorithms installed as a library for S-Plus.
Projection pursuit
Two-dimensional exploratory projection pursuit.
Multivariate skewness and kurtosis
Probabilities of R²
Distribution function of the squared multiple correlation
coefficient.
Linear dependency analysis for multivariate data
Blah
Multivariate linear regression by least median of squares.
Minimum volume ellipsoid estimator
Robust estimator of multivariate location and dispersion.
Hypo
Hypothesis testing for means and spreads for multivariate
Gaussian data.