Clustering4Ever alternatives and similar packages
Based on the "Science and Data Analysis" category.
MLLib
Apache Spark - A unified analytics engine for large-scale data processing -
PredictionIO
PredictionIO, a machine learning server for developers and ML engineers. -
Zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. -
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark. -
Spire
Powerful new number types and numeric abstractions for Scala. -
Tensorflow_scala
TensorFlow API for the Scala Programming Language -
Squants
The Scala API for Quantities, Units of Measure and Dimensional Analysis -
FACTORIE
FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference. -
ND4S
ND4S: N-Dimensional Arrays for Scala. Scientific Computing a la Numpy. Based on ND4J. -
Compute.scala
Scientific computing with N-dimensional arrays -
OpenMOLE
Workflow engine for exploration of simulation models using high throughput computing -
Optimus
Optimus is a mathematical programming library for Scala. -
rscala
The Scala interpreter is embedded in R and callbacks to R from the embedded interpreter are supported. Conversely, the R interpreter is embedded in Scala. -
LoMRF
LoMRF is an open-source implementation of Markov Logic Networks -
MGO
Purely functional genetic algorithms for multi-objective optimisation -
Synapses
A group of neural-network libraries for functional and mainstream languages -
Axle
Axle Domain Specific Language for Scientific Cloud Computing and Visualization
README
Clustering :four: Ever

Welcome to Clustering:four:Ever, a Big Data Clustering Library gathering clustering, unsupervised algorithms, and quality indices. Don't hesitate to check our Wiki, ask questions or make recommendations in our Gitter.
API documentation
Include it in your project
Add the following line to libraryDependencies in your build.sbt:
"org.clustering4ever" % "clustering4ever_2.11" % "0.11.0"
If the artifact cannot be resolved, add one of these resolvers:
resolvers += Resolver.bintrayRepo("clustering4ever", "C4E")
resolvers += "mvnrepository" at "http://mvnrepository.com/artifact/"
You can also pull in specific parts (Core, ScalaClustering, ...) from Bintray or Maven.
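Put together, the steps above could look like the following build.sbt. This is a minimal sketch: the project name and the Scala version are assumptions chosen to match the `_2.11` artifact suffix, and only the dependency and resolver lines come from this README.

```scala
// build.sbt: a minimal sketch; `name` and `scalaVersion` are assumptions
name := "c4e-demo"
scalaVersion := "2.11.12"

// Dependency line taken from the README
libraryDependencies += "org.clustering4ever" % "clustering4ever_2.11" % "0.11.0"

// Only needed if the artifact is not found in the default repositories
resolvers += Resolver.bintrayRepo("clustering4ever", "C4E")
```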
Available algorithms
- Emphasized algorithms are implemented in Scala.
- Bold algorithms are implemented in Spark.
- Some algorithms are available in both versions.
Clustering algorithms
- Jenks Natural Breaks
- Epsilon Proximity *
  - Scalar Epsilon Proximity *, Binary Epsilon Proximity *, Mixed Epsilon Proximity *, Any Object Epsilon Proximity *
- K-Centers *
  - K-Means *, K-Modes *, K-Prototypes *, Any Object K-Centers *
- Gaussian Mixture
- Self Organizing Maps (Original project)
- G-Stream (Original project)
- PatchWork (Original project)
- Random Local Area *
- OPTICS *
- Clusterwize
- Tensor Biclustering algorithms (Original project)
  - Folding-Spectral, Unfolding-Spectral, Thresholding Sum Of Squared Trajectory Length, Thresholding Individuals Trajectory Length, Recursive Biclustering, Multiple Biclustering
- Ant-Tree *
  - Continuous Ant-Tree, Binary Ant-Tree, Mixed Ant-Tree
- DC-DPM (Original project) - Distributed Clustering based on Dirichlet Process Mixture
- SG2Stream
Algorithms followed by a * can be executed by the benchmarking classes.
Preprocessing
- UMAP
- Gradient Ascent (Mean-Shift related)
  - Scalar Gradient Ascent, Binary Gradient Ascent, Mixed Gradient Ascent, Any Object Gradient Ascent
- Rough Set Features Selection
Quality Indices
You can compute quality measures manually with the dedicated classes for local or distributed collections. The helpers ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed let you test indices on multiple clusterings at once.
- Internal Indices
  - Davies Bouldin
  - Ball Hall
- External Indices
  - Multiple Classification
    - Mutual Information, Normalized Mutual Information
    - Purity
    - Accuracy, Precision, Recall, fBeta, f1, RAND, ARAND, Matthews correlation coefficient, CzekanowskiDice, RogersTanimoto, FolkesMallows, Jaccard, Kulcztnski, McNemar, RusselRao, SokalSneath1, SokalSneath2
  - Binary Classification
    - Accuracy, Precision, Recall, fBeta, f1
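To illustrate what an external index measures, here is a minimal sketch of Purity in plain Scala. It does not use the Clustering4Ever classes; the object name `PuritySketch` and the function signature are assumptions made for this example only.

```scala
object PuritySketch {
  // Purity: for each predicted cluster, take the size of its most frequent
  // true label, sum these sizes over all clusters, and divide by the number
  // of points. 1.0 means every cluster contains a single true class.
  def purity(truth: Seq[Int], predicted: Seq[Int]): Double = {
    require(truth.nonEmpty && truth.length == predicted.length,
      "inputs must be non-empty and have the same length")
    val majoritySum = truth.zip(predicted)
      .groupBy { case (_, cluster) => cluster }   // points per predicted cluster
      .values
      .map { members =>
        // size of the dominant true label inside this cluster
        members.groupBy { case (label, _) => label }.values.map(_.size).max
      }
      .sum
    majoritySum.toDouble / truth.length
  }
}
```

For example, with true labels `Seq(0, 0, 0, 1, 1, 1)` and predicted clusters `Seq(0, 0, 1, 1, 1, 1)`, the dominant labels cover 2 + 3 = 5 of the 6 points, so the purity is 5/6.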
Clustering benchmarking and analysis
Using the classes ClusteringChainingLocal, BigDataClusteringChaining, DistributedClusteringChaining, and the ChainingOneAlgorithm descendants, you can run multiple clustering algorithms, respectively: locally and in parallel; sequentially in a distributed way; in parallel on a distributed system; and locally and in parallel. You can also generate many vectorizations of the data while keeping track of information on each clustering, including the vectorization used, the clustering model, the clustering number, and the clustering arguments.
The classes ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed are devoted to clustering indices analysis.
The classes ClustersAnalysisLocal and ClustersAnalysisDistributed describe the obtained clusterings in terms of distributions, proportions of categorical features...
Coming soon (developed by our team)
- DESOM: Deep Embedded Self-Organizing Map: Joint Representation Learning and Self-Organization
- SOM: Kohonen self-organizing map
- SOMperf: SOM performance metrics and quality indices
- skstab is a module for clustering stability analysis in Python with a scikit-learn compatible API
- FunCLBM: Functional Conditional Latent Block Model
- Spark Time Series Set data analysis
- UMAP
- Gaussian Mixture
- DBScan
- Bayesian Optimization for AutoML
Citation
If you publish material based on information obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this community work. This will help others to obtain the same information and replicate your experiments, because having results is cool but being able to compare to others is better.
Citation: @misc{C4E, url = "https://github.com/Clustering4Ever/Clustering4Ever", institution = "Paris 13 University, LIPN UMR CNRS 7030"}
C4E-Notebooks examples
Basic usage of the implemented algorithms is demonstrated with BeakerX and Jupyter notebooks through Binder.
They can also be downloaded directly from our Notebooks repository in different formats, such as Jupyter or SparkNotebook.
Miscellaneous
Helper functions to generate Clusterizable collections
You can easily generate your collections with basic Clusterizable using the helpers in org.clustering4ever.util.{ArrayAndSeqTowardGVectorImplicit, ScalaCollectionImplicits, SparkImplicits}, or explore Clusterizable and EasyClusterizable for more advanced usage.
References
Which data structures are recommended for best performance
ArrayBuffer or ParArray are recommended as vector containers for local applications; if the data is larger, don't hesitate to switch to RDD.
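As a rough illustration of the local case, the sketch below stores vectors in an ArrayBuffer and computes their centroid. It uses only the Scala standard library, not the Clustering4Ever API; the object name `ContainerSketch`, the `centroid` helper, and the choice of `Array[Double]` as the vector type are assumptions for this example.

```scala
import scala.collection.mutable.ArrayBuffer

object ContainerSketch {
  // Each vector is an Array[Double]; the ArrayBuffer gives constant-time
  // indexed access and amortized constant-time appends.
  def centroid(vectors: ArrayBuffer[Array[Double]]): Array[Double] = {
    require(vectors.nonEmpty, "need at least one vector")
    val dim = vectors.head.length
    val sums = new Array[Double](dim)
    vectors.foreach { v =>
      require(v.length == dim, "all vectors must have the same dimension")
      var i = 0
      while (i < dim) { sums(i) += v(i); i += 1 }
    }
    // Divide each accumulated coordinate by the number of vectors
    sums.map(_ / vectors.length)
  }
}
```

For example, `ContainerSketch.centroid(ArrayBuffer(Array(1.0, 2.0), Array(3.0, 4.0)))` yields the vector (2.0, 3.0). Parallel collections are deliberately left out of the sketch so that it compiles regardless of the Scala version in use.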