Popularity: 4.0 (Stable)

Activity: 0.0 (Stable)

Stars: 127

Watchers: 21

Forks: 13

Last commit: about 3 years ago

Programming language: Scala

License: Apache License 2.0

Tags: Science And Data Analysis

Latest version: v0.9.6

Clustering4Ever alternatives and similar packages

Based on the "Science and Data Analysis" category.

MLLib

10.0 10.0 Clustering4Ever VS MLLib

Apache Spark - A unified analytics engine for large-scale data processing
PredictionIO

9.9 0.0 Clustering4Ever VS PredictionIO

DISCONTINUED. PredictionIO, a machine learning server for developers and ML engineers.

Zeppelin

9.8 8.7 L2 Clustering4Ever VS Zeppelin

Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Smile

9.7 9.8 L2 Clustering4Ever VS Smile

Statistical Machine Intelligence & Learning Engine
BigDL

9.7 9.9 Clustering4Ever VS BigDL

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, ModelScope, etc.
Spark Notebook

9.5 0.0 L1 Clustering4Ever VS Spark Notebook

Interactive and Reactive Data Science using Scala and Spark.
Breeze

9.5 5.1 Clustering4Ever VS Breeze

Breeze is a numerical processing library for Scala.
Algebird

9.3 7.6 Clustering4Ever VS Algebird

Abstract Algebra for Scala
Spire

8.9 6.0 Clustering4Ever VS Spire

Powerful new number types and numeric abstractions for Scala.
Tensorflow_scala

8.0 0.0 Clustering4Ever VS Tensorflow_scala

TensorFlow API for the Scala Programming Language
Figaro

8.0 0.0 Clustering4Ever VS Figaro

Figaro Programming Language and Core Libraries
Squants

7.8 3.2 Clustering4Ever VS Squants

The Scala API for Quantities, Units of Measure and Dimensional Analysis
FACTORIE

7.5 0.0 Clustering4Ever VS FACTORIE

FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
Saddle

6.9 0.0 Clustering4Ever VS Saddle

DISCONTINUED. A minimalist port of Pandas to Scala
ND4S

6.1 0.0 Clustering4Ever VS ND4S

DISCONTINUED. ND4S: N-Dimensional Arrays for Scala. Scientific Computing a la Numpy. Based on ND4J.
Chalk

5.7 0.0 Clustering4Ever VS Chalk

DISCONTINUED. Chalk is a natural language processing library.
Compute.scala

4.8 0.0 Clustering4Ever VS Compute.scala

Scientific computing with N-dimensional arrays
Libra

4.5 0.0 Clustering4Ever VS Libra

A dimensional analysis library based on dependent types
OpenMOLE

4.4 9.4 Clustering4Ever VS OpenMOLE

Workflow engine for exploration of simulation models using high throughput computing
Numsca

4.4 2.7 Clustering4Ever VS Numsca

numsca is numpy for scala
Optimus * 96

4.0 0.0 Clustering4Ever VS Optimus * 96

Optimus is a mathematical programming library for Scala.
rscala

3.6 6.1 Clustering4Ever VS rscala

The Scala interpreter is embedded in R and callbacks to R from the embedded interpreter are supported. Conversely, the R interpreter is embedded in Scala.
LoMRF

3.2 0.0 Clustering4Ever VS LoMRF

LoMRF is an open-source implementation of Markov Logic Networks
Tyche

2.9 0.0 Clustering4Ever VS Tyche

Statistics utilities for the JVM - in Scala!
MGO

2.8 5.6 Clustering4Ever VS MGO

Purely functional genetic algorithms for multi-objective optimisation
Rings

2.7 3.5 Clustering4Ever VS Rings

Rings: efficient JVM library for polynomial rings
Synapses

2.5 0.0 Clustering4Ever VS Synapses

A group of neural-network libraries for functional and mainstream languages
Axle

2.4 5.5 Clustering4Ever VS Axle

Axle Domain Specific Language for Scientific Cloud Computing and Visualization
SwiftLearner

2.2 0.0 Clustering4Ever VS SwiftLearner

SwiftLearner: Scala machine learning library
Persist-Units

0.9 0.0 Clustering4Ever VS Persist-Units

Scala Units of Measure Types
OscaR

0.2 - Clustering4Ever VS OscaR

a Scala toolkit for solving Operations Research problems

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

Clustering4Ever

Welcome to Clustering4Ever, a big data clustering library that gathers clustering and other unsupervised algorithms together with quality indices. Don't hesitate to check our Wiki, or to ask questions and make recommendations on our Gitter.

API documentation

Include it in your project

Add the following line to your build.sbt:

libraryDependencies += "org.clustering4ever" % "clustering4ever_2.11" % "0.11.0"

If needed, add one of these resolvers:

resolvers += Resolver.bintrayRepo("clustering4ever", "C4E")
resolvers += "mvnrepository" at "http://mvnrepository.com/artifact/"

You can also pull specific modules (Core, ScalaClustering, ...) individually from Bintray or Maven.

Available algorithms

Emphasized algorithms are implemented in Scala.
Bold algorithms are implemented in Spark.
Some algorithms are available in both versions.

Clustering algorithms

Jenks Natural Breaks
Epsilon Proximity*
- Scalar Epsilon Proximity*, Binary Epsilon Proximity*, Mixed Epsilon Proximity*, Any Object Epsilon Proximity*
K-Centers*
- K-Means*, K-Modes*, K-Prototypes*, Any Object K-Centers*
Gaussian Mixture
Self Organizing Maps (Original project)
G-Stream (Original project)
PatchWork (Original project)
Random Local Area*
OPTICS*
Clusterwize
Tensor Biclustering algorithms (Original project)
- Folding-Spectral, Unfolding-Spectral, Thresholding Sum Of Squared Trajectory Length, Thresholding Individuals Trajectory Length, Recursive Biclustering, Multiple Biclustering
Ant-Tree*
- Continuous Ant-Tree, Binary Ant-Tree, Mixed Ant-Tree
DC-DPM (Original project) - Distributed Clustering based on Dirichlet Process Mixture
SG2Stream

Algorithms marked with a * can be run through the benchmarking classes.
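K-Means, the best-known member of the K-Centers family listed above, alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The following is a minimal sketch of that loop (Lloyd's algorithm) in plain Scala collections. It does not use Clustering4Ever's API: the names `KMeansSketch`, `fit` and `predict` are hypothetical, and the initial centers are passed in explicitly instead of being chosen by a seeding strategy.

```scala
// Hypothetical plain-Scala sketch of Lloyd's k-means; not the Clustering4Ever API.
object KMeansSketch {
  type Vec = Array[Double]

  // Squared Euclidean distance between two vectors of equal length.
  def sqDist(a: Vec, b: Vec): Double =
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum

  // Index of the center closest to point p.
  def nearest(p: Vec, centers: Seq[Vec]): Int =
    centers.indices.minBy(j => sqDist(p, centers(j)))

  // Component-wise mean of a non-empty group of vectors.
  def mean(group: Seq[Vec]): Vec = {
    val dim = group.head.length
    Array.tabulate(dim)(i => group.map(_(i)).sum / group.size)
  }

  // Assign points to their nearest center, recompute centers, stop when no center moves.
  def fit(data: Seq[Vec], initial: Seq[Vec], maxIter: Int = 100): Seq[Vec] = {
    var centers = initial
    var iter = 0
    var moved = true
    while (moved && iter < maxIter) {
      val groups = data.groupBy(p => nearest(p, centers))
      // An empty cluster keeps its previous center.
      val updated = centers.indices.map(j => groups.get(j).map(mean).getOrElse(centers(j)))
      moved = updated.zip(centers).exists { case (a, b) => sqDist(a, b) > 1e-12 }
      centers = updated
      iter += 1
    }
    centers
  }

  // Cluster label of a new point: the index of its nearest fitted center.
  def predict(p: Vec, centers: Seq[Vec]): Int = nearest(p, centers)
}
```

For example, with the points (0, 0), (0, 1), (10, 10), (10, 11) and the first and third points as starting centers, the loop settles on the centers (0, 0.5) and (10, 10.5) after two passes.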

Preprocessing

UMAP
Gradient Ascent (Mean-Shift related)
- Scalar Gradient Ascent, Binary Gradient Ascent, Mixed Gradient Ascent, Any Object Gradient Ascent
Rough Set Feature Selection
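Gradient ascent in the mean-shift sense moves every point uphill on a kernel density estimate until it reaches a local mode, so points from the same dense region end up close together before clustering. The sketch below covers only the scalar (real-valued vector) case and assumes a Gaussian kernel with a single bandwidth `h`; the object and method names are hypothetical, and the library's binary, mixed and Spark variants are not reproduced.

```scala
// Hypothetical mean-shift (gradient ascent) sketch with a Gaussian kernel; not the library API.
object MeanShiftSketch {
  type Vec = Array[Double]

  // Squared Euclidean distance.
  def sqDist(a: Vec, b: Vec): Double =
    a.indices.map { i => val d = a(i) - b(i); d * d }.sum

  // One step: the Gaussian-kernel-weighted mean of all data points around x.
  def step(x: Vec, data: Seq[Vec], h: Double): Vec = {
    val weights = data.map(p => math.exp(-sqDist(x, p) / (2 * h * h)))
    val total = weights.sum
    Array.tabulate(x.length) { i =>
      data.zip(weights).map { case (p, w) => w * p(i) }.sum / total
    }
  }

  // Repeat the step until the move is smaller than tol: the point has climbed to a mode.
  def ascend(x: Vec, data: Seq[Vec], h: Double, tol: Double = 1e-9, maxIter: Int = 500): Vec = {
    var current = x
    var iter = 0
    var done = false
    while (!done && iter < maxIter) {
      val next = step(current, data, h)
      done = sqDist(next, current) < tol * tol
      current = next
      iter += 1
    }
    current
  }

  // Preprocessing output: every point replaced by the mode it converges to.
  def shiftAll(data: Seq[Vec], h: Double): Seq[Vec] = data.map(p => ascend(p, data, h))
}
```

With the 1-D points 0, 0.2, 10 and 10.2 and `h = 1.0`, the first two points both move to about 0.1 and the last two to about 10.1, which makes the two groups trivial to separate afterwards.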

Quality Indices

You can compute quality measures manually with the dedicated classes for local or distributed collections. The helpers ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed let you evaluate indices on multiple clusterings at once.

Internal Indices
- Davies-Bouldin
- Ball-Hall
External Indices
- Multiple Classification
  - Mutual Information, Normalized Mutual Information
  - Purity
  - Accuracy, Precision, Recall, fBeta, f1, RAND, ARAND, Matthews correlation coefficient, CzekanowskiDice, RogersTanimoto, FowlkesMallows, Jaccard, Kulczynski, McNemar, RussellRao, SokalSneath1, SokalSneath2
- Binary Classification
  - Accuracy, Precision, Recall, fBeta, f1
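Two of the external indices listed above are easy to compute by hand from a predicted labeling and a ground-truth labeling: purity, where each cluster is credited with its majority class, and the Rand index, the share of point pairs on which both labelings agree. This is an illustrative plain-Scala sketch, not the library's index classes; the object and method names are hypothetical, and the Rand index uses a naive O(n^2) pair loop for readability.

```scala
// Hypothetical sketch of two external indices; not the Clustering4Ever index classes.
object ExternalIndicesSketch {
  // Purity: sum of each cluster's majority-class count, divided by the number of points.
  def purity(predicted: Seq[Int], truth: Seq[Int]): Double = {
    require(predicted.size == truth.size && predicted.nonEmpty)
    val majoritySum = predicted.zip(truth)
      .groupBy(_._1)                                              // pairs grouped by predicted cluster
      .values
      .map(pairs => pairs.groupBy(_._2).values.map(_.size).max)   // majority class size in that cluster
      .sum
    majoritySum.toDouble / predicted.size
  }

  // Rand index: fraction of pairs that are together in both labelings or apart in both.
  def randIndex(predicted: Seq[Int], truth: Seq[Int]): Double = {
    require(predicted.size == truth.size && predicted.size >= 2)
    val n = predicted.size
    val pairs = for { i <- 0 until n; j <- i + 1 until n } yield (i, j)
    val agreements = pairs.count { case (i, j) =>
      (predicted(i) == predicted(j)) == (truth(i) == truth(j))
    }
    agreements.toDouble / pairs.size
  }
}
```

With predicted labels 0, 0, 0, 1, 1, 1 and true labels 0, 0, 1, 1, 1, 1, purity is 5/6 and the Rand index is 10/15 = 2/3.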

Clustering benchmarking and analysis

The classes ClusteringChainingLocal, BigDataClusteringChaining, DistributedClusteringChaining, and the descendants of ChainingOneAlgorithm let you run multiple clustering algorithms, respectively: locally and in parallel; in a sequentially distributed way; in parallel on a distributed system; and locally and in parallel. They can also generate many vectorizations of the data while keeping track of each clustering's information, including the vectorization used, the clustering model, the clustering number, and the clustering arguments.

The classes ClustersIndicesAnalysisLocal and ClustersIndicesAnalysisDistributed are dedicated to the analysis of clustering indices.

The classes ClustersAnalysisLocal and ClustersAnalysisDistributed will be used to describe the obtained clusterings in terms of distributions, proportions of categorical features...
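The idea behind the chaining classes, running several clusterings and keeping each result next to the settings that produced it, can be sketched without Spark. In this hypothetical example every (vectorization, epsilon) pair is one run, and `ClusteringRun` records the clustering number, vectorization name and arguments. A simple epsilon-neighbourhood connected-components clusterer stands in for the real algorithms; it is an assumption that it matches the library's Epsilon Proximity. Runs execute sequentially here, whereas the library's classes run them in parallel or distributed.

```scala
// Hypothetical chaining sketch: all names are illustrative, not Clustering4Ever classes.
object ChainingSketch {
  type Vec = Array[Double]

  // Metadata kept for each run alongside its labels.
  final case class ClusteringRun(
    clusteringNumber: Int,
    vectorizationName: String,
    args: Map[String, Double],
    labels: Vector[Int]
  )

  def dist(a: Vec, b: Vec): Double =
    math.sqrt(a.indices.map { i => val d = a(i) - b(i); d * d }.sum)

  // Stand-in clusterer: connected components of the graph linking points within epsilon.
  def epsilonProximity(data: Vector[Vec], epsilon: Double): Vector[Int] = {
    val labels = Array.fill(data.size)(-1)
    var next = 0
    for (start <- data.indices if labels(start) == -1) {
      // Breadth-first search from an unlabeled point over the epsilon graph.
      val queue = scala.collection.mutable.Queue(start)
      labels(start) = next
      while (queue.nonEmpty) {
        val i = queue.dequeue()
        for (j <- data.indices if labels(j) == -1 && dist(data(i), data(j)) <= epsilon) {
          labels(j) = next
          queue.enqueue(j)
        }
      }
      next += 1
    }
    labels.toVector
  }

  // One run per (vectorization, epsilon) combination, numbered in a deterministic order.
  def chain(raw: Vector[Vec], vectorizations: Map[String, Vec => Vec], epsilons: Seq[Double]): Seq[ClusteringRun] = {
    val combos = for {
      (name, f) <- vectorizations.toSeq.sortBy(_._1)
      eps <- epsilons
    } yield (name, f, eps)
    combos.zipWithIndex.map { case ((name, f, eps), k) =>
      ClusteringRun(k, name, Map("epsilon" -> eps), epsilonProximity(raw.map(f), eps))
    }
  }
}
```

Chaining the points (0, 0), (0.5, 100) and (5, 0) over a "full" vectorization and a "firstFeature" projection with epsilons 1 and 10 yields four numbered runs; for instance, "full" with epsilon 10 groups the first and third points and isolates the second.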

Coming soon (developed by our team)

DESOM: Deep Embedded Self-Organizing Map: Joint Representation Learning and Self-Organization
SOM: Kohonen self-organizing map
SOMperf: SOM performance metrics and quality indices
skstab: a module for clustering stability analysis in Python with a scikit-learn compatible API
FunCLBM: Functional Conditional Latent Block Model
Spark Time Series Set data analysis
UMAP
Gaussian Mixture
DBScan
Bayesian Optimization for AutoML

Citation

If you publish material based on information obtained from this repository, please acknowledge the assistance you received from this community work. This helps others obtain the same information and replicate your experiments: having results is cool, but being able to compare them with others is better. Citation:

@misc{C4E,
  url = "https://github.com/Clustering4Ever/Clustering4Ever",
  institution = "Paris 13 University, LIPN UMR CNRS 7030"
}

C4E-Notebooks examples

Basic usage of the implemented algorithms is demonstrated in BeakerX and Jupyter notebooks through Binder.

They can also be downloaded directly from our Notebooks repository in different formats, such as Jupyter or Spark Notebook.

Miscellaneous

Helper functions to generate Clusterizable collections

You can easily generate collections of basic Clusterizable objects using the helpers in org.clustering4ever.util.{ArrayAndSeqTowardGVectorImplicit, ScalaCollectionImplicits, SparkImplicits}, or explore Clusterizable and EasyClusterizable for more advanced usage.
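The helpers in org.clustering4ever.util turn raw collections into Clusterizable objects. Their exact signatures are not documented on this page, so what follows is only a hypothetical illustration of the pattern: an implicit enrichment that attaches an identifier to each raw vector. `SimpleClusterizable` and `toClusterizables` are made-up names and are not part of the library.

```scala
// Hypothetical illustration of an implicit helper; these names are NOT the library's API.
object ClusterizableHelpersSketch {
  // Minimal stand-in for a basic Clusterizable: an identifier plus a feature vector.
  final case class SimpleClusterizable(id: Long, vector: Vector[Double])

  // Enrichment on raw arrays: each element gets its position in the collection as id.
  implicit class RichRawVectors(raw: Seq[Array[Double]]) {
    def toClusterizables: Seq[SimpleClusterizable] =
      raw.zipWithIndex.map { case (arr, i) => SimpleClusterizable(i.toLong, arr.toVector) }
  }
}
```

After `import ClusterizableHelpersSketch._`, a call such as `Seq(Array(1.0, 2.0), Array(3.0, 4.0)).toClusterizables` yields two objects with ids 0 and 1.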