MLLib alternatives and similar packages
Based on the "Science and Data Analysis" category.
Alternatively, view Apache Spark alternatives based on common mentions on social networks and blogs.
- PredictionIO — a machine learning server for developers and ML engineers.
- Zeppelin — web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
- Smile — Statistical Machine Intelligence & Learning Engine.
- BigDL — fast, distributed, secure AI for Big Data.
- Spark Notebook — interactive and reactive data science using Scala and Spark.
- Breeze — a numerical processing library for Scala.
- Algebird — abstract algebra for Scala.
- Spire — powerful new number types and numeric abstractions for Scala.
- Tensorflow_scala — TensorFlow API for the Scala programming language.
- Figaro — the Figaro programming language and core libraries.
- Squants — the Scala API for quantities, units of measure and dimensional analysis.
- FACTORIE — a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides a succinct language for creating relational factor graphs, estimating parameters and performing inference.
- Saddle — a minimalist port of Pandas to Scala.
- ND4S — N-dimensional arrays for Scala; scientific computing à la NumPy, based on ND4J.
- Chalk — a natural language processing library.
- Compute.scala — scientific computing with N-dimensional arrays.
- Libra — a dimensional analysis library based on dependent types.
- Numsca — NumPy for Scala.
- OpenMOLE — workflow engine for exploration of simulation models using high-throughput computing.
- Clustering4Ever — C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
- Optimus — a mathematical programming library for Scala.
- rscala — embeds the Scala interpreter in R, with callbacks to R from the embedded interpreter; conversely, the R interpreter can be embedded in Scala.
- LoMRF — an open-source implementation of Markov Logic Networks.
- Tyche — statistics utilities for the JVM, in Scala.
- MGO — purely functional genetic algorithms for multi-objective optimisation.
- Synapses — a group of neural-network libraries for functional and mainstream languages.
- Rings — an efficient JVM library for polynomial rings.
- Axle — a domain-specific language for scientific cloud computing and visualization.
- SwiftLearner — a Scala machine learning library.
- Persist-Units — Scala units-of-measure types.
- OscaR — a Scala toolkit for solving Operations Research problems.
Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.
Spark is built using Apache Maven. To build Spark and its example programs, run:
./build/mvn -DskipTests clean package
(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at "Building Spark".
For general development tips, including info on developing Spark using an IDE, see "Useful Developer Tools".
Interactive Scala Shell
The easiest way to start using Spark is through the Scala shell:

./bin/spark-shell
Try the following command, which should return 1,000,000,000:
scala> spark.range(1000 * 1000 * 1000).count()
Interactive Python Shell
Alternatively, if you prefer Python, you can use the Python shell:

./bin/pyspark
And run the following command, which should also return 1,000,000,000:
>>> spark.range(1000 * 1000 * 1000).count()
Spark also comes with several sample programs in the examples directory.
To run one of them, use ./bin/run-example <class> [params]. For example:

./bin/run-example SparkPi

will run the Pi example locally.
You can set the MASTER environment variable when running examples to submit
examples to a cluster. This can be a mesos:// or spark:// URL,
"yarn" to run on YARN, "local" to run
locally with one thread, or "local[N]" to run locally with N threads. You
can also use an abbreviated class name if the class is in the examples
package. For instance:
MASTER=spark://host:7077 ./bin/run-example SparkPi
Many of the example programs print usage help if no params are given.
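As a sketch of the MASTER forms described above (host names and thread counts are placeholders; these commands assume a built or pre-built Spark distribution):

```shell
# Standalone cluster (replace host:7077 with your master's address):
MASTER=spark://host:7077 ./bin/run-example SparkPi

# YARN (assumes HADOOP_CONF_DIR points at your Hadoop configuration):
MASTER=yarn ./bin/run-example SparkPi

# Local mode with 4 worker threads:
MASTER="local[4]" ./bin/run-example SparkPi 100
```

The trailing 100 in the last command is an example parameter (the number of partitions SparkPi uses); most examples print usage help if run with no parameters.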
Testing first requires building Spark. Once Spark is built, tests can be run using:

./dev/run-tests
Please see the guidance on how to run tests for a module, or individual tests.
There is also a Kubernetes integration test; see resource-managers/kubernetes/integration-tests/README.md.
A Note About Hadoop Versions
Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported storage systems. Because the protocols have changed in different versions of Hadoop, you must build Spark against the same version that your cluster runs.
Please refer to the build documentation at "Specifying the Hadoop Version and Enabling YARN" for detailed guidance on building for a particular distribution of Hadoop, including building for particular Hive and Hive Thriftserver distributions.
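As a hedged sketch of such a build (the yarn profile and hadoop.version property are described in Spark's build documentation, but the supported version numbers vary by release, so check the docs for your Spark version):

```shell
# Sketch: build Spark against a specific Hadoop version with YARN support.
# 3.3.4 is an illustrative version number, not a recommendation.
./build/mvn -Pyarn -Dhadoop.version=3.3.4 -DskipTests clean package
```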
Please refer to the Configuration Guide in the online documentation for an overview on how to configure Spark.
Please review the Contribution to Spark guide for information on how to get started contributing to the project.