BigDL alternatives and similar packages
Based on the "Science and Data Analysis" category.
Alternatively, view BigDL alternatives based on common mentions on social networks and blogs.
- PredictionIO: machine learning server for developers and data scientists, built on Apache Spark, HBase and Spray.
- Smile: Statistical Machine Intelligence and Learning Engine. Smile is a fast and comprehensive machine learning system.
- Spark Notebook: scalable and stable Scala and Spark focused notebook bridging the gap between the JVM and data scientists (incl. extendable, typesafe and reactive charts).
- Figaro: a probabilistic programming language that supports development of very rich probabilistic models.
- Tensorflow_scala: TensorFlow API for the Scala programming language.
- FACTORIE: a toolkit for deployable probabilistic modeling, implemented as a software library in Scala.
- Squants: the Scala API for Quantities, Units of Measure and Dimensional Analysis.
- ND4S: N-dimensional arrays and linear algebra for Scala with an API similar to NumPy. ND4S is a Scala wrapper around ND4J.
- Libra: a dimensional analysis library based on shapeless, spire and singleton-ops. It contains out-of-the-box support for SI units for all numeric types.
- OpenMOLE: workflow engine for exploration of simulation models using high performance computing.
- Optimus: a library for linear and quadratic mathematical optimization written in the Scala programming language.
- Clustering4Ever: Scala and Spark API to benchmark and analyse clustering algorithms on any vectorization you can generate.
- rscala: the Scala interpreter is embedded in R, with callbacks to R from the embedded interpreter supported; conversely, the R interpreter is embedded in Scala.
- Tyche: probability distributions, stochastic & Markov processes, lattice walks, simple random sampling. A simple yet robust Scala library.
- MGO: modular multi-objective evolutionary algorithm optimization library enforcing immutability.
- Rings: an efficient library for polynomial rings; commutative algebra, polynomial GCDs, polynomial factorization and other scientific computations at really high speed.
- SwiftLearner: simply written algorithms to help study Machine Learning or write your own implementations.
README
BigDL: Distributed Deep Learning on Apache Spark
What is BigDL?
BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can directly run on top of existing Spark or Hadoop clusters. To make it easy to build Spark and BigDL applications, a high-level Analytics Zoo is provided for end-to-end analytics + AI pipelines.
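As a rough illustration of what writing a deep learning application as a "standard Spark program" looks like, here is a minimal sketch of a BigDL training job in Scala. The toy model, random data and hyperparameters are illustrative assumptions rather than anything taken from the BigDL documentation, and the Engine / Sequential / Optimizer / Sample calls follow the typical BigDL 0.x Scala API; check the project docs for the exact signatures in your version.

import com.intel.analytics.bigdl.dataset.Sample
import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Linear, LogSoftMax, ReLU, Sequential}
import com.intel.analytics.bigdl.optim.{Optimizer, SGD, Trigger}
import com.intel.analytics.bigdl.tensor.Tensor
import com.intel.analytics.bigdl.utils.Engine
import org.apache.spark.SparkContext

object BigDLSketch {
  def main(args: Array[String]): Unit = {
    // Engine.createSparkConf adds the Spark properties BigDL expects;
    // Engine.init must be called before any BigDL computation.
    val sc = new SparkContext(Engine.createSparkConf().setAppName("bigdl-sketch"))
    Engine.init

    // Toy training data: an ordinary Spark RDD of BigDL Samples (features + 1-based label).
    val trainRdd = sc.parallelize(1 to 1000).map { i =>
      val features = Tensor[Float](10).rand()
      val label = Tensor[Float](1).fill(if (i % 2 == 0) 1f else 2f)
      Sample[Float](features, label)
    }

    // A small feed-forward classifier built from BigDL's Torch-style layers.
    val model = Sequential[Float]()
      .add(Linear[Float](10, 32))
      .add(ReLU[Float]())
      .add(Linear[Float](32, 2))
      .add(LogSoftMax[Float]())

    // Distributed training driven by BigDL's synchronous mini-batch SGD on Spark.
    Optimizer(
      model = model,
      sampleRDD = trainRdd,
      criterion = ClassNLLCriterion[Float](),
      batchSize = 128
    ).setOptimMethod(new SGD[Float](learningRate = 0.01))
      .setEndWhen(Trigger.maxEpoch(5))
      .optimize()

    sc.stop()
  }
}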
- Rich deep learning support. Modeled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high-level neural networks; in addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL (see the sketch after this list).
- Extremely high performance. To achieve high performance, BigDL uses Intel MKL / Intel MKL-DNN and multi-threaded programming in each Spark task. Consequently, it is orders of magnitude faster than out-of-the-box open-source Caffe, Torch or TensorFlow on a single-node Xeon (i.e., comparable with mainstream GPU). With the adoption of Intel DL Boost, BigDL improves inference latency and throughput significantly.
- Efficient scale-out. BigDL can efficiently scale out to perform data analytics at "Big Data scale" by leveraging Apache Spark (a lightning fast distributed data processing framework), as well as efficient implementations of synchronous SGD and all-reduce communications on Spark.
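The Caffe model loading mentioned in the first bullet above can be sketched as follows. This is an assumption-laden example: the file paths are placeholders and the loadCaffeModel helper should be verified against your BigDL version; it is not a verbatim snippet from the BigDL docs.

import com.intel.analytics.bigdl.nn.Module
import com.intel.analytics.bigdl.tensor.Tensor

// Placeholder paths to a Caffe network definition and its trained weights.
val defPath = "/path/to/deploy.prototxt"
val modelPath = "/path/to/weights.caffemodel"

// Load the pre-trained Caffe model as a BigDL Module (Engine.init is assumed
// to have been called already, as in the sketch above) and run one forward pass.
val pretrained = Module.loadCaffeModel[Float](defPath, modelPath)
val output = pretrained.forward(Tensor[Float](1, 3, 224, 224).rand())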
Why BigDL?
You may want to write your deep learning programs using BigDL if:
- You want to analyze a large amount of data on the same Big Data (Hadoop/Spark) cluster where the data are stored (in, say, HDFS, HBase, Hive, etc.).
- You want to add deep learning functionalities (either training or prediction) to your Big Data (Spark) programs and/or workflow.
- You want to leverage existing Hadoop/Spark clusters to run your deep learning applications, which can then be dynamically shared with other workloads (e.g., ETL, data warehouse, feature engineering, classical machine learning, graph analytics, etc.).
How to use BigDL?
- For a technical overview of BigDL, please refer to the BigDL white paper.
- More information can be found at the BigDL project website: https://bigdl-project.github.io/
- In particular, you can check out the Getting Started page for a quick overview of how to use BigDL.
- For step-by-step deep learning tutorials on BigDL (using Python), you can check out the BigDL Tutorials project.
- You can join the BigDL Google Group (or subscribe to the Mail List) for more questions and discussions on BigDL.
- You can post bug reports and feature requests at the Issue Page.
- You may refer to Analytics Zoo for high-level pipeline APIs, built-in deep learning models, reference use cases, etc. on Spark and BigDL.
Citing BigDL
If you've found BigDL useful for your project, you can cite the paper as follows:
@inproceedings{SOCC2019_BIGDL,
title={BigDL: A Distributed Deep Learning Framework for Big Data},
author={Dai, Jason (Jinquan) and Wang, Yiheng and Qiu, Xin and Ding, Ding and Zhang, Yao and Wang, Yanzhang and Jia, Xianyan and Zhang, Li (Cherry) and Wan, Yan and Li, Zhichao and Wang, Jiao and Huang, Shengsheng and Wu, Zhongyuan and Wang, Yang and Yang, Yuhao and She, Bowen and Shi, Dongjie and Lu, Qi and Huang, Kai and Song, Guoqiong},
booktitle={Proceedings of the ACM Symposium on Cloud Computing},
publisher={Association for Computing Machinery},
pages={50--60},
year={2019},
series={SoCC'19},
doi={10.1145/3357223.3362707},
url={https://arxiv.org/pdf/1804.05839.pdf}
}