This project contains some basic runnable tools that can help with various tasks around an Apache Spark based project.
The main tools available:
- [FormatConverter](docs/format-converter.md) Converts any supported file format into a different file format, with support for partitioning the output.
- [SimpleSqlProcessor](docs/sql-processor.md) Applies a given SQL query to the input files, which are mapped into tables.
- [StreamingFormatConverter](docs/streaming-format-converter.md) Converts any supported data stream format into a different data stream format, with support for partitioning the output.
- [SimpleFileStreamingSqlProcessor](docs/file-streaming-sql-processor.md) Applies a given SQL query to the input file streams, which are mapped into file output streams.
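To make the idea behind the batch tools concrete, here is a minimal sketch of format conversion with partitioning and of SQL over inputs mapped to tables, written against the plain Spark `DataFrameReader` / `DataFrameWriter` API. It is not the implementation of `FormatConverter` or `SimpleSqlProcessor`; the object name `FormatConversionSketch`, its method names and its parameters are hypothetical and exist only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

// Hypothetical helper, NOT part of spark-tools: it only illustrates the concepts.
object FormatConversionSketch {

  /** Reads `inputPath` as `inputFormat` and rewrites it as `outputFormat`,
    * optionally partitioning the output by the given columns. */
  def convert(
      spark: SparkSession,
      inputFormat: String,
      inputPath: String,
      outputFormat: String,
      outputPath: String,
      partitionColumns: Seq[String] = Seq.empty
  ): Unit = {
    val input: DataFrame = spark.read.format(inputFormat).load(inputPath)
    val writer = input.write.format(outputFormat).mode(SaveMode.Overwrite)
    // partitionBy is only applied when partition columns were requested
    val partitioned =
      if (partitionColumns.isEmpty) writer
      else writer.partitionBy(partitionColumns: _*)
    partitioned.save(outputPath)
  }

  /** Registers each input as a temporary view, then runs a single SQL statement. */
  def runSql(spark: SparkSession, tables: Map[String, DataFrame], sql: String): DataFrame = {
    tables.foreach { case (name, df) => df.createOrReplaceTempView(name) }
    spark.sql(sql)
  }
}
```

In a `spark-shell` session, where `spark` is already defined, a call such as `FormatConversionSketch.convert(spark, "json", "/tmp/in", "parquet", "/tmp/out", Seq("date"))` would rewrite JSON input as Parquet partitioned by a `date` column; the paths and the column name are placeholders. The linked documentation pages describe how the actual tools are configured.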
This project also aims to create and encourage a friendly yet professional environment where developers help each other, so please do not be shy: join in through Gitter, Twitter, issue reports or pull requests.
Prerequisites
- Java 8 or higher
- Scala 2.11 or 2.12
- Apache Spark 2.4.X
Getting Spark Tools
The latest artifacts can be found under the following coordinates:
- Group id / organization: `org.tupol`
- Artifact id / name: `spark-tools`
- Latest version is `0.4.1`
To use Spark Tools with SBT, add a dependency on the latest version to your sbt build definition file:
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
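For context, a fuller `build.sbt` might look like the sketch below. The `%%` operator appends the Scala binary version to the artifact name, so the same line resolves to `spark-tools_2.11` or `spark-tools_2.12` depending on `scalaVersion`. The exact Scala and Spark patch versions, and the choice of the `provided` scope for Spark, are assumptions to adapt to your own setup.

```scala
// build.sbt (sketch; the Scala and Spark patch versions are assumptions)
scalaVersion := "2.12.8"

libraryDependencies ++= Seq(
  // Spark is usually supplied by the cluster at runtime, hence "provided"
  "org.apache.spark" %% "spark-sql" % "2.4.3" % "provided",
  // %% expands to spark-tools_2.12 for the scalaVersion above
  "org.tupol" %% "spark-tools" % "0.4.1"
)
```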
Include this package in your Spark applications using the `--packages` option, for example in `spark-shell` with Scala 2.11
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
or with Scala 2.12
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
- The project compiles with both Scala 2.11 and 2.12
- Updated Apache Spark to
- Updated the
- Removed the `com.databricks:spark-avro` dependency, as Avro support is now built into Apache Spark
- Updated the `spark-utils` dependency to the latest available snapshot
For previous versions please consult the [release notes](RELEASE-NOTES.md).
This code is open source software licensed under the [MIT License](LICENSE).