Spark Tools
Description
This project contains some basic runnable tools that can help with various tasks around an Apache Spark-based project.
The main tools available:
- [FormatConverter](docs/format-converter.md) Converts any acceptable file format into a different file format, also providing partitioning support.
- [SimpleSqlProcessor](docs/sql-processor.md) Applies a given SQL query to the input files, which are mapped into tables.
- [StreamingFormatConverter](docs/streaming-format-converter.md) Converts any acceptable data stream format into a different data stream format, also providing partitioning support.
- [SimpleFileStreamingSqlProcessor](docs/file-streaming-sql-processor.md) Applies a given SQL query to the input file streams, which are mapped into file output streams.
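Conceptually, a SQL processor of this kind wraps the stock Spark pattern of registering each input as a temporary view and running a query across the views. The sketch below uses only the standard Spark API to illustrate that idea; the object name, paths, table-naming rule, and query are illustrative assumptions, not the spark-tools API:

```scala
// Illustrative sketch of the underlying Spark idiom, NOT the spark-tools API.
import org.apache.spark.sql.SparkSession

object SqlProcessorSketch {

  // Hypothetical naming rule: "/data/sales.parquet" -> view name "sales"
  def tableName(path: String): String =
    path.split('/').last.takeWhile(_ != '.')

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-processor-sketch")
      .master("local[*]")
      .getOrCreate()

    // Map each input file to a temporary view named after the file
    val inputs = Seq("/tmp/in/sales.parquet", "/tmp/in/customers.parquet")
    inputs.foreach { path =>
      spark.read.parquet(path).createOrReplaceTempView(tableName(path))
    }

    // Apply the given SQL to the registered views and write out the result
    val result = spark.sql(
      """SELECT c.id, SUM(s.amount) AS total
        |FROM sales s JOIN customers c ON s.customer_id = c.id
        |GROUP BY c.id""".stripMargin)
    result.write.mode("overwrite").parquet("/tmp/out/totals")

    spark.stop()
  }
}
```

The actual tools externalize the input mappings, the SQL, and the output settings into configuration; see the linked docs for the real parameters.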
This project also tries to create and encourage a friendly yet professional environment for developers to help each other, so please do not be shy and join through gitter, twitter, issue reports or pull requests.
Prerequisites
- Java 8 or higher
- Scala 2.11 or 2.12
- Apache Spark 2.4.X
Getting Spark Tools
Spark Tools is published to Maven Central and Spark Packages, where the latest artifacts can be found.
- Group id / organization: `org.tupol`
- Artifact id / name: `spark-tools`
- Latest version: `0.4.1`
Usage with SBT: add a dependency on the latest version of Spark Tools to your sbt build definition file:

```scala
libraryDependencies += "org.tupol" %% "spark-tools" % "0.4.1"
```
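For context, a minimal `build.sbt` sketch combining this dependency with Spark itself; the Spark version matches the prerequisites above, and marking Spark as `Provided` is a common choice for applications launched with `spark-submit` (adjust to your setup):

```scala
// Minimal build.sbt sketch (assumes Spark 2.4.x per the prerequisites above)
scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"   % "2.4.6" % Provided,
  "org.tupol"        %% "spark-tools" % "0.4.1"
)
```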
Include this package in your Spark applications using `spark-shell` or `spark-submit`.

With Scala 2.11:

```shell
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.11:0.4.1
```

With Scala 2.12:

```shell
$SPARK_HOME/bin/spark-shell --packages org.tupol:spark-tools_2.12:0.4.1
```
What's new?
0.4.1
- Added `StreamingFormatConverter`
- Added `FileStreamingSqlProcessor` and `SimpleFileStreamingSqlProcessor`
- Bumped the `spark-utils` dependency to `0.4.2`
- The project compiles with both Scala `2.11.12` and `2.12.12`
- Updated Apache Spark to `2.4.6`
- Updated `delta.io` to `0.6.1`
- Updated the `spark-xml` library to `0.10.0`
- Removed the `com.databricks:spark-avro` dependency, as Avro support is now built into Apache Spark
- Updated the `spark-utils` dependency to the latest available snapshot
For previous versions please consult the [release notes](RELEASE-NOTES.md).
License
This code is open source software licensed under the [MIT License](LICENSE).