Popularity: 8.0 (Stable)

Activity: 0.0 (Stable)

Stars: 524

Watchers: 138

Forks: 197

Last commit: over 4 years ago

Programming language: Scala

License: Apache License 2.0

Tags: Big Data

Latest version: v1.5.0

Sparkta alternatives and similar packages

Based on the "Big Data" category.

Apache Spark

10.0 10.0 Sparkta VS Apache Spark

Apache Spark - A unified analytics engine for large-scale data processing
Kafka

10.0 9.9 L2 Sparkta VS Kafka

Mirror of Apache Kafka


Flink

9.9 9.9 L2 Sparkta VS Flink

Apache Flink
Deeplearning4J

9.9 6.5 L1 Sparkta VS Deeplearning4J

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Scalding

9.6 2.5 Sparkta VS Scalding

A Scala API for Cascading
Summingbird

9.3 1.7 Sparkta VS Summingbird

DISCONTINUED. Streaming MapReduce with Scalding and Storm
Scio

9.3 9.6 Sparkta VS Scio

A Scala API for Apache Beam and Google Cloud Dataflow.
Reactive-kafka

8.9 8.2 Sparkta VS Reactive-kafka

Alpakka Kafka connector - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
Jupyter Scala

8.7 9.0 Sparkta VS Jupyter Scala

A Scala kernel for Jupyter
BIDMach

8.3 0.0 Sparkta VS BIDMach

CPU and GPU-accelerated Machine Learning Library
Hail

8.3 9.8 Sparkta VS Hail

Cloud-native genomic dataframes and batch computing
Gearpump

8.1 0.0 Sparkta VS Gearpump

Lightweight real-time big data streaming engine over Akka
Vegas

7.5 0.0 Sparkta VS Vegas

The missing MatPlotLib for Scala + Spark
metorikku

7.4 2.4 Sparkta VS metorikku

A simplified, lightweight ETL Framework based on Apache Spark
Scoobi

6.8 0.0 Sparkta VS Scoobi

A Scala productivity framework for Hadoop.
Scrunch

5.1 1.4 L3 Sparkta VS Scrunch

DISCONTINUED. Mirror of Apache Crunch (Incubating)
Scoozie

4.7 0.0 Sparkta VS Scoozie

Scala DSL on top of Oozie XML
Schemer

3.5 0.0 Sparkta VS Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
spark-deployer

3.4 0.0 Sparkta VS spark-deployer

Deploy Spark cluster in an easy way.
GridScale

2.2 6.6 Sparkta VS GridScale

Scala library for accessing various file, batch systems, job schedulers and grid middlewares.
raster-frames

2.1 0.0 Sparkta VS raster-frames

DISCONTINUED. Spark DataFrames for earth observation data
Spark Utils

2.0 4.6 Sparkta VS Spark Utils

Basic framework utilities to quickly start writing production ready Apache Spark applications
Sparkplug

1.8 0.0 Sparkta VS Sparkplug

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Shadoop

1.3 0.0 Sparkta VS Shadoop

A wrapper for Hadoop in Scala
Spark Tools

1.0 0.0 Sparkta VS Spark Tools

Executable Apache Spark Tools: Format Converter & SQL Processor

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.


README

Discontinued

After around two years of development, we have decided to discontinue this project due to a major refactor of its structure. In the near future we will launch Sparta 2.0.

We would like to thank the whole open source community for their contributions. Needless to say, you can continue using this repository as a basis for your own developments: it contains the latest stable version as of today, and minor issues will still be addressed.

If you are interested in the new Sparta 2.0 with pipelines and workflows, please contact us at [email protected]

About Stratio Sparta

At Stratio, we have implemented several real-time analytics projects based on Apache Spark, Kafka, Flume, Cassandra, ElasticSearch and MongoDB. These technologies were always a perfect fit, but we soon found ourselves writing the same pieces of integration code over and over again. Stratio Sparta is the easiest way to make use of Apache Spark Streaming and its whole ecosystem: choose your inputs, operations and outputs, and start extracting insights from your data in real time.

Main Features

Pure Spark
No coding required, only declarative analytical workflows
Data continuously streamed in and processed in near real time
Ready to use out of the box
Plug & play: flexible workflows (inputs, outputs, transformations, etc.)
High performance and fault tolerance
Scalable and highly available
Big Data OLAP on real-time to small data
ETLs
Triggers over streaming data
Spark SQL over streaming and batch data
Kerberos and CAS compatible

Architecture

Send a workflow as JSON to the Sparta API and run your own real-time plugins on a Spark cluster. [Architecture](./images/architecture.jpg)
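To make the idea of a declarative workflow concrete, here is a minimal sketch that models a workflow as plain data (an input, an ordered list of steps, an output) and applies it to one micro-batch. All type names (`Event`, `Step`, `Filter`, `Project`, `Workflow`, `WorkflowRunner`) are hypothetical and are not Sparta's actual JSON schema or API; the point is only that the engine interprets a definition instead of the user writing Spark code.

```scala
// Hypothetical model of a declarative workflow. These types are illustrative
// only; they are NOT Sparta's actual JSON schema or API.
final case class Event(fields: Map[String, String])

sealed trait Step
final case class Filter(field: String, value: String) extends Step // keep events whose field equals value
final case class Project(fields: List[String]) extends Step        // keep only the listed fields

final case class Workflow(name: String, input: String, steps: List[Step], output: String)

object WorkflowRunner {
  // Applies each declared step, in order, to one micro-batch of events.
  def runBatch(wf: Workflow, batch: Seq[Event]): Seq[Event] =
    wf.steps.foldLeft(batch) {
      case (events, Filter(field, value)) =>
        events.filter(_.fields.get(field).contains(value))
      case (events, Project(keep)) =>
        events.map(e => Event(e.fields.filter { case (k, _) => keep.contains(k) }))
    }
}

// Usage: all names and values below are made up for illustration.
val wf = Workflow(
  name = "clicks-by-country",
  input = "kafka",
  steps = List(Filter("type", "click"), Project(List("country"))),
  output = "mongodb")
val batch = Seq(
  Event(Map("type" -> "click", "country" -> "ES", "user" -> "u1")),
  Event(Map("type" -> "view", "country" -> "FR")))
val result = WorkflowRunner.runBatch(wf, batch) // Seq(Event(Map("country" -> "ES")))
```

Keeping the workflow as data is what allows it to be serialized (for example to JSON) and submitted to a server rather than compiled into a job.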

Sparta as a Job Manager

Submit multiple streaming jobs to the Spark cluster and manage them from a simple UI.

Run workflows on Mesos, YARN or Spark Standalone.
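Each of these cluster managers is selected in Spark through a different master URL (the value passed to `spark-submit --master`). The sketch below encodes the standard formats from the Spark documentation; how Sparta builds or passes these values internally is not described in this README, and the type names and host names are hypothetical.

```scala
// Standard Spark master URL formats for the three cluster managers.
// The type names are hypothetical; only the URL formats come from Spark.
sealed trait ClusterManager { def masterUrl: String }

final case class Mesos(host: String, port: Int = 5050) extends ClusterManager {
  def masterUrl: String = s"mesos://$host:$port"
}

case object Yarn extends ClusterManager {
  // With YARN, the cluster location is read from the Hadoop configuration
  // (HADOOP_CONF_DIR / YARN_CONF_DIR), so the master is simply "yarn".
  def masterUrl: String = "yarn"
}

final case class Standalone(host: String, port: Int = 7077) extends ClusterManager {
  def masterUrl: String = s"spark://$host:$port"
}

// Usage with placeholder host names:
val targets: List[ClusterManager] = List(Mesos("mesos-master"), Yarn, Standalone("spark-master"))
val urls = targets.map(_.masterUrl)
// List("mesos://mesos-master:5050", "yarn", "spark://spark-master:7077")
```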

Sparta as an SDK

Modular components, extensible through a simple SDK

You can extend several points of the platform to fulfill your needs, such as adding new inputs, outputs, operators and transformations.
You can also add new functions to the Kite SDK to extend the data cleaning, enrichment and normalization capabilities. [Architecture Detail](./images/architectureDetail.jpg)
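A common way to make such extension points work is a small plugin contract plus a registry that resolves plugins by the name used in a workflow definition. The sketch below assumes that pattern; `OutputPlugin`, `InMemoryOutput` and `PluginRegistry` are hypothetical names, and Sparta's real SDK classes and method signatures are not shown in this README.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical plugin contract; NOT Sparta's actual SDK interface.
trait OutputPlugin {
  def name: String
  def write(rows: Seq[Map[String, Any]]): Unit
}

// Example custom output that collects rows in memory (handy for tests).
final class InMemoryOutput extends OutputPlugin {
  val name = "in-memory"
  private val buffer = ArrayBuffer.empty[Map[String, Any]]
  def write(rows: Seq[Map[String, Any]]): Unit = buffer ++= rows
  def written: List[Map[String, Any]] = buffer.toList
}

// Registry so a workflow can refer to an output by name.
object PluginRegistry {
  private var outputs = Map.empty[String, OutputPlugin]
  def register(p: OutputPlugin): Unit = outputs += (p.name -> p)
  def output(name: String): Option[OutputPlugin] = outputs.get(name)
}

// Usage: register the plugin, then look it up by name and write to it.
val out = new InMemoryOutput
PluginRegistry.register(out)
PluginRegistry.output("in-memory").foreach(_.write(Seq(Map("count" -> 3))))
```

The registry decouples the workflow definition (which only knows a name) from the plugin implementation, which is what lets new outputs be added without touching the core.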

Components

Multiple components can be defined in each workflow, and all of them follow the architecture below. [workflow](./images/workflow.jpg) [Components](./images/components.jpg)

Core components

Several plugins have been implemented by the Stratio Sparta team. [Main plugins](./images/plugins.jpg)

Trigger component

With Sparta, it is possible to run queries over the streaming data and to perform ETL, aggregations and simple event processing in the trigger process, mixing streaming data with batch data. [triggers](./images/triggers.jpg)
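Per the feature list, triggers are expressed in Spark SQL. The pure-Scala sketch below mirrors the logic of one such query to show what "mixing streaming data with batch data" means: each micro-batch of sales is joined with static store-to-region reference data, aggregated, and filtered by a threshold. The names `Sale`, `TriggerSketch`, `storeRegion` and the threshold rule are hypothetical examples, not part of Sparta.

```scala
// Hypothetical streaming record.
final case class Sale(storeId: String, amount: Double)

object TriggerSketch {
  // Batch (static) reference data: store id -> region.
  val storeRegion: Map[String, String] = Map("s1" -> "north", "s2" -> "south")

  // Roughly equivalent to:
  //   SELECT r.region, SUM(s.amount) AS total
  //   FROM sales s JOIN stores r ON s.storeId = r.storeId
  //   GROUP BY r.region HAVING SUM(s.amount) > threshold
  def fire(batch: Seq[Sale], threshold: Double): Map[String, Double] =
    batch
      .flatMap(s => storeRegion.get(s.storeId).map(region => region -> s.amount)) // inner join
      .groupBy(_._1)
      .map { case (region, pairs) => region -> pairs.map(_._2).sum }
      .filter { case (_, total) => total > threshold }
}

// Usage: store "s3" has no reference row, so the inner join drops it.
val fired = TriggerSketch.fire(
  Seq(Sale("s1", 60.0), Sale("s1", 50.0), Sale("s2", 20.0), Sale("s3", 500.0)),
  threshold = 100.0)
// Map("north" -> 110.0)
```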

Aggregation component

The aggregation process in Sparta is very powerful because it makes it possible to build efficient OLAP processes over streaming data. [OLAP](./images/OLAPintegration.jpg)

Advanced features have been implemented to optimize stateful operations in Spark Streaming. [Aggregations](./images/aggregation.jpg)
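As an illustration of a stateful OLAP-style aggregation, the sketch below groups each micro-batch into cube cells (dimension values plus a time bucket truncated to the minute), then merges the partial counts into the running state. This follows the spirit of Spark Streaming's stateful operators such as `updateStateByKey` or `mapWithState`, but it is plain Scala; `Click`, `Cell` and `CubeSketch` are hypothetical, and the exact mechanism Sparta uses is not described here.

```scala
// Hypothetical event and cube cell types.
final case class Click(country: String, device: String, epochMillis: Long)
final case class Cell(country: String, device: String, minute: Long)

object CubeSketch {
  type State = Map[Cell, Long]

  // Time granularity of the cube: truncate event time to the minute.
  private def minuteOf(ms: Long): Long = ms / 60000L * 60000L

  // Aggregate one micro-batch, then merge it into the running state.
  def update(state: State, batch: Seq[Click]): State = {
    val partial: Map[Cell, Long] =
      batch
        .groupBy(c => Cell(c.country, c.device, minuteOf(c.epochMillis)))
        .map { case (cell, clicks) => cell -> clicks.size.toLong }
    partial.foldLeft(state) { case (acc, (cell, n)) =>
      acc.updated(cell, acc.getOrElse(cell, 0L) + n)
    }
  }
}

// Usage: two micro-batches; the second adds to an existing cell.
val s1 = CubeSketch.update(Map.empty, Seq(
  Click("ES", "mobile", 0L), Click("ES", "mobile", 30000L), Click("FR", "web", 61000L)))
val s2 = CubeSketch.update(s1, Seq(Click("ES", "mobile", 59999L)))
```

Merging only the per-batch partial aggregates, instead of recomputing from raw events, is what keeps this kind of stateful update cheap; coarser roll-ups (for example per country only) could be derived from the same cells.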

Inputs

Twitter
Kafka
Flume
RabbitMQ
Socket
WebSocket
HDFS/S3

Outputs

MongoDB
Cassandra
ElasticSearch
Redis
JDBC
CSV
Parquet
HTTP
Kafka
HDFS/S3
HTTP REST
Avro
Logger

[Outputs](./images/outputs.png)

Key technologies

Advantages

Sparta provides several advantages to end users. [Advantages](./images/features.jpg)

Build

You can generate rpm and deb packages by running:

mvn clean package -Ppackage

Note: the following programs must be installed in order to build these packages:

On Debian-based distributions:

fakeroot
dpkg-dev
rpm
jq

On CentOS:

fakeroot
dpkg-dev
rpmdevtools
jq

On all distributions:

Java 8
Maven 3

License

Licensed to STRATIO (C) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The STRATIO (C) licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

*Note that all licence references and agreements mentioned in the Sparkta README section above are relevant to that project's source code only.

Sparkta

Real Time Analytics and Data Pipelines based on Spark Streaming