Popularity

9.3

Stable

Activity

1.7

Stars 2,118

Watchers 295

Forks 259

Last Commit about 2 years ago

Programming language: Scala

License: Apache License 2.0

Tags: Big Data

Latest version: v0.11.0-RC1

Summingbird alternatives and similar packages

Based on the "Big Data" category.
Alternatively, view Summingbird alternatives based on common mentions on social networks and blogs.

Kafka

10.0 9.9 L2 Summingbird VS Kafka

Mirror of Apache Kafka
Apache Spark

10.0 10.0 Summingbird VS Apache Spark

Apache Spark - A unified analytics engine for large-scale data processing

Deeplearning4J

9.9 6.5 L1 Summingbird VS Deeplearning4J

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learning using automatic differentiation.
Flink

9.9 9.9 L2 Summingbird VS Flink

Apache Flink
Scalding

9.6 2.5 Summingbird VS Scalding

A Scala API for Cascading
Scio

9.3 9.6 Summingbird VS Scio

A Scala API for Apache Beam and Google Cloud Dataflow.
Reactive-kafka

8.9 8.2 Summingbird VS Reactive-kafka

Alpakka Kafka connector - Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.
Jupyter Scala

8.7 9.0 Summingbird VS Jupyter Scala

A Scala kernel for Jupyter
BIDMach

8.3 0.0 Summingbird VS BIDMach

CPU and GPU-accelerated Machine Learning Library
Hail

8.3 9.8 Summingbird VS Hail

Cloud-native genomic dataframes and batch computing
Gearpump

8.1 0.0 Summingbird VS Gearpump

Lightweight real-time big data streaming engine over Akka
Sparkta

8.0 0.0 Summingbird VS Sparkta

Real Time Analytics and Data Pipelines based on Spark Streaming
Vegas

7.5 0.0 Summingbird VS Vegas

The missing MatPlotLib for Scala + Spark
metorikku

7.4 2.4 Summingbird VS metorikku

A simplified, lightweight ETL Framework based on Apache Spark
Scoobi

6.8 0.0 Summingbird VS Scoobi

A Scala productivity framework for Hadoop.
Scrunch

5.1 1.4 L3 Summingbird VS Scrunch

Mirror of Apache Crunch (Incubating)
Scoozie

4.7 0.0 Summingbird VS Scoozie

Scala DSL on top of Oozie XML
Schemer

3.5 0.0 Summingbird VS Schemer

Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
spark-deployer

3.4 0.0 Summingbird VS spark-deployer

Deploy Spark cluster in an easy way.
GridScale

2.2 6.6 Summingbird VS GridScale

Scala library for accessing various file, batch systems, job schedulers and grid middlewares.
raster-frames

2.1 0.0 Summingbird VS raster-frames

Spark DataFrames for earth observation data
Spark Utils

2.0 3.8 Summingbird VS Spark Utils

Basic framework utilities to quickly start writing production ready Apache Spark applications
Sparkplug

1.8 0.0 Summingbird VS Sparkplug

Spark package to "plug" holes in data using SQL based rules ⚡️ 🔌
Shadoop

1.3 0.0 Summingbird VS Shadoop

A wrapper for Hadoop in Scala
Spark Tools

1.0 0.0 Summingbird VS Spark Tools

Executable Apache Spark Tools: Format Converter & SQL Processor

* Code Quality Rankings and insights are calculated and provided by Lumnify.
They vary from L1 to L5 with "L5" being the highest.

README

Summingbird

Summingbird is a library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce platforms, including Storm and Scalding.

While a word-counting aggregation in pure Scala might look like this:

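  import scala.collection.mutable.{Map => MutableMap}

  // toWords splits a sentence into words; it is assumed to be defined elsewhere.
  def wordCount(source: Iterable[String], store: MutableMap[String, Long]): Unit =
    source.flatMap { sentence =>
      toWords(sentence).map(_ -> 1L)
    }.foreach { case (k, v) => store.update(k, store.getOrElse(k, 0L) + v) }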

Counting words in Summingbird looks like this:

  def wordCount[P <: Platform[P]]
    (source: Producer[P, String], store: P#Store[String, Long]) =
      source.flatMap { sentence =>
        toWords(sentence).map(_ -> 1L)
      }.sumByKey(store)

The logic is exactly the same, and the code is almost the same. The main difference is that you can execute the Summingbird program in "batch mode" (using Scalding), in "realtime mode" (using Storm), or on both Scalding and Storm in a hybrid batch/realtime mode that offers your application very attractive fault-tolerance properties.
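One way to see why the same aggregation can be split across more than one run: per-key summation is associative and commutative, so counts computed over separate slices of the input can be added together afterwards. The following is a minimal plain-Scala sketch of that idea only. It does not use the Summingbird API and is not a description of how Summingbird is implemented; `toWords`, `countWords`, and `merge` are hypothetical helpers introduced for illustration.

```scala
object WordCountSketch {
  // Hypothetical tokenizer: lowercases and splits on whitespace.
  def toWords(sentence: String): Seq[String] =
    sentence.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  // Count the words in one collection of sentences.
  def countWords(source: Iterable[String]): Map[String, Long] =
    source
      .flatMap(sentence => toWords(sentence).map(_ -> 1L))
      .foldLeft(Map.empty[String, Long]) { case (acc, (k, v)) =>
        acc.updated(k, acc.getOrElse(k, 0L) + v)
      }

  // Merge two partial results by summing the counts for each key.
  def merge(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.updated(k, acc.getOrElse(k, 0L) + v)
    }

  def main(args: Array[String]): Unit = {
    val older  = List("the quick brown fox", "the lazy dog") // stands in for batch input
    val newer  = List("the fox jumps")                       // stands in for realtime input
    val merged = merge(countWords(older), countWords(newer))
    println(merged("the"))                        // prints 3
    println(merged == countWords(older ++ newer)) // prints true
  }
}
```

Because merging the two partial maps gives the same result as counting everything at once, it does not matter which slice was processed where.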

Summingbird provides you with the primitives you need to build rock solid production systems.

Getting Started: Word Count with Twitter

The summingbird-example project allows you to run the wordcount program above on a sample of Twitter data using a local Storm topology and memcache instance. You can find the actual job definition in ExampleJob.scala.

First, make sure you have memcached installed locally. If you don't and you're on OS X, you can install it with Homebrew by running this command in a shell:

brew install memcached

When this is finished, run the memcached command in a separate terminal.

Now you'll need to set up access to the Twitter Streaming API. This blog post has a great walkthrough, so open that page, head over to https://dev.twitter.com/ and get your various keys and tokens. Once you have these, clone the Summingbird repository:

git clone https://github.com/twitter/summingbird.git
cd summingbird

Then open StormRunner.scala in your editor and replace the placeholder values in the config variable with your auth tokens:

lazy val config = new ConfigurationBuilder()
    .setOAuthConsumerKey("mykey")
    .setOAuthConsumerSecret("mysecret")
    .setOAuthAccessToken("token")
    .setOAuthAccessTokenSecret("tokensecret")
    .setJSONStoreEnabled(true) // required for JSON serialization
    .build

You're all ready to go! Now it's time to unleash Storm on your Twitter stream. Make sure the memcached terminal is still open, then start Storm from the summingbird directory:

./sbt "summingbird-example/run --local"

Storm should print a burst of output, then settle down and appear to hang. This means that Storm is updating your local memcached instance with counts of every word that it sees in each tweet.

To query the aggregate results in memcached, you'll need to open an sbt REPL in a new terminal:

./sbt summingbird-example/console

At the REPL prompt, run the following:

scala> import com.twitter.summingbird.example._
import com.twitter.summingbird.example._

scala> StormRunner.lookup("i")
<memcache store loading elided>
res0: Option[Long] = Some(5)

scala> StormRunner.lookup("i")
res1: Option[Long] = Some(52)

Boom. Counts for the word "i" are growing in realtime.
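Rather than retyping the lookup by hand, the growth can be observed by sampling it a few times in a row. The following is a small sketch under stated assumptions: `LookupSampler` and `sample` are hypothetical names, and the lookup is passed in as a plain function so that the example runs on its own, with a fake counter standing in for the live store.

```scala
object LookupSampler {
  // Calls `lookup` for `key` `times` times, pausing `pauseMillis` between calls,
  // and returns the observed values in order.
  def sample(
      lookup: String => Option[Long],
      key: String,
      times: Int,
      pauseMillis: Long
  ): List[Option[Long]] =
    (1 to times).toList.map { i =>
      val value = lookup(key)
      if (i < times) Thread.sleep(pauseMillis) // no pause after the last call
      value
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for a live store: a counter that grows on every call.
    var count = 0L
    val fakeLookup: String => Option[Long] = _ => { count += 5; Some(count) }
    println(sample(fakeLookup, "i", 3, 10L)) // prints List(Some(5), Some(10), Some(15))
  }
}
```

In the REPL session above, a call such as `LookupSampler.sample(StormRunner.lookup, "i", 5, 1000L)` would be the natural usage, assuming `lookup` takes a `String` and returns `Option[Long]` as the session output suggests.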

See the wiki page for a more detailed explanation of the configuration required to get this job up and running and some ideas for where to go next.

Community and Documentation

This, and all github.com/twitter projects, are under the Twitter Open Source Code of Conduct. Additionally, see the Typelevel Code of Conduct for specific examples of harassing behavior that are not tolerated.

To learn more and find links to tutorials and information around the web, check out the Summingbird Wiki.

The latest ScalaDocs are hosted on Summingbird's GitHub Project Page.

Discussion occurs primarily on the Summingbird mailing list. Issues should be reported on the GitHub issue tracker. Simpler issues appropriate for first-time contributors looking to help out are tagged "newbie".

IRC: freenode channel #summingbird

Follow @summingbird on Twitter for updates.

Please feel free to use the beautiful Summingbird logo artwork anywhere.

Maven

Summingbird modules are published on Maven Central. The current group ID and version for all modules are, respectively, "com.twitter" and 0.9.1.

The currently published artifacts are:

summingbird-core_2.11
summingbird-core_2.10
summingbird-batch_2.11
summingbird-batch_2.10
summingbird-client_2.11
summingbird-client_2.10
summingbird-storm_2.11
summingbird-storm_2.10
summingbird-scalding_2.11
summingbird-scalding_2.10
summingbird-builder_2.11
summingbird-builder_2.10

The suffix denotes the Scala version.
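With sbt, the coordinates above might be declared as in the following sketch. The choice of modules and the `scalaVersion` value are assumptions made for illustration; the group ID, artifact names, and version are the ones listed in this section.

```scala
// build.sbt (sketch): %% appends the Scala binary version suffix,
// so "summingbird-core" resolves to summingbird-core_2.11 here.
scalaVersion := "2.11.12" // assumed; any 2.10.x or 2.11.x release matches a published suffix

libraryDependencies ++= Seq(
  "com.twitter" %% "summingbird-core"  % "0.9.1",
  "com.twitter" %% "summingbird-storm" % "0.9.1"
)
```

Using `%%` rather than spelling out the `_2.11` suffix keeps the dependency in step with the project's Scala version.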

Authors (alphabetically)

Oscar Boykin https://twitter.com/posco
Ian O'Connell https://twitter.com/0x138
Sam Ritchie https://twitter.com/sritchie
Ashutosh Singhal https://twitter.com/daashu

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

*Note that all licence references and agreements mentioned in the Summingbird README section above are relevant to that project's source code only.

Summingbird

Streaming MapReduce with Scalding and Storm
