Popularity: 1.3 (Stable)

Activity: 0.0 (Stable)

Stars: 10

Watchers: 4

Forks: 3

Last commit: over 9 years ago

Programming language: Scala

License: Apache License 2.0

Tags: Big Data

Latest version: v1.0

Shadoop alternatives and similar packages

Based on the "Big Data" category.

Apache Spark (popularity 10.0, activity 10.0)
A unified analytics engine for large-scale data processing.

Kafka (popularity 10.0, activity 9.9, code quality L2)
Mirror of Apache Kafka.

Flink (popularity 9.9, activity 9.9, code quality L2)
Apache Flink.

Deeplearning4J (popularity 9.9, activity 6.5, code quality L1)
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learning using automatic differentiation.

Scalding (popularity 9.6, activity 2.5)
A Scala API for Cascading.

Summingbird (popularity 9.3, activity 1.7)
DISCONTINUED. Streaming MapReduce with Scalding and Storm.

Scio (popularity 9.3, activity 9.6)
A Scala API for Apache Beam and Google Cloud Dataflow.

Reactive-kafka (popularity 8.9, activity 8.2)
Alpakka Kafka connector. Alpakka is a Reactive Enterprise Integration library for Java and Scala, based on Reactive Streams and Akka.

Jupyter Scala (popularity 8.7, activity 9.0)
A Scala kernel for Jupyter.

BIDMach (popularity 8.3, activity 0.0)
CPU and GPU-accelerated machine learning library.

Hail (popularity 8.3, activity 9.8)
Cloud-native genomic dataframes and batch computing.

Gearpump (popularity 8.1, activity 0.0)
Lightweight real-time big data streaming engine over Akka.

Sparkta (popularity 8.0, activity 0.0)
Real-time analytics and data pipelines based on Spark Streaming.

Vegas (popularity 7.5, activity 0.0)
The missing MatPlotLib for Scala + Spark.

metorikku (popularity 7.4, activity 2.4)
A simplified, lightweight ETL framework based on Apache Spark.

Scoobi (popularity 6.8, activity 0.0)
A Scala productivity framework for Hadoop.

Scrunch (popularity 5.1, activity 1.4, code quality L3)
DISCONTINUED. Mirror of Apache Crunch (Incubating).

Scoozie (popularity 4.7, activity 0.0)
Scala DSL on top of Oozie XML.

Schemer (popularity 3.5, activity 0.0)
Schema registry for CSV, TSV, JSON, AVRO and Parquet schemas. Supports schema inference and a GraphQL API.

spark-deployer (popularity 3.4, activity 0.0)
Deploy a Spark cluster in an easy way.

GridScale (popularity 2.2, activity 6.6)
Scala library for accessing various file systems, batch systems, job schedulers and grid middleware.

raster-frames (popularity 2.1, activity 0.0)
DISCONTINUED. Spark DataFrames for earth observation data.

Spark Utils (popularity 2.0, activity 3.8)
Basic framework utilities to quickly start writing production-ready Apache Spark applications.

Sparkplug (popularity 1.8, activity 0.0)
Spark package to "plug" holes in data using SQL-based rules ⚡️ 🔌

Spark Tools (popularity 1.0, activity 0.0)
Executable Apache Spark Tools: Format Converter & SQL Processor.

* Code quality rankings and insights are calculated and provided by Lumnify.
They range from L1 to L5, with L5 being the highest.


README

Shadoop

A Hadoop DSL and lightweight wrapper for Scala

This fork of ScalaHadoop is mostly just cherry-picked commits from the forks by @hito-asa, @ivmaykov and @oscarrenalias of the original work by @bsdfish. In addition, there are a few extra features and a cleaned-up Maven build.

This code provides some syntactic sugar on top of Hadoop in order to make it more usable from Scala. Take a look at src/main/scala/net/renalias/scoop/examples/WordCount.scala for more details.

License

Apache License, Version 2.0

Usage

Basic Usage

A basic mapper looks like:

val mapper = new Mapper[LongWritable, Text, Text, LongWritable] {
    mapWith {
        (k, v) =>
            (v split " |\t").map(x => (new Text(x), new LongWritable(1L))).toList
    }
}

A reducer looks like this:

val reducer = new Reducer[Text, LongWritable, Text, LongWritable] {
    reduceWith {
        (k, v) =>
            List((k, (0L /: v)((total, next) => total + next)))
    }
}

And the pipeline that binds them together may look like this:

TextInput[LongWritable, Text]("/tmp/input.txt") -->
MapReduceTask(mapper, reducer, "Word Count")    -->
TextOutput[Text, LongWritable]("/tmp/output")   execute
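To see what the mapper and reducer bodies above actually compute, here is a minimal sketch that applies the same tokenizing regex and the same fold to plain String and Long values, with no Hadoop or Shadoop classes involved. The object and method names are hypothetical. Two details are worth noting: the regex " |\t" matches a single separator, so two consecutive spaces produce an empty token, and `(0L /: v)(f)` is the symbolic alias of `v.foldLeft(0L)(f)`, which is deprecated from Scala 2.13 onwards.

```scala
object MapperReducerBodies {
  // Mirrors the mapper body on a plain String (hypothetical helper): split on a
  // single space or a single tab and pair every token with a count of 1.
  def mapLine(line: String): List[(String, Long)] =
    line.split(" |\t").toList.map(word => (word, 1L))

  // Mirrors the reducer body on plain Longs (hypothetical helper), using
  // foldLeft instead of the deprecated `/:` alias.
  def sumCounts(counts: Iterable[Long]): Long =
    counts.foldLeft(0L)((total, next) => total + next)

  def main(args: Array[String]): Unit = {
    println(mapLine("to be\tor"))        // List((to,1), (be,1), (or,1))
    println(mapLine("a  b"))             // List((a,1), (,1), (b,1)): note the empty token
    println(sumCounts(List(1L, 1L, 1L))) // 3
  }
}
```

If empty tokens are unwanted, a pattern such as "\\s+" (one or more whitespace characters) avoids them, at the cost of changing the original behaviour.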

The key difference from standard Hadoop mappers and reducers is that the map and reduce parts are written as side-effect-free functions that accept a key and a value (for a reducer, the iterable of values for that key) and return an iterable; code behind the scenes takes care of updating Hadoop's Context object.
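The contract described above can be modelled locally without Hadoop. The sketch below is an assumption-laden stand-in, not Shadoop's internals: the hypothetical `run` function plays the role of the code behind the scenes, applying a pure mapper to every record, grouping the intermediate pairs by key (the shuffle), and applying a pure reducer to each group. All names here are illustrative.

```scala
object LocalMapReduce {
  // Pure mapper and reducer shapes: inputs in, an Iterable of pairs out, no Context.
  type MapFn[K1, V1, K2, V2]    = (K1, V1) => Iterable[(K2, V2)]
  type ReduceFn[K2, V2, K3, V3] = (K2, Iterable[V2]) => Iterable[(K3, V3)]

  // Hypothetical local driver: map every record, group by intermediate key,
  // then reduce each group. Hadoop does the same work distributed across nodes.
  def run[K1, V1, K2, V2, K3, V3](
      input: Seq[(K1, V1)],
      mapper: MapFn[K1, V1, K2, V2],
      reducer: ReduceFn[K2, V2, K3, V3]
  ): Seq[(K3, V3)] = {
    val mapped  = input.flatMap { case (k, v) => mapper(k, v) }
    val grouped = mapped.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }
    grouped.toSeq.flatMap { case (k, vs) => reducer(k, vs) }
  }

  // Word count expressed with the pure shapes above (hypothetical names).
  val wordCountMapper: MapFn[Long, String, String, Long] =
    (_, line) => line.split(" |\t").toList.map(w => (w, 1L))

  val wordCountReducer: ReduceFn[String, Long, String, Long] =
    (word, counts) => List((word, counts.foldLeft(0L)(_ + _)))

  def main(args: Array[String]): Unit = {
    // Keys stand in for byte offsets, as with Hadoop's text input.
    val lines = Seq(0L -> "to be or", 9L -> "not to be")
    run(lines, wordCountMapper, wordCountReducer).sortBy(_._1).foreach(println)
  }
}
```

Because the mapper and reducer never touch shared state, the driver is free to run them in any order or on any machine, which is what makes this style a good fit for Hadoop.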

Some work still remains to polish the current interface, such as removing the need for .toList in the mapper and for explicitly creating Hadoop-specific Text and LongWritable objects.

Note that implicit conversions are used to convert between LongWritable and Long, and between Text and String. The types of the input and output parameters only need to be stated once, as the type parameters of the class being extended.
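A minimal sketch of how such conversions work in Scala. Since the real Text and LongWritable classes live in Hadoop, the sketch uses hypothetical stub classes (`TextStub`, `LongWritableStub`) so it runs standalone; the conversion names are also illustrative, not Shadoop's.

```scala
import scala.language.implicitConversions

object ImplicitConversionSketch {
  // Hypothetical stand-ins for Hadoop's Text and LongWritable, so this compiles
  // without Hadoop on the classpath. They are not the real Writable classes.
  final class TextStub(val value: String)
  final class LongWritableStub(val value: Long)

  // Conversions in both directions, of the kind the README describes.
  implicit def stringToText(s: String): TextStub         = new TextStub(s)
  implicit def textToString(t: TextStub): String         = t.value
  implicit def longToWritable(n: Long): LongWritableStub = new LongWritableStub(n)
  implicit def writableToLong(w: LongWritableStub): Long = w.value

  // `split` is not a member of TextStub, so the compiler inserts textToString.
  def wordCount(line: TextStub): Int = line.split(" ").length

  def main(args: Array[String]): Unit = {
    val line: TextStub        = "to be or" // stringToText inserted
    val one: LongWritableStub = 1L         // longToWritable inserted
    val back: Long            = one        // writableToLong inserted
    println(wordCount(line)) // 3
    println(back + 1L)       // 2
  }
}
```

This is why the mapper above can call `split` on its Text value directly: once the conversions are in scope, the compiler rewrites the call through the String view.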

These mappers and reducers can be chained together with the --> operator:

object WordCount extends ScalaHadoop {
  def run(args: Array[String]) : Int = {
    TextInput[LongWritable, Text](args(0)) -->
    MapReduceTask(mapper, reducer, "Main task") -->
    TextOutput[Text, LongWritable](args(1)) execute

    0 //result code
  }
}
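One way an operator like --> can be built in Scala is as an ordinary method with a symbolic name that composes stages left to right. The sketch below uses a hypothetical `Stage` class rather than Shadoop's TextInput, MapReduceTask and TextOutput, whose definitions are not shown in this README, so treat it as an illustration of the technique only.

```scala
object PipelineSketch {
  // Hypothetical stage type, not Shadoop's API: a named function from a batch
  // of input records to a batch of output records.
  final case class Stage[A, B](name: String, run: List[A] => List[B]) {
    // `-->` is a plain method with a symbolic name. It is left-associative,
    // so `a --> b --> c` means `(a --> b) --> c`.
    def -->[C](next: Stage[B, C]): Stage[A, C] =
      Stage(s"$name --> ${next.name}", run andThen next.run)

    def execute(input: List[A]): List[B] = run(input)
  }

  val tokenize = Stage[String, String]("tokenize", _.flatMap(_.split(" ")))
  val upper    = Stage[String, String]("upper", _.map(_.toUpperCase))
  val count    = Stage[String, (String, Int)]("count",
    words => words.groupBy(identity).toList.map { case (w, ws) => (w, ws.size) }.sortBy(_._1))

  def main(args: Array[String]): Unit = {
    val pipeline = tokenize --> upper --> count
    println(pipeline.name)                                // tokenize --> upper --> count
    println(pipeline.execute(List("to be", "or not to"))) // List((BE,1), (NOT,1), (OR,1), (TO,2))
  }
}
```

The README's own examples call `execute` in postfix position; Scala treats postfix calls as a language feature enabled with `import scala.language.postfixOps`, which is why the sketch calls `execute` with ordinary dot syntax.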

Multiple map/reduce

Multiple map/reduce runs may be chained together:

object WordsWithSameCount extends ScalaHadoop {
  def run(args: Array[String]) : Int = {
    TextInput[LongWritable, Text](args(0)) -->
    MapReduceTask(tokenizerMap1, sumReducer, "Sum") -->
    MapReduceTask(flipKeyValueMap, wordListReducer, "Reduce") -->
    TextOutput[LongWritable, Text](args(1)) execute

    0 //result code
  }
}
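The README does not show tokenizerMap1, sumReducer, flipKeyValueMap or wordListReducer. Judging only from their names and the task labels, the first job appears to count words and the second to flip (word, count) pairs and group words that share a count. The plain-Scala sketch below is a guess at that dataflow using ordinary collections; it is not those functions' actual code, and the method names are hypothetical.

```scala
object WordsWithSameCountSketch {
  // Stage 1 (like the "Sum" task): count occurrences of each word.
  def countWords(lines: Seq[String]): Map[String, Long] =
    lines.flatMap(_.split(" |\t")).groupBy(identity).map { case (w, ws) => (w, ws.size.toLong) }

  // Stage 2 (like the "Reduce" task): flip (word, count) into (count, word),
  // then collect the words that share each count.
  def wordsByCount(counts: Map[String, Long]): Map[Long, List[String]] =
    counts.toList
      .map { case (w, n) => (n, w) }
      .groupBy(_._1)
      .map { case (n, pairs) => (n, pairs.map(_._2).sorted) }

  def main(args: Array[String]): Unit = {
    val result = wordsByCount(countWords(Seq("to be or", "not to be")))
    result.toList.sortBy(_._1).foreach(println)
    // (1,List(not, or))
    // (2,List(be, to))
  }
}
```

In the Hadoop version each stage is a separate MapReduce job, and the output of the first job becomes the input of the second, which is what the chained --> calls express.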

Contributors

Alex Simma: developer of the original version of ScalaHadoop. https://github.com/bsdfish/ScalaHadoop
ASAI Hitoshi (cherry-picked): code reorganisation and initial Maven build. https://github.com/hiti-asa/ScalaHadoop
Ilya Maykov (cherry-picked): various fixes and support for multiple input paths. https://github.com/ivmaykov/ScalaHadoop
Oscar Renalias (cherry-picked): Scala syntax improvements. https://github.com/oscarrenalias/ScalaHadoop
Rob Walpole: various bug fixes. https://github.com/rwalpole/ScalaHadoop