Spark Utils v0.4 Release Notes

    0.4.2

    • The project compiles with both Scala 2.11.12 and 2.12.12
    • ⚡️ Updated Apache Spark to 2.4.6
    • ⚡️ Updated the spark-xml library to 0.10.0
    • ✂ Removed the com.databricks:spark-avro dependency, as avro support is now built into Apache Spark
    • ✂ Removed the shadow org.apache.spark.Logging class, which is replaced by the org.tupol.spark.Logging knock-off

    0.4.1

    • ➕ Added [SparkFun](docs/spark-fun.md), a convenience wrapper around [SparkApp](docs/spark-app.md) that makes the code even more concise
    • ➕ Added FormatType.Custom so that any format type is accepted; not every format will actually work, but other formats such as delta can now be configured and used
    • ➕ Added GenericSourceConfiguration (replacing the old private BasicConfiguration) and GenericDataSource
    • ➕ Added GenericSinkConfiguration, GenericDataSink and GenericDataAwareSink
    • ✂ Removed the short "avro" format, as it will be included in Spark 2.4
    • ➕ Added format validation to FileSinkConfiguration
    • ➕ Added [generic-data-source.md](docs/generic-data-source.md) and [generic-data-sink.md](docs/generic-data-sink.md) docs
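
The idea behind FormatType.Custom can be illustrated with a small, self-contained sketch. This is not the spark-utils source: the FormatTypeSketch wrapper, the fromString helper and the set of known formats are assumptions made for illustration, and the real definitions may differ. It only shows how a fallback case lets any format name through while leaving it to Spark to reject formats it cannot load.

```scala
// Hypothetical sketch; names echo the release note but are NOT the spark-utils source.
object FormatTypeSketch {

  sealed trait FormatType { def name: String }

  object FormatType {
    // A few formats known in advance (assumed subset, for illustration only).
    case object Csv     extends FormatType { val name = "csv" }
    case object Json    extends FormatType { val name = "json" }
    case object Parquet extends FormatType { val name = "parquet" }

    // Fallback for any format name that is not known in advance, e.g. "delta".
    final case class Custom(name: String) extends FormatType

    private val known: Map[String, FormatType] =
      Seq[FormatType](Csv, Json, Parquet).map(f => f.name -> f).toMap

    // Parsing never fails: unknown names are wrapped in Custom, so whether the
    // format really works is only decided later, when Spark tries to use it.
    def fromString(raw: String): FormatType = {
      val trimmed = raw.trim
      known.getOrElse(trimmed.toLowerCase, Custom(trimmed))
    }
  }
}
```

With this shape, a known name such as "Parquet" resolves to the predefined case, while "delta" becomes Custom("delta") and is passed along as-is.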

    0.4.0

    • ➕ Added the StreamingConfiguration marker trait
    • ➕ Added GenericStreamDataSource, FileStreamDataSource and KafkaStreamDataSource
    • ➕ Added GenericStreamDataSink, FileStreamDataSink and KafkaStreamDataSink
    • ➕ Added FormatAwareStreamingSourceConfiguration and FormatAwareStreamingSinkConfiguration
    • Extracted TypesafeConfigBuilder
    • API Changes: Added a new type parameter to the DataSink that describes the type of the output
    • 👌 Improved unit test coverage
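
The DataSink API change listed above can be pictured with a minimal, self-contained sketch. The traits and classes below (UntypedSink, TypedSink, CountingSink, HandleSink, Handle) are hypothetical and are not the spark-utils definitions. The assumption illustrated here is that an extra type parameter lets each sink declare what writing produces, which may be useful when batch-style and stream-style sinks return different things.

```scala
// Hypothetical sketch; these names are NOT the spark-utils API.
object DataSinkSketch {

  // Before (assumed shape): the result of writing is fixed for every sink.
  trait UntypedSink[In] {
    def write(data: In): Unit
  }

  // After (assumed shape): the extra type parameter Out describes the output.
  trait TypedSink[In, Out] {
    def write(data: In): Out
  }

  // A batch-like sink that reports how many records it wrote.
  final class CountingSink extends TypedSink[Seq[String], Int] {
    def write(data: Seq[String]): Int = data.size
  }

  // A stream-like sink that returns a handle instead of a finished result.
  final case class Handle(id: String, active: Boolean)

  final class HandleSink extends TypedSink[Seq[String], Handle] {
    def write(data: Seq[String]): Handle = Handle("query-1", active = true)
  }
}
```

Callers then get a result type that matches the sink they use, for example an Int from CountingSink and a Handle from HandleSink, without casting.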