All Versions
5
Latest Version
0.6

Changelog History

  • v0.6 Changes

    0.6.2

    • Fixed the core dependency on scala-utils; now using scala-utils-core
    • Refactored the core/implicits package to make the implicits a little more explicit

    0.6.1

    • Small dependency and documentation improvements
    • The documentation needs to be further reviewed
    • The project is split into two modules: spark-utils-core and spark-utils-io
    • The project moved to Apache Spark 3.0.1, a popular choice among Databricks cluster users
    • The project is only compiled on Scala 2.12
    • Major redesign of the core components, which now mainly return Try[_] for better exception handling
    • Dependency updates
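
As a rough illustration of the Try[_]-returning style mentioned under 0.6.1, the sketch below uses only scala.util.Try from the standard library. The names TrySketch, readNumbers, average and run are hypothetical and are not part of spark-utils; they only show how steps that return Try[_] can be chained so that the first failure is carried through as a value instead of being thrown.

```scala
import scala.util.{Failure, Success, Try}

object TrySketch {
  // Hypothetical stand-in for a read step: parsing may throw, so the result is wrapped in Try
  def readNumbers(raw: String): Try[Seq[Int]] =
    Try(raw.split(",").toSeq.map(_.trim.toInt))

  // Hypothetical processing step that can also fail
  def average(numbers: Seq[Int]): Try[Double] =
    Try {
      require(numbers.nonEmpty, "no numbers to average")
      numbers.sum.toDouble / numbers.size
    }

  // Steps compose in a for-comprehension; the first failure short-circuits the rest
  def run(raw: String): Try[Double] =
    for {
      numbers <- readNumbers(raw)
      avg     <- average(numbers)
    } yield avg

  def main(args: Array[String]): Unit =
    Seq("1, 2, 3", "1, x, 3").foreach { input =>
      run(input) match {
        case Success(value) => println(s"'$input' -> $value")
        case Failure(error) => println(s"'$input' failed: ${error.getMessage}")
      }
    }
}
```

With this style the caller decides how to react to a failure (log, retry, abort) by pattern matching on the result rather than by catching exceptions.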
  • v0.4 Changes

    0.4.2

    • The project compiles with both Scala 2.11.12 and 2.12.12
    • Updated Apache Spark to 2.4.6
    • Updated the spark-xml library to 0.10.0
    • Removed the com.databricks:spark-avro dependency, as avro support is now built into Apache Spark
    • Removed the shadow org.apache.spark.Logging class, which is replaced by the org.tupol.spark.Logging knock-off

    0.4.1

    • Added [SparkFun](docs/spark-fun.md), a convenience wrapper around [SparkApp](docs/spark-app.md) that makes the code even more concise
    • Added FormatType.Custom so that any format type is accepted; not every format will actually work, but other formats such as delta can now be configured and used
    • Added GenericSourceConfiguration (replacing the old private BasicConfiguration) and GenericDataSource
    • Added GenericSinkConfiguration, GenericDataSink and GenericDataAwareSink
    • Removed the short "avro" format as it will be included in Spark 2.4
    • Added format validation to FileSinkConfiguration
    • Added [generic-data-source.md](docs/generic-data-source.md) and [generic-data-sink.md](docs/generic-data-sink.md) docs

    0.4.0

    • Added the StreamingConfiguration marker trait
    • Added GenericStreamDataSource, FileStreamDataSource and KafkaStreamDataSource
    • Added GenericStreamDataSink, FileStreamDataSink and KafkaStreamDataSink
    • Added FormatAwareStreamingSourceConfiguration and FormatAwareStreamingSinkConfiguration
    • Extracted TypesafeConfigBuilder
    • API Changes: Added a new type parameter to the DataSink that describes the type of the output
    • Improved unit test coverage
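
The idea behind the new output type parameter noted under 0.4.0 can be sketched as follows. This is not the actual DataSink definition from spark-utils: Sink, EchoSink, HandleSink and Handle are hypothetical names, and the assumption that different sinks need different result types is an illustration only.

```scala
object SinkSketch {
  // Hypothetical trait: the second type parameter describes what write() produces
  trait Sink[In, Out] {
    def write(data: In): Out
  }

  // A sink that hands back the data it has just written
  object EchoSink extends Sink[Seq[String], Seq[String]] {
    def write(data: Seq[String]): Seq[String] = data
  }

  // Hypothetical handle type, standing in for something like a running job reference
  final case class Handle(records: Int)

  // A sink that returns a handle instead of the data itself
  object HandleSink extends Sink[Seq[String], Handle] {
    def write(data: Seq[String]): Handle = Handle(data.size)
  }
}
```

Making the output type a parameter lets each sink declare its own result, so callers get a precise type without casting.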
  • v0.3 Changes

    0.3.2

    • Added support for bucketing in data sinks
    • Improved the community resources

    0.3.1

    • Added configuration variable substitution support

    0.3.0

    • Split SparkRunnable into SparkRunnable and SparkApp
    • Changed the SparkRunnable API; now run() returns Result instead of Try[Result]
    • Changed the SparkApp API; buildConfig() was renamed to createContext() and now returns Context instead of Try[Context]
    • Changed the DataSource API; now read() returns DataFrame instead of Try[DataFrame]
    • Changed the DataSink API; now write() returns DataFrame instead of Try[DataFrame]
    • Small documentation improvements
  • v0.2 Changes

    0.2.0

    • Added DataSource and DataSink IO frameworks
    • Added FileDataSource and FileDataSink IO frameworks
    • Added JdbcDataSource and JdbcDataSink IO frameworks
    • Moved all useful implicit conversions into org.tupol.spark.implicits
    • Added testing utilities under org.tupol.spark.testing