Changelog History
v0.6 Changes
0.6.2
- Fixed `core` dependency to `scala-utils`; now using `scala-utils-core`
- Refactored the `core/implicits` package to make the implicits a little more explicit
0.6.1
- Small dependency and documentation improvements
- The documentation needs to be further reviewed
- The project is split into two modules: `spark-utils-core` and `spark-utils-io`
- The project moved to Apache Spark 3.0.1, which is a popular choice for Databricks cluster users
- The project is only compiled on Scala 2.12
- There is a major redesign of core components, mainly returning `Try[_]` for better exception handling
- Dependency updates
- Fixed
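The `Try[_]` redesign mentioned above can be illustrated with a minimal sketch in plain Scala. It does not use the spark-utils API: `LineSource`, `read()` and `describe` are hypothetical names chosen for illustration, and the only assumption is that a read operation wraps its possibly failing work in `scala.util.Try` so callers handle the failure as a value.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for a data source; this is not the spark-utils API.
final case class LineSource(lines: Seq[String]) {
  // Parsing may throw NumberFormatException; Try captures it as a Failure value.
  def read(): Try[Seq[Int]] = Try(lines.map(_.trim.toInt))
}

object TryExample {
  // The caller decides how to react to a failure instead of catching exceptions.
  def describe(result: Try[Seq[Int]]): String = result match {
    case Success(values) => s"read ${values.size} values"
    case Failure(error)  => s"read failed: ${error.getClass.getSimpleName}"
  }

  def main(args: Array[String]): Unit = {
    println(describe(LineSource(Seq("1", "2", "3")).read()))
    println(describe(LineSource(Seq("1", "oops")).read()))
  }
}
```

Returning `Try` keeps the failure visible in the method signature, which is presumably the kind of exception handling the redesign aims for.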
v0.5 Changes
N/A
v0.4 Changes
0.4.2
- The project compiles with both Scala `2.11.12` and `2.12.12`
- Updated Apache Spark to `2.4.6`
- Updated the `spark-xml` library to `0.10.0`
- Removed the `com.databricks:spark-avro` dependency, as Avro support is now built into Apache Spark
- Removed the shadow `org.apache.spark.Logging` class, which is replaced by the `org.tupol.spark.Logging` knock-off
0.4.1
- Added [`SparkFun`](docs/spark-fun.md), a convenience wrapper around [`SparkApp`](docs/spark-app.md) that makes the code even more concise
- Added `FormatType.Custom` so that any format type is accepted; not every arbitrary format will work, but other formats such as `delta` can now be configured and used
- Added `GenericSourceConfiguration` (replacing the old private `BasicConfiguration`) and `GenericDataSource`
- Added `GenericSinkConfiguration`, `GenericDataSink` and `GenericDataAwareSink`
- Removed the short `"avro"` format as it will be included in Spark 2.4
- Added format validation to `FileSinkConfiguration`
- Added [generic-data-source.md](docs/generic-data-source.md) and [generic-data-sink.md](docs/generic-data-sink.md) docs
0.4.0
- Added the `StreamingConfiguration` marker trait
- Added `GenericStreamDataSource`, `FileStreamDataSource` and `KafkaStreamDataSource`
- Added `GenericStreamDataSink`, `FileStreamDataSink` and `KafkaStreamDataSink`
- Added `FormatAwareStreamingSourceConfiguration` and `FormatAwareStreamingSinkConfiguration`
- Extracted `TypesafeConfigBuilder`
- API Changes: Added a new type parameter to the `DataSink` that describes the type of the output
- Improved unit test coverage
- The project compiles with both Scala
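To illustrate what an output type parameter on a sink can look like, here is a minimal sketch in plain Scala. All names (`Sink`, `EchoSink`, `HandleSink`, `QueryHandle`) and signatures are hypothetical and are not taken from spark-utils. The sketch assumes only that different sinks may want to return different things from `write`, for example the written data for a batch sink or some handle for a streaming sink.

```scala
// Hypothetical, simplified sink abstraction; names and signatures are assumptions.
trait Sink[In, WriteOut] {
  // The second type parameter describes what write() hands back to the caller.
  def write(data: In): WriteOut
}

// A batch-style sink that returns the data it was asked to write.
object EchoSink extends Sink[Seq[String], Seq[String]] {
  def write(data: Seq[String]): Seq[String] = data
}

// A streaming-style sink that returns a handle instead of the data.
final case class QueryHandle(id: Int)

object HandleSink extends Sink[Seq[String], QueryHandle] {
  // The handle id is derived from the input size purely for demonstration.
  def write(data: Seq[String]): QueryHandle = QueryHandle(data.size)
}
```

With the output type in the signature, code that calls `write` knows statically whether it receives data or a handle.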
v0.3 Changes
0.3.2
- Added support for bucketing in data sinks
- Improved the community resources
0.3.1
- Added configuration variable substitution support
0.3.0
- Split `SparkRunnable` into `SparkRunnable` and `SparkApp`
- Changed the `SparkRunnable` API; `run()` now returns `Result` instead of `Try[Result]`
- Changed the `SparkApp` API; `buildConfig()` was renamed to `createContext()` and now returns `Context` instead of `Try[Context]`
- Changed the `DataSource` API; `read()` now returns `DataFrame` instead of `Try[DataFrame]`
- Changed the `DataSink` API; `write()` now returns `DataFrame` instead of `Try[DataFrame]`
- Small documentation improvements
v0.2 Changes
0.2.0
- Added `DataSource` and `DataSink` IO frameworks
- Added `FileDataSource` and `FileDataSink` IO frameworks
- Added `JdbcDataSource` and `JdbcDataSink` IO frameworks
- Moved all useful implicit conversions into `org.tupol.spark.implicits`
- Added testing utilities under `org.tupol.spark.testing`
- Added