BigDL v0.5.0 Release Notes

Release Date: 2018-03-30
    Highlights

    • Bring in a Keras-like API (Scala and Python). Users can easily run their Keras code (training and inference) on Apache Spark through BigDL. For more details, see the Keras-like API documentation.
    • Support loading TensorFlow dynamic models (e.g. LSTM, RNN) in BigDL, and support more TensorFlow operations; see the documentation for the list of supported operations.
    • Support combining data preprocessing and neural network layers in the same model (to make model deployment easy)
    • Speed up various modules in BigDL (BCECriterion, RMSprop, LeakyReLU, etc.)
    • Add DataFrame-based image reader and transformer

    New Features

    • Tensor can be converted to OpenCVMat
    • Bring in a new Keras-like API for Scala and Python
    • Support loading TensorFlow dynamic models (e.g. LSTM, RNN)
    • Support loading more TensorFlow operations (InvertPermutation, ConcatOffset, Exit, NextIteration, Enter, RefEnter, LoopCond, ControlTrigger, TensorArrayV3, TensorArrayGradV3, TensorArrayGatherV3, TensorArrayScatterV3, TensorArrayConcatV3, TensorArraySplitV3, TensorArrayReadV3, TensorArrayWriteV3, TensorArraySizeV3, StackPopV2, StackPop, StackPushV2, StackPush, StackV2, Stack)
    • ResizeBilinear supports NCHW
    • ImageFrame supports loading Hadoop sequence files
    • ImageFrame supports gray images
    • Add Kv2Tensor operation (Scala)
    • Add PGCriterion to compute the negative policy gradient given an action distribution, a sampled action and a reward
    • Support gradually increasing the learning rate in LearningrateScheduler
    • Add FixExpand and add more options to AspectScale for image preprocessing
    • Add RowTransformer (Scala)
    • Support adding preprocessors to Graph, which allows users to combine preprocessing and a trainable model into one model
    • ResNet on CIFAR-10 example supports loading images from HDFS
    • Add CategoricalColHashBucket operation (Scala)
    • Predictor supports Table as output
    • Add BucketizedCol operation (Scala)
    • Support using DenseTensor and SparseTensor together to create a Sample
    • Add CrossProduct layer (Scala)
    • Provide an option to allow users to bypass exceptions in transformers
    • DenseToSparse layer supports disabling backward propagation
    • Add CategoricalColVocaList operation (Scala)
    • Support ImageFrame in the Python optimizer
    • Support getting the executor number and executor cores in Python
    • Add IndicatorCol operation (Scala)
    • Add TensorOp, which is an operation with Tensor[T]-formatted input and output, and provides shortcuts to build Operations for tensor transformation by closures (Scala)
    • Provide a Dockerfile to make it easy to set up a testing environment for BigDL
    • Add CrossCol operation (Scala)
    • Add MkString operation (Scala)
    • Add a prediction service interface that supports concurrent calls and accepts bytes input
    • Add SparseTensor.cast & SparseTensor.applyFun
    • Add DataFrame-based image reader and transformer
    • Support loading TensorFlow model files saved by the tf.saved_model API
    • SparseMiniBatch supports multiple TensorDataTypes
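
To make the PGCriterion entry above more concrete, here is a minimal sketch of a negative policy gradient loss for a single sampled step, written in plain Python. It is not BigDL's implementation and does not use the BigDL API; the function names (negative_policy_gradient_loss, loss_gradient) and the list-based inputs are assumptions made for illustration only.

```python
import math


def negative_policy_gradient_loss(action_probs, action, reward):
    """Return -reward * log(p[action]) for one sampled action.

    Illustrative only: names and signature are hypothetical, not BigDL's.
    """
    p = action_probs[action]
    if p <= 0.0:
        raise ValueError("sampled action must have non-zero probability")
    return -reward * math.log(p)


def loss_gradient(action_probs, action, reward):
    """Gradient of the loss with respect to each action probability.

    Only the entry of the sampled action is non-zero.
    """
    grad = [0.0] * len(action_probs)
    grad[action] = -reward / action_probs[action]
    return grad


# Action distribution over three actions, sampled action 1, reward 2.0.
probs = [0.2, 0.5, 0.3]
loss = negative_policy_gradient_loss(probs, action=1, reward=2.0)
grad = loss_gradient(probs, action=1, reward=2.0)
```

Minimizing this loss raises the probability of actions that earned a positive reward, which is why the criterion returns the negative of the policy gradient objective.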

    Enhancement

    • ImageFrame supports serialization
    • A default implementation of zeroGradParameter is added to AbstractModule
    • Improve the style of the documentation website
    • Models in different threads share weights in model training
    • Speed up LeakyReLU
    • Speed up RMSprop
    • Speed up BCECriterion
    • Support calling Java functions in Python executors and ModelBroadcast in Python
    • Add detailed instructions to run-on-ec2
    • Optimize padding mechanism
    • Fix Maven compile warnings
    • Check for duplicate layers in the container
    • Refine the document that describes how to automatically deploy BigDL on a Dataproc cluster
    • Refactor adding extra jars/Python packages for Python users. Now you only need to set the environment variables BIGDL_JARS & BIGDL_PACKAGES
    • Implement appendColumn to avoid the error caused by API mismatch between different Spark versions
    • Add Python Inception training on ImageNet example
    • Change "can't find locality partition for partition ..." to a warning message

    API change

    • Move DataFrame-based API to the dlframe package
    • Refine the Container hierarchy. The add method (used in Sequential, Concat…) is moved to a subclass DynamicContainer
    • Refine the serialization code hierarchy
    • Dynamic Graph is now an internal class that is only used to run TensorFlow models
    • Operations are not allowed to be used outside Graph
    • The getParameters method is now final and private[bigdl]; it should only be used in model training
    • Remove the updateParameter method, which was only used in internal tests
    • Some TensorFlow-related operations are marked as internal; they should only be used when running TensorFlow models

    Bug Fix

    • Fix sparse sample batch bug. It should add another dimension instead of concatenating the original tensors
    • Fix some activations or layers not working in TimeDistributed and RnnCell
    • Fix a bug in the SparseTensor resize method
    • Fix a bug when converting SparseTensor to DenseTensor
    • Fix a bug in SpatialFullConvolution
    • Fix a bug in Cosine equal method
    • Fix optimization state being messed up when optimizer.optimize() is called multiple times
    • Fix a bug in Recurrent forward after invoking reset
    • Fix a bug in in-place LeakyReLU
    • Fix a bug when saving/loading bi-RNN layers
    • Fix getParameters() in a submodule creating new storage when parameters have been shared by the parent module
    • Fix some incompatible syntax between Python 2.7 and 3.6
    • Fix save/load of a graph losing stop gradient information
    • Fix a bug in SReLU
    • Fix a bug in DLModel
    • Fix sparse tensor dot product bug
    • Fix Maxout serialization issue
    • Fix some serialization issues in some customized Faster R-CNN models
    • Fix and refine some example document instructions
    • Fix a bug in the export_tf_checkpoint.py script
    • Fix a bug in setting up the Python package
    • Fix pickler initialization issues
    • Fix some race condition issues in Spark 1.6 when broadcasting the model
    • Fix wrong return type of Model.load in Python
    • Fix a bug when using pyspark-with-bigdl.sh to run jobs on YARN
    • Fix empty tensor call size and stride not throw null exception
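
The first fix in the list above (batching should add a dimension rather than concatenate) can be pictured with a small plain-Python sketch. It uses dense nested lists instead of BigDL's sparse tensors, and the helper names concat_samples and stack_samples are hypothetical; it only illustrates the difference in resulting shape.

```python
def concat_samples(samples):
    """Join samples end to end; the boundaries between samples are lost."""
    batch = []
    for sample in samples:
        batch.extend(sample)
    return batch


def stack_samples(samples):
    """Add a leading batch dimension; each sample stays a separate row."""
    width = len(samples[0])
    if any(len(sample) != width for sample in samples):
        raise ValueError("all samples must have the same shape")
    return [list(sample) for sample in samples]


# Two samples, each with three features.
samples = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]

concatenated = concat_samples(samples)  # one flat vector of length 6
stacked = stack_samples(samples)        # 2 rows of 3 features each
```

With concatenation, a batch of two 3-element samples becomes a single 6-element vector, whereas stacking yields a 2 x 3 batch whose first dimension indexes the samples.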