scala-scraper v2.0.0 Release Notes

Release Date: 2017-07-21
    • 💥 Breaking changes
      • Extracting with a CSS query string as the extractor now returns elements instead of text. This makes it easier to chain extractors and CSS selectors and fits the current extractor model better. The old behavior can be recovered by wrapping the CSS query string in the texts content extractor, e.g. doc >> texts("myQuery");
      • HtmlExtractor, HtmlValidator and ElementQuery now take an additional type parameter for the type of Element they work on. If you have custom instances of any of these classes, setting the new parameter to Element (the supertype of all elements) should be enough for them to keep working with existing code written against scala-scraper 1.x;
      • Methods for loading extractors and validators from a config were moved to a separate module. To use them, add scala-scraper-config to your SBT dependencies and import net.ruippeixotog.scalascraper.config.dsl.DSL._;
      • The implicit conversion of Validated/Either to a RightProjection in order to expose foreach, map and flatMap in for comprehensions was moved to a separate object that is not imported together with the DSL. Either upgrade to Scala 2.12 (in which Either is already right-biased) or import the new net.ruippeixotog.scalascraper.util.EitherRightBias support object;
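The first and last breaking changes can be illustrated with a minimal sketch, assuming scala-scraper 2.0.0 on the classpath, the Jsoup-backed JsoupBrowser, and Scala 2.12; the object name CssQueryMigration and the sample HTML are hypothetical. It shows that a bare CSS query now yields elements, that wrapping it in texts restores the 1.x result, and that on Scala 2.12 a plain Either works in a for comprehension without the EitherRightBias import.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical example object; assumes scala-scraper 2.0.0 and Scala 2.12
object CssQueryMigration {
  val doc = JsoupBrowser().parseString(
    "<html><body><p class='note'>first</p><p class='note'>second</p></body></html>")

  // 2.x: a bare CSS query string extracts the matching elements, not their text
  val notes = doc >> "p.note"

  // 1.x behavior: wrap the query in the texts content extractor to get strings
  val noteTexts = doc >> texts("p.note")

  // Scala 2.12: Either is right-biased, so map/flatMap work without extra imports
  val x: Either[String, Int] = Right(1)
  val y: Either[String, Int] = Right(2)
  val sum: Either[String, Int] = for {
    a <- x
    b <- y
  } yield a + b

  def main(args: Array[String]): Unit = {
    println(notes.size)       // number of matched <p> elements
    println(noteTexts.toList) // their text content
    println(sum)
  }
}
```

On Scala 2.11, the same for comprehension would instead need the EitherRightBias import mentioned above.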
    • 🗄 Deprecations
      • SimpleExtractor and SimpleValidator are now deprecated. The classes remain available for the time being, but DSL methods that returned those classes now return only HtmlExtractor and HtmlValidator instances;
      • The Validated type alias is now deprecated. Users should now use Either, Right and Left directly;
      • The asDate content parser was deprecated in favor of asLocalDate and asDateTime;
      • The DSL validation operator ~/~ was renamed to >/~ in order to have the same precedence as the extraction operators >> and >?>;
      • The "and" DSL operator is deprecated and will be removed in a future version;
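The renamed validation operator and the move from Validated to Either can be sketched as follows, again assuming scala-scraper 2.0.0 with JsoupBrowser; the object name ValidationMigration, the describe helper and the sample page are hypothetical. The validator is applied with >/~ and its result is treated as an ordinary Either, so Right and Left are matched directly.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical example object; assumes scala-scraper 2.0.0
object ValidationMigration {
  val doc = JsoupBrowser().parseString(
    "<html><head><title>Test page</title></head><body></body></html>")

  // >/~ replaces the deprecated ~/~ operator
  val titleOk = doc >/~ validator(text("title"))(_ == "Test page")
  val titleWrong = doc >/~ validator(text("title"))(_ == "Another page")

  // Hypothetical helper: the result is a plain Either, so match on Right/Left
  def describe(result: Either[_, _]): String = result match {
    case Right(_) => "valid"
    case Left(_)  => "invalid"
  }

  def main(args: Array[String]): Unit = {
    println(describe(titleOk))
    println(describe(titleWrong))
  }
}
```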
    • 🆕 New features
      • The concrete type of the models in scala-scraper is now passed down from the Browser to the Element instances extracted from its documents. This lets users access features unique to each browser (such as modifying or interacting with elements) while still using the scala-scraper DSL to extract and query them;
      • HtmlExtractor[E, A] is now a proper ElementQuery[E] => A function and has map and mapQuery methods for mapping the extraction result and the preceding query, respectively;
      • Content extractors, which were previously just functions, are now full-fledged HtmlExtractor instances and can be used by themselves, e.g. doc >> elements, doc >> elementList("myQuery") >> formData;
      • A new PolyHtmlExtractor class was created, allowing the implementation of extractors whose return type depends on the type of the element or document being extracted;
      • Overall code cleanup and simplification of some concepts.
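As a sketch of the new extractor model, assuming scala-scraper 2.0.0 with JsoupBrowser (the object name NewExtractorSketch and the list markup are hypothetical), the text content extractor is chained after a query extractor, and HtmlExtractor#map post-processes an extraction result before it is returned.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical example object; assumes scala-scraper 2.0.0
object NewExtractorSketch {
  val doc = JsoupBrowser().parseString(
    "<html><body><ul><li>a</li><li>bb</li><li>ccc</li></ul></body></html>")

  // Content extractors are HtmlExtractor instances: extract the <li> elements,
  // then apply the text extractor to each of them
  val items = doc >> elementList("li") >> text

  // map transforms the extraction result (here, each text into its length)
  val lengths = doc >> texts("li").map(_.map(_.length))

  def main(args: Array[String]): Unit = {
    println(items)
    println(lengths.toList)
  }
}
```

The chaining pattern mirrors the doc >> elementList("myQuery") >> formData example above, with text in place of formData.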