Introducing H2O.ai

H2O is an open source machine learning platform that works extremely well with Spark; in fact, it was one of the first third-party packages deemed "Certified on Spark".

Sparkling Water (H2O + Spark) is H2O's integration of its platform with Spark, combining the machine learning capabilities of H2O with all the functionality of Spark. Users can therefore run H2O algorithms on a Spark RDD or DataFrame for both exploration and deployment. This is possible because H2O and Spark share the same JVM, which allows seamless transitions between the two platforms. H2O stores data in an H2O frame, a columnar, compressed representation of your dataset that can be created from a Spark RDD or DataFrame. Throughout much of this book, we will reference algorithms from both Spark's MLlib library and H2O's platform, showing how to use both libraries to get the best possible results for a given task.
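
To give a feel for what a "columnar, compressed representation" means, here is a minimal toy sketch in plain Python. It is not H2O's implementation and does not use the H2O or Spark APIs; the function names (to_columnar, dictionary_encode, decode) and the sample data are hypothetical, chosen only for illustration. It pivots row-oriented records into columns and then dictionary-encodes a repetitive string column, which is one common way a columnar store can save space.

```python
def to_columnar(rows):
    """Pivot a list of row dicts into a dict of column lists.

    Assumes every row has the same keys (illustrative only).
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def dictionary_encode(values):
    """Replace repeated values with integer codes plus a lookup table."""
    levels = []   # distinct values, in order of first appearance
    index = {}    # value -> integer code
    codes = []    # one small integer per original value
    for value in values:
        if value not in index:
            index[value] = len(levels)
            levels.append(value)
        codes.append(index[value])
    return levels, codes


def decode(levels, codes):
    """Rebuild the original column from its codes and lookup table."""
    return [levels[code] for code in codes]


# Hypothetical sample data in row-oriented form.
rows = [
    {"city": "Prague", "temp": 21.5},
    {"city": "Brno", "temp": 19.0},
    {"city": "Prague", "temp": 22.1},
]

columns = to_columnar(rows)
levels, codes = dictionary_encode(columns["city"])
```

Because each column holds values of a single type, a column with many repeated values can be stored as short integer codes plus a small table of distinct values, and decode(levels, codes) recovers the original column exactly.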

The following is a summary of the features Sparkling Water comes equipped with:

  • Use of H2O algorithms within a Spark workflow
  • Transformations between Spark and H2O data structures
  • Use of Spark RDD and/or DataFrame as inputs to H2O algorithms
  • Use of H2O frames as inputs to MLlib algorithms (this will come in handy when we do feature engineering later)
  • Transparent execution of Sparkling Water applications on top of Spark (for example, we can run a Sparkling Water application within a Spark stream)
  • The H2O user interface to explore Spark data