Architecture and integration with applications

The architecture is well covered in the official documentation at http://predictionio.incubator.apache.org/system/. In this section, we expand on its most important aspects to show how flexible the platform is and what it offers.

The following diagram is from the official documentation of PredictionIO:

The key things to understand from the preceding diagram are as follows:

  • Event Server provides a RESTful endpoint where applications can drop events in real time. For an application such as a product recommender, events may include a buyer viewing a product, a buyer adding a product to a cart, readings from IoT devices, and so on. In the current version of PredictionIO, Event Server can use PostgreSQL 9.1/MySQL 5.1 or Apache HBase/Elasticsearch as the event data store. PredictionIO allows different engines to be used in training, but many algorithms come from Spark's MLlib. For scalable applications with large data volumes, it is better to consider Apache HBase, an open source, distributed, versioned, non-relational database capable of handling billions of records for training.
  • Training: PredictionIO uses Apache Spark to train models on the dataset. Apache Spark offers extensive API support for developers working with its data structures, and most of the templates use libraries such as Spark MLlib to directly access machine learning functions developed by data scientists.
  • Prediction Server provides a RESTful endpoint to which applications submit queries in real time and get predictive results. The output of training has two parts: a model and its metadata. The model is then stored in the Hadoop Distributed File System (HDFS), a local file system, or Elasticsearch.
HDFS is a distributed filesystem from Hadoop that allows storage to be shared among clustered machines. It is used to stage data for batch import into PredictionIO (PIO), for the export of Event Server datasets, and for the storage of some models. Elasticsearch is a distributed, RESTful search and analytics engine at the core of the Elastic Stack; it stores your data centrally so that it can be searched and analyzed.
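
To make the two RESTful endpoints described above more concrete, here is a minimal Python sketch of how an application might talk to them. It rests on several assumptions that do not come from this section: that Event Server listens on port 7070 and accepts events at /events.json with an accessKey query parameter, that a deployed engine listens on port 8000 and accepts queries at /queries.json, and that the engine is a recommender taking "user" and "num" fields. The access key, entity IDs, and helper function names are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_event_request(base_url, access_key, event, entity_type, entity_id,
                        target_entity_type=None, target_entity_id=None,
                        properties=None):
    """Build (but do not send) a POST request that records one event."""
    body = {"event": event, "entityType": entity_type, "entityId": entity_id}
    # Target entity and properties are optional, e.g. the item a user viewed.
    if target_entity_type is not None:
        body["targetEntityType"] = target_entity_type
    if target_entity_id is not None:
        body["targetEntityId"] = target_entity_id
    if properties:
        body["properties"] = properties
    # Assumed endpoint shape: /events.json?accessKey=<key>
    url = f"{base_url}/events.json?{urlencode({'accessKey': access_key})}"
    return Request(url, data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


def build_query_request(base_url, query):
    """Build (but do not send) a POST request asking a deployed engine for predictions."""
    # Assumed endpoint shape: /queries.json; the query fields depend on the engine template.
    return Request(f"{base_url}/queries.json",
                   data=json.dumps(query).encode("utf-8"),
                   headers={"Content-Type": "application/json"}, method="POST")


def send(request):
    """Send a prepared request and decode the JSON response (needs a running server)."""
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical values: a buyer "u1" views product "i42".
event_request = build_event_request("http://localhost:7070", "YOUR_ACCESS_KEY",
                                    "view", "user", "u1", "item", "i42")
# Hypothetical recommender query: four recommendations for the same buyer.
query_request = build_query_request("http://localhost:8000", {"user": "u1", "num": 4})

print(event_request.get_method(), event_request.full_url)
print(query_request.get_method(), query_request.full_url)
```

The sketch only builds the requests; calling send(event_request) or send(query_request) would require a running Event Server and a trained, deployed engine. Keeping request construction separate from sending makes the payloads easy to inspect before pointing them at a real installation.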