
The big problem

Hadoop combines a distributed file system with a distributed computing framework designed to process large volumes of data. It is relatively easy to get data into Hadoop, and there are plenty of tools for ingesting it and working with it in different formats, such as Apache Phoenix. However, it is actually extremely difficult to get value out of the data you put into Hadoop.
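To make the first claim concrete, the sketch below shows how simple the ingestion step can be: it copies a local file into HDFS using the standard Hadoop shell commands. This assumes a configured Hadoop client is available on the PATH, and the file and directory names are hypothetical placeholders.

```python
import subprocess

# Hypothetical local file and HDFS target directory.
local_file = "events.csv"
hdfs_dir = "/data/raw/events"

# Create the target directory (no error if it already exists).
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)

# Upload the file, overwriting any existing copy.
subprocess.run(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir], check=True)
```

Getting the data in is the easy part; the hard part, as the rest of this section argues, is turning it into answers.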

Let's look at the path from data to value. First, we have to collect the data. Then we spend a lot of time preparing it, making sure it is available for analysis, and only then can we start asking questions of it. This process is as follows:

Unfortunately, you may find that you did not ask the right questions, or that the answers are not clear, and you have to repeat the cycle, perhaps transforming and reformatting your data again. In other words, it is a long and challenging process.

What you actually want is to collect the data and spend some time preparing it once; then you can ask questions and get answers iteratively. This way, you can spend most of your time asking multiple questions and iterating over the data to refine the answers you are looking for. Let's look at the following diagram, in order to find a new approach:
