- Learning Hunk
- Dmitry Anoshin, Sergey Sheypak
The big problem
Hadoop is a distributed file system paired with a distributed framework designed to process large volumes of data. It is relatively easy to get data into Hadoop: there are plenty of tools for loading data in different formats, such as Apache Phoenix. However, it is actually extremely difficult to get value out of the data you put into Hadoop.
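To make the "getting data in" half of that claim concrete, here is a minimal sketch (not from the book) of loading a local log file into HDFS with Hadoop's Java FileSystem API. The NameNode address and both paths are placeholders for this illustration:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngest {
    public static void main(String[] args) throws Exception {
        // Point at the cluster's NameNode; this address is a placeholder.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy a local file into HDFS -- getting data *in* is the easy part.
            fs.copyFromLocalFile(new Path("/tmp/events.log"),
                                 new Path("/data/raw/events.log"));
        }
    }
}
```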
Let's look at the path from data to value. First, we have to collect the data. Then we spend a lot of time preparing it, making sure it is available for analysis, and only then are we able to question the data. The process is as follows:
(Diagram: the traditional path from data to value: collect data -> prepare data -> analyze -> ask questions)
Unfortunately, you may not have asked the right questions, or the answers may not be clear, so you have to repeat the cycle, perhaps transforming and reformatting your data again. In other words, it is a long and challenging process.
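Part of what makes each pass through the cycle slow is that, with plain Hadoop, a new question has traditionally meant writing, compiling, and submitting a whole new job. As an assumed illustration (not the book's code), here is the shape of a minimal MapReduce job that answers just one question: how often each status code appears in the ingested logs. The input and output paths, and the position of the status code in each line, are assumptions of this sketch:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class StatusCodeCount {

    // Emits (statusCode, 1) per log line; assumes the status code is the
    // third whitespace-separated field (an assumption for this sketch).
    public static class StatusMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text code = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\\s+");
            if (fields.length > 2) {
                code.set(fields[2]);
                context.write(code, ONE);
            }
        }
    }

    // Sums the counts emitted for each status code.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "status code count");
        job.setJarByClass(StatusCodeCount.class);
        job.setMapperClass(StatusMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/raw"));
        FileOutputFormat.setOutputPath(job, new Path("/data/answers/status-codes"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Answering even a slightly different question means editing, recompiling, and rerunning a job like this one, which is exactly the friction described above.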
What you actually want is to collect the data and spend some time preparing it once; after that, you can ask questions and get answers repeatedly. You can then spend your time asking many questions, iterating on them with the data to refine the answers you are looking for. Let's look at the following diagram to find a new approach:
(Diagram: the desired approach: collect and prepare the data once, then iterate on questions and answers)