Data Processing Toolbox

In the previous chapter, we discussed best practices for approaching data science problems. We looked at CRISP-DM, a methodology for data mining projects in which one of the first steps is data preprocessing. In this chapter, we will take a closer look at how to do this in Java.

Specifically, we will cover the following topics:

  • Standard Java library
  • Extensions to the standard library
  • Reading data from different sources such as text, HTML, JSON, and databases
  • DataFrames for manipulating tabular data

Finally, we will put everything together to prepare the data for the search engine.

By the end of this chapter, you will be able to process data so that it can be used for machine learning and further analysis.
