- Java Data Science Cookbook
- Rushdi Shams
- 222 words
- 2021-07-09 18:44:25
Introduction
Every data scientist has to work with data stored on disk in a variety of formats, such as ASCII text, PDF, XML, and JSON; data can also be stored in database tables. Before doing any analysis, a data scientist's first task is to obtain the data from these sources and formats and to apply data-cleaning techniques that remove the noise present in it. In this chapter, we will see recipes to accomplish this important task.
We will be using external Java libraries (Java archive files, or simply JAR files) not only in this chapter but throughout the book. These libraries are created by developers or organizations to make everybody's life easier. We will be using the Eclipse IDE, preferably on the Windows platform, for code development and execution throughout the book. Whenever a recipe instructs you to include external JAR files in your project, follow the steps described here.
To add a JAR file to a project in Eclipse, right-click on the project and select Build Path | Configure Build Path. Under the Libraries tab, click on Add External JARs... and select the external JAR file(s) that you are going to use for that project.
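Once a JAR has been added to the build path, one quick way to confirm that Eclipse picked it up is to ask the JVM whether a class from that library can be found. The following is only a minimal sketch: the class name ClasspathCheck and its helper methods are hypothetical, and the default class name org.apache.commons.io.FileUtils is just a placeholder that assumes you added Apache Commons IO; substitute any class from the JAR you actually included.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: checks whether a class
// from an external JAR is visible on the current classpath.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the non-empty entries of the java.class.path system property. */
    static List<String> classpathEntries() {
        String classpath = System.getProperty("java.class.path", "");
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Placeholder class name (assumes Apache Commons IO was added);
        // pass a different fully qualified name as the first argument.
        String className = args.length > 0 ? args[0] : "org.apache.commons.io.FileUtils";

        System.out.println("Classpath entries:");
        for (String entry : classpathEntries()) {
            System.out.println("  " + entry);
        }
        System.out.println(className + " available: " + isClassAvailable(className));
    }
}
```

If the check reports that the class is not available, the JAR is most likely missing from the Libraries tab of the project's build path, or the class name is misspelled.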