Introduction

Every data scientist has to deal with data stored on disk in several formats, such as ASCII text, PDF, XML, and JSON. Data can also be stored in database tables. The first and foremost task for a data scientist, before doing any analysis, is to obtain data from these sources and formats and to apply data-cleaning techniques to remove the noise present in it. In this chapter, we will see recipes to accomplish this important task.

We will be using external Java libraries (Java archive files, or simply JAR files) not only in this chapter but throughout the book. These libraries are created by developers or organizations to make everybody's life easier. Throughout the book, we will use the Eclipse IDE for code development and execution, preferably on the Windows platform. Whenever a recipe instructs you to include external JAR files in your project, follow the procedure described here.

You can add a JAR file to a project in Eclipse by right-clicking on the project and selecting Build Path | Configure Build Path. Under the Libraries tab, click on Add External JARs..., and select the external JAR file(s) that you are going to use for that particular project.
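After adding the JARs, it can be useful to confirm that they are actually visible when your program runs. The following is a minimal sketch, not part of the book's recipes: the class `ClasspathCheck` and its methods `jarEntries` and `isOnClasspath` are hypothetical names chosen for illustration. It assumes a non-modular Eclipse project, where the build-path entries normally show up in the standard `java.class.path` system property at run time.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class (not from the book): reports which JAR files are on
// the runtime classpath and whether a given class can be loaded from them.
public class ClasspathCheck {

    // Splits a classpath string on the platform separator (";" on Windows,
    // ":" elsewhere) and keeps only the entries that end in ".jar".
    public static List<String> jarEntries(String classpath) {
        List<String> jars = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.toLowerCase(Locale.ROOT).endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    // Returns true if the named class can be found by this class's loader,
    // without running the class's static initializers.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // For a non-modular project launched from Eclipse, the build-path
        // entries normally appear in the java.class.path system property.
        for (String jar : jarEntries(System.getProperty("java.class.path"))) {
            System.out.println("JAR on classpath: " + jar);
        }
        // Optional: pass fully qualified class names as program arguments.
        for (String className : args) {
            System.out.println(className + (isOnClasspath(className) ? " found" : " NOT found"));
        }
    }
}
```

For example, if you had added Apache Commons IO, you could pass `org.apache.commons.io.FileUtils` as a program argument in the Eclipse run configuration; this class name is only an example of something you might look for. Passing `false` to `Class.forName` avoids running static initializers, so the check only tests visibility.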
