- Mastering Python for Data Science
- Samir Madhavan
- 2021-07-16 20:14:17
Chapter 1. Getting Started with Raw Data
In the world of data science, raw data comes in many forms and sizes, and a lot of information can be extracted from it. For example, Amazon collects clickstream data that records every click a user makes on the website. This data can be used to understand whether a user is a price-sensitive customer or prefers more popularly rated products. You must have noticed recommended products on Amazon; they are derived using such data.
The first step toward such an analysis is to parse the raw data. Parsing the data involves the following steps:
- Extracting data from the source: Data can come in many forms, such as Excel, CSV, JSON, databases, and so on. Python makes it very easy to read data from these sources with the help of some useful packages, which will be covered in this chapter.
- Cleaning the data: Once a sanity check has been done, the data needs to be cleaned appropriately so that it can be used for analysis. For example, a dataset about the students of a class may record their height, weight, and marks, and some rows may have the height or weight missing. Depending on the analysis being performed, rows with missing values can either be ignored or have the missing value replaced with the average height or weight.
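The two steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the chapter's own code: the sample data, the column names, and the helper functions (`read_rows`, `to_number`, `drop_missing`, `fill_with_mean`) are all hypothetical and chosen to mirror the student example. Packages such as pandas, covered later in this chapter, express the same operations more concisely.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: a blank field marks a missing measurement.
RAW_CSV = """name,height,weight,marks
Asha,150,45,82
Ben,,50,74
Chen,160,,91
"""

def read_rows(text):
    """Extract: parse CSV text into a list of dicts, one per student."""
    return list(csv.DictReader(io.StringIO(text)))

def to_number(value):
    """Convert a CSV field to float; a blank field counts as missing (None)."""
    return float(value) if value.strip() else None

def drop_missing(rows, columns):
    """Clean, option 1: ignore rows lacking a value in any listed column."""
    return [r for r in rows
            if all(to_number(r[c]) is not None for c in columns)]

def fill_with_mean(rows, columns):
    """Clean, option 2: replace missing values with the column average.

    Only the listed columns are converted to floats; other fields stay strings.
    """
    cleaned = [dict(r) for r in rows]  # copy so the input is left untouched
    for c in columns:
        known = [to_number(r[c]) for r in rows if to_number(r[c]) is not None]
        avg = mean(known)
        for r in cleaned:
            value = to_number(r[c])
            r[c] = value if value is not None else avg
    return cleaned

rows = read_rows(RAW_CSV)
complete = drop_missing(rows, ["height", "weight"])
filled = fill_with_mean(rows, ["height", "weight"])
```

Which option is appropriate depends on the analysis: dropping rows keeps only observed values but shrinks the dataset, while filling with the average keeps every row at the cost of introducing estimated values.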
In this chapter, we will cover the following topics:
- Exploring arrays with NumPy
- Handling data with pandas
- Reading and writing data from various formats
- Handling missing data
- Manipulating data