
Introduction

In the previous chapter, we saw how to use pandas to transform data and attributes obtained from raw sources into the expected attributes and values. Once the data is structured in tabular form, with each field containing the expected (correct and clean) values, it is prepared for further analysis, that is, for use in solving business problems. To ensure the best outcome for a project, we need to be clear about the scope of the data, the questions we can address with it, and the problems we can solve with it before we draw any useful inference from it.

To do that, we need to understand not only the kind of data we have, but also how attributes relate to one another, which attributes are useful to us, and how they vary in the data provided. Analyzing data in this way and exploring how we can use it is not a straightforward task. We have to perform several initial exploratory tests on our data, interpret their results, and possibly create and analyze more statistics and visualizations before we can make a statement about the scope or analysis of the dataset. In data science pipelines, this process is referred to as Exploratory Data Analysis (EDA).
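As a rough illustration of what such initial exploratory checks might look like, here is a minimal sketch in pandas. The DataFrame and its column names (ad_spend, visits, region) are hypothetical and invented for this example; they do not come from the chapter's datasets.

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "ad_spend": [100.0, 200.0, 300.0, 400.0, 500.0],
    "visits": [10, 20, 30, 40, 50],
    "region": ["north", "south", "north", "south", "north"],
})

# What kind of data do we have? Inspect the type of each attribute.
column_types = df.dtypes

# How do the numeric attributes vary? Count, mean, spread, and quartiles.
summary = df.describe()

# How are numeric attributes related to one another? Pairwise correlation.
correlations = df.select_dtypes(include="number").corr()

# How are the values of a categorical attribute distributed?
region_counts = df["region"].value_counts()

print(column_types)
print(summary)
print(correlations)
print(region_counts)
```

Each of these outputs is only a starting point: a surprising type, spread, or correlation usually prompts a further statistic or a plot before anything is concluded about the dataset.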

In this chapter, we will go through techniques for exploring and analyzing data by solving problems that are critical for businesses, such as identifying attributes useful for marketing, analyzing key performance indicators, performing comparative analyses, and generating insights and visualizations. We will use the pandas, Matplotlib, and seaborn libraries in Python to solve these problems.
