  • Practical Data Analysis
  • Hector Cuesta
  • 334 words
  • 2021-07-23 15:59:29

The nature of data

Data is the plural of datum, so strictly speaking it takes a plural verb. We find data all around us, structured or unstructured, continuous or discrete: in weather records, stock market logs, photo albums, music playlists, and our Twitter accounts. In fact, data can be seen as the essential raw material of any kind of human activity. According to the Oxford English Dictionary:

Data are known facts or things used as basis for inference or reckoning.

As shown in the following figure, data can be divided into two distinct types: categorical and numerical.

Categorical data are values or observations that can be sorted into groups or categories. There are two types of categorical values: nominal and ordinal. A nominal variable has no intrinsic ordering among its categories. For example, housing is a nominal variable with two categories (own and rent). An ordinal variable has an established ordering. For example, age can be treated as an ordinal variable with three ordered categories (young, adult, and elder).
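The difference between nominal and ordinal values can be sketched in a few lines of Python. This is only an illustration using the standard library: the sample values and the AgeGroup name are made up for the example, and encoding the ordering as integer ranks is one possible choice among several.

```python
from collections import Counter
from enum import IntEnum

# Nominal: the categories have no intrinsic order, so counting and
# equality checks are the meaningful operations.
housing = ["own", "rent", "rent", "own", "rent"]  # made-up sample
housing_counts = Counter(housing)


# Ordinal: the categories have an established order. Here the order is
# encoded as integer ranks (an assumption made for this sketch).
class AgeGroup(IntEnum):
    YOUNG = 1
    ADULT = 2
    ELDER = 3


ages = [AgeGroup.ADULT, AgeGroup.YOUNG, AgeGroup.ELDER, AgeGroup.ADULT]

# Sorting and comparing are valid because the categories are ordered.
ages_sorted = sorted(ages)
oldest = max(ages)

print(housing_counts)
print([age.name for age in ages_sorted])
print(oldest.name)
```

Sorting the housing list would still run, but the result would only be alphabetical, which says nothing about the data. For the age groups, the sort order carries meaning.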

Numerical data are values or observations that can be measured. There are two kinds of numerical values: discrete and continuous. Discrete data are values or observations that can be counted and are distinct and separate, for example, the number of lines of code in a program. Continuous data are values or observations that may take on any value within a finite or infinite interval, for example, an economic time series such as historic gold prices.
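As a rough illustration, discrete counts map naturally onto Python integers and continuous measurements onto floats. The sketch below uses made-up sample values, and the is_discrete helper is a hypothetical, type-based heuristic rather than a general test: real datasets often store counts as floats, or continuous quantities rounded to a fixed precision.

```python
from statistics import mean

# Discrete: counts take distinct, separate values (made-up line counts).
lines_of_code = [120, 45, 300, 45]

# Continuous: measurements may take any value in an interval
# (made-up gold prices, not real market data).
gold_prices = [1287.35, 1290.10, 1284.95]


def is_discrete(values):
    """Heuristic: treat a sample as discrete if every value is an integer."""
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


print(is_discrete(lines_of_code))
print(is_discrete(gold_prices))

# Averages are meaningful for both kinds, but the mean of a discrete
# sample is generally not itself a possible observation.
print(mean(lines_of_code))
print(mean(gold_prices))
```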

The kinds of datasets used in this book are as follows:

  • E-mails (unstructured, discrete)
  • Digital images (unstructured, discrete)
  • Stock market logs (structured, continuous)
  • Historic gold prices (structured, continuous)
  • Credit approval records (structured, discrete)
  • Social media friends and relationships (unstructured, discrete)
  • Tweets and trending topics (unstructured, continuous)
  • Sales records (structured, continuous)

Each project in this book uses a different kind of data, so that the reader learns to address different kinds of data problems.