
Introducing data analytics

We analyze data every day for various reasons. Predicting an event or forecasting key indicators, such as the revenue of a given organization, is fast becoming a major requirement in the industry. Various techniques and tools can be leveraged to analyze data. Here are the techniques that will be covered in this book, using Stata as the tool:

  • Stata programming and data management: Before predicting anything, we need to manage and clean the data so that insights can be derived from it. The programming aspect helps in creating new variables and transforming the data so that finding patterns in historical data or predicting the outcome of a given event becomes much easier.
  • Data visualization: After the data preparation, we need to visualize the data for the following reasons:
    • To view what patterns in the data look like
    • To check whether there are any outliers in the data
    • To understand the data better
    • To draw preliminary insights from the data
  • Important statistical tests in Stata: After visualizing the data, you can form various hypotheses about it based on your observations. We need to test these hypotheses on the datasets to check whether the results are statistically significant and whether we can rely on these hypotheses in future situations as well.
  • Linear regression in Stata: Once the hypothesis testing is done, there is often a business need to predict one of the variables, such as what the revenue of a financial organization will be under specific conditions. Predictions of continuous variables, such as revenue, the default amount on a credit card, and the number of items sold in a given store, come through linear regression. Linear regression is the most basic and widely used prediction methodology. We will go into the details of linear regression in a later chapter.
  • Logistic regression in Stata: When you need to predict the outcome of a particular event along with its probability, logistic regression is one of the most widely used methods. Predicting which team will win a football or cricket match, or whether a customer will default on a loan payment, can be approached using the probabilities given by logistic regression.
  • Survey analysis in Stata: Understanding customer sentiment and consumer experience is one of the biggest requirements of the retail industry. The research industry also needs data about people's opinions in order to assess the effect of a certain event or the sentiments of the affected people. Both needs can be met by conducting surveys and analyzing the resulting datasets. Survey analysis can involve various subtechniques, such as factor analysis, principal component analysis, panel data analysis, and so on.
  • Time series analysis in Stata: When you try to forecast a time-dependent variable with reasonable cyclic behavior or seasonality, time series analysis comes in handy. There are many techniques of time series analysis, but we will talk about a couple of them: Autoregressive Integrated Moving Average (ARIMA) and Box-Jenkins. Forecasting the amount of rainfall based on the amount of rainfall in the past 5 years is a classic time series analysis problem.
  • Survival analysis in Stata: These days, many customers leave their telecom plans, healthcare plans, and so on, and join competitors. When you need to develop a churn or attrition model to check who is likely to leave, and when, survival analysis is a well-suited approach.