
Python for Data Wrangling

There is an ongoing debate about whether to perform data wrangling with an enterprise tool or with a programming language and its associated frameworks. Many commercial, enterprise-level tools for data formatting and pre-processing require little coding on the part of the user. Examples include the following:

  • General-purpose data analysis platforms such as Microsoft Excel (with add-ins)
  • Statistical discovery packages such as JMP (from SAS)
  • Modeling platforms such as RapidMiner
  • Analytics platforms from niche players focusing on data wrangling, such as Trifacta, Paxata, and Alteryx

However, programming languages such as Python provide more flexibility, control, and power compared to these off-the-shelf tools.

As the volume, velocity, and variety of data (the three Vs of big data) change rapidly, it is a good idea to develop and nurture substantial in-house expertise in data wrangling using fundamental programming frameworks, so that an organization does not depend on the whims and fancies of any enterprise platform for a task as basic as data wrangling:

Figure 1.2: Google Trends worldwide over the last five years

A few of the obvious advantages of using a free, open source programming language such as Python for data wrangling are the following:

  • A general-purpose, open source language that places no restrictions on the methods you can develop for the specific problem at hand
  • Great ecosystem of fast, optimized, open source libraries, focused on data analytics
  • Growing support for connecting Python to a wide range of data source types
  • Easy interface to basic statistical testing and quick visualization libraries to check data quality
  • Seamless interface of the data wrangling output with advanced machine learning models
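
As a small illustration of the quick data-quality checks mentioned in the list above, the following sketch uses only the Python standard library. The inline CSV sample, the column names, and the helper name summarize_column are hypothetical and exist only for this example; in practice, a dedicated library such as pandas would typically handle this kind of task.

```python
import csv
import io
import statistics

# Hypothetical sample data: one empty cell and one non-numeric entry.
RAW_CSV = """name,age
Alice,34
Bob,
Carol,29
Dave,n/a
Eve,41
"""


def summarize_column(csv_text, column):
    """Return basic quality statistics for one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    invalid = 0
    for row in reader:
        cell = (row.get(column) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            invalid += 1  # empty or non-numeric cell
    return {
        "valid": len(values),
        "invalid": invalid,
        "mean": statistics.mean(values) if values else None,
        "median": statistics.median(values) if values else None,
    }


summary = summarize_column(RAW_CSV, "age")
print(summary)
```

Because the check is ordinary Python code, it can be adapted to whatever rule a particular dataset needs, for example by treating out-of-range ages as invalid as well.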

Python is currently one of the most popular languages for machine learning and artificial intelligence.
