Deep diving into a concrete example

Early on, we wanted to build a data pipeline that extracted insights from Twitter by running sentiment analysis on tweets containing specific hashtags and deployed the results to a real-time dashboard. This application was a perfect starting point for us, because the data science analytics were not too complex, and the application covered many aspects of a real-life scenario:

  • High volume, high throughput streaming data
  • Data enrichment with sentiment analysis NLP
  • Basic data aggregation
  • Data visualization
  • Deployment into a real-time dashboard

To try things out, the first implementation was a simple Python application that used the tweepy library (a widely used, community-maintained Python client for the Twitter API: https://pypi.python.org/pypi/tweepy) to connect to Twitter and get a stream of tweets, and textblob (a simple Python library for basic NLP: https://pypi.python.org/pypi/textblob) for sentiment analysis enrichment.

The results were then saved into a JSON file for analysis. This prototype was a great way to get started and experiment quickly, but after a few iterations we realized that we needed to get serious and build an architecture that satisfied our enterprise requirements.
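
The enrichment and save steps of such a prototype might look roughly like the sketch below. It is not the original code: the function names (textblob_polarity, enrich, save_json_lines), the tweet shape (a dict with a "text" key), and the one-JSON-object-per-line output are all assumptions made for illustration. The tweepy streaming connection is left out because it needs credentials and its API differs between versions, so any iterable of tweet dicts stands in for the stream.

```python
import json
from typing import Callable, Iterable, Iterator


def textblob_polarity(text: str) -> float:
    """Score text with TextBlob; polarity ranges from -1.0 to 1.0."""
    # Imported lazily so the rest of the sketch runs without textblob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def enrich(tweets: Iterable[dict], hashtags: Iterable[str],
           score: Callable[[str], float] = textblob_polarity) -> Iterator[dict]:
    """Yield tweets mentioning a tracked hashtag, with a 'sentiment' field added.

    Assumes each tweet is a dict with a 'text' key (hypothetical shape).
    """
    wanted = {tag.lower() for tag in hashtags}
    for tweet in tweets:
        # Naive matching: split on whitespace and trim trailing punctuation.
        words = {word.strip(".,!?:;") for word in tweet["text"].lower().split()}
        if wanted & words:
            yield {**tweet, "sentiment": score(tweet["text"])}


def save_json_lines(records: Iterable[dict], path: str) -> int:
    """Write one JSON object per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Passing the scorer as a parameter keeps the filtering and file-writing logic testable without the NLP dependency; calling save_json_lines(enrich(stream, ["#spark"]), "tweets.json") would use TextBlob by default, where stream is whatever iterable of tweets the Twitter client provides.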
