- Data Analysis with Python
- David Taieb
- 2021-06-11 13:31:42
Deep diving into a concrete example
Early on, we wanted to build a data pipeline that extracted insights from Twitter by running sentiment analysis on tweets containing specific hashtags and then deployed the results to a real-time dashboard. This application was a perfect starting point for us, because the data science analytics were not too complex, and the application covered many aspects of a real-life scenario:
- High volume, high throughput streaming data
- Data enrichment with sentiment analysis NLP
- Basic data aggregation
- Data visualization
- Deployment into a real-time dashboard
To try things out, we wrote the first implementation as a simple Python application. It used tweepy, a widely used community-maintained Twitter client library for Python (https://pypi.python.org/pypi/tweepy), to connect to Twitter and receive a stream of tweets, and textblob, a simple Python library for basic NLP (https://pypi.python.org/pypi/textblob), to enrich each tweet with sentiment analysis.
The results were then saved into a JSON file for analysis. This prototype was a great way to get started and experiment quickly, but after a few iterations we realized that we needed to get serious and build an architecture that satisfied our enterprise requirements.
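The enrichment and save steps of such a prototype could look roughly like the sketch below. It is not the original application: the function names (`textblob_polarity`, `enrich`, `save_as_json`), the `text` field, and the positive/negative/neutral thresholds are assumptions made for illustration. The Twitter connection is left out entirely, so tweets are modeled as plain dictionaries that a streaming client such as tweepy would supply.

```python
import json
from typing import Callable, Iterable, Iterator


def textblob_polarity(text: str) -> float:
    """Score text with TextBlob; polarity ranges from -1.0 to 1.0."""
    # Imported lazily so the rest of the sketch runs without textblob installed.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def enrich(
    tweets: Iterable[dict],
    scorer: Callable[[str], float] = textblob_polarity,
) -> Iterator[dict]:
    """Yield a copy of each tweet with a polarity score and a label added."""
    for tweet in tweets:
        # Assumption: each tweet dict carries its body under the "text" key.
        polarity = scorer(tweet["text"])
        if polarity > 0:
            label = "positive"
        elif polarity < 0:
            label = "negative"
        else:
            label = "neutral"
        yield {**tweet, "polarity": polarity, "sentiment": label}


def save_as_json(tweets: Iterable[dict], path: str) -> int:
    """Write the enriched tweets to a JSON file; return how many were saved."""
    records = list(tweets)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
    return len(records)
```

Passing the scorer as a parameter keeps the enrichment step testable without network access or NLP models, and it makes it easy to swap in a different sentiment library later. Collecting everything into one list before writing is only reasonable for a small experiment; a high-volume stream would need incremental writes or a proper message pipeline.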