Processing data

The processing (or transformation) of data is where the data scientist's programming skills come into play, although data scientists often perform some processing during other steps as well, such as collecting, visualizing, or learning.

Keep in mind that processing in data science covers many activities. The most common are formatting (and reformatting), cleansing, and profiling. Formatting involves largely mechanical activities such as setting data types, aggregating values, and reordering or dropping columns. Cleansing addresses the quality of the data by dealing with problems such as default or missing values and incomplete or inappropriate values. Profiling adds context to the data by building a statistical understanding of it.
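As an illustration only, the three activities can be sketched in plain Python on a few hypothetical records (the field names, the default region `"unknown"`, and the rule that negative sales are invalid are all assumptions made for this example). Formatting sets types and drops a column, cleansing fills a default and discards an out-of-range value, and profiling summarizes the result.

```python
import statistics

# Hypothetical raw records, as they might arrive from a CSV file: every value is a string.
RAW_ROWS = [
    {"id": "1", "region": "north", "sales": "120.5", "notes": "ok"},
    {"id": "2", "region": "", "sales": "", "notes": "missing sales"},
    {"id": "3", "region": "south", "sales": "98.0", "notes": ""},
    {"id": "4", "region": "north", "sales": "-5", "notes": "entry error"},
]


def format_rows(rows):
    """Formatting: set data types and drop a column that is not needed."""
    formatted = []
    for row in rows:
        formatted.append({
            "id": int(row["id"]),
            "region": row["region"],
            # Empty strings become None so that cleansing can treat them as missing.
            "sales": float(row["sales"]) if row["sales"] else None,
            # The free-text "notes" column is deliberately not copied (dropped).
        })
    return formatted


def cleanse_rows(rows, default_region="unknown"):
    """Cleansing: fill a default for missing regions and discard invalid sales."""
    cleansed = []
    for row in rows:
        region = row["region"] or default_region
        sales = row["sales"]
        if sales is not None and sales < 0:
            sales = None  # assumed rule: a negative sale is invalid, so treat it as missing
        cleansed.append({**row, "region": region, "sales": sales})
    return cleansed


def profile_rows(rows):
    """Profiling: build a simple statistical summary of the data."""
    values = [r["sales"] for r in rows if r["sales"] is not None]
    return {
        "row_count": len(rows),
        "missing_sales": len(rows) - len(values),
        "mean_sales": statistics.mean(values) if values else None,
        "regions": sorted({r["region"] for r in rows}),
    }


profile = profile_rows(cleanse_rows(format_rows(RAW_ROWS)))
print(profile)
```

In practice these steps are more often done with a library such as pandas in Python, or with R; the plain-Python version simply keeps each activity visible as its own function.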

The processing required can be simple, such as manual, repetitive updates to data in an MS Excel worksheet; more complex, such as code written in a programming language like R or Python; or more sophisticated still, such as processing logic coded into routines that can be scheduled and rerun automatically on new populations of data.
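A minimal sketch of the most sophisticated case, assuming the processing logic is packaged as an ordered list of step functions: the same routine is applied unchanged to each new batch of data. The step functions and the batches below are hypothetical, and the scheduling itself (for example, a cron job or a workflow scheduler calling `run_pipeline`) is outside the scope of this sketch.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def drop_empty_names(rows: List[Row]) -> List[Row]:
    """Hypothetical cleansing step: discard rows that have no name."""
    return [r for r in rows if r.get("name")]


def normalise_names(rows: List[Row]) -> List[Row]:
    """Hypothetical formatting step: strip whitespace and title-case names."""
    return [{**r, "name": str(r["name"]).strip().title()} for r in rows]


# The routine is defined once, as an ordered sequence of steps.
PIPELINE: Sequence[Step] = (drop_empty_names, normalise_names)


def run_pipeline(rows: Iterable[Row], steps: Sequence[Step] = PIPELINE) -> List[Row]:
    """Apply each step in order; a scheduler can rerun this on every new batch."""
    result = list(rows)
    for step in steps:
        result = step(result)
    return result


# Two batches arriving at different times go through identical logic.
batch_monday = [{"name": "  ada lovelace "}, {"name": ""}]
batch_tuesday = [{"name": "grace HOPPER"}]
processed = [run_pipeline(batch) for batch in (batch_monday, batch_tuesday)]
print(processed)
```

Keeping each step as a small function makes it easy to add, remove, or reorder steps without touching the code that reruns the routine.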
