Data cleaning, also known as data cleansing or data scrubbing, is a process consisting of the following steps:
Identifying inaccurate, incomplete, irrelevant, or corrupted data so that it can be removed from further processing
Parsing data, extracting information of interest, or validating whether a string of data is in an acceptable format
Transforming data into a common representation, for example, a single character encoding (such as UTF-8), numeric type (such as int32), time scale, or normalized range
Transforming data into a common data schema; for instance, if we collect temperature measurements from different types of sensors, we might want them to have the same structure
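
As a minimal sketch of how these steps might fit together, the Python snippet below cleans a few temperature readings from two hypothetical sensor types. The record layouts, the field names (temp_c, temperature_f, ts, time), and the target schema are assumptions made up for illustration; they do not come from any particular sensor or library.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw records from two sensor types (all field names are assumptions).
RAW = [
    {"sensor": "A", "temp_c": "21.5", "ts": "2024-01-01T12:00:00+00:00"},
    {"sensor": "B", "temperature_f": "70.7", "time": "1704110400"},
    {"sensor": "A", "temp_c": "n/a", "ts": "2024-01-01T12:05:00+00:00"},
]

# Accepts plain decimal numbers such as "21.5" or "-3".
NUMBER = re.compile(r"^-?\d+(\.\d+)?$")


def clean(record):
    """Return the record in a common schema, or None if it is unusable."""
    # Identify and validate: reject readings that are missing or not numeric.
    raw_value = record.get("temp_c") or record.get("temperature_f")
    if raw_value is None or not NUMBER.match(raw_value):
        return None
    value = float(raw_value)

    # Common unit: convert Fahrenheit readings to Celsius.
    if "temperature_f" in record:
        value = (value - 32.0) * 5.0 / 9.0

    # Common time scale: ISO 8601 strings and Unix seconds both become UTC.
    if "ts" in record:
        when = datetime.fromisoformat(record["ts"])
    elif "time" in record:
        when = datetime.fromtimestamp(int(record["time"]), tz=timezone.utc)
    else:
        return None  # no usable timestamp

    # Common schema: every sensor type ends up with the same three keys.
    return {
        "sensor": record["sensor"],
        "celsius": round(value, 2),
        "utc": when.astimezone(timezone.utc).isoformat(),
    }


cleaned = [r for r in (clean(rec) for rec in RAW) if r is not None]
for row in cleaned:
    print(row)
```

Here the third record is dropped because its reading is not a valid number, while the remaining two end up with identical keys and units regardless of which sensor produced them. A real pipeline would likely log or quarantine rejected records rather than discard them silently.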