Forum Posts

Ziaur Rahman
Aug 03, 2022
In Welcome to the Forum
Recently I have been learning BI (Business Intelligence), and as practice I extracted some rental business data and imported it into BI software for analysis. The data spans many years and was entered offline, so in the early days there were few field constraints and the business audit mechanism was incomplete. As a result, the statistics for some fields frequently report errors or deviate widely from what you would expect. In this case, a "data cleaning" step should be introduced to exclude the bad data.

During data cleaning, a few routine checks go a long way: the amount paid by the user on a rental order should be greater than 0 yuan, the area of a house cannot be 0 square meters, and so on. This simple processing solves many anomalies, but some still slip through. For example, I had only required that the order amount not be 0 yuan. I never expected orders with negative amounts or empty amounts, and these end up distorting the calculated averages, the regional performance rankings, and similar figures. The following discusses how to judge the "degree" of cleaning, based on the general data cleaning process.

The valid flag is the key indicator that a record is valid data. It is generally the record's status field, such as the payment status of a payment order, the registration status of a user record, or whether a product record has been deleted, and it plays a decisive role. If possible, first run a count on the raw data (GROUP BY in SQL, the filter function in Excel, etc.) to see which statuses exist and how many records fall under each.
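To make the two ideas above concrete, here is a minimal Python sketch of counting records per status and of checks that also catch negative and empty amounts. The sample rows and the field names (`status`, `amount`, `area`) are made up for illustration and are not from the real rental data.

```python
from collections import Counter

# Hypothetical sample rows; field names and values are assumptions.
orders = [
    {"id": 1, "status": "paid", "amount": 3200.0, "area": 45.0},
    {"id": 2, "status": "paid", "amount": 0.0, "area": 60.0},
    {"id": 3, "status": "cancelled", "amount": -500.0, "area": 30.0},
    {"id": 4, "status": "paid", "amount": None, "area": 52.0},
    {"id": 5, "status": "unpaid", "amount": 2800.0, "area": 0.0},
]


def status_counts(rows):
    """Count rows per status, like SELECT status, COUNT(*) ... GROUP BY status."""
    return Counter(row["status"] for row in rows)


def is_clean(row):
    """Accept a row only if amount and area are present and strictly positive."""
    amount, area = row.get("amount"), row.get("area")
    if amount is None or area is None:
        return False  # empty values would otherwise skew averages
    return amount > 0 and area > 0  # rejects zero AND negative values


counts = status_counts(orders)
clean = [row for row in orders if is_clean(row)]
rejected_ids = [row["id"] for row in orders if not is_clean(row)]

print(dict(counts))
print("kept:", [row["id"] for row in clean], "rejected:", rejected_ids)
```

Testing for "strictly positive and not empty" instead of "not equal to 0" is what catches the negative and empty amounts mentioned above.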
Then, following the business process, filter the data by its valid flag. This removes unnecessary data and the potential problems hidden in it, and improves the speed and quality of the subsequent analysis. If the filtered-out abnormal data exceeds 1% of orders, work with the colleagues responsible for technical development or data analysis to find out whether there is a problem with how the data source is recorded, or whether an unknown requirement or a latent bug is behind it. For example, our rental orders have multiple rental statuses:
Data cleaning and thinking about its "degree" when building a rental big data dashboard