Data Cleaning in Excel

Harfourlarke Baiyewu Simbeeat
2 min read · Jun 17, 2020

Why do we clean data? We clean data to make it useful for the purpose for which it was collected. Cleaning can involve updating information, correcting errors, removing unwanted information, applying a clear and consistent format, and doing whatever else is needed to make the raw data fit for its intended use.

The data used for this project was obtained from the data.gov catalog (https://catalog.data.gov/dataset/fire-incidents-b6ba2). According to data.gov, the dataset represents all Fire Department incidents responded to by the Town of Cary. It contains five complete calendar years plus the current year, and it includes only calls that have been completed and reviewed. As a result, incident reports may take 1–5 business days to complete, depending on the complexity of the call. Incidents are reported in accordance with the National Fire Incident Reporting System (NFIRS; https://www.nfirs.fema.gov/documentation/reference), and the dataset is updated daily. The dataset was created on February 10, 2017.

How data is cleaned in Excel depends on what the data will be used for. For the purpose of this write-up, the data was cleaned to ensure that all rows and columns are readable and filled with the necessary information. A screenshot of the data as it looked when first downloaded from its source is shown.

The values from all the different columns were clustered in a single column, leaving the other columns and cells empty. We also found that the values within that column were separated by semicolons. After splitting them into their own columns, we found that some columns had empty cells; these were subsequently filled with N/A. Cleaning this data in Excel required the Text to Columns feature to split the values and the Remove Duplicates feature to remove any duplicate rows. However, no duplicates were found. An output of the clean data is shown.

Cleaned data in Excel
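
For readers who prefer a script, the same three steps (splitting on semicolons, filling empty cells with N/A, and removing duplicate rows) can be sketched in Python. This is only an illustration, not the method used in this write-up, which was done entirely in Excel. The sample rows, the column names, and the clean_rows helper are all made up for the example and are not taken from the Cary dataset. The sample deliberately includes one duplicate row so the last step has something to do.

```python
import csv
import io

# Made-up sample that mimics the raw download: each record is a single
# semicolon-delimited string. Column names and values are hypothetical.
RAW = """incident_id;incident_type;district
1001;Building fire;D1
1002;;D2
1003;False alarm;
1001;Building fire;D1
"""


def clean_rows(text, delimiter=";", placeholder="N/A"):
    """Split delimited text into columns, fill blanks, drop duplicate rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    seen = set()
    cleaned = []
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        # Pad short rows so every row has one value per header column.
        row = row + [""] * (len(header) - len(row))
        # Replace empty (or whitespace-only) cells with the placeholder.
        row = [cell.strip() or placeholder for cell in row]
        key = tuple(row)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append(row)
    return header, cleaned


header, rows = clean_rows(RAW)
for line in [header] + rows:
    print(line)
```

Splitting corresponds to Text to Columns with a semicolon delimiter, and the seen set plays the role of Remove Duplicates by keeping only the first occurrence of each identical row.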

With this output, we can now decide what to do with the data, whether that is analysis or visualization. The intended use will determine whether the data needs further cleaning before it is put to work.
