SQL Data Cleaning using Housing Sales Data
Some of the SQL techniques used:
ALTER Table, ALTER COLUMN data type, ISNULL(), splitting address into separate columns (address, city, state) using SUBSTRING() & PARSENAME()
Introduction
This project uses housing sales data from Nashville, TN including information such as: SaleDate, PropertyAddress, SalePrice, TotalValue, YearBuilt, Unique ID, and Parcel ID which is unique to a property
To view all of the code, please visit the GitHub Repository linked here: SQL Data Cleaning GitHub Repository. I have highlighted a few parts of the project below.
Cleaning the Date data by removing unused timestamps
The column "SaleDate" includes the date and also a timestamp that only contains "00:00:00.000"

The following code was used to ALTER the table and change the data type for column SaleDate so that it no longer includes the timestamp.
ALTER Table [SQL Data Cleaning].dbo.NashvilleHousing ALTER COLUMN SaleDate Date; Select * From [SQL Data Cleaning].dbo.NashvilleHousing order by SaleDate

Populating data that was left blank
Some of the data for "PropertyAddress" was left blank, but since the "ParcelID" and "PropertyAddress" should always be connected. Any blank address that have a ParcelID should actually have the same address as any other row with the same ParcelID number.
By sorting the data by ParcelID, we can see two examples below where the PropertyAddress is NULL, but there's another row above the NULL value with the same ParcelID and a complete address in the "PropertyAddress" column


In order to find matching ParcelIDs and use their address in the blank address, two temporary tables (a & b) were created from the original table to compare ParcelID values in each table and replace any NULL addresses with the value in any matching ParcelID rows. ISNULL() was used to look for "a.PropertyAddress" values that are NULL and replace them with "b.PropertyAddress" that are not NULL.


Separating address data into individual columns (address, city, & state)
To make the data more usable in the future, it is useful to split the address into separate columns for the address and city. Below is an example of this using PARSENAME() and using commas as the delimiter since commas are only used to separate the address from city and state.

