Business Data Analysis : Comparing Data Populations with SQL

While dealing with business data you will often times need to validated various database populations with one another. The comparison queries below are simplistic in nature but can be used in a host of business questions, solutions and resolutions. The SQL queries check and see if a record is in one population and not the other. Below is a set of three SQL business queries which use a standard join, left join, and right join which compares two populations against each other.

Business Scenario: A company has a set of loans which it needs to have evaluated by a third party. The company has 1000 loan records in their database. They have received back 975 records. The goal is to research the returned loan population from the third party evaluation company.

Individual SQL Count Query: It will be important to document how many loans are in each data population. In this business scenario, the original population results in 1000 loans and the returned population has 975 loans.

Individual SQL Count Query Example:

select count(loan_num) from original_loan_pop

select count(loan_num) from returned_loan_pop

SQL Straight Join Query: This business related query checks to see which records are in both populations. In the business scenario, the query returns 970 records. This means of the 1000 loans sent to be evaluated, only 970 loans were sent back correctly from the original population.

SQL Straight Join Query Example:

Select * from
(select * from original_loan_pop)a
join
(select * from returned_loan_pop)b
on a.loan_num = b.loan_num

SQL Left Join Query with Criteria: This query looks to see which loans are in the original table, but are not in the returned table. The query in this example returns 30 loans.  This mean of the 1000 loans sent to be evaluated, 30 loans from the original population were not included.

SQL Left Join Query Example:

Select * from
(select * from original_loan_pop)a
left join
(select * from returned_loan_pop)b
on a.loan_num = b.loan_num
where b.loan_num is null

SQL Right Join Query with Criteria: This query looks to see which loans are in the returned table table, but are not in the original table. The query in this example returns 5 loans.  The results of this query mean the company who provided back the evaluated loan information some how added 5 loans that were not even included in the original 100o loan data set.

SQL Right Join Query Example:

Select * from
(select * from original_loan_pop)a
right join
(select * from returned_loan_pop)b
on a.loan_num = b.loan_num
where b.loan_num is null

Final Results and Findings: Based on the results of the data analysis above, a data analyst or business analysis could conclude that the data returned by the loan evaluation company is incomplete with data inconsistencies. Of the original 1000 loans only 970 were returned correctly, with 30 loans missing. More concerning, there were 5 loans returned which did not even exist in the companies original loan set.

Article by Brady Smith
Be Sociable, Share!

2 Responses to “Business Data Analysis : Comparing Data Populations with SQL”

  1. Nick Fotopoulos on June 14th, 2009 at 3:46 am

    Why write the select list as sub queries when you don’t have any additional joins and where clauses in them.

    Select * from
    (select * from original_loan_pop)a
    join
    (select * from returned_loan_pop)b
    on a.loan_num = b.loan_num

    is the same as

    Select * from
    original_loan_pop a
    join
    returned_loan_pop b
    on a.loan_num = b.loan_num

    ???

  2. Perfect question! IMO from a technical or IT perspective your query is perfect and the most efficient way to write it and would be the best way if we were dealing with perfect data set. However, from a business analysis perspective, more often than not you will be more efficient with the sub queries especially when you are first researching an unknown data population.
    Here is an example, in the ‘Left Join Query’ above where there are 30 loans in the in the original population and not in the returned population. Leaving the ‘select * from original_loan_pop’ allows you to see the details of the loans or data set and easily add or remove fields from the query. This allows flexibility in researching the issue or variance.



Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027

Warning: strpos() [function.strpos]: Empty delimiter in /home/texeyes/public_html/technology-advisors.net/wp-content/plugins/kish-translate-ajax/functions.php on line 1027