3.1 Highlighting Duplicate Data
Sometimes duplicate data is useful, and sometimes it just makes it harder to understand your data.
Duplicate data can be any record that inadvertently shares data with another record in your table. These are records that may contain the same name, phone number, email, or address as another record, but also contain other non-matching data. A duplicate can be confined to a single column; and any value that occurs more than once in that column is a duplicate.
In this lesson, we’ll focus on how to use various methods in Excel to locate and highlight a group of duplicate data in a table.
By the time you complete this lesson, you should be able to:
- Locate and highlight data duplicates in rows of an Excel table
- Locate and highlight data duplicates in columns of an Excel table
- Locate and highlight data duplicates in multiple columns of an Excel table
- Highlight data based on criteria
Using Conditional Formatting to Highlight Data
Data duplicates in Excel can assume many forms. They can be in a single column, multiple columns or an entire row.
Conditional formatting in Excel can be used to identify duplicates in a table. Follow these steps to highlight cells in a table by using conditional formatting:
1. Select the cells you want to check for duplicates.
2. Go to Home tab.
3. Click Conditional Formatting (figure 2) in the Styles group.
4. Click Highlight Cell Rules to reveal Duplicate Values rule.
5. Click Duplicate Values.
6. Select Duplicate in the left drop-down menu in the Duplicate Values dialog box (figure 3).
7. Pick the formatting you want to apply to the duplicate values in the box beside values with (figure 3).
You can specify the format to be applied by scrolling through the right drop down menu. There are some existing formats that you can use, or specify your own format using the Custom Format option.
7. Click OK.
Figure 4 shows all duplicates in the table. The duplicates are formatted in green text.
Note that the process just discussed would instantly highlight all the cells that have a duplicate in the selected cells of the table. The selection of data in your table can be a single column, multiple columns, or a non-contiguous range of cells.
Only 10 cells of the table in figure 4 are not highlighted. They are not duplicate data. All other cells in green text have either one or duplicates.
Highlighting Cells based on Criteria
You can use conditional formatting in Excel to quickly highlight cells that meet a certain criteria. Let’s focus on relational operators – greater than (>), less than (<) and equal to (=).
Lets use our previous data file (figure 5) to illustrate how to highlight all cells with scores less than a given pass score, or marks less than the given pass score in an exam. To highlight cells based on a criteria, do the following tasks:
1. Select your range of cells in the table (figure 5).
2. Go to Home tab.
3. Choose Conditional Formatting.
4. Select Highlighting Cell Rules.
5. Select Greater Than… / Less Than… (figure 6). Let’s select Greater Than…
Based on the option (greater than or less than) you select, a dialog box opens (figure 7).
6. Type the number 54 in the first textbox in figure 5. In this case, all cells with numbers greater than 54 will be highlighted. How these numbers will be highlighted is defined in step 7.
7. Click the drop-down menu beside with to specify the format you want to apply to the cells that meet the condition (greater than 54).
There are some existing formats that you can apply to cells that meet a condition. You can specify your own format by choosing the Custom Format option in the drop-down menu of the combo box.
8. Click OK.
This instantly highlights all the cells with values greater than 54 in the table (figure 8).
Note that all final exam data that are more than 54 are highlighted in the light red fill format.
How to Highlight Top/Bottom 10%
Conditional formatting in Excel can quickly identify top 10 items or top 10% from a table. This could be helpful in situations where you want to quickly see the top candidates by scores or top deal values in your sales data.
Similarly, you can also quickly identify the bottom 10 items or bottom 10% in a table. To be able to use conditional formatting to identify certain category of data, consider the following steps:
1. Select your range of cells in the table.
2. Go to Home tab.
3. Choose Conditional Formatting (figure 9).
4. Select Top/Bottom Rules.
5. Select Top 10%….
6. Specify the format to be applied to the cells that meet the condition by using the drop-down beside with (figure 10).
There are other existing formats that you can use, or specify your own format using the Custom Format option.
7. Click OK.
This would instantly highlight the top 10% of the scores in the table. Note that this conditional formatting works only for cells that have a numeric value in it. The top 10% of final exam scores in the table is shown in figure 11.