Data Requirements for T Test

Introduction

When we choose a t-test to analyse our data, we must first check that the data satisfies the requirements of the test. The results of a t-test are valid only if the assumptions behind it hold. One or more of these assumptions can be violated, especially when you’re working with real-world data. We can use other methods to check our data before running any t-test.

Learning Outcomes

By the time you complete this lesson, you should be able to:

examine the integrity of sample data for t tests
check normality of sample data with Shapiro-Wilk test
investigate homogeneity of variances with Levene’s test and handle unequal variances with Welch’s test
determine outliers in a sample data

Assumptions for T Tests

The sample data must meet the data requirements for the t test. If any of the following assumptions is violated, the results you get might not be valid:

Assumption 1 – Continuous Scale of Dataset

The dependent variable must be measured on a continuous scale – at interval or ratio level. Examples include:

time (measured in hours)
intelligence (measured using IQ score)
exam performance (measured from 0 to 100)
weight (measured in grams)

Assumption 2 – Categorical Independent Groups

The independent variable must consist of two categorical, independent groups. Examples of independent variables that meet this criterion include:

gender (2 groups – male or female)
employment status (2 groups – employed or unemployed)
smoker (2 groups – yes or no)
student status (2 groups – regular or weekend)

Assumption 3 – Independent Sample Groups

There should be no relationship between the observations in each group or between the groups themselves. For example, no participant may be in more than one group: a participant cannot be a regular and a weekend student at the same time, and no student can be in both level 100 and level 200. Each participant must belong to exactly one group.

Assumption 4 – Normality Test

The dependent variable must be approximately normally distributed for each group of the independent variable.

We can check this assumption with the Shapiro-Wilk test of normality, which is easy to run with SPSS procedures.

(a) Normality Test for One Sample T Test

In the following example, we’ll use SPSS procedures to illustrate how to check the normality of a dataset.

The data in Table 1 represents the measurement of 42 energy bars randomly collected at a particular location:

Table 1: Measurement of 42 energy bars
20.7	27.46	22.15	19.85	21.29	24.75
20.75	22.91	25.34	20.33	21.54	21.08
22.14	19.56	21.1	18.04	24.12	19.95
19.72	18.28	16.26	17.46	20.53	22.12
25.06	22.44	19.08	19.88	21.39	22.33
25.79	20.75	22.91	25.34	20.33	22.14
27.46	22.15	19.85	21.29	25.34	20.33

To check the normality of the energy bars data, we do the following tasks:

1. Start SPSS

2. Create a dataset with the 42 energy bars

3. Click Analyze, Figure 1

4. Click Descriptive Statistics

5. Click Frequencies…

Figure 1: SPSS procedures for creating histogram

6. Select the test variable (energy bars) in the Frequencies dialog box, Figure 2.

7. Click the arrow button to move the variable to the test Variable(s) area. Figure 3 displays the test variable that has been moved to the test variable dialog box.

8. Click Display frequency tables checkbox to uncheck it.

Figure 3: Selecting energy bars as selected test variable

Our aim is to create a histogram. Therefore, we ignore all other commands and focus on the Charts… command.

9. Click the Charts… command

10. Select the Histograms option in the Frequencies: Charts dialog box, Figure 4

11. Check Show normal curve on histogram, Figure 4

12. Click Continue

We now have a histogram, Figure 5. By inspection, the histogram is roughly bell-shaped and symmetrical, which means the data are “even” on both sides of the center. So, our assumption that the energy bars data is normally distributed seems reasonable.

Figure 5: Testing normality of energy bars
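The same visual check can be sketched outside SPSS. The following is a minimal Python sketch, not the SPSS procedure itself: it bins the Table 1 values into a text histogram and computes simple moment-based skewness and excess kurtosis. The bin start (16) and bin width (2) are assumptions chosen for illustration, and the moment-based formulas differ slightly from the bias-corrected values SPSS reports.

```python
from statistics import mean

# Energy bar measurements from Table 1 (42 values).
energy_bars = [
    20.7, 27.46, 22.15, 19.85, 21.29, 24.75,
    20.75, 22.91, 25.34, 20.33, 21.54, 21.08,
    22.14, 19.56, 21.1, 18.04, 24.12, 19.95,
    19.72, 18.28, 16.26, 17.46, 20.53, 22.12,
    25.06, 22.44, 19.08, 19.88, 21.39, 22.33,
    25.79, 20.75, 22.91, 25.34, 20.33, 22.14,
    27.46, 22.15, 19.85, 21.29, 25.34, 20.33,
]


def histogram_bins(values, start, width):
    """Return (lower_edge, count) pairs for equal-width bins from `start`."""
    counts = {}
    for v in values:
        index = int((v - start) // width)
        counts[index] = counts.get(index, 0) + 1
    return [(start + i * width, counts.get(i, 0)) for i in range(max(counts) + 1)]


def skewness_and_excess_kurtosis(values):
    """Moment-based (uncorrected) skewness and excess kurtosis."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Assumed bin layout: width 2, starting at 16.
bins = histogram_bins(energy_bars, start=16, width=2)
for lower, count in bins:
    print(f"{lower:>2}-{lower + 2:<2} {'#' * count}")

skew, excess_kurtosis = skewness_and_excess_kurtosis(energy_bars)
print(f"skewness = {skew:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
```

Values of skewness and excess kurtosis close to zero are consistent with a roughly bell-shaped, symmetrical histogram.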

We’ll revisit this topic as a requirement for normally distributed data in the lesson on the one sample t-test.

(b) Normality Test for Independent Samples T Test

Normally distributed data is a requirement for the independent samples t-test. There are four methods you can use to test for normality in a 2-sample t test:

Shapiro-Wilk test of normality
Skewness and Kurtosis
Histograms
Normal Q-Q Plots

In this example, we’ll use quantile-quantile plots (Q-Q plots) to investigate whether or not a variable is normally distributed in the two-sample setting. Our dataset is the test scores for 18 students – 10 weekend and 8 regular students.

To create a Q-Q plot, we do the following tasks:

1. Launch SPSS

2. Click Analyze, Figure 6

3. Click Descriptive Statistics

4. Click Explore…

Figure 6: SPSS procedures on normal Q-Q plots

The Explore dialog box opens, Figure 7

5. Transfer the variable scores into the box labelled Dependent List, Figure 8

Figure 8: Selecting variables for normal Q-Q plot

6. Click the command labelled Plots…

The Explore: Plots dialog box opens, Figure 9. Make sure the box next to Normality plots with tests is checked.

7. Click Continue

8. Click OK

The Output for normal Q-Q plot procedure is displayed in Figures 10 and 11:

Normal Q-Q plot of Scores for regular – control. From the normal Q-Q plots in Figure 10 and Figure 11, we note that most of the points fall along a roughly straight line at a 45-degree angle. This suggests that the scores in each group are approximately normally distributed.

Figure 10: Normal Q-Q plot for regular students

Normal Q-Q plot of Scores for weekend – treatment. The normal Q-Q plot for weekend defined as treatment is displayed in Figure 11.

Figure 11: Normal Q-Q plot for weekend students
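To see what a normal Q-Q plot actually compares, here is a minimal Python sketch that builds the plotted points by hand. The scores below are hypothetical (the lesson’s 18 test scores are not listed in the text), and the plotting positions (i - 0.5)/n are an assumption; SPSS may use a different formula, so its points can differ slightly.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with the value expected under a
    normal distribution fitted to the sample (same mean and SD)."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    return [
        (fitted.inv_cdf((i - 0.5) / n), observed)  # assumed plotting position
        for i, observed in enumerate(ordered, start=1)
    ]


# Hypothetical scores for the 8 regular students (illustration only).
regular_scores = [55, 62, 70, 48, 81, 66, 90, 59]

points = qq_points(regular_scores)
for expected, observed in points:
    print(f"expected {expected:6.2f}   observed {observed:3d}")
```

If the observed values stay close to the expected values, the points lie near the 45-degree line and the normality assumption looks reasonable.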

We’ll revisit this topic as a requirement for normally distributed data in the lesson on the independent samples t-test.

(c) Normality Test for Paired Samples T Test

We can check the normality assumption for the paired t test with the Shapiro-Wilk test of normality, since our sample size is small (n < 50). We use the dataset in Table 2 to illustrate how to check normality for the paired samples t test. In this “matched pairs” design, our interest is whether the differences in the dependent variable between the two related groups (test1 and test2) are approximately normally distributed.

Table 2: Scores for Test1 and Test2
Student	Test1	Test2
Mercy	63	69
Muna	65	65
Ama	56	62
Kate	90	91
Amama	88	78
John	83	87
Aasim	77	79
Julia	92	88
Jamila	90	85
Zain	84	92
Jean	68	69
Indra	74	81
Susan	87	80
Ruhia	64	75
Miski	71	84
Haneen	88	89

We use SPSS procedures to create the change difference variable, which is the difference between test2 and test1. We then use this difference variable to determine whether the paired differences in our random sample are approximately normally distributed.
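The difference variable is simple arithmetic, so it can be sketched in a few lines of Python using the Table 2 values. This is only an illustration of what the SPSS compute step produces; the variable names are assumptions.

```python
from statistics import mean, stdev

# (student, test1, test2) rows from Table 2.
scores = [
    ("Mercy", 63, 69), ("Muna", 65, 65), ("Ama", 56, 62), ("Kate", 90, 91),
    ("Amama", 88, 78), ("John", 83, 87), ("Aasim", 77, 79), ("Julia", 92, 88),
    ("Jamila", 90, 85), ("Zain", 84, 92), ("Jean", 68, 69), ("Indra", 74, 81),
    ("Susan", 87, 80), ("Ruhia", 64, 75), ("Miski", 71, 84), ("Haneen", 88, 89),
]

# Difference = test2 - test1, one value per student.
differences = [test2 - test1 for _, test1, test2 in scores]

print(differences)
print(f"mean difference = {mean(differences):.3f}")
print(f"SD of differences = {stdev(differences):.3f}")
```

It is this list of differences, not test1 or test2 individually, that the normality check is applied to. If SciPy is available, `scipy.stats.shapiro(differences)` is one way to run a Shapiro-Wilk test on it outside SPSS.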

To perform Shapiro-Wilk normality test, we do the following tasks:

1. Launch SPSS

2. Click Analyze, Figure 12

3. Click Descriptive Statistics

4. Click Explore…

Figure 12: SPSS procedures for Shapiro-Wilk test

The Explore dialog box opens and displays the original two sample variables (test1 and test2) together with Difference, the new “change” variable, Figure 13.

Figure 13: Explore dialog box for selecting dependent variable

5. Transfer the variable Difference to the Dependent List box, Figure 14.

6. Click the command labelled Plots…

Figure 14: Using Explore dialog box to transfer dependent variable

The Explore: Plots dialog box opens, Figure 15. Make sure the box next to Normality plots with tests is checked.

7. Uncheck the Stem-and-leaf option under Descriptive.

8. Check Histogram under Descriptive field.

9. Check Normality plots with tests box.

Figure 15: Explore: Plots dialog box for selected related options

10. Click Continue

11. Click OK

The resulting output contains several tables and plots. However, the only table we’re looking for is Tests of Normality, which reports the Shapiro-Wilk test, Figure 16.

From Figure 16, the Sig. value of the Shapiro-Wilk test is .821. Since .821 > .05, we fail to reject the null hypothesis that the data come from a normal distribution. In other words, there is no evidence that our paired differences depart from normality.

Figure 16: Test of normality for paired differences

We’ll revisit this topic as a requirement for normally distributed data in the lesson on the paired samples t-test.

Assumption 5 – Outliers

There should be no significant outliers. Outliers are single data points within a dataset that do not follow the usual pattern. The problem with outliers is that they can distort the t-test, thereby reducing the validity of the analysis.

The simplest way to detect outliers is to draw box plots. Box plots, also known as box and whisker plots, are an easy way to visualize the distribution of the data you’re analyzing. The box spans the middle 50% of the data, with a line inside it that shows the median value. The lines extending from the box capture the range of the remaining data. Any data point that falls outside the lines is an outlier, Figure 12.

SPSS uses a circle to mark any outliers. Far outliers, which are more likely to be true outliers, are marked with a star. Next to each outlier marker is a number. The number is the case (row) number of that observation in Data View.

Figure 12: Using boxplot to detect outliers
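The rule behind the boxplot markers can be sketched in Python. This is a minimal sketch under stated assumptions: points more than 1.5 box lengths (interquartile ranges) beyond the box are treated as outliers and points more than 3 box lengths beyond it as far outliers. The scores are hypothetical, and `statistics.quantiles` computes quartiles slightly differently from the hinges SPSS uses, so borderline cases may be classified differently.

```python
from statistics import quantiles


def classify_outliers(values):
    """Split values into (mild, far) outliers using the 1.5 x IQR and
    3 x IQR boxplot rules."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    mild, far = [], []
    for v in values:
        distance = max(q1 - v, v - q3)  # how far the value lies beyond the box
        if distance > 3 * iqr:
            far.append(v)
        elif distance > 1.5 * iqr:
            mild.append(v)
    return mild, far


# Hypothetical scores; 40 is deliberately far from the rest.
scores = [10, 12, 12, 13, 14, 15, 15, 16, 40]

mild, far = classify_outliers(scores)
print("mild outliers:", mild)
print("far outliers:", far)
```

Here the mild list corresponds to the points SPSS would draw as circles and the far list to the points it would draw as stars.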

Handling Outliers. When you encounter outliers in your data, there are a few ways to handle them:

Removing outliers. If there is no reasonable scientific basis for an outlier to be in the dataset, one of the easy ways to tackle the issue is to remove the data point. Problematic outliers that represent the following categories should be removed:

measurement errors
data entry errors
processing errors
poor sampling

Replacing outliers with closest value to median. If you have just a few data points that are outliers, you could replace them with the next closest value to the median.

Replacing outliers with the mean of the remaining values. We can tackle outliers by replacing them with the mean of the remaining values, calculated without the outlier. The drawback of this approach is that we run the risk of distorting the distribution.

Keeping outliers. Some outliers represent natural variation in the population, and they should be left in the dataset. These are called true outliers. It’s best to remove outliers only when we have a sound reason for doing so.
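The first three options above can be compared side by side in a short Python sketch. The scores are hypothetical, the 1.5 x IQR fences are an assumed detection rule, and “closest value to the median” is interpreted here as the nearest value that is not itself an outlier.

```python
from statistics import mean, quantiles

# Hypothetical scores; 40 is an obvious outlier.
scores = [10, 12, 12, 13, 14, 15, 15, 16, 40]

# Assumed detection rule: 1.5 x IQR fences around the quartiles.
q1, _, q3 = quantiles(scores, n=4)
lower_fence = q1 - 1.5 * (q3 - q1)
upper_fence = q3 + 1.5 * (q3 - q1)


def is_outlier(value):
    return value < lower_fence or value > upper_fence


kept = [v for v in scores if not is_outlier(v)]

# Option 1: remove the outliers.
removed = list(kept)

# Option 2: replace each outlier with the nearest retained value.
nearest = [
    min(kept, key=lambda k: abs(k - v)) if is_outlier(v) else v
    for v in scores
]

# Option 3: replace each outlier with the mean of the remaining values.
kept_mean = mean(kept)
mean_filled = [kept_mean if is_outlier(v) else v for v in scores]

print("removed:    ", removed)
print("nearest:    ", nearest)
print("mean-filled:", mean_filled)
```

Comparing the three lists shows how each choice changes the sample size and the spread, which is why the choice should be justified rather than automatic.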

Assumption 6 – Homogeneity of Variances

The assumption of homogeneity of variance is an assumption of the independent samples t-test that all comparison groups have the same variance. We may use any of the following methods to check the homogeneity of variances for 2-sample t test:

Levene’s Test for Variance. We can test this assumption in SPSS Statistics by using Levene’s test for homogeneity of variances.

Box Plot for Comparing Variances. We can also use SPSS procedures to inspect boxplots and get an idea of what to expect when we conduct an independent samples t test. We use the hypothetical data in Figure 13 to compare variances:

Figure 13: Sample data for two-sample t test

To compare the variances of the two groups in the given data, we retrieve the boxplot from the output tables and plots produced in our previous tutorial on the independent t test; see Normality Test for Independent Samples T Test.

Figure 14: Box plot for comparing variances

From Figure 14, we notice that the total lengths of the two boxplots are not about the same. This suggests that the two variances are not equal. The spread of observations for regular is much greater than the spread of observations for weekend, so we can reasonably conclude that the variances for these two groups are quite different.
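The visual comparison can be backed with numbers. The sketch below computes each group’s sample variance and a mean-based Levene W statistic in plain Python. The scores are hypothetical (the data in Figure 13 is not reproduced in the text), and the sketch stops at the statistic: the p-value SPSS reports requires the F distribution, which is not computed here.

```python
from statistics import mean, variance


def levene_w(*groups):
    """Mean-based Levene W statistic: a one-way ANOVA F ratio computed on
    the absolute deviations of each value from its own group mean."""
    deviations = []
    for group in groups:
        centre = mean(group)
        deviations.append([abs(x - centre) for x in group])

    n_total = sum(len(d) for d in deviations)
    k = len(deviations)
    grand_mean = sum(sum(d) for d in deviations) / n_total

    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical test scores (illustration only).
regular = [55, 62, 70, 48, 81, 66, 90, 59]
weekend = [68, 71, 69, 73, 70, 72, 67, 74, 70, 71]

var_regular = variance(regular)
var_weekend = variance(weekend)
w_statistic = levene_w(regular, weekend)

print(f"variance (regular) = {var_regular:.2f}")
print(f"variance (weekend) = {var_weekend:.2f}")
print(f"Levene W = {w_statistic:.2f}")
```

A large W, like a large difference in box lengths, points toward unequal variances.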

Welch’s Test for Unequal Variance. We use Welch’s test to compare the means of two independent groups when sample sizes and variances are unequal between groups.

In practice, when we compare the means of two groups it is unlikely that the standard deviations for each group will be identical. This makes it a good idea to just always use Welch’s t-test, so that we don’t have to make any assumptions about equal variances.
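As a rough illustration of what Welch’s t-test computes, the following Python sketch calculates the t statistic and the Welch-Satterthwaite degrees of freedom from two samples. The scores are hypothetical, and the sketch does not compute a p-value, which would require the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = variance(sample_a) / n_a  # squared standard error of mean A
    se_b = variance(sample_b) / n_b  # squared standard error of mean B

    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical test scores (illustration only).
regular = [55, 62, 70, 48, 81, 66, 90, 59]
weekend = [68, 71, 69, 73, 70, 72, 67, 74, 70, 71]

t_statistic, degrees_of_freedom = welch_t(regular, weekend)
print(f"t = {t_statistic:.3f}, df = {degrees_of_freedom:.2f}")
```

Unlike the pooled-variance t-test, the degrees of freedom here are usually not a whole number and shrink as the variances become more unequal.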

To learn more about Welch’s test for unequal variance watch the video (Figure 14)

Figure 14: Welch’s test for unequal variances

We’ll revisit this topic as a requirement for homogeneity of variances in the lesson on the independent samples t test.

Citation Information

If you want to cite this lesson, you may use the following APA information:

Author: Mahama, A.
Date of publication: Use the 2024, February 18 or the last date the lesson was modified.
Title: Data requirements for t test
URL of lesson: https://thecalleacademy.thecallinfo.com/lessons/data-requirements-for-t-tests/
xxx is the date you retrieved the lesson from the online source

Example

Mahama, A. (2024, February 18). Data requirements for t test. Retrieved xxx from https://thecalleacademy.thecallinfo.com/lessons/data-requirements-for-t-tests/

References

Amanda, S. (2023). STM1001 Topic 6: t-tests for two-sample hypothesis testing. Retrieved January 12, 2024 from https://bookdown.org/content/f9d035ed-86ea-4779-ad01-31acc973f0dd/

Lund. (n.d.). Testing for normality using SPSS statistics. Retrieved August 31, 2023 from https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php

Khan, M. (2022, April 8). How to detect outliers. Retrieved September 12, 2023 from https://godatadrive.com/blog/how-to-detect-outliers

Stephanie G. (2023). Outliers SPSS – from StatisticsHowTo.com: Elementary Statistics for the rest of us! Retrieved September 11, 2023 from https://www.statisticshowto.com/outliers-spss/
