# Statistical Analysis using Python Numpy

## What you will learn

• Obtain two Numpy arrays from the DataFrame column to represent female and male student scores.

• Add the Numpy code to determine the T-value and P-value of the data sets.

• Add a function to remove outliers from each set of data, then re-compute the T-value and P-value.

Duration: 2 hours

By the end of this project, you will have used the statistical capabilities of the Python Numpy package, together with other packages, to determine whether the difference between the test scores of two student groups is statistically significant. The T-Test is a well-known statistical test for comparing the means of two groups using data sampled from each population. To perform the T-Test, you need the sample size, the mean (average), and the standard deviation of each group. All of these are calculated in this project.

Note: This course works best for learners who are based in the North America region. We're currently working on providing the same experience in other regions.
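For reference, the textbook pooled-variance (Student's) two-sample T-Test combines these quantities as shown below. This is the standard form. It is assumed here to be the variant the project uses, because the page names pooled variance but does not print the equations. Here $n_i$, $\bar{x}_i$ and $s_i$ are the sample size, mean and sample standard deviation (denominator $n_i - 1$) of group $i$.

$$
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\nu = n_1 + n_2 - 2
$$

The two-sided P-value is then $p = 2\,P(T_\nu \ge |t|)$, where $T_\nu$ follows a $t$ distribution with $\nu$ degrees of freedom.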

## Skills you will develop

• Python Statistics

• Python Programming

• Statistics T Test

• Numpy

• Statistics Pooled Variance

## Learn step-by-step

1. Analyze the T-Test problem and use the Python Pandas package to read the CSV file into a DataFrame.

2. Obtain two Numpy arrays from the DataFrame column to represent female and male student scores.

3. Compute the variance of the two arrays using the standard deviation from each array.

4. Add the Numpy code to compute the pooled variance and standard deviation, and determine the T-value and P-value of the data sets.

5. Add a function to remove outliers from each set of data, then re-compute the T-value and P-value.
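The five steps above can be sketched end to end as follows. This is a minimal sketch, not the course's own code. The inline CSV, its column names `gender` and `score`, and the scores themselves are made up for illustration. The helper names `pooled_t_test` and `remove_outliers` are hypothetical. The P-value uses SciPy's `stats.t.sf`. Outliers are removed with Tukey's 1.5 × IQR fences, which is only one common rule; the course may use a different one.

```python
import io

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the course CSV: the column names "gender" and
# "score" and all of these values are invented for illustration only.
CSV_TEXT = """gender,score
female,78
female,85
female,91
female,72
female,88
female,95
female,30
male,70
male,82
male,75
male,68
male,80
male,77
"""


def pooled_t_test(a: np.ndarray, b: np.ndarray):
    """Two-sample T-Test assuming equal variances (pooled variance)."""
    n1, n2 = a.size, b.size
    # Sample variance as the square of the sample standard deviation (ddof=1).
    var1 = np.std(a, ddof=1) ** 2
    var2 = np.std(b, ddof=1) ** 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    dof = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / dof
    pooled_std = np.sqrt(pooled_var)
    t_value = (a.mean() - b.mean()) / (pooled_std * np.sqrt(1 / n1 + 1 / n2))
    # Two-sided P-value from the t distribution's survival function.
    p_value = 2 * stats.t.sf(abs(t_value), dof)
    return float(t_value), float(p_value)


def remove_outliers(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mask = (x >= q1 - k * iqr) & (x <= q3 + k * iqr)
    return x[mask]


# Step 1: read the CSV into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 2: one NumPy array of scores per group.
female = df.loc[df["gender"] == "female", "score"].to_numpy(dtype=float)
male = df.loc[df["gender"] == "male", "score"].to_numpy(dtype=float)

# Steps 3-4: pooled-variance T-Test on the raw data.
t_raw, p_raw = pooled_t_test(female, male)
print(f"raw:     t={t_raw:.3f}, p={p_raw:.4f}")

# Step 5: remove outliers from each group, then re-run the test.
female_clean = remove_outliers(female)
male_clean = remove_outliers(male)
t_clean, p_clean = pooled_t_test(female_clean, male_clean)
print(f"cleaned: t={t_clean:.3f}, p={p_clean:.4f}")
```

With a real file, pass its path to `pd.read_csv` instead of `io.StringIO(CSV_TEXT)`. As a cross-check, `scipy.stats.ttest_ind(female, male, equal_var=True)` runs the same pooled test in one call.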

## How guided projects work