26 May, 2021

Outliers in normally distributed datasets

I've recently touched the subject of statistical analysis of data using Excel functions. In this post I dive deeper into Excel statistical tools. It's about outliers in data sets, about numbers that distort the state of reality and can lead to unsound findings and conclusions regarding specific areas of knowledge.

There are no strict statistical rules for indisputable ways of identifying outliers; we are dealing with probabilities. Nonetheless, there are guidelines and tests we can utilize to find outlying values, and they can significantly improve our intuition, formally.

Because of the importance of detecting outliers I've prepared Excel workbook providing practical tools (tests) for identifying such deviating/departing values within any set of numerical data. The workbook includes basic guidelines for using some specific statistical tests; I'm showing here its fragment:

The results of those statistical tests are presented in both descriptive and graphical way, as illustrated partially in this example:

The outlying values can be either marked or removed from tested data set, in each of the tests.

I'm sharing the workbook for your convenience, so you can get access to it with Google Sheets by clicking on this link (to request my permission for access):

Excel workbook ("OutliersOut.xlsm" file)

and eventually download, and then open it in your Microsoft Excel program on your computer.

Hope you'll find it useful.

No comments:

Post a Comment

All comments are held for moderation. I reserve the right to edit, censor, delete and - if necessary - block comments.