# Statistics is Unintuitive

I routinely come across day-to-day situations where statistics is very unintuitive. Consider the case where we are designing an algorithm to identify some outlier - credit card fraud, for example. Assume I design an algorithm and it is 99.99 % accurate.

Sounds impressive?

That is until you consider that fraud is quite rare, maybe one in a thousand transactions or fewer. A dummy algorithm that does nothing and categorises every case as "not fraud" will be 99.9 % accurate at that rate, and 99.999 % accurate if fraud is as rare as one in a hundred thousand. It categorises every non-fraud case correctly, non-fraud is the vast majority of the population, and therefore the accuracy is sky high. Yet the algorithm is useless since it will never catch a single fraud.
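The arithmetic is easy to check. Here is a minimal sketch, assuming a made-up population of 100,000 transactions; the function name `dummy_accuracy` and the numbers are purely illustrative.

```python
def dummy_accuracy(total_cases: int, fraud_cases: int) -> float:
    """Accuracy of a classifier that labels every case as "not fraud"."""
    # Encode the truth: 1 = fraud, 0 = not fraud.
    y_true = [1] * fraud_cases + [0] * (total_cases - fraud_cases)
    # The dummy algorithm never predicts fraud.
    y_pred = [0] * total_cases
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / total_cases


# Hypothetical populations of 100,000 transactions each.
one_in_thousand = dummy_accuracy(100_000, 100)        # fraud is 1 in 1,000
one_in_hundred_thousand = dummy_accuracy(100_000, 1)  # fraud is 1 in 100,000

print(f"{one_in_thousand:.3%}")          # 99.900%
print(f"{one_in_hundred_thousand:.3%}")  # 99.999%
```

The rarer the fraud, the better the do-nothing algorithm looks, even though it catches nothing in either case.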

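To understand how good the algorithm really is, we need to know how well it does separately in each scenario: how often it is right or wrong when there is no fraud, and how often it is right or wrong when there is fraud. The technical terms for the commonly used metrics are precision (of the cases flagged as fraud, the fraction that really are fraud) and recall (of the actual fraud cases, the fraction that get flagged). For rare events like this, accuracy on its own is a misleading metric and should not be relied upon.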
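To make the two metrics concrete, here is a small sketch that computes them from raw counts. The counts are invented for illustration (100,000 transactions, 100 of them fraud), and returning 0.0 when nothing was flagged is a convention I am assuming, not a rule.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases that were labelled correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Of the cases flagged as fraud, the fraction that really are fraud."""
    flagged = tp + fp
    # Assumed convention: report 0.0 when nothing was flagged at all.
    return tp / flagged if flagged else 0.0


def recall(tp: int, fn: int) -> float:
    """Of the actual fraud cases, the fraction that were flagged."""
    actual = tp + fn
    return tp / actual if actual else 0.0


# Hypothetical counts: tp = fraud caught, fp = false alarms,
# fn = fraud missed, tn = non-fraud correctly left alone.
model = {"tp": 80, "fp": 10, "fn": 20, "tn": 99_890}
dummy = {"tp": 0, "fp": 0, "fn": 100, "tn": 99_900}

for name, c in (("model", model), ("dummy", dummy)):
    print(
        name,
        f"accuracy={accuracy(**c):.2%}",
        f"precision={precision(c['tp'], c['fp']):.2%}",
        f"recall={recall(c['tp'], c['fn']):.2%}",
    )
```

Both algorithms score above 99.9 % on accuracy, but only precision and recall show that the dummy never finds a fraud.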

The same applies to a medical diagnostic test, e.g. a COVID test. Most people are not sick; only a few people in the population may actually have the disease. Therefore a diagnostic that tends to return a negative result will have a very high accuracy. To understand how good a diagnostic is, you need to see separately how accurate it is in both scenarios. The medical terms for these are sensitivity (how often the test is positive when the person is sick) and specificity (how often the test is negative when the person is healthy).
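The same kind of check works for a diagnostic. This sketch assumes an invented population of 10,000 people, 100 of whom are sick, and compares a plausible test against one that always returns negative; all the counts are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of sick people the test correctly reports as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of healthy people the test correctly reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical population: 100 sick, 9,900 healthy.
# A reasonable test: finds 90 of the sick, clears 9,801 of the healthy.
good_sens = sensitivity(true_pos=90, false_neg=10)
good_spec = specificity(true_neg=9_801, false_pos=99)

# A test that always says "negative": misses every sick person.
lazy_sens = sensitivity(true_pos=0, false_neg=100)
lazy_spec = specificity(true_neg=9_900, false_pos=0)
lazy_accuracy = (0 + 9_900) / 10_000

print(f"good test: sensitivity={good_sens:.0%}, specificity={good_spec:.0%}")
print(f"lazy test: sensitivity={lazy_sens:.0%}, specificity={lazy_spec:.0%}, "
      f"accuracy={lazy_accuracy:.0%}")
```

The always-negative test is 99 % accurate and has perfect specificity, yet its sensitivity is zero, which is exactly what the single accuracy number hides.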