Pet Peeves in Non-technical Data Science
Posted by Moonless Nights
I was watching a video today about survey results on the direction of Project Zomboid Build42, and it reminded me of something I have found annoying for a while: the lack of standard deviation when discussing scored survey results.

I have found that there is a big emphasis on the mean (aka "average") in large data sets, usually followed by a bar chart with someone talking over it, vaguely describing the shape of the graph in words. To me, this is where the standard deviation should be provided, as the mean and standard deviation together give a reasonably objective description of the shape. A mean alone doesn't say much: an average score of 3 out of 5 could mean everyone is lukewarm, or that half the respondents love it and half hate it.
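To make that concrete, here is a minimal sketch using Python's standard `statistics` module. The two score lists are made up for illustration (they are not from any real survey), and `summarize` is just a hypothetical helper name. Both data sets have the same mean, but the standard deviation immediately tells them apart.

```python
from statistics import mean, pstdev

# Hypothetical 1-5 survey scores, invented purely for illustration.
polarized = [1] * 50 + [5] * 50   # half the respondents hate it, half love it
consensus = [3] * 100             # everyone is lukewarm

def summarize(scores):
    """Return (mean, population standard deviation) for a list of scores."""
    return mean(scores), pstdev(scores)

for name, scores in [("polarized", polarized), ("consensus", consensus)]:
    m, s = summarize(scores)
    print(f"{name}: mean={m:.2f}, std dev={s:.2f}")

# polarized: mean=3.00, std dev=2.00
# consensus: mean=3.00, std dev=0.00
```

`pstdev` treats the list as the whole population; if the responses are only a sample of a larger group, `statistics.stdev` would be the more usual choice.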

The median can also be a good way of describing a distribution in some odd cases, such as heavily skewed results, but it can be more expensive to compute (not that this matters in any real-world case), and it isn't always that interesting.

On a related note, I find it disappointing that voting systems always use a first-past-the-post approach instead of a ranked ballot system. I always get irked when I see a call for user feedback to pick the most popular of something like 5 options, and then a big deal is made about the response that got 30% of the votes, as though that were meaningful (presenting it that way is often straight-up dishonest). When you have more than 2 options, you NEED a ranked system. I suspect that if we used this approach as a default in more of these cases, people would also start to wonder why government requests for selections/input don't work that way.
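Here is a rough sketch of how the two approaches can disagree. It uses instant-runoff counting, which is only one of several ranked-ballot methods and is my assumption here, not a claim about which method is best. The ballots are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter

def plurality_winner(ballots):
    """Winner when only each voter's first choice is counted."""
    return Counter(b[0] for b in ballots).most_common(1)[0][0]

def instant_runoff_winner(ballots):
    """Repeatedly eliminate the option with the fewest first-choice votes
    until one option holds a majority of the ballots still in play.
    Assumes at least one non-empty ballot."""
    remaining = {option for b in ballots for option in b}
    while True:
        # Count each ballot toward its highest-ranked surviving option.
        tally = Counter(
            next(o for o in b if o in remaining)
            for b in ballots
            if any(o in remaining for o in b)
        )
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Ties for last place are broken alphabetically in this sketch.
        loser = min(sorted(remaining), key=lambda o: tally.get(o, 0))
        remaining.discard(loser)

# Hypothetical ballots, each listing options from most to least preferred.
ballots = (
    [("A", "B", "C")] * 35
    + [("B", "C", "A")] * 33
    + [("C", "B", "A")] * 32
)

print(plurality_winner(ballots))       # A (35% of first choices)
print(instant_runoff_winner(ballots))  # B (65 of 100 once C is eliminated)
```

In this made-up example, option A "wins" first-past-the-post with 35% even though 65% of voters ranked it last, which is exactly the kind of result that gets oversold.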

Speaking of data science, I suspect that an interest in these matters may be inversely correlated with social popularity,
...Nights