#2 — If you torture the data long enough, it will confess.
People like to read quotes because quotes are concise sentences that express wisdom, come from experience, and awaken inspiration. I love quotes because a good quote is catchy enough to remember and can shift our mindset to a better place.
Are there any quotes in the Data Science field? Yes. Data Science might sound technical, but it is not just about typing code and creating a model. There is so much more to it: exploration, communication, creativity, and honesty, just to name a few things associated with Data Science.
As a Data Scientist, I am quite rigid. I believe in the numbers I see, the statistical results I get, and the information from the data. However, I am also a human being with a consciousness, not a robot, so my mind is shaped by the advice in the quotes I read. I believe quotes can empower us, and therefore I want to share a few quotes I always remember when working as a Data Scientist.
“Data that is loved tends to survive.” -Kurt Bollacker
Would you think it is a good thing when we love the data and it survives some selection criteria? Well, the answer leans towards no. Why? Because it could lead to survivorship bias: a fallacy where we draw a conclusion from an incomplete set of data, namely only the data that survived our selection criteria.
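To make survivorship bias concrete, here is a minimal sketch using simulated numbers. Everything in it is an assumption for illustration: the "funds", their returns, and the closure threshold are all hypothetical, and the true average return is set to zero on purpose.

```python
import random
import statistics

def simulate_returns(n_funds=1000, seed=0):
    """Draw hypothetical yearly returns (in %) for n_funds funds; the true mean is 0."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 10.0) for _ in range(n_funds)]

def survivors(returns, threshold=-5.0):
    """Keep only the funds whose return is above the (hypothetical) closure threshold."""
    return [r for r in returns if r > threshold]

all_returns = simulate_returns()
kept = survivors(all_returns)

mean_all = statistics.mean(all_returns)
mean_kept = statistics.mean(kept)

print(f"funds: {len(all_returns)}, survivors: {len(kept)}")
print(f"mean return, all funds:      {mean_all:.2f}%")
print(f"mean return, survivors only: {mean_kept:.2f}%")
```

The survivors-only average comes out higher than the average over all funds, even though nothing about the funds changed. Only the selection did.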
In the case of favouritism, we try to make sure that our data survive, whether intentionally or not. I believe that when we love specific data, we subconsciously think about what kind of conclusion we want, and that is terrible. Even if we are experts in the field, we need to stay neutral towards our data; think as if we do not know anything and let the data tell the story.
Of course, reality is different from the ideal. I always try to be “Data-Driven” as much as possible, but idealism is sometimes not the best approach. When facing a stakeholder or your boss, it could be that they love the data we removed and criticise us because of that. You might then feel terrible because you know what is right, but that is the reality; working with others means there are always people in a higher position. Just remember, you might need to lie because of your work, but try to keep your mind bias-free and you are good to go.
“If you torture the data long enough, it will confess.” — Ronald H. Coase
Oh, how I love this quote by Coase. While data tell a story, we can always get any conclusion we want if we torture them enough. Here, torturing could refer to anything: removing data, selecting specific methods, choosing which data to include, and so on.
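One common form of torture is simply trying many comparisons until one of them "confesses". The sketch below is only an illustration under stated assumptions: both groups are drawn from the same distribution, so there is no real effect to find, and the test is a simple normal-approximation z-test at roughly the 5% level. The function names are hypothetical.

```python
import math
import random
import statistics

def looks_significant(group_a, group_b, z_crit=1.96):
    """Two-sample z-test (normal approximation) at roughly the 5% level."""
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return abs(z) > z_crit

def count_false_confessions(n_attempts=200, n_per_group=50, seed=0):
    """Compare pure-noise groups n_attempts times and count the 'significant' results."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_attempts):
        # Both groups come from the same distribution: any "effect" is noise.
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        if looks_significant(a, b):
            hits += 1
    return hits

hits = count_false_confessions()
print(f"{hits} of 200 comparisons on pure noise looked significant")
```

With a 5% threshold, around one comparison in twenty is expected to look significant by chance alone, so reporting only the comparison that "worked" says very little.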
This quote represents how far we as Data Scientists need (or are willing) to go just to get the particular conclusion we want. It is not always a bad thing to go the extra mile in processing our data, but it depends on our intention.
Are you doing so much to your data because you already have a conclusion in mind? Or is it just because you want to explore the data?
Answering “Yes” to the first question leads to bias, so be careful: having a bias means you will be inclined to reject anything that contradicts it. Answering “Yes” to the second question might be better, but remember to solve your business problem first. I know some people are data maniacs.
This quote is close to what happens in everyday work life. The stakeholder might already have a trend they want to see, and we as Data Scientists might just need to torture the data until it confesses so the stakeholder is satisfied.
“The temptation to form premature theories upon insufficient data is the bane of our profession”. — Sherlock Holmes (from The Valley of Fear, by Sir Arthur Conan Doyle)
Yes, this is a quote from a detective, and yes, this is from a fictional character written by Sir Arthur Conan Doyle. However, it rings true for our profession, as I believe that a Data Scientist is a detective in another form.
This quote could not explain any better what we as Data Scientists need to believe: do not conclude before we have sufficient data. Some people could argue that “sufficient” is hard to define. In that case, we need to go back to the subject that can prove sufficiency: statistics (or your belief, stakeholder, experience, etc.; it depends on your decision after all).
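As one hedged example of how statistics can put a number on “sufficient”, the sketch below uses the standard normal-approximation margin of error for an estimated proportion. The 95% level (z = 1.96) and the worst-case proportion of 0.5 are assumptions chosen for illustration, not a rule for every problem, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # critical value for an (assumed) 95% confidence level

def margin_of_error(p_hat, n, z=Z_95):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)

def required_sample_size(target_margin, p_guess=0.5, z=Z_95):
    """Smallest n whose margin of error is at most target_margin."""
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / target_margin ** 2)

# The margin of error shrinks as the sample grows.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  margin of error = {margin_of_error(0.5, n):.3f}")

print("n needed for a 5-point margin:", required_sample_size(0.05))
```

Quadrupling the sample size only halves the margin of error, which is one reason “just a bit more data” often does not settle an argument.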
In any scenario, try to build your data science portfolio and mindset around data sufficiency. You do not want to be accused of giving a false conclusion. This matters especially for aspiring data scientists: often the data you get is clean, processed, and good enough to be taken to the next step. In real work, that rarely happens. That is why aspiring data scientists practically never get a chance to deal with an insufficient data problem. My advice is to try to collect data on your own for practice purposes.
“Every company has big data in its future, and every company will eventually be in the data business.” — Thomas Hayes Davenport
I am a Biologist by education and a Researcher by past employment. Right now, I am a Data Scientist. It seems like a drastic change, but it is not. My previous experience and what I do now are pretty much the same; only the domain and the environment are different.
Well, I chose to be in the Data Science field because I see the future being similar to the quote above. Sooner or later, everything will be in the data business. It is inevitable; we already live in an era where data is everywhere, and we need to utilise that. In the end, the early bird catches the worm, which means you need to get in early before more competition comes. Then again, the early worm gets caught by the bird, so always have a safety net in case of failure.
“Talented data scientists leverage data that everybody sees; visionary data scientists leverage data that nobody sees.” — Vincent Granville
This quote represents my aspiration as a Data Scientist, and I feel everyone should have the same desire. While the talented data scientist can use the apparent data to answer business problems, the truly visionary see the pattern that is not so obvious, and I tell you, that is the real talent. You need creativity and an outside-the-box mindset to be able to grasp the unseen data.
As a Data Scientist, I keep practising my mindset and skills to become a Visionary Data Scientist. I would say that if someone truly wants to break into the Data Science field, they should try to become a Visionary Data Scientist. It separates you from everybody else, even the talented ones.
“Data is not information, Information is not knowledge, Knowledge is not understanding, Understanding is not wisdom.” — Clifford Stoll
Some people do not know the differences between data and the other related terms, because not everyone wants to know what data is and how it differs from the others. For me, this quote is the best reminder to get my mindset right.
Why is knowing the differences significant? Isn’t it just a matter of definitions that are not essential for a Data Scientist’s work? It is not about working with the data; it is about the way we think. There are fundamental differences between the terms, after all.
You could imagine data as a rock; it is just there and not useful at all. To get something from this rock, we need to polish it into a diamond. Data is the same; when we clean it (or transform and explore it), it becomes information.
As we know, a diamond is expensive, but we only know that because we have already analysed the rock and decided that a diamond is valuable. Information is just like that: it stays information and does not become knowledge without an analysis process.
Just because we know the price of the diamond does not mean we understand the diamond itself; why is it expensive? Why is the colour bright? And so on. We have the knowledge, but we need to ask the right questions to understand completely.
Lastly, we might know everything about the diamond, but that does not mean we can suddenly get rich from it; we need to sell it to the right person. We might understand everything, but only by applying that understanding to the right situation can we get everything the data has to offer.
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” — James Love Barksdale
Above is a simple quote that I love to think of whenever I am arguing. In an argument, let’s look at the data if it exists. I believe in data rather than in complicated words or imagination dressed up in beautiful words. If data do not fuel the argument, it is better to follow my opinion, as nothing can be proven anyway. In the end, if nobody knows anything, just follow whatever advice feels right to you.
Conclusion
These are just a few quotes that I try to remember as a Data Scientist. Yes, mindset is not as glorified as programming and creating a machine learning model, but a Data Scientist is still a human and therefore needs a proper mindset to become a great Data Scientist.