Most of us have heard the saying “Garbage In – Garbage Out”, and those who work in software programming, its birthplace, hear it more often than most. There it applies quite literally. Information technology has produced great solutions for automating all sorts of activities, but in most cases those solutions require source data that is clean and accurate. First, let’s look at garbage in, garbage out.
Garbage in, garbage out (GIGO)
“Garbage in, garbage out” (GIGO) is a computer science concept: invalid input may result in ‘garbage’, or unidentifiable, output. “Rubbish in, rubbish out” is an alternate expression. Structured questions capture the exact information needed to move a process forward and feed properly into decision points (human or automated). In an automated workflow, bad data doesn’t just produce headaches at the point of output; it produces pain throughout the process.
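As a minimal sketch of the idea, the Python below checks each record at the point of entry and sets aside anything invalid before it can reach later steps. The `Order` record, its `customer_id` and `quantity` fields, and the validation rules are hypothetical, chosen only for illustration; a real workflow would define its own.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical record type used only for this illustration.
    customer_id: str
    quantity: int


def parse_order(raw: dict) -> Order:
    """Validate one raw record; raise ValueError if it is garbage."""
    customer_id = str(raw.get("customer_id") or "").strip()
    if not customer_id:
        raise ValueError("customer_id is missing")
    try:
        quantity = int(raw.get("quantity"))
    except (TypeError, ValueError):
        raise ValueError("quantity cannot be read as an integer")
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return Order(customer_id, quantity)


def split_clean_and_rejected(raw_records):
    """Separate valid records from rejected ones, keeping the rejection reason."""
    clean, rejected = [], []
    for raw in raw_records:
        try:
            clean.append(parse_order(raw))
        except ValueError as err:
            rejected.append((raw, str(err)))
    return clean, rejected
```

Only the `clean` list would be passed on to decision points; the `rejected` list, with its reasons, can be sent back for correction instead of silently flowing through the process.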
What Exactly is Garbage?
If Garbage in yields Garbage out, what might garbage be? Part of the answer to this question is easy: bad data. By this I mean data that are either:
So is it possible to fix garbage?
Garbage in, insights out (GIIO)
Stepping back to the Cross-Industry Standard Process for Data Mining (CRISP-DM), and considering what is most important to you, will guide the search for solutions to the GIGO problem.
One starts with the Business Understanding task, which, if done properly, produces a solid understanding of what a client wants to solve via analytics. This iterative task drives all subsequent phases shown in the diagram, some of which reinforce each other.
The diagram above is largely self-explanatory. The logic that follows from it is that if nonsense (i.e. garbage) is input into any of these steps, it can confuse every related step that follows, which will ultimately lead to garbage output.
Despite all this, the GIGO problem has created an ideal opportunity for businesses to invest in solutions that focus on delivering clean, high-quality input data: solutions that deploy artificial intelligence and machine learning to enhance data across the various systems, so that there is a single Master Data Element that can be used seamlessly across advanced data stories.
Are there any other ways to fix the garbage data?
Connect with us to learn more about how we navigate Garbage In – Insights Out.