Most of us would agree that the past two to three years can be called the years of data analytics, revolving around Big Data. Since Big Data was introduced, almost everyone has at least evaluated whether it is something to keep an eye on or something that can be ignored, given the data their applications generate or collect. As that hype begins to settle, another data term has started vying for attention: Dark Data.
When I first came across the term, I was not sure whether it was really different from Big Data. At a high level, dark data is data that was stored in the past, is being stored as we speak, and will be generated in the future, but has yet to be noticed by organizations and analyzed for intelligence.
Confused? Let’s take a look at how Gartner Inc. describes Dark Data: “information assets that organizations collect, process and store in the course of their regular business activity, but generally fail to use for other purposes”.
Some also count data that enterprises have yet to capture under the Dark Data umbrella!
Examples of Dark Data:
- Application log files
- Data moved out of mainstream systems and kept as archives
- Data collected by applications and stored for ‘just in case’ situations without any known use
Looking at these data sources more closely, I find some similarities with Big Data.
- All three V’s (Volume, Variety, and Velocity) apply to Dark Data
- Like Big Data, it is largely unstructured
Though most of the key characteristics are the same, there are some notable differences as well.
- The storage mechanism used for Dark Data [both physical location and format] may add complexity or cost to analyzing it
- As some of it has not yet been captured, it might exist outside the boundaries of the enterprise
- Big Data has mostly been identified by enterprises and is being, or will be, leveraged by them, whereas Dark Data has not been leveraged yet
In my opinion, despite these key differences, it is safe to say that most Dark Data will eventually turn into Big Data. It may be complex in nature, and its sources may not be supported out of the box by existing Big Data tools, but the Big Data tools and platforms that exist today should be able to handle it with some modifications or additions. When looking at Dark Data, organizations should first explore the dark data they already have, identify its value, and evaluate existing Big Data solutions for analyzing it, rather than worrying about it or devising alternative solutions to address it.
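To make "explore the dark data at hand" a little more concrete, here is a minimal sketch of a first pass over application log files, the first example in the list above. The log format, the `LINE_PATTERN` regex, the `summarize_log` helper, and the sample lines are all assumptions made up for illustration; real logs will need their own pattern.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for illustration only:
# "<date> <time> <LEVEL> <message>"
LINE_PATTERN = re.compile(r"^(\S+) (\S+) (DEBUG|INFO|WARN|ERROR) (.*)$")


def summarize_log(lines):
    """Count log entries per severity level, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Made-up sample lines standing in for a real log file.
sample = [
    "2015-03-01 12:00:05 ERROR payment timeout",
    "2015-03-01 12:00:07 INFO user login",
    "2015-03-01 12:00:09 ERROR payment timeout",
    "not a log line",
]

summary = summarize_log(sample)
print(summary.most_common())
```

Even a rough count like this can hint at whether a forgotten log is worth feeding into an existing Big Data platform before anyone invests in new tooling.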