Back in July, Brad Feld wrote a post titled "The Dark Matter of the Blogosphere". I'm not sure if he coined the term or not, but I like its meaning.
For those of you who are less of a physics nerd than I am, dark matter is something astrophysicists have been struggling with for a while. Simply put, the Universe doesn't have enough stuff in it to work the way it does. The most viable explanation is that there is a _lot_ of stuff we can't see or detect easily, a.k.a. dark matter.
In the case of the blogosphere, Brad was referring specifically to reader comments. There's a huge volume of user generated content out there in the form of blog comments, and for the most part it is unsearchable and effectively invisible. Folks like Disqus and Intense Debate are working hard to resolve this.
But I think the concept of Dark Matter is very applicable to data in general.
Think about all of the data in your life. How much useful information do you have that is effectively hidden and invisible? This is as true for an individual as it is for a corporation. Some of this information is hidden by virtue of being hard to search or hard to access... and some is hidden because it isn't explicit -- it's "implied" by the way things have been collected, organized, or used.
So let's take a quick look at each case...
Hard to search:
The original idea for disruptorMonkey stemmed from a personal problem... Like many of you, I have the "big box of crap" that I've accumulated from many different jobs. It includes CD-ROMs of data, printed stuff, handwritten notes and numerous other treasures. About 18 months ago, I needed to put together some sales training materials for someone. I dug into the big box and it took me 4+ days to organize, recreate and assemble what I needed. It was a nightmare. Incensed at the stupidity of the process, I started looking for a better way, which quickly led me to set up a wiki. Wikis can be great, but they're mostly hopeless with existing data unless you reformat it for the wiki... which is a huge pain.
The underlying issue was the fact that the data was hard to search, which made it difficult to organize and repurpose.
Hard to access:
Last week I was talking to a banker who happened to have majored in IT systems. I was explaining some of what we do, and he started telling me about some of his data woes. The biggest one stemmed from the fact that some banking systems are built on fairly old databases. You've probably seen the horrible green-screen terminal-window interfaces in use at your local bank. These UIs have zero flexibility and are the result of many years of development, much of it seemingly without input from the people using the product.
Even though the whole thing is just a database, he has no way whatsoever to run unique queries. For example, he would love to be able to search for customers with a $5,000-$10,000 personal line of credit. The data he needs is in the database, but he has no way to access it, so from a practical perspective it doesn't exist in any meaningful way.
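To make that concrete, here's roughly what his query could look like if the data were reachable with plain SQL. This is only a sketch under loud assumptions: the `customers` table, the `name` and `credit_line` columns, the sample rows and the helper function are all made up for illustration, and an in-memory SQLite database stands in for whatever the bank actually runs.

```python
import sqlite3


def customers_with_credit_line(conn, low, high):
    """Return (name, credit_line) rows with a credit line between low and high, inclusive."""
    # Hypothetical schema: "customers", "name" and "credit_line" are illustrative names.
    cursor = conn.execute(
        "SELECT name, credit_line FROM customers "
        "WHERE credit_line BETWEEN ? AND ? "
        "ORDER BY credit_line",
        (low, high),
    )
    return cursor.fetchall()


# Build a throwaway in-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, credit_line INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Avery", 2500), ("Blake", 5000), ("Casey", 7500), ("Devon", 12000)],
)

# The banker's question: who has a $5,000-$10,000 personal line of credit?
matches = customers_with_credit_line(conn, 5000, 10000)
print(matches)
```

The query itself is a one-liner. The hard part in his world is that nothing lets him run it.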
Implied:
The discussion I had with Brad before Thanksgiving was about how Exchange server contains a lot of interesting "implied" data, above and beyond the obvious email & social network info. Your Outlook/Exchange account says an awful lot about you and the things you're interested in... along with who you talk to and what you talk about.
That's not data that is readily exposed in any useful way, although companies like Xobni are making some headway on that front.
All three of these scenarios are about "dark matter" data. There's a lot of incredibly important information that's there, waiting to be mined, but today's tools mostly can't see or use it.
One of our longer term goals at disruptorMonkey is to build a tool that not only captures all that dark matter, but also puts it to work and makes it useful.
There's much to do, but we're excited about the progress we've made so far...