SOD Big Data! Most of what you're keeping is digital landfill

I've been an industry analyst a long time – and I have seen a lot of "the next big things" come and go. In fact our appetite as an industry for the next big thing rivals that of Simon Cowell in pursuit of the next big star. And just as Simon's "stars" are usually anything but, so too are we in the IT sector often deluded, and …

COMMENTS

This topic is closed for new posts.
  1. K

    It's a 360° turnaround that is long overdue

    Surely you mean it's a 180° turnaround?

    But you are correct, this does need a reality check. Some SMEs have only a few GB of data, most of which is 5-10 years old and now completely irrelevant, yet they jump onto the "Hadoop" bandwagon, all because a) another company did it or b) their CEO heard the phrase "Big Data" in an "informed article" and suddenly read it as "Big Money over this hill", which vendors are only too pleased to encourage.

    It's the blind leading the blind...

  2. Michael H.F. Wilkinson Silver badge

    It is also about pattern recognition

    Indiscriminate adding of data makes finding any information hard. There is a real risk of drowning interesting patterns in noise. Data != Information

    As I like to tell students (over and over again): Adding hay does not make finding needles easier

    We do do some work on really big data (vast amounts of image data from astronomy and remote sensing), but that data has lots of internal structure, which helps in finding stuff.

  3. EyeCU

    The great Joan Rivers?

    There is nothing 'great' about that evil harpy just because she has managed to make a career of being a total spiteful bitch. Here's what Adam Hills thinks http://www.youtube.com/watch?v=mOcAIGFma5Q

    1. K

      Re: The great Joan Rivers?

      Joan Rivers, a living legend in my book!

  4. Josco

    Good sense there....

    I worked for a document storage company where we had over 3 million document boxes in store for many customers large and small. I reckoned 95% of those documents would never see the light of day again except to be recycled. Ignorance of data retention rules, an unwillingness to be the person who authorised the destruction of 'sensitive' data and a universal instinct to hoard contributed to our bottom line.

    I'm no better, you should see my garage......

    1. Anonymous Custard

      Re: Good sense there....

      Not to mention the universal application of Sod's Law meaning that 2 days after you shredded/deleted/destroyed said document or information, a situation arises when you need it.

  5. SeymourHolz

    Don't Hate The Message Just Because You Hate The Messenger

    Analysts love to sneer at new buzzwords, rival analysts, and vendors. But APS is missing the point.

    Improving Worker Productivity is the imperative. The best data for making workers more productive is already held by the organization, but that data is federated and noisy and will always be federated and noisy. Big Data technologies handle the discrete technical challenges associated with that reality. Some products are better than others, caveat emptor. But don't dismiss the importance of this next phase of computing just because it clashes with your expectations about third normal form.

    1. Steve Knox

      Re: Don't Hate The Message Just Because You Hate The Messenger

      "Improving Worker Productivity is the imperative."

      No, Improving business efficiency is the imperative. The business doesn't exist for the workers. If it's more efficient to replace the workers with robots, or computers, or toadstools, that's what'll happen.

      "The best data for making workers more productive is already held by the organization,..."

      Agreed, but...

      "... but that data is federated and noisy and will always be federated and noisy."

      Both of those depend on the organization, and the data involved. I'm working with some data right now that is sourced from three separate databases, and the only way to link them is on name, which causes some problems in spelling, etc., but the majority of data used by my organization is well centralized and has very little noise.

      "Big Data technologies handle the discrete technical challenges associated with that reality."

      Again, that depends on the data involved. Most of the data I work with has too few rows and too little noise to make the effort involved in setting up "Big Data" technologies worthwhile. The biggest factor here is the degree of noise: if your data sources are well-designed from the start, you don't need to muck about with Big Data technologies.

      "But don't dismiss the importance of this next phase of computing just because it clashes with your expectations about third normal form."

      And you don't dismiss normal forms just because they clash with your desire to be on the crest of a computing revolution. Why do you think that many of the bigger "NoSQL" technologies are adding relational features as fast as they can?

      Big Data technologies have some use. They are best when:

      1. You cannot curate your data sources,

      2. Inserts and Updates happen almost continuously,

      3. Queries don't have to be based on up-to-the-second data,

      4. Your dataset is very, very, very large, and

      5. You have lots of hardware resources to throw at the processing of this dataset.

      That, as APS has said, is NOT the case for any but the largest organizations, and even the largest organizations don't necessarily need that data processing in order to run efficiently.

  6. Mr Templedene

    And this is the same argument I've used against the .gov retaining all the phone, email and chat data as they want to: noise over signal. The data they want to keep will be too noisy for decent analysis.

    1. Anonymous Coward

      and...

      Everything that they'll have needed* to know will be there and be discovered after the fact.

      * - for a government definition of 'needed'

  7. Chris at SysMech

    Big data an expensive needle in a haystack?

    Alan – I’d have to disagree with your point that to get any value from big data you need to be like Facebook or LinkedIn and store vast amounts of data on your customers. I work with companies in the Telco sector, where the volumes of potential data available have always been extensive and, for regulatory reasons, some classes of data have to be stored for many years. There, the value as I see it comes from a combination of collecting, analysing and storing the right information in the first place. I understand it can be difficult to know what data is valuable and what is, as you put it, ‘just collecting dust’, so I would suggest starting with a clear business strategy in mind or at the very least a business case for using the data.

    Once this is in place, businesses should find it easier to discard this dust-collecting data, as the ‘one day it might be useful’ paranoia will have been addressed. Once businesses are in the routine of collecting and storing data that can be used, cost and monetisation of data can throw up the real issues. For example, if the wrong tools are used for the job, the costs can be considerable, and the task of finding value in the data collected can very quickly become just like looking for a very expensive 'needle in a haystack'!

  8. alanps

    Chris - I think we are actually in agreement :-)

    "... the value as I see it comes from a combination of collecting, analysing and storing the right information in the first place".

  9. douglaney

    Opportunity for principles/practices of infonomics

    Suggesting analytics has hit its limit reminds me of that late 1800s US Patent & Trademark Office commissioner, Charles H. Duell, who suggested "Everything that can be invented has been invented." Hogwash! (To use a common 1800s term.) At Gartner we're tracking a number of emergent technologies including autonomous analytics that will slice through today's Big Data like a knife through hot butter. Regardless, there's certainly a cost-benefit argument for "defensible disposal" using the infonomics models we've developed at Gartner for quantifying the economic (or relative) value of an information asset. --Doug Laney, VP Research, Gartner, @doug_laney

    1. alanps

      Re: Opportunity for principles/practices of infonomics

      Hi Doug, thanks for the comment. To be clear, I don't believe that analytics has reached its limit. Rather, my view is that churning through mountains of redundant junk data is an unnecessary and costly exercise. Like Gartner, we at 451 are doing a lot of research on advanced and emerging analytical systems. But the point of this piece is that junk at the end of the day is still junk - and there is a better way forward.

      Best!

      Alan

  10. Dr. Ellen

    Nine-tenths of the stuff you're saving could easily be thrown away -- if only you were sure *which* nine-tenths. There have been occasions when I've found something I stuck away decades ago, and been very glad I found it.
