Welcome to the Petabyte Club

Hype alert! Hype alert! Big Data is coming our way. A new volcano has blasted its way above the surface of the marketing sea, spewing out "big data" messages in enormous flows of thought leader bullshit. What the heck is this big data thing? EMC says it's to do with handling data at the petabyte scale, where things like …

COMMENTS

This topic is closed for new posts.
  1. NoneSuch

    If Theodore Sturgeon was right...

    ...when he said 90% of sci-fi was crap because 90% of everything is crap, then 90% of the petabytes stored in the "cloud" are crap too.

    1. Anonymous Coward

      I think Theodore was on the right lines...

      As 90% of Reg articles are crap as well.

      Come on kids, THEIR / THERE ... really?

    2. The Fuzzy Wotnot

      Indeed...

      All those SMS messages and Twitter feeds from people posting about their pets have to be kept somewhere while the Gov sifts them looking for evidence of terrorist activity!

  2. thecakeis(not)alie

    Depends.

    Is it deduplicated? 5 million people posting the same damned lolcat is a hell of a dedupe ratio...
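
    For a sense of the mechanism, here is a minimal sketch of content-addressed dedupe (a toy store, not any particular vendor's engine; real systems chunk and fingerprint data, this just hashes whole objects):

        import hashlib

        class DedupeStore:
            """Toy content-addressed store: identical content is kept once."""
            def __init__(self):
                self.blocks = {}   # sha256 digest -> stored bytes
                self.logical = 0   # objects written, before dedupe

            def put(self, content: bytes) -> str:
                self.logical += 1
                digest = hashlib.sha256(content).hexdigest()
                self.blocks.setdefault(digest, content)  # store only if unseen
                return digest

        store = DedupeStore()
        lolcat = b"I CAN HAS CHEEZBURGER"   # stand-in for the same image
        for _ in range(5_000_000):          # 5 million identical posts
            store.put(lolcat)

        print(store.logical, len(store.blocks))  # 5000000 1, a 5,000,000:1 ratio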

  3. Dan 10

    De-dupe

    Not sure on the value of de-dupe in this space. I could understand wanting to de-duplicate data in the ETL layer, although a lot of products there are DB-based rather than file-based. However, since the aim is a 'single source of truth' via third normal form, where's the value of de-dupe in the DW? Assuming the reporting tier is ROLAP (as part of that single source of truth that everyone's striving for), there's very little data there apart from cube dimensions.

    I suppose you might want limited MOLAP for performance reasons, then de-dupe that, but that ought to happen at the DB level, surely?
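
    De-dupe in the ETL layer usually means dropping duplicate rows on the way into staging, rather than block-level storage dedupe. A minimal sketch of that, assuming a plain CSV feed (file names are hypothetical):

        import csv
        import hashlib

        def dedupe_rows(in_path: str, out_path: str) -> None:
            """Copy a CSV, writing each distinct row exactly once."""
            seen = set()
            with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
                writer = csv.writer(dst)
                for row in csv.reader(src):
                    # key the row on a hash of its fields, joined unambiguously
                    key = hashlib.sha256("\x1f".join(row).encode()).hexdigest()
                    if key not in seen:
                        seen.add(key)
                        writer.writerow(row)  # duplicate rows are silently dropped

        # dedupe_rows("staging/orders_raw.csv", "staging/orders_clean.csv")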

  4. Anonymous Coward

    Big (file) data is very real

    Chris, come spend a day with Isilon and you'll see that 'big data' is very real. It's something we talk to customers about every day. We can talk about the members of our 10PB club too!

  5. Chemist

    There's big data ..

    and there's the LHC - in a different league

    The Large Hadron Collider will produce roughly 15 petabytes (15 million gigabytes) of data annually.
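
    (Back of the envelope: 15 million gigabytes a year is 15,000,000 GB / 365 days ≈ 41 TB a day, or roughly half a gigabyte every second, sustained, all year.)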

  6. Ramon Chen

    RainStor's goal is to de-dupe Big Data

    Great article as usual Chris and thanks for the mention.

    As you pointed out, RainStor de-dupes structured data without sacrificing the original form. We preserve the immutable structure of the data while magically de-duplicating the values so that the footprint physically shrinks 40:1 or more.

    We are all about taking petabytes of data and reducing them to terabytes, thereby allowing limitless amounts of data to be stored at the lowest possible cost.

    Many thanks again for the article and the reference to RainStor.

    Ramon Chen

    VP Product Management

    www.rainstor.com
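
    (RainStor's engine itself is proprietary; what follows is only a sketch of the general idea of value-level dedupe that preserves record structure: each distinct field value is stored once, and records become tuples of references, so every record can be rebuilt exactly as it went in. The record contents are made up for illustration.)

        class ValueDictStore:
            """Toy value-level dedupe: distinct field values are stored once;
            records are kept as tuples of integer references, so row and
            column structure survives intact."""
            def __init__(self):
                self.values = []   # distinct values, in first-seen order
                self.index = {}    # value -> position in self.values
                self.records = []  # each record as a tuple of value ids

            def add(self, record):
                ids = []
                for value in record:
                    if value not in self.index:        # new distinct value
                        self.index[value] = len(self.values)
                        self.values.append(value)
                    ids.append(self.index[value])
                self.records.append(tuple(ids))

            def get(self, i):
                """Rebuild record i exactly as it was added."""
                return tuple(self.values[v] for v in self.records[i])

        store = ValueDictStore()
        for i in range(1_000_000):                      # highly repetitive rows
            store.add(("GB", "2011-06", "ACME", i % 100))

        print(len(store.values))  # 103 distinct values back 4,000,000 fields
        print(store.get(0))       # ('GB', '2011-06', 'ACME', 0)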

