Xerox admits there's no fix yet for number-fudging copiers

A flaw in the scanning compression software of some Xerox copiers which changes digits and numbers run through the machine is worse than first thought and will require a full software upgrade, the self-styled "Document Company" has said. The flaw was first spotted by German computer science student David Kriesel, who …

COMMENTS

This topic is closed for new posts.
  1. John Smith 19 Gold badge
    FAIL

    So what are they using for a development method?

    The "No complaints == No faults" rule?

    1. Don Jefe

      Re: So what are they using for a development method?

      I hear what you're saying but I believe there's a much larger issue here.

      Is anyone reading/using the incorrect documents? It doesn't sound much like it. More than anything I think Xerox has made a great case for less copying of documents, obviously no one is using them anyway.

      1. Anonymous Coward
        Anonymous Coward

        Re: So what are they using for a development method?

        I'd have thought that the "much larger issue here" is: Isn't this all caused by the Xerox ("document") Corporation recycling a FAX compression algo?.. So how many FAXes were fucked in this manner?

        What a mess!

      2. Tom 13

        Re: It doesn't sound much like it.

        I wouldn't count on that. I'm still not sure the implications have sunk in. People assumed it was an accurate copy and proceeded. If they don't have cause to check it, they probably won't. Hell, the mass transit center next to where I work that was supposed to go into service 6 months ago is still in legal limbo because somebody couldn't follow directions or got confused by the blueprints. In fact, if faulty Xeroxes are anywhere in the loop on that one, I can pretty much guarantee the subcontractor will be reaching for that defense.

  2. DaLo

    Mixed Messages?

    So from that report we can conclude that:

    "... the unit’s “Quality/file size” factory default and highest modes don’t completely alleviate the problem"

    and

    "The default and highest modes ... a software bug character substitution is not completely eliminated"

    however we then revert to the original statement that

    "... on low-resolution scans of documents ... then it may reoccur"

    Also it doesn't affect normal office documents unless you consider a large spreadsheet which has small numbers (quite a few) a normal office document?

    "We apologize for any confusion that came from our prior communications." ... and from this one.

    1. Darryl
      Joke

      Re: Mixed Messages?

      I think the problem is, they scanned that document and emailed it to the PR department. Some of the text got changed along the way

      1. petur
        Coffee/keyboard

        Re: Mixed Messages?

        "I think the problem is, they scanned that document and emailed it to the PR department. Some of the text got changed along the way"

        new keyboard please

  3. We're all in it together

    Good

    I am planning on copying my wage slip and getting an instant pay rise.

  4. Suburban Inmate

    The obvious question

    They can't simply do an uncompressed scan because...?

    1. Anonymous Coward
      Anonymous Coward

      Re: The obvious question

      Or something like 4bit 1channel PNG? Surely the compression would be adequate for an internal process? ...and you'd actually be getting A COPY rather than some sort of creative evocation.

      1. Tom 13

        Re: The obvious question

        Follow this link for a detailed explanation from an AC who claims experience with the compression algorithm in question:

        http://forums.theregister.co.uk/forum/1/2013/08/06/xerox_copier_flaw_means_dodgy_numbers_and_dangerous_designs/#c_1917144

        Based on the level of detail and tone of the message I accept his claim of expertise.

        1. Anonymous Coward
          Anonymous Coward

          Re: "Copies"

          Issue does not occur on basic copies

          because compression is not used when making basic copies.

          RTFA.

          1. Anonymous Coward
            Anonymous Coward

            Re: "Copies"

            "compression is not used when making basic copies."

            How sure are you ?

            Based on what the apologists have said, compression is used to minimise the amount of memory needed.

            The context they say this in seems to say that if you want to do things like collating or layup (two pages per sheet etc), the input is scanned and the compressed image is stored in memory, and then a potentially corrupt version of the input is printed out. It would make sense (well, it would if the compression/decompression was trustworthy).

            Similarly, if you want multiple copies of a set of originals, I imagine it scans the originals once, stores the compressed version, and then a potentially corrupt version of the input is printed out. I'd be *very* surprised if it rescanned the original for each copy that was wanted - the way that dumb but trustworthy copiers used to (trustworthy as long as there wasn't a misfeed or other mechanical problem).

            In this context, I struggle to see what a "basic copy" (without compression) might mean, and why the flow of data (including whether it's compressed or not) would be different for making 2 copies of an original rather than making one copy. But ICBW. Who's got the source code?

            Wait till this kind of "cost excellence" reaches places where it actually matters. Engine control units, that kind of thing, where lives may eventually be at stake (how many computers in that nice new Volvo hybrid).

  5. ecofeco Silver badge

    Just how hard is it?

    I'm beginning to think they outsourced the code to begin with and are now having "access" problems.

  6. Snake Silver badge

    Ouch -_-

    Only in our modern world does a "copier"...not do (correct) copes.

    1. Anonymous Coward
      Anonymous Coward

      Re: not do (correct) copes.

      And only on El Reg are the typo gremlins so merciless!

      1. Anonymous Coward
        Holmes

        Re: not do (correct) copes.

        Intentional?

        1. Jim Willsher

          Re: not do (correct) copes.

          No worse than their article saying Ariel instead of Arial I guess.

          1. Jim Willsher

            Re: not do (correct) copes.

            Edit: Or perhaps the 'a' became an 'e' due to the compression algorithm?

            1. Anonymous Coward
              Anonymous Coward

              Re: Ariel

              I suspect the little mermaid had something to do with this.

  7. Anonymous Coward
    Anonymous Coward

    Time to call Mr Loophole, Nick Freeman.....

    To see if the local dibble used Xerox copiers for my speeding ticket.

  8. Proud Father
    FAIL

    /FacePalm

    It's a bloody simple task!

    How the hell did they cock-up doing a straight verbatim copy?

    Muppets.

    1. Tom 13

      Re: /FacePalm

      If you were following the issue there was a quite good explanation in a previous column.

      It starts with they are compressing for data manipulation speed. It gets mangled with an algorithm that has more variations than RS-232. And it sounds like it has a library of sharp images that they use based on detection from the mangly algorithm.

      Yes, to me, the non-programming tech who is the first guy to catch flak from the users, it looks like they should simply turn off the compression while they get it fixed and take the performance hit. But since I'm not the programmer, I'm willing to believe them when they say it is a bit more complicated than that.

  9. Anonymous Coward
    Anonymous Coward

    Re: "They can't simply do an uncompressed scan because...?"

    They're clueless? They've "cost excellenced" [1] everything down to a level where this isn't actually fixable?

    Also, not all compression is lossy compression. ZIP files are often quite compressed, but it's a rare day when they get undetected data corruption even in the presence of other errors. Ditto PDF in most cases. And various other less well known lossless compression formats, some of which have been routinely applied to documents for many years without any difficulties of this nature.
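
    A minimal sketch of the lossless point above, in Python with the standard zlib module (the sample text and helper names are made up for illustration). The round trip returns the input byte for byte, and a damaged stream is rejected rather than silently decoding to something else.

```python
import zlib


def roundtrip(data: bytes) -> bytes:
    """Compress with DEFLATE (lossless) and decompress again."""
    return zlib.decompress(zlib.compress(data, 9))


def decode_or_none(packed: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(packed)
    except zlib.error:
        return None


# Hypothetical sample "document": the same line repeated many times.
page = b"Invoice total: 6,186.00\n" * 200
packed = zlib.compress(page, 9)

# Damage the last byte, which is part of the Adler-32 checksum trailer.
damaged = bytearray(packed)
damaged[-1] ^= 0x01

print(roundtrip(page) == page)                 # True
print(decode_or_none(bytes(damaged)) is None)  # True
```

    This only shows that a lossless codec with a checksum behaves as described; it says nothing about which codec the copiers actually use.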

    What went wrong here? Too much clueless "cost excellence", perhaps. There's a lot of it about.

    [1] Yes,seriously. "value engineering" used to be the term of choice, but now in line with the company's policy of continuous product and service improvement, and in order to acknowledge the important role of the value stream contribution from the staff in Strategic Sourcing (which everyone else still calls Purchasing), the term has been replaced with "cost excellence".

    1. Someone Else Silver badge
      Devil

      Re: "They can't simply do an uncompressed scan because...?"

      [1] Yes,seriously. "value engineering" used to be the term of choice, but now in line with the company's policy of continuous product and service improvement, and in order to acknowledge the important role of the value stream contribution from the staff in Strategic Sourcing (which everyone else still calls Purchasing), the term has been replaced with "cost excellence".

      BINGO!

  10. Alan Esworthy
    Pint

    For a faster fix...

    ...Copy your monthly Xerox bill on their faulty gear. You've got a 50-50 chance any error will reduce the total you pay. Spend the difference at the pub.

    1. Anonymous Coward
      Anonymous Coward

      Re: For a faster fix...

      But what happens if it goes the OTHER way and RAISES the bill?

      1. Anonymous Coward
        Anonymous Coward

        Re: For a faster fix...

        "But what happens if it goes the OTHER way and RAISES the bill?"

        Have another go at a different randomisation "compression" setting?

      2. Anonymous Coward
        Anonymous Coward

        Re: For a faster fix...

        Simples. You've got both copies. Check it first, only submit the copy if it is cheaper.

    2. kain preacher

      Re: For a faster fix...

      What do you want to bet they have error controls in place to prevent that? Like being able to access the copier via phone line. They advertise a line of copiers that can self call a tech to fix an issue. If it can call out it can transmit the page count, your bill, to a high quality printer. When the tech comes out to fix the machine they often print out a page count and other stats from the copier directly. Want to bet that gets printed out in the best quality?

  11. btrower

    Pretty bad

    Silent data corruption like this is the absolute worst. Who knows how much this corruption has ended up bleeding back into databases. Believe it: a lot of computer generated data ends up being re-keyed. A couple of generations with documents sent to the shredder means -- what?

    1. Anonymous Coward
      Anonymous Coward

      Re: Pretty bad

      Yup. Methinks more than a few poor office sods around the place are going to be facing an utter bastard of an auditing job.

  12. kain preacher

    Joke

    It's like this joke someone once sent me. A young monk was sent to a monastery where all they did was make copies of the holy texts. The young monk noticed that they were working from copies of copies. He asked the senior monk, "What happens if someone gets it wrong? Won't the error be duplicated?" The senior monk said, "Don't worry, it never happens," but to placate the young monk he went into the archives to check the original. He came back saying, "They left out the R, they left out the f'n R! 'Each day we shall be happy and celebrate' became 'each day we shall be happy and celibate'."

    1. Anonymous Coward
      Headmaster

      Re: Joke

      While it's true that copying alone can introduce errors, there are methods of copy error checking and integrity checking. The old method was to count individual letters and hope errors were misspellings, and thus correctable, rather than substitutions of a letter or word, which are less correctable. That, along with context, allows for correction in most instances, with the exceptions above and things like numbers or unique names.

      More recently we can do it mathematically with 100% (assuming a perfect "machine/calculator") or close to (assuming no hash collisions) accuracy. Say, assign a prime number to each letter and record the total sum of a line (may be off in my maths there, been a while). You have to hope the hash/integrity check is not copied in error though... and no, I'm not going to apply reoccurring checks! :P
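
      A rough sketch of the per-line integrity check described above. It swaps in SHA-256 from Python's hashlib for the prime-sum idea, since a plain sum cannot notice two characters trading places. The room sizes and function names are invented for illustration, and the digests themselves would still have to travel by some route that is not subject to the same copying errors.

```python
import hashlib


def line_digests(text: str) -> list[str]:
    """One SHA-256 hex digest per line of text."""
    return [hashlib.sha256(line.encode("utf-8")).hexdigest()
            for line in text.splitlines()]


def changed_lines(original: str, copy: str) -> list[int]:
    """1-based numbers of lines whose digest differs, or that exist on one side only."""
    a, b = line_digests(original), line_digests(copy)
    changed = [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
    # Lines beyond the shorter text have no counterpart, so report them too.
    changed.extend(range(min(len(a), len(b)) + 1, max(len(a), len(b)) + 1))
    return changed


# Made-up example data: one digit substituted on the second line.
original = "Room A: 14.13 m2\nRoom B: 21.11 m2\nRoom C: 17.42 m2"
copy = "Room A: 14.13 m2\nRoom B: 21.17 m2\nRoom C: 17.42 m2"

print(changed_lines(original, copy))  # [2]
```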

      1. Aslan
        Thumb Up

        OCR - Optical Character Recognition

        Given the level of technical expertise here, I'm surprised that you're surprised OCR isn't 100%. Personally I think it's brilliant that Xerox came up with a file format that compresses the original document by running it through OCR and building a font of the original images of the characters used. I'd love to know the sizes of files it generates for a given resolution. OCR is never 100 percent and shouldn't be used as the only method of storing/reading a document if every part of the information is important, or if you are going to be using OCR in such a case you need a human checking it.

        I think the fault here is that Xerox did not provide warning on the machine of what the risks and limitations of using that file format were. Perhaps a solution would be to provide notice of the limitations on the machine and have the copier assess its certainty that it was seeing the correct characters and switch to an alternate file format if the certainty was below say 96%.
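
        The fallback suggested above could look roughly like this in Python. Everything here is hypothetical: the confidence scores, the format names, and the default threshold (the 96% from the comment). Nothing is known about how the copier scores its matches.

```python
def choose_format(match_confidences, threshold=0.96):
    """Pick a storage format from per-character match confidences (0.0 to 1.0).

    Falls back to a plain raster image as soon as any single character
    match is less certain than the threshold.
    """
    if all(c >= threshold for c in match_confidences):
        return "symbol-coded"
    return "raster"


print(choose_format([0.99, 0.97, 0.98]))  # symbol-coded
print(choose_format([0.99, 0.90, 0.98]))  # raster
```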

        Thumbs up to the guy who created this format. Xerox should have made end users aware of its limitations.

        1. Anonymous Coward
          Facepalm

          Xerox isn't brilliant ..

          "Given the level of technical expertise here, I'm surprised that you're surprised OCR isn't 100%. Personally I think it's brilliant"

          No it isn't brilliant, when you are creating a machine that copies things then the one thing it has to do is COPY with 100% fidelity ...

  13. aurizon

    Xerox is not alone in this error. I have a Brother printer scanner that creates errors when scanning documents to a PDF file. When I view that file or print it, things are changed.

  14. Paddy
    WTF?

    Another Y2K-like bug?

    Is anyone looking at other brands of copiers? Just how widespread is the problem likely to be?

    Should we all just confine purchases to those manufacturers that clearly state they have at least one mode that is guaranteed not to have a problem like this?

  15. MacroRodent
    Boffin

    Fundamental error?

    I have been wondering how this can come about, and concluded it must be a serious thinko in the design of the lossy compression algorithm itself, when applied to documents. I guess it splits the input image into blocks, and then looks for blocks with the commonest patterns and substitutes them for approximately similar rarer blocks, so that it does not have to store a separate code for them (this is about what the "vector quantization" compression method does, remember the 1990's blocky QuickTime and AVI videos?). No problem for kitten videos, but too bad if in an important document, some common blocks contain digits "8" and rarer blocks contain "6":s...
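
    A toy Python version of the block substitution guessed at above. It is a sketch under invented assumptions (3x5 glyph bitmaps, a pixel-difference threshold, made-up function names), not the copier's actual algorithm. Each glyph on the "page" is stored as an index into a dictionary of patterns, and a glyph that is close enough to an existing pattern reuses it.

```python
# Tiny 3x5 bitmaps; "6" and "8" differ by a single pixel.
GLYPHS = {
    "8": ("###", "#.#", "###", "#.#", "###"),
    "6": ("###", "#..", "###", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
}


def pixel_diff(a, b):
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))


def encode(glyphs, max_diff):
    """Store each glyph as an index, reusing any pattern within max_diff pixels."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, pattern in enumerate(dictionary):
            if pixel_diff(glyph, pattern) <= max_diff:
                indices.append(i)  # lossy step: a similar pattern stands in
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    return [dictionary[i] for i in indices]


def read_back(glyphs):
    """Map decoded bitmaps back to the characters they depict."""
    names = {bitmap: name for name, bitmap in GLYPHS.items()}
    return "".join(names[g] for g in glyphs)


page = [GLYPHS[ch] for ch in "8618"]
print(read_back(decode(*encode(page, max_diff=0))))  # 8618
print(read_back(decode(*encode(page, max_diff=1))))  # 8818
```

    With the threshold at zero the scheme is lossless. Loosen it by one pixel and the "6" comes back as a perfectly crisp "8", which is why the error is invisible on the output.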

    Anyway, why should an all-in-one COPIER even apply lossy compression to an image on the way from the scanner part to the printer part? Another thinko there.

    1. Anonymous Coward
      Anonymous Coward

      Re: Fundamental error?

      Pretty much what I've made of it.

      EPIC XEROX FAIL!!! (In memory of Eadon)

    2. Anonymous Coward
      Anonymous Coward

      Re: Fundamental error?

      Xerox claims the algorithm the copier uses is called JBIG2. The original JBIG is lossless but JBIG2 allows for lossy. As for why the compression, consider a big scan run that has to be duplexed or collated. That means ALL the pages have to be in memory, and it's not unheard of for copiers to get skimped on memory. "Out of Memory" errors are rare enough that an expansion might not be considered or allowed by accounting.

      1. Yet Another Commentard

        Re: Fundamental error?

        @AC 07:03

        I see the "out of memory" point, but why not just tell the user that after [X] pages the pdf will be written (e-mailed, whatever) and a new one started? Thus freeing up the memory problem.

        Surely even when compressed there must be a page limit the memory can take, it's just you'll reach it sooner with uncompressed.

        The problem here is the action: to COPY. Scanning has a different user mindset. A significant number of users will have grown up with old-style not-scanned-but-copied copies, and therefore have a reasonable expectation that the copy will be just that. In the good old days unclear copied numbers would be a mess (8, 9, 6 often being the culprits). I knew they were wrong because I couldn't read them, or I had doubts from the quality of the copied digit, so I'd go and look at the original. Now, I can't tell at a glance, as all the numbers look as clear as day because they have been OCRd and rendered in a nice, clear font.

        1. Anonymous Coward
          Anonymous Coward

          Re: Fundamental error?

          "I see the "out of memory" point, but why not just tell the user that after [X] pages the pdf will be written (e-mailed, whatever) and a new one started? Thus freeing up the memory problem."

          Because it would be pointless in the job's case to do in segments. Plus, like I said, collating and duplexing are pure copy functions and involve rearranging the pages IN MEMORY so they come out in a certain way. For these kinds of jobs, they have to be all or nothing or there's no point in the exercise (especially collating, which requires each set of copies come out in the same order as the original—3 copies of 3 pages will go 1,2,3,1,2,3,1,2,3—any break defeats the purpose).
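
          The ordering argument above, as a small Python sketch (function names are illustrative only). Collated output repeats the whole set, so every page has to stay buffered until the last set starts; uncollated output can finish with a page before moving on.

```python
def collated(pages, copies):
    """Whole sets in original order: every page is needed again for each set."""
    return [page for _ in range(copies) for page in pages]


def uncollated(pages, copies):
    """All copies of a page together: each page can be printed and then forgotten."""
    return [page for page in pages for _ in range(copies)]


print(collated([1, 2, 3], 3))    # [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(uncollated([1, 2, 3], 3))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```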

          1. MacroRodent
            FAIL

            Re: Fundamental error?

            Well, using lossy compression that can visibly alter the results defeats the purpose even worse!

            They simply should have added enough memory to handle these jobs using only loss-less compression, which should work fine for most actual pages, because they contain large areas of solid colour, usually white.
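
            A back-of-the-envelope check of the memory claim above, in Python. The page geometry (A4 at 300 dpi, 1 bit per pixel) and the synthetic page contents are assumptions made for illustration, and zlib stands in for whatever lossless codec a copier might use.

```python
import zlib

# Assumed page geometry: A4 at 300 dpi, black and white (1 bit per pixel).
WIDTH_PX, HEIGHT_PX = 2480, 3508
ROW_BYTES = WIDTH_PX // 8  # 310 bytes per scan line


def raw_page_bytes() -> int:
    """Size of one uncompressed page under the assumptions above."""
    return ROW_BYTES * HEIGHT_PX


def make_page() -> bytes:
    """Synthetic page: white background (0x00) with bands of pseudo-text."""
    page = bytearray(raw_page_bytes())
    for top in range(200, 3300, 40):        # one "text line" every 40 rows
        for row in range(top, top + 12):    # each line is 12 pixels tall
            for col in range(30, 280):      # left and right margins stay white
                page[row * ROW_BYTES + col] = (row * 31 + col * 17) % 256
    return bytes(page)


page = make_page()
packed = zlib.compress(page, 9)

print(raw_page_bytes())                 # 1087480, roughly 1 MB per page
print(len(packed) < len(page) // 2)     # True
print(zlib.decompress(packed) == page)  # True: nothing was altered
```

            Real scans are noisier than this synthetic page, so actual ratios will differ; the point is only that mostly-white pages shrink a lot without any pixel being changed.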

          2. Anonymous Coward
            Anonymous Coward

            Re: collating is tricky

            " it would be pointless in the job's case to do in segments"

            Oh really? Do you think we were born yesterday or are you a copier salesman?

            How do you think the world managed before copiers could (not) hold a whole jobsworth of data in memory?

            The world managed for several decades with photocopiers that copied without misrepresentation but were not capable of handling clever stuff in huge jobs and/or huge numbers of copies.

            Manual intervention part way through (splitting a big job into a few smaller ones) was one widely used option which fixed a lot of things without too much effort and no loss of quality. Collating was and is particularly trivial to fix, and what couldn't be fixed manually could and can be sent to a copy shop with a clue, where $$$ will be required.

            You're seriously trying to tell readers that copying with misrepresentation is preferable to occasional manual intervention or occasionally spending $$$ with a copy shop with a clue?

            Words fowl me.

            1. Anonymous Coward
              Anonymous Coward

              Re: collating is tricky

              "How do you think the world managed before copiers could (not) hold a whole jobsworth of data in memory?"

              Clumsily, with plenty of potential for mistakes. That's why most firms went to professional printers for the big jobs, which meant outsourcing and money. Things the accounting department may not be keen to budget anymore. Same goes for the memory. As I've said, the higher-ups may not see the value in more memory for the copier.

              1. Anonymous Coward
                Anonymous Coward

                Re: collating is tricky

                "the higher-ups may not see the value in more memory for the copier"

                Do you think the PHBs may see the value in a 'copier' where the output text is reliably the same as the input text?

                1. Anonymous Coward
                  Anonymous Coward

                  Re: collating is tricky

                  Probably not. They'll say you're doing it wrong. It would have to take something exceptional, like a 100-copy run of a single page intended for the PHB and ALL bearing some critical mistake to draw their attention. As noted, the circumstances are already somewhat contrived (very small print for starters).

      2. Anonymous Coward
        Anonymous Coward

        Re: Fundamental error?

        Simples. Just copy the quote to get the copier to lower the price for you. Keep copying the copies until it's low enough for accounting.
