back to article Excel hell messes up ~20 per cent of genetic science papers

Scientific literature often mis-names genes and boffins say Microsoft Excel is partly to blame. “Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software,” wrote Mark Ziemann, Yotam Eren and Assam El-OstaEmai, of the Baker IDI Heart & Diabetes Institute in Australia, …

Page:

  1. A Non e-mouse Silver badge

    There was a similar piece of research done on spreadsheets generated by financial bods. They found something like 10% of spreadsheets contained errors affecting the calculations.

    1. Anonymous Coward
      Anonymous Coward

      Yes, but in that case the act of doing this on Excel was deliberate - it's called plausible deniability..

      They wouldn't have that using LibreOffice.

      Hmm, as ODF is the UK government document standard, maybe that's an extra argument to mandate LO? I would love to be a fly on the wall at Microsoft HQ if anyone close to government would publicly float that idea :)

      Yes, I'm an evil man. Why?

      1. Fungus Bob
        Pint

        Upvote for being Evil. Have one of these --->

    2. a_yank_lurker

      Spreadsheets are overused. Excel is just the best known.

  2. Anonymous Coward
    Anonymous Coward

    My pet gripe is

    That the list of custom formats does not include

    YYYY-MM-DD'T'HH:mm:ss

    or

    YYYY-MM-DD HH:mm:ss

    Makes working with time data a PITA. The standard ones don't have the seconds part.

    I am sure that many others have their own ones.

    Fighing Excel and its wierdness in a company that seems to use it for all sorts of things (incluing many wrong ones) is something that I have to endure on a daily basis.

    1. DaLo

      Re: My pet gripe is

      "That the list of custom formats does not include..."

      Are you being serious? If you are, do you not realise that you can type whatever format you want into it, that's why it's called 'Custom'. The lists in the box are just suggestions and examples and custom ones you have already created for that workbook.

      You can also save that workbook with all your commonly used custom format as a template for subsequent spreadsheets.

      1. Robin

        Re: My pet gripe is

        Are you being serious? If you are, do you not realise that you can type whatever format you want into it, that's why it's called 'Custom'. The lists in the box are just suggestions and examples and custom ones you have already created for that workbook.

        A condescending reply on a tech forum? That must be a first.

        But anyway, another +1 here for the format the OP mentioned. I was surprised when they let me use it in a reporting function I made for an admin tool recently.

        1. DaLo

          Re: My pet gripe is

          Not meant to be condescending, just didn't know if the OP was being serious or sarcastic and didn't want to be trolled. Maybe it's bad UX but the OP seemed to know what format they required, just didn't realise that custom formats is designed to use any custom format and you can add your own.

          1. imanidiot Silver badge

            Re: My pet gripe is

            What surprises me most is that YYYY-MM-DD still isn't the commonly used/accepted format. I work in a company that does a lot of bussiness with US companies. I don't know how many times DD-MM-YYv. MM-DD-YY format F(*^ups have resulted in problems but it has to be quite often. And even now there are many forms and procedures where DD-MM-YY is the only accepted format...

            1. Simon Harris

              Re: My pet gripe is

              In my last company on official documents all dates were required to have the month spelled out as a word, and the year as 4 digits just to be sure there could be no ambiguity whichever order the day, month and year came.

              1. Eddy Ito

                Re: My pet gripe is

                My personal favorite was when someone either accidentally or as a prank formatted the date something along the lines of yyyy-mm-ddd.

            2. Primus Secundus Tertius

              Re: My pet gripe is

              @imanidiot

              When I worked for the London branch of a US company, in my documents I would write 01-Apr-99 to avoid confusion with 04-Jan-99. But our software product used US dates (MM/DD/YY) on every screen, so I used US dates on any new screen.

      2. jimbo60

        Re: My pet gripe is

        Well, how about this then: If you type in a custom date / time format as suggested, it completely fails to work when creating pivot tables from that data. You have to use one of the non-custom fields for the pivot table to correctly separate the timestamps. Not cool when I'm analyzing data timestamped to fractions of seconds.

    2. Zippy's Sausage Factory

      Re: My pet gripe is

      I'm guessing the problem the OP is having here is other people. Namely that there is a specific need to format the dates and times in this format. The fact that Excel doesn't contain the format with seconds means people are using the format that contains only minutes and then there are support calls when they don't understand why the data doesn't look right.

      In other words, Excel is making people take a shortcut down the wrong route.

      (As an aside, if I had all the money I've seen wasted thanks to Excel diverted to my bank account, I'd be able to own my own private island, which I could fly to and from every day in a helicopter made of solid gold piloted by a hired supermodel.)

    3. Black Betty

      Re: My pet gripe is

      Cosmetics. And as pointed out by others, easily fixed by creating a formatting template.

      If working with time data is your thing, then shouldn't decent time and date arithmetic be topping your wishlist? Not to mention a proper time storage format. FFS compacting them into a REAL to save space stopped making sense the day computer memories breached Bill Gate's infamous 640K limit.

  3. Tchou
    Holmes

    "Automatic conversion of gene symbols any user input to dates and floating-point numbers is a problematic feature of Excel software"

    1. Martin an gof Silver badge

      problematic feature of Excel software

      Reformatting affects all sorts of software, such that I usually find myself turning off nearly all 'autocorrect' features, just to stop the annoyances of having to re-type stuff. The one that's annoying me at the moment is at work, where Outlook insists on autocorrecting (in formatted emails) our postcode, which ends 3RD, to 3rd

      And don't talk about the problems of writing emails with a mixture of langauges - in my case Welsh and English. Best just turn off all correction features.

      M.

      1. Doctor Syntax Silver badge

        "Outlook insists on autocorrecting (in formatted emails) our postcode, which ends 3RD, to 3rd"

        "formatted emails": there's your problem, right there ;)

        Except, of course, any software which starts out attempting to be too clever by half ends up not being clever enough.

        1. Martin an gof Silver badge

          "formatted emails": there's your problem, right there ;)

          Well yes. When I originate an email it's plain text, but when replying to an email it's invariably formatted, and getting rid of the formatting can muck up the "history".

          You know, I still can't get used to top-posting and the way Outlook mangles / ignores signature separators...

          M.

          1. VinceH

            "Well yes. When I originate an email it's plain text, but when replying to an email it's invariably formatted, and getting rid of the formatting can muck up the "history"."

            When I originate an email it's plain text - and it's also plain text when I reply to one, no matter what format the original came in!

            I also interleave my replies, which did lead to one colleague asking me why I sent his email back to him without replying (his "preview pane" was small enough that he couldn't see my reply, which he expected to be at the top).

          2. gzuckier

            But if the default format for email isn't HTML, then how can the sales department send out their 1 paragraph missives as 5 meg powerpoint slides?

        2. Wensleydale Cheese

          "Except, of course, any software which starts out attempting to be too clever by half ends up not being clever enough."

          I've recently run into series of supplier invoice numbers which start with 118, and the "intelligent" pattern recognition system in OS X tries to interpret them as phone numbers.

          In my country, 118 is the number for the Fire Brigade.

          Oops.

    2. Just Enough

      Everything is potentially a US formatted date

      Using Excel in a chain of data transference always results in these kind of errors. Usual one is telephone numbers being rendered in scientific notation. Or the old favourite of a DD/MM/YYYY formatted date becoming a "DD/MM/YYYY" string, which then is interpreted as a MM/DD/YYYY formatted date. Excel will attempt to turn just about any code with forward slashes into a guessed date.

      I also struggled with one that insisted in converting the user name June1 into a date.

      1. Omgwtfbbqtime
        Windows

        Re: Everything is potentially a US formatted date

        I found the best way to deal with this was set your region to USian and leave it there. I don't get the mismatched date formats on the initial import (.CSV from a USian server) and once saved to .xlsx it keeps the dates the way they are meant to be for the other users.

    3. Anonymous Coward
      Anonymous Coward

      "Automatic conversion of any user input to dates and floating-point numbers is a problematic feature of Excel software"

      Hear hear.

      Another example: exported lists of usernames and passwords sent as an Excel spreadsheet.

      The passwords were 8-character random hex strings. It worked fine for 99.9% of the time - except for ones which happened to be like "123456e8" which Excel had munged into floating-point numbers.

      Stripping leading zeros off telephone numbers is another problem.

    4. gzuckier

      "Automatic conversion of gene symbols any user input to dates and floating-point numbers is a problematic feature of Excel software"

      As in, 60,000 distinct 14 digit ID numbers pasted into excel all coming up as 5.42342E14 and when reformatted as number, all coming up as 54234200000000? You mean that's not what was desired?

  4. John Smith 19 Gold badge
    Unhappy

    Storing semi text data in a tool designed to process numbers automatically.

    What could possibly go wrong?

    1. Tchou

      Re: Storing semi text data in a tool designed to process numbers automatically.

      No it is supposed to process just about anything without getting on your feet.

      And "just about anything" means string when it can't be parsed to some other formats, not force silly cast.

      1. Anonymous Coward
        Anonymous Coward

        Re: Storing semi text data in a tool designed to process numbers automatically.

        You know there's a very simple way to insert data as plain strings, right? IIRC it dates back as far as 1-2-3, but I never used Multiplan or Visicalc.

        Any tool you don't really learn to use will bite your back sooner or later.

        1. gzuckier

          Re: Storing semi text data in a tool designed to process numbers automatically.

          If the column is not preformatted as the desired format, pasting as text doesn't help. And once it's been converted to floating point, it's rounded off and you can't format it back to integer.

    2. gzuckier

      Re: Storing semi text data in a tool designed to process numbers automatically.

      It's not designed to process numbers. It's designed to do tabular jobs. Everything from pseudo-database work to making giant pixel-based pictures.

  5. david bates

    Its the formatting changes to plain text that you foolishly paste in that gets me. Paste in a phone number, lose the leading zero. Paste in a long number and get something like 87687767E10.

    I've seen so many issues with data caused by people being caught out by this.

    1. Anonymous Coward
      Anonymous Coward

      but...

      Don't you know by now that Nanny Microsoft knows more about what you are trying to do than you do.

      Nanny knows best. Stop complaining and get back to work or it will be down the pit for you.

      1. Oengus

        Re: but...

        I do really hate it when the system tells me what I am trying to do and (invariably) gets it wrong and then won't let me do what I am trying to do so I have to "fudge" it. Excel is one of the worst for it. Try importing a .CSV file vs a .TXT file

      2. Doctor Syntax Silver badge

        Re: but...

        " Nanny Microsoft knows more about what you are trying to do than you do."

        It's not just Microsoft. LibreOffice Calc has similar problems - although it might be that they're trying to be bug-compatible with Excel. The real problem is with any dev who decides to try to double-guess what the user's going to do has put a foot on a slope which is far slippier than they ever imagined.

        1. m0rt

          Re: but...

          i also deal with a lot of phone numbers, and a lot or reporting that is invariably created as a CSV file.

          Excel is the bane of my life. I personally use Libreoffice, but clients use Excel. They then complain when they open their CSV reports in excel, and excel subsequently craps all over the data.

          Don't even get me started on phone numbers that are international format with a '+' at the beginning. Excel helpfully starts to see that as a formula.

          Whereas Libreoffice will always show you csv options before pulling in the sheet, Excel just thinks you are too stupid and opens it in the way it thinks you want your data.

          Clippy is dead, to come back and haunt us as a spreadsheet poltergeist.

          1. Anonymous Coward
            Anonymous Coward

            Re: but...

            This week I had someone from Sales asking me to "crack the password" on an Excel sheet. Turned out it wasn't password protected, just corrupt. Well, only Excel reported it corrupt* . LibreOffice opened it fine, and when I re-saved it Excel was happy with it too. Madness.

            * I'm not saying it wasn't corrupt - it came from our Sales department after all.

            1. Gotno iShit Wantno iShit

              Re: but...

              Y'know how there's a word, paragraph and page count? There's a Lie count too and it commonly blows a fuse on anything coming out of a sales department. You need the Government Edition software which obviously has a much larger counter.

            2. Anonymous Coward
              Anonymous Coward

              Re: but...

              for one part of my job I commonly have to deal with .CSV files, because that's the only format that a piece of software we rely on for certain important jobs is pretty much guaranteed to eb fien with.Oh, it's gotten better at trying to import.xls files, but it still can;t manage them as reliably as .csv's.

              So, I receive .xls files from user. I look at what the job entails. Hmmn... I can merge those two together before using the combined file with this other file to get the desired end result.So - save out first file as .csv - all OK. Save out second fiel as .csv - seemed to go ok. Reopen first file so I can tack it on end of secon done - 'cannot understand file format' or some such nonsense from Excel. BUT YOU CREATED IT JUST A COUPLE OF MINUTES AGO! Grrrr... - I had about fifteen months of that kind of nonsense before Excel finallyseemed to become capable of reliably reading .csv's it had created itself.

              And I'm miffed that Excel doesn't use BigEndian as the standard date format nor does it appear to do anything sensible with columns of dates in datafiles I receive, so I have to manually format columns with dates every damned time.

              This on top of teh travesty that is Outlook. And people wonder why I've gone off all things IT in this last few years.

              1. psychonaut

                american dates

                oh jesus christ i hate it. you get to the stage where you actually dont know if it has formatted some of them, all of them, none of them.

                what we need to do is get the US to use a sensible date format. small/bigger/evenbigger

                1. imanidiot Silver badge

                  Re: american dates

                  @Psychonaut, lets please just go to the international standard we all agreed on in the first place: YYYY/MM/DD HH:mm:ss

              2. psychonaut

                Re: but...

                BUT YOU CREATED IT JUST A COUPLE OF MINUTES AGO! Grrrr...

                i love the way you shouted that at the pc. i bet you did too, reperatedly.

                just work you fucking bastard!!

                whats that? you cant find the printer? its just fucking over there you stupid fucker! aaaaargh

                1. VinceH

                  Re: but...

                  "i love the way you shouted that at the pc. i bet you did too, reperatedly."

                  You can laugh, but I do shout stuff like that at mine. Usually preceded by the phrase "What the fuck are you talking about, you stupid computer?"

            3. gzuckier

              Re: but...

              There is/was a weird bug in a recent version of Excel that would prevent it opening perfectly fine spreadsheets at pseudorandom; the same spreadsheet format with varying daily data would work 9 days out of 10 but the tenth day it wouldn't open. Has something to do with the new file format, zipped xml, in some way I assume.

              I miss the older version.

      3. Anonymous Coward
        Anonymous Coward

        Re: but...

        This is why we need clippy back, he can help :)

  6. dan1980

    Among the things Excel does to gene names include changing “SEPT2”, the name of a gene thought to have a role in proper formation of cell structure, to the date “2-Sep”.

    And after the 'auto-formatting', it once again has a role in proper formation of a cell structure - as a date in an spreadsheet cell.

  7. Millennia
    Facepalm

    Doesn't happen with Google

    It is the constant thing with all M$ products that they think they know what you want better than you and force change it. This has now crept into Apple with it's bloody autocorrect changing what you want to say, sometimes without even giving you an option to reject the change, so you have to go back over and over again or risk sending a non sensical message.

    Why don't the devs behind this just piss off with their auto-corrects? By all means underline a word, or highlight a cell that may be wrong and give the user chance to correct or ignore the warning, but quit the auto-correct shit as it actually makes us waste time going back and forcing what WE want to actually say! </rant>

    1. Named coward

      Re: Doesn't happen with Google

      I use google for convenience but the automatic conversion of what I want to be strings, to dates, happens there as well, and is annoying as hell

  8. james 68

    Another issue is the inverse square law of intelligence, where the more specific intelligence in a given field is displayed by an individual, the less generalised intelligence/common sense is displayed by that individual.

    In essence, trying to get a geneticist to understand excel is like trying to get Kim Kardashian to understand humility.

    1. Tchou
      Holmes

      More appropriately...

      Or make Microsoft understand its user needs.

Page:

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon