'Rowhammer' attack flips bits in memory to root Linux

Last summer Google gathered a bunch of leet security researchers as its Project Zero team and instructed them to find unusual zero-day flaws. They've had plenty of success on the software front – but on Monday announced a hardware hack that's a real doozy. The technique, dubbed "rowhammer", rapidly writes and rewrites memory …

  1. Morrie Wyatt

    It's not the fault of Linux

    To be fair, this is an exploit of the underlying hardware, not a flaw in Linux itself.

    After all, you can slow down a hard disk's I/O by yelling at it. (The acoustic vibration affects the read heads' positioning, multiplying latency.) This is just another case of rattling the hardware until something breaks.

    1. thames

      Re: It's not the fault of Linux

      I don't know if they changed the article since your post, but the second paragraph does say this will work with any operating system. It's a hardware level exploit, targeting a feature of the x86 architecture (page tables). The proof of concept they created used Linux, because that's the operating system the researchers use and are familiar with.

      This hardware level exploit is a form of "writing to arbitrary memory" except this one does not require a vulnerability (bug) in the operating system in order to work. There are probably other variations on this which are possible once people put some effort into it.

      This is a problem that Intel, AMD, and the PC hardware vendors need to address. It's not something that can reliably be fixed in software. At best, you can tweak some firmware settings to increase the time for the exploit to succeed, but that's not a real solution.

      For anyone with vulnerable hardware, you're pretty much out of luck. The one mitigating factor in your favour is that it may be hardware model dependent rather than an across the board exploit.

      1. Paul Shirley

        @thames

        It probably can be fixed in firmware by adding the right guard pages around critical structures, but knowing where those pages need to be will require RAM manufacturers' input and component databases in each OS, so it probably won't happen.

        What's more likely is they'll need to reduce power-saving tweaks (like undervolting and reduced refresh rates) and everyone gets lower battery life on their laptops. No idea why they think more than a tiny minority of desktop machines are using ECC.

        1. Metrognome

          Re: @thames

          Agreed on the ECC point.

          I have yet to see ECC RAM fitted to any desktop from the humble Dell Minis all the way to powerful CAD workstations, enthusiast LAN party gaming rigs and everything in between.

          In fact, outside of Xeon CPU's there's almost nothing for the desktop. (There's a few for lappies and embedded but not for desktop).

          Haswell and Broadwell have only just started offering ECC support.

          1. Gordan

            Re: @thames

            "In fact, outside of Xeon CPU's there's almost nothing for the desktop. (There's a few for lappies and embedded but not for desktop)."

            FYI, most AMD chipsets still support ECC, whether it is officially listed on the motherboard spec or not.

          2. Phil O'Sophical Silver badge

            Re: @thames

            I have yet to see ECC RAM fitted to any desktop from the humble Dell Minis

            My old Dell Precision 390, bought 8 years ago, has ECC RAM. As far as I remember it made no significant difference to the price.

            1. Anonymous Coward

              Re: @thames

              Having fabricated numerous machines from scratch, my experience of buying ECC RAM has been the opposite. Maybe you got lucky with a deal at the time you bought your Dell, or more likely it was a period a long time back when RAM became ridiculously cheap for a while.

              Every time I've spec'd a new machine, the ECC option has always been significantly more expensive than non-ECC. Without a use case to justify the additional expense, I've always gone with the cheaper non-ECC option, and spent the saving elsewhere in the system such as a faster, more reliable, hard drive.

              Am going to grab a copy of the POC code, to see if the Corsair RAM in my Mac Pro workstation is vulnerable. Although it's a lot higher spec than your usual budget DDR boards, I suspect it's probably still vulnerable.

            2. Metrognome

              Re: @Phil-O-Sophical

              What you described there, was quite a powerhouse for the time.

              Quadro Nvidias, SAS interfaces, 8 GB RAMs and the first Core2Quads and all that circa 2006!

              1. Phil O'Sophical Silver badge

                Re: @Phil-O-Sophical

                Well, mine had 1GB RAM and a 2.13GHz Core2Duo, but it's got more RAM now, and extra disks. Still runs fine with Windows (XP, it was that or Vista) but more often Solaris or Debian. It was bought as a combined home + work-from-home system, so I did go for a decent one. I suppose more workstation than home desktop.

            3. Michael Wojcik Silver badge

              Re: @thames

              My old Dell Precision 390, bought 8 years ago, has ECC RAM.

              ECC RAM was not uncommon in UNIX workstations circa 1990, for that matter. Commoditization of desktop systems largely pushed it out.

      2. Michael Wojcik Silver badge

        Re: It's not the fault of Linux

        For anyone with vulnerable hardware, you're pretty much out of luck.

        Yes, and it's broader than this particular attack - this attack is just a proof-of-concept based on the underlying vulnerability, which has various manifestations. The original paper, which was published last June, has more details.

        The paper also mentions, in a footnote:

        The industry has been aware of this problem since at least 2012, which is when a number of patent applications were filed by Intel regarding the problem of “row hammer” [6, 7, 8, 9, 23, 24]. Our paper was under review when the earliest of these patents was released to the public.

        It seems to me quite unlikely this isn't being used by APT teams run by various governments, on those occasions where they can't find an easier way to elevate.

        A related note: Some may remember a successful attack on the Java type system some years ago, which involved an application that filled available memory with objects of a particular type, then stressed the RAM to cause (with high probability) a bit flip in the type label for at least one object, which the application then exploited to escape the type protections. The authors of that paper used a heat lamp to cause random faults in the RAM chips of their target system; Row Hammer shows it's possible to do that in software.

    2. Gordan

      Re: It's not the fault of Linux

      While this is an exploit, it shows that modern hardware is actually unstable out of the box even without overclocking or other tuning that reduces the margins for error. Anything that causes memory corruption on hardware level is, IMO, a hardware fault, and therefore grounds for returning the hardware to the retailer as unfit for purpose.

      Given the descriptions of the methods, this is also mostly a RAM fabrication issue, rather than being largely related to the rest of the machine, as the leakage happens directly within the RAM chips. So using better RAM from a different manufacturer would almost certainly reduce the exposure to this bug, much more so than using the same RAM in a different laptop.

      But in any case, ECC is the way forward - if only it was more commonly available in laptop and desktop grade chipsets.

      1. phuzz Silver badge
        Facepalm

        Re: It's not the fault of Linux

        "Anything that causes memory corruption on hardware level is, IMO, a hardware fault, and therefore grounds for returning the hardware to the retailer as unfit for purpose."

        All electrical and mechanical devices will fail if you take them far enough out of their normal operating regime, this is no different.

        This is the equivalent of taking an ordinary road car, and driving it constantly up and down a road full of potholes, until parts fall off (the clever bit here is they've tuned the potholes just right to make the bit they want fall off). It's just not something that's going to happen in normal daily operation. If you have to drive over bumpy roads, you buy a vehicle with better suspension, if you're hammering your DDR then you buy ECC.

        1. DropBear

          Re: It's not the fault of Linux

          All electrical and mechanical devices will fail if you take them far enough out of their normal operating regime, this is no different.

          Bollocks. The point is that this IS WITHIN their normal operating regime. You should be able to flip bits at full tilt 24/7/365 without any of this shit happening, as long as you don't actually overclock that RAM - and I have not seen overclocking being mentioned.

    3. Robert E A Harvey

      Re: underlying hardware

      I used to design embedded systems between the 70s and 90s, and we used to use static ram exclusively because of effects like this - accidental, not malicious.

      If you are involved in volume design, then it is a lot easier, but in small volumes a cockup in the layout or timing of dram was very costly, in time and money. So we didn't use it at all.

  2. Scott Earle
    FAIL

    “On the software from” ?

    Have El Reg’s proofreaders all gone on holiday or something?

    1. Captain DaFt

      Re: “On the software from” ?

      "Have El Reg’s proofreaders all gone on holiday or something?"

      <PFFFFFT> Uh... yeah, might as well ask about their unicorn while you're at it.

      1. Scott Earle
        WTF?

        Re: “On the software from” ?

        Wait … they fired the unicorn?!?

        1. TRT Silver badge

          Re: “On the software from” ?

          No, it left of its own accord when there were no more virgins in the office.

          1. Alfred
            Coat

            Re: “On the software from” ?

            No virgins in a web-based technology magazine's office? A likely story.

    2. diodesign (Written by Reg staff) Silver badge

      Re: “On the software from” ?

      It's been fixed. Click on some ads and we'll hire more proofreaders :-P

      There's always corrections@thereg if you want to point out typos. We don't have time to read every comment, so those emails are appreciated.

      C.

      1. Anonymous Coward

        Re: “On the software from” ?

        Click on some ads and we'll hire more proofreaders :-P

        Ooh, very brave! IME saying (or even hinting at) that is just about the only thing that upsets Google enough for them to send you nasty emails threatening to cut off your ads. :-)

        Of course given El Reg's principled antipathetic editorial attitude to Google they may well use alternative ad networks that are less sensitive about click fraud...

      2. Dan 55 Silver badge
        Trollface

        Re: “On the software from” ?

        If we promise to click on some ads will you get the people involved in the website makeover to finish it off?

        1. Paul Kinsler

          Re: will you get the people involved in the website makeover to finish it off?

          Two bullets or three? :-)

      3. auburnman

        Re: “On the software from” ?

        What's your ratio of corrections mailed to corrections@thereg vs. posted in the forums? I really think it would be worth adding a button that lets you flag your own post as correcting the article. Limit it to badge holders and take it off anyone who abuses it.

  3. Anonymous Coward

    Desktops don't have ECC

    At least not Intel's. The reason it effects laptops moreso than desktops is that laptops use low power DRAM built with smaller processes. The smaller the process, the greater the likelihood rowhammer will work. ECC will put a stop to it (correcting single bit errors, and taking a machine check for uncorrectable double bit errors)

    Pretty much certain that all smartphones would be vulnerable to this attack, as they use low power DRAM without ECC.

    Since Apple designs their own SoC and therefore their own memory controller and the iPhone tends to ship with / require less RAM than high end Android phones, they could fix this by adopting ECC. Without source code access it might be a bit harder to develop an exploit for this against iOS, but it isn't impossible.

    1. thames

      Re: Desktops don't have ECC

      This is making use of an x86 hardware feature. We don't know if an equivalent exploit is possible with ARM. If it is, then any SoC manufacturer could add ECC support, if it doesn't have it already, and any phone manufacturer can add ECC.

      The real problem is going to be all the existing phones out there. If you bought a cheap phone, then it's not a big deal to throw it away and buy a new one. The people who bought high priced phones though will be completely stuffed. There's no such thing as a "patch" to fix this. They'll be stuck with vulnerable phones bought on multi-year contracts and whose resale value has suddenly fallen to zero.

      So if you want to see which manufacturers will be most affected, look at the ones who sell the most expensive phones.

      1. joeldillon

        Re: Desktops don't have ECC

        If the 'hardware feature' is page tables then modern Android and iOS phones have those on ARM. They're pretty fundamental to any modern protected-memory OS.

      2. Gordan

        Re: Desktops don't have ECC

        "This is making use of an x86 hardware feature."

        It's not an x86 specific feature per se. The testing code uses an x86 assembly instruction that bypasses CPU caches for reads. It is quite likely that similar equivalents exist on many other if not most CPU architectures.

        1. thames

          Re: Desktops don't have ECC

          Gordan - "It is quite likely that similar equivalents exist on many other if not most CPU architectures."

          Perhaps, but as I said, this particular exploit is x86 specific. Nobody has demonstrated whether an ARM (or MIPS) equivalent is possible yet. Given though that this thread started off with someone talking about how he felt that Apple phones were going to be better than Android phones when it comes to dealing with this issue, I think we need to step back a bit and admit we don't know if it is a problem for ARM yet.

          1. Michael Wojcik Silver badge

            Re: Desktops don't have ECC

            Nobody has demonstrated whether an ARM (or MIPS) equivalent is possible yet.

            Perhaps not with those specific CPU families, but the general thrust of your argument is wrong. Read the original paper by Kim et al (DOI 10.1145/2678373.2665726). Part of their study involved testing a range of DRAM chips using an FPGA-based system - no x86 in sight.

            The Row Hammer vulnerability is a flaw in DRAM implementations. The particular attack developed at Google Project Zero is x86-specific, but the vulnerability is not, and there's no reason to believe it can't be extended to most systems that use a different CPU but the same DRAM.

    2. diodesign (Written by Reg staff) Silver badge

      Re: Desktops don't have ECC

      "laptops use low power DRAM"

      I've tossed that into the story. FWIW Intel does do desktop mobos with ECC support.

      C.

    3. Anonymous Coward

      Oh dear. Wrong in lots of ways.

      Not least affect/effect but to state 'Desktops don't have ECC, At least not Intel's'

      (I'm forgiving you for the apostrophe you used in 'Intel's' as it's not entirely incorrect but I suspect you didn't mean it that way)

      http://www.intel.com/support/motherboards/desktop/sb/cs-009023.htm

      ECC is already available on the chipsets used in some Android devices and has been since 2010 (possibly earlier)

      Apple may 'design their own' SoC but it's a rehash of someone else's IP (ARM license a lot of IP to Apple) so yeah, it's possible that it may contain ECC already but to include it is considerably more difficult than picking a chip and OS that can cope with ECC by design.

      1. Anonymous Coward

        Re: Oh dear. Wrong in lots of ways.

        The only ARM IP in Apple's SoCs is the ISA itself (i.e. written documentation in English); the cores have been entirely Apple's design since the A6. If you think Apple is using more than that, you'll have to explain how they managed to complete the design of their first 64-bit SoC before ARM did, and well before ARM released the RTL to their partners.

        I didn't mean to imply that Apple is any closer than Android today to being able to defeat this sort of issue, assuming someone proves it is exploitable on ARM (I see no reason why it wouldn't be, as it supports uncached reads). I simply meant that if Apple decides they want to do ECC in future iPhones, they can do it a lot more quickly than Android OEMs because they control the design of their SoC. Aside from Samsung, Android OEMs rely on others to design their SoCs, and even Samsung still uses ARM-designed cores; they have yet to release a SoC using a core they designed themselves.

        I wouldn't worry about phones having value "dropped to zero" due to such an exploit. You may not be able to patch a hardware flaw via software, but you can certainly prevent the code sequence that exploits that hardware flaw from running on your platform if it requires signed code.

  4. Nate Amsden

    I wonder

    How well something like HP's Advanced ECC or IBM's Chipkill which go well beyond basic ECC would hold up to this sort of attack. Myself I don't deploy any serious systems without this technology, as the systems tend to have dozens to hundreds of gigs of ram and ECC alone just doesn't cut it in my past experience anyway.

    Last I looked I could not find good info on IBM's Chipkill, but HP has good info here on Advanced ECC:

    ftp://ftp.hp.com/pub/c-products/servers/options/c00256943.pdf

    some text from the pdf

    "To improve memory protection beyond standard ECC, HP introduced Advanced ECC technology in 1996. HP and most other server manufacturers continue to use this solution in industry-standard products. Advanced ECC can correct a multi-bit error that occurs within one DRAM chip; thus, it can correct a complete DRAM chip failure. In Advanced ECC with 4-bit (x4) memory devices, each chip contributes four bits of data to the data word. The four bits from each chip are distributed across four ECC devices (one bit per ECC device), so that an error in one chip could produce up to four separate single-bit errors.

    Since each ECC device can correct single-bit errors, Advanced ECC can actually correct a multi-bit error that occurs within one DRAM chip. As a result, Advanced ECC provides device failure protection

    Although Advanced ECC provides failure protection, it can reliably correct multi-bit errors only when they occur within a single DRAM chip."
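
    A toy sketch of the interleaving that passage describes, as I read it: bit j of every x4 chip goes to ECC word j, so a dead chip costs each ECC word at most one bit. The chip count (8), the data pattern and the function names are made up for illustration and are not taken from the HP document.

```python
def interleave(chips):
    """chips: list of 4-bit lists. ECC word j collects bit j of every chip."""
    return [[chip[j] for chip in chips] for j in range(4)]

def errors_per_word(good_chips, bad_chips):
    """Count differing bits in each of the four ECC words."""
    return [
        sum(a != b for a, b in zip(good_word, bad_word))
        for good_word, bad_word in zip(interleave(good_chips), interleave(bad_chips))
    ]

# Eight hypothetical x4 chips holding arbitrary data.
good = [[(c >> j) & 1 for j in range(4)] for c in range(8)]

# Simulate a complete failure of chip 3: all four of its bits come back inverted.
failed = [list(chip) for chip in good]
failed[3] = [bit ^ 1 for bit in failed[3]]

errors = errors_per_word(good, failed)
print(errors)  # one wrong bit per ECC word, each within single-bit correction
```

    Errors spread over two different chips can land two bad bits in the same ECC word, which is where the "only within a single DRAM chip" caveat comes from.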

  5. Hackbert

    Just checked the calendar. No, it's not April 1st.

  6. Conundrum1885

    Re. RowHammer

    Interestingly I suggested quite a while back something along these lines to implement a neural net using Flash memory, and actually have some schematics here for an AI that uses this exact technique to get nearly-quantum level speedup effects using a bootable pendrive that runs DSL and then uses the leakage between the memory cells (has to map out chips and look for correlations but that is doable) to run the NN.

    It should work on any old x86 laptop with 2GB RAM but obviously the faster CPUs are more efficient and for something like this a custom BIOS that overclocks the RAM chips just enough would be ideal with a thermal sensor for feedback to keep the chips in the desired temperature range.

    Not exactly on topic but still interesting, as chips are getting more and more dense it is entirely possible that something as simple as a RaspPi2 (1GB RAM) if kept at just the right low temperature in a strong magnetic field could implement a limited subset of the Turing test..

    Also relevant, if the memory manufacturers would get back to me and send me the NDA already I could make this work on a 64GB microSD as the densities on these are many times greater and with billions of potential artificial neurons between the adjacent cells that are currently ignored due to wear leveling.

    Had some success flash X-raying defective 32GB chips to bring them back to life and noticed effects suggesting this could work but ran into problems replicating the effect as the power supply conked out during testing.

    Anyone interested?

    1. Anonymous Coward

      Re: Re. RowHammer

      I'm interested. Now put on this lovely white jacket and we'll fasten the straps at the back for you.

    2. Jimmy2Cows Silver badge

      Re: Re. RowHammer

      Surely it'd be easier to replace the power supply than battling manufacturers for an NDA. For a few quid you could continue your glorious endeavour, rather than blaming your inability to proceed on lack of an NDA...

      1. Conundrum1885

        Re: Re. RowHammer

        As in the NDA to get the memory controller specs for microSD cards.

        It's pretty hard to reprogram these AFAICT but some of the older cards did have points under a removable label where the pins could be accessed.

    3. mevets

      Re: Re. RowHammer

      Very interested in the travesty generator you used to generate your comment.

      How much did you have to seed it with to get as close to this topic as you did?

  7. Anonymous Coward
    Meh

    "While this was a high cracking rate, the team reported almost no success on desktop machines. This is possibly because those computers use newer RAM with error-correcting memory (ECC)"

    Most desktops don't use ECC due to the fact it would have been about double the price when new. Workstations may have shipped with it, but your run of the mill PC won't have done. Also as for "newer" RAM, if you're comparing a 2010 PC with a 2015 PC of course it will.

    My guess is the fact one is likely to be DIMM and the other a SODIMM, but I'm just guessing.

    1. Bronek Kozicki

      I guess the difference is DRAM refresh rate. High refresh rate means higher power utilisation to keep RAM powered up. This is insignificant for a desktop PC with AC power attached, but significant for a laptop.
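
      A back-of-envelope sketch of why the refresh rate would matter, assuming the usual 64 ms DDR3 refresh window and a hypothetical 50 ns per row activation. The function name and the 50 ns figure are my own illustrative assumptions, not measurements.

```python
def max_activations(refresh_window_ms: float, activation_ns: float) -> int:
    """Upper bound on row activations that fit between two refreshes of a row."""
    window_ns = refresh_window_ms * 1_000_000  # milliseconds to nanoseconds
    return int(window_ns // activation_ns)

# Assumed numbers: 64 ms window, 50 ns per activation (hypothetical).
normal = max_activations(64.0, 50.0)
# Doubling the refresh rate halves the window an attacker has to hammer in.
doubled = max_activations(32.0, 50.0)

print(normal, doubled)
```

      Refreshing twice as often halves the number of hammering accesses that fit before the victim row is recharged, at the cost of more power spent on refresh.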

    2. Paul Crawford Silver badge

      Double the cost?

      Really? ECC memory costs more, but typically 20% and the RAM is often only a fraction of the machine cost.

      True, proper servers cost a lot more than desktops, but there are other factors in that cost such as dual PSU options, easier to change fans, hot swappable HDD, etc, (and probably a bit of profiteering as well).

      1. Paul Shirley

        Re: Double the cost?

        @paul Crawford

        We live in a world where Lenovo were prepared to ship malware on PCs because margins are too slim on the hardware. 20% on the RAM really is a significant overhead for most of the devices shipped.

  8. Dan 55 Silver badge
    Facepalm

    Meanwhile, in a parallel universe...

    Palo Alto, CA...

    Newly incorporated security outfit Project Zero claims it has found a serious design flaw in the HAL Laboratories' HAL 9000 computer. They claim that if the user were to gain access to the interior of the computer and randomly remove memory modules from memory arrays, they could force a system shutdown even if they don't have permission to do so.

    A spokesman at HAL Plant in Urbana, Illinois said, "No shit, Sherlock".

  9. RyokuMas
    Coat

    Start the clock...

    Hardware manufacturers, you have 90 days to fix this, starting NOW!

  10. Tromos

    Brands

    Is anybody looking into differences between RAM chip manufacturers as regards to susceptibility to this attack? I'm sure some brands will come out better than others as this essentially relies on a fundamental design flaw in the chip.

    1. Anonymous Coward

      Re: Brands

      Sounds like a marketing opportunity to me.

  11. naive

    It is just an elevation

    Although unsettling, it is "just" an elevation. Somebody already needs access on user level to the system before this hack can be deployed.

    If I understood well, the solution would be to allocate the sensitive CPU privilege bits further away from memory regions accessible by the user?

    1. Charles 9

      Re: It is just an elevation

      Doesn't this exploit bypass segregation, allowing full access to all memory?

      1. Bronek Kozicki

        Re: It is just an elevation

        No. The goal of the exploit is to bypass process segregation, but the means is to repeatedly access memory in the physical neighbourhood of the memory region describing the virtual address space of the attacker's own process. Since the specifics of this memory region are defined by the CPU hardware, it might, or might not, be possible to move it away. It's a good question actually.

        EDIT: imagine you are holding a key to a cage with one hand, and the attacker who is sitting in that cage (locked, i.e. virtual address space of a process) is repeatedly asking you to hand him something with the other hand (i.e. you are serving memory requests). The chances are that after the Nth attempt (a very high number) you will be so distracted that you hand him the key instead, with which the attacker is able to open his own cage. This is a rather poor analogy, but the point stands that the attack is based on a possible side effect (you being distracted, i.e. missing DRAM refresh) of a flaw in certain implementations of DRAM.

        1. Paul Shirley

          Re: It is just an elevation

          Surely the point is with write access to even a single arbitrary page table entry you have unrestricted access to *physical RAM*, then you could map any other process's RAM into your address space and modify privilege bits at will.
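
          To make that concrete, here is a toy model of a 4 KiB x86-64 page table entry held in a plain integer (present, writable and user flags in bits 0 to 2, physical frame number in bits 12 to 51). The helper names and the sample frame number are made up, and nothing here touches real memory.

```python
PRESENT = 1 << 0    # entry is valid
WRITABLE = 1 << 1   # page may be written
USER = 1 << 2       # page is reachable from user mode
FRAME_MASK = 0x000F_FFFF_FFFF_F000  # bits 12-51 hold the physical frame address

def decode_pte(pte: int) -> dict:
    """Split a page table entry into its flags and physical frame number."""
    return {
        "present": bool(pte & PRESENT),
        "writable": bool(pte & WRITABLE),
        "user": bool(pte & USER),
        "frame": (pte & FRAME_MASK) >> 12,
    }

def flip_bit(value: int, bit: int) -> int:
    """Model a single disturbance error by toggling one bit."""
    return value ^ (1 << bit)

# Hypothetical entry: user-writable page backed by physical frame 0x12345.
pte = (0x12345 << 12) | PRESENT | WRITABLE | USER
flipped = flip_bit(pte, 14)  # bit 14 is bit 2 of the frame number

print(hex(decode_pte(pte)["frame"]), hex(decode_pte(flipped)["frame"]))
```

          One flipped bit leaves the permission flags alone but points the same virtual page at a different physical frame, and if that frame happens to hold a page table the process can now write page table entries.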

  12. Anonymous Coward

    Hmmm

    The fact that a particular piece of hardware is supported does not mean it is widely implemented. And the simple fact is that virtually NO consumer desktop computers come with ECC RAM. As correctly pointed out, it is only when you get into workstation/server class business computers that ECC is used to any great degree. Again the hundreds of millions of computers sitting in homes/offices around the globe are wide open to this attack.

    And what about the growing use of SSD's? Surely they are vulnerable to the same type of attack. Essentially this type of security breach will drive the home computer to extinction. Heck, if you can't shut down or effectively censor the internet, shut down/control the computers themselves.

    1. Michael Wojcik Silver badge

      And what about the growing use of SSD's? Surely they are vulnerable to the same type of attack.

      No. NAND Flash and DRAM have very different hardware implementations. And while SSDs, just like conventional drives, generally have RAM caches, those aren't directly accessible to unprivileged programs, so there's no obvious way to stress them with software executing in an unprivileged process.

  13. Anonymous Coward

    I wonder how many programmers...

    I wonder how many programmers have read this and are now thinking back to an old project with an unexplained crashing bug and going: "Hmmm".

    Kudos to the guys that thought this one up ... it would never have occurred to me*.

    [*] Not that that means a lot ;-)

    1. Michael Wojcik Silver badge

      Re: I wonder how many programmers...

      Kudos to the guys that thought this one up ... it would never have occurred to me*.

      I'm sure it never occurred to many people. Software developers generally operate on the assumption that the hardware is reliable, in the sense that it either does what they ask, or it provides an error response through a documented mechanism. You have to make some assumptions in order to get anything done, and for most software that's a reasonable one. (Even for applications where it isn't, often the problem is addressed with more hardware - fault-tolerant systems and the like - or externalized to watchdog programs.)

      Security researchers, however, are generally conditioned to regard hardware as a source of vulnerabilities, whether those involve bypassing software controls (alternate boot methods, etc), leaking information through side channels (TEMPEST, timing and power-consumption attacks), increasing the attack surface (tapping network cables), or simply failing normally in an unsafe manner.

      I've noted upthread that the Kim paper says this problem has been acknowledged in the industry since at least 2012, and that physical attacks on RAM go back for many years. This is one of those cases where many people were pretty sure there was a problem, and gradually various folks got around to poking at it. Which is not to say this isn't great research - the experiments described in the Kim paper are well-done and there's nothing like a nasty PoC to bring attention to a problem - but it's not like it came out of nowhere.

  14. petef

    ECC is not enough

    ECC guards against random errors. With more work rowhammer might maliciously set bits that had a valid ECC.

    1. Peter Gathercole Silver badge

      Re: ECC is not enough

      If you could predictably determine which bits would be flipped, you may be able to do this, but most ECC memory has multiple bits for error correction per 64 or 128 bit word, and use something a bit more sophisticated than plain parity. The ECC I've seen normally allows single or double bit corruption per word to be fixed, with multi-bit corruption detected. You would have to be able to flip the bits in a pattern where the ECC bits would not flag an error, and in order to do this, you would need to know how the ECC bits are calculated, which is probably memory vendor specific.

      I don't know how Linux handles uncorrectable ECC errors, but other OSs normally take an exception, and depending on what was running at the time the ECC error occurred would either kill the process, or if it were in kernel mode, panic the kernel. I've even seen this take out an entire system running VMs in a type 1 hypervisor, if the error occurs while executing hypervisor code.

      As a result, if you are using ECC memory, you would have to get a correct pattern every time or else things will happen that will be noticed.

      1. Bronek Kozicki

        Re: ECC is not enough

        In Linux it's possible (and often used) to force system panic on uncorrectable memory error (called UE). EDAC module option for this is "edac_core.edac_mc_panic_on_ue=1" (copied from my own /proc/cmdline)
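
        A small sketch for checking whether that option is set, assuming you feed it the contents of /proc/cmdline. The sample command line and the function names are made up, and the parser just splits on whitespace, so it ignores quoted values.

```python
def cmdline_options(cmdline: str) -> dict:
    """Parse a kernel command line into {name: value}; bare flags map to None."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options

def panics_on_ue(cmdline: str) -> bool:
    """True if EDAC is told to panic on an uncorrectable memory error."""
    return cmdline_options(cmdline).get("edac_core.edac_mc_panic_on_ue") == "1"

# Hypothetical command line; on a real box read it with open("/proc/cmdline").read().
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro edac_core.edac_mc_panic_on_ue=1"
print(panics_on_ue(sample))
```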

        In order to perform such an attack against ECC memory, an attacker would have to flip multiple specific bits in an ECC-guarded module at the same time and this, I think, can only be performed by a Maxwell's demon - not by a program written by a human. If only a single bit is flipped, it will be transparently corrected, and if a "wrong" combination of multiple bits is flipped at the same time (which in itself seems nearly impossible) the system will encounter a UE. That will fail the memory request - I guess that's SIGBUS to crash the process or, if the option above is set, an instant system reset. Admittedly a program may ignore SIGBUS (and go on to exploit the system instead), but it cannot prevent a system panic.

        So yeah, unless you know a well trained Maxwell demon, I would say that ECC is enough.

      2. Michael Wojcik Silver badge

        Re: ECC is not enough

        most ECC memory has multiple bits for error correction per 64 or 128 bit word, and uses something a bit more sophisticated than plain parity

        Hamming Codes, generally, which are based on elementary group theory. The basic idea is that all the valid bit sequences are separated from one another, in terms of Hamming distance, so any given 1-bit error gives you a sequence that's closer to the correct sequence than to any valid alternative.
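
        To make the distance idea concrete, here is a small sketch using the textbook Hamming(7,4) code, not the wider code a real DIMM uses. It builds all 16 codewords and checks how far apart the closest pair is. The function names are my own.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def encode_7_4(data: int) -> int:
    """Encode 4 data bits as a 7-bit Hamming codeword (parity at positions 1, 2, 4)."""
    d = [(data >> i) & 1 for i in range(4)]
    bits = [0] * 8  # index 0 unused so indices match positions 1..7
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    return sum(bits[p] << (p - 1) for p in range(1, 8))


codewords = [encode_7_4(d) for d in range(16)]
min_distance = min(hamming_distance(a, b) for a, b in combinations(codewords, 2))
print(min_distance)  # 3
```

        With a minimum distance of 3, a word with one flipped bit is at distance 1 from the codeword it came from and at least 2 from every other, which is why a single-bit error can be corrected unambiguously.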

        The discussion in Wikipedia is OK, but a good undergrad textbook (eg Gersting's Mathematical Structures for Computer Science) will do a better job working through the theory.

        Kim et al briefly address the question of ECC in the paper:

        While most words have just a single victim, there are also some words with multiple victims. This has an important consequence for error-correction codes (ECC). For example, SECDED (single error-correction, double error-detection) can correct only a single-bit error within a 64-bit word. If a word contains two victims, however, SECDED cannot correct the resulting double-bit error. And for three or more victims, SECDED cannot even detect the multi-bit error, leading to silent data corruption. Therefore, we conclude that SECDED is not failsafe against disturbance errors.

        Of course, all security mechanisms are only ever ways to increase an attacker's work factor under a particular threat model. None are truly "failsafe", except asymptotically (measure X reduces the probability of successful attack Y to below threshold ε). In the worst case documented in the paper, the error rate for more than 2 bits was ~2.3e-4 that of the 1- or 2-bit error rate, or more than 4000 times less likely. That's a pretty good improvement in work factor even if we don't account for the likelihood that the system will detect a slew of 1- and 2-bit errors first. So in practice SECDED ECC appears to make this attack infeasible.
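
        A toy illustration of the three cases the paper describes, using an (8,4) SECDED code (Hamming(7,4) plus one overall parity bit) rather than the (72,64) code on real modules. The names and the chosen bit positions are mine, picked only to show the behaviour.

```python
def encode(data: int) -> list:
    """Encode 4 data bits as an 8-bit SECDED word: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    w = [0] * 8
    w[3], w[5], w[6], w[7] = [(data >> i) & 1 for i in range(4)]
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    w[0] = sum(w[1:]) % 2  # makes the total number of set bits even
    return w


def decode(word: list):
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos
    parity_error = sum(w) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        # Odd number of flips: assume exactly one and fix the bit the syndrome names
        # (syndrome 0 means the overall parity bit itself).
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two flips, detectable but not fixable.
        return "uncorrectable", None
    data = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, data


def flip(word: list, *positions: int) -> list:
    """Return a copy of word with the given bit positions inverted."""
    w = list(word)
    for p in positions:
        w[p] ^= 1
    return w


good = encode(0b1011)               # data value 11
print(decode(flip(good, 5)))        # ('corrected', 11)
print(decode(flip(good, 5, 6)))     # ('uncorrectable', None)
print(decode(flip(good, 3, 5, 7)))  # ('corrected', 0)  <- wrong data, reported as fixed
```

        The last line is the silent corruption case: three flips look like one to the decoder, so it "corrects" the wrong bit and hands back bad data without raising an error.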

        1. Peter Gathercole Silver badge

          Re: ECC is not enough

          I did not mention Hamming codes, because I learned about them over 30 years ago (it was one of the first things taught on my CS course at uni.), and I was not certain the term was in use still, and I could not be arsed to spend any time reading up on the specific techniques used nowadays while writing the comment.

    2. Solmyr ibn Wali Barad

      Re: ECC is not enough

      Sure, all you have to do is to set the whole word (comprised of bits that are located in several physical DRAM chips) and its checksum at once - during the same RAS/CAS cycle. That way it would look like a normal write.

      Good luck.

  15. Peter Gathercole Silver badge

    Probably difficult to exploit in real systems

    I was initially sceptical about this because I could not see how you could predict which other memory pages would be affected by a particular rowhammer, but the report adds much detail to the issue. If you are really interested, give it a read.

    There is quite a lot of "Hopefully this..." type of statement, so the authors acknowledge a degree of luck in triggering the exploit.

    The report also acknowledges that the exploit will work best on a system that is fairly idle, as it requires the process to essentially fill all of the available memory first with known data to identify related memory pages, and then with page table entries created using mmap().

    In a machine with other workload, the likelihood of being able to control enough memory to allow this to be reliable is seriously reduced (it requires a page that is identified, then freed to be immediately used by the system for a page table page, something that could not be guaranteed on a busier system, especially as the page freed with madvise() + MADV_DONTNEED would probably generate a context switch).

    All the time you are rowhammering essentially unpredictable pages (part of the exploit is to hammer memory lines until you can find one that affects a memory page you control), you could also be creating other problems on the system including unpredictably modifying running code and data structures.

    The more you have to do this, the more likely it is that you will trigger another unpredictable action which would attract attention.

    There is also tacit acknowledgement that this attack in its published form relies on certain features of the Intel x86-64 architecture (specific instructions to allow rapid toggling of memory bypassing the cache like CLFLUSH), although it does suggest ways of triggering the bit flip on other architectures.

    Don't get me wrong. It's a clear issue, and one that is exacerbated by the fact that Linux, being open, is less difficult to craft an exploit for because the internals are better understood, but I believe that exploits in the wild are likely to be few.

  16. Nigel 11

    Hardware, not software ...

    If the hardware doesn't work perfectly, there's (usually) nothing that the operating system can do about it, other than (sometimes) detecting the problem and refusing to boot further with a hopefully informative message.

    I hope that the test for this problem is added to memtest86 and similar RAM testers in the near future.

    Then it gets interesting. I'm assuming this is a RAM module fault, not a generic motherboard / chipset fault. If so, I wonder whether it will show that some? all? notebook manufacturers are shipping the cheapest crappiest DRAM that they can lay their hands on, along with the preinstalled malware? Now, how will they handle the hopefully large number of people who will run a memory diagnostic and return their faulty systems to be repaired?

    Alternatively, will it show that all makes of DRAM are equally crap, because the industry has been pushing the performance envelope too far and fast without properly testing the worst edge and corner cases? (Crucial/Micron, Samsung, here's hoping you will gain the right sort of publicity out of this).

    The world really does need ECC to be available for professional grade notebook and desktop systems. Intel, by all means charge a few dollars more for ECC-supporting chipsets, but it was really stupid to decide that nobody using a notebook or desktop system needed ECC.

  17. phil dude

    genetic algorithms...

    The first thought that comes to mind is "has this been exploited already?".

    The second is, if I had to write a program to exploit this it would be using a GA.

    That might make it very specific to the physical parameters of the target system, but this might be sufficiently valuable, e.g. disrupting giant uranium washing machines in the Middle East.

    P.

  18. Anonymous Coward

    I want to learn Linux but ...

    * The choices are daunting.... Is there a small but fully functional version of Linux that will sit on an 8GB / 16GB USB-memory-key and boot without a working hard drive? If yes, any download links?

    * I have a spare Dell XPS that blew up after Vista, but the hard drive is fried... I've never forgiven M$ or Dell for the Vista lies, so it'd be fitting that this laptop should be the first one with Linux!

    * It doesn't need more. I will use the box to study languages, it just needs to have a VLC like player and a PDF viewer... But if it has Wine too obviously that'd be great. Cheers!

    1. thames

      Re: I want to learn Linux but ...

      Here's how to install Ubuntu from USB onto a hard drive. You'll need at least 2GB free on the USB key.

      http://www.ubuntu.com/download/desktop/create-a-usb-stick-on-windows

      http://www.ubuntu.com/download/desktop/install-ubuntu-desktop

      However, if you are trying to install to a USB key rather than a hard drive, while I'm pretty sure that is possible, I wouldn't recommend it. The USB key will be very slow and I don't think it will have a long life if you are continually writing to it.

      Here's how to do it though. There are a lot of different options. If it were me though, I would look at replacing the hard drive and then installing Linux on that.

      http://www.pendrivelinux.com/

      1. Matt Piechota

        Re: I want to learn Linux but ...

        However, if you are trying to install to a USB key rather than a hard drive, while I'm pretty sure that is possible, I wouldn't recommend it. The USB key will be very slow and I don't think it will have a long life if you are continually writing to it.

        Just FYI

        - It's entirely possible. I've done it several times with USB disk and sticks using the normal install process, and as others point out distros often have a USB-writer tool.

        - If you do try the normal install process, be warned that some distros (LinuxMint, I'm looking at you) will overwrite the MBR on /dev/sda (which will likely be your internal disk) with GRUB which might hose up your installed system (especially if you have some sort of software encryption FDE). Pull the disk for the install or install on another system if you can't remove the internal disk.

        - Sticks aren't as fast as disk over USB, but USB2 and 3 are fine for basic use even with software encryption. I currently use a small Sandisk USB3 "stick" (although it's not much bigger than the USB connector) to boot my corporate laptop into Linux for "home use". I haven't used it extensively, but it's working fine so far. Maybe it'll wear out some day, but I'll just buy another 32GB USB3 stick for $20US and restore from backups.

      2. Conundrum1885

        Re: I want to learn Linux but ...

        I use DSL a lot now, because it has very small overheads and will fit on a cheap card supplied with a £5 MP3 player fake or otherwise.

        As it only ever uses 128MB of the available memory a larger card is a waste and the Poundland USBs are also bootable as an added bonus.

        You can now get an updated DSL (Damn Small Linux) which is still under 100MB but has Firefox built in and all the usual driver improvements as well as RAM test and CPU ID.

      3. Anonymous Coward

        Re: I want to learn Linux but ...

        Cheers for the replies folks!

    2. Nigel 11

      Re: I want to learn Linux but ...

      Off-topic ... but there are dozens. Most are described as "Live CDs or Live DVDs" and will boot off a CD or DVD, but with most it's described how to copy or install into a memory stick instead.

      Fedora has a liveusb-creator app that you can run on Windoze to create a bootable Fedora Workstation USB stick.

      Linux booted off a CD or DVD is surprisingly usable. An application may be slow to start the first time (while the DVD drive spins up and seeks slowly) but will usually then stay cached in RAM for repeated use.

  19. xyzw

    It looks like a version of Geohot's hack on PS3

    [back in 2010 I think]

  20. Henry Wertz 1 Gold badge

    "It's a hardware level exploit, targeting a feature of the x86 architecture (page tables). "

    Well, yes, they did rewrite the page table. But ultimately the cause of the exploit was finding that laptop DDR3 controllers are unstable. In no way should repetitively writing memory addresses cause the memory "one row over" to become corrupted, but this is what has happened here. It makes me glad my computers are antiquated and still have DDR2 8-)

    ECC might mask this, but my guess is simply that the desktops either keep the RAM cooler or run slightly higher voltage to the RAM (perhaps the laptops lower it to save power), or that there are some flaws in laptop DRAM controllers that the desktop ones do not have. Anyway, what a mess. Kudos to Google for finding such a creative zero-day.

    1. Michael Wojcik Silver badge

      Kudos to Google for finding such a creative zero-day.

      Technically, neither Google nor the attendees of Google Project Zero found the vulnerability. See my other posts. They wrote a PoC exploit for a known problem.

  21. Crazy Operations Guy

    can be killed with software

    Just request 2*row_width more memory than needed and then block off a row before and after the page tables. Maybe throw some canary values in those blank rows to detect an attack, maybe set the NX bit on those rows so the OS becomes aware... On a 512-bit wide memory system, it would only take 128 Bytes of memory. We live in an era of phones shipping with 4 GB of RAM, I don't think anyone would miss a few KB here and there if it improves security.
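
    The arithmetic behind that 128 byte figure, as a quick sketch. It takes the 512-bit row width above as given (actual DRAM row sizes vary by part, so it is a parameter here), and the function name is just for illustration.

```python
def guard_overhead_bytes(row_width_bits: int, guard_rows: int = 2) -> int:
    """Bytes spent on guard rows placed around one protected region."""
    return guard_rows * row_width_bits // 8


# One guard row before and one after, each 512 bits wide.
overhead = guard_overhead_bytes(512)
print(overhead)  # 128

# The same overhead as a fraction of a 4 GB phone's memory.
print(overhead / (4 * 1024**3))
```

    The cost scales linearly with row width and with the number of separately guarded regions, so it stays small as long as the page tables sit in a few contiguous runs.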

  22. Conundrum1885

    Could this be why

    My X520 mysteriously started crashing iTunes and otherwise misbehaving, even though its 2*2GB DDR3 RAM passed every test I threw at it. After I swapped the RAM for chips of identical capacity, though marginally faster, it now works fine.

    Symptoms were very strange indeed such as certain programs notably Winhex crashing at random despite changing the drive and doing a fresh install of Win7.

    Had to resort to using an old clunker (5230) to clone the drive which worked fine.

    Same machine also had a 500GB drive fail with odd symptoms but when zerofilled it worked fine on other systems though threw a "Disk Read Error Has Occurred " on the 520.

    It does occasionally crash Explorer when you plug in a particularly power hungry USB device but this could just be dodgy USB ports.

  23. Anonymous Coward

    Re. hard disk fail

    I've heard anecdotal reports of lots of drives failing within days or hours despite SMART never reporting any problem, including on industrial systems subject to vibrations or loud noises.

    The current working hypothesis is that writing certain combinations of patterns to the drive is over time weakening the firmware storage zones in a similar way to how ZIP disks would lose their "Z" tracks on misaligned drives eventually causing a catastrophic corruption event (CCU)

    The high density drives seem to be somewhat less sensitive to this but it does occur to me that it would be worth testing for this vuln on candidate drives in a RAID array to make sure it can't happen with say elevated temperature or acoustic resonances from lots of closely spaced drives.

  24. Memoryguy

    The fact that it is possible to trigger bit flips by row-hammering shows again that DRAMs are not perfect. The data bits in DRAM memory are stored as charge, and this charge can only be kept for a few milliseconds before it needs a refresh by the CPU.

    It is well-known that DRAM memory is sensitive to any kind of disturbance and row-hammering is just one of them. Also antennas, radiation or heat can cause bit flips. Many bit-flips happen without being able to find a root cause. The older memory chips get, the more they degrade and the more sensitive they become. It's interesting to see that brand-new devices work flawlessly for a while and then have the first "hiccups".

    Some people say that shorter refresh-cycles would help, but this is only partially true. Many bit-flips are not related to the data-retention time of the memory cells, but come from external disturbances. In addition, more-frequent refreshing would result in higher power consumption and a strong performance drop (the DRAM can not be read or written during a refresh-cycle).

    Any electronic device we own, whether it is a smartphone, a WiFi router, navigation system, settop box or our PC and laptop, sometimes has a malfunction or crash and needs to be rebooted. We got used to that and don't really think about why it happened.

    Now look at an ECC protected system like a server and you find it never crashes although it stays switched on for months and years. Why is it so much more reliable although the software running on it is fairly similar to that running on a PC or laptop? It is the ECC error correction that covers bit-flips in the DRAM.

    And yes, Wifi Routers run their software from a DRAM, also most other electronics use DRAMs. Even a HDD or a SSD drive uses a DRAM as a cache or write-buffer. Bit-flips modify the data or the software code and result in random fails or even crashes.

    But how can we add ECC to systems that normally do not support ECC?

    The ECC error correction is normally performed by the CPU which generates additional parity-bits for all data it writes to the DRAM. Upon reading the data and the parity bits from the DRAM, the CPU performs the ECC algorithm and can detect and correct bit-flips.

    Not only does the CPU need to be ECC-capable for that, but the DRAM memory bus also needs to be wider. Instead of the standard 64 bits width, 8 additional parity bits are required, so the memory must be 72 bits wide. Talking in "DRAM chips", that normally means having a minimum of 9 DRAM chips! Modules with 9 DRAM chips (or 18 or 36) might fit into a PC or laptop, but definitely will not fit on a HDD, SSD, router or other small form factor electronics.
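
    Where the 8 extra bits come from, as a rough sketch: it assumes a Hamming-style SECDED code (the source does not say which code a given CPU uses) and x8 DRAM chips. The function name is made up for the example.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SECDED code over data_bits of data.

    Single-error correction needs the smallest r with 2**r >= data_bits + r + 1;
    double-error detection adds one overall parity bit on top.
    """
    r = 0
    while 2**r < data_bits + r + 1:
        r += 1
    return r + 1


data_bits = 64
check_bits = secded_check_bits(data_bits)
total_bits = data_bits + check_bits
chip_width = 8  # assumed: each DRAM chip contributes 8 bits to the bus
print(check_bits, total_bits, total_bits // chip_width)  # 8 72 9
```

    The same formula gives 9 check bits for a 128-bit word, so the relative overhead shrinks as the protected word gets wider.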

    But now there is something new: DRAM memory chips with on-chip integrated ECC error correction (www.intelligentmemory.com/ECC-DRAM/)

    This could solve the problems as no more ECC-capable CPUs are required to use them. And even if some products (HDD/SSD/routers, etc) use just one single DRAM chip on them, they could take such an ECC DRAM and have the required protection.

    The ECC DRAM chips are fully compatible with conventional DRAMs, so they fit everywhere. They can even be put onto a standard 64 bit wide laptop or PC memory module PCB and would result in self-correcting Non-ECC modules. Each chip on the module will verify and correct its output-data by the integrated ECC function.

  25. Anonymous Coward

    It's the chip, not the system

    Row hammer is not random, nor single bit upset, so ECC does not handle it. That's why it can be exploited and why this is a big deal. Row hammer was well understood in the 1980's because DRAMs were susceptible to it back then. It has nothing to do with process geometry, it is quite simply that they cut the corner too close on the bitcell/array design. In the '80s the CPU had direct access to the DRAM (no cache), but as caches were introduced the statistical likelihood of row hammer went away. In a normal system it is not an issue, but if you bypass the cache, or deliberately write code that can hammer rows you are going to have problems if it is a crappy DRAM design.

    I'd guess the desktop machines did not show susceptibility because they are antiques by computer standards and built with DRAM that is not susceptible (not because it's an older process geometry, but because it was properly designed.)

    Mitigation by more frequent refresh will statistically reduce, but not eliminate the problem. It will burn more power and rob performance. Row counters would only be useful as an alarm, but would not be able to pinpoint the adjacent victim row, so you would have to stop everything and refresh the whole bank. The best answer is to design better DRAM arrays.
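
    A toy software model of the row counter idea, purely to show the logic. The class, the threshold value and the bank/row numbers are all hypothetical, and real hardware would do this in the memory controller or the DRAM itself.

```python
from collections import Counter


class RowActivationAlarm:
    """Counts row activations within one refresh window and flags hammered rows."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.counts = Counter()

    def activate(self, bank: int, row: int) -> bool:
        """Record one activation; return True if the alarm tripped and the bank was refreshed."""
        self.counts[(bank, row)] += 1
        if self.counts[(bank, row)] >= self.threshold:
            # The counter knows which row was hammered, not which physical
            # neighbours are the victims, so the whole bank gets refreshed.
            self.refresh_bank(bank)
            return True
        return False

    def refresh_bank(self, bank: int) -> None:
        """Model a whole-bank refresh by forgetting all counts for that bank."""
        for key in [k for k in self.counts if k[0] == bank]:
            del self.counts[key]

    def end_of_refresh_window(self) -> None:
        """The normal periodic refresh has covered every row, so start counting afresh."""
        self.counts.clear()


alarm = RowActivationAlarm(threshold=1000)  # hypothetical threshold
tripped = [alarm.activate(bank=0, row=42) for _ in range(1000)]
print(tripped.count(True), tripped[-1])  # 1 True
```

    The whole-bank refresh is what makes this expensive: every trip stalls access to the bank, which is the performance hit described above.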
