I’m Learning Multithreading!

Since this quarter started I’ve been doing a weekly interview practice session with friends from school. Last week we spent most of our hour on this:

Rotate Matrix: Given an image represented by an NxN matrix, where each pixel in the image is 4 bytes, write a method to rotate the image by 90 degrees. Can you do this in place?

For those interested, it’s exercise 1.7 in the sixth edition of Cracking the Coding Interview by Gayle Laakmann McDowell. It’s a great interview prep book that’s been really helpful since I started job hunting. I had a lot of fun with this one so I’d like to do a short write up on our thought process.


Row-to-Column Method

The obvious thing to do here is create a new NxN square matrix and transpose the rows of the old matrix into the columns of the new matrix as in the graphic below:

MatrixRotateRtC

The above method takes O(n) time and O(n) space where n is the number of entries in the matrix. Coding this up should be pretty simple so I’m going to skip ahead to the thing I’m actually interested in: the in-place method.


In-Place Method

We can’t get below O(n) time to rotate a matrix since every entry must be moved and there are n entries. However, it is possible to cut the space usage down from O(n) to O(1) by doing the rotation in place. See the graphic below for an illustration of the method:

MatrixRotateIP

How do we implement this in code? It’s a little difficult but the key insight is that it’s possible to find the destination index for any given entry in the matrix from its starting index. In general terms the destination index can be found using this relation

Destination for { i, j } = { j, n – i }

where n is the last index in the matrix. The really cool thing about this is that it works for any starting index. The graphic above only shows a rotation of the outer layer of the matrix but the above method works for any index in any size of square matrix.

The code I post on this blog has been getting longer and longer lately, so I’m only going to post the relevant functions on the site. If you want to see the whole program you can find it on my gitHub repo.

The program is called matrixRotate.cpp. It takes a .txt file containing the matrix to rotate as a command line argument. The text file should have one int  representing the size of the matrix on the first line. Each successive line should have a space delimited string of ints representing each row of the matrix. It prints the rotated matrix to the console.

The function below will take care of a single step in the matrix rotation. After the function completes 4 matrix entries are moved.

// accepts three ints, the starting row and column indices
// and the highest index in the matrix, NOT the size
// rotates 4 matrix entries by 90 degrees counterclockwise
void rotateLeftHelper(int row, int column, int n) 
{
  int temp = matrix[row][column];
  for (int i = 0; i < 3; i++) 
  {
    int nextRow = column;
    int nextColumn = n - row;
    matrix[row][column] = matrix[nextRow][nextColumn];
    row = nextRow;
    column = nextColumn;
  }
  matrix[row][column] = temp;
}

To rotate the whole matrix we need to run the above function on every entry in the first row from { 1, 1 } to { 1, n-1 }. Then every entry in the second row from { 2, 2 } to { 2, n-1 }. This needs to be repeated until we get to the starting index: { n/2, n/2 }. The function below will do this for us:

// accepts an int, the matrix size
// rotates the matrix by 90 degrees counterclockwise
void rotateLeft(int n) 
{
  int level = n - 1; // switch n from matrix size to highest index
  for (int i = 0; i < n/2; i++) 
  {
    for (int j = i; j < level; j++) 
    {
      rotateLeftHelper(i, j, n - 1);
    }
    level--;
  }
}

So when they work in concert, the above functions will rotate a matrix by 90 degrees counterclockwise and it’ll do it in place. That’s pretty cool, right? If you’ll recall though, the problem from Cracking the Coding Interview says the matrix represents an image. If it’s a raw image from a run-of-the-mill camera it will have 20 million entries. Doing each rotation in series will tie up the CPU for a long time. Fortunately, using the above  rotateLeftHelper() function, one instance of the function doesn’t affect the others. They never operate on the same part of the matrix! Do you know what that means? They can run in parallel and asynchronously!


Multi-Threaded Version

Really all we need to do here is create a new thread for every call to the rotateLeftHelper() function instead of running them in series. Cool, right?

The function below implements the in-place matrix rotation method using pthreads. See matrixRotate.cpp on my gitHub repo to see the whole program.

// accepts a pointer to a threadParams struct containing three ints, 
// the starting row and column indices
// and the highest index in the matrix, NOT the size
void* rotateLeftHelper(void* start) 
{
  threadParams* startState = (threadParams*) start;
  int row = startState->row;
  int column = startState->column;
  // cout << "starting thread on {" << row <<  ", " << column << "}" << endl;
  int n = startState->n;
  int temp = matrix[row][column];
  for (int i = 0; i < 3; i++) 
  {
    int nextRow = column;
    int nextColumn = n - row;
    matrix[row][column] = matrix[nextRow][nextColumn];
    row = nextRow;
    column = nextColumn;
  }
  matrix[row][column] = temp;
}

// accepts an int, the matrix size
// rotates a matrix by 90 degrees counterclockwise
void rotateLeft(int n) 
{
  int level = n - 1; // switch n from matrix size to highest index
  // calculate number of threads
  int numThreads = 0;
  for (int i = level; i >= 1; i -= 2) {
    numThreads += i;
  }
  pthread_t *threads;
  threadParams *parameters;
  threads = new pthread_t[numThreads];
  parameters = new threadParams[numThreads];
  int curThread = 0;
  for (int i = 0; i < n/2; i++) 
  {
    for (int j = i; j < level; j++) 
    {
      parameters[curThread].row = i;
      parameters[curThread].column = j;
      parameters[curThread].n = n - 1;
      pthread_create(&threads[curThread], NULL, 
		     &rotateLeftHelper, (void*) &parameters[curThread]);
      curThread++;
    }
    level--;
  }
  // join all threads
  for (int i = 0; i < numThreads; i++) 
  {
    pthread_join(threads[i], NULL);
  }
  delete[] threads;
  delete[] parameters;
}

Further Improvements

So the above code will make use of parallel processing to speed up the matrix rotation. However, creating and destroying a thread for each instance of the rotateLeftHelper() function involves an unnecessary amount of overhead. We could avoid that overhead by using a thread pool.

Also, we’re still running the threads on the CPU. In the best case scenario the CPU has 8 cores, so we’ll be getting a speedup factor of much less than 8. That’s good but not great. Also, we’re tying up the CPU to do a lot of extremely simple tasks. It would be better if we had a much larger number of much worse processors to do this task. Does that sound familiar? It’s called a graphics card and it’s what image processing programs on computers actually use to do tasks like this.

I think the best thing I could do for this program would be to figure out how to create threads that run on my graphics card using openGL.

That’s all for now though! Feel free to email me with questions.