Optimizing Drupal Views the Right Way

Posted on July 22, 2012, 11:54 p.m.

The Views module is one of the most impressive features Drupal offers. With some clicks here and there, you can build a rich display of data in minutes. But to offer this kind of flexibility, Views makes some assumptions that can hamper your site's performance. In this post, you'll find out how to improve performance dramatically.

First, a quick technical explanation of what happens. Drupal shares the main record for each content type in the node table. That means that regardless of whether an entry is a blog post or an event, it will get a record in this table. This has the advantage that building aggregated lists of all entries is quite easy.

On Joins

How does this tie into Views? When you build a view that uses a field specific to a content type - for example, event date - it doesn't know if every entry will have this field or not. So when querying the database, it tells it "give me this node. And if it has a corresponding field called event date, give me that as well." This is called a LEFT JOIN, and it's quite slow. If on the other hand, we changed the query ever so slightly to "give me this node and the corresponding field event date," then the whole thing will execute much quicker. By doing this, the query is executing an INNER JOIN which joins each node entry with each event entry one to one.

Learning Drupal? Subscribe to my Drupal articles, tips and tutorials.

Keep in mind, this will only work if you want the result set to contain events only. If you need to mix events and blog posts, you're out of luck. Well, sort of1.

On Pagers

Guess what the most expensive part of your view is? That's right, the pager at the bottom. Turns out, counting in databases is slow. Really slow2. So get rid of as many COUNT queries as you can. What's this got to do with pagination? To accurately paginate, Views executes a COUNT query on the entire result set. That means that even if you're displaying 20 records out of a million, Views will need to count all 20 million rows. This takes a long time.

So what are your options? You've got two3. One is to get rid of pagination entirely. The other is to use a less accurate pager. We won't go over the former, because clients/product managers can be notoriously inflexible - good luck trying to convince them to use no pagination. But a less accurate pager? They can probably live with that.

So how does a less accurate pager work? First, it eliminates the "last" option. That means you don't need to know how many total results there are. Skipping more than 2 pages ahead at a time is also nixed. With these compromises, you can get rid of the COUNT query and simply use the current offset to build the pager. You'll only be able to accurately jump to pages before the one you're on, but often this is good enough.

On Distinct

DISTINCT clauses have their time and place - never. No but really, try to avoid them. Most of the time, you can eliminate the need for them by crafting a better query. Duplicates rows are usually a symptom of a bigger problem. So you shouldn't mark your view queries as needing DISTINCT unless you're absolutely certain it's needed. Or if you're lazy. Regardless of the reason, it will slow down your query. Worse - when you use DISTINCT in a view, a GROUP BY clause will be inserted as well. This will also slow down your query significantly. Fortunately, disabling DISTINCT will rid you of both problems. Of course, do check to make sure you're not getting duplicate results. If you're using INNER JOIN as described above, you most likely won't.

The Changes

After all that technical talk, let's actually apply these changes. Three assumptions:

  1. Let's assume your view's machine name is my_view.
  2. Let's also assume it's joining to a table called content_type_event for the additional event information.
  3. Let's pretend you've got a module floating around called my_module.

Here's how to do this The Right Way™:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
<?php                                                                          
/**                                                                             
 * Implements hook_views_query_alter().                                         
 *                                                                              
 * Disables distinct and turns LEFT joins into INNER joins. This drammatically  
 * speeds up performance.                                                       
 */                                                                             
function my_module_views_query_alter(&$view, &$query) {                         
  if ($view->name == "my_view") {                                               
    $query->table_queue['content_type_notice']['join']->type = 'INNER';         
    $query->distinct = 0;                                                       
  }                                                                             
}

What about the pager? That's even easier. Just download the lite pager module and select it as the paging option for your view.

Conclusion

Views can be really helpful, no doubt. And it can get a bad rap about performance. But that's because - without much user input - it has to make a lot of assumptions to construct the query. With just a little bit of code, we can guide the query builder down the right path.


  1. You could just load the common data from the node table, then individually fetch the rest of the information with a query per row. 

  2. If you're using the MyISAM table engine for MySQL, disregard this. 

  3. There's a third option: query hinting. But it's beyond the scope of this article.