Rails Performance exercise

Thu, Oct 6, 2022

Here is a typical Rails endpoint:

class ProductsController < ApplicationController
  def index
    @products = Product.all

    @products = @products.map do |product|
      {
        id: product.id,
        name: product.name,
        average_rating: product.reviews.average(:rating),
        categories: product.categories.map(&:name)
      }
    end

    render json: @products
  end
end

It is readable, easy to maintain, uses Rails’s native functions such as average.

Let’s do an exercise together to see how we can improve its performance. The exercise assumes you have monitoring and logging, you judged this endpoint worth optimizing, and CPU/RAM and networking are already optimized.

1. Pagination

Fetching all products can be expensive if the database is large. Let’s reduce the number of records fetched at once.

def index
  @products = Product.page(params[:page]).per(20)
  ...
end

It is set at 20 in this example, but it can be a parameter or a different value, based on customer needs.

2. Preload Associations

Eager load reviews and categories reduce queries. Calculating average_rating for each product causes N+1 queries. Mapping categories for each product adds additional N+1 queries.

def index
  @products = Product.includes(:reviews, :categories).page(params[:page]).per(20)
  ...
end

No more N+1 queries

3. Indexing

If you haven’t already, make sure to add indexes on joined or searched columns

# db/migrate/add_indexes_to_reviews_and_categories.rb
add_index :reviews, :product_id
add_index :categories_products, [:category_id, :product_id]

4. Database Calculations

Whatever Ruby uses for calculating averages, database level calculation will be faster. There is no type casting, no formatting, no N loops etc.

def index
  @products = Product
    .select("products.*, AVG(reviews.rating) as average_rating")
    .joins(:reviews)
    .group("products.id")
    .includes(:categories)
    .page(params[:page]).per(20)
  
  ...
end

Many miss db calculations. Any Rails developer know about count which translates into COUNT(*), but did you know databases also do average, sum, maximum, minimum?

5. Selective Queries

Tables are often bloated with binary jsons, dozens of columns, and even binaries. Select only needed columns!

def index
  @products = Product
    .select("products.id, products.name, AVG(reviews.rating) as average_rating")
    .joins(:reviews)
    .group("products.id")
    .includes(:categories)
    .page(params[:page]).per(20)
  
  ...
end

6. Batch Queries for Associations

We can write more efficient queries for associations

def index
  @products = # load products
  product_ids = @products.pluck(:id)

  average_ratings = Review.where(product_id: product_ids).group(:product_id).average(:rating)
  categories = Category.joins(:products).where(products: { id: product_ids }).group_by(&:product_id)

  render json: @products.map do |product|
    {
      id: product.id,
      name: product.name,
      average_rating: average_ratings[product.id] || 0,
      categories: categories[product.id]&.map(&:name) || []
    }
  end
end

5. Asynchronous Queries

Asynchronous queries in Rails 7.0 are executed in a background queries. If you have multiple expensive database queries, the application doesn’t have to wait for one query to finish before starting another

def index
  # the queries below will be processed in parallel 
  @products = Product.load_async.page(params[:page]).per(20)
  product_ids = @products.pluck(:id)

  average_ratings = Review.load_async.where(product_id: product_ids).group(:product_id).average(:rating)
  categories = Category.load_async.joins(:products).where(products: { id: product_ids }).group_by(&:product_id)

  render json: @products.map do |product|
    {
      id: product.id,
      name: product.name,
      average_rating: average_ratings[product.id] || 0,
      categories: categories[product.id]&.map(&:name) || []
    }
  end
end

load_async requires config.active_record.async_query_executor to be set

9. Cache Results

For frequently accessed rows, we can cache average rating and category data for products.

def index
  @products = # Product load
  render json: @products.map { |p| p.cached_attributes }
end

# app/models/product.rb
class Product < ApplicationRecord
  def cached_attributes
    Rails.cache.fetch(["product", id, "cached_attributes"], expires_in: 12.hours) do
      {
        id: id,
        name: name,
        average_rating: reviews.average(:rating),
        categories: categories.map(&:name)
      }
    end
  end
end

10. Background Processing

Average is easy. In real-world applications, we have to calculate fees, totals, and metrics that are far from trivial. Calculations can be precomputed and stored.

Background processing also works for large datasets.

class Product < ApplicationRecord
  after_save :update_average_rating

  def update_average_rating
    self.average_rating = reviews.average(:rating)
    save!
  end
end

Bonus

Depending on your use case, aggregation logic can be offloaded to an external API. That external API could run on a mean, blazing fast stack (such as Crystal or Rust), or can be allocated more powerful hardware.

Make sure to profile performance. An external API will add networking time so you can end with worse response times.

# Offload aggregation logic to an external API gateway.
def index
  render json: ExternalProductService.fetch_products(page: params[:page])
end

Another option is to replace the REST endpoint withh a GraphQL API for selective querying.

class Types::ProductType < Types::BaseObject
  field :id, ID, null: false
  field :name, String, null: false
  field :average_rating, Float, null: true
  field :categories, [String], null: true
end

class Resolvers::ProductsResolver
  def resolve(page: 1, per_page: 20)
    Product.includes(:reviews, :categories).page(page).per(per_page)
  end
end

Like the external API gateway above, there are pros and cons with GraphQL. Many teams were wooed by the new tech only to backtrack when they see the maintenace costs and added complexity.