Solving Cucumber's Problems

Cucumber took the Rails community by storm a couple of years ago. For the first time, we had an easy way of exercising the full stack of our applications. Many people didn't even realize that behind the scenes there was another library, Webrat, doing all the hard work. Cucumber became the de facto way of writing end-to-end tests in Ruby.

When I wrote Capybara, it was mostly to improve the experience of writing Cucumber features. Over time, however, with the arrival of Steak and similar approaches, people realized that the Capybara API was quite efficient at driving acceptance tests on its own, and began using it with plain RSpec or another testing framework.

And all was well, case closed, right?

Over the last year, I more or less abandoned Cucumber in favour of writing acceptance tests in plain Ruby with Capybara and RSpec. Throughout this experience, I have tried to keep an open mind. My verdict is: I don't buy it. In the beginning it was fantastic: the overhead of Cucumber was gone, and we were insanely productive. But over time, cracks appeared. As the projects grew larger, the tests became more and more difficult to maintain. I have since tried to figure out why.

Imagine creating a task with a particular title. In Capybara, this might look something like this:

fill_in('Title', :with => 'Buy milk')

Simple enough. Now imagine that we do this a lot and want to abstract it. That is also quite convenient:

create_task(:title => 'Buy milk')

That looks good. Now imagine that this task is attached to a milestone:

create_milestone(:name => '1.0')
create_task(:title => 'Buy milk', :milestone => '1.0')
create_task(:title => 'Drink milk', :milestone => '1.0')

That's okay too, but what if this is a common pattern: a milestone with multiple tasks?

create_milestone(:name => '1.0', :tasks => ['Buy milk', 'Drink milk'])

That looks fantastic!

Here's the problem, though: no one ever builds this abstraction. There is so much overhead involved in implementing the create_milestone method that, in practice, it's simply not done. It's certainly not done for the first acceptance test that could have used it. And herein lies the crux of the problem: the default behaviour for acceptance tests in Ruby is to be unnecessarily verbose, and you have to constantly fight this behaviour in order to write maintainable tests.
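To make that overhead concrete, here is a rough sketch of the helpers above in plain Ruby. The paths, field labels and button texts are assumptions, and a small recording object stands in for Capybara's session so that the sketch runs on its own; in a real suite `visit`, `fill_in`, `select` and `click_button` would be Capybara's.

```ruby
# Stand-in for a Capybara session that only records the actions it receives,
# so this sketch runs without a browser.
class RecordingSession
  attr_reader :actions

  def initialize
    @actions = []
  end

  def visit(path)
    @actions << [:visit, path]
  end

  def fill_in(locator, options)
    @actions << [:fill_in, locator, options[:with]]
  end

  def select(value, options)
    @actions << [:select, value, options[:from]]
  end

  def click_button(locator)
    @actions << [:click_button, locator]
  end
end

# Hypothetical helpers: paths, field labels and button texts are assumptions.
module MilestoneHelpers
  def create_task(attributes)
    page.visit('/tasks/new')
    page.fill_in('Title', :with => attributes[:title])
    page.select(attributes[:milestone], :from => 'Milestone') if attributes[:milestone]
    page.click_button('Create task')
  end

  # The abstraction that rarely gets written: a milestone plus its tasks.
  def create_milestone(attributes)
    page.visit('/milestones/new')
    page.fill_in('Name', :with => attributes[:name])
    page.click_button('Create milestone')
    Array(attributes[:tasks]).each do |title|
      create_task(:title => title, :milestone => attributes[:name])
    end
  end
end

class MilestoneScenario
  include MilestoneHelpers
  attr_reader :page

  def initialize
    @page = RecordingSession.new
  end
end

scenario = MilestoneScenario.new
scenario.create_milestone(:name => '1.0', :tasks => ['Buy milk', 'Drink milk'])
```

Even in this toy form, the one-line create_milestone call needs a page of plumbing behind it, and that plumbing has to exist before the first test can benefit from it.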

It is in abstracting these kinds of common patterns that Cucumber shines. In fact, this abstraction is probably still at too low a level for cukes. If cukes are written like this:

Given there is a milestone called "1.0"
And there is a task called "Buy milk" for the milestone "1.0"
And there is a task called "Drink milk" for the milestone "1.0"
When I visit the homepage
And I click "Milestones"
And I click "1.0"
Then I should see "Buy milk"
And I should see "Drink milk"

Then you are not gaining any benefit from Cucumber at all. What you really want is something like this:

Given I am looking at a milestone with the tasks "Buy milk" and "Drink milk"
Then I should see "Buy milk" and "Drink milk"

In my experience, it's very difficult to write tests at this level of abstraction in plain Ruby, and a lot easier to write them in Gherkin, the language Cucumber features are written in.


Still, going back to Cucumber after being in Ruby land for a while, I encountered a number of problems. These are the same problems mentioned by many of those abandoning Cucumber for plain Ruby:

  1. Having a separate test framework is annoying
  2. Mapping steps to regexps is hard
  3. Cucumber has a huge, messy codebase
  4. Steps are always global

I have written a new library, which I believe solves these problems.


Turnip parses Gherkin feature files and runs them in RSpec. You run your feature files the exact same way you would run a normal spec file, and they are automatically run when you run your RSpec suite. So to run a feature file with Turnip, you would do something like:

$ rspec spec/acceptance/view_milestone.feature

Steps are implemented with strings instead of regexps, like this:

step "there is a task called :name" do |title|
  Task.create(:title => title)
end

It still allows for some variation in natural language through a simple pseudo-syntax for alternative words and optional letters:

step "there is/are :count monster(s)" do |count|
  # matches "there is 1 monster" as well as "there are 3 monsters"
end

Just like Markdown, we're aiming for something which follows the natural conventions of writing text, instead of using the more arcane regexp syntax. The idea is to cover the 90% use case very well, instead of allowing every possible variation.
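To illustrate the idea (this is not Turnip's actual implementation), here is a minimal sketch of how such a step string could be compiled into a regexp: a :name placeholder captures a quoted string or a single word, is/are becomes an alternation, and the letters in monster(s) become optional. The method names are hypothetical.

```ruby
# Hypothetical compiler from step strings to regexps; not Turnip's own code.
#   :name      -> captures a quoted string or a single word
#   is/are     -> matches either alternative
#   monster(s) -> the letters in parentheses are optional
TOKEN = %r{(:\w+)|(\w+(?:/\w+)+)|(\w*)\((\w+)\)}

def translate(match)
  if match[1]
    '("[^"]*"|\S+)'
  elsif match[2]
    "(?:#{match[2].split('/').map { |word| Regexp.escape(word) }.join('|')})"
  else
    "#{Regexp.escape(match[3])}(?:#{Regexp.escape(match[4])})?"
  end
end

def compile_step(expression)
  parts = []
  position = 0
  while (match = TOKEN.match(expression, position))
    parts << Regexp.escape(expression[position...match.begin(0)]) # literal text
    parts << translate(match)
    position = match.end(0)
  end
  parts << Regexp.escape(expression[position..-1])
  Regexp.new("\\A#{parts.join}\\z")
end

# Returns the placeholder values (quotes stripped), or nil if the text doesn't match.
def match_step(expression, text)
  match = compile_step(expression).match(text)
  match && match.captures.map { |value| value.sub(/\A"(.*)"\z/, '\1') }
end

p match_step('there is/are :count monster(s)', 'there are 3 monsters') # prints ["3"]
p match_step('there is a task called :name', 'there is a task called "Buy milk"') # prints ["Buy milk"]
```

The regexp still exists, but only the library has to deal with it; the person writing the step sees nothing more arcane than a slash and a pair of parentheses.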

Turnip was written just to solve this particular, rather simple problem. There is no support for other programming languages, no wire protocol, it doesn't have its own runner or formatters or anything. Its only dependencies are rspec and gherkin.

In Turnip, steps can be local by scoping them to tags:

steps_for :interface do
  step "I do it" do
    click_link('Do it')
  end
end

steps_for :database do
  step "I do it" do
    # perform the same action directly against the database
  end
end

Now just tag the scenarios with the @interface and @database tags and you have different behaviour for the same step in different scenarios:

@interface
Scenario: do it through the interface
  When I do it

@database
Scenario: do it through the database
  When I do it
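A minimal sketch of how tag scoping could work: steps_for files steps under a tag, and lookup searches a scenario's tags before falling back to global steps. The registry class is hypothetical and far simpler than Turnip; in practice the tags would come from the parsed feature file rather than being passed in by hand.

```ruby
# Hypothetical step registry illustrating tag-scoped steps; not Turnip's code.
class StepRegistry
  def initialize
    @steps = Hash.new { |hash, scope| hash[scope] = {} } # scope => { text => block }
    @scope = :global
  end

  # Steps defined inside the block belong to `tag` instead of the global scope.
  def steps_for(tag)
    previous, @scope = @scope, tag
    yield
  ensure
    @scope = previous
  end

  def step(description, &body)
    @steps[@scope][description] = body
  end

  # A scenario's tags are searched first; global steps are the fallback.
  def find(description, tags)
    scope = (tags + [:global]).find do |candidate|
      @steps.key?(candidate) && @steps[candidate].key?(description)
    end
    raise ArgumentError, "undefined step: #{description}" unless scope
    @steps[scope][description]
  end
end

registry = StepRegistry.new
registry.steps_for(:interface) do
  registry.step('I do it') { "clicked the 'Do it' link" }
end
registry.steps_for(:database) do
  registry.step('I do it') { 'changed the record directly' }
end

puts registry.find('I do it', [:interface]).call # prints: clicked the 'Do it' link
puts registry.find('I do it', [:database]).call  # prints: changed the record directly
```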


I don't know if Turnip solves the problems that Cucumber has. I don't know if Cucumber is the right solution for you. I do believe that Cucumber has a lot of benefits which the hivemind of this community has too easily dismissed this past year or so. I have tried to separate the ideas of Cucumber from its implementation. Try it out and see if you like the result!


You're Cuking It Wrong

Opinions on Cucumber seem to be divided in the Ruby community. Here at Elabs we've been using Cucumber with fantastic success on all of our projects for more than a year. At the same time, Steak and projects like it seem to be gaining traction; some people are clearly frustrated and fed up with Cucumber.

So where does this gulf in experiences come from? Why is Cucumber loved by some and hated by others? At the risk of over-generalisation and mischaracterisation, I recently came up with a theory: the Cucumber detractors are not using Cucumber the way it was intended.

This is in fact not their fault. The entire Cucumber ecosystem, and in fact even Cucumber itself, encourages its misuse.

A while ago someone created an issue on the Capybara issue tracker. The interesting thing about this issue wasn't the problem itself, but rather the cucumber feature that the author presented in order to replicate the problem. This is the feature the author submitted:

Scenario: Adding a subpage
  Given I am logged in
  Given a microsite with a Home page
  When I click the Add Subpage button
  And I fill in "Gallery" for "Title" within "#document_form_container"
  And I press "Ok" within ".ui-dialog-buttonpane"
  Then I should see /Gallery/ within "#documents"

At first glance this seems reasonable. But contrast this with the following, improved version:

Scenario: Adding a subpage
  Given I am logged in
  Given a microsite with a home page
  When I press "Add subpage"
  And I fill in "Title" with "Gallery"
  And I press "Ok"
  Then I should see a document called "Gallery"

The difference isn't huge: the steps are largely the same, and there's an argument to be made for writing in an even more declarative style. But there is one crucial difference: the first feature is code, the second isn't.

The argument often made against Cucumber is that, for programmers, plain text is unnecessary, because we can all read code. While it's true that we can all read code, I still find it beneficial to step out of code-writing mode when describing the behaviour of the application. When you're writing features first, you don't want to be bothered with the details of how the functionality works. In this initial stage you care nothing about the implementation, about how the result is achieved. You care nothing about things like #document_form_container or .ui-dialog-buttonpane.

I believe it's in this switching between designer mode and developer mode that Cucumber, done right, really shines.

In order to evaluate the bigger picture before hacking
As a developer
I want to write my stories before writing my code

There are some secondary benefits as well. Writing truly plain-text features leads to better maintainability, since the features are robust against code changes. Plain text is also easier to understand for new developers joining an existing project. Probably the nicest advantage, though, is that over time you build up a library of steps which can then simply be combined to describe new features.

The first feature above nicely illustrates this anti-pattern, but it is far from the only example. In many of our Cucumber suites here at Elabs we have steps like these, some of them written by me. Which leads me to what's really wrong with the last three lines of that feature: they are written using nothing but the standard web steps generated by cucumber-rails' own generator. Cucumber itself ships with steps which, in my opinion, encourage an anti-pattern.

Pickle my fancy

Another tool we've experimented with a bit is Pickle, which allows you to easily generate models from your feature files. A basic example from the README:

Given a user exists
And a post exists with author: the user

Given a person: "fred" exists
And a person: "ethel" exists
And a fatherhood exists with parent: user "fred", child: user "ethel"

It actually looks fairly nice and reads quite naturally, so a first instinct might be to call this plain text. But on closer inspection, there is a whole language in there. To comprehend what these steps are doing, you'd need to understand not only the domain models involved, but also the language Pickle uses to manipulate them. I'm pretty sure a non-technical person couldn't make sense of the above. This is really no different from, and in fact worse than, writing actual code:

@user = User.make
Post.make(:author => @user)

@fred = Person.make(:name => 'Fred')
@ethel = Person.make(:name => 'Ethel')
Fatherhood.make(:parent => @fred, :child => @ethel)

Note how there is an almost one-to-one mapping between the Pickle steps and the code. The only thing Cucumber does in this case is act as some kind of phoney translator. We write code, but not actual code, so we can do some of the same things, but mostly it comes out worse than if we'd just written it as code in the first place. I can't blame anyone for disliking Cucumber when using it like this.

However, try this instead:

Given there is a user called "Jimmy"
And there is a post authored by "Jimmy"

Given there is a person called "Fred"
And there is a person called "Ethel"
And "Fred" is the father of "Ethel"

There's not a huge difference in the first couple of lines, even though they read somewhat nicer when written out like this. The real difference is in the last line. Here Cucumber adds value by translating the abstract concept of a Fatherhood into something very concrete: one person is the other's dad. Cucumber added value to this feature, instead of only acting as a hindrance.

I believe that Pickle is flawed as a concept: in order to achieve readable steps, they need to be written by hand.

The worst feature ever written

As a curiosity, I present the worst cucumber feature known to man. If you are responsible for something like this, please go slap yourself in the face as hard as you can.

Scenario: User creates some sites and circuits, check connected sites list
    Given a "site" exists with {"name"=>"Somewhere1", "identifier" => "TER1", "provider"=>"TER1 Provider"}
    And a "site" exists with {"name"=>"Somewhere2", "identifier" => "TER2", "provider"=>"Some Provider"}
    And a "site" exists with {"name"=>"Somewhere3", "identifier" => "TER3", "provider"=>"TER3 Provider"}
    And a "circuit" exists with {"provider_name"=>"Another provider", "redacted_circuit_id"=>"ABC1", "provider_circuit_id"=>"C1", "circuit_type"=>CircuitType.find_by_name("Peering"), "service_type"=>CircuitServiceType.find_by_name("Dark Fiber"), :capacity => CircuitCapacity.find_by_name("1 Gbps"), "physical_wire_type"=>PhysicalWireType.find_by_name("Multi Mode Fiber"), "status"=> CircuitStatus.find_by_name("Cancelled"), "a_end"=>Site.find_by_identifier("TER1"), "b_end"=>Site.find_by_identifier("TER2")}
    And a "circuit" exists with {"provider_name"=>"Switch and Data", "redacted_circuit_id"=>"ABC2", "provider_circuit_id"=>"C2", "circuit_type"=>CircuitType.find_by_name("Backbone"), "service_type"=>CircuitServiceType.find_by_name("Dark Fiber"), :capacity => CircuitCapacity.find_by_name("1 Gbps"), "physical_wire_type"=>PhysicalWireType.find_by_name("Multi Mode Fiber"), "status"=> CircuitStatus.find_by_name("Cancelled"), "a_end"=>Site.find_by_identifier("TER1"), "b_end"=>Site.find_by_identifier("TER3")}
    When I am on the "connected_sites" page for site "TER1"
    Then the "connected-sites-list" should look like
      |   Site ID    |  Site Name | Site Provider | Provider Circuit ID  | Provider Name    | Circuit Status |
      |     TER2     | Somewhere2 | Some Provider |         C1           | Another provider |  Cancelled     |
      |     TER3     | Somewhere3 | TER3 Provider |         C2           | Switch and Data  |  Cancelled     |
    When I am on the "connected_sites" page for site "TER2"
    Then the "connected-sites-list" should look like
      |   Site ID    |  Site Name | Site Provider | Provider Circuit ID  | Provider Name    | Circuit Status |
      |     TER1     | Somewhere1 | TER1 Provider |         C1           | Another provider |  Cancelled     |
    When I am on the "connected_sites" page for site "TER3"
    Then the "connected-sites-list" should look like
      |   Site ID    |  Site Name | Site Provider | Provider Circuit ID  | Provider Name    | Circuit Status |
      |     TER1     | Somewhere1 | TER1 Provider |         C2           | Switch and Data  |  Cancelled     |

Yes, those are Hashes inside a feature, which are then eval'd. Make sure to scroll to the right to experience the full horror of it all. I challenge anyone to find a worse cucumber feature than this. I assure you, that thing is real (from one of our rescue mission projects), and there is much more where it came from.

Writing better steps

So how do we write better steps? For me personally, I've found that sticking to the following rule seems to lead to nice, maintainable steps:

A step description should never contain regexen, CSS or XPath selectors, or any kind of code or data structure. It should be easily understood just by reading the description.
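Parts of this rule can even be checked mechanically. The sketch below is a hypothetical linter, not part of Cucumber; its patterns are rough heuristics that will produce false positives (a date such as 1/2/2010 trips the regexp check), so treat a hit as a reason to reread the step rather than as a verdict.

```ruby
# Rough, hypothetical heuristics for the rule above; tune them for your suite.
STEP_SMELLS = {
  'CSS selector'           => /(?:\A|\s|")[#.][a-z][\w-]*/i,
  'XPath'                  => %r{//\w|\[@\w+},
  'regexp'                 => %r{/[^/\s][^/]*/},
  'code or data structure' => /=>|\{.*\}|\w+\.\w+\(/
}.freeze

# Returns the names of the smells found in a step description.
def step_smells(description)
  STEP_SMELLS.select { |_name, pattern| pattern.match?(description) }.keys
end

[
  'I fill in "Gallery" for "Title" within "#document_form_container"',
  'I should see /Gallery/ within "#documents"',
  'I should see a document called "Gallery"'
].each do |step|
  puts "#{step} => #{step_smells(step).inspect}"
end
```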


Continuous Integration Testing for Ruby on Rails with Integrity

Doing test-driven development usually means you have a lot of tests in a project. While this is almost entirely a good thing, running thousands of Cucumber features and RSpec examples in a large project takes a couple of minutes. If you run your entire test suite every time you commit, this will easily eat up a large chunk of your day. Offloading some of this to a continuous integration server will allow you to save time by running your tests asynchronously, in addition to its other benefits.

At eLabs we usually run our unit tests locally—as well as the Cucumber feature for the story we're currently working on—before checking in. Then we let our CI server run the rest of our Cucumber features and notify us if something goes wrong. Here's the setup we use:


At eLabs we've looked at a number of different CI servers, such as CruiseControl.rb and Run Code Run, but our favorite by far is Integrity.

Screenshot of our Integrity site

Integrity suits us perfectly. It fetches our code from our private GitHub repositories, can run any testing command and notify us in a variety of ways such as email and Campfire. It also has a very nice and clean interface. Its one major shortcoming is its complete lack of error reporting. If there's something wrong with your setup it will silently fail, which makes troubleshooting a nightmare. Hopefully the instructions below will help you avoid some of the pitfalls.


We installed Integrity on a server running Mac OS X and Passenger under Apache. Here's a quick guide.

First, install the gem:

$ sudo gem install integrity

Then set it up in your chosen directory using the --passenger option:

$ integrity install --passenger /Library/WebServer/Sites/integrity

Next, set up a virtual host in Apache, pointing its DocumentRoot to the public folder in your Integrity installation.

DocumentRoot "/Library/WebServer/Sites/integrity/public"

One absolutely crucial step that we missed at first is to make sure that the system user that runs the Integrity passenger processes has git in its PATH. The simplest way to do this is to set the PATH in the virtual host configuration:

SetEnv PATH /opt/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
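Because Integrity fails silently when git is missing, it can be worth checking that PATH directly. Here is a small, hypothetical helper in plain Ruby that looks for an executable in a PATH-style string, splitting on File::PATH_SEPARATOR (a colon on Mac OS X):

```ruby
# Hypothetical helper: returns the full path of an executable called `name`
# in a PATH-style string, or nil if none of the directories contain it.
def find_executable(name, path)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# The PATH from the SetEnv line above.
vhost_path = '/opt/local/bin:/usr/bin:/bin:/usr/sbin:/sbin'
puts find_executable('git', vhost_path) || 'git is not on this PATH'
```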

After configuring Apache, you have to configure Integrity by editing config.yml in the root directory of your Integrity installation. We used SQLite for the database (we couldn't get it to work with MySQL). If you want to use a hashed password for the admin user, here's a simple way to get the SHA1 of a password:

$ ruby -r 'digest/sha1' -e 'puts Digest::SHA1.hexdigest("password")'

The final step is to create the database:

$ integrity migrate_db

You should now be able to log in to your Integrity site and add your projects.

Setting Up a Project

The most important part of setting up a project for CI is the build command. This is the command that Integrity runs to test your app, and it can be anything that exits with a status of 0 when successful. We use a simple rake task that prepares our project by copying a database.yml file and runs RSpec and Cucumber tests.

namespace :ci do
  task :copy_yml do
    # sh (unlike system) raises if the command fails, which fails the build
    sh "cp #{Rails.root}/config/ #{Rails.root}/config/database.yml"
  end

  desc "Prepare for CI and run entire test suite"
  task :build => ['ci:copy_yml', 'db:migrate', 'spec', 'features']
end
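All Integrity cares about is the exit status of the build command. As a rough illustration of that contract, here is a hypothetical stand-alone build driver (an alternative to, not part of, the rake task above) that runs each step in order and reports failure as soon as one step fails:

```ruby
# Hypothetical stand-alone build driver illustrating the exit-status contract.
# Each command is an argv array; `system` returns true only for exit status 0,
# and `all?` stops at the first command that fails.
def run_build(commands)
  commands.all? do |command|
    puts "==> #{command.join(' ')}"
    system(*command)
  end
end
```

A script built on it could end with exit(run_build([%w[rake db:migrate], %w[rake spec], %w[rake features]]) ? 0 : 1), which gives Integrity the zero or non-zero answer it needs.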

With that committed to our repository (along with a file) we add the project to Integrity. The important parts here are the Git repository and Build script settings.

Add a project to Integrity

You must also make sure that the Integrity user can access your repository on GitHub. There are a couple of different ways you can do this, but we created a separate free GitHub account that we add as a collaborator to our projects.

After you add the project, you should be able to request a manual build from the Integrity web interface. Note that the build runs synchronously, so you'll have to wait a while. If the build succeeds, you're ready to set up GitHub's Post-Receive hook so that Integrity runs your tests whenever you push your code.

GitHub Post-Receive URL settings

Go to your project's page on GitHub, click the Admin link in the top menu, and then Service Hooks in the submenu. Enter the push URL for your Integrity project as the Post-Receive URL. The URL has the following format:


After you've updated the settings, click the Test Hook link and Integrity should start a new build. If that works, you're all set for having automated builds on every push to GitHub.


While Integrity's interface is nice, you probably don't want to visit your Integrity site after every commit to check the status of your build. The point of asynchronous tests, after all, is to get notified when something goes wrong. Integrity has a bunch of different notifiers you can use; we use the ones for email and Campfire. You can find more notifiers, along with installation instructions, on the Integrity site.

In addition to Integrity's own notifiers we also use CCMenu, a Mac OS X menu extra built for showing CruiseControl build status. It works with Integrity as well, thanks to the integritray gem.

We also use GitHub's Campfire service hook that posts a message to our Campfire room every time someone pushes new code. This makes it very easy to keep track of what other people in the company are working on.

Campfire screenshot

Not having to wait for our entire test suite to run before each commit saves us a lot of time. But we can still feel confident knowing that Integrity has our backs and will alert us if something goes wrong.


Relieving the Pain of Controller Tests

Lately we've been embracing Cucumber as the preferred way of testing our Ruby on Rails applications. Cucumber is awesome, both for communicating with the customer and for getting thorough, full-stack tests of the application. We like Cucumber so much that we thought it could replace both view and controller tests. It turns out we were wrong.

While our policy of Cucumber over view tests has been working out great so far, controllers are a different story. There is simply too much logic in the controller that is very hard to test (in a sane way) with Cucumber. It makes sense to have a Cucumber feature specifying that a certain link should not be there for a non-admin user. However, that doesn't test the security of the application: even though the link isn't there, the action may still be freely accessible to the user. Cucumber is not well suited (nor is it intended) to test these kinds of things.

But writing controller tests is a serious pain, so we tried to find a stack that felt natural and pleasant to work with. After some experimentation, we've settled on a slightly odd and interesting stack, consisting of the following:

  • Remarkable's descriptions and steps
  • RSpec's normal mocking syntax
  • Macro-style methods for different user contexts

We first tried using Remarkable on its own, but quickly found that we did not like the mocking syntax:

mock_models :data_point

describe(:post => :create, :data => "params") do
  expects :bulk_create, :on => DataPoint,
          :with => proc { [@current_account, "params"] },
          :returns => proc { [mock_data_point] }

  it { should set_the_flash(:notice) }
  it { should render_template('data_points/new') }
  it { should assign_to(:data_points, :with => [mock_data_point]) }
end

The fact that the DSL works at the "class-method" level presents a lot of problems: it is impossible to simply use instance variables, methods need to be wrapped in procs, and so on. It also, for some reason, does not seem to support stubs, which is very inconvenient in some cases. In the end we realized that there is absolutely no advantage to Remarkable's DSL over simply doing:

mock_models :data_point

describe(:post => :create, :data => "params") do
  before do
    DataPoint.should_receive(:bulk_create).with(@current_account, "params").and_return([mock_data_point])
  end

  it { should set_the_flash(:notice) }
  it { should render_template('data_points/new') }
  it { should assign_to(:data_points, :with => [mock_data_point]) }
end
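Why the procs? Code in a class body runs once, when the example group is defined, and there @current_account refers to an instance variable of the class itself rather than of the example. The toy classes below are hypothetical (neither Remarkable nor RSpec is involved) and show the difference between passing a value at class level and passing a proc that is evaluated later with instance_exec:

```ruby
# Toy example-group classes (hypothetical; neither Remarkable nor RSpec).
class ExampleGroup
  def self.expectations
    @expectations ||= []
  end

  # Class-level macro: runs once, while the class body is being evaluated.
  def self.expects(message, options)
    expectations << [message, options[:with]]
  end

  # Per-example setup, like a before block.
  def initialize
    @current_account = 'account-42'
  end

  # Procs are only evaluated now, with self set to the example instance.
  def resolved_arguments
    self.class.expectations.map do |message, with|
      [message, with.respond_to?(:call) ? instance_exec(&with) : with]
    end
  end
end

class DataPointExamples < ExampleGroup
  # Here self is the class, so @current_account is the class's own ivar: nil.
  expects :bulk_create, :with => [@current_account, 'params']
  # Wrapping the arguments in a proc defers the lookup to the example instance.
  expects :bulk_create, :with => proc { [@current_account, 'params'] }
end

p DataPointExamples.new.resolved_arguments
# prints [[:bulk_create, [nil, "params"]], [:bulk_create, ["account-42", "params"]]]
```

With before and should_receive, the expectation is set up inside the example, where instance variables simply work, which is why the plain RSpec version needs no procs at all.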

One sore point, though, was the amount of setup required in each controller spec to get the logged-in user right. We thought that with some block trickery we might be able to take care of this tedious setup:

module LogInContext

  def as_user(params={}, &block)
    describe "(as a logged in user)" do
      before do
        @current_user = mock('current_user')
      end

      describe(params, &block)
    end
  end

  # as_visitor (not shown) sets up the logged-out context
  def deny_access_to_visitors(params={})
    as_visitor(params) do
      it { should redirect_to(new_session_path) }
    end
  end

end

Now we can use these contexts in our controller tests:

mock_models :data_point

as_user(:post => :create, :data => "params") do
  before do
    DataPoint.should_receive(:bulk_create).with(@current_account, "params").and_return([mock_data_point])
  end

  it { should set_the_flash(:notice) }
  it { should render_template('data_points/new') }
  it { should assign_to(:data_points, :with => [mock_data_point]) }
end

deny_access_to_visitors(:post => :create, :data => "params")

But we can do one better:

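module LogInContext

  # Define the examples for logged-in users and, in the same call,
  # check that visitors are denied access to the same action.
  def as_user_only(params={}, &block)
    as_user(params.dup, &block)
    deny_access_to_visitors(params.dup)
  end

end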

Now it is as simple as:

mock_models :data_point

as_user_only(:post => :create, :data => "params") do
  before do
    DataPoint.should_receive(:bulk_create).with(@current_account, "params").and_return([mock_data_point])
  end

  it { should set_the_flash(:notice) }
  it { should render_template('data_points/new') }
  it { should assign_to(:data_points, :with => [mock_data_point]) }
end

And this single test checks both that the post action is accessible to users, and also that it is not accessible to visitors. Of course these contexts can get a lot more advanced once different roles come into the picture. Here's something we're doing in our upcoming app KiNumbers:

module LogInContext

  def as_admin_or_user(params={}, &block)
    as_logged_in_user(params.dup, &block)
    as_admin(params.dup, &block)
  end

  def as_anyone(params={}, &block)
    as_admin(params.dup, &block)
    as_logged_in_user(params.dup, &block)
    as_visitor(params.dup, &block)
  end

end

This way there is no overhead in testing that a particular action is accessible to several different groups of users. Note that we had to call #dup on params, before passing it along, since Remarkable seems to use destructive operations on the Hash (it turned out to be empty after having been used in a describe block).

We ended up with a controller test that looks like this:

require File.expand_path(File.dirname(__FILE__) + '/../spec_helper')

describe DataPointsController do

  mock_models :data_point

  as_admin_or_user(:get => :new) do
    it { should respond_with(:success) }
  end

  as_admin_or_user(:post => :create, :data => "params") do
    before do
      DataPoint.should_receive(:bulk_create).with(@current_account, "params").and_return([mock_data_point])
    end

    it { should set_the_flash(:notice) }
    it { should render_template('data_points/new') }
    it { should assign_to(:data_points, :with => [mock_data_point]) }
  end

end

Short, easy to read, yet also very thorough. Controller tests are sexy again! Spread the word!