Does cucumber suck?

I’ve been having a lot of rants about Cucumber of late, as it’s the new shiny thing for agile teams.  Does anyone else have issues with it?  I’ve asked all of my programmer friends to convince me of its worth, and they’ve all failed so far.  I’ve not seen it adding any value above building a good API (and I see it bringing a lot of negatives relative to other possible approaches).

In my experience, I’m seeing –

– Customers/non-programmers never write the tests (because they have very little interest in specifying everything in given-when-then.  They just want to tell us what they want.  And it doesn’t make much sense to specify everything in that format anyway).
– Or customers/non-programmers do write the tests, but that focuses the test effort on writing those kinds of tests, rather than on other testing that seems to add more value.
– The tests are written in English, but what the test actually does depends on how the developers convert the English phrases into code (so there’s no guarantee it tests what the customer intended anyway).
– Avoidance of conversation (i.e. tests as contracts).
– Cucumber and the related tools (through the toy examples they provide) encourage developers to put lots of implementation detail into the tests (and sometimes to do a lot more testing through the GUI/http layer rather than pushing some of that testing down).
– Refactoring sucks as we lose IDE support.
– Much heavier test artefacts (when a lot of the teams I work with are already struggling with the weight of their agile test automation).
– A continued focus of tests on what the system does and not why it does it and the business outcome.
– Anecdotally, I’m not seeing better outcomes than people were getting with other approaches.  I realise it may be leading to better designs, but then I’d expect to see improvements somewhere in the process.  That looks like it would require better modelling skills than I’m seeing in most of the cucumber tests on my projects.
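
To make the point about English phrases concrete, here’s a minimal sketch in plain Ruby of how a Cucumber-style runner ties a phrase to code. It’s an illustration under my own assumptions, not Cucumber’s actual implementation: `StepRegistry`, `define`, `run` and the withdrawal example are all made up.

```ruby
# Hypothetical, stripped-down step registry (not Cucumber's real internals).
class StepRegistry
  def initialize
    @steps = []
  end

  # Bind a regular expression to a block of code.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Run the first block whose pattern matches the English phrase,
  # passing any captured groups as arguments.
  def run(phrase)
    @steps.each do |pattern, block|
      match = pattern.match(phrase)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "No step matches: #{phrase}"
  end
end

registry = StepRegistry.new
balance = 100

# The phrase promises a withdrawal, but this block only reads the balance
# and never changes it. The English and the code have already diverged.
registry.define(/^I withdraw (\d+) dollars$/) do |amount|
  balance >= amount.to_i
end

result = registry.run("I withdraw 30 dollars")
puts "step passed: #{result}, balance afterwards: #{balance}"
```

The step “passes” and the feature file still reads correctly, yet nothing was withdrawn. Only reading the block tells you that.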

Most of the examples that I’ve seen where people are claiming success are fairly small applications.  I’m not seeing the approach scale that well.  Yes, most teams could write their cucumber tests better, but even then, in my experience, other approaches would be more effective and more efficient.

Any thoughts?  If there’s interest, I’ll try and post some examples of what I think those tests might look like if we pushed them into other forms.

19 comments on “Does cucumber suck?”

  1. Lisa Crispin says:

    Interesting! Cucumber looks like a nice tool. But my experience has also been that, no matter what the tool, “customers” don’t have time or inclination to write tests. They are happy to discuss and draw examples on the whiteboard or provide us with examples in spreadsheets; they’re even willing to go over the tests that we testers write. We started using FitNesse in 2004 hoping our product owner might write tests. He doesn’t have time, and even if he did, he doubts he would do it. But he does a great job of giving us examples we can turn into tests.

    So now we don’t worry that much about having the tests be easily readable by business people – though we like the ability to put narrative (even in Given/When/Then style if we want) in the FitNesse test pages.

    Is Cucumber a good choice if you have testers writing the tests, as opposed to business people? In my experience, FitNesse, RobotFramework, and (for GUI) Watir, Selenium and SWAT all work equally well for that.

  2. James Martin says:

    Hi Jared, I’m a first time reader of your blog. I possibly haven’t quite understood your position on this. Apologies for the deluge of questions; I’m just looking for some context.

    Some questions about the teams you’re working with:

    What kind of teams do you see having these problems? Are they comfortable with ATDD/ATDP in general? Or are they just starting to adopt this collaborative approach? Are all of the teams having all of the problems you describe? Or are different kinds of teams experiencing different problems?

    Are they developing outside-in; starting with feature descriptions and then allowing the programmers to drop down into lower-level testing frameworks to work on the inner-BDD/TDD loop (e.g. RSpec, in the Ruby world)? This seems to be a much misunderstood concept of the Cucumber/RSpec relationship.

    What technologies are they building software with? Are they using the tight integration of Cucumber with Rails, for example? Or perhaps a slightly less mature implementation?

    You mention that Cucumber encourages a lot more testing through the GUI than you would like. This seems to be entirely dependent on how the team choose to wire their scenarios together. I wouldn’t say that Cucumber pushes that approach any more than any other tool. It does, however, provide some rather nice hooks into things like Webrat and hence Selenium to allow end-to-end/full-stack automation when relevant.

    It’s a really difficult skill, learning when to drive the ‘Given’ steps through direct model access, and when to drive via the GUI. If there’s a lot of client-side JavaScript/AJAX going on in the browser, for example, then I’d suggest that some full-stack acceptance test automation would be helpful to support the team and the business. Again, I don’t see this being a failing of any tool, particularly, but a lack of experience in a team.

    I’d really encourage you to read the RSpec book[1]. It’s in beta format right now, from PragPub, but it’s coming along nicely and explains some of these concepts in much more depth than I have here.

    Some questions about the customers you’re working with:

    Are they integrated into the team? Are they talking to the programmers every day? Are they co-located? Are they on board with the concepts of ATDP/ATDD and understand its value and goals? Do they understand their role in the team?

    Are the customers open to working with the team to develop the acceptance tests, even if they don’t like writing Given/When/Then? Is it a problem if they don’t physically write the tests themselves?

    Isn’t one of the main advantages of ATDD the conversation between the business and team? Is that happening, despite Cucumber? If not, can any tool alone bridge that gap?

    You also mention that you’re not seeing improved results over other approaches. Do you mean approaches other than ATDD? Or ATDD with a tool other than Cucumber? Could you give an example?

    Consider me interested in examples of ‘tests pushed into other forms’

    [1] http://www.pragprog.com/titles/achbd/the-rspec-book

  3. Rasmus says:

    As a tester I hate Cucumber because the DSL is weaksauce compared to FitNesse and friends. No includes, variables, common setup for testsuites, etc.
    Basically you have to choose between a maintenance nightmare, with every little detail specified in the tests, or simple tests with all the juicy bits “hidden” in Ruby.
    In the latter case tests end up looking like the one test to rule them all(TM):
    Given everything is set up
    When I submit correct data
    Then everything should be OK

  4. Evan says:

    You have the same experience I did. Nobody on my team likes Cucumber, and we’ve given dozens of other developers the opportunity to explain what they like about it. The vast majority have said “yeah … it’s really not all that special”.

    For me, the killer misfeature is this one:
    “so there’s no guarantee it tests what the customer intended anyway”

    The “plain english” is little different from pasting the customer’s test description into an ‘it … do’ description in rspec. EXCEPT it’s worse, because the code implementation of that description goes into a different file and so is much more likely to diverge from the text.

    In summary: with Cucumber you do extra work to maintain an additional layer for a slower test that has no additional capability and is less likely to accurately reflect the client’s specifications because the description and code are in separate files and can therefore diverge easily. This sounds like a lose-lose to me. What am I missing?

    We do integration and acceptance testing in rspec with capybara, with the code directly interleaved with the descriptions and therefore easier to read and maintain.

  5. Jared says:

    Definitely that summary is my main concern. If you want to know what the customer wants tested, ask them (and write it down, if that’s what’s best for you). If you want to know what the code’s *actually* testing, look at the code.

    The other big piece for me is that behavioural testing’s a useful technique, but there are other ways to model it (and test it). Similarly, some things we’re going to build benefit from examples. For other things, examples are overkill or inefficient.

  6. Mark Harris says:

    We are starting to use Cucumber for testing an agent portal implementation. With a mix of contractors and FTEs doing the specs and coding, and legacy code that we need to interact with, in many cases the feature files are the best specification of how the product works for us.

    When doing our initial proof-of-concept for this effort, we found that the process that the QA Analyst goes through with BA and dev to create the scenarios and steps at the beginning of the iteration forces the team to think through the broader user story. Without forcing that interaction, it’s all too easy to get back into the mold of testing at the end of the iteration, and agile devolves into mini-waterfall.

    In this case, then, *creating* the feature files is far more valuable than the feature files themselves. And you end up with a regression suite as a side-effect.

  7. Jared says:

    That’s basically one of my points. Certainly, discussing test ideas and acceptance criteria prior to developing the story will in most cases be helpful. However, limiting that discussion to behavioural tests is a bit narrow. Specification by example has its own issues. Regardless, you don’t need cucumber to force the interaction you describe.

    Another point relevant to your comment is that I feel the extra layers in the cucumber automation framework (as well as the idea that everything should be shoehorned into a behavioural testing framework) really aren’t helpful. Discuss the tests, take what’s important, implement it in the most efficient way you can. Based on my experience, using cucumber as a functional test framework seems a pretty inefficient use of my time.

    By all means, put your behavioural tests in there. Do lots of other kinds of tests too though. Dan North might think “It’s all behaviour”, but just because everything *can* be expressed as behaviour doesn’t mean it’s efficient or effective to do so.

  8. Johnny Rocket says:

    We’ve been using Cucumber for about 9 months at my work. We used it to bang out lots and lots of GUI-based tests in a very short period of time. So we wound up putting lots of detail in the tests. They basically turned out to be fancy “recorded” scripts, with things like object ids, link text, test data, etc, etc, right in the feature files. This is, of course, entirely our fault, but to Jared’s point, it is the model promoted by the samples. Even our Cucumber expert said “That’s what makes it wonderful. Anybody can read and write it!” Apparently it’s maintenance be damned and full steam ahead.

    In our efforts to refactor we’ve discovered several things which are bringing our test development to a grinding halt.

    – Adoption: As Lisa states, we can’t get anybody outside the test team to write a feature file. They have other things to do than write our tests for us. Heck, we can’t even get 1/2 our test team to write them!

    – Running Multiple Tests at Once: We’ve got lots of tests now. It can take up to 5-6 hours to run all our tests. So we’ve got the need for running multiple tests in parallel. And we don’t need the simple “run the same test in multiple browsers”. We need real concurrency. If I have 10 tests tagged with “@awesometests” we’d like to see those tests run at once. We can supply the hardware necessary for that, but Cucumber provides no manner for running multiple tests at the same time. So if we want to use Cucumber to do this we’ll have to tear it apart, alter it, and put it back together.

    – Sharing Data: Very early we ran into the fact that we need to share data across step definitions. Our in-house Cucumber and Ruby expert suggested that we use global variables, which “should work fine, because all step defs are loaded in to memory and aren’t class instances, so they’ll all see the variables”. Which of course made us all cringe, as this is a recipe for lost data, data changed by code that shouldn’t be changing it, one test changing another test’s data… the list goes on and on. Throw concurrency into the mix and the global var theory gets completely blown away.

    – Tool support: The only IDE we’ve found that does a decent job of interpreting feature files is IntelliJ… at $600 a pop. A bit hefty for an open source tool.

    So now we’ve got lots and lots of tests that have big time data problems, take a good deal of effort to refactor, and take waaaayy too long to run to be useful in a fast-paced agile environment. Some of it is our fault, some of it isn’t. But hey, we’ve got lots of tests! Yay!

    Do we dare attempt to tear Cucumber apart and rebuild it to suit our needs? I don’t know….
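
    The shared-data problem described in this comment can be sketched without globals. The following is a minimal plain-Ruby illustration, assuming each scenario gets its own fresh context object; `ScenarioContext`, `CREATE_USER`, `GREET_USER` and `run_scenario` are hypothetical names, not part of Cucumber’s API.

    ```ruby
    # Hypothetical per-scenario context (illustrative names, not a Cucumber API).
    class ScenarioContext
      def initialize
        @data = {}
      end

      def remember(key, value)
        @data[key] = value
      end

      def recall(key)
        @data.fetch(key) # raises KeyError if an earlier step never set it
      end
    end

    # Each step is a lambda that receives the context of the scenario it runs in.
    CREATE_USER = ->(ctx, name) { ctx.remember(:user, name) }
    GREET_USER  = ->(ctx) { "Hello, #{ctx.recall(:user)}" }

    # Run a scenario with a brand-new context so nothing leaks between scenarios.
    def run_scenario(user_name)
      ctx = ScenarioContext.new
      CREATE_USER.call(ctx, user_name)
      GREET_USER.call(ctx)
    end

    # Two scenarios running at the same time cannot overwrite each other's data.
    threads = %w[alice bob].map { |name| Thread.new { run_scenario(name) } }
    greetings = threads.map(&:value)
    puts greetings.inspect
    ```

    Because the data lives on an object created per scenario, a step can only see what earlier steps in the same scenario stored, and a missing value fails loudly instead of silently reading another test’s leftovers.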

  9. John Furr says:

    We’ve been trying to make cucumber work. I really want to realize the potential benefits.

    However, cucumber fails intermittently. It runs differently in different browsers. It runs differently in different OS.

    In short… and believe me, I hate to be this guy: Cucumber is the single worst piece of technology I have ever seen advocated for a professional dev environment. I inherited cucumber at my shop and now we are about to bite the many-week bullet and move all of our tests away from cucumber.

    We often find it takes longer to write tests than code. That is a non-starter in my shop. Agile or not, cucumber is not ready for prime time in our opinion.

  10. Abe Heward says:

    Aside from all the serious problems listed here, the most annoying issue for me is the limitations in how things can be set up.

    You either have a setup that applies to ALL features/scenarios or to EACH. There’s no such thing as a “Before” that runs ONCE for all the scenarios in a given feature.

    This is a serious limitation, and using a tag for the first item in your feature’s scenario list isn’t going to be helpful, because any variables declared there won’t be available for the untagged scenarios.

    And speaking of cordoning off variables, that’s another horrible sticking point. Scenario A does something that requires you to store a variable to be used by Scenario B. Where do you put this variable? A Global variable? That’s just ugly. To avoid doing that you have to code so much huge overhead it’s prohibitive.

    I’ve just been really really disappointed with Cucumber, so far. I’m skeptical it’s really feasible to use it if the thing you’re testing is made up of anything remotely complicated and interdependent.
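
    One way to approximate the run-once-per-feature setup this comment asks for is to memoize the setup result by feature name and call it from a per-scenario hook. The sketch below is plain Ruby under that assumption; `FeatureSetup`, `once_for` and `before_scenario` are hypothetical names, not Cucumber hooks.

    ```ruby
    # Hypothetical helper: run an expensive setup block once per feature and
    # hand the same result to every scenario in that feature.
    class FeatureSetup
      def initialize
        @results = {}
        @lock = Mutex.new
      end

      # Returns the cached result for the feature, running the block only on
      # the first request.
      def once_for(feature_name)
        @lock.synchronize do
          @results.fetch(feature_name) { @results[feature_name] = yield }
        end
      end
    end

    setup = FeatureSetup.new
    setup_runs = 0

    # Stand-in for a per-scenario "before" hook that delegates to the cache.
    before_scenario = lambda do |feature_name|
      setup.once_for(feature_name) do
        setup_runs += 1
        { account_id: "acct-#{setup_runs}" } # pretend this was expensive
      end
    end

    a = before_scenario.call("checkout") # runs the setup
    b = before_scenario.call("checkout") # reuses the cached result
    c = before_scenario.call("refunds")  # different feature, runs again

    puts "setup ran #{setup_runs} times; same object reused: #{a.equal?(b)}"
    ```

    The cached value also gives later scenarios somewhere to read shared data from without a global variable, at the cost of one more piece of infrastructure to maintain.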

  11. Sheetal Deshpande says:

    I have a different opinion. I am QA and I have been using cucumber for 3 years and I have found it very useful. It’s true that it’s not being used as it is supposed to be in terms of doing BDD or ATDD. But still it’s an awesome tool. It lets me organize my tests in a better way.

    And with IntelliJ IDEA 12, the IDE support is also good. And there is the PicoContainer jar, which does dependency injection for you so you can design a very good automation framework.

    And one important advantage I see with it is that I can just define the tests even before the development is done and somebody else can implement it. We use this thing for collaborating with our developers.

  12. James P. says:

    The biggest problem I have with cucumber is that it’s incredibly bad at what it is supposed to do: communicate. Coercing a test into a “Given/When/Then” format is incredibly stupid. Stringing test steps together according to a paragraph in English is incredibly cumbersome, and doesn’t add any value. Furthermore, formatting tests this way obscures ideas and makes them look more complex than they are. Even using the asterisk to drown out the poor choice of keywords provided by Cucumber makes an otherwise simple idea hard to read.

    A better idea, in my opinion, is to simply tie a document to a test. Then you at least have the flexibility to program the test in a way that makes sense, rather than embedding your thinking in a poor, linearized testing format. And don’t even get me started on how bad an idea it is to have shared steps for a testing application.

  13. Jared says:

    Hi James,

    As I understand it, communicating is one aspect of the tool. Supporting a BDD approach is another. So the Given/When/Then format is a useful tool for specifying those kinds of tests (i.e. behavioural), where you can think of them as state-rule-transition. I have heard of someone using this approach to then build out the state machine from the BDD tests, allowing them to find gaps in test coverage.
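
    As a rough illustration of that state-machine idea, the sketch below reduces each scenario to a (given state, event, resulting state) triple and lists the state/event pairs no scenario covers. It is plain Ruby with invented data: the `Scenario` struct, the login states and the events are all assumptions made for the example.

    ```ruby
    # Each scenario reduced to a (given state, when event, then state) triple.
    # The states, events and scenarios here are invented for illustration.
    Scenario = Struct.new(:given, :event, :outcome)

    scenarios = [
      Scenario.new(:logged_out, :log_in,   :logged_in),
      Scenario.new(:logged_in,  :log_out,  :logged_out),
      Scenario.new(:logged_in,  :time_out, :logged_out)
    ]

    # Build the state machine implied by the scenarios:
    # state => { event => next state }.
    machine = Hash.new { |hash, state| hash[state] = {} }
    scenarios.each { |s| machine[s.given][s.event] = s.outcome }

    states = scenarios.flat_map { |s| [s.given, s.outcome] }.uniq
    events = scenarios.map(&:event).uniq

    # Every state/event pair that no scenario covers is a candidate gap.
    gaps = states.product(events).reject do |state, event|
      machine[state].key?(event)
    end

    gaps.each { |state, event| puts "No scenario for #{event} while #{state}" }
    ```

    Not every gap is a missing test; some pairs are simply impossible. The list is a prompt for a conversation, not a verdict.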

    My comment ‘…And it doesn’t make much sense to specify everything in that format anyway’ is specifically about the idea that people try to shoehorn other kinds of tests into the given-when-then format, and this is a bad idea.

    If your system has a lot of interesting state-related tests, then by all means adopt a model-based or BDD approach. But only use that tool for the testing for which it is designed. Complement it with other approaches for data-driven tests, for business rules, for functional tests and you will have a much more efficient test effort.
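
    For the data-driven side, a business rule often reads more clearly as a table of cases than as a stack of Given/When/Then scenarios. A minimal plain-Ruby sketch follows; the discount rule, `discounted_total` and `CASES` are hypothetical, invented purely for illustration.

    ```ruby
    # Hypothetical business rule, invented for illustration:
    # orders of 100 or more get 10% off, and members get a further 5 off.
    def discounted_total(amount, member:)
      total = amount >= 100 ? amount - amount / 10 : amount
      total -= 5 if member
      [total, 0].max # a total never goes negative
    end

    CASES = [
      # amount, member, expected
      [ 50, false, 50],
      [ 50, true,  45],
      [100, false, 90],
      [100, true,  85],
      [  3, true,   0]
    ]

    # Collect every row whose actual result differs from the expected one.
    failures = CASES.reject do |amount, member, expected|
      discounted_total(amount, member: member) == expected
    end

    if failures.empty?
      puts "all #{CASES.size} cases pass"
    else
      puts "failing cases: #{failures.inspect}"
    end
    ```

    Adding a new example is a one-line change to the table, and the rule under test sits a few lines above the data that exercises it.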

    As you say, cucumber’s a pretty crummy tool. I see no benefit over writing my own expressive code that clearly models the problem domain, just an additional layer of stuff to maintain.

  14. Jared says:

    Hi Sheetal,

    “I have a different opinion. I am QA and I have been using cucumber for 3 years and I have found it very useful. It’s true that it’s not being used as it is supposed to be in terms of doing BDD or ATDD. But still it’s an awesome tool. It lets me organize my tests in a better way.”

    What organisational benefit do you get? What else have you tried? How do you know this is an effective and/or efficient way to organise your tests?

    “And with IntelliJ IDEA 12, the IDE support is also good.”

    Do you still have an additional layer of text and regular expressions over the top of readable code? What is the value of this layer to you, and is it worth the extra expense?

    “And there is the PicoContainer jar, which does dependency injection for you so you can design a very good automation framework.”

    Dependency injection has nothing to do with the goodness or badness of Cucumber.

    “And one important advantage I see with it is that I can just define the tests even before the development is done and somebody else can implement it. We use this thing for collaborating with our developers.”

    You can do this with any tool if that’s something you get value from. I think what you are saying is that you can reuse some typing, but the text you typed to define your cucumber test isn’t really ‘the test’, but the test idea. The underlying ‘real’ automation code is the test implementation.

    So I see that there is a benefit in that you get some free traceability from the test idea (i.e. the plain-text given-when-then test), which lets you know some code has been implemented to cover that test idea.
