One of the real killers of the utility of unit tests that I have encountered is having a test set that is brittle and prone to breaking. This leads to a situation where even minor changes are large efforts due to the changes required to make all tests pass. We can therefore compare the overall utility of test sets by the likelihood of tests breaking due to unrelated changes. Ideally this likelihood should be very low and test breakages should indicate genuine bugs introduced by a change.
Complex tests that have multiple assertions and expectations are a particular source of brittleness. Each additional expectation or assertion makes a test more sensitive to changes. Add too many and a test will start failing whenever anything it relates to changes. Resolving these broken tests is made harder by the number of possible causes of failure: determining the cause is itself a substantial effort, and it may take many iterations to resolve all the issues and get the test passing again.
I have therefore, through (sometimes bitter) experience and by reading the wisdom of others, come up with a set of principles I apply to writing tests.
Principle 1 A test should only validate one output or behaviour
This principle immediately reduces the number of conditions that can cause a test failure. The trade-off is that the test validates less of the unit under test, so we now need multiple tests to validate the unit. Provided that the setup and teardown overhead is contained, I consider this to be an advantage. Having more but smaller tests is good because defects in the codebase can be located more precisely, which is of immediate benefit in defect resolution. In most cases this will actually improve the maintainability of the test code, as it will be larger but significantly less complex.
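As a rough sketch of what this looks like in practice, the example below splits the validation of one unit into two single-purpose tests. Everything in it is an assumption made for illustration: `OrderTotals` is a hypothetical unit, and `Check` is a minimal stand-in for a test framework's assertion class so that the example needs no external libraries.

```csharp
using System;

// Hypothetical unit under test, invented for this illustration.
public class OrderTotals
{
    public decimal Subtotal(decimal unitPrice, int quantity) => unitPrice * quantity;

    public decimal Tax(decimal subtotal, decimal rate) => Math.Round(subtotal * rate, 2);
}

// Minimal stand-in for a test framework's Assert class (an assumption, not MbUnit).
public static class Check
{
    public static void AreEqual<T>(T expected, T actual, string message)
    {
        if (!object.Equals(expected, actual))
            throw new Exception(message + ": expected " + expected + " but was " + actual);
    }
}

public static class OrderTotalsTests
{
    // Validates one output only, so a failure here points at Subtotal.
    public static void SubtotalMultipliesPriceByQuantity()
    {
        var result = new OrderTotals().Subtotal(2.50m, 4);
        Check.AreEqual(10.00m, result, "Subtotal incorrect");
    }

    // A separate test for a separate output, so a failure here points at Tax.
    public static void TaxIsRoundedToTwoDecimalPlaces()
    {
        var result = new OrderTotals().Tax(9.99m, 0.2m);
        Check.AreEqual(2.00m, result, "Tax incorrect");
    }
}
```

If the tax calculation changes, only the second test can fail, which tells us where to look before we have read a single line of the failure message.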
Principle 2 Validate behaviour or output but not both
Validating behaviour using mocks is a fundamentally different kind of testing from verifying the output (return value or changed state) of a unit. This principle is therefore a restatement of the first with this recognition applied.
Principle 3 Favour Stubs over Mocks
As Martin Fowler points out, Mocks Aren’t Stubs. Overuse of mocks indicates that we are too interested in the internal workings of our unit. The test is a specification that validates the desired functionality. Mocks should be used when the interaction of the unit with its dependencies is important to the unit performing its function. This is the case a lot less often than beginners with mocking frameworks tend to think (as some of my earlier test sets attest).
In general, interaction with a dependency should be validated where the interaction is a request for the dependency to make some kind of change to the world. For instance, a request to a service to publish an event to a message queue would be a candidate for mocking. A request to a repository to retrieve data would not. In the latter case, as a user of the unit under test, we don’t care that the repository is called. This is an implementation detail: something the unit does in order to satisfy our requirements. It could obtain the data from some other source and we wouldn’t care as long as the outcome is what we need. Our test should therefore establish requirements for what we want and stub the repository so that the unit may or may not use it as necessary.
In the former case we do care that the service is invoked, because this determines whether our event is published. It is therefore desirable to ensure that this happens so that our unit meets its requirements. A mock may therefore be appropriate. Based on the previous principles, the test that validates the interaction with the dependency should be separate from any test validating other behaviour of the unit.
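The distinction between the two cases can be sketched as follows. This is an illustration under stated assumptions rather than code from a real project: `OrderService`, `IOrderRepository` and `IEventPublisher` are hypothetical, and the test doubles are written by hand instead of being generated by a mocking framework so that the example stands alone. The recording publisher plays the role a mock would play.

```csharp
using System;
using System.Collections.Generic;

// All types below are hypothetical and exist only for this illustration.
public interface IOrderRepository { string FindCustomerEmail(int orderId); }
public interface IEventPublisher { void Publish(string eventName, string payload); }

public class OrderService
{
    private readonly IOrderRepository _repository;
    private readonly IEventPublisher _publisher;

    public OrderService(IOrderRepository repository, IEventPublisher publisher)
    {
        _repository = repository;
        _publisher = publisher;
    }

    public void Ship(int orderId)
    {
        var email = _repository.FindCustomerEmail(orderId);
        _publisher.Publish("OrderShipped", email);
    }
}

// Stub: supplies canned data; the test never asks whether it was called.
public class StubOrderRepository : IOrderRepository
{
    public string FindCustomerEmail(int orderId) => "customer@example.com";
}

// Hand-rolled stand-in for a mock: records calls so the interaction can be verified.
public class RecordingPublisher : IEventPublisher
{
    public List<string> Published { get; } = new List<string>();

    public void Publish(string eventName, string payload) =>
        Published.Add(eventName + ":" + payload);
}

public static class OrderServiceTests
{
    // Validates one behaviour only: shipping an order publishes the event.
    public static void ShipPublishesOrderShippedEvent()
    {
        var publisher = new RecordingPublisher();
        var service = new OrderService(new StubOrderRepository(), publisher);

        service.Ship(42);

        if (!publisher.Published.Contains("OrderShipped:customer@example.com"))
            throw new Exception("OrderShipped event was not published");
    }
}
```

Note that the test says nothing about how the email address was obtained. The repository is stubbed so that the unit can use it, but only the publication of the event, which changes the world, is verified.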
Generally, if a dependency does not return anything to the unit under test then we will likely need an expectation to verify the interaction. However, if the dependency returns a value we have an opportunity to validate the interaction implicitly. If the service call returns a value that we can check for elsewhere (such as in a return value or another expectation), we can often eliminate the need for an explicit expectation. Instead we can assume the call has happened if we can otherwise validate a value that could have been obtained no other way.
This should not be overused, as we are simply expressing the expectation indirectly and a change to the unit under test may still cause a test breakage. Nevertheless it is a useful technique for writing simpler and more concise tests. If you are making too many implicit expectations it can be a sign that your unit under test needs to be refactored. The test shown below demonstrates implicit expectations (for those following at home this test uses MbUnit and Rhino Mocks):
    [Test]
    public void CanFindByDateAndItemName()
    {
        _blogPostingRepository.Stub(repository => repository.FindByDateAndItemName(2009, 2, 21, "TestItem")).Return(_blogPosting);
        _blogPostingMapper.Stub(mapper => mapper.Map(_blogPosting)).Return(_blogPostingModel);
        var result = CreateStubbedService().FindPostingByDateAndItemName(2009, 2, 21, "TestItem");
        Assert.AreSame(_blogPostingModel, result, "Result not expected instance");
    }
This is a relatively simple test for a relatively simple method (which is in fact shorter than the test). The method under test has two dependencies: a repository, and a mapper which maps from a business domain object to a model. The test uses only stub objects (which are created in the test setup along with the test model and domain instance). At the end we assert that the model we get is the model that was expected. We do not set expectations on the dependencies because they are not needed. The instance we expect can only be obtained by the unit under test through the mapper dependency. In turn the mapper will only return it if invoked with the correct domain object instance. That instance can in turn only be obtained from the repository dependency if it is presented with the correct parameters. Invoking either dependency incorrectly, or not at all, will not violate any explicit expectation. However, there is no other way that the asserted outcome can be obtained, so the assertion would fail. Indeed, if there is another way the test shouldn’t care; it is just validating that the result is correct.
I would generally not be comfortable with chaining implicit expectations any further than this. Doing so will tend to make it complicated to track how everything comes together. The advantage however is that it is possible for a test to define the stubbed behaviour and let the unit under test be concerned with composing it into the desired result. In more complicated scenarios it may also be better to assert on properties of an instance rather than asserting that a specific instance is expected.
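To illustrate that last point, here is a small sketch of asserting on a property of the result rather than on instance identity. The `Posting`, `PostingModel` and `PostingPresenter` types are hypothetical and are not the types used in the test above; they only show the shape of such a test when the unit constructs a new instance itself.

```csharp
using System;
using System.Globalization;

// Hypothetical domain and model types, invented for this illustration.
public class Posting
{
    public string Title { get; set; }
    public DateTime PublishedOn { get; set; }
}

public class PostingModel
{
    public string Title { get; set; }
    public string DisplayDate { get; set; }
}

// Hypothetical unit under test: it creates the model itself, so the test
// has no pre-built instance to compare against with a reference check.
public class PostingPresenter
{
    public PostingModel ToModel(Posting posting) => new PostingModel
    {
        Title = posting.Title,
        DisplayDate = posting.PublishedOn.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    };
}

public static class PostingPresenterTests
{
    // Still one test, one output: only the formatted date is validated here.
    public static void ModelCarriesFormattedPublicationDate()
    {
        var posting = new Posting { Title = "TestItem", PublishedOn = new DateTime(2009, 2, 21) };

        var model = new PostingPresenter().ToModel(posting);

        if (model.DisplayDate != "2009-02-21")
            throw new Exception("DisplayDate incorrect: " + model.DisplayDate);
    }
}
```

A separate test would cover the title, in keeping with the first principle.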
In summary, by making each test do one thing only we make the tests easier to understand and make it easier to identify the scope of a breaking change. This will go a long way towards eliminating brittle tests, but will not eliminate the problem entirely. Other concerns may also cause brittle tests, including how test instances are created and the level of coupling present in the units being tested. These are potential topics for a post at a later date.