DTOs are not reusable

Data Transfer Objects are a workhorse type in many systems. You have some data in one place and you need it in another, so you slap it into a DTO to transfer it in a format both endpoints understand. As a transfer format, a DTO should be a structured container for data that implements no logic. These are about the cheapest types you can have, and replicating them costs almost nothing. Yet reusing "standard" DTOs for concepts in a system is common practice. Unfortunately, this is one of the most toxic things I know of that you can do to an API.

Data transfer always happens in a context. That context is defined by the purpose of the transfer, which is highly situational. Although it is not uncommon for data to have the same shape in different contexts, this is only coincidental, and changing requirements can alter it at any time.

If multiple things use a DTO, one of them will inevitably need to vary it in some fashion to meet a new requirement. At that point a developer should create a new type with the same structure as the old one plus the changes for the new requirement. In practice, however, what often happens is that the shared DTO is modified in response to a requirement that applies to only a subset of the use cases to which it is put.

This leads to some immediate problems. If you're lucky, they show up as compilation errors that force you to deal with incompatible or missing data. However, not every way of implementing DTOs produces such errors, and even when it does there's no guarantee someone will fix them correctly. What you are most likely left with is a DTO where the use of some fields is undefined in many scenarios, and that undefined behaviour is non-obvious. This is a magnet for bugs.
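To make the "loud versus silent" distinction concrete, here is a minimal Python sketch. Python has no compile step, so a `TypeError` at the stale call site stands in for the compilation error. The class names, the `cart_item_count` field and `build_from_search_row` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


# Hypothetical shared DTO after a new *required* field was added.
@dataclass
class SummaryWithRequiredField:
    customer_id: int
    name: str
    cart_item_count: int


# The same change, but the new field was given a default value.
@dataclass
class SummaryWithDefaultedField:
    customer_id: int
    name: str
    cart_item_count: int = 0


def build_from_search_row(dto_type, row: dict):
    # Hypothetical existing call site, written before the change,
    # that knows nothing about cart_item_count.
    return dto_type(row["id"], row["name"])


row = {"id": 7, "name": "Grace"}

try:
    build_from_search_row(SummaryWithRequiredField, row)
except TypeError as error:
    # Loud: the stale call site fails as soon as it runs.
    print("loud failure:", error)

# Silent: the stale call site keeps working and emits a value nobody computed.
print(build_from_search_row(SummaryWithDefaultedField, row))
```

The defaulted version is the dangerous one: nothing fails, and every consumer of the second call site now receives a `cart_item_count` of zero that was never actually calculated.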

Consider a case where you have create and update endpoints for an entity, and they share a DTO. When creating an entity you may add a series of comments, so the DTO has a list of comments. The update endpoint does not apply to comments; it simply ignores that data and takes only the pieces it needs. Both endpoints do what the developer intended, but they don't work the way a consumer may reasonably expect.

If you publish the schema of the endpoints, the contract definition for the update endpoint will include the comments section. A consumer may reasonably choose to use that section, but it will not have the desired effect. From the consumer's perspective your endpoint is broken: it defines a comments section, but using it does nothing. You've just earned yourself a bug report.
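The create/update scenario can be sketched in a few lines of Python. This is an illustration under assumed names: `EntityDto`, `Entity`, `create_entity` and `update_entity` are hypothetical stand-ins for real endpoint handlers, with the HTTP layer left out.

```python
from dataclasses import dataclass, field


# Hypothetical DTO shared by the create and update endpoints.
@dataclass
class EntityDto:
    title: str
    body: str
    comments: list[str] = field(default_factory=list)


@dataclass
class Entity:
    title: str
    body: str
    comments: list[str]


def create_entity(dto: EntityDto) -> Entity:
    # Create uses every field, including the comments.
    return Entity(dto.title, dto.body, list(dto.comments))


def update_entity(entity: Entity, dto: EntityDto) -> Entity:
    # Update takes only the pieces it needs. dto.comments is silently
    # ignored, even though the published schema says it is accepted.
    entity.title = dto.title
    entity.body = dto.body
    return entity


entity = create_entity(EntityDto("Draft", "Hello", ["first!"]))
update_entity(entity, EntityDto("Final", "Hello again", ["second!"]))
print(entity.comments)  # ['first!']: the consumer's new comment vanished
```

Both handlers behave exactly as written; the bug lives entirely in the mismatch between the shared type and what `update_entity` actually honours.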

This is also a terrible API design in many cases. Exposing create and update actions like this produces a very CRUD-style interface that requires the client to understand and perform most of the business logic. It assumes, often incorrectly, that all of the entity's data changes at once. It also gives the service no context as to why the client wishes to perform an action, and no clean way of restricting which parts of an entity may be changed. A better solution is to provide endpoints for the specific business actions a client should support, each with its own DTO that makes the required data unambiguous.
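A minimal sketch of the action-oriented alternative, assuming two hypothetical business actions ("rename" and "add a comment") in place of a generic update. Each request type carries only the data its action needs, so there is no field a consumer can fill in to no effect.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    title: str
    body: str
    comments: list[str] = field(default_factory=list)


# One request DTO per business action (hypothetical names).
@dataclass(frozen=True)
class RenameEntityRequest:
    new_title: str


@dataclass(frozen=True)
class AddCommentRequest:
    text: str


def rename_entity(entity: Entity, request: RenameEntityRequest) -> None:
    # The service knows *why* the change is happening, so it can apply
    # rules specific to renaming.
    if not request.new_title.strip():
        raise ValueError("title must not be blank")
    entity.title = request.new_title


def add_comment(entity: Entity, request: AddCommentRequest) -> None:
    # Only this action can touch comments.
    entity.comments.append(request.text)


entity = Entity("Draft", "Hello")
rename_entity(entity, RenameEntityRequest("Final"))
add_comment(entity, AddCommentRequest("Looks good"))
print(entity)  # Entity(title='Final', body='Hello', comments=['Looks good'])
```

Restricting what may change falls out of the design: there is simply no request type that edits `body`, so a client cannot do it by accident.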

This can also be a problem on the query side. Consider a case where an endpoint provides a customer summary so that web pages can display a custom header identifying the user. Elsewhere in the system, the admin search needs to return a list of customers, which is also a summary. Initially both have the same properties, so the DTO is reused. Then comes the new requirement.

The business decides that it would be useful to show the customer the number of items in their shopping cart and its value. These fields are added to the summary object that is already being requested (we'll ignore users who have a shopping cart but are not logged in for this example). Suddenly the search endpoint has the number of items in the cart and the cart value as fields in its result set. It almost certainly isn't populating them, but they're there. A client may then try to use them and hit errors because the information is missing. In the best case, the client breaks immediately. But what if the values take defaults (probably zero in this example) that look sensible but are wrong? The client may assume all shopping carts are empty. That is unlikely to be the end of the world. But what if it's something more important, like the outstanding balance on a loan?
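The trap can be sketched as follows, assuming a hypothetical `CustomerSummary` shared by a storefront header endpoint and an admin search endpoint. The customer data and cart contents are made up for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical shared summary DTO after the cart requirement was added to it.
@dataclass
class CustomerSummary:
    customer_id: int
    display_name: str
    cart_item_count: int = 0              # added for the storefront header
    cart_value: Decimal = Decimal("0")    # added for the storefront header


def header_summary(customer_id: int) -> CustomerSummary:
    # Storefront endpoint: populates the cart fields (values are made up).
    return CustomerSummary(customer_id, "Ada", 3, Decimal("42.50"))


def admin_search(fragment: str) -> list[CustomerSummary]:
    # Admin search endpoint: written before carts existed, so the cart
    # fields silently keep their defaults.
    customers = {1: "Ada", 2: "Grace"}
    return [
        CustomerSummary(cid, name)
        for cid, name in customers.items()
        if fragment.lower() in name.lower()
    ]


# A client trusting the published schema reads cart data from search
# results and concludes every cart is empty, although Ada's holds 3 items.
results = admin_search("a")
print(all(r.cart_item_count == 0 for r in results))  # True, but wrong
print(header_summary(1).cart_item_count)             # 3
```

Nothing here raises an error; the zero is simply indistinguishable from a genuinely empty cart.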

You may think it would be obvious that the value is always a default, so the error would be detected immediately. But a client that thinks it can get the same data in multiple places may take a DTO from a place where it is not fully populated and treat it as if it were. This can lead to hard-to-track errors that are likely to surface in production, where they can do the most damage.
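Giving each context its own type removes the opportunity for this mistake. A minimal sketch, with hypothetical names: the search result has no cart fields at all, so a client cannot read a misleading default from it. A static checker such as mypy would also flag the attribute access before the code runs.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical per-context DTOs replacing the single shared summary.
# They share display_name today by coincidence, not by design.
@dataclass(frozen=True)
class CustomerHeader:
    """Returned to the storefront so a page can greet the customer."""
    display_name: str
    cart_item_count: int
    cart_value: Decimal


@dataclass(frozen=True)
class CustomerSearchResult:
    """Returned to the admin search; it carries no cart data at all."""
    customer_id: int
    display_name: str


result = CustomerSearchResult(customer_id=7, display_name="Grace")

# A client that assumes it can read cart data from a search result fails
# immediately instead of silently reading a default of zero.
try:
    print(result.cart_value)
except AttributeError:
    print("search results carry no cart data")
```

The duplication is deliberate: when the header later needs more fields, the search result is unaffected.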

There are some specialised cases where having multiple endpoints use a single DTO has some justification. If you support locating an entity in different ways, you probably still want a consistent response. This isn't a particularly good justification for sharing a DTO, but it's not terrible. What you need to do in this case is treat all of these endpoints as a single unit and ensure that any modification or versioning is applied consistently to all of them or to none.
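One way to keep such endpoints behaving as a single unit is to route every lookup through one mapping function, so a change to the response lands everywhere at once. This is a sketch under assumptions: the in-memory store, `CustomerResponse` and the lookup functions are hypothetical placeholders for a real repository and endpoints.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CustomerResponse:
    # Hypothetical response shared by every "locate a customer" endpoint.
    customer_id: int
    email: str
    display_name: str


# Hypothetical in-memory store standing in for a real repository.
_CUSTOMERS = [
    {"id": 1, "email": "ada@example.com", "name": "Ada"},
    {"id": 2, "email": "grace@example.com", "name": "Grace"},
]


def _to_response(record: dict) -> CustomerResponse:
    # The single place the shared response is built. Any change to the
    # DTO is made here, so it reaches every lookup endpoint together.
    return CustomerResponse(record["id"], record["email"], record["name"])


def get_customer_by_id(customer_id: int) -> Optional[CustomerResponse]:
    for record in _CUSTOMERS:
        if record["id"] == customer_id:
            return _to_response(record)
    return None


def get_customer_by_email(email: str) -> Optional[CustomerResponse]:
    for record in _CUSTOMERS:
        if record["email"].lower() == email.lower():
            return _to_response(record)
    return None


print(get_customer_by_id(2) == get_customer_by_email("GRACE@example.com"))
```

If the response needs a new version, the same principle applies: introduce the new mapping for all lookup endpoints together rather than for one of them.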

Colin Scott