(User) business goals drive the decisions that need to be made.
Those decision-making needs drive which (user) questions need answering.
The questions you need answered drive which data is needed (by the user) in order to answer them, and in which form. This is your data model.
Reduce the number of unique models sent downstream; this makes life simpler for your consumers. Don’t dump everything downstream, as doing so creates usability, cost, and security issues. If additional data genuinely must be sent, stored, and consumed, however, then ship it, name it, tag it, and document it in a manner that enables consumers to understand what it is that they are consuming, how, if, and when they should process the data, and how each datapoint relates to other datapoints.
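As a minimal sketch of the "name it, tag it, and document it" idea, the following pairs each shipped payload with a descriptor carrying that metadata. All names here (`DatapointDescriptor`, `ShippedRecord`, the example fields) are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass


@dataclass
class DatapointDescriptor:
    name: str             # what the datapoint is
    description: str      # documentation for the consumer
    tags: list            # classification / discoverability
    processing_hint: str  # how, if, and when to process it
    related_to: list      # how it relates to other datapoints


@dataclass
class ShippedRecord:
    descriptor: DatapointDescriptor
    payload: dict


record = ShippedRecord(
    descriptor=DatapointDescriptor(
        name="order.refund_reason",
        description="Free-text reason a refund was issued, set by support staff",
        tags=["orders", "refunds"],
        processing_hint="optional; populated only for refunded orders",
        related_to=["order.id", "order.status"],
    ),
    payload={"order_id": 1234, "refund_reason": "damaged in transit"},
)
```

In practice such metadata usually lives in a schema registry or data catalog rather than traveling with every record; the point is simply that a consumer can answer "what is this and how do I use it?" without asking the producer.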
Reuse and extend common models, names, and tags instead of adding new ones whenever possible. This minimizes the company-wide model span (so the overall system is less complex) and ensures that your change reaches all relevant consumers.
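One way this plays out in code: extend an existing shared model rather than defining a parallel one with its own field names. The `Address` model below is a hypothetical stand-in for a company-wide common model:

```python
from dataclasses import dataclass


@dataclass
class Address:  # existing common model: reuse it...
    street: str
    city: str
    country: str


@dataclass
class WarehouseAddress(Address):  # ...and extend it for the new need,
    loading_dock: str             # instead of adding a second, unrelated
                                  # "warehouse location" model


wh = WarehouseAddress(street="1 Dock Rd", city="Haifa", country="IL", loading_dock="B")
```

Because `WarehouseAddress` is still an `Address`, any consumer or tooling already keyed to the common model keeps working.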
Merge atomic data points to reflect their context – data that is sent downstream should reflect context by consolidating several atomic data points into a more complete picture. Grouping and aggregating data into sets, objects, lists, lines, documents, events, tables, hierarchies, or even files based on some heuristic has intrinsic value, and can help consumers make sense of complex models and processes by letting them analyze the data that is relevant to them in its proper context.
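A small sketch of this consolidation, assuming hypothetical per-order atomic datapoints: individual field-level records are grouped into one document per order, so a consumer sees each point alongside its siblings rather than in isolation:

```python
from collections import defaultdict

# Atomic datapoints: one record per (entity, field) pair.
atomic_points = [
    {"order_id": 1, "field": "status", "value": "shipped"},
    {"order_id": 1, "field": "carrier", "value": "DHL"},
    {"order_id": 2, "field": "status", "value": "pending"},
]

# Consolidate them into one contextual document per order.
orders = defaultdict(dict)
for point in atomic_points:
    orders[point["order_id"]][point["field"]] = point["value"]

# orders[1] is now {"status": "shipped", "carrier": "DHL"}
```

The grouping heuristic here (by `order_id`) is the simplest possible one; the same pattern applies to sessions, customers, time windows, or any other unit of context.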
Data is stored in order to serve a single purpose – meeting some business need for some consumer. Whether that consumer (= data user) is internal to the company or is a client has no effect on the process – they still have some goal or need that should be met, and they require the same level of diligence in modeling data for their consumption and use. ↩︎
Leaving a field as a String while knowing that sources will populate it with non-schema-controlled (or, worse yet, unmodelled) structured data (such as JSON, XML, dumps, etc.) should be avoided at all costs, as the very existence of these “path of least resistance” fields tempts data source developers to bypass the desired modelling process. The risk posed by these fields, namely that some business-critical model will exist without being tracked and managed as part of the normal flow of data in the company, outweighs any benefit one may derive from allowing such a field to exist. Just as you would not allow rogue source code to be added to a product without proper code review, version control, and testing, rogue data models should not exist in your product. ↩︎
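The contrast can be made concrete. Below, a hypothetical event schema shows the "path of least resistance" version (a String that sources will stuff with ad-hoc JSON) next to an explicitly modelled version whose structure is part of the schema and therefore reviewed and versioned like any other model. All names are illustrative:

```python
from dataclasses import dataclass


@dataclass
class EventBad:
    payload: str  # sources will pack arbitrary JSON/XML here,
                  # bypassing schema control entirely


@dataclass
class PaymentDetails:
    method: str
    amount_cents: int
    currency: str


@dataclass
class EventGood:
    payment: PaymentDetails  # the structure is declared, tracked,
                             # and managed like any other model


event = EventGood(payment=PaymentDetails(method="card", amount_cents=1299, currency="USD"))
```

With `EventGood`, a change to the payment structure is a visible schema change that reaches every consumer; with `EventBad`, it is an invisible change buried inside a string.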
Based on the opinions of this document's sole author: Adam Lev-Libfeld (2023). Learn more about me at