tl;dr - Data pipeline improvements that cross the boundary between human systems and computer systems create a self-reinforcing cycle of improvement.

Andrew Jones, originator of the data contracts concept, has recently posted about the need to improve data at the source[1]. Part of Andrew’s approach is to “shift left” the application of contracts and the quality checks on input data, in the style of defensive programming[3]. Unstated but implied in Andrew’s approach is that these input-validating data contracts become deliberately applied points of friction[4], which can spur conversations about improving the source system.
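To make that concrete, here is a minimal sketch of such a friction point as an ingestion gate, written with pydantic; this is my illustration, not Andrew’s implementation, and the CrmLead schema and its fields are hypothetical stand-ins for whatever a real contract would specify:

```python
# A sketch of a "shift left" data contract acting as a deliberate
# point of friction at ingestion. CrmLead and its fields are
# hypothetical examples of an agreed input contract.
from pydantic import BaseModel, ValidationError

class CrmLead(BaseModel):
    lead_id: int
    email: str
    region: str

def ingest(raw_records: list[dict]) -> list[CrmLead]:
    accepted, rejected = [], []
    for record in raw_records:
        try:
            accepted.append(CrmLead(**record))
        except ValidationError as err:
            rejected.append((record, str(err)))
    if rejected:
        # Fail loudly: the rejects become the agenda for a conversation
        # with the source system's owners, not rows to quietly patch.
        raise ValueError(f"{len(rejected)} records violated the contract")
    return accepted
```

Run against a batch where someone typed a name into the lead_id field, this fails immediately and visibly, and that visibility is exactly the friction that starts the upstream conversation.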

The thing is, if you follow this movement left far enough, in the vast majority of cases, you’ll reach a point where the data source is a human being.

Shifting further left crosses what can be thought of as a technology interface layer, where human systems on the left feed data into computer systems on the right. At this layer, UML gives way to Business Process Model and Notation (BPMN), and programming gives way to “soft skills” and Service Design.

Working at the human layer requires different technologies, but the meta-process for engineering human systems is really the same as that for engineering electronic systems[5]. It’s challenging in different ways, but success here means that downstream data processes can begin to drive upstream human process improvements, kicking off a virtuous cycle of better data driving better decisions, which drive better data. This is the nirvana point of the Operations (née RevOps, née SalesOps) team.

Footnotes

1 - Because, as he says, “You can’t make the data any more correct, complete or timely downstream - you’re bounded by how that data was generated at source.” This statement is so obviously true that, at times, we forget the truth of it. We cannot make our input data more correct. The most we can do is make it less incorrect. By analogy, if what we receive is a photograph of a snow-covered field, we can adjust the white balance of the picture so the snow is the right color, but we’ll never know there was a moose standing just outside the frame when the picture was taken[2].

2 - Any other philosophy minors reminded of the Allegory of the Cave?

3 - Consider a data team whose pipeline ingests some sets of data and produces a set of data as output. Adhering to these contracts would mean they’re following the software engineering guidance to “validate input; filter output”. This is one more support for the assertion that data engineering is a subset of software engineering, and that software standards of practice should be part of data standards of practice.
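As a sketch of what that guidance might look like in a single pipeline step (the field names and contracts below are invented for the example):

```python
# A sketch of "validate input; filter output" in one pipeline step.
# INPUT_REQUIRED and OUTPUT_FIELDS stand in for real, agreed contracts.
INPUT_REQUIRED = {"order_id", "amount_cents"}
OUTPUT_FIELDS = {"order_id", "amount_usd"}

def pipeline_step(rows: list[dict]) -> list[dict]:
    # Validate input: refuse rows that break the upstream contract.
    for row in rows:
        missing = INPUT_REQUIRED - row.keys()
        if missing:
            raise ValueError(f"input contract violated; missing fields: {missing}")

    transformed = [
        {
            "order_id": row["order_id"],
            "amount_usd": row["amount_cents"] / 100,
            "_source_row": row,  # internal bookkeeping, not part of the contract
        }
        for row in rows
    ]

    # Filter output: publish only the fields the downstream contract
    # promises, so internal fields never leak to consumers.
    return [{k: row[k] for k in OUTPUT_FIELDS} for row in transformed]
```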

4 - HubSpot’s RevOps certification course has a good discussion of deliberate friction in processes.

5 - Creating process documents for people is the Natural Intelligence (NI) equivalent of being an AI Prompt Engineer. In both cases, you’re writing the parameters you want an intelligence to consider in order to achieve some goal you’ve given it. The main differences are that the NI entity is significantly more complex and that, for any significant organization, you’ll have to prompt and train hundreds or thousands of them, all with slightly different architectures.