Fifty Shades of Digital
With reference to research into some of the mathematical principles that underpin the construction of digital algorithms and digital information systems, I have tried in other pages under this section to draw attention to what that analysis suggests is a critical problem affecting the exchange of data across digital domains generally.
Digital information systems have been with us for a good while already; indeed some of us are young enough not to have experienced a time when digital technologies did not play a decisive determining role in the forms of our social and economic organisation. There is therefore a considerable industrial and economic momentum already established in favour of the successful rolling-out of those technologies, together with the appearance of a growing alliance between corporate and state power – one that relies essentially upon the progress of that deployment continuing without significant interruption.
Nevertheless, those technologies remain by and large experimental in terms of their extended effects in practice, and while important criticisms have been raised in response to some of those effects (for example by the 2019 AI Now Report¹), these have been necessarily reactive (i.e., critical responses to some of the negative unintended consequences of the digital experiment), and have not to my knowledge addressed the kind of foundational criticisms I have tried to draw attention to on this website. The particular focus of those criticisms is the specific problem of the logical inconsistency of data when it is exchanged across digital domains.
On the title page of this section, I have tried to show why the logic that informs the meaning and interpretation of data within its domain of origin should not be considered freely transferable outside of that domain, and why the torrid exchange of data across domains without regard to that issue is a potential source of global confusion and disarray. It has nevertheless been a tacit assumption of information scientists that computational logic does somehow transcend the limits of digital domains, and that the meaning and value of data are somehow retained integrally within the data itself as it is liberally exchanged beyond its domain of origin.
That assumption is an error-in-principle. I have argued that, in order for any digital data to retain its logical consistency, it cannot be considered independently from the particular set of algorithmic rules under which it was derived; that those rules exhibit no universal applicability; and that all further uses of the data outside its domain of origin must be fully qualified in terms of those original rules, i.e., with respect to the original purposes and intents of the data. It has indeed been part of the purpose of the recent General Data Protection Regulation (‘GDPR’) to establish regulatory limits upon the reprocessing of subject data that arbitrarily exceeds its original intents and purposes.² However, the problem of logical inconsistency is not limited to ethical issues concerning the integrity of individuals’ personal data; it potentially impinges upon the consistency of all data universally.
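The point may be made concrete with a deliberately trivial sketch (the domains, field names, and rules below are invented purely for illustration, and do not refer to any actual system): the same stored value yields contradictory interpretations as soon as it is read under a rule-set other than the one under which it was derived.

```python
# A minimal, hypothetical sketch of how one raw datum acquires different
# meanings under different sets of domain rules.

RAW_RECORD = {"subject_id": "A17", "score": 1}

def interpret_in_origin_domain(record):
    # Rule of the originating domain: 'score' is a pass flag, 1 == criterion met.
    return "criterion met" if record["score"] == 1 else "criterion not met"

def interpret_in_foreign_domain(record):
    # Rule of a consuming domain: 'score' is a rank, 1 == lowest band.
    return "lowest band" if record["score"] == 1 else "higher band"

if __name__ == "__main__":
    print(interpret_in_origin_domain(RAW_RECORD))   # -> criterion met
    print(interpret_in_foreign_domain(RAW_RECORD))  # -> lowest band
```

Nothing in the record itself announces which rule governs it; once the datum travels without its originating rules, both readings are equally available, and only one of them is faithful to its original purpose.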
We may have experienced a range of failings and irregularities in our use of digital technologies which, where the faults were not expected to lie within the data itself, we were accustomed to attribute to human or systemic errors in the management and processing of the data, or to weaknesses in the security of its storage. There is a sense in which errors appear to ‘just happen’, owing to an essential incompatibility between the technology itself and the ways in which we are accustomed to work with it. I do not wish to claim that all of these problems are attributable to the problem of logical inconsistency I have tried to highlight in these pages. But in drawing attention to that particular issue as a problem inherent in data-sharing, and one that has yet to be openly acknowledged by the industry itself, there may be some progress in the knowledge that there is now less reassurance in the idea that any data problem might be eradicated by removing the factor of human error, or by simply throwing more resources at it.
Whether or not it is associated with the problem of inherent logical inconsistency, it seems to me that all digital data is at least potentially redundant (out-of-date or simply incorrect) as soon as it is compiled. This is in the nature of data produced and stored digitally, as it is essentially static and resistant to change. How often have you come across personal or other data on the Internet which is incorrect in one or more essential details, but for which there seems to be no available means of amendment, nor any shared interest in maintaining its accuracy – the absence of which implies that the misinformation promises to persist thereafter indelibly? It is not an excessively wild extrapolation to project that the petty confusion and helplessness provoked in the researcher by such misleading information is only the microcosm of a related global data-disarray, one not limited to that experienced by (yes) billions of users tapping incredulous queries into their mendacious devices, who are lucky if they can act upon fifty percent of what they find there.
While there are clearly variations in reliability between different categories of data (according to the relative integrity of their sources), serious and unforeseen vulnerabilities arise from the sheer ubiquity of data and the expectations placed upon it, in terms of its ability to faithfully retain its ontological value. A particular weakness is the hidden tenuousness of the value attached to subject-provided data (which somehow assumes that individuals never knowingly or unknowingly provide incorrect information on forms). Whichever way you look at it, there is inevitable disagreement between the body of data (however its limits are conceived) and its reference points, a disagreement which is generally unanticipated and whose scale cannot be estimated. This should be understood in terms of increasing ‘entropy’ in the system, i.e., a tendency towards increasing disorder.
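For readers who prefer the analogy made quantitative, Shannon’s information entropy offers one (admittedly loose) way of expressing it; the formula below is offered only as an illustrative gloss on the metaphor, not as a measure relied upon elsewhere in this argument. For a data field whose possible states x occur with probabilities p(x):

H(X) = -\sum_{x} p(x)\,\log_{2} p(x)

The more numerous and more evenly spread the discrepancies between a body of data and its reference points, the flatter the distribution p(x) and the higher the entropy H; in this loose sense, unmanaged data-disarray registers as rising entropy, that is, rising disorder and unpredictability.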
With the highlighted problem of logical inconsistency in mind, we should first consider: is there any real ontological value in the sharing of any data outside its domain of origin? This must be the point at which the integrity of the data is first compromised and its vulnerabilities exposed – where its exchange-value outstrips its use-value. Exchange-value and use-value work quasi-independently, according to different logics, so that the newly emerging exchange-value of the data is calculated on the basis of a somewhat mythical (redundant) conception of its original use-value. Any new use of the data is both promiscuous and precarious, as it is too remote from that original use-value.
Expert fallibility and technical inertia
There is a serious imbalance, generally speaking, between the level of reprocessing done to data and the work done in evaluating that data; so that while data may enjoy unwarranted liquidity in the degree to which it is exchanged as a commodity, it nevertheless remains static and resistant to change. Computational systems are imbued with imaginary super-human capabilities, which promise to do all the work for us. A not entirely unintended consequence of rapid digital innovation has been the marginalisation of human engagement and concern in the granular management of all kinds of information, because digital technology frees us in varying degrees from the labour of that engagement. Unfortunately, however, at the same time it encourages us to dispense with the methods and wisdom through which we had previously exercised such essential critical engagement and concern.
It is important to point out a factor which I’m sure every person with the least experience of digital encoding has felt, but the significance of which has not been fully appreciated by experts in the field – that there is a ‘top-heavy’ relationship between the degree of coding, testing, and hence debugging required to manage the distributed effects of deploying any particular digital procedure, and the limited practical needs intended to be served by that procedure. In practice, coding is always more complex than expected because of unanticipated concomitant effects that follow down-the-line from instantiations of, or changes to, the code. The result is an unforgiving gradient of required effort and attention from software engineers, producing a backlog of inertia and failure in information systems, such that end-users are at times led to raise the question: If it’s broke, why not fix it? As the effects of these failures are largely down-the-line and remote from their sources, their causes will frequently be imperceptible, which means that the expert’s honest response to that question, if it were ever openly given, is likely to be: I’m sorry, but we don’t actually know how to.³
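The down-the-line character of such failures can be illustrated with a toy example (the functions and formats below are hypothetical, not drawn from any real system): a producer quietly ‘improves’ the format of a field, and the failure surfaces later, in consuming code that nobody changed, far from its cause.

```python
from datetime import datetime

# Hypothetical producer: records were originally exported with dates in
# "DD/MM/YYYY" form; a later change switches to ISO format.
def export_record(new_format=True):
    stamp = datetime(2021, 4, 4)
    date_format = "%Y-%m-%d" if new_format else "%d/%m/%Y"
    return {"ref": "X001", "date": stamp.strftime(date_format)}

# Hypothetical downstream consumer, written against the original format
# and untouched since.
def consumer_parse(record):
    return datetime.strptime(record["date"], "%d/%m/%Y")

if __name__ == "__main__":
    try:
        consumer_parse(export_record())
    except ValueError as exc:
        # The error appears here, remote from the change that caused it.
        print("Downstream failure:", exc)
```

Multiply this pattern across thousands of interdependent procedures, and the gradient of effort described above, together with the backlog it produces, becomes easier to credit.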
If the causes of failures in data processes tend to remain opaque to us, and also to those who design and manage those systems, the problem is already widely out of control. Will those failures ever indeed be fully remediable, either on the basis of improvements in the technology itself, or in our methods of applying it? To answer in the affirmative is to express some underlying faith in the idea that information technology is essentially unmotivated, neutral, and impartial – that it is implicitly benign, and effectively at our service, if only we could learn how to design it or to manipulate it appropriately. This needs some unpacking. That belief is, firstly, quite oblivious to the prospect that, aside from the effects of any human input, digital processes might in themselves be inherently responsible for the generation of inconsistencies and failures in the systems they populate. While that perception may once have remained occult and easily dismissible, by drawing alert attention, as I have attempted to do in these pages, to the unforeseen but very real problem of inherent logical inconsistency, such confidence is at least undermined.
New data tools tend to be marketed on the basis of their seductiveness as novel solutions to well-established problems. This strategy engenders a wide-eyed approach to problem-solving that is prepared to abandon established and proven methodologies in favour of revolutionary and unprecedented solutions – the prescriptive recklessness of “move fast and break things” – an approach which is heedless of the unforeseen deleterious consequences that tend to arise, with apparent inevitability, from the use of these novel technologies. It is a recklessness born of the idea that any use of technology (by virtue of the fact that it displaces human involvement) cannot in itself be the cause of error, because by its nature technology is unmotivated and impartial, and in that sense implicitly benign. The error must therefore result from the fact that we have employed the technology in some unrefined manner – that we are in effect infants in the use of a technology which is itself in its infancy. We have committed ourselves to a very steep learning-curve, abandoning previous wisdoms and skills, in exchange for the naïve expectation that technology will ultimately provide some form of complete solution; while we remain resistant to the realisation that any single technological advance is likely to create as many new intractable problems as those it purports to solve.
[continues]
4 April 2021
(revised: 9 January 2024)
Footnotes:
1. The 2019 AI Now Report, produced by the AI Now Institute, New York University. The Report addresses a range of socially regressive effects that follow from the use of advanced AI technologies, particularly within the labour market with respect to the ‘gig economy’ and the use of zero-hours contracts – practices which depend upon the widespread divestment of employment rights from workers, and which encouraged Yanis Varoufakis, during a recent TV interview, to identify the dominant features of this new economy under the attribute of “techno-feudalism”. The characterisation is apposite and suggests that the standard of rights enjoyed by gig-economy workers represents their subjection to a sort of motorised medieval serfdom. The Report also raises concerns over the regressive social consequences of the rapid expansion of public surveillance technologies, particularly in the area of facial-recognition systems, and their implications for individual privacy. From a majority-female perspective, the Report emphasises a tendency for AI technologies to create inherent algorithmic biases, typically entrenching existing patterns of inequality and discrimination, and resulting in the further consolidation of power amongst the already powerful, through the “private automation of public infrastructure”. Kate Crawford, Roel Dobbe, Theodora Dryer, Genevieve Fried, Ben Green, Elizabeth Kaziunas, Amba Kak, Varoon Mathur, Erin McElroy, Andrea Nill Sánchez, Deborah Raji, Joy Lisi Rankin, Rashida Richardson, Jason Schultz, Sarah Myers West, and Meredith Whittaker, AI Now 2019 Report, New York: AI Now Institute, 2019: https://ainowinstitute.org/AI_Now_2019_Report.html. [back]
2. Article 5 §1(b) of the GDPR states:
“Personal data shall be:
[…]
collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes; further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes (‘purpose limitation’);”. [back]
3. A particularly poignant example of this problem – that the causes of software and systems failures tend to remain opaque not only to a large proportion of users, but also to those who manage those systems – is the catalogue of systemic errors experienced by sub-postmasters in the UK following the Post Office’s rolling-out of its Horizon branch accounting IT system, which began as a pilot scheme in 1996. The system was the cause of widespread shortfalls in the accounts being submitted by the company’s sub-post offices, first reported in the year 2000. The Post Office initially failed to investigate the problem, instead pursuing spurious allegations of false-accounting, fraud, and theft against as many as 900 sub-postmasters, 736 of whom were successfully prosecuted, with many being either jailed or bankrupted as a result. A team of forensic accountants, Second Sight, appointed by the Post Office in 2013, declared the Horizon system “not fit for purpose”, and reported that it regularly failed to track certain specific forms of transaction. The Post Office, at that point already committed to private prosecutions against hundreds of innocent sub-postmasters, dismissed Second Sight’s critical report, and five senior Post Office executives declared, with spectacular arrogance: “We cannot conceive of there being failings in our Horizon system”. The Post Office has since generally relied upon confidentiality clauses as a means of deterring further enquiry and investigation into this monumental miscarriage of justice, and the false convictions of the 736 sub-postmasters have only since 2021 begun to be overturned by the Court of Appeal, when 39 of the convictions were quashed, thanks in part to the work of the Justice For Subpostmasters Alliance (JFSA). See also Nick Wallis’ extensive blog on the case at: https://www.postofficetrial.com. [back]