‘Look what they done to my song’

When I first heard Melanie Safka’s “Look what they done to my song ma” I had no idea that it was a lament about the music industry, which has now become a potpourri of writers, producers, session musicians, corporations and investors that exert a stranglehold over artists who are contracted to the big recording companies.

Enter the lament from singers and bands, mostly those whose careers are now considered ‘inactive’, about how their songs have been ‘stolen’ to train ‘AI’, which seems to be regarded as some quasi-human entity that listens to their recordings and steals away in the stealth of the digital night with said same tucked safely under its arms. Well, it’s a furphy, best kept to tales told around a fire to ward off the ghosts of the night.

Here’s a heretical statement. Music, images and texts, as such, don’t exist in the digital / online world. The only things that have a corporeal existence are the files used to reproduce the same. The only time music, images and texts are ‘real’ is when they have a physical form. Outside of that, they are essentially just simulations or, in plain speak, reproductions of file content.

Web crawlers and the like have been around since the early days of the web. They ‘scour’ online content ‘looking’ for header types that indicate what type of data a stored / linked file contains. They also look for stores of data that contain information about search histories, viewed content etc. Up until now no one has ever complained.

Enter the ghost in the machine: ‘Artificial Intelligence’. This algorithm / processing methodology was craftily ‘humanised’ at the outset to play to our pre-existing tendency to humanise (anthropomorphise) that which we can’t understand. Unbelievably successful.

But let’s not waste time, you’ve been had. Our concerns about the issue and the algorithm / process itself aren’t even remotely aligned. What happens is roughly this. As far as stored online data goes, an appropriate file, when encountered, is scanned and the file components (typically bytes or characters) are converted to tokens. Tokens are the smallest units the model works with. At this stage the type of file, i.e. sound, image or text, is irrelevant. What we’re looking at here are relationships between parts. That’s what the ‘weight lifter’ with AI tattooed on its back is trained on. Not songs, not photographs, not paintings, not designs, not books, not web posts etc., but relationships between parts. Nothing is stolen, or even borrowed / copied. The tokens are then transferred to the LLM.
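To make ‘relationships between parts’ a little more concrete, here is a minimal sketch of one step of byte-pair encoding, a commonly described way of building tokens. It is an illustration only: the function names and the tiny sample are my own assumptions, and real tokenisers repeat this step many thousands of times over far more data.

```python
from collections import Counter


def most_frequent_pair(ids):
    """Return the adjacent pair of values that occurs most often."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0]


def merge_pair(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single value `new_id`."""
    merged, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            merged.append(new_id)
            i += 2
        else:
            merged.append(ids[i])
            i += 1
    return merged


# The same code works on any bytes: text, image or sound data.
ids = list(b"abababcd")            # hypothetical sample data
pair = most_frequent_pair(ids)     # the pair seen most often
tokens = merge_pair(ids, pair, 256)  # 256 is the first id above the byte range
print(pair, tokens)
```

Nothing in the sketch knows or cares whether the bytes came from a song or a sentence; it only counts which parts sit next to which.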

The predicament people imagine they are in, or maybe really ‘are in’, was facilitated by the shift away from physical products to digitised products. The soul of the creative process was sold down the river, particularly for those in the music industry, with the advent of streaming services. Without ‘digitised product’ the advent of AI into the domain of the ‘creative’ would be dead in the water.

Artificially Intelligent? or just a ‘clone ranger’

It’s been a while, a whole year nearly since posting to this site. A lot has happened some of which is worth talking about. It’s been the year of generative image making and ChatGPT. Myths and legends abound and for me it was a year to do some myth busting.

ChatGPT was my entry point. Through some extensive dialogue with educators who had concerns about student usage and the supposed advantages thereof, I found myself doing some system testing to see the extent of the application of generative text and to better understand the process. Leaving aside the standard “write me an essay about…”, I went straight for the throat and entered questions directly from HSC exam papers (relevant only to teachers in NSW, Australia). What I discovered was that ChatGPT struggled with contextualising the nature of the question and the response. Most exam or essay questions are not questions but instructions: questions generally start with what, when, where, why, who or how, whereas exam items start with verbs such as discuss, investigate, analyse, compare, describe, assess, clarify, evaluate, examine, identify or outline. Given that, the responses were consistently very average; if you were to scale them, most would fall into the ‘C’ range. So, no advantage to be had here. This then led to ‘why does this happen?’, and to understand that I realised I had to look at ‘how’ this all happens. That led to looking at decoding and encoding text.

Machines, so it appears, do not ‘read’ text as we do. Whilst the eyes and brain are involved in a decoding and encoding of the letter shapes and their relationships, machine learning involves encoding text something like this:

‘\x49\x74\x27\x73\x20\x62\x65\x65\x6e\x20\x61\x20\x77\x68\x69\x6c\x65\x2c\x20\x61\x20\x77\x68\x6f\x6c\x65\x20\x79\x65\x61\x72\x20\x6e\x65\x61\x72\x6c\x79\x20\x73\x69\x6e\x63\x65\x20\x70\x6f\x73\x74\x69\x6e\x67\x20\x74\x6f\x20\x74\x68\x69\x73\x20\x73\x69\x74\x65\x2e\x20\x41\x20\x6c\x6f\x74\x20\x68\x61\x73\x20\x68\x61\x70\x70\x65\x6e\x65\x64\x20\x73\x6f\x6d\x65\x20\x6f\x66\x20\x77\x68\x69\x63\x68\x20\x69\x73\x20\x77\x6f\x72\x74\x68\x20\x74\x61\x6c\x6b\x69\x6e\x67\x20\x61\x62\x6f\x75\x74\x2e\x20\x49\x74\x27\x73\x20\x62\x65\x65\x6e\x20\x74\x68\x65\x20\x79\x65\x61\x72\x20\x6f\x66\x20\x67\x65\x6e\x65\x72\x61\x74\x69\x76\x65\x20\x69\x6d\x61\x67\x65\x20\x6d\x61\x6b\x69\x6e\x67\x20\x61\x6e\x64\x20\x43\x68\x61\x74\x47\x50\x54\x2e\x20\x4d\x79\x74\x68\x73\x20\x61\x6e\x64\x20\x6c\x65\x67\x65\x6e\x64\x73\x20\x61\x62\x6f\x75\x6e\x64\x20\x61\x6e\x64\x20\x66\x6f\x72\x20\x6d\x65\x20\x69\x74\x20\x77\x61\x73\x20\x61\x20\x79\x65\x61\x72\x20\x74\x6f\x20\x64\x6f\x20\x73\x6f\x6d\x65\x20\x6d\x79\x74\x68\x20\x62\x75\x73\x74\x69\x6e\x67\x2e\x20’

The above is the first paragraph of this post encoded as UTF-8 bytes, written in hexadecimal.
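For anyone who wants to reproduce that kind of output without an online converter, here is a minimal sketch in Python. The function names are my own; only `str.encode`, `bytes.fromhex` and `bytes.decode` come from the language itself.

```python
def to_hex_escapes(text: str) -> str:
    """Return the UTF-8 bytes of `text`, each written as a \\xNN escape."""
    return "".join(f"\\x{byte:02x}" for byte in text.encode("utf-8"))


def from_hex_escapes(escaped: str) -> str:
    """Reverse of to_hex_escapes: turn \\xNN escapes back into text."""
    hex_digits = escaped.replace("\\x", "")
    return bytes.fromhex(hex_digits).decode("utf-8")


sample = "It's been a while"
encoded = to_hex_escapes(sample)
print(encoded)                    # one \xNN group per byte
print(from_hex_escapes(encoded))  # round trip back to readable text
```

Plain English letters take one byte each. A curly apostrophe (’) takes three bytes in UTF-8, which is why the straight one (\x27) appears in the encoded paragraph.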

Encoded in UTF-32 it looks like this:

u+00000049u+00000074u+00000027u+00000073u+00000020u+00000062u+00000065u+00000065u+0000006eu+00000020u+00000061u+00000020u+00000077u+00000068u+00000069u+0000006cu+00000065u+0000002cu+00000020u+00000061u+00000020u+00000077u+00000068u+0000006fu+0000006cu+00000065u+00000020u+00000079u+00000065u+00000061u+00000072u+00000020u+0000006eu+00000065u+00000061u+00000072u+0000006cu+00000079u+00000020u+00000073u+00000069u+0000006eu+00000063u+00000065u+00000020u+00000070u+0000006fu+00000073u+00000074u+00000069u+0000006eu+00000067u+00000020u+00000074u+0000006fu+00000020u+00000074u+00000068u+00000069u+00000073u+00000020u+00000073u+00000069u+00000074u+00000065u+0000002eu+00000020u+00000041u+00000020u+0000006cu+0000006fu+00000074u+00000020u+00000068u+00000061u+00000073u+00000020u+00000068u+00000061u+00000070u+00000070u+00000065u+0000006eu+00000065u+00000064u+00000020u+00000073u+0000006fu+0000006du+00000065u+00000020u+0000006fu+00000066u+00000020u+00000077u+00000068u+00000069u+00000063u+00000068u+00000020u+00000069u+00000073u+00000020u+00000077u+0000006fu+00000072u+00000074u+00000068u+00000020u+00000074u+00000061u+0000006cu+0000006bu+00000069u+0000006eu+00000067u+00000020u+00000061u+00000062u+0000006fu+00000075u+00000074u+0000002eu+00000020u+00000049u+00000074u+00000027u+00000073u+00000020u+00000062u+00000065u+00000065u+00…………etc, etc
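The UTF-32 version can be sketched the same way. This is a rough equivalent of the converter output, assuming big-endian byte order with no byte-order mark; the `u+` prefix simply mimics the display style used here and the function name is my own.

```python
def to_utf32_units(text: str) -> list[str]:
    """Return one 8-hex-digit string per 32-bit code unit of `text`."""
    raw = text.encode("utf-32-be")  # big-endian, no byte-order mark
    return [raw[i:i + 4].hex() for i in range(0, len(raw), 4)]


units = to_utf32_units("It's")
print("".join("u+" + unit for unit in units))

# UTF-32 always spends four bytes per character, UTF-8 only as many as needed.
print(len("It's".encode("utf-8")), len("It's".encode("utf-32-be")))
```

Same text, same characters, four times the storage for plain English: the encoding changes how the numbers are laid out, not what they stand for.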

So when a prompt is entered into ChatGPT it is first encoded as numbers so that it can be read. The response from ChatGPT is likewise generated as numbers and then decoded into text. How does it work?
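The encode and decode round trip can be sketched with a toy vocabulary. This is not how ChatGPT’s tokeniser is built (it works on sub-word pieces with a far larger vocabulary); the word-level scheme and every name below are assumptions made purely for illustration.

```python
def build_vocab(corpus: str) -> dict[str, int]:
    """Give each distinct word a number, in order of first appearance."""
    vocab: dict[str, int] = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Turn text into numbers. Words missing from vocab raise KeyError."""
    return [vocab[word] for word in text.split()]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Turn numbers back into text."""
    id_to_word = {number: word for word, number in vocab.items()}
    return " ".join(id_to_word[number] for number in ids)


vocab = build_vocab("discuss the causes of the war")
ids = encode("the causes of the war", vocab)
print(ids)
print(decode(ids, vocab))
```

Everything between the two steps happens on the list of numbers; the words only reappear at the very end.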

It’s primarily a predictive model that predicts sequences based on learnings from ‘other encodings’, because that’s what the neural network reads. When this is understood, a lot of the ‘myth-understandings’ about machine learning are to some degree dissolved. The better the quality of the language structure of the texts that neural networks are trained on, the better the likelihood of a cohesive, albeit somewhat standardised, response (bearing in mind the encoding and decoding sequence).
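A very small sketch of ‘predicting sequences’ is below. It counts which word follows which in a training text and then offers the most frequent follower. A neural network is vastly more sophisticated than this counting trick, so treat it as an analogy under my own assumptions (function names, sample sentence) rather than a description of ChatGPT.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each token, which tokens follow it and how often."""
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    candidates = follows.get(token)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


training_text = "the cat sat on the mat and the cat slept"
model = train_bigrams(training_text.split())
print(predict_next(model, "the"))
print(predict_next(model, "slept"))
```

The prediction is only ever as good as the text it was counted from, which is the point about the quality of the training material.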

to be continued @ STAGESIX

  • Code points are numbers that represent Unicode characters. “A code point is the atomic unit of information. Text is a sequence of code points. Each code point is a number which is given meaning by the Unicode standard.”
  • Code units are numbers that encode code points to store or transmit Unicode text. One or more code units encode a single code point. Each code unit has the same size, which depends on the encoding format that is used. The most popular format, UTF-8, has 8-bit code units. (Source: https://www.coderstool.com/unicode-text-converter)
  • Code points are converted into ‘tokens’. The relationship between tokens is calculated in relation to the ‘prior learning’ in the LLM.
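The difference between code points and code units in the notes above can be checked with a few lines of Python. This is a minimal sketch; the helper name and the sample characters are my own choices.

```python
def describe(char: str) -> dict:
    """Report a character's code point and its code-unit count per encoding."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "utf8_units": len(char.encode("utf-8")),             # 8-bit units
        "utf16_units": len(char.encode("utf-16-be")) // 2,   # 16-bit units
        "utf32_units": len(char.encode("utf-32-be")) // 4,   # 32-bit units
    }


for char in "Aé€😀":
    print(char, describe(char))
```

Each character has exactly one code point, but the number of code units needed to store it depends on the encoding: one to four in UTF-8, one or two in UTF-16, always one in UTF-32.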