I have the ActuallyBot up and running again. It was down for a few months for two reasons: 1) it was crashing and I didn’t have time to fix it, and 2) I had the always-on ActuallyBot script running on my kid’s old laptop, and he wanted it back.
So I spent €100 on a little HP mini desktop, and that’s where the ActuallyBot lives now.
I made a few improvements based on things I’ve learned about deployment over the last few months. First, rather than using input() in my Python to manually input credentials for the Mastodon API instance after spinning up the Docker container, I just put the necessary secrets and tokens in an .env file and passed it to the container.1
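On the Python side, the script now just reads those values out of the environment instead of prompting for them; here is a minimal sketch (the variable names are illustrative):

```python
import os

from mastodon import Mastodon

# These arrive via the .env file passed to the container
# (docker run --env-file .env, or env_file: in docker-compose.yaml).
ACCESS_TOKEN = os.environ["MASTODON_ACCESS_TOKEN"]
API_BASE_URL = os.environ.get("MASTODON_API_BASE_URL", "https://mastodon.social")

mastodon = Mastodon(access_token=ACCESS_TOKEN, api_base_url=API_BASE_URL)
```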
Second, I know more about Docker, so I added a restart: unless-stopped setting to the docker-compose.yaml file. Now, even if my Python code hits a snag and the container stops, it will restart and continue to post out replies.
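The relevant chunk of the Compose file ends up looking something like this (service and image names are placeholders):

```yaml
services:
  actuallybot:
    image: actuallybot:latest
    env_file:
      - .env                 # the secrets and tokens from point one
    restart: unless-stopped  # come back up after a crash; stay down only if I stop it myself
```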
However, while I’ve learned quite a bit over the last six months, I’m still capable of doing very stupid things.
For example, I had a very frustrating ~3hrs on Friday trying to get the ActuallyBot to actually work. I kept tweaking the code, reloading the Docker image, and… same problem. Tweak the code again, reload the Docker image, etc.
It turns out, I was skipping the step of building the image before saving it, sending it to the remote machine, and rebuilding/relaunching. So I spent three hours deploying the same Docker image with the bad code and wondering why it didn’t work.
This actually turned out to be quite productive, for two reasons. First, the troubleshooting process of ruling everything else out first forced me to really understand all the settings of the Ollama container and how Ollama’s API endpoints work. I can now send queries to the Ollama server from anywhere on my LAN and get an LLM response. Neat! I might do something with this later.
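For example, any machine on the LAN can hit the Ollama server’s API directly, assuming Ollama is set to listen on the LAN interface rather than just localhost; here’s a quick sketch with requests (the IP address is whatever your Ollama box uses, and 11434 is Ollama’s default port):

```python
import requests

resp = requests.post(
    "http://192.168.1.50:11434/api/generate",  # LAN address of the Ollama host (example)
    json={
        "model": "llama3",
        "prompt": "Explain what a reverse proxy does in one sentence.",
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```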
Second, it was a painful but necessary lesson that any series of commands that you are running repeatedly and that has to be run in a certain order should be run as a bash script. It’s faster, easier, and you won’t miss any. This is probably DevOps 101, but it’s a lesson I had to learn personally, which is the whole reason I’m doing this nonsense. Humbling, but important!
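So the whole build-and-ship dance now lives in one script, roughly like this (paths, names, and hosts are placeholders):

```bash
#!/usr/bin/env bash
set -euo pipefail  # stop at the first error instead of merrily deploying a stale image

# 1. Build the image locally (the step I kept skipping).
docker build -t actuallybot:latest .

# 2. Save it to a tarball and copy it to the mini desktop.
docker save actuallybot:latest | gzip > actuallybot.tar.gz
scp actuallybot.tar.gz user@actuallybot-host:/tmp/

# 3. Load the new image on the remote machine and recreate the container.
ssh user@actuallybot-host '
  docker load < /tmp/actuallybot.tar.gz &&
  docker compose -f ~/actuallybot/docker-compose.yaml up -d --force-recreate
'
```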
Yes, I know I should use Docker Secrets, and I plan to figure that out soon. This is for a machine that lives behind my home firewall and is not exposed to the public internet. ↩︎
I am an AI skeptic. For a lot of reasons. But mainly, because lasting tech innovation tends to come from the bottom. Engineers adopt something because they think it’s fun or neat or saves them time, then one day, someone has a light-bulb moment and turns it into a product.
This is too simple, but in broad strokes, it’s been the story with cloud computing, blogs, and all kinds of Python libraries and open source tools that are now foundational to modern computing.
However, with generative AI, it has been just the opposite: A handful of companies with more money than God are piling into it and making the technology available to the general public before it’s even a product. They are selling $10 bills for $2 with basically no idea how to turn that around.
So far, the technology’s most popular uses are actually destructive: fraud, impersonation, disinformation, academic cheating, non-consensual porn. Polluting human networks with AI-generated slop and destroying the online job market with automated spam tools. All popular uses, certainly, but nothing Microsoft can (publicly) monetize.
Fine, fine. We all know this. The question for me, the AI skeptic, is what are its actual legitimate uses for me, a tech guy, and what does that say about its future sustainable business uses?
For me, the answer is onboarding.
I use ChatGPT extensively to learn about software systems, and this has been hugely helpful as I develop new skills and “onboard” myself as a web developer. Software documentation for complex frameworks like Django is dense and assumes you already understand a vast nested list of other concepts that maybe you don’t, which makes it very slow going.
But ChatGPT has that documentation (and a million other tutorials and explainers and Stack Overflow threads) in its training data, so I can ask a question like “what are the best practices for adding an editor feature to my Django blog?” and it will give me a rundown of the settings, models, and views one would need to get started on that feature, along with examples.
Then I go to the documentation and read up on the particular bits, and I can always ask follow-up questions to ChatGPT if there’s a concept I don’t understand.
This isn’t using an LLM for code generation, it’s using it to access and sift a repository of knowledge, and the chat format works really, really well for that.
Now, I am aware that there are drawbacks to using an LLM for research. It occasionally makes things up, it is over-confident, it tells you what it thinks you want to hear, etc. But from what I’ve seen over the last couple years, these are manageable problems that can be controlled, and the benefits of being able to get quick answers to complex and very specific queries far outweigh the costs of having to take them with a grain of salt.
I don’t actually think this means that AGI is upon us and the robots will take our jobs. Rather, custom LLMs that are fine-tuned or use retrieval-augmented generation (RAG) to give employees interactive access to manuals, code bases, sales histories, etc. would be very powerful tools for onboarding and internal knowledge sharing.
However, there are a couple issues I can see.
First, this is expensive, and it’s not clear yet that customers are willing to pay the actual cost of running a giant LLM like ChatGPT-4o. Second, it’s very tricky to get your actual data into a form where it can be used for fine-tuning so users can have the kind of experience I am having now with ChatGPT. Here’s how one Reddit user describes it, and it sounds like a nightmare:
> However, the real challenge lies in preparing the data. A massive wiki of product documentation, a thousand PDFs of your processes, or even a bustling support forum with countless topics – they all amount to nothing if you don’t have your data in the right format.
>
> …
>
> Automated tools can only do so much; manual work is indispensable and in many cases, difficult to outsource. Those who genuinely understand the product/process/business should scrutinize and cleanse the data. Even if the data is top-notch and GPT4 does a flawless job, the training could still fail. For instance, outdated information or contradictory responses can lead to poor results.
>
> In many of my projects, we involve a significant portion of the organization in the process. I develop a simple internal tool allowing individuals to review rows of training data and swiftly edit the output or flag the entire row as invalid.
Yikes! Now imagine doing this every six months to keep the LLM up to date. Employees will hate it! RAG is easier, but it’s still tricky and may not provide the same experience.
All that said, I think these tools are going to get better, faster, and cheaper, and within the next few years, most large enterprises are going to have some kind of onboarding or search LLM available internally.
This is great news if you work in IT and were worried about job security, because setting up and maintaining these things is going to become a whole new sub-field. What if AI is a big job-creation program??
I added a like button to tinyblog. All I did was add a new “likes” model to the “blog” app in Django, added a button to the blog post “view,” and threw together a little JavaScript to increment the database on click and refresh the count on the page. And a few CSS tweaks.
It only took like an entire day. And I don’t know JavaScript at all so it was a lot of back-and-forth with an LLM. I’m still not really comfortable with it. I had the robot talk me through every element of the script, so I understand the logic and what it’s doing, but as with all things LLM, I suspect there’s a more elegant way to do it.
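For the record, the server side of this is pretty small; it’s roughly this shape (simplified, not my exact code):

```python
# models.py: each post just carries a counter (a separate "Like" model would work too).
from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=200)
    body = models.TextField()
    created = models.DateTimeField(auto_now_add=True)
    likes = models.PositiveIntegerField(default=0)

# views.py: the JavaScript POSTs here and gets the new count back as JSON.
from django.db.models import F
from django.http import JsonResponse
from django.views.decorators.http import require_POST

@require_POST
def like_post(request, pk):
    # F() does the increment inside the database, so simultaneous clicks don't clobber each other.
    Post.objects.filter(pk=pk).update(likes=F("likes") + 1)
    return JsonResponse({"likes": Post.objects.get(pk=pk).likes})
```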
The tricky part came when I also wanted the button to appear on posts on the homepage, which is a different template. Because the blog post template displays a single blog post, I was able to use the single HTML button’s ID to connect to the JavaScript. But that doesn’t work when there are five buttons.
So in the “for” loop that places the posts on the page, I dynamically generate the HTML button IDs by appending each database object’s primary key to the ID.
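Something like this, simplified, with illustrative field names:

```html
{% for post in posts %}
  <article>
    <h2>{{ post.title }}</h2>
    {{ post.body }}
    {# The primary key makes each ID unique: like-button-1, like-button-2, ... #}
    <button id="like-button-{{ post.pk }}" data-post-id="{{ post.pk }}">
      Like (<span id="like-count-{{ post.pk }}">{{ post.likes }}</span>)
    </button>
  </article>
{% endfor %}
```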
Then I had the JavaScript put all the buttons into an iterable object (a “NodeList,” apparently), so I could use the post ID associated with each button to increment the “likes” field for the right post when its button is clicked. Great!
Unfortunately, you can click the button as many times as you want, but I don’t know, seems kind of fun. Click away.
The other feature I added that took another whole day was a “feeds” app that provides an RSS feed and an Atom feed. This one wasn’t too difficult; mostly I was struggling with understanding Django’s ALLOWED_HOSTS and how Past Me had laid a few rakes in the grass for Future Me in the settings.py file. Whoops.
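Django’s syndication framework does most of the heavy lifting here; the RSS class is roughly this (simplified, with my model and URL names swapped for generic ones), and the Atom version is the same class with a different feed_type:

```python
# feeds.py: Django turns this class into an RSS feed; setting
# feed_type = Atom1Feed (from django.utils.feedgenerator) gives you Atom instead.
from django.contrib.syndication.views import Feed
from django.urls import reverse

from blog.models import Post

class LatestPostsFeed(Feed):
    title = "tinyblog"
    link = "/"
    description = "Latest posts from tinyblog."

    def items(self):
        return Post.objects.order_by("-created")[:20]

    def item_title(self, item):
        return item.title

    def item_description(self, item):
        return item.body

    def item_link(self, item):
        return reverse("post_detail", args=[item.pk])
```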
Anyway, while I’m not comfortable with relying on an LLM to be able to do this stuff, the fact is that I can do this stuff because I have an LLM to answer questions, help debug, and offer code snippets. It doesn’t do the work for me, not because it can’t write the code but because it can’t understand what I want. It’s always trying to wedge some elaborate boilerplate into something that I would like to be more straightforward.
I actually think this will get more useful as I grow and my projects get more complicated and the boilerplate has to be more sophisticated. I think a big reason certain tech people think generative AI is going to take over the world is because it is very helpful in the kind of copy-paste work that (let’s face it) makes up a significant part of the act of actually writing code.
It’s not much, but it’s mine. After struggling a bit with Docker in the testing environment, I managed to get everything running on an old laptop, and it worked great! Moving the app to production was a little more involved, but the containerization procedure I had set up mostly worked, and I learned a lot.
I went with a cloud-hosted virtual machine running Ubuntu Server with three Docker containers: 1) the Nginx reverse proxy, 2) the PostgreSQL database, and 3) the Django app itself. All three containers are running with Docker Compose, making it a **lot** easier to spin things up and change configs. I am a convert.
Getting the reverse proxy to work on HTTP was no problem, but the HTTPS certificate was a pain in the ass. I never could figure out how to get Certbot to fetch the SSL certificates through the proxy server, so I ended up just turning off the site for a second and fetching them manually through port 80. High on my to-do list is figuring out how to automate this process so I don’t have to remember to do it manually every three months.
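For the record, the manual dance is something along these lines (the service name is an example):

```bash
docker compose stop nginx                        # free up port 80 for a minute
sudo certbot certonly --standalone -d tinyblog.website
docker compose start nginx                       # back in business
```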
Once I had the site running on HTTPS, the only other snag was that Django was throwing a CSRF error every time I tried to do something involving a POST request. Turns out, since the site sits behind a reverse proxy, I needed to add https://tinyblog.website/ to my trusted origins in my production settings.
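In Django, that’s the CSRF_TRUSTED_ORIGINS setting, and the fix is a one-liner (Django wants the scheme included and no trailing slash):

```python
# settings.py (production)
CSRF_TRUSTED_ORIGINS = ["https://tinyblog.website"]
```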
And that’s it?? The site is up and running, with no real hiccups. I write posts, it saves them to the database, it serves them to the public web when asked. OK. My reverse proxy was getting a concerning number of sketchy requests from bots, though, so I installed Fail2Ban and configured it to use UFW to keep the burglars out.
For the moment, Tinyblog is very tiny: It’s a Django app with two methods and two views. Extremely basic. My idea is to add new features every so often, testing and updating as I go, keeping it live, learning some lessons, hopefully not breaking anything too catastrophically.
Next up on my to-do list: a “like” button for the posts. This may involve JavaScript. Oh, and I should probably add a static “About” page. Let’s go!
This started off as a post describing a very frustrating problem I was having deploying an app to a Raspberry Pi running on my LAN, but in the process of writing it, I solved the problem, only to be smacked straight in the forehead with a new, much-less-solvable problem.
So, I have a project I’ve been picking at, on and off, to move this blog to my very own self-hosted Django app. As a person who knows (or knew) basically nothing about web development, this has been an interesting time! Other than Python, I’m having to pick up basically everything from scratch: PostgreSQL, HTML, CSS, remote hosting, APIs, containers, servers, proxy servers, and (OMINOUS FORESHADOWING) CPU architecture.
Django is a powerful framework, but as such, it is WILDLY complicated, and that by itself has taken me some time to get my head around. As a first stab, I am building a very, **very** basic micro-blogging site I’m calling tinyblog. The plan is: 1) start simple, with just posts and a landing page, 2) deploy to a test server on my local network, 3) deploy to a VM with a public-facing IP, and 4) add features and update production as I go.
Part 1 is going fine. I don’t like to do things without a grasp of what is happening under the hood, so using Django has been slow, but I now have a functioning app with a few models, two views to pull the posts, and two templates to display them. OK.
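For a sense of scale, the two views are roughly this (simplified):

```python
# views.py: one view for the list of posts, one for a single post.
from django.shortcuts import get_object_or_404, render

from .models import Post

def post_list(request):
    posts = Post.objects.order_by("-created")
    return render(request, "blog/post_list.html", {"posts": posts})

def post_detail(request, pk):
    post = get_object_or_404(Post, pk=pk)
    return render(request, "blog/post_detail.html", {"post": post})
```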
Part 2, I hit a snag. The plan was to use an old RaspberryPi 3 I have sitting around as the test server. I would package the app, the PostgreSQL database, and the reverse proxy in different Docker containers and network them together.
What happened instead was that Docker got… weird.
I would SSH into the RaspberryPi, set up the Docker network,1 then spin up containers from the images I’d scp’d over from my desktop. But they couldn’t connect to the Docker network I’d just set up (??) and when I removed them and tried again, I couldn’t even spin them up again because it said the container names were already in use (????), even though docker ps -a showed **nothing** (???????).
I pruned the system, I pruned the networks, I pruned the volumes, I restarted Docker, the f*&%ing thing kept f%&$ing not f@¿%ing working.
But reading through various tangentially-related StackOverflow stemwinders, something had stuck in my head: “namespace,” and as I walked my dog, I kept chewing this over. Namespace.
Ah, that’s it. F&%$ me.
So when you install Docker, it tells you to add your Linux username to the docker group so you can run Docker commands without typing sudo. What it **doesn’t** tell you is that Docker does not have system privileges to bind ports below 1024. So when I was trying to run a Docker container that binds port 80, I would get an “access denied, you don’t have privileges” error, and I would just add a “sudo” to my command, as one does. And it worked.
The problem is, I was creating two different sets of assets (both containers and networks) in two different namespaces: one for rootuser and one for peter. When I ran docker ps -a as peter, it showed nothing, because I was creating my containers in the rootuser namespace. Likewise, when I tried to add one of my rootuser containers to the Docker network I created as peter, I got a “network not found” error, because the network was in a different namespace.
Problem solved, but unfortunately I am still boned with this particular project.
As you may have guessed by now, Docker images built on an x86_64 architecture don’t run on Raspberry Pi’s ARM64 architecture, so **sad trombone** for me, the RaspberryPi is cooked. Off to pull an old laptop out of a closet and slap Ubuntu Server on it.
Yeah yeah yeah, use Docker Compose, I will. OK? ↩︎
I’ve been thinking about it, and the conclusion of my last post was wrong. I wrote:
> My suspicion is that this will eventually dawn on management and we will see corporations implementing policies to restrict or at least monitor how GenAI is used in the workplace. Eventually, GenAI use is going to be a scarlet letter, an indication that a product, service, or professional is low-effort and poor quality.
That’s not it. Corporations don’t care at all if their production is mediocre, so long as it is profitable. If using GenAI makes the product 50% shittier but 5% more profitable, they will actually mandate that their employees use GenAI.
And this is the problem: GenAI use will go from a neat hack that a corner-cutting worker bee figured out to shorten their work day to a top-down requirement for employees to be able to handle a ballooning workload.
So, if you are a professional and you like your job and the satisfaction of producing 5 good things per day, tough shit, you will now have to produce 20 mediocre things a day using this GenAI-forward SaaS solution for which your employer just signed a 5-year support contract.
Everything will speed up. You will have to use GenAI to draft e-mails because you won’t have time to do it yourself. You will have to use it to write memos, copy, presentations, and analyses, summarize meetings, summarize research, write performance reviews, draft annual reports, because the three other people that were supposed to be sharing the workload retired or got laid off or found a new job and the positions were never filled.
The job that used to be satisfying to you will become stressful and dull because instead of doing good work, you have to use GenAI to churn out shit.
That’s what’s coming: The enshittification of jobs.
GenAI has been available to the public for, what, a year and a half? And now that we’ve settled into it, I would say there are two kinds of people who lean in to using GenAI:
ONE: People who are not familiar with the subject matter in question.
If you do not know JavaScript, ChatGPT’s JavaScript looks like wizardry. If you do know JavaScript, it can look like spaghetti. Likewise, if you are not a writer (or, let’s be frank, a reader), Claude seems brilliant. If you are, you’ve seen one Claude paragraph, you’ve seen ’em all.
I’ve seen multiple demos of GenAI translation features where a person who does not know one of the languages in the pair gets some output and says “wow, that’s amazing!” with absolutely no way of knowing whether the translation was accurate or even vaguely correct.
Subject matter noobs don’t know what they don’t know, and GenAI seems confident, so it easily draws in the naïve.
TWO: People who are familiar with the subject matter in question, but don’t care if the output is mediocre or incomplete.
Let’s be honest, many people cut corners, and GenAI is the perfect corner-cutting tool. They don’t care if their e-mails sound like they came out of a can, they don’t care if the memo summary is incomplete, they don’t care if the people in the image have extra fingers, etc.
The genuinely sloppy users of GenAI are what they are. Some people aren’t paid for quality work, and so they don’t produce quality work. We’ve all seen the campaign ads where the faces of the people in the background are melting. Or the outright scams like fake e-books and SEO spam. Fine. These people will always exist, this stuff will always be around.
More concerning is professionals who get GenAI output that sounds plausible and say “good enough” and call it a day, without even checking to see if this frequently-wrong technology in fact performed the task correctly.
I’m thinking about doctors who use GenAI to transcribe and then summarize conversations with patients; standards compliance reviewers who use it to check large PDFs against requirements; lawyers who use it to write contracts; technical writers who use it to write white papers. Lots of documents aren’t really produced in order to be read: They are produced in order to check a box, even though if they are actually needed someday, it is crucial that they be correct and complete.
But GenAI is a technology that makes it very easy for people working in areas with poor visibility to produce deliverables that look plausible and complete, and without a detailed review, no one will know any different.
My suspicion is that this will eventually dawn on management and we will see corporations implementing policies to restrict or at least monitor how GenAI is used in the workplace. Eventually, GenAI use is going to be a scarlet letter, an indication that a product, service, or professional is low-effort and poor quality.
Unfortunately, it could be some time before this happens, and in the meantime, mediocrity accrues.
One thing you notice right away about LLMs is they bear a striking resemblance to that ubiquitous internet character, the reply-guy: they always have an answer, they are brimming with confidence, and they are often completely wrong.
It seemed only natural, then, to turn one into a full-time online version of itself, RGaaS (Reply-Guy as a Service), always on, always ready to reply to anything when summoned, with an “Actually,” followed by his (it’s always a “he”) opinion on what you should be thinking.
Basically, the way it works is that any user can reply to any Mastodon post (or I guess any post on the ActivityPub protocol) and tag the Actually Bot and it will reply with its most mediocre take on why everyone else is wrong, or at least not quite right.
The reply-guys can all move on to something else now, I have automated them out of a job.
This was a really fun project and has a lot of separate parts. First, I had to modify an LLM to give it the right “personality” for this job. To do this, I used Ollama, a tool for running open source LLMs locally, on your own machine. I used the “customized prompt feature”: You basically write a little file that Ollama uses to tweak the LLM of your choosing to give it certain characteristics. I went with the new Llama3:8b model from Meta as my base model, then put the following instructions in my modelfile:
```
FROM llama3
PARAMETER temperature 3
SYSTEM """
You are receiving social media posts as prompts, and your job is to reply to these prompts. Please start all your replies with the word "actually". Reply as if you know more than everyone. You must recontextualize every prompt and correct or amend it, informing the prompter that they are wrong, even if ever so slightly. You write in short sentences. Your replies must be short. Keep them to about three sentences. This is essential: keep your replies short. They must be under 500 characters.
"""
```
Then I ran the following command in the console:
```bash
ollama create actually_llama -f ./actually_llama
```
… and my model was ready to roll. Next, I needed a program to connect to the Ollama API to send the LLM prompts and get responses. Python was great for that, as both Ollama and Mastodon have solid Python libraries. Probably the slowest part was picking through Mastodon.py to figure out how the methods work and what exactly they return. It’s a very robust library with a million options, and fortunately it’s also extremely well documented, so while it was slow going, I was able to whack it together without too much trouble.
I’m not going to get into all the code here, but basically, I wrote a simple method that checks mentions, grabs the text of a post and the post it is replying to, and returns them for feeding into the LLM as the prompt.
Despite my very careful, detailed, and repetitive instructions to be sure replies are no more than 500 characters, LLMs can’t count, and they are very verbose, so I had to add a cleanup method that cuts the reply down to under 500 characters. Then I wrote another method for sending that cleaned-up prompt to Ollama and returning the response.
The main body starts off by getting input for the username and password for login, then it launches a while True loop that calls my two functions, checking every 60 seconds to see if there are any mentions and replying to them if there are.
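Stripped way down, with the error handling and the bookkeeping for already-answered mentions left out, the whole thing is shaped roughly like this:

```python
import time

import ollama
from mastodon import Mastodon

mastodon = Mastodon(access_token="...", api_base_url="https://your.instance")

def get_mentions():
    """Return (status_id, prompt_text) pairs for posts that tag the bot."""
    mentions = []
    for note in mastodon.notifications():
        if note["type"] == "mention":
            status = note["status"]
            # The real code also grabs the post being replied to and strips the HTML.
            mentions.append((status["id"], status["content"]))
    return mentions

def actually_reply(status_id, prompt_text):
    """Send the prompt to the customized model and post the reply."""
    result = ollama.generate(model="actually_llama", prompt=prompt_text)
    reply = result["response"][:500]  # LLMs can't count, so enforce the limit here
    mastodon.status_post(reply, in_reply_to_id=status_id)

while True:
    for status_id, text in get_mentions():
        actually_reply(status_id, text)
    time.sleep(60)  # check for new mentions once a minute
```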
OK it works! Now came the hard part, which was figuring out how to get to 100% uptime. If I want the Actually Bot to reply every time someone mentions it, I need it to be on a machine that is always on, and I was not going to leave my PC on for this (nor did I want it clobbering my GPU when I was in the middle of a game).
So my solution was this little guy:
… a Lenovo ThinkPad with a 3.3GHz quad-core i7 and 8gb of RAM. We got this refurbished machine when the pandemic was just getting going and it was my son’s constant companion for 18 months. It’s nice to be able to put it to work again. I put Ubuntu Linux on it and connected it to the home LAN.
I actually wasn’t even sure it would be able to run Llama3:8b. My workstation has an Nvidia GPU with 12gb of VRAM and it works fine for running modest LLMs locally, but this little laptop is older and not built for gaming and I wasn’t sure how it would handle such a heavy workload.
Fortunately, it worked with no problems. For running a chatbot, waiting 2 minutes for a reply is unacceptable, but for a bot that posts to social media, it’s well within range of what I was shooting for, and it didn’t seem to have any performance issues as far as the quality of the responses either.
The last thing I had to figure out was how to actually run everything from the Lenovo. I suppose I could have copied the Python files and tried to recreate the virtual environment locally, but I hate messing with virtual environments and dependencies, so I turned to the thing everyone says you should use in this situation: Docker.
This was actually great because I’d been wanting to learn how to use Docker for a while but never had the need. I’d installed it earlier and used it to run the WebUI front end for Ollama, so I had a little bit of an idea how it worked, but the Actually Bot really made me get into its working parts.
So, I wrote a Docker file for my Python app, grabbed all the dependencies and plopped them into a requirements.txt file, and built the Docker image. Then I scp’d the image over to the Lenovo, spun up the container, and boom! The Actually Bot was running!
Well, OK, it wasn’t that simple. I basically had to learn all this stuff from scratch, including the console commands. And once I had the Docker container running, my app couldn’t connect to Ollama: since Ollama runs as a server on the host, I had to launch the container with a flag telling it to share the host’s network settings.
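The flag in question, for the record (not my exact run command):

```bash
# Share the host's network stack so the app can reach Ollama at localhost:11434.
docker run -d --network host actuallybot:latest
```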
Then once I had the Actually Bot running, it kept crashing when people tagged it in a post that wasn’t a reply to another post. So: back to the code, squash the bug, redeploy the container, bug still there because I didn’t redeploy the container correctly. There was some rm, some system prune, some struggling with the difference between “import” and “load,” and eventually I got everything working.
Currently, the Actually Bot is sitting on two days of uninterrupted uptime with ~70 successful “Actually,” replies, and its little laptop home isn’t even on fire or anything!
Moving forward, I’m going to tweak a few things so I can get better logging and stats on what it’s actually doing so I don’t have to check its posting history on Mastodon. I just realized you can get all the output that a Python script running in a Docker container prints with the command docker logs [CONTAINER], so that’s cool.
The other thing I’d like to do is build more bots. I’m thinking about spinning up my own Mastodon instance on a cheap hosting space and loading it with all kinds of bots talking to each other. See what transpires. If Dead Internet Theory is real, we might as well have fun with it!
I finally “finished” my first major Python application the other day. It’s an RSS reader you can use to parse large lists of RSS feeds, filter for keywords, and save the results to an HTML file for viewing in a browser.
After spending easily a month building a console-based text command user interface for this thing, it just dawned on me, “I bet there’s already a module for this.” Yep, Cmd. So two things about that.
First, it’s a great example of a major failure for Google and a major success for ChatGPT. Initially, I thought what I needed was a “command line interface,” so I did some extensive googling for that and all I came up with was stuff like argparse, which allows you to run Python programs from the command line with custom arguments (home:~$ python3 myscript.py [my_command]). Not finding anything useful, I went on to build what I actually needed, which is a “line-oriented command interpreter.”
The problem with Google is that it’s hard to find things you can describe but don’t know the name of, especially in technical fields with very specific jargon. However, once I had my epiphany that this thing probably already exists somewhere, I was able to describe what I wanted to do to ChatGPT and it immediately informed me of the very nifty Cmd class.
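For anyone else who has never heard of it, Cmd gives you the whole prompt-loop-with-commands pattern almost for free; here’s a minimal sketch:

```python
import cmd

class ReaderShell(cmd.Cmd):
    """A tiny line-oriented command interpreter, courtesy of the standard library."""
    intro = "Welcome to the RSS reader. Type help or ? to list commands."
    prompt = "(reader) "

    def do_search(self, arg):
        "Search saved RSS feeds for a keyword: search python"
        print(f"Searching feeds for: {arg}")

    def do_quit(self, arg):
        "Exit the program."
        return True  # returning True ends cmdloop()

if __name__ == "__main__":
    ReaderShell().cmdloop()
```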
Second, I’m not mad that I spent a month building a thing that already exists. Really. I learned so much going through the absurd process of building my own line-oriented command interpreter:
- Loops, baby, wow do I know some loops. Nested loops, infinite loops, breaking a loop, etc. For, while, you name it. Loops are my jam;
- If-else logic, my brain is mashed into a horrible, spidery decision tree. And by extension, boolean logic. Fucking hell, that was trial by fire. For example, I did not know None evaluates to False because in Python, it is “falsy.” I spent way too many hours trying to debug this, but now I will never forget it;
- I learned some Pytest and wrote a ton of unit tests for my command interpreter. I’m not going to say I’m good at writing unit tests yet, because I’m not, but I understand the concepts and the importance of unit testing, and I have a few tools to work with.
- I know what I want from a command interpreter. OK, I built a dumb one, but when I refactor my code to replace it with Cmd, I already know what features to look for and how I want to deploy it.
I’m going to put DonkeyFeed down for a bit now. I have to finish up some actual classes (SQL lol) for a degree I’m working on as the term comes to an end, and I have a couple other Python projects I want to pursue.
When I pick it up again, yeah, I’ll probably refactor the user interface to make it more robust and stable, with better error handling and help functions. And then I need to find a better way to package it (yikes) for distribution.
I guess the main thing I’m learning as I shelve this one for a bit is you never really finish a piece of software, you just kind of get it to a point where it’s not completely broken. Maybe you’ll have time to fix it more later!
I’m coming around to the idea that generative artificial intelligence is the new blockchain. Leave aside the whole investment angle for a second, let’s just talk about the technology itself.
When Bitcoin (and cryptocurrency in general) really started to hit, the boosters promised a thousand applications for the blockchain. This wasn’t just a weird curiosity for the kind of people who encrypt their e-mail, or even just a new asset class: It was potentially everything. Distributed ledgers would replace financial transaction databases. They could be used to track property ownership, contracts, produce, livestock. It would replace WiFi, and the internet itself! The future was soon.
But the future never arrived, for the simple fact–as flagged early on by many people who understood both blockchain technology and the fields it would supposedly disrupt–that blockchain wasn’t as good as existing technologies like relational databases.
For one thing, it is expensive to run at scale, and updating a distributed ledger is incredibly slow and inefficient, compared to updating the kind of relational databases banks and credit card companies have used for decades. But also, the immutability of a distributed ledger is a huge problem in sectors like finance and law, where you need to be able to fix errors or reverse fraud.
These problems aren’t really things you can adjust or “fix.” They are fundamental to the blockchain technology. And yet, during the crypto boom, a thousand startups bloomed promising magic and hand-waving these fundamental problems as something that would be solved eventually. They weren’t.
Turning to generative artificial intelligence, I see the same pattern. You have a new and exciting technology that produces some startling results. Now everyone is launching a startup selling a pin or a toy or a laptop or a search engine or a service running on “AI.” The largest tech companies are all pivoting to generative artificial intelligence, and purveyors of picks-and-shovels like Nvidia are going to the Moon.
But this is despite the fact that, like blockchain before it, generative artificial intelligence has several major problems that may not actually be possible to fix because they are fundamental to the technology.
First, it doesn’t actually understand anything.1 It is not an intelligence, it is a trillion decision trees in a trench coat. It is outputting a pattern of data that is statistically related to the input pattern of data. This means it will often actually give you the opposite of what you request, because hey, the patterns are a close match!
Second, because of this, the output of a generative artificial intelligence is unreliable. It is a Plinko board the size of the surface of the Moon. You put your prompt in the top and you genuinely don’t know what will come out the bottom. Even if your prompt is exactly the same every time, output will vary. This is a problem because the whole point of computers is that they are predictable. They do exactly the thing you tell them to do in the code language. That’s why we can use them in commercial airlines and central banks and MRI machines. But you can’t trust a generative artificial intelligence to do what you tell it to do because sometimes, and for mysterious reasons, it just doesn’t.
And third, a super-problem that is sort of a combination of the above two problems is that generative artificial intelligences sometimes just… make stuff up. AI people call it “hallucinating” because it sounds cooler, like this otherwise rational computer brain took mushrooms and started seeing visions. But it’s actually just doing what it is designed to do: output a pattern based on an input pattern. A pattern recognition machine doesn’t care if something is “true,” it is concerned with producing a valid data pattern based on the prompt data pattern, and sometimes that means making up a whole origin story involving a pet chicken named “Henrietta”, because that’s a perfectly valid data pattern. No one has figured out how to solve this problem.
Who knows, maybe Google, Microsoft, Meta, and OpenAI will fix all this! Google’s new 1 million token context for its latest Gemini 1.5 model sounds promising. I guess that would be cool. My sense, though, is that despite all the work and advancements, the problems I outline here persist. I see so many interesting ideas and projects around building AI agents, implementing RAG, automating things, etc. but by the time you get to the end of the YouTube tutorial, eh, turns out it doesn’t work that well. Promising, but never quite there yet. Like with blockchain, the future remains persistently in the future.
The examples I’m using here are from Gary Marcus’s Substack, which is very good, check it out. ↩︎