Inspired by the book Autumn by Karl Ove Knausgard (KOK), in which he writes about a different object each day (addressed to his unborn daughter, in order to teach her about the world), I decided to generate a novel using this same structure but from the computer’s perspective. With this exercise, I hoped to explore the following questions: what have computers learned about the world? And what would computers tell us about the world if we asked them about it in 2019?
I decided to use the top 100 books on Project Gutenberg from the last 30 days as a proxy for all the things we have collectively been teaching computers (“computers” here is itself a proxy; more precise questions would be about individual algorithms or computational systems).
I took the top 365 nouns from these books, one for each day of the year. For each of these words I then pulled out every sentence containing it, creating a “corpus” for each object. From each corpus I used markovify to generate 100-200 words (5 sentences) “about” that object.
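A minimal sketch of the corpus-building step, using only the standard library. Note the assumptions: the real noun extraction presumably used a part-of-speech tagger, whereas a plain word count stands in here, and `top_nouns` and `build_corpus` are hypothetical helper names, not the project’s actual code.

```python
import re
from collections import Counter

def top_nouns(text, n=365):
    # Naive stand-in for noun extraction: rank lowercase word
    # frequencies. A real version would use a POS tagger to keep
    # only nouns.
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w, _ in Counter(words).most_common(n)]

def build_corpus(text, word):
    # Pull out every sentence that contains the target word and
    # join them into one "corpus" string for that object.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return " ".join(s for s in sentences if pattern.search(s))

text = "The man walked. A dog barked. The man slept."
print(build_corpus(text, "man"))  # "The man walked. The man slept."
```

Each per-object corpus can then feed a Markov model; with markovify that would look something like `markovify.Text(corpus)` followed by a few calls to `make_sentence()`, though I have not shown that here since it is a third-party dependency.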
KOK often opens his entries about objects with a technical description (followed by more narrative text), so I looked into prepending Wikipedia descriptions using the wikipedia Python library. I had hoped to easily add the first part of each entry, but got many errors when there was no exact match for a Wikipedia page. I tried falling back to the next suggested entry, but going down the line this way would often throw disambiguation errors too, or lead to a very random description (about a song, for example). For now I left these cases (32/365) blank where there was not a good match, and I plan to fix them in the next phase (I think I may have to fix them by hand unless I can think of a better way to do it computationally). For the ones that matched, I included the first three sentences of the Wikipedia entry.
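The fallback logic I tried looked roughly like the sketch below. To keep it self-contained, `fetch_summary` is a hypothetical stub standing in for the wikipedia library’s lookup (the real call would be something like `wikipedia.summary(title, sentences=3)`, which raises `wikipedia.exceptions.DisambiguationError` on ambiguous titles); the exception class here is a local stand-in, not the library’s own.

```python
class DisambiguationError(Exception):
    # Local stand-in for wikipedia.exceptions.DisambiguationError,
    # which carries a list of candidate page titles.
    def __init__(self, options):
        self.options = options

def fetch_summary(title):
    # Hypothetical stub for the wikipedia lookup, with one known
    # page and one ambiguous title for illustration.
    pages = {"autumn": "Autumn is a season."}
    if title == "fall":
        raise DisambiguationError(["Fall (season)", "Fall (song)"])
    if title not in pages:
        raise KeyError(title)
    return pages[title]

def describe(noun):
    # Try the exact title first; on disambiguation, try the first
    # suggested option; if everything fails, leave the entry blank.
    try:
        return fetch_summary(noun)
    except DisambiguationError as e:
        try:
            return fetch_summary(e.options[0])
        except (DisambiguationError, KeyError):
            return ""
    except KeyError:
        return ""

print(describe("autumn"))  # "Autumn is a season."
print(repr(describe("fall")))  # '' (fallback also failed)
```

Even with this structure, the first suggested option is often the wrong kind of page (a song rather than a season), which is why I left the unmatched cases blank for now.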
The output doesn’t make much sense, but I do find the juxtaposition of the technical text with the generated text interesting when they end up connecting or reflecting each other, in particular seeing which elements carry over. I also find it a fun exercise to try to guess which novels the generated text is pulling most from, and to think about why that is. I have kept the output in order from most to least common noun, which gives another kind of insight into the source texts; I think it says a lot that the most common noun is “man”. My hope with this project is that these kinds of questions can prompt deeper thinking into what we are “teaching” computers and the care and intention we put into that act.
Here are some links:
GitHub repository, which has the following: