Verified Facts

03 Feb, 2013

EDIT: A lot of people who have seen this site on Reddit and Hacker News have assumed that we did something fancy to get the conspiracies flowing. We didn’t! There is no “Markov chain generation” in this project, just a LOT of writing and some clever sentence linking-rules. See Ian’s post about how he programmed this project to learn more.

Recently, my friends Ian Webster, Emily Snowman, and I made a website that generates conspiracy theories. (ED NOTE: the site no longer exists and I've removed it from this blog post)

It’s essentially a very detailed mad-libs generator. It mixes and matches about 33 pages of content we wrote over the course of two weeks. Although the conspiracy theories it generates are not always very coherent, it does attempt to create an illusion of intent or coherency by getting the main sentences in each conspiracy to share nouns.

How it works

Each passage is made up of two kinds of sentences: “main sentences” and “filler sentences.” Main sentences look like this:

Studies show that people who spend too much time in Fukushima frequently end up with incurable cases of age spots. This trend is consistently repeated all the way back through the Bush wars, when the USGS first set up shop in Fukushima.

but the content we wrote looks more like this:

Studies show that people who spend too much time in {{place1}} frequently end up with incurable cases of {{malady}}. This trend is consistently repeated all the way back through {{era}}, when {{government_org}} first set up shop in {{place1}}.

The items in {{brackets}} stand for words which will be substituted in from our massive list of content variables. There are 12 categories of content variables: maladies, dangerous nouns, eras, abstract nouns, government organizations, companies, countries, civilian organizations, events, places, famous people, and government people. I haven’t counted all the individual nouns we have in the system, but it appears to be over 650. Some of them are silly and a bit flippant; some of them are the kind of things that conspiracy theorists might actually say. Others are historically problematic and make me uncomfortable whenever they show up! Filler sentences are static, and contain no variables. They say practically nothing, too.
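The substitution step can be illustrated with a rough Python sketch. This is not the project's actual code: the NOUNS table is a tiny made-up stand-in for the real content lists, and the fill function and its rule for handling numbered fields such as {{place1}} are assumptions for illustration.

```python
import random
import re

# Hypothetical, heavily abbreviated content lists (the real project had 12 categories).
NOUNS = {
    "place": ["Fukushima", "Roswell"],
    "malady": ["age spots", "insomnia"],
    "era": ["the Bush wars", "the Cold War"],
    "government_org": ["the USGS", "the FDA"],
}

# Matches {{category}} or {{category1}}: group 1 is the category, group 2 the optional number.
FIELD = re.compile(r"\{\{([a-z_]+)(\d*)\}\}")


def fill(template, rng=random, chosen=None):
    """Replace every {{field}} with a noun; a repeated field reuses the same noun."""
    chosen = {} if chosen is None else chosen

    def substitute(match):
        field = match.group(1) + match.group(2)  # e.g. "place1"
        category = match.group(1)                # e.g. "place"
        if field not in chosen:
            chosen[field] = rng.choice(NOUNS[category])
        return chosen[field]

    return FIELD.sub(substitute, template)
```

Passing the same chosen dictionary to several calls would let later sentences reuse nouns picked for earlier ones, which is the effect the linking rules described below depend on.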

The passages this program generates each have a minimum of 4 main sentences: an introduction, two arguments about “evidence,” and a conclusion or warning. There can be additional evidence sentences, and there is a chance for one “filler” sentence to appear between any two sentences after sentence 2. Here’s a conspiracy it generated, with the parts labelled:

[Image: a generated conspiracy with its structural parts labelled]
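The passage layout described above can be sketched as a list of slots. This is only an illustration: build_skeleton, extra_evidence_max, and filler_chance are invented names and values, and the sketch assumes filler never follows the conclusion.

```python
import random


def build_skeleton(rng=random, extra_evidence_max=2, filler_chance=0.25):
    """Return slot labels for one passage: intro, evidence..., conclusion, plus optional filler."""
    mains = ["intro", "evidence", "evidence"]
    # Assumed: up to extra_evidence_max additional evidence sentences.
    mains += ["evidence"] * rng.randint(0, extra_evidence_max)
    mains.append("conclusion")

    slots = []
    for index, kind in enumerate(mains):
        slots.append(kind)
        is_last = index == len(mains) - 1
        # Never put filler between main sentences 1 and 2 (index 0 and 1),
        # and at most one filler between any later pair of main sentences.
        if index >= 1 and not is_last and rng.random() < filler_chance:
            slots.append("filler")
    return slots
```

Each slot label would then be replaced by a sentence drawn from the matching pool of written content.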

When the program first begins constructing a random passage, it selects one introductory sentence and a bunch of nouns to fill it with. It then attempts to select a second sentence that shares at least one of those noun fields. There are never any filler sentences between these first two main sentences; this helps the passage feel more like someone is telling you a story about something instead of just linking random phrases together.

"In an enormous mansion hidden in a dark forest, the Illuminati met to plot vast initiatives that have affected our daily lives.

Our government officials are working to pass legislation which would classify innocent protests against DDT as acts of terrorism--and the Illuminati's lobbyists are deeply involved in this campaign."

The process continues. For each main sentence, the program selects a sentence that shares noun categories with the immediately previous main sentence, then fills those categories with nouns used in the earlier sentences, generating new nouns to fill new categories if necessary. The nouns shared between later sentences don’t have to be identical to the noun shared between the first two sentences. There are certain classes of nouns which the program prefers NOT to link: for example, it would rather share a {{person}} between two sentences than an {{era}}, an {{abstract_noun}}, a {{dangerous_noun}}, or a {{malady}}. The program also attempts to bring back previously-mentioned nouns, even if that noun has not been consistently shared. If the {{person}} “Oprah” shows up in sentence 1, but not in main sentences 2 or 3, she could still show up in main sentence 4 or 5.
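One way to picture the linking preference is as a scoring rule over shared categories. The sketch below is a guess at the general shape, not the project's real algorithm: the weights (3 for a preferred link, 1 for a weak one), the WEAK_LINKS set, and the function names are all assumptions, and it covers only picking the next sentence, not reusing nouns.

```python
import random
import re

# Captures the category name of each field, ignoring a trailing number ({{place1}} -> "place").
FIELD = re.compile(r"\{\{([a-z_]+)\d*\}\}")

# Categories the program prefers NOT to link on (taken from the text above).
WEAK_LINKS = {"era", "abstract_noun", "dangerous_noun", "malady"}


def categories(template):
    """Return the set of noun categories a template uses."""
    return set(FIELD.findall(template))


def link_score(previous, candidate):
    """Score shared categories; weak categories count for less (weights are made up)."""
    shared = categories(previous) & categories(candidate)
    return sum(1 if category in WEAK_LINKS else 3 for category in shared)


def pick_next(previous, candidates, rng=random):
    """Pick randomly among the best-linked candidates, or among all if nothing links."""
    scored = [(link_score(previous, candidate), candidate) for candidate in candidates]
    best = max(score for score, _ in scored)
    if best == 0:
        return rng.choice(candidates)
    return rng.choice([candidate for score, candidate in scored if score == best])
```

Choosing randomly among the top-scoring candidates, rather than always taking the first, keeps the output varied while still favoring stronger links.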

The effect of this passage-structuring system is that our passages tend to feel rambly and unfocused, but that they also have a chance of returning to original topics or making neat circular arguments. We feel that this is an appropriate way for conspiracy theories to behave. I’ve noticed that writings by real conspiracy theorists tend to display bizarre, barely-penetrable dream-logic, as if the author is following directions on a map nobody else can read. Therefore, when it came to verisimilitude, we found it acceptable to write a system that assembles passages randomly, with no real comprehension of related “topics” or sentence subjects.

The project also contains fake footnotes and relevant page titles. Page titles can list one or two nouns. The first noun listed is always the noun shared between sentences 1 and 2. If this noun is also the noun repeated most often throughout the conspiracy, no other noun is listed. But if another noun appears more often than this first noun, that noun will also be included in the conspiracy title. Footnotes, on the other hand, are just selected randomly and in random quantities from a collection of works found via Google Scholar.
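The title rule is simple enough to sketch in a few lines. The function name title_nouns and its inputs are hypothetical; the sketch assumes the generator keeps a list of every noun occurrence in the finished passage.

```python
from collections import Counter


def title_nouns(first_shared_noun, nouns_in_passage):
    """Return one or two nouns for the page title, following the rule described above."""
    counts = Counter(nouns_in_passage)
    if not counts:
        return [first_shared_noun]
    top_noun, top_count = counts.most_common(1)[0]
    # Add a second noun only if it is strictly more frequent than the first shared noun.
    if top_noun != first_shared_noun and top_count > counts[first_shared_noun]:
        return [first_shared_noun, top_noun]
    return [first_shared_noun]
```

For instance, if "the Illuminati" links the first two sentences but "DDT" appears more often overall, this sketch would put both in the title.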

Problems and solutions

We are aware that much of the time, our conspiracy theories sound nothing at all like actual speech, or like actual coherent conspiracy theories. However, we have settled for this rather than pursue solutions which would have harmed the overall quality of the project.

We could have trained the program to recognize the subject of a sentence. We could have done this by adding additional tags to indicate which noun field in each sentence was the subject. We then could have written a linking algorithm which paid more attention to sentence topics/subjects. However, this would have taken a very long time and probably a lot of backtracking and content re-writing, and we were treating this as a short project.
We could have written the main sentences in a more vague or homogenous way, so that they resembled one another more closely in tone and style. However, this would have forced us to make the content overall more “mellow” or “middle of the road”, which would have made the project less hilarious.
We could have had a more strict linking system, where each sentence in a passage shared the same noun or couple of nouns. This would have made the sentences less rambly and more uniform, which was not something we found appealing, stylistically.
We could have been more judicious about using certain noun categories more frequently. Right now our collection of sentences contains far more instances of {{government_org}}, {{dangerous_noun}}, or {{organization}} than {{era}}, {{abstract_noun}}, or {{event}}. I would like us to even out noun category usage over time, but right now, I think the imbalance is OK, since it gives more weight to the kinds of noun categories that are traditionally associated with actual conspiracy theories.
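The category imbalance mentioned in the last point could be measured with a small script along these lines. It is a sketch only: category_usage is a made-up helper, and it assumes the sentences are available as plain strings using the {{field}} syntax shown earlier.

```python
import re
from collections import Counter

# Captures the category name of each field, ignoring a trailing number ({{place1}} -> "place").
FIELD = re.compile(r"\{\{([a-z_]+)\d*\}\}")


def category_usage(templates):
    """Count how many times each noun category appears across all sentence templates."""
    usage = Counter()
    for template in templates:
        usage.update(FIELD.findall(template))
    return usage
```

Sorting the result with usage.most_common() would show at a glance which categories dominate the written content.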

We are satisfied with the current state of the project but figure that there’s a lot of work we could still do on it.

About mad-libbed content

I have previously tried to complete other mad-libbed content projects, but have always failed to actually implement any systems. This is the first time I have seen a random-prose-generation project through from start to finish.

In the end, content proved to be both our greatest strength and our greatest weakness. Our system was entirely creator-content-based; the program had no ability to write its own sentences, and users could only contribute individual nouns in predefined categories, via the “Search” feature. No matter how sophisticated our linking systems are– and no matter how sophisticated they might become in the future, after more development– the quality of our content will still be our major limiting factor.

For example, when we first started working on the project, there was no distinction between {{organization}} and {{government_org}} and no distinction between {{famous_person}} and {{government_person}}. Additionally, we had a lot of people in those lists who were long dead– so long dead that they didn’t make sense in most of the sentences they’d turn up in. At one point, we had to go through the variable lists and split everything up and cull a ton of words.

We had similar problems with the main and filler sentences. Sometimes we’d write a sentence that sounded good by itself, but made no sense in the context of other sentences. Sometimes we’d write introductory sentences that had too many variables, and seemed to tell too complete a story– so every sentence that followed would sound unrelated and weird.

Every time Ian (who programmed the entire project) made adjustments to the linking systems, our passages improved– but they could improve only so long as the sentences made sense. I think the most time-consuming, dull, and frustrating barrier we had to overcome was the degree to which poor content limited us.

Conclusion

We hope to continue updating the site, but we will always have to navigate a tradeoff between conspiracy “coherency” and other qualities we value, like randomness, rambliness, high variance, and the sheer size of the project overall. (The sentences themselves contain over 9500 words!)

If I ever have to work on a project where I must write a mad-lib content generator, I will prefer the passages to be much less detailed than the ones in this project. I’d prefer to work with passages consisting of only a few sentences– three sentences, say, and no filler– and far fewer than 12 variable categories. With good content, such a project could seem just as varied and interesting, while requiring a fraction of the man-hours to implement, and granting the creators finer control over the final generated results.