Ben Willenbring

Machine-Generated Art Descriptions v1

Version 1 of art descriptions output by my super ghetto natural language processing algorithm, which parses text from 277 modern art exhibits in 4 New York City art galleries –– David Zwirner, Gagosian, Gladstone, and Hauser and Wirth. As expected, it’s laughably bad. But it’s a start. Now that I can see what it’s generating, I can work on specific improvements. The genesis of this idiotic project can be found in an earlier blog post.


How the sausage is made…

Step 1: Develop intake scripts to programmatically build up corpora of text

  • My scripts crawl 4 New York City art galleries, gathering content from…

  • 483 individual pieces of art described in…

  • 9,467 sentences with…

  • 281,800 words composed of…

  • 1,550,999 characters
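
The post doesn't show the intake scripts themselves, so here is a minimal sketch of what that kind of crawl could look like. The URLs, selectors, and function names are placeholders I'm assuming for illustration, not the galleries' actual page structure.

```python
# Minimal intake sketch: fetch exhibition pages and accumulate raw text.
# The URLs and the choice of <p> tags are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

GALLERY_PAGES = [
    "https://example.com/david-zwirner/exhibitions",   # placeholder URLs
    "https://example.com/gagosian/exhibitions",
    "https://example.com/gladstone/exhibitions",
    "https://example.com/hauser-wirth/exhibitions",
]

def scrape_exhibit_text(url: str) -> str:
    """Return the visible paragraph text from one exhibition page."""
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))

def build_corpus(urls) -> str:
    """Concatenate the text of every page into one big corpus string."""
    return "\n\n".join(scrape_exhibit_text(u) for u in urls)

if __name__ == "__main__":
    corpus = build_corpus(GALLERY_PAGES)
    print(len(corpus), "characters collected")
```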

Step 2: Sift through the corpora to locate sentences with 1 or more stem words

  • Stem words are simply words that share the same word stem, e.g., run, runner, and running all share the stem run

  • I am not doing any fancy lookups of correlated words with weighted affinities

  • I generate sentences from a string of related stem words
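
A minimal sketch of this sifting step, assuming NLTK's Porter stemmer and sentence tokenizer; the actual project code isn't shown in the post, so the function below is only illustrative.

```python
# Sketch of the sifting step: keep any corpus sentence that contains at
# least one word whose Porter stem matches one of the query stems.
# Requires the NLTK "punkt" tokenizer data (nltk.download("punkt")).
from nltk.stem import PorterStemmer
from nltk.tokenize import sent_tokenize, word_tokenize

stemmer = PorterStemmer()

def sift(corpus: str, query_words) -> list[str]:
    """Return sentences containing one or more of the query word stems."""
    targets = {stemmer.stem(w.lower()) for w in query_words}
    hits = []
    for sentence in sent_tokenize(corpus):
        stems = {stemmer.stem(w.lower()) for w in word_tokenize(sentence)}
        if stems & targets:            # any overlap counts as a match
            hits.append(sentence)
    return hits

# e.g. sift(corpus, ["grapple", "quest", "yearn"])
```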

Step 3: Run transformations on my generated sentences to produce novel output

  • Replace all names (proper nouns) with generated names of the same gender

  • Replace all nouns, adjectives, verbs, and adverbs with their shortest synonym/counterpart

  • Print out 4 sentences
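
Here is a rough sketch of the shortest-synonym substitution, assuming NLTK part-of-speech tags and WordNet synsets; the name-swapping step is left out, and none of this is the project's actual code.

```python
# Sketch of the "shortest synonym" transformation: POS-tag a generated
# sentence, then swap each noun/verb/adjective/adverb for the shortest
# lemma found among its WordNet synsets. Requires nltk.download("punkt"),
# nltk.download("averaged_perceptron_tagger"), nltk.download("wordnet").
import nltk
from nltk.corpus import wordnet as wn

PENN_TO_WN = {"N": wn.NOUN, "V": wn.VERB, "J": wn.ADJ, "R": wn.ADV}

def shortest_synonym(word: str, penn_tag: str) -> str:
    """Return the shortest WordNet lemma for this word/POS, or the word itself."""
    pos = PENN_TO_WN.get(penn_tag[:1])
    if pos is None:
        return word
    lemmas = {l.replace("_", " ") for s in wn.synsets(word, pos=pos)
              for l in s.lemma_names()}
    return min(lemmas, key=len) if lemmas else word

def transform(sentence: str) -> str:
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    return " ".join(shortest_synonym(w, t) for w, t in tagged)

# transform("The artist grapples with youthful immortality")
```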


Stems = grapple, quest, yearn…


Stems = youth, glory, immortality


Stems = digital, internet, technology…


A few directions I’m interested in pursuing

  • Getting simple stuff out of the way:

    • Article noun agreement – She ate a apple

    • Subject verb agreement – She go to the movies.

    • Tense agreement: He goes to the movies with his friend to saw the film

  • Establish context better before fetching content from the corpora – don't just rely on word stemming. Definitely look into lemmatizing. Also, with WordNet, you can use hyponyms and hypernyms to get word substitutes that more closely align with the sense of a word in the context of the sentence from which it was plucked (pretty cool!). A small WordNet sketch follows this list.

  • Dynamically deriving my own set of context-free grammars from the corpora – I've been reading up on CFGs, and how they're used in simple client-side JavaScript libraries. The cool thing about them is that when they're expanded, they can be recursive. So a sentence can consist of a noun phrase + a verb phrase, but any noun phrase might itself contain another noun phrase + verb phrase. For example: this is the house that Jack built => this is the lake that lies next to the house that Jack built. A toy expansion sketch also follows this list.

  • Hooking up a frontend to a backend to make the corpora interactive 😃
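
As a follow-up to the lemmatizing/WordNet bullet above, here is a small sketch of the difference between stemming and lemmatizing, and of hypernym/hyponym lookups; the words chosen are arbitrary examples, not anything from the project.

```python
# Lemmatizing vs. stemming, plus hypernym/hyponym lookups via WordNet.
# Requires nltk.download("wordnet") (and "omw-1.4" on newer NLTK builds).
from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.corpus import wordnet as wn

print(PorterStemmer().stem("studies"))                     # 'studi' (crude stem)
print(WordNetLemmatizer().lemmatize("studies", pos="v"))   # 'study' (real word)

# Hypernyms generalize a sense, hyponyms specialize it; either can supply
# substitutes that stay closer to the sense of the original word.
painting = wn.synsets("painting", pos=wn.NOUN)[0]
print(painting.hypernyms()[0].lemma_names())               # broader terms
print([h.lemma_names()[0] for h in painting.hyponyms()[:5]])  # narrower terms
```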
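And here is a toy sketch of recursive CFG expansion, in Python rather than the client-side JavaScript libraries mentioned above; the grammar and vocabulary are invented purely to show the "house that Jack built" style recursion.

```python
# Toy recursive context-free grammar expansion. A noun phrase (NP) can
# expand to another NP plus a relative clause, which is what lets
# "the house that Jack built" grow into "the lake that lies next to the
# house that Jack built". Grammar and vocabulary are made up.
import random

GRAMMAR = {
    "S":  [["this is", "NP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],   # second rule recurses
    "VP": [["Jack built"], ["lies next to", "NP"]],
    "N":  [["house"], ["lake"], ["gallery"]],
}

def expand(symbol: str, depth: int = 0) -> str:
    """Expand a symbol, falling back to the shortest rule once deep enough."""
    if symbol not in GRAMMAR:
        return symbol                       # terminal: emit as-is
    rules = GRAMMAR[symbol]
    rule = rules[0] if depth > 3 else random.choice(rules)
    return " ".join(expand(tok, depth + 1) for tok in rule)

print(expand("S"))  # e.g. "this is the lake that lies next to the house that Jack built"
```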