For my corpus analysis, I decided to use an extremely comprehensive collection of William Wordsworth’s poetry and books. Wordsworth was a prolific writer and poet. His juxtaposition of nature and happiness has always been interesting to me, and I thought understanding more about his core diction and word use might really help me break down the general direction of his works. Throughout much of my English career, I’ve focused mostly on feeling and word arrangement, not something as mechanical as word count and word variety. Analyzing a highly emotional work through a detached lens can add a new depth to it.
Most of the text was pulled directly from Project Gutenberg. I pulled together several collections of poetry and short stories, focusing exclusively on the stories themselves. Wordsworth actually loved to comment on his own stories, and for the purpose of this assignment that commentary wasn’t necessary, so I removed most if not all of his afterwords on his own works. I also of course removed all of the Gutenberg boilerplate and other filler. After doing that, I saved the raw text to a Word document on my computer, and then used that for Voyant.
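For anyone who would rather script this cleanup than do it by hand, here is a minimal Python sketch. It assumes the downloaded file uses the usual "*** START OF ... PROJECT GUTENBERG EBOOK ... ***" and "*** END OF ..." marker lines; the function name `strip_gutenberg` is made up for illustration, and it does not touch an author's own notes or afterwords, which would still need to be removed manually.

```python
import re

# Assumption: the file contains the usual Project Gutenberg marker lines.
# Older files say "THIS" instead of "THE", so both are accepted.
START = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*",
    re.IGNORECASE | re.DOTALL,
)
END = re.compile(
    r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK",
    re.IGNORECASE,
)


def strip_gutenberg(text: str) -> str:
    """Return only the text between the START and END marker lines.

    If a marker is missing, that side of the text is left untouched.
    """
    start = START.search(text)
    if start:
        text = text[start.end():]
    end = END.search(text)
    if end:
        text = text[:end.start()]
    return text.strip()
```

The cleaned string can then be saved as plain text and uploaded to Voyant like any other file.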
For my first use of Voyant, I decided to focus exclusively on Wordsworth’s use of nature in conjunction with emotion. My question was specifically about the dominant emotion used in conjunction with his nature motifs (a highly common theme in his works). Voyant came up with Man, Day, Life, and Time showing up more than 1200 times throughout the corpus. Just based on these four words, I decided to slightly adjust my question to look at mortality and the passage of time in relation to human life. To delve further into that question, I then searched for instances of death, night, and mortality.
My first search actually gave me a rather interesting result. First of all, one of my keywords had no matches: Death didn’t show up even a single time. I then tweaked the search to include variants of death such as dying, dead, and other similar words. This returned a modest 500, compared to the almost 1300 matches of Life, and that is without counting words similar to life like I did with death. Night, on the other hand, did show up 700 times, but Day still outnumbered it more than twice over at 1500. This led me to take a rather different approach. Deciding that I was focusing too much on physical aspects, I chose to focus on negative and positive emotions rather than negative and positive physical states. Rather than correlating positive imagery with negative imagery, I would see if Wordsworth had correlated positive imagery with positive emotions and vice versa.
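The same kind of "word plus its variants" count can be sketched outside of Voyant. The snippet below is only an illustration: the helper names `word_counts` and `family_count` are hypothetical, the variant list is whatever you choose to pass in, and because it lowercases everything and matches whole words only, its totals will not necessarily agree with Voyant's.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count lowercase word tokens, keeping simple apostrophes (e.g. o'er)."""
    return Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))


def family_count(counts: Counter, variants: list[str]) -> int:
    """Add up the counts for a hand-picked list of related word forms."""
    return sum(counts[v] for v in variants)


# Tiny made-up sample, not taken from the corpus.
sample = "Death and the dead; dying is not Death. Life, life!"
counts = word_counts(sample)
death_family = family_count(counts, ["death", "dead", "dying"])
life_total = counts["life"]
```

Lowercasing before counting matters here: a search that treats "Death" and "death" as different words can report zero matches for one of them even when the other is common.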
My next search showed that like had over 1700 matches, with love having almost 1200, nearly tied with heart. So far the positive emotions just about match the positive symbols in terms of recurrence in the corpus, so I then decided to try their negative opposites.
Surprisingly, the opposite of love, hatred, once again had a staggering 0 matches in the corpus. I decided to try variants of hatred, hoping that perhaps I’d get a number equivalent to my variants of death, and came back with a measly 30 matches. Dislike also fizzled out, its variants only appearing 38 times. This really shocked me, because while Wordsworth is known to juxtapose life and death using nature, his use of emotional words doesn’t necessarily reflect that same juxtaposition. That being said, I thought this was a fascinating insight into the way Wordsworth works. The fact that he manages to instill such feelings without explicitly using the words that name them really lends credit to the subtle nature of his works, and suggests that he goes out of his way to avoid simple variations of words when writing.
Ngram gave me another insightful look into the way Wordsworth writes, by showing how the words in my word pool compare across literature more broadly. Using Ngram, I searched for instances of nature, love, death, hatred, hate, dislike, and life over 200 years in all books registered on Google. What shocked me is how rarely hatred and its variants, as well as dislike, actually show up in literature! They were easily at the bottom of the search, eclipsed by the other words in my searches by anywhere from 20 to 50 times! When we consider that love occurs 40 times more often than hatred in Wordsworth’s corpus, it seems that strong negative emotions are just not that common in writing overall. This was quite a shock to me. I was expecting them to be lower than their positive equivalents, but I expected somewhat of a close race. The astronomically different occurrences really made me rethink my conclusions from Voyant. Perhaps Wordsworth wasn’t as subtle as I previously believed? His usage of negative emotional words isn’t actually that unusual when compared to a vast number of other works of fiction.
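To make the size of these gaps concrete, the rounded Voyant counts reported above can be turned into simple ratios. This is just arithmetic on the approximate figures quoted in this post ("almost 1300", "almost 1200", and so on), not a re-run of the analysis, and the variable names are only for illustration.

```python
# Approximate counts as quoted in the text (rounded, not exact).
reported = [
    ("life", 1300, "death and variants", 500),
    ("day", 1500, "night", 700),
    ("love", 1200, "hatred and variants", 30),
]


def ratio(more: int, fewer: int) -> float:
    """How many times more often the first word appears than the second."""
    if fewer == 0:
        raise ValueError("cannot compare against a word with zero matches")
    return more / fewer


ratios = {
    (pos, neg): round(ratio(pos_n, neg_n), 1)
    for pos, pos_n, neg, neg_n in reported
}

for (pos, neg), r in ratios.items():
    print(f"{pos} vs {neg}: about {r}x")
```

Seen this way, life versus death and day versus night are gaps of roughly two to three times, while love versus hatred is an order of magnitude larger, which falls inside the 20 to 50 times range that Ngram showed for literature in general.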
Ultimately, I learned some conflicting information about my corpus. Predictably, positive emotions and romantic associations are much more common than their negative counterparts. When considering the progression of stories, negative states are generally the height of conflict and therefore rarer. Even so, I didn’t expect such an extreme disparity between them. I was expecting the negative counterparts of words such as like, love, or happy to have at least a third as many occurrences, but negative emotions are vastly eclipsed. I still found this a fascinating experience that really upended my own basic perceptions and biases in regard to literature. Being able to dig into raw data can really help us understand the general direction and tone of works, and lets us approach a work that we might define by “feeling” more mechanically, adding layers of depth that wouldn’t be available without the tools we have today. My advice to a text analysis newbie is to come into the experience with a strong expectation. Using tools like Voyant or Ngram is much more meaningful when they blow your preconceived notions out of the water. Going in without expecting or knowing anything about the corpus would really lower the amount of contrast you get, so a basic understanding of your corpus and some prior knowledge are a must in my opinion. Overall, I believe this is a very valuable tool for extracting a different feeling from an in-depth corpus.