Sunday, 15 December 2019

Machine learning has revealed exactly how much of a Shakespeare play was written by someone else

quote [ Literary analysts have long noticed the hand of another author in Shakespeare’s Henry VIII. Now a neural network has identified the specific scenes in question—and who actually wrote them. ]

Full article in extended.

by Emerging Technology from the arXiv
Nov 22, 2019

For much of his life, William Shakespeare was the house playwright for an acting company called the King’s Men that performed his plays on the banks of the River Thames in London. When Shakespeare died in 1616, the company needed a replacement and turned to one of the most prolific and famous playwrights of the time, a man named John Fletcher.

Fletcher’s fame has since faded. But in 1850, a literary analyst named James Spedding noticed a remarkable similarity between Fletcher’s plays and passages in Shakespeare’s Henry VIII. Spedding concluded that Fletcher and Shakespeare must have collaborated on the play.

The evidence comes from studies of each author’s linguistic idiosyncrasies and how they crop up in Henry VIII. For example, Fletcher often writes ye instead of you, and ’em instead of them. He also tended to add a word such as sir, still, or next to a standard pentameter line, creating an extra syllable.
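The marker words mentioned above lend themselves to a simple count. The following is a minimal sketch, not the method used in the study: the `MARKERS` tuple and the `tokenize` and `marker_rates` helpers are hypothetical names introduced here for illustration, and the sample string is a toy input rather than a quotation from any play.

```python
import re
from collections import Counter

# Hypothetical marker list, taken from the examples in the article.
MARKERS = ("ye", "you", "'em", "them")

def tokenize(text):
    # Lowercase, normalise curly apostrophes, and keep a leading
    # apostrophe so that "'em" survives as its own token.
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"'?[a-z]+", text)

def marker_rates(text):
    # Occurrences of each marker per 1,000 tokens of the text.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {m: 1000 * counts[m] / total for m in MARKERS}

if __name__ == "__main__":
    # Toy input, not a quotation.
    print(marker_rates("ye ye you 'em"))
```

Comparing such rates between passages only hints at a preference for one form over the other; on its own it says nothing certain about who wrote a passage.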

These characteristics allowed Spedding and other analysts to suggest that Fletcher must have been involved. But exactly how the play was divided is highly disputed. And other critics have suggested that another English dramatist, Philip Massinger, was actually Shakespeare’s coauthor.

Which is why analysts and historians would dearly love to determine, once and for all, who wrote which parts of Henry VIII.

Enter Petr Plecháč at the Czech Academy of Sciences in Prague, who says he has solved the problem using machine learning to identify the authorship of more or less every line of the play. “Our results highly support the canonical division of the play between William Shakespeare and John Fletcher proposed by James Spedding,” says Plecháč.

The new approach is straightforward in principle. Machine-learning algorithms have been used for some years to identify distinctive patterns in the way authors write.

The technique uses a body of the author’s work to train the algorithm and a different, smaller body of work to test it on. However, because an author’s literary style can change throughout his or her lifetime, it is important to ensure that all the works used date from the same period and so share the same style.

Once the algorithm has learned the style in terms of the most commonly used words and rhythmic patterns, it is able to recognize it in texts it has never seen.
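As a rough illustration of learning a style from the most commonly used words, here is a minimal sketch. It uses a nearest-centroid comparison with cosine similarity, which is a stand-in chosen for brevity and not the classifier described in the paper; rhythmic patterns are left out entirely. All function names (`top_words`, `vectorize`, `train`, `predict`) and the toy training strings are assumptions made for this example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep a leading apostrophe so "'em" stays one token.
    return re.findall(r"'?[a-z]+", text.lower().replace("\u2019", "'"))

def top_words(texts, n):
    # The n most frequent words across all training texts are the features.
    counts = Counter()
    for t in texts:
        counts.update(tokenize(t))
    return [w for w, _ in counts.most_common(n)]

def vectorize(text, vocab):
    # Relative frequency of each vocabulary word in the text.
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]

def centroid(vectors):
    # Component-wise mean of a list of equal-length vectors.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train(samples_by_author, n_words=50):
    # samples_by_author maps an author label to a list of training texts.
    all_texts = [t for texts in samples_by_author.values() for t in texts]
    vocab = top_words(all_texts, n_words)
    centroids = {
        author: centroid([vectorize(t, vocab) for t in texts])
        for author, texts in samples_by_author.items()
    }
    return vocab, centroids

def predict(text, vocab, centroids):
    # Pick the author whose average profile is most similar to the text.
    vec = vectorize(text, vocab)
    return max(centroids, key=lambda a: cosine(vec, centroids[a]))

if __name__ == "__main__":
    # Toy strings, not real play text.
    samples = {
        "A": ["you and them you and them", "you see them"],
        "B": ["ye and 'em ye and 'em", "ye see 'em"],
    }
    vocab, centroids = train(samples, n_words=10)
    print(predict("ye know 'em", vocab, centroids))
```

In practice the held-out test texts would be scored the same way as the final call to `predict`, to check that the learned profiles generalise before applying them to a disputed text.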

Plecháč follows exactly this technique. He first trains the algorithm to recognize Shakespeare’s style using other plays written at the same time as Henry VIII. These plays are The Tragedy of Coriolanus, The Tragedy of Cymbeline, The Winter’s Tale, and The Tempest.

He then trains the algorithm to recognize the work of John Fletcher using plays he wrote at this time—Valentinian, Monsieur Thomas, The Woman’s Prize, and Bonduca.

Finally, he lets the algorithm loose on Henry VIII and asks it to determine the author of the text, using a rolling window technique to scroll through the play.
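A rolling window of this kind can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the window size, the per-line majority vote, and the names `rolling_windows`, `attribute_lines`, and `toy_classify` are all hypothetical, and the classifier here is a deliberately crude placeholder that any real model would replace.

```python
from collections import Counter

def rolling_windows(lines, size, step=1):
    # Yield (start_index, window) for each consecutive block of `size` lines.
    for start in range(0, max(len(lines) - size, 0) + 1, step):
        yield start, lines[start:start + size]

def attribute_lines(lines, classify, size, step=1):
    # Classify each window as a whole, then let every line collect one
    # vote from each window that covers it; the majority label wins.
    votes = [Counter() for _ in lines]
    for start, window in rolling_windows(lines, size, step):
        label = classify(" ".join(window))
        for i in range(start, start + len(window)):
            votes[i][label] += 1
    return [v.most_common(1)[0][0] if v else None for v in votes]

def toy_classify(text):
    # Placeholder classifier: label "B" when "ye" outnumbers "you".
    words = text.lower().split()
    return "B" if words.count("ye") > words.count("you") else "A"

if __name__ == "__main__":
    # Toy lines, not real play text.
    lines = ["you a", "you b", "you c", "ye d", "ye e", "ye f"]
    print(attribute_lines(lines, toy_classify, size=3))
```

Lines near a change of label receive votes from windows on both sides of it, which is one simple way a passage could come out looking mixed rather than cleanly assigned.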

The results are interesting. They tend to agree with Spedding’s analysis that Fletcher wrote scenes amounting to almost half the play. However, the algorithm allows a more fine-grained approach that reveals how the authorship sometimes changes not just for new scenes, but also towards the end of previous ones. For example, in Act 3, Scene 2, the model suggests a mixed authorship after line 2081 and finds that Shakespeare takes over completely at line 2200, before the start of Act 4, Scene 1.

Plecháč also trained his model to recognize the work of Philip Massinger but finds little evidence of his involvement. “The participation of Philip Massinger is rather unlikely,” he concludes.

That’s interesting work that shows how linguists and literary analysts are using machine learning to better understand our literary past.

However, there is much work ahead. For example, when machine vision algorithms were trained to recognize artistic style, computer scientists quickly worked out how to extract a style and apply it to other images, using a technique known as neural style transfer. Overnight, it became possible to give an ordinary photograph the style of a Van Gogh or a Monet.

That raises the question of whether a similar technique is possible for text. Might it be possible to transform an essay, or indeed an article for MIT Technology Review, into the style of Shakespeare or John Fletcher, for example?

Sadly, not yet, other than in the trivial way of replacing words like them with ’em and so on. This is largely because the underlying structure of communication is not well enough understood by linguists or their algorithmic charges.

Ref: Relative Contributions of Shakespeare and Fletcher in Henry VIII: An Analysis Based on Most Frequent Words and Most Frequent Rhythmic Patterns

[SFW] [literature] [+5 Interesting]
[by snowfox@12:05pmGMT]


Bruceski said @ 7:40pm GMT on 15th Dec [Score:2 Underrated]
Interesting, but I'd be more satisfied if there were works to test this process on that had known results. If there aren't confirmed Shakespeare collaborations (I don't think there are but I am not a historian) then point it at other authors. Analyze Good Omens and see if Gaiman agrees with the results.

The big flaw of machine learning is figuring out if the machine's biases are what you want them to be, and a single result doesn't help that.
takajou said @ 1:01am GMT on 16th Dec [Score:5 Informative]
Funny you should say that. Someone did in fact run a text analysis program on Good Omens.
Details here.
zarathustra said @ 2:22am GMT on 16th Dec
I wonder, if you ran Good Omens through this guy's analysis, whether it would say Shakespeare wrote some of it.
Bruceski said @ 4:53am GMT on 17th Dec
Oh neat!
takajou said[1] @ 1:00am GMT on 16th Dec [Score:1 Underrated]
One of these days I'll learn to SE.
LacheChance said[2] @ 6:26am GMT on 16th Dec
Are you claiming to be a bot trying to learn how to post by analyzing comments from the original site?
Paracetamol said @ 7:48pm GMT on 17th Dec
He means replying to comments in the right thread. Look at his comment revision.
TM said @ 9:00pm GMT on 16th Dec
I was pleasantly surprised to find this was a real article about real textual analysis. Gotta confess I was expecting another of those interminable "why Shakespeare could not possibly have written Shakespeare's plays and poetry, it was actually [choose your favorite candidate]."
