Solving Twitter’s abuse problem

Twitter is awesome, but unless it gets its abuse problem under control, it’s going to struggle to attract new users.

28 min read · Feb 29, 2016

I was born in the early 1980s. Bullying was something that happened. A lot. To me, to people around me, to my friends, and to people I knew. Throughout all of this, there was a reprieve. With a few very rare exceptions, as soon as the bullying victim got home, threw their backpack into the corner of their bedroom and sank into the sofa to bury themselves in a book, they were safe. Bullying was something that happened in the playground. On the way to and from school. In the youth centre. In the shop. At school. At work.

But once you were home, you were safe. That was the 1990s. And then everything changed.

Warning: This article is about abuse and harassment online. As you might expect, it contains some strong language and some unpleasant concepts. If that bothers you, I’m afraid this article isn’t for you.

Enter the Internet

The internet happened. A new world of information, entertainment, and a chance to escape, for sure. But also an avenue for abuse, harassment, and bullying, with an additional pinch of insidious venom added to the mix: A combination of availability and anonymity.

The 24/7 nature of the internet meant that in a period of time spanning less than a generation, people moved a large portion of their lives from meatspace to cyberspace. You’d be crazy not to, really: There’s such a vibrant, incredible world on the internet. There are hobbies to share, and conversations to be had. But it also meant that closing your home’s door behind you was no longer a guarantee that you were safe from bullying.

As for anonymity… It doesn’t take a lot of imagination to see how, when the risk of being caught is taken away, bullying can become a much more effective pursuit for those so inclined.

Why is this such a big problem for social media?

For this article, I’ve largely focused on Twitter, both in thinking and in research, not least because it’s the platform closest to my heart — 2016 marks my 10th anniversary as a Twitter user.

Twitter is the #1 platform for breaking news. Facebook dreams of having even a fraction of the influence Twitter has on this front. Twitter is the must-have platform for journalists and media professionals. The public frequently learn of breaking news stories via Twitter, and, come to think of it, rarely does a news broadcast go by without mentioning Twitter. Never has this been more obvious than when the presidential hopefuls are meeting to debate ahead of the 2016 elections here in the US.

But there’s a dark side too: Imagine you stumble onto Twitter for the first time and start looking at the #GOPdebate hashtag. You’ll be met with stuff like this:

I’m no expert, but that sounds a bit… slander-y to me.

And this…

And this…

There’s an excellent reason why most people avoid reading ‘below the fold’ (i.e. the comments) on the internet. The comment section is where, from behind a veil of anonymity, people’s personality quirks come to play with impunity, whether that’s racism, wild rumours based on very little indeed, name calling, or any number of other sins.

The blessing and the curse for Twitter is that it is an oft-used medium for online debate about offline events. Sports events, political events, breaking news stories; the live commentary is a huge part of why Twitter is awesome. At the Oscars, for example, when the Best Actor was announced, Twitter peaked at 440,000 tweets per minute. That’s mental.

But that’s also the problem: The nature of these events means that this might well be someone’s first interaction with the platform, and when it is, they’re likely to come across a tremendous amount of content that might make them think twice about signing up for an account. It’s easy to imagine someone reading the three tweets listed above, and doing the digital equivalent of peeking into a bar they’re not familiar with. One quick look, and they’ll go “You know what? I don’t like the look of this, let’s go somewhere else”.

It’s no secret that Twitter is struggling to continue to grow, and I’d suggest that the above is at least part of the reason for this.

In this article, I’m exploring abuse and harassment on social media, especially on Twitter. I’m doing so in an effort to see whether it’s possible to come up with some solutions, or at least some suggestions, for how to help reduce what is a huge problem: big enough, in fact, that it may be a significant hindrance to further growth of my favourite social media platform.


Part 2 — Why creating a sensible policy on online abuse is complicated

Everybody agrees they want freedom, but they can’t agree what ‘freedom’ means.

Whenever you talk about doing something about online abuse, you’ll generally find that people are split into two camps.

Both camps agree that what they want is freedom, and you’d think that means they want the same thing; but that’s not the case. They disagree on the type of freedom they want; they disagree on what freedom means.

In short, it’s a dichotomy between freedom to and freedom from.

Freedom to Speak

One side of the argument believes in a freedom to say anything — Freedom of Speech. As a journalist, I’m an ardent believer in freedom of speech. A government that tries to repress the free sharing of opinions, ideas, and information is inherently problematic.

Those arguing for an unfettered freedom of speech on social media will often argue that freedom of speech should be unabridged and unrestricted, without any limits. The argument goes that freedom of speech is protected by the constitution (in the US) and various other laws (elsewhere), and that preventing anyone from saying anything is an infringement on those rights.

Suggestions that some types of content aren’t welcome on a platform are often met with comments such as this one:

Freedom from Abuse

The other camp want a different type of freedom: They are usually also believers in freedom of speech, but with certain limitations. Those limitations are usually related to threats, abuse, and illegal activities.

Put simply, they argue for a freedom from certain types of communication, especially when it is aimed at them, and when it causes harassment, distress or alarm. This includes abusive language, threats of violence, or threats to have personal information shared (‘doxxing’).

As I’ll discuss in Abusive or Not below, there are cases where something is obviously not abuse (although it might seem so at first glance), and where something is very obviously abusive. Those matters are less problematic than the gray areas in between — It is hard to find a consensus about where the limits are for where a conversation crosses over from a heated debate into abuse or harassment, for example.

The friction is where the freedoms clash

Neither on the internet nor offline (e.g. Schenck v United States) is anyone granted a complete freedom of communication. Sharing a database of stolen credit card details, peddling child pornography, and joking about blowing up an airline will all result in police knocking on (or, in some cases, knocking down) your door.

In addition, being banned from expressing certain things on a site like Twitter or Facebook is not an infringement on one’s freedom of speech. Social media platforms are private companies, and they are within their rights to determine what they think is appropriate behaviour for the platform. In fact, there are already differences between Facebook and Twitter in this regard: Facebook has a strict no-nudity rule, whereas Twitter doesn’t. Seen from a freedom of speech / first amendment perspective, nobody is prevented from starting their own social network, where the types of communication banned from other platforms are welcome.

If we agree that absolute freedom isn’t wanted, and it’s impracticable to manually review every dubious tweet, the solution has to lie somewhere in between those two extremes.

Elsewhere in this article, I discuss the technological challenges of automatically determining what is abusive and what isn’t, but policy goes beyond what is possible; it’s about what you should be doing.

Creating an abuse policy

I helped moderate the comments at a major news site for a while, and I was part of the team that put together a user-generated content policy for a major UK broadcaster. The most important lesson I learned in the process is that policing content is immensely complicated.

The content that causes the most problems is the same content that people are emotionally attached to. When that content is removed, it evokes some rather extreme reactions, from the type of people who were likely to post abusive content in the first place. Creating policies and policing them is a thankless, horrible job — but it’s an important one.

Moreover, the act of creating a policy in the first place is contentious. A site such as Twitter has a decade’s worth of unwritten rules and a culture of sorts. The first time a comprehensive policy is created and disseminated, it will be the first time the site draws a circle around what is OK and what is not, which will almost certainly cause a huge amount of debate and upheaval — especially among the users who operate close to, and occasionally beyond, the lines of acceptability.

There’s also a consideration around the different types of abuse, or the wider ‘safety’ element of social media: phishing, scamming, doxxing, threats, harassment, impersonation, encouraging people who are already suicidal to ‘go through’ with the deed, swatting, sharing / distribution of illegal content, facilitation of crime (e.g. advertising illegal goods and services), aiding / encouraging illegal activity, and potentially even copyright infringement and plagiarism issues. It’s hard to overstate how big the topic is, but it would be of great help to at least clarify some of the Twitter Rules as a start.

The current section on abusive behaviour in ‘Twitter Rules’ is actually pretty good, but it is both under-enforced and inconsistently enforced.

My policy experience is relatively limited, so I’ll leave the specifics to a group of people more qualified and experienced than myself. I will weigh in on this, however:

It’s crucial that any actions that are taken as a response to abuse are taken transparently.
There must be a way to correct mistakes made in the process of implementing the policy.
There needs to be a clear set of guidelines for what is OK and what isn’t.
Reported content must be dealt with as quickly as possible. In the case of Twitter, a news source that operates in the fractions-of-a-second timeframe, rather than the hours or days of most other websites, I’d suggest that ‘as quickly as possible’ should mean ‘less than an hour’.
The people dealing with abuse need a strong suite of tools to help draw contextual information from the posts they’ve had reported.

An aside on public figures

A complication in the creation of an abuse policy is the discussion of ‘public figures’ — people who actively invite public attention, including musicians, artists, and political figures.

An abuse policy would probably need to address all of these issues, both for private citizens participating in a public debate, and for public figures, who, for better or worse, often have different expectations around the sort of things they ought to be able to tolerate.

Conclusion

Creating an abuse policy is difficult, and impossible to make ‘correct’ for everyone involved: Emotions run high.
You cannot placate both the people who believe they deserve the freedom to say anything, and the people who believe they deserve a freedom from abuse. A good abuse policy is likely to impinge on what both of these groups perceive as their rights.
It isn’t easy, but it’s important that any policy that is implemented is transparent, easy to understand, and hard to misinterpret.
People have different tolerances and expectations for abuse, and a policy should either reflect that, or pick an appropriate middle ground.


Part 3 — Abusive or not — A challenge of sentiment analysis

“‘Fucking cocksucker scumbag asshole’, for example, is relatively unlikely to be meant as a phrase of endearment”

For the purposes of learning more about abuse and harassment online, I decided to take a deep dive into social media’s murky waters.

Given that I’ve done a few projects using Twitter’s API in the past (I analysed who Twitter’s Verified users are, and did a project on racism on Twitter), I decided to limit my analysis to Twitter.

Abusive or not, here I come…

Trying to determine whether any given tweet is abusive is largely a problem of sentiment analysis. It is the step between “I can transcribe your words” and “I understand what you mean”.

For example, when you ask Siri “What is 5 degrees in Fahrenheit”, and Siri responds with “5 degrees Fahrenheit is 5 degrees Fahrenheit”, it sounds quite philosophical, really, but it isn’t actually all that useful. While Siri correctly heard what I said, she didn’t understand what I meant.

I ran into similar problems when trying to analyse tweets and determine whether or not they were to be considered ‘abusive’. I already suspected that I would need to look beyond the words per se, but I hadn’t realised quite how much of a challenge that would be.

It turns out, for example, that there’s a sizeable group of people on Twitter who say absolutely horrendous things to each other. At first glance, it seems like these people are very close to digging out the machetes and having at each other… Except the torrents of abuse that are unleashed on each other are a sociolect of sorts, where the people communicating with each other use a vocabulary that would make a late-night sysadmin blush.

This isn’t even close to the worst of the worst: This is a perfectly normal day on Twitter.

For example, early on in my research, my abuse analysis bot picked up a conversation because of a tweet that said ‘you fucking slut’. It received a rather high abuse score, as you might imagine, but I had to laugh when I looked into the context:

(tweet recreated after I managed to misplace the screenshot. Apologies.)

This tweet was the epiphany for me, and reminded me of a situation I like to refer to as the James is a cunt problem — more about that in just a moment.

People do say all sorts of things to each other, and while one phrase may be acceptable among your closest friends after a few beers, you wouldn’t tweet it at your acquaintances or work mates. And even then, that isn’t necessarily true either: maybe you do have a group of acquaintances between whom coarse language is OK.

Imagine, for a moment, then, that you’re trying to understand all of the above from a tweet that just reads @username is a cunt. How do you determine whether this tweet is abusive, just from those four words? You couldn’t. You can’t.

I probably wouldn’t tweet that exact phrase, even at my best friends. But others might. In fact, they do. Frequently.

Sometimes, spotting abuse is easy.

It’s pretty hard to see how this could be anything other than a threat.

… and …

This one is part of a huge, ongoing abuse campaign against a prominent feminist blogger. Definitely not a joke.

… and …

This one is taking the abuse to another level: Not just a threat and an instance of abuse, but also a promise of a continued campaign of abuse from any number of Twitter accounts, against another prominent female blogger. Twitter has now suspended this account, but, as the tweeter points out, there’s more where that account came from.

… and …

There were dozens of instances of this particular abuse, from dozens of different accounts. It’d be hard to argue that this is a prank.

… and …

It is hard to imagine a scenario or context where this is a prank, a joke, or a misunderstanding. This tweet was aimed at a blogger who has been the target of sustained abuse for years. I’d argue this particular example is probably serious enough to warrant a call to the police.

Okay, you got the gist, and I’ll leave it at that for now.

I’ve read more than 100,000 abusive tweets over the past couple of months. Suffice it to say that it gets a little disheartening after a while. There are a lot of… You know, this is already a pretty expletive-laden blog post, and I think this is one of those times where using a soupçon of strong language might just be warranted… There are a lot of hate-filled wanktodgers out there.

But, most importantly… don’t fool yourself into thinking that the above is cherry-picking. It isn’t even close to the worst of the worst: This is a perfectly normal day on Twitter.

The abuse ranges from name calling to death and rape threats, and everything in between. The above tweets all have something in common: My little Twitter bot correctly identified these tweets as abusive. That’s the good news.

Given that it’s relatively easy to identify tweets like these on Twitter, what Twitter can or should do about these posts is another matter. And, considering that there are thousands of them, it’s not going to be a small job to clean that mess up.

The bad news is that it’s this easy to determine something as abusive in just 20% of cases. 15% of the time, it’s downright impossible, and in the remaining 65%, it becomes incredibly, deceptively hard to determine whether what appears to be abuse, harassment and bullying actually is.

Allow me a digression to explain why actually determining whether a given 140-character string is abusive is such a hard problem:

The ‘James is a cunt’ problem

Whenever I speak on the topic of context and abuse, I tend to refer to the challenge of context as the ‘James is a cunt’ problem. The challenge is as follows:

Is it OK to call your co-worker a cunt?

In the US especially, where that word is usually referred to only as the ‘c word’, I’d hazard that for many people, probably most, the answer is an emphatic and enthusiastic ‘no’. In England, it’s still one of the less acceptable swearwords, but it’s in more regular circulation.

Is it OK to have a T-shirt made that says ‘James is a cunt’, and then give it as a gift to James?

“What. The. Actual. Fuck”, you think. And perhaps rightfully so.

In some work places, this is par for the course. I once worked somewhere where this exact interaction happened: I had a t-shirt made for my co-worker that had his name on it, then ‘is a cunt’. Not only did he love it, he wore it. Proudly. At work. At a company with well over a thousand employees. The CEO spotted it, sighed, shook her head, and went on with her business.

At this point, you’re probably wondering whether I’m completely insane (you might just be onto something), or whether you’re missing something.

You are. Missing something, I mean.

The ‘something’ that’s missing from the above narrative is a thick layer of context.

The job in question was well over a decade ago; one of my very first jobs straight out of university, as the online editor for a magazine focusing on modified cars. When they decided to give me the job, the then-editor nervously asked me “er, are you OK with a bit of banter?” “How do you mean?” I asked. “Well, you know… Banter.” “I give as good as I get, I suppose”, I replied, and that was the first flag in the ground for what was to become one of the most outrageous jobs I’ve ever had. The pay was terrible, but I travelled all over the world, got to drive all sorts of bonkers cars, and got myself into all manner of insane situations. It was tremendous fun.

You’ll have to forgive the aside, but I’m trying to explain that there is a backdrop against which giving a t-shirt that reads ‘James is a Cunt’ to a colleague is actually a pretty normal thing to do, but to understand why, you’ll need a lot of context.

All of which can be summarised as follows:

Sometimes, it’s hard to determine abuse

To wit:

If you are just scanning for phrases like ‘you’re a cunt’ (or misspellings thereof), as I was doing when I first started the research project, the above triggers the abuse bot, with a high score. But in this particular case, it probably isn’t abusive: the to-and-fro seems relatively jocular, and there is metadata available that contra-indicates the abuse status (more about metadata a bit later).
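To make that first, naive approach concrete, here is a minimal sketch of a phrase scanner that tolerates some misspellings. It is not the bot I actually ran: the phrase list (two phrases quoted in this article), the character substitutions, and the function names are all assumptions for illustration.

```python
import re

# Hypothetical phrase list: both entries are phrases quoted in this article.
ABUSIVE_PHRASES = ("you're a cunt", "kill yourself")

# Hypothetical look-alike substitutions, so "k1ll" reads as "kill".
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "$": "s", "\u2019": "'"}
)


def phrase_pattern(phrase: str) -> re.Pattern:
    """Build a regex that tolerates stretched letters and a dropped apostrophe."""
    parts = []
    for ch in phrase:
        if ch == "'":
            parts.append("'?")              # "youre" as well as "you're"
        elif ch == " ":
            parts.append(r"\s+")            # any run of whitespace
        else:
            parts.append(re.escape(ch) + "+")  # "cuuunt" as well as "cunt"
    return re.compile("".join(parts))


PATTERNS = {phrase: phrase_pattern(phrase) for phrase in ABUSIVE_PHRASES}


def naive_phrase_hits(tweet: str) -> list[str]:
    """Return the listed phrases found in the tweet, ignoring all context."""
    cleaned = tweet.lower().translate(LOOKALIKES)
    return [phrase for phrase, pattern in PATTERNS.items() if pattern.search(cleaned)]
```

A scanner like this flags banter between friends and a genuine attack in exactly the same way, because it only ever looks at the words.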

What if we make things more complicated?

In this case, user 1 calls user 2 a whore, but user 2 favourites user 1’s tweet. Then they go back and forth for a while. Abusive? Not abusive? On the face of it, it’s hard to say, but I’d err on the side of ‘probably not’. Interactions like this happen by the thousands every week on Twitter.

Only by analysing all the factors (see adding context to analysis, below) do you stand a chance of identifying whether these tweets are actually abusive.

Sometimes, it’s complicated…

Sometimes, my little bot successfully identified barrages upon barrages of abuse against a particular person.

Martin Shkreli, for example, has been in the media a fair bit recently, for hiking the price of a drug, for being under investigation for a Ponzi scheme, for buying the only copy of a Wu Tang album for $2m, and for supposedly causing a multi-day delay to Kanye West’s most recent album.

Whether you love him or loathe him, Shkreli apparently revels in the attention, and his Twitter stream would be ill-advised for someone who is trying to stay out of the limelight. You won’t be surprised to learn that my poor little bot was working some serious overtime dealing with replies to Martin:

All of these tweets happened in a span of a few hours. Shkreli isn’t the most-loved man in the world, for sure. Also shown: Some of the phrases that the algorithm considers abusive. No surprises there.

And yet, while Shkreli does appear to fuel the fire at times, egging on his detractors, some of the attacks move from what I learned to be regular run-of-the-mill abuse, and over into something more sinister…

This tweet broadens the abuse from just Martin to further threatening his sisters, which is a further problem. While it is conceivable that Martin himself revels in the attention, I don’t know whether he has any siblings, far less whether they have given permission to be dragged into the limelight.

The challenge becomes one of policy rather than one of automatic detection: While the attention aimed at Shkreli is abusive in nature (“Fuck you you piece of shit making money off defenseless people fucking cocksucker scumbag asshole”, for example, is relatively unlikely to be meant as a phrase of endearment), is there a limit where a not-so-gentle ribbing turns into actual abuse? This is a challenge both for automated abuse detection algorithms and for the policy team.

And sometimes, it’s impossible…

“If you get a blade sharp enough, it’ll cut through anything”

Some tweets are extremely hard to analyse as to whether or not they are abusive, and highlight why even the most sophisticated algorithms are bound to get stumped from time to time.

One particularly heart-breaking example of that is illustrated by Lindy West in episode 545 of This American Life (there’s a transcript, too, but do listen to it, it is incredible storytelling). In short: Lindy’s father suddenly created a Twitter account and started tweeting at her. Except her father had recently passed away, and the Twitter account was created by one of her abusers. “I thought I was coping”, Lindy says in the episode, “But if you get a blade sharp enough, it’ll cut through anything”.

Taking a step back from the human side of how horrendous it must be to receive such a message, there’s another challenge: No amount of technology can catch this type of abuse.

Imagine being Beck and receiving the tweet “I may just take your advice at your show tomorrow”. On the surface, that looks like a supportive tweet, but there’s an off-line context here: his most famous song includes the phrase ‘I’m a loser baby, so why don’t you kill me’, and ‘taking your advice’ could, in fact, be a death threat.

Or put yourself in the shoes of David Axelrod, a political commentator who has been candid about his father’s suicide, receiving the following:

I can’t even imagine what goes through David’s head when he reads Roy’s tweet. No matter how much you disagree with David, first amendment be damned. That’s never an OK thing to say to another human being.

Actually, this tweet is a bad example, by virtue of the fact that my Twitter bot did correctly identify this tweet as potentially abusive — but it only did so because it at some point learned that ‘kill yourself’ is an indicator for abuse. If the tweet had read “I hope you follow in your father’s footsteps”, it wouldn’t have turned up on anyone’s radar — except, most crucially, David’s.

The point I am trying to make here is that, with the volume of tweets every day, it’s simply impossible for Twitter to filter out all the abuse, even if it wanted to.

And then there’s images

All of the above continues to be true — but gets a hell of a lot more complicated — when you start considering Twitter’s image-as-text problem. This one, for example, is hard to analyse on several levels:

Is it a quote from a TV show (i.e. innocent)? Is it banter among friends (i.e. neutral)? Is it a tweet to a person who is depressed and expressing a desire to end their life (i.e. deeply troubling)?

For my research, I decided to give images a slight score for likelihood of abuse, but realistically, it’s neutral: An image is as likely to be of a kitten as it is of a corpse or a rape threat. Doing image recognition and OCR is possible, of course, but with the volume of images being sent through social media, that particular challenge was beyond the scope of this project.

Adding context to abuse analysis

The biggest conclusion I drew from my research, and in retrospect the most obvious, is that it isn’t the tweets themselves that are the strongest indicator of most internet abuse: it is their context.

In fact, it turned out that I was more successful in identifying abusive tweets when I turned my algorithm inside out: Instead of trying to find tweets that looked abusive and then using context to determine whether they really were, it was easier to find tweets that scored high for abuse by taking context into consideration first, and then dropping any tweets that didn’t contain abusive text in the tweet body.

In part, that can be explained by the simple fact that 140 characters isn’t a lot of text to analyse. Even just digging into the metadata, there was a wealth of information — far more than you can deduce from the tweet itself.

The indicators below are the ones that came up after considering the data sources I had available to me as part of Twitter’s APIs, and making an educated guess as to whether each data item was of use in determining abuse. Subsequently, I ran a large series of experiments on the data set to try to determine the weighting of each indicator. I did have a stab at precise weightings, but I anticipate that a more experienced data scientist than myself could write an automated algorithm to work out how influential each variable is. For the purpose of this article, I’ve abstracted the specific weightings to more generalised descriptions.

Note: For clarity, I’m using the phrases Abuser and Target to identify the sender and recipient of a public tweet, even if the tweet/metadata determines that the tweet is not abusive.

Contextual indicators of abuse:

How many followers does the abuser have? (Fewer than 50 is bad news, and fewer still increases the abuse risk)
How many friends does the abuser have? (Fewer than 50 is bad news, and fewer still increases the abuse risk)
How old is the account the abuser is using? (Less than a month is bad, less than a week is worse, less than a day is really bad news)
What is the ratio of followers-to-friends? (If the abuser follows more people than follow them, that’s an indicator of abuse. If their friend-to-follower ratio is 3:1 or above, it’s a strong indicator)
What is the followers-to-followers ratio between the abuser and the target? (A ratio of 100:1 or higher is a strong indicator of abuse)
How many times has the abuser tweeted? (Fewer than 100 tweets is bad, and it gets worse the fewer tweets there are)
Is the abuser an ‘egg’? (i.e. have they left the default profile picture in place? Yes is bad news)
Has the bot caught the abuser being abusive before? (Once is usually a strong indicator. Habitual abuse sends the risk of abuse off the charts)
Has the target of abuse been abused before? (The more, the higher the risk of future abuse)
Is the tweet part of a conversation? (If it is an at-mention out of the blue, that’s bad news.)
Does the abuser follow the target? (If no, it’s a weak indicator of abuse, but there is some correlation)
Is the target verified? (If yes, it can help inform some of the other indicators of the algorithm. Verified users also seem to get a disproportionate amount of abuse)

Contextual contra-indicators of abuse:

Is the tweet part of a conversation? (The more tweets back-and-forth in that conversation, the less likely it is to be abusive)
Have the abuser and the target had interactions in the past? (Yes is usually a contra-indication, but not always; it could mean two people constantly at each other’s throats, or two people who used to be friends but are no longer)
Does the target follow the abuser? (If yes, it’s much less likely to be abusive)
Do the target and abuser follow each other? (If yes, it’s dramatically less likely to be abusive)
Is the abuser verified? (If so: unlikely to be abusive. In fact, in the six weeks my bot was running, it only found a couple of instances of abuse by verified users; as I’m writing this, those users no longer bear the blue tickmark.)
Did the target favourite the abusive tweet? (A favourite is usually a strong contra-indicator of abuse).
Did the target retweet the abusive tweet? (A retweet can go either way; it is usually positive, but some who are in the crosshairs of abuse retweet their abusers, presumably to shame them).
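To show how indicators and contra-indicators like these could be combined, here is a minimal sketch of a context scorer covering a subset of the lists above. The thresholds in the conditions come from the lists; every numeric weight, field name, and function name is a hypothetical placeholder, since I am deliberately not publishing the real weightings.

```python
from dataclasses import dataclass


@dataclass
class TweetContext:
    """Metadata about one tweet; all field names are hypothetical."""
    abuser_followers: int
    abuser_friends: int
    account_age_days: float
    abuser_tweet_count: int
    abuser_is_egg: bool
    abuser_prior_flags: int       # times the bot has flagged this sender before
    conversation_length: int      # 1 means an at-mention out of the blue
    abuser_follows_target: bool
    target_follows_abuser: bool
    target_favourited: bool


def context_score(ctx: TweetContext) -> int:
    """Add points for indicators, subtract for contra-indicators (weights are made up)."""
    score = 0
    if ctx.abuser_followers < 50:
        score += 10
    if ctx.abuser_friends < 50:
        score += 10
    if ctx.account_age_days < 1:
        score += 30
    elif ctx.account_age_days < 7:
        score += 20
    elif ctx.account_age_days < 30:
        score += 10
    # Friend-to-follower ratio: 3:1 or above is the strong version of the signal.
    if ctx.abuser_friends >= 3 * max(ctx.abuser_followers, 1):
        score += 15
    elif ctx.abuser_friends > ctx.abuser_followers:
        score += 5
    if ctx.abuser_tweet_count < 100:
        score += 10
    if ctx.abuser_is_egg:
        score += 10
    if ctx.abuser_prior_flags >= 1:
        score += 30
    if ctx.conversation_length <= 1:
        score += 10
    else:
        score -= min(ctx.conversation_length, 10)  # longer back-and-forth, lower risk
    if not ctx.abuser_follows_target:
        score += 5
    # Contra-indicators: existing relationships and a favourite pull the score down.
    if ctx.target_follows_abuser and ctx.abuser_follows_target:
        score -= 30
    elif ctx.target_follows_abuser:
        score -= 20
    if ctx.target_favourited:
        score -= 25
    return score
```

With these placeholder weights, a day-old egg account with a handful of tweets mentioning a stranger out of the blue lands far above zero, while two long-standing mutual followers trading insults in a long thread land below it.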

And that’s just the beginning…

And that is just the information I had access to via Twitter’s APIs — Twitter itself has access to a metric tonne of additional information.

A few, just off the top of my head:

I wouldn’t be surprised if users with a verified phone number are much less likely to be abusive.
If the Twitter account is tied to a mobile app, it may be possible to identify what other accounts are tied to that mobile device, and draw conclusions from that.
Does the target block the abuser? Blocking, muting, and reporting tweets as abusive would likely be strong indicators of abuse, and could be added to the algorithms.
It may be possible to do IP analysis on new accounts, and see whether they are likely to be abusive or not.

Findings

This section is a summary of some of my findings from running a gradually-refining bot against Twitter’s API over six weeks, to determine whether it’s possible to categorically identify abusive tweets or not.

The bot scores tweets on a scale from -50 (i.e. definitely not abusive) to +250 (i.e. yeah, I’m gonna need you to take a seat over here, son). From my research, anything scoring lower than -20 is >95% unlikely to be abusive, and anything scoring over +130 is >95% likely to be abusive. Any tweet scoring over 200 is abusive; I haven’t identified a single false positive for tweets scoring that high so far, but more research is required.
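As a small illustration of those bands, the sketch below maps a score onto the thresholds just described. The cut-off values are the ones from my research above; the function name and the band labels are assumptions made for this example.

```python
def classify(score: int) -> str:
    """Map a bot score (roughly -50 to +250) onto the bands described above."""
    if score > 200:
        return "abusive"              # no false positives seen so far at this level
    if score > 130:
        return "likely abusive"       # >95% likely to be abusive
    if score < -20:
        return "likely not abusive"   # >95% unlikely to be abusive
    return "undetermined"             # the large middle that needs more context
```

Everything that falls into the last band is where the scoring alone cannot make the call.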

Tweets scoring less than -20 or more than +130 cover roughly 30% of all tweets analysed, which is, well, a bit disappointing: A 30% determination rate is not ideal, as my goal was to successfully categorise more than 90% of tweets.

There is a silver lining, however: The tweets scoring over 150 or so make up the vast bulk of the horrendous abuse happening on Twitter. Using this scoring system in combination with a pattern-matching system tied to the ‘report abuse’ system means that you could leverage the various systems to quash abuse. For example, in the case of the person receiving dozens and dozens of the same abusive tweets, they all followed a very particular pattern. If one of those tweets is reported and removed by Twitter, it’d be possible to feed that one into an algorithm to also automatically remove all the other abusive posts (at a minimum) and consider taking further action against the accounts in question.
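One way the pattern-matching idea could work is sketched below: take a tweet that has been reported and removed, and look for near-identical copies of it. This is an illustration using Python’s standard difflib module, not something Twitter is known to run; the function names and the 0.9 similarity threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def strip_variable_parts(text: str) -> str:
    """Drop @mentions and links, which vary between copies of the same message."""
    text = re.sub(r"@\w+|https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def find_lookalikes(reported: str, others: list[str], threshold: float = 0.9) -> list[str]:
    """Return tweets whose text closely matches a tweet already reported as abusive."""
    template = strip_variable_parts(reported)
    return [
        tweet
        for tweet in others
        if SequenceMatcher(None, template, strip_variable_parts(tweet)).ratio() >= threshold
    ]
```

Comparing every tweet against every reported tweet this way would be far too slow at Twitter’s scale, so treat it as a description of the idea rather than a design.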

Of course, all of this research was done in isolation; I haven’t had help from Twitter. On the contrary, in fact: at 180 API requests every 15 minutes, the bottleneck of my analysis was largely Twitter’s API limitations. The reason is that some of the analysis was ‘expensive’ in terms of API calls. To find out how many tweets are in a conversation, you have to ‘walk up the tree’, and each post in a conversation takes a separate call; on the few occasions I ran into conversations that were 80–90 tweets back and forth, that munched up half my API quota for a 15-minute period. Finding out whether two arbitrary Twitter users follow each other takes an additional API call, and finding out whether two users have interacted in the past takes another search API call. On the flipside, if Twitter is interested in doing this research in more depth, it could very easily do so, either by commissioning a data analytics team in-house, or by giving a team of researchers an API key with a higher call limit, say, during non-peak hours.
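The arithmetic behind that bottleneck is easy to sketch. The snippet below assumes, as the description above implies, that every call counts against a single budget of 180 per 15-minute window; the function names are hypothetical.

```python
RATE_LIMIT = 180      # API requests allowed per window, as quoted above
WINDOW_MINUTES = 15


def calls_for_tweet(conversation_length: int,
                    check_follow: bool = True,
                    check_history: bool = True) -> int:
    """Estimate the API calls needed to analyse one tweet in context.

    One call per post when walking up the conversation tree, plus one call
    for the follow relationship and one search call for past interactions.
    """
    return conversation_length + int(check_follow) + int(check_history)


def quota_fraction(calls: int) -> float:
    """Share of one rate-limit window that the given number of calls uses."""
    return calls / RATE_LIMIT


calls = calls_for_tweet(conversation_length=88)
print(calls, quota_fraction(calls))
```

At that rate, a handful of long conversations is enough to exhaust a whole window, which is why the size of the dataset stayed small.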

Conclusion

Analysing tweets for abuse is a surprisingly difficult problem.
Context is everything, and the historical relationship between the users in question is often a better indicator of abuse than the tweets themselves.
The Twitter API is too restrictive to really be able to analyse a big enough dataset to draw more conclusive, er, conclusions.


Part 4 — Solving Twitter’s abuse problem

There is no doubt in my mind that on *my* Twitter, there is no space for threats, abuse, and harassment.

In this final section, I’m trying to pull together some suggestions for how Twitter can tackle its abuse problems.

Step 1 — Outline the problem, create a good policy & set clear goals

A strong response to abuse and harassment starts in three obvious places. First, Twitter needs to clearly articulate what the problem is, and why it is a problem worth solving for the organisation. Without this piece of work, you may as well not go any further: it is the foundation for everything that happens next.

Second, Twitter needs a clear policy. As discussed earlier, creating a good abuse policy isn’t easy; quite the opposite. But having one in place is important both for internal communications to the teams that will implement the solutions, and for external communications as a line in the sand: X is fine, but Y is not.

Finally, Twitter needs a set of metrics and goals to track whether all of this is working, and to identify where more thought, tech, or resources are needed.

Step 2 — Full buy-in or bust.

Once a solid policy draft is in place, it’s crucial that there’s 100% buy-in from the senior team at Twitter, possibly even at board level. As discussed in the piece on why creating a policy is hard, I anticipate some strong disagreements over whether Twitter wants to lean towards safety or towards First Amendment rights.

It takes a strong CEO to champion a fight against abuse and harassment, and Jack has stated on several occasions that he’s willing to pick that fight — but at least from the outside, it appears that not nearly enough has been done so far.

But, assuming that there is a clear policy, and a comprehensive buy-in to put an end to harassment and abuse on Twitter, it’s possible to start the work…

Step 3 — Run strong post-moderation

The first part of implementation is to overhaul the post-moderation process, i.e. how do you deal with posts that have been reported? This includes creating a set of tools and comprehensive training for the people dealing with reported posts.

It becomes crucial to empower the operators to take whatever action necessary to deal with an abuser, whether that’s a warning, the deletion of a tweet, the suspension of an account, or whether it triggers a more serious action, such as starting a police investigation.

The tools should include automated ways of doing contextual analysis (as discussed in more depth in ‘Abusive or Not’ above), good ways to discover systematic abusers, ways to offer additional layers of protection to particularly vulnerable targets, and clear policies for when and how to include external agencies for mental health assistance (in the case of suicide prevention) or law enforcement (serious threats, terrorism, etc).

On top of that, post-moderation needs a solid set of metrics that measure and prescribe how quickly reported tweets are dealt with and whether appropriate actions were taken, along with escalation points in case a user feels that they weren’t dealt with fairly in the process.
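As an illustration of what one such metric could look like, here is a minimal sketch that measures how quickly reports are resolved. The 24-hour target, the function name, and the sample timestamps are invented for the example; they are not figures from Twitter.

```python
from datetime import datetime, timedelta
from statistics import median


def resolution_stats(reports, target=timedelta(hours=24)):
    """Summarise how quickly reports were handled.

    `reports` is a list of (reported_at, resolved_at) datetime pairs.
    Returns the median resolution time in hours and the share of reports
    resolved within the (hypothetical) target.
    """
    durations = [resolved - reported for reported, resolved in reports]
    hours = [d.total_seconds() / 3600 for d in durations]
    within_target = sum(d <= target for d in durations) / len(durations)
    return median(hours), within_target


# Made-up sample data: three reports resolved after 2, 10 and 48 hours.
start = datetime(2016, 2, 1, 9, 0)
sample = [
    (start, start + timedelta(hours=2)),
    (start, start + timedelta(hours=10)),
    (start, start + timedelta(hours=48)),
]
print(resolution_stats(sample))
```

Speed is only half the picture; whether the right action was taken would need its own measure, for instance the share of decisions overturned on escalation.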

Step 4 — Implement better preventative moderation

The next step is to get ahead of the curve: My hastily-assembled abuse bot was able to determine conclusively whether something was or wasn’t abusive in 30% of cases.

Twitter has access both to the raw data and better coders than me (just my little joke — I’m a data-enabled journalist at best, and no sane person would let me anywhere near a live code base), and I suspect that a team at Twitter could get up to 70–80% correct, conclusive determination as to whether a particular tweet is abusive or not. What is done from there goes back to the policy question above, but even if you skim off the 10% most abusive tweets (either by taking automatic action, or by sending them into a separate moderation queue for manual moderation), that makes Twitter a significantly nicer place to be.
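Skimming off the most abusive tweets could be as simple as ranking by score and routing the top slice to a separate queue. The sketch below assumes each tweet already has an abuse score; the function name, the tuple layout, and the sample scores are hypothetical.

```python
import math


def skim_most_abusive(scored_tweets, fraction=0.10):
    """Split scored tweets into a moderation queue and the remainder.

    `scored_tweets` is a list of (tweet_id, score) pairs. The highest-scoring
    `fraction` goes to the queue for automatic or manual action.
    """
    ranked = sorted(scored_tweets, key=lambda item: item[1], reverse=True)
    cutoff = math.ceil(len(ranked) * fraction)
    return ranked[:cutoff], ranked[cutoff:]


# Made-up scores for ten tweets.
scores = [("t1", -30), ("t2", 12), ("t3", 215), ("t4", 40), ("t5", -5),
          ("t6", 90), ("t7", 3), ("t8", 131), ("t9", -41), ("t10", 60)]
queue, rest = skim_most_abusive(scores)
print(queue)
```

Whether the queue leads to automatic removal or to a human moderator is a policy decision, not a technical one.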

One way to do this would be to have a switch under your Safety settings, much like the quality filter that verified users have had access to for a year:

Verified users are given a Quality Filter, which makes some types of content disappear.

Step 5 — Close the feedback loop

Of course, the final thing to do is to close the feedback loop.

There are too many examples of users who, while trying to work within the current system, have flagged blatant cases of abuse through Twitter’s reporting functionality, only to be met with either an absence of action or what appear to be obviously wrong decisions to keep instances of heinous abuse live on the platform.

Not only is that embarrassing to Twitter, it means that the most vulnerable users on Twitter are in a semi-permanent state of wondering whether it’s time to leave the platform.

Of course, not all posts that are reported should be deleted; the reporting system is, in itself, a potential avenue for abuse. But a user should at least have the opportunity to be heard, and receive an explanation for why a particular post is not against the policies that are in place. From there, if it seems that a policy is wrong, inaccurate, or incomplete (it happens…), there should be a process for refining and adjusting the policies to match the intended outcomes.

It won’t be easy, but it must be done.

Abuse, harassment, impersonation, bullying, threats, terrorism, doxxing, illegal pornographic content… There is no shortage of challenges Twitter is facing on the content side.

Solving the problems is not going to be easy, but I feel that if Twitter wants to be the place where the global conversation continues to happen, it needs to be a place where people can discuss, share, and explore together. They need to be able to do so safely, with an explicit agreement on the parameters for how people interact. Ideally, this ‘agreement’ should include provisions so that groups of friends who do talk to each other largely in explicit language are able to continue to do so.

Having said that, I’m all for spirited discussions and passionate opinions, but there is no doubt in my mind that on my Twitter, there is no space for threats, abuse, and harassment.

It’s time for a social network to step up and show the world how you can be the venue for a global conversation on every topic under the sun, without alienating the voices belonging to people who are less inclined to accept that ‘abuse is just part of it’.

I do hope that Twitter takes the plunge, and decides to go through the undoubtedly painful process of becoming that social network; if not Twitter, then who?

Try the bot yourself

By popular demand, I’ve created a public version of the bot. It’s very limited compared to the bot I used myself (it doesn’t store info between sessions etc). You can try it out here, if you want.

Haje is a founder coach, working with a small, select number of startup founders to build exciting, robust organizations that can stand the test of time. Find out more at Haje.me. You can also find Haje on Twitter and LinkedIn.

Written by Haje Jan Kamps
