Welcome to the Nerditorium, where we’ll be publishing ebooks, white papers, articles and general brain froth that comes out of our Wordnerds work.

We’ll focus on our specialist subjects: Natural Language Processing, Corpus Linguistics, AI and all that other stuff that we get concerningly excited about.

But we’ll also look at the real-world outcomes of our work – from finding the best products, to the most hard-to-reach customers, to the sexiest footballer.

Because at heart, we’re a big data company. And data is only useful in the actions you can bring out of it.

Turning Qual into Quant

There are, of course, thousands of big data companies. They do amazing things, create gorgeous visualisations, and organise the numbers in extraordinary and beautiful new ways.

You can probably feel the “but” coming.

The reason we call big data small is that it concentrates on numbers, when 80% of all data that exists is in the form of unstructured text.

Think of the information that lives in that 80%. The questions it could answer. The problems it could solve. The difference it could make to your organisation. If we could just organise it in the way we do the numbers.

So why don’t we? Well, lots of reasons, but long story short: language is a nightmare.

It’s vast, nebulous, loud, confusing, sarcastic, diverse, surprising, colloquial, fluid, shrtnd and yoof, bruv. Spelling is hit and miss. Data scientists are generally focused on quantitative data, and this could not be more qual.

Some not-so-awesome solutions

So how does big data interact with text data? There are four main approaches:

1. Ignore the text data altogether. The simplest solution. Not the most effective.

2. Count the number of times a word is used. Remember that time you got actionable insight out of a word cloud? No, us neither. That’s because the meaning of words comes from their context. Almost every word has more than one meaning. Seeing a word stripped of its context tells you nothing.

3. Sentiment analysis. This one is a little cleverer, but has the same problem. Are the words in a sentence happy words or sad words? It depends entirely on how they’re arranged.

4. Try to fit people’s thoughts into a finite number of options. You know the kind of thing. Drop-down menu, multiple choice questions, choose-your-opinion-from-a-pre-approved-list. When companies can’t get the meaning from unstructured data, they’ll force user opinions into a structure. Which means that they’re no longer telling you what they really think.
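
To make approaches 2 and 3 a bit more concrete, here is a minimal Python sketch. It is not how any particular product works: the two example sentences and the tiny POSITIVE and NEGATIVE word lists are made up for illustration, and real sentiment tools are more sophisticated than this. It simply shows how counting words throws away the arrangement that carries the meaning.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for this illustration only.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def word_counts(text):
    """Count lowercase word tokens, ignoring the order they appear in."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def naive_sentiment(text):
    """Score = positive word count minus negative word count."""
    counts = word_counts(text)
    positive = sum(counts[word] for word in POSITIVE)
    negative = sum(counts[word] for word in NEGATIVE)
    return positive - negative


# Two made-up reviews that mean opposite things.
happy = "The service was good, not bad at all."
grumpy = "The service was bad, not good at all."

# Same words, same counts: a word cloud could not tell them apart.
print(word_counts(happy) == word_counts(grumpy))

# A word-counting sentiment score gives both the same number.
print(naive_sentiment(happy), naive_sentiment(grumpy))
```

Both sentences contain exactly the same words, so the counts and the scores match even though one customer is pleased and the other is not.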

So it’s hopeless. Right?

You’d have to be pretty brave or pretty stupid to take on this problem. We’re at least one of the two. We’ll take you deeper into how we’ve worked towards bringing the other 80% of data into big data over the next few weeks.

But in the meantime, here are three things we’ve learned about approaching big problems.

1. Find the intersections between your interests, individually and as a group. No one person in our organisation could have come up with the solution to these problems.

2. Let go of your darlings. We all came to this problem with different pre-existing ideas. There is received wisdom on approaching this problem in development, Corpus Linguistics, Machine Learning and Social Listening. The solution was in the middle of these disciplines, and everybody had to let go of things they’d always believed.

3. Ask for help. We have received incredible support from all kinds of organisations. We have also been extremely lucky to have been supported by two universities, Sunderland and Durham, who have been instrumental in rapidly increasing our AI capabilities. Sunderland Software City have been incredible, offering advice, support, and opportunities at every turn. Our first customer, Nissan, and our sister company Daykin and Storey have also been amazing in developing our work. There are all kinds of help and support out there, and you know what they say about shy bairns.