We take fun seriously

Fetching, crunching, analyzing, and visualizing data is a complex endeavor. Dealing with the ever-evolving language of online communication adds another level of difficulty.

The Spiketrap team spends an extensive amount of time researching, discovering, and applying techniques to serve our core challenges:

  • Automate social understanding
  • Extract objective/actionable data
  • Provide apples-to-apples comparison
  • Achieve real-time evaluations

From word to sentence embeddings, from recurrent neural networks to the latest neural architectures, from topic models to conditional random fields, from clustering to locality-sensitive hashing, Spiketrap is able to combine the best ideas from the panoply of machine learning methods.

Natural Language Processing

Spiketrap pushes the limits of what can be done to automate natural language understanding. Our main differentiator comes from augmenting the latest NLP techniques with years of data focused on the language of gaming. Off-the-shelf tokenizers and taggers depend heavily on local linguistic features, such as capitalization and the POS/NER tags of previous words. For this reason, they are well known to give unsatisfactory results on error-prone, badly formatted, short texts such as tweets. Our NLP technologies are built in-house to overcome such challenges.

Product Attribution

Spiketrap combines a structured database of media and entertainment-related intellectual properties (including companies, franchises, characters, movies, television shows, video games, DLCs, and even more!) with a network of classifiers to identify which pieces of content are talking about which entities, even when those entities are not named in the text. Vanilla string-matching simply does not work, because common language usage is filled with ambiguity, abbreviations, and acronyms. Consider titles such as "See", "Anthem", "Dreams", and "Control"; companies such as "2K" or "Blizzard"; the "Switch" console; or "WoW" for "World of Warcraft" and "PS" for "PlayStation". We have learned that passionate discourse rarely gives a free ride in terms of grammar, capitalization, or quoting!
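To see why plain string matching falls short, here is a minimal Python sketch. It is an illustration only, not Spiketrap's attribution system: the tiny CATALOG list and the naive_match helper are hypothetical, and the titles are borrowed from the examples above.

```python
import re

# Hypothetical mini-catalog; a real product database would be far larger.
CATALOG = ["Control", "Anthem", "Dreams", "Switch"]

def naive_match(text, catalog=CATALOG):
    """Return catalog names that appear in text as whole words, ignoring case."""
    found = []
    for name in catalog:
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found

posts = [
    "Control has the best level design of the year",      # about the game
    "i lost control of the car and had to switch lanes",  # about no product at all
]
for post in posts:
    print(naive_match(post), "<-", post)
```

The second post is tagged with both "Control" and "Switch" even though it mentions neither product. Requiring exact capitalization would not help either, since it would also discard genuine lowercase mentions; resolving such cases takes context, which is what the classifiers are for.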

Sentiment Analysis

One of the hot topics of the last couple of decades, sentiment analysis is still going strong thanks to recent incremental SOTA (state-of-the-art) improvements from recurrent, convolutional, transformer, and more complex neural networks. Too bad these improvements on the typical IMDb dataset barely make a dent when applied to the language of the internet! Our sentiment classifiers are trained on our large in-house labeled dataset, and vastly outperform the latest classifiers and off-the-shelf sentiment services when applied to the media and entertainment landscape. Apart from the actual classification, another core challenge is the extremely diverse set of data to process: think of a two-emoticon chat message versus a long article with dozens of paragraphs.

Conversation Detection

Call them topics, conversations, or trends: our main area of research, besides sentiment, revolves around organic data segmentation. To deliver actionable insights and focus on what's important, we invest our energy in automating topic discovery. LDA and other standard probabilistic topic models do not perform well on short texts, and this issue is exacerbated when confronted with a corpus of documents of dissimilar lengths. In addition, parameter estimation for common topic models (usually Gibbs sampling or variational inference) does not scale well to large datasets.

To make the problem of topic discovery even more challenging, consider that even if you can identify meaningful topics for a specific window of time, you still need to relate them over the course of weeks, months, and years. Our proprietary methodology addresses all these challenges, providing you with meaningful and coherent conversations over time.
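As a rough illustration of what relating topics across time windows involves, the sketch below links each topic in one window to the most similar topic in the previous window, using Jaccard overlap of their top terms. This is a simple textbook baseline, not Spiketrap's proprietary methodology; the function names, the 0.3 threshold, and the sample topics are all assumptions made up for the example.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def link_topics(prev_topics, curr_topics, threshold=0.3):
    """Map each current topic id to its best-matching previous topic id.

    A topic whose best overlap falls below the threshold maps to None,
    meaning it is treated as a newly emerged conversation.
    """
    links = {}
    for curr_id, curr_terms in curr_topics.items():
        best_id, best_score = None, 0.0
        for prev_id, prev_terms in prev_topics.items():
            score = jaccard(curr_terms, prev_terms)
            if score > best_score:
                best_id, best_score = prev_id, score
        links[curr_id] = best_id if best_score >= threshold else None
    return links

# Made-up top terms for two consecutive weekly windows.
week1 = {"A": ["patch", "nerf", "balance", "sniper"],
         "B": ["server", "lag", "queue", "login"]}
week2 = {"X": ["patch", "balance", "buff", "sniper"],
         "Y": ["trailer", "dlc", "release", "date"]}

print(link_topics(week1, week2))  # {'X': 'A', 'Y': None}
```

Here topic X continues topic A (three shared terms out of five distinct ones, a similarity of 0.6), while Y shares no terms with either earlier topic and is flagged as new. Term overlap alone breaks down once vocabulary drifts over months, which is part of why the problem is hard.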


Andrea Vattani

Chief Scientist


Research Advisors


Alessandro Panconesi

Full Professor

Sapienza University of Rome


Flavio Chierichetti

Associate Professor

Sapienza University of Rome

Research Publications