Natural Language Processing
Spiketrap pushes the limits of what can be done to automate natural language understanding. Our main differentiator comes from augmenting the latest NLP techniques with years of data focused on the language of gaming. Off-the-shelf tokenizers and taggers depend heavily on local linguistic features, such as capitalization and the POS/NER tags of preceding words; as a result, they are well known to produce unsatisfactory results on error-prone, badly formatted, short texts such as tweets. Our NLP technologies are built in-house to overcome these challenges.
Spiketrap combines a structured database of gaming-related intellectual properties (including companies, franchises, characters, games, DLCs, and more) with a network of classifiers to identify which pieces of content are talking about which entities, even when those entities are not named in the text. Even when they are named, vanilla string matching won’t do: the gaming industry is filled with ambiguity, abbreviations, and acronyms. Think of games named “Anthem”, “Dreams”, “Prey”, or “Steep”; companies such as “2K” or “Blizzard”; the “Switch” console; “WoW” for “World of Warcraft”; or “PS” for “PlayStation”. And we have observed that passionate gamers rarely give you a free ride when it comes to capitalization or quotation marks!
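To see why naive matching falls short, here is a minimal alias-lookup baseline; the alias table, names, and function below are purely hypothetical illustrations, nothing like our production system, which pairs a far larger entity database with trained classifiers:

```python
import re

# Hypothetical alias table; the real database is far larger and is
# backed by classifiers rather than a plain dictionary lookup.
ALIASES = {
    "wow": "World of Warcraft",
    "ps": "PlayStation",
    "switch": "Nintendo Switch",
    "2k": "2K Games",
}

def link_entities(text: str) -> list[str]:
    """Case-insensitive alias lookup over alphanumeric tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [ALIASES[t.lower()] for t in tokens if t.lower() in ALIASES]

print(link_entities("just got wow on my switch lol"))
# → ['World of Warcraft', 'Nintendo Switch']
```

Note that this baseline would just as happily tag “switch” the verb or “dreams” the common noun as games, which is precisely the ambiguity that classifiers have to resolve.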
One of the hot topics of the last couple of decades, sentiment analysis is still going strong thanks to incremental state-of-the-art (SOTA) improvements from recurrent, convolutional, transformer-based, and ever more complex neural networks. Too bad these gains on the typical IMDb dataset barely make a dent when applied to the language of gamers! Our sentiment classifiers are trained on our large in-house labeled dataset and vastly outperform the latest classifiers and off-the-shelf sentiment services on the gaming vertical. Beyond the classification itself, another core challenge is the extremely diverse set of data to process: think of a two-emoticon chat message versus a long article with dozens of paragraphs.
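A toy illustration of the vocabulary gap: in gamer slang, words that general-purpose lexicons treat as negative (“sick”, “insane”) often carry positive sentiment. The tiny scorer below is a sketch under assumed word polarities of our own choosing, not our trained classifiers:

```python
# Toy gamer-slang lexicon; words and weights are illustrative assumptions.
GAMING_LEXICON = {
    "sick": 1, "insane": 1, "goat": 1, "clutch": 1,
    "laggy": -1, "p2w": -1, "unplayable": -1, "grindy": -1,
}

def score(text: str) -> int:
    """Sum the polarity of every known token; 0 means neutral/unknown."""
    return sum(GAMING_LEXICON.get(tok, 0) for tok in text.lower().split())

print(score("that clutch play was insane"))          # → 2
print(score("servers are laggy and unplayable"))     # → -2
```

A generic movie-review lexicon would likely score the first message as negative, which is exactly the kind of domain shift that in-house training data corrects.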
Call them topics, conversations, or trends: our main area of research besides sentiment revolves around organic data segmentation. To surface actionable insights and let you focus on what’s important, we invest our energy in automating topic discovery. LDA and other standard probabilistic topic models do not perform well on short texts, and the issue is exacerbated when confronted with a corpus of documents of dissimilar lengths. In addition, parameter estimation for common topic models, usually via Gibbs sampling or variational inference, does not scale well to large datasets.
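For readers curious what Gibbs-sampled LDA actually looks like, here is a minimal collapsed Gibbs sampler in plain Python; it is a sketch over a handful of toy messages, not a production implementation:

```python
import random

def lda_gibbs(docs, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]        # doc -> topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic -> word counts
    nk = [0] * n_topics                         # tokens per topic
    z = []                                      # topic assignment per token
    for d, doc in enumerate(docs):              # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                     # remove current assignment
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                # full conditional p(z = t | all other assignments)
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid[w]] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                r = rng.random() * sum(weights)
                acc, k = 0.0, n_topics - 1
                for t, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        k = t
                        break
                z[d][i] = k                     # resample assignment
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
    # smoothed topic-word distributions (each row sums to 1)
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(n_topics)]
    return vocab, phi

docs = [["patch", "notes", "nerf", "buff"], ["patch", "buff", "nerf"],
        ["lag", "servers", "down"], ["servers", "lag", "queue"]]
vocab, phi = lda_gibbs(docs)
```

Notice how little co-occurrence evidence each three- or four-word document contributes: that sparsity is the root of the short-text problem, and the triple loop over iterations, documents, and tokens shows why plain Gibbs sampling struggles at scale.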
To make topic discovery even more challenging, consider that even if you can identify meaningful topics for a specific window of time, you still need to relate them over the course of weeks, months, and years. Our proprietary methodology addresses all of these challenges, providing you with meaningful and coherent conversations over time.
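One simple way to sketch the linking step, assuming each discovered topic is summarized as a word-weight vector, is greedy cosine matching between adjacent time windows; every name, vector, and threshold below is illustrative, not our actual methodology:

```python
import math

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_topics(prev: dict, curr: dict, threshold: float = 0.5) -> dict:
    """Greedily match each current topic to the most similar previous one;
    None marks a newly emerged topic."""
    links = {}
    for name, vec in curr.items():
        best, best_sim = None, threshold
        for pname, pvec in prev.items():
            sim = cosine(vec, pvec)
            if sim > best_sim:
                best, best_sim = pname, sim
        links[name] = best
    return links

week1 = {"t0": {"lag": 0.6, "servers": 0.4}, "t1": {"nerf": 0.7, "patch": 0.3}}
week2 = {"a": {"servers": 0.5, "lag": 0.5}, "b": {"skins": 0.8, "shop": 0.2}}
print(link_topics(week1, week2))
# → {'a': 't0', 'b': None}
```

Even this toy version surfaces the hard questions, such as how to handle topics that split, merge, or drift in vocabulary over months.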