textplainer : Intuitive explanations for text based machine learning models

less than 1 minute read

Published: October 10, 2020

Text data is a powerful and useful source of signal in machine learning systems. There is a set of standard approaches for preparing raw text for machine learning and statistics. We routinely use bag-of-words and n-gram models, TF-IDF, topic modelling, or word embeddings derived from neural network language models such as Word2Vec, GloVe or fastText. All of these approaches have proven themselves as effective text pre-processing techniques that require no domain knowledge and are widely applicable.
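To make the first few of these representations concrete, here is a minimal pure-Python sketch of bag-of-words counts, n-grams and TF-IDF weights. The function names, the whitespace tokenizer and the unsmoothed `log(N / df)` weighting are assumptions chosen for illustration; established libraries use more careful tokenization and smoothing, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, n=1):
    """Count n-gram occurrences in a single document."""
    return Counter(ngrams(tokenize(text), n))


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = log(N / df),
    with no smoothing.
    """
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this
    # counts the number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in bag.items()}
        )
    return weights


docs = ["the cat sat on the mat", "the dog chased the cat", "a dog barked"]
weights = tf_idf(docs)
```

In this toy corpus, "mat" occurs in only one document while "the" occurs in two, so "mat" gets the larger weight in the first document even though "the" appears there more often.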

However, most of these approaches are relatively opaque when it comes to explaining what is driving the performance or outputs of the models. Text data is less amenable to SHAP values for local feature explanations, and there is no intuitive way to do permutation-based feature importance correctly.

In this package we will explore methods for understanding the sub-structures of text that drive predictive performance.
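One simple way to probe which parts of a text drive a prediction is to remove one word at a time and measure how the model output changes. The sketch below illustrates that general idea only. It is not the textplainer API, and `word_importance`, `toy_predict` and the `POSITIVE` lexicon are hypothetical names invented for this example.

```python
def word_importance(text, predict):
    """Score each word by how much the prediction drops when it is removed.

    `predict` is any callable that maps a raw string to a number, for
    example the positive-class probability of a full text pipeline.
    """
    tokens = text.split()
    baseline = predict(text)
    scores = []
    for i, token in enumerate(tokens):
        # Rebuild the text without the i-th word and re-score it.
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        scores.append((token, baseline - predict(reduced)))
    return scores


# Hypothetical stand-in for a trained model: 0.5 per word found in a
# tiny "positive" lexicon, capped at 1.0.
POSITIVE = {"great", "excellent", "love"}


def toy_predict(text):
    hits = sum(word in POSITIVE for word in text.lower().split())
    return min(1.0, 0.5 * hits)


scores = word_importance("the service was great", toy_predict)
```

Because the perturbation is applied to the raw text rather than to a derived feature column, the same function works whether the underlying pipeline uses TF-IDF, n-grams or embeddings. It does assume that deleting a word yields text the model can still score sensibly, which will not always hold.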

In Development Here


texturizer : Exploring diverse text derived features for machine learning

3 minute read

Published: September 06, 2020

Text data is a fascinating source of information for data scientists. It can betray subtle clues as to the mood, motives and behaviours of people, in both conscious and unconscious expressions. We can extract text from a wide variety of sources: internal documents, email records, web forms, social media posts, and even the text descriptions from financial transactions.

dfsummarizer : A command line application for summarizing data frames

less than 1 minute read

Published: July 03, 2020

Summarizing data is one of those small tasks that data scientists and analysts need to do routinely. However, we often need to write bespoke scripts to get exactly what we want, coping with missing values and assorted data types. We then need to go through a tedious process to format it for sharing or publication.

John Hawkins
