Posts by Tags

The Simple Guide to Agent Skills

less than 1 minute read

Published: March 02, 2026

The Simple Guide to Agent Skills

Mean Imputation in Apache Spark

less than 1 minute read

Published: September 26, 2017

If you are interested in building predictive models on Big Data, then there is a good chance you are looking to use Apache Spark, either with MLlib or with one of the growing number of machine learning extensions built to work with Spark, such as Elephas, which lets you use Keras and Spark together.

Will your job be automated out of existence by AI?

less than 1 minute read

Published: December 08, 2018

Are you feeling anxious about whether your career is in danger of being automated out of existence?

Why all scientists are not data scientists

less than 1 minute read

Published: November 03, 2017

There is a meme you will see floating around the internet that comes in many forms; one version is shown in the header image above. It is part of the vague internet resistance to this new occupation. The response is somewhat justified: Data Scientist is a job title that requires no specific qualification and garners differing opinions on what the core skill set is.

dfsummarizer : A command line application for summarizing data frames

less than 1 minute read

Published: July 03, 2020

Summarizing data is one of those small tasks that data scientists and analysts need to do routinely. However, we often need to write bespoke scripts to get exactly what we want, coping with missing values and assorted data types. We then need to go through a tedious process to format it for sharing or publication.

Improving Machine Learning Outcomes

less than 1 minute read

Published: July 15, 2021

Improving Machine Learning Outcomes: Focusing on Framing, Timing, and Targets

textplainer : Intuitive explanations for text based machine learning models

less than 1 minute read

Published: October 10, 2020

Text data is a powerful and useful source of signal in machine learning systems. There is a set of very standard approaches to dealing with raw text to prepare it for machine learning and statistics. We routinely use bag of words and n-gram models, TF-IDF, topic modelling, or potentially word embeddings derived from one of the neural network language models like Word2Vec, GloVe or fastText. All of these approaches have proven themselves as effective text pre-processing techniques that require no domain knowledge and are widely applicable.

texturizer : Exploring diverse text derived features for machine learning

3 minute read

Published: September 06, 2020

Text data is a fascinating source of information for data scientists. It can betray subtle clues as to the mood, motives and behaviours of people, in both conscious and unconscious expressions. We can extract text from a wide variety of sources: internal documents, email records, web forms, social media posts, and even the text descriptions from financial transactions.

John Hawkins

Posts by Tags

ai agents

apache spark

artificial intelligence

business analytics

data analytics

data engineering

data processing

data science

explainable machine learning

feature engineering

future of work

job security

machine learning

natural language processing

problem framing

statistics

text mining