Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is also an XML version available for digesting.

Posts

Improving Machine Learning Outcomes

less than 1 minute read

Published: July 15, 2021

Improving Machine Learning Outcomes Focusing on Framing, Timing, and Targets

textplainer: Intuitive explanations for text-based machine learning models

less than 1 minute read

Published: October 10, 2020

Text data is a powerful and useful source of signal in machine learning systems. There is a set of standard approaches for preparing raw text for machine learning and statistics. We routinely use bag-of-words and n-gram models, TF-IDF, topic modelling, or word embeddings from models such as Word2Vec, GloVe or fastText. All of these approaches have proven themselves as effective text pre-processing techniques that require no domain knowledge and are widely applicable.

texturizer: Exploring diverse text-derived features for machine learning

3 minute read

Published: September 06, 2020

Text data is a fascinating source of information for data scientists. It can betray subtle clues as to the mood, motives and behaviours of people, in both conscious and unconscious expressions. We can extract text from a wide variety of sources: internal documents, email records, web forms, social media posts, and even the text descriptions from financial transactions.

dfsummarizer: A command line application for summarizing data frames

less than 1 minute read

Published: July 03, 2020

Summarizing data is one of those small tasks that data scientists and analysts need to do routinely. However, we often need to write bespoke scripts to get exactly what we want, coping with missing values and assorted data types. We then need to go through a tedious process to format it for sharing or publication.

Will your job be automated out of existence by AI?

less than 1 minute read

Published: December 08, 2018

Are you feeling anxious about whether your career is in danger of being automated out of existence?

Why all scientists are not data scientists

less than 1 minute read

Published: November 03, 2017

There is a meme you will see floating around the internet that comes in many forms; one version is shown in the header image above. It is part of the vague internet resistance to this new occupation. The response is somewhat justified: Data Scientist is a job title that requires no specific qualification and garners differing opinions on what the core skill set is.

Mean Imputation in Apache Spark

less than 1 minute read

Published: September 26, 2017

If you are interested in building predictive models on Big Data, then there is a good chance you are looking to use Apache Spark, either with MLlib or with one of the growing number of machine learning extensions built to work with Spark, such as Elephas, which lets you use Keras and Spark together.

books

The Backpackers Guide to The Land Down Undead

Published: January 01, 2012

A backpacker’s guide to surviving in Australia’s undead wasteland. A weird and twisted comedy adventure through the Australian zombie tourism industry.

Download here

Fury

Published: January 01, 2016

A hungover young man in a youth hostel comes to terms with the grim reality of surviving the zombie apocalypse.

Download here

X-mas in Berlin

Published: January 01, 2017

A young woman scours the streets of Berlin looking for signs that life is returning to the city. She clings to her memories of her missing family.

Download here

Land Down Undead 2 - Choose your gory demise.

Published: January 01, 2018

A choose-your-own-adventure sequel to The Land Down Undead. You are a journalist from the UK on a bus tour of Australia’s zombie wasteland. What could go wrong?

Download here

Googad Magee

Published: December 20, 2019

Googad Magee is a children’s book about an old man struggling to find something good in his life.

A chance encounter with a happy-go-lucky snail turns things around for Googad as he learns from her that it is very easy to appreciate what you already have. All proceeds from the sale of Googad Magee are donated to OzHarvest, an amazing Australian organisation fighting food waste and feeding the needy.

Download here

GDSD - Getting Data Science Done

Published: August 24, 2022

Getting Data Science Done outlines the essential stages in running successful data science projects. The book provides comprehensive guidelines to help you plan and manage data science projects, communicate with clients, identify and mitigate issues, and finally deploy your solutions into production systems.

publications

Published in , 2025

Identifying Novel Peroxisomal Proteins

Published in Proteins: Structure, Function, and Bioinformatics, 2007

Recommended citation: Hawkins, J., Mahony, D., Maetschke, S., Wakabayashi, M., Teasdale, R. and Boden, M. (2007). "Identifying Novel Peroxisomal Proteins." Proteins: Structure, Function, and Bioinformatics. 69(3); 606-616.

Predicting Nuclear Localization

Published in Journal of Proteome Research, 2007

Recommended citation: Hawkins, J., Davis, L. and Boden, M. (2007). "Predicting Nuclear Localization." Journal of Proteome Research. 6(4); 1402-1409.

The Statistical Power of Phylogenetic Motif Models

Published in RECOMB, 2008

Recommended citation: Hawkins, J. and Bailey, T.L. (2008). "The Statistical Power of Phylogenetic Motif Models." RECOMB 2008 Proceedings; 112-126. https://link.springer.com/chapter/10.1007/978-3-540-78839-3_10

Assessing phylogenetic motif models for predicting transcription factor binding sites

Published in Bioinformatics, 2009

Recommended citation: Hawkins, J., Grant, C., Noble, W.S., and Bailey, T.L. (2009). "Assessing phylogenetic motif models for predicting transcription factor binding sites." Bioinformatics 25, i339-i347. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687955/

Studies on the inference of protein binding regions across fold space based on structural similarities

Published in Proteins: Structure, Function, and Bioinformatics, 2010

Recommended citation: Teyra, J., Hawkins, J., Zhu, H., and Pisabarro, M. Teresa. (2010). "Studies on the inference of protein binding regions across fold space based on structural similarities." Proteins: Structure, Function, and Bioinformatics. 69(3); 606-616. https://www.ncbi.nlm.nih.gov/pubmed/21069715

NFIA Controls Telencephalic Progenitor Cell Differentiation through Repression of the Notch Effector Hes1

Published in The Journal of Neuroscience, 2010

Recommended citation: Michael Piper, Guy Barry, John Hawkins, Sharon Mason, Charlotta Lindwall, Erica Little, Anindita Sarkar, Aaron Smith, Randal Moldrich, Glen Boyle, Shubha Tole, Richard Gronostajski, Timothy Bailey, and Linda Richards. (2010). "NFIA Controls Telencephalic Progenitor Cell Differentiation through Repression of the Notch Effector Hes1." The Journal of Neuroscience, July 7, 2010, 30(27):9127-9139. https://www.ncbi.nlm.nih.gov/pubmed/20610746

Reduced False Positives in PDZ Binding Prediction using Sequence and Structural Descriptors

Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2012

Recommended citation: Hawkins, J., Zhu, H., Teyra, J., and Pisabarro, M. Teresa. (2012). "Reduced False Positives in PDZ Binding Prediction using Sequence and Structural Descriptors." IEEE/ACM Transactions on Computational Biology and Bioinformatics / IEEE, ACM. https://www.ncbi.nlm.nih.gov/pubmed/22508908

Rational Structure-Based Rescaffolding Approach to De Novo Design of Interleukin 10 (IL-10) Receptor-1 Mimetics

Published in PLoS One, 2016

Recommended citation: Ruiz-Gómez, G., Hawkins, J., Philipp, J., Künze, G., Löser, R., Fahmy, K., and Pisabarro, M. Teresa. (2016). "Rational Structure-Based Rescaffolding Approach to De Novo Design of Interleukin 10 (IL-10) Receptor-1 Mimetics." PLoS One. Apr 28;11(4) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154046

Minimum Viable Model Estimates for Machine Learning Projects

Published in CSEA 2020, 2020

Recommended citation: Hawkins, John. (2020) "Minimum Viable Model Estimates for Machine Learning Projects." Proceedings of the 6th International Conference on Computer Science, Engineering And Applications (CSEA 2020), Dec 18-19; 10(18). doi:10.5121/csit.2020.101803

MinViME/Minimum Viable Model Estimator

Published in Software Impacts, 2021

Recommended citation: Hawkins, John. (2021) "MinViME/Minimum Viable Model Estimator" Software Impacts, Aug 01; Volume 9 https://doi.org/10.1016/j.simpa.2021.100073

Estimating Gaze Duration Error with Eye Tracking Data

Published in Proceedings of the 2023 5th International Conference on Image, Video and Signal Processing, 2023

Recommended citation: Hawkins, John. (2023) "Estimating Gaze Duration Error with Eye Tracking Data" Proceedings of the 2023 5th International Conference on Image, Video and Signal Processing Pages 70-75, Mar 25, 2023

OAGRE: Outlier Attenuated Gradient Boosted Regression

Published in Hanoi, Vietnam, 2024

Recommended citation: Hawkins, John. (2024) "OAGRE: Outlier Attenuated Gradient Boosted Regression" Proceedings of The Fifth International Conference on Artificial Intelligence and Computational Intelligence (AICI 2024) Hanoi, Vietnam https://link.springer.com/chapter/10.1007/978-3-031-63929-6_15

Literature Filtering for Systematic Reviews with Transformers

Published in Jeju, Korea, 2024

Recommended citation: Hawkins, John. and Tivey, David. (2024) "Literature Filtering for Systematic Reviews with Transformers" Proceedings of the 2nd International Conference on Communications, Computing and Artificial Intelligence (CCCAI 2024) https://dl.acm.org/doi/abs/10.1145/3676581.3676582

Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models

Published in Phuket, Thailand, 2025

Recommended citation: Hawkins, John. (2025) "Enigme: Generative Text Puzzles for Evaluating Reasoning in Language Models" 11th International Conference on Engineering, Applied Sciences, and Technology (ICEAST), Phuket, Thailand, 2025, pp. 117-121 https://ieeexplore.ieee.org/document/11088210

talks

Evolutionary Game Theory with G-functions

Published: November 03, 2005

In this talk I summarised the content delivered at the workshop I attended on Evolutionary Game Theory with G-functions. The workshop was given by Tom Vincent at the University of Adelaide and was based on his book Evolutionary Game Theory, Natural Selection, and Darwinian Dynamics.

The role of machine learning in modelling the cell

Published: December 03, 2005

Talk given for the students of the complex systems masters course at the University of Mexico City.

A multi-agent simulation model of fishery fleet dynamics for the Queensland coral reef line fishery

Published: February 10, 2006

In this talk I presented the work done with Rodney Beard and Stuart McDonald on developing a multi-agent simulation system for iterated game theory models of behaviour in coral reef fisheries.

Paper

Predicting Nuclear Proteins

Published: July 12, 2006

In this talk I presented initial work done with Mikael Boden on building machine learning systems to classify proteins that are bound for the nucleus after translation. The work involved creating new datasets and evaluating a range of existing techniques.

Evolving PTS2 Motifs

Published: July 20, 2006

In this talk I presented the work done with Mikael Boden on designing evolutionary algorithms to create regular-expression-like motifs that distinguish proteins carrying the PTS2 motif. This is a difficult classification task due to the lack of large datasets and the highly variable sequences in the signalling region of the proteins.

The Statistical Power of Phylogenetic Motif Models

Published: March 30, 2008

In this talk I presented the results of the research paper completed with Tim Bailey on exploiting the phylogenetic information in comparative gene sequence alignments to try to improve the prediction of transcription factor binding sites.

Can Comparative Genomics Improve Transcription Factor Binding Site Prediction

Published: April 07, 2008

In this presentation for the Institute for Molecular Bioscience at the University of Queensland I summarised some of the observations and conclusions that Tim Bailey and I had reached while working on using information from gene sequence alignments to improve our ability to identify transcription factor binding sites.

Assessing Phylogenetic Motif Models For Predicting Transcription Factor Binding Sites

Published: July 01, 2009

In this talk I gave an overview of my work with Tim Bailey on exploiting the phylogenetic information in comparative gene sequence alignments to try to improve the prediction of transcription factor binding sites.

Protein Structure Search Strategies

Published: December 13, 2010

In this BIOTEC Post-Doc Seminar Series talk I gave an overview of the algorithms used to search protein databases for functional motifs and active sites that determine biological function and potential biomedical applications.

Being Bayesian

Published: November 29, 2016

In this Sydney Data Science Meet-Up talk I gave an overview of the history and reasoning that led to the distinction between Frequentist and Bayesian inference. I gave several worked examples and showed the results of simulations designed to answer the question: under which circumstances should we prefer one over the other?

Full video of the talk here

Building Model Factories with the DataRobot API

Published: May 31, 2018

In this Sydney Data Science Sponsored Meet-Up talk I gave an introduction to the idea of Model Factories, discussing its history and how it led to AutoML systems like DataRobot, which ultimately enable us to build new forms of automated ML systems.

DataRobot Vs The Red Queen

Published: September 10, 2019

In this talk I gave a brief overview of the Red Queen effect, which has been used in evolutionary biology to describe the co-evolution of competing species. I applied this idea to the competition between organisations that use data science and machine learning to differentiate themselves from their competitors.

Introduction to Bayesian Machine Learning

Published: November 28, 2019

In this invited talk for the Machine Learning & Deep Learning Day I presented a ground-up introduction to the fundamentals of Bayesian Machine Learning. I introduced the idea of Bayesian statistics and described the connections between maximum likelihood, maximum a posteriori estimation and, finally, the Bayesian goal of a complete estimate of the posterior distribution. I introduced Markov Chain Monte Carlo and the Metropolis-Hastings algorithm. I closed with some brief cautions on how people coming from frequentist machine learning tend to go wrong, either through their expectations or their implementations.

Event Link

Modern Machine Learning Language Models

Published: October 16, 2020

In this invited talk for the Selenium Day I presented an overview of the technical innovations that have led to modern machine learning successes with language processing. This involved discussing what is special about processing text, the fundamentals of recurrent processing, the development of attention and self-attention models, and finally how this led to the Transformer architecture.

Event Link

Minimum Viable Model Estimates for Machine Learning Projects

Published: December 18, 2020

Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique produces robust, objective comparisons between potential projects. The resulting estimates allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed from the Python Package Index (PyPI) or downloaded directly from the GitHub repository.

Analytics Problem Framing

Published: September 20, 2022

In this guest lecture at the Australian Graduate School of Management we discussed a range of fundamental ideas in analytics projects. All of these ideas relate to framing problems such that they have a greater chance of success.

Data Science in Industry

Published: March 17, 2023

In this guest lecture for data science and analytics students at Imperial College London we discussed the emergence of data science as a career in industry. We covered both the historical conditions that created the field and the ongoing changes and challenges faced by technical, detail-oriented people who work with a wide variety of business people.

Estimating Gaze Duration Error from Eye Tracking Data

Published: March 25, 2023

Eye tracking applications produce a series of gaze fixation points that can be attributed to objects within a subject’s field of vision. Error is typically measured on the basis of individual gaze fixation point measurements. These applications are often used to infer a gaze duration metric from a series of fixation measurements. There is no direct method for inferring the error in a gaze duration measurement from the error in fixation points.

Brands, Verticals & Contexts: Coherence Patterns in Consumer Attention

Published: August 19, 2023

Consumers are expected to partially reveal their preferences and interests through the media they consume. The development of visual attention measurement with eye tracking technologies allows us to investigate the consistency of these preferences across the creative executions of a given brand and over all brands within a given vertical.

Evaluating Ad Creative and Web Context Alignment with Attention Measurement

Published: September 16, 2023

Contextual targeting is a common strategy that places marketing messages in media locations that are aligned with a target audience. The challenge of contextual targeting is knowing the ideal schema and the set of categories that provide the right audience. Refinement of the contextual targeting process has been limited by the use of metrics that are either rapid but unreliable (click through rates), or reliable but slow, expensive and inaccessible in real-time (conversions or brand awareness).

teaching

, , 2025