Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
About me
This is a page not in the main menu
Published:
Improving Machine Learning Outcomes Focusing on Framing, Timing, and Targets
Published:
Text data is a powerful and useful source of signal in machine learning systems. There is a set of standard approaches for preparing raw text for machine learning and statistics. We routinely use bag-of-words and n-gram models, TF-IDF, topic modelling, or word embeddings derived from one of the neural network language models such as Word2Vec, GloVe or fastText. All of these approaches have proven themselves as effective text pre-processing techniques that require no domain knowledge and are widely applicable.
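As a rough illustration of one of the standard approaches named above, TF-IDF can be sketched in a few lines of plain Python. The function name `tf_idf`, the whitespace tokenisation, the toy documents, and the unsmoothed `log(N / df)` weighting are all assumptions made for brevity and are not taken from the post; libraries such as scikit-learn use smoothed variants.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (illustrative sketch)."""
    # Assumption: lower-casing and whitespace splitting stand in for real tokenisation.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times (unsmoothed) inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, used only to exercise the function.
docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
```

A term that occurs in every document, such as "the" in this toy corpus, receives a weight of zero, while a term confined to a single document receives the largest weight.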
Published:
Text data is a fascinating source of information for data scientists. It can betray subtle clues as to the mood, motives and behaviours of people, in both conscious and unconscious expressions. We can extract text from a wide variety of sources: internal documents, email records, web forms, social media posts, and even the text descriptions from financial transactions.
Published:
Summarizing data is one of those small tasks that data scientists and analysts need to do routinely. However, we often need to write bespoke scripts to get exactly what we want, coping with missing values and assorted data types. We then need to go through a tedious process to format it for sharing or publication.
Published:
Are you feeling anxious about whether your career is in danger of being automated out of existence?
Published:
There is a meme you will see floating around the internet that comes in many forms; one version is shown in the header image above. It is part of the vague internet resistance to this new occupation. The response is somewhat justified: Data Scientist is a job title that requires no specific qualification, and it garners differing opinions on what the core skill set is.
Published:
If you are interested in building predictive models on Big Data, then there is a good chance you are looking to use Apache Spark, either with MLlib or with one of the growing number of machine learning extensions built to work with Spark, such as Elephas, which lets you use Keras and Spark together.
Published:
A backpacker’s guide to surviving in Australia’s undead wasteland. A weird and twisted comedy adventure through the Australian zombie tourism industry.
Download here
Published:
A hungover young man in a youth hostel comes to terms with the grim reality of surviving the zombie apocalypse.
Download here
Published:
A young woman scours the streets of Berlin looking for signs that life is returning to the city. She clings to her memories of her missing family.
Download here
Published:
A choose your own adventure sequel to the Land Down Undead. You are a journalist from the UK on a bus tour of Australia’s zombie wasteland. What could go wrong?
Download here
Published:
Googad Magee is a children’s book about an old man struggling to find something good in his life.
A chance encounter with a happy-go-lucky snail turns things around for Googad as he learns from her that it is very easy to appreciate what you already have. All proceeds from the sale of Googad Magee are donated to OzHarvest, an amazing organisation fighting food waste and feeding the needy. A children's picture book about a sad old man who meets a happy snail. All proceeds donated to the Australian organisation OzHarvest.
Download here
Published:
Getting Data Science Done outlines the essential stages in running successful data science projects. The book provides comprehensive guidelines to help you plan and manage data science projects, communicate with clients, identify and mitigate issues, and finally deploy your solutions into production systems.
Published in Proteins: Structure, Function, and Bioinformatics, 2007
Recommended citation: Hawkins, J., Mahony, D., Maetschke, S., Wakabayashi, M., Teasdale, R. and Boden, M. (2007). "Identifying Novel Peroxisomal Proteins." Proteins: Structure, Function, and Bioinformatics. 69(3); 606-616. https://link.springer.com/chapter/10.1007/978-3-540-78839-3_10
Published in Journal of Proteome Research, 2007
Recommended citation: Hawkins, J., Davis, L. and Boden, M. (2007). "Predicting Nuclear Localization." Journal of Proteome Research. 6(4); 1402-1409. https://link.springer.com/chapter/10.1007/978-3-540-78839-3_10
Published in RECOMB, 2008
Recommended citation: Hawkins, J., and Bailey, T.L. (2008). "The Statistical Power of Phylogenetic Motif Models." RECOMB 2008 Proceedings; 112-126. https://link.springer.com/chapter/10.1007/978-3-540-78839-3_10
Published in Bioinformatics, 2009
Recommended citation: Hawkins, J., Grant, C., Noble, W.S., and Bailey, T.L. (2009). "Assessing phylogenetic motif models for predicting transcription factor binding sites." Bioinformatics 25, i339-i347. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687955/
Published in Proteins: Structure, Function, and Bioinformatics, 2010
Recommended citation: Teyra, J., Hawkins, J., Zhu, H., and Pisabarro, M. Teresa. (2010). "Studies on the inference of protein binding regions across fold space based on structural similarities." Proteins: Structure, Function, and Bioinformatics. 69(3); 606-616. https://www.ncbi.nlm.nih.gov/pubmed/21069715
Published in The Journal of Neuroscience, 2010
Recommended citation: Michael Piper, Guy Barry, John Hawkins, Sharon Mason, Charlotta Lindwall, Erica Little, Anindita Sarkar, Aaron Smith, Randal Moldrich, Glen Boyle, Shubha Tole, Richard Gronostajski, Timothy Bailey, and Linda Richards. (2010). "NFIA Controls Telencephalic Progenitor Cell Differentiation through Repression of the Notch Effector Hes1." The Journal of Neuroscience, July 7, 2010, 30(27):9127-9139. https://www.ncbi.nlm.nih.gov/pubmed/20610746
Published in IEEE/ACM transactions on computational biology and bioinformatics, 2012
Recommended citation: Hawkins, J., Zhu, H., Teyra, J., and Pisabarro, M. Teresa. (2012). "Reduced False Positives in PDZ Binding Prediction using Sequence and Structural Descriptors." IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM. https://www.ncbi.nlm.nih.gov/pubmed/22508908
Published in PLoS One, 2016
Recommended citation: Ruiz-Gómez, Gloria., Hawkins, John., Philipp, Jenny., Künze, Georg., Löser, Reik., Fahmy, Karim., and Pisabarro, M. Teresa. (2016) "Rational Structure-Based Rescaffolding Approach to De Novo Design of Interleukin 10 (IL-10) Receptor-1 Mimetics" PLoS One. Apr 28;11(4) http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0154046
Published in CSEA 2020, 2020
Recommended citation: Hawkins, John. (2020) "Minimum Viable Model Estimates for Machine Learning Projects" Proceedings of the 6th International Conference on Computer Science, Engineering And Applications (CSEA 2020), Dec 18-19; 10(18) 10.5121/csit.2020.101803
Published in Software Impacts, 2021
Recommended citation: Hawkins, John. (2021) "MinViME/Minimum Viable Model Estimator" Software Impacts, Aug 01; Volume 9 https://doi.org/10.1016/j.simpa.2021.100073
Published in , 2023
Recommended citation: Hawkins, John. (2023) "Estimating Gaze Duration Error with Eye Tracking Data" Proceedings of the 2023 5th International Conference on Image, Video and Signal Processing Pages 70-75, Mar 25, 2023
Published in , 2024
Recommended citation: Hawkins, John. (2023) "OAGRE: Outlier Attenuated Gradient Boosted Regression" Proceedings of The Fifth International Conference on Artificial Intelligence and Computational Intelligence (AICI 2024) Hanoi, Vietnam
Published in , 2024
Recommended citation: Hawkins, John. and Tivey, David. (2023) "Literature Filtering for Systematic Reviews with Transformers" Proceedings of the 2nd International Conference on Communications, Computing and Artificial Intelligence (CCCAI 2024)
Published:
In this talk I summarised the content delivered at the workshop I attended on Evolutionary Game Theory with G-functions. The workshop was given by Tom Vincent at the University of Adelaide and was based on his book Evolutionary Game Theory, Natural Selection, and Darwinian Dynamics.
Published:
Talk given for the students of the complex systems masters course at the University of Mexico City.
Published:
In this talk I presented the work done with Rodney Beard and Stuart McDonald on developing a multi-agent simulation system for iterated game theory models of behaviour in coral reef fisheries.
Published:
In this talk I presented initial work done with Mikael Boden on the task of building machine learning systems to classify proteins that are bound for the nucleus after transcription. It involves the creation of new datasets, and evaluating a range of existing techniques.
Published:
In this talk I presented the work done with Mikael Boden on the task of designing evolutionary algorithms to create regular-expression-like motifs to distinguish proteins that carry the PTS2 motif. This is a difficult classification task due to the absence of large data sets and the highly variable sequences in the signalling section of the proteins.
Published:
In this talk I presented the results of the research paper completed with Tim Bailey on the task of exploiting the phylogenetic information in comparative gene sequence alignments to try to improve the prediction of transcription factor binding sites.
Published:
In this presentation for the Institute for Molecular Bioscience at Queensland University I summarised some of the observations and conclusions that Tim Bailey and I had come to in working on the task of using information from gene sequence alignments to try and improve our ability to identify transcription factor binding sites.
Published:
In this talk I gave an overview of my work with Tim Bailey on the task of exploiting the phylogenetic information in comparative gene sequence alignments to try to improve the prediction of transcription factor binding sites.
Published:
In this BIOTEC Post-Doc Seminar Series talk I gave an overview of the algorithms used to search protein databases for functional motifs and active sites that determine biological function and potential biomedical applications.
Published:
In this Sydney Data Science Meet-Up talk I gave an overview of the history and reasoning that led to the distinction between Frequentist and Bayesian Inference. I gave several worked examples and showed the results of simulations designed to answer the question of when we should prefer one over the other.
Published:
In this Sydney Data Science Sponsored Meet-Up talk I gave an introduction to the idea of Model Factories, discussing the history of the idea and how it has led to AutoML systems like DataRobot, ultimately enabling us to build new forms of automated ML systems.
Published:
In this talk I gave a brief overview of the Red Queen effect that has been used in evolutionary biology to describe co-evolution of competing species. I applied this idea to the competition between organisations that are using data science and machine learning to differentiate themselves from their competitors.
Published:
In this invited talk for the Machine Learning & Deep Learning Day I presented a ground-up introduction to the fundamentals of Bayesian Machine Learning. I introduced the idea of Bayesian statistics and described the connections between maximum likelihood, maximum a posteriori and finally the Bayesian goal of a complete estimate of the posterior distribution. I introduced Markov Chain Monte Carlo and the Metropolis-Hastings algorithm. Finally I shared some brief cautions on how people from frequentist machine learning tend to go wrong, either through their expectations or their implementations.
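For readers unfamiliar with the Metropolis-Hastings algorithm mentioned above, the following is a minimal random-walk sketch in plain Python. It is not material from the talk: the function name, the Gaussian proposal, the step size, the fixed seed, and the standard normal target are all assumptions chosen to keep the example short.

```python
import math
import random


def metropolis_hastings(log_target, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for a 1-D target (sketch)."""
    rng = random.Random(seed)
    x = start
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the proposal densities cancel
        # in the acceptance ratio.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # The current state is recorded whether or not the move was accepted.
        samples.append(x)
    return samples


def log_standard_normal(x):
    # Assumed example target: standard normal, up to an additive constant.
    return -0.5 * x * x


samples = metropolis_hastings(log_standard_normal, start=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Because only the ratio of target densities is used, the target needs to be known only up to a normalising constant, which is what makes the method convenient for posterior distributions.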
Published:
In this invited talk for the Selenium Day I presented an overview of the technical innovations that have led to modern machine learning successes with language processing. This involved discussing what is special about processing text, the fundamentals of recurrent processing, the development of attention and self-attention models, and finally how this led to the Transformer architecture.
Published:
Prioritization of machine learning projects requires estimates of both the potential ROI of the business case and the technical difficulty of building a model with the required characteristics. In this work we present a technique for estimating the minimum required performance characteristics of a predictive model given a set of information about how it will be used. This technique will result in robust, objective comparisons between potential projects. The resulting estimates will allow data scientists and managers to evaluate whether a proposed machine learning project is likely to succeed before any modelling needs to be done. The technique has been implemented in the open source application MinViME (Minimum Viable Model Estimator), which can be installed via the PyPI Python package management system or downloaded directly from the GitHub repository.
Published:
In this guest lecture at the Australian Graduate School of Management we discussed a range of fundamental ideas in analytics projects. All of these ideas relate to framing problems such that they have a greater chance of success.
Published:
In this guest lecture for data science and analytics students at Imperial College London we discussed the emergence of data science as a career in industry. We covered both the historical conditions that created the field and the ongoing changes and challenges faced by technical, detail-oriented people who work with a wide variety of business people.
Published:
Eye tracking applications produce a series of gaze fixation points that can be attributed to objects within a subject’s field of vision. Error is typically measured on the basis of individual gaze fixation point measurements. These applications are often used to infer a gaze duration metric from a series of fixation measurements. There is no direct method for inferring the error in a gaze duration measurement from an error in fixation points.
Published:
Consumers are expected to partially reveal their preferences and interests through the media they consume. The development of visual attention measurement with eye tracking technologies allows us to investigate the consistency of these preferences across the creative executions of a given brand and over all brands within a given vertical.
Published:
Contextual targeting is a common strategy that places marketing messages in media locations that are aligned with a target audience. The challenge of contextual targeting is knowing the ideal schema and the set of categories that provide the right audience. Refinement of the contextual targeting process has been limited by the use of metrics that are either rapid but unreliable (click-through rates), or reliable but slow, expensive and inaccessible in real time (conversions or brand awareness).