Application of Machine Learning – Infer gender from first name

Twitter does not ask for your gender or where you are from when you sign up. Thanks to machine learning, it can infer both from what you tweet and serve up relevant ads.

Perhaps Twitter is using an “infer gender from first name” approach based on your Twitter handle. A Google search turns up some interesting tools and approaches that help you do just that.
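A minimal sketch of how an “infer gender from first name” lookup could work. This is not Twitter's method, which is not public. The name counts are invented purely for illustration, and the function name and threshold are assumptions. A real tool would load counts from a public birth-name or census dataset.

```python
from collections import Counter

# Toy counts, made up for illustration only. They are NOT real statistics.
NAME_COUNTS = {
    "mary": Counter(female=980, male=20),
    "john": Counter(female=10, male=990),
    "jordan": Counter(female=400, male=600),
}


def infer_gender(first_name, threshold=0.8):
    """Return (label, confidence) for a first name, or ("unknown", 0.0)."""
    counts = NAME_COUNTS.get(first_name.strip().lower())
    if not counts:
        # Name not in the table: do not guess.
        return "unknown", 0.0
    label, hits = counts.most_common(1)[0]
    confidence = hits / sum(counts.values())
    # Refuse to guess when the name is common for both genders.
    if confidence < threshold:
        return "unknown", confidence
    return label, confidence


if __name__ == "__main__":
    for name in ("Mary", "Jordan", "Xiaoming"):
        print(name, infer_gender(name))
```

Names missing from the table, or used about equally by both genders, come back as "unknown". That is one plausible reason why ad targeting built on this kind of guess can miss.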

There are times when they do not get it spot on. Ads for Ukrainian women seeking marriage have appeared on my Twitter timeline on a few occasions. 😊


Maybe I should tweet more about my family instead of just current affairs, data science, Singapore, living in Ireland and aviation stuff.


So you call yourself a Data Scientist?

There should be a law that people can’t call themselves data scientists unless they

  • have had the title conferred by a recognised awarding association,
  • have written a relevant peer-reviewed research paper published by the IEEE,
  • have developed and contributed an R/Python package,
  • work with gigabytes of structured and unstructured data daily,
  • and hold at least a Master’s degree.

Sem 3 Project Update 1

While deciding how I can use that data for my project, I revisited the proposal.

The proposal was intended for private rentals, not rentals by local authorities or approved housing bodies (housing associations).

Such a distinction makes a difference. It means data from local authorities or approved housing bodies (housing associations) can be dropped, and I can focus on the data that matters.

The wrong set of data, other things being equal, is going to adversely affect the final outcome of the machine learning algorithm.
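Dropping the out-of-scope rows could look something like the sketch below. The field names (landlord_type, monthly_rent) and the sample rows are hypothetical placeholders, not the real dataset's schema or values.

```python
# Hypothetical sample rows; field names and values are placeholders only.
records = [
    {"area": "Area A", "landlord_type": "Private", "monthly_rent": 1400},
    {"area": "Area B", "landlord_type": "Local Authority", "monthly_rent": 300},
    {"area": "Area C", "landlord_type": "Approved Housing Body", "monthly_rent": 450},
    {"area": "Area D", "landlord_type": "private", "monthly_rent": 1250},
]

# Landlord types that fall outside the scope of the proposal.
EXCLUDED = {"local authority", "approved housing body"}


def keep_private_rentals(rows):
    """Return only the rows whose landlord type is not in EXCLUDED."""
    return [row for row in rows if row["landlord_type"].lower() not in EXCLUDED]


if __name__ == "__main__":
    for row in keep_private_rentals(records):
        print(row)
```

Filtering once, before any modelling, keeps the out-of-scope rows from leaking into later steps.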

Happy Days!


Ultimate R Resources

R resources that guide me when I am stuck. You can’t possibly remember every command.

View Ultimate R Resources on Hackpad.


What’s your census story?

Looking at the Electoral Division where I live (CSO Area Code ED 02158), the population is 2,164 (1,266 males and 1,353 females). Of these, 2,110 respondents said they were Catholic.

There are 294 Asian or Asian Irish.

741 males were reported as single vs 727 females.

1,824 were reported as not being able to speak Irish.

1,090 people out of 2,159 are at work.

Only 21 out of 1,052 households had no central heating; the majority, 770, were heated by natural gas.

As for education, there are 8 people in my local area with PhDs (5 males, 3 females), and 236 people gave their occupation as Professional Occupations.

716 households (out of 1,052) have a computer, while 1 household has four or more cars.

466 of them drive to work, school or college.
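The raw counts above are easier to compare as percentages. A minimal sketch that uses only figures quoted in this post; the helper name share and the labels are made up for illustration.

```python
def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# (count, base) pairs taken from the census figures quoted in this post.
figures = {
    "households without central heating": (21, 1052),
    "households heated by natural gas": (770, 1052),
    "households with a computer": (716, 1052),
    "people at work": (1090, 2159),
}

if __name__ == "__main__":
    for label, (part, whole) in figures.items():
        print(f"{label}: {share(part, whole)}%")
```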


Day 3 with RapidMiner

Hit the wall on the 3rd day with RapidMiner.

Instead of diving in as usual, today I decided to go over the tutorial that comes with the software.

Unlike R, which is a command-line tool where you type commands to get things done, RapidMiner offers a point-and-click approach. At every step of the process, a dialogue box alerts you to what you have overlooked. The most common one is this.

Screenshot 2016-04-26 17.53.49

Frustrated, I felt like dropping them an email but decided to press ahead by doing something different: trying out their tutorial.

I learnt that if you want to export your data to csv in RapidMiner, you use the Write CSV Operator.


However, I am not pleased with the output CSV. I was expecting each field to be in its own column.
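A common reason for everything landing in one column is a separator mismatch: the file is written with one separator (for example a semicolon) and opened by a program that expects a comma. Whether that is the cause here is an assumption, not something confirmed for this export. The sketch below uses made-up sample data and hypothetical helper names to show the effect and one way to rewrite the file with commas.

```python
import csv
import io

# Stand-in for the exported file; column names and values are made up.
exported = "name;age;city\nAnn;34;Dublin\nBen;41;Cork\n"


def read_rows(text, delimiter=";"):
    """Parse delimited text into a list of rows (each row is a list of fields)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def to_comma_csv(text, delimiter=";"):
    """Re-write delimited text as comma-separated text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(read_rows(text, delimiter))
    return out.getvalue()


if __name__ == "__main__":
    # Parsed with the wrong separator, each row collapses into one field.
    print(read_rows(exported, delimiter=","))
    # Parsed with the right one, the fields separate into columns.
    print(read_rows(exported, delimiter=";"))
    print(to_comma_csv(exported))
```

If the separator turns out to be something else, only the delimiter argument needs to change.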

Until then, I will keep exploring this tool.

As someone who learns things visually, this tool serves as an aid for me to learn more about data mining techniques.

Most likely, RapidMiner is going to be the first tool I reach for before writing an R command. When collaborating with others, the visual representation offered by RapidMiner can be less intimidating than being confronted with lines of code.


Which R reference book for me?

View Which R book for me? on Hackpad.


LinkedIn is not Facebook


Much has been said about LinkedIn not being Facebook, and I agree. However, we are more than our jobs. Work can’t be the only thing that defines us. At the workplace, I am sure you don’t just talk about work all day long. You share your favourite life moments with your co-workers, your likes and dislikes, your take on a sporting event over the weekend. You put your family photo on your desk. You “decorate” your desk with bottles of hand lotion.

The best place to have the best of both worlds online, professional and personal, is Twitter. No one will censure you if you share what you ate for the day (just don’t overdo it) and, in the next tweet, share a business idea or how to extract unsampled raw data from Google Analytics using an open-source tool.

Let’s connect on Twitter. I am at


Successfully connected RStudio on my PC to GitHub

Tinkering with tech. Successfully connected RStudio on my PC to GitHub and pushed the changes.

The idea behind this tinkering is somewhat similar to Dropbox. When you move a file to a Dropbox folder, that file is synced. When you are not at your PC, you can access the file from any PC.

In this case, instead of a file, any changes to your R code are reflected on GitHub. This helps when you collaborate with other people: they can see what you have done when you use R to manipulate the data.


To get RStudio to push R code to GitHub, you have to install Git. RStudio can act as a GUI front-end for Git, which means one less piece of software, a Git client, to install.
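For the curious, the sketch below lists roughly the Git commands a first commit and push involve when typed by hand. That a GUI runs exactly this sequence is an assumption. The file name, commit message, remote URL and branch name are all placeholders.

```python
import shlex


def git_push_plan(files, message, remote_url, branch="main"):
    """Return the Git commands for a first commit and push, as argument lists."""
    return [
        ["git", "init"],                                   # create the local repository
        ["git", "add", *files],                            # stage the chosen files
        ["git", "commit", "-m", message],                  # record the staged snapshot
        ["git", "remote", "add", "origin", remote_url],    # point "origin" at GitHub
        ["git", "push", "-u", "origin", branch],           # upload and track the branch
    ]


if __name__ == "__main__":
    # Placeholder values: replace USER/REPO and the file name with your own.
    plan = git_push_plan(
        ["analysis.R"], "First commit", "https://github.com/USER/REPO.git"
    )
    for command in plan:
        print(shlex.join(command))
```

The function only builds and prints the commands. To actually run them, each argument list could be passed to subprocess.run(command, check=True) from inside the project folder, with Git installed.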

This is a new learning milestone for me. It makes the geek in me happy.

This part, setting up Git for version control, is not covered in NCIRL’s Higher Diploma in Science in Data Analytics. If you enrol in one of the Coursera data science courses, you are introduced to Git.

Here is my R code on GitHub



First week into Semester 2

The Communication module is rubbish. The class is more like a TV talk show, with way too much interaction, at some points off topic.

Chatting with another course mate, I found I am not the only one who shares this sentiment. Of course, as a data analytics student, I know 2 is not a good sample size to go by! Let’s see next week’s attendance.

This is the only module where I see my fellow course mates being more “interactive” with the lecturer. You won’t see this level of participation in Statistics class 102 or Programming for Big Data.

The only thing I approve of in this module is the requirement to produce a Reflective Journal. In fact, it should be mandatory for all modules!

I suppose this subject was the result of a “ticking the box” exercise. Industry people expressed concern about data analysts who can’t communicate the insights they have gathered. Rightly so. A data analyst who has a problem communicating is dead in the water.

Business Analyst and Problem-Solving Techniques, I am sorry. You are out of my bad books. Communication, can you take its place instead?

The first week also saw two of my course mates, with whom I interacted more often than others, no longer joining me in Semester 2.