I remember my first data analyst interview, when I was kindly advised against doing A to Z. The takeaway from that interview was that I have to focus.
In bigger organisations, there are separate roles within the field of data analytics, and each data professional contributes his or her area of expertise, from Architecting to Modeling. In a smaller outfit, the same data analyst might be doing end-to-end work.
Architecting: creating a system that can take in data in its raw form, process it, store it, use it, and get the results to the program or end user that needs them. Open-source technologies like Hive, Pig and Spark are the tools of the trade. If the organisation is a “Microsoft Shop”, there is a certified exam, and Mostafa has a good rundown on preparing for it. A smaller entity might outsource this architecting part to the cloud. I do not see myself working in this area.
Experiment Design: creating a plan to collect enough data of the right type to give meaningful results.
Data Wrangling: creating clean and useful data from messy real-world collections, pulling them from disparate sources and concatenating them together (a toy sketch in R follows this list).
Data Modeling: creating a mathematical representation of your data.
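To make the wrangling step concrete, here is a minimal sketch in R. The data frames, column names and values are invented for illustration: two sources describing the same records, with inconsistent column names and stray whitespace, cleaned up and concatenated into one table.

    # Toy data: two sources for the same records, with inconsistent
    # column names and stray whitespace (invented for illustration).
    sales_web   <- data.frame(Customer = c(" Tan ", "Lim"),
                              Amount = c(120, 85),
                              stringsAsFactors = FALSE)
    sales_store <- data.frame(CUSTOMER = "Ong", AMOUNT = 40,
                              stringsAsFactors = FALSE)

    # Clean: standardise the column names and trim the whitespace.
    names(sales_web)   <- tolower(names(sales_web))
    names(sales_store) <- tolower(names(sales_store))
    sales_web$customer <- trimws(sales_web$customer)

    # Concatenate the cleaned sources into a single data frame.
    sales_all <- rbind(sales_web, sales_store)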
What is my focus?
I would love to do Experiment Design, Data Wrangling and Data Modeling all together. Data analysis is a multi-step process, and these steps bring variety to the job.
Instead of diving in as usual, today I decided to go over the tutorial that came with the software.
Unlike R, which is a command-line tool where you type commands to get things done, RapidMiner offers a point-and-click approach. At every step of the process, a dialogue box pops up to alert you to anything you have overlooked.
Frustrated, I felt like dropping the developers an email, but decided to press ahead by doing something different: working through their tutorial.
I learnt that if you want to export your data to CSV from RapidMiner, you use the Write CSV operator.
However, I am not pleased with the output CSV: I was expecting the data to be split into individual columns.
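My guess, and it is only a guess rather than something the tutorial confirms, is that the separator is the culprit: as far as I can tell, Write CSV uses a semicolon as its default column separator, so a tool expecting commas lumps everything into one column. In R, stating the separator explicitly splits the data correctly; the file name below is made up for illustration.

    # A sketch, assuming the separator is the issue;
    # "output.csv" is a hypothetical file name.
    df <- read.csv("output.csv", sep = ";", stringsAsFactors = FALSE)
    head(df)  # the data should now land in individual columns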
In the meantime, I will keep exploring this tool.
As someone who learns visually, this tool serves as an aid for me to learn more about data mining techniques.
Most likely, RapidMiner is going to be the first tool I reach for before writing an R command. When collaborating with others, the visual representation RapidMiner offers can be less intimidating than confronting lines of code.
Much has been said about how LinkedIn is not Facebook, and I agree. However, we are more than our jobs; work cannot be the only thing that defines us fully. At the workplace, I am sure you don’t just talk about work all day long. You share your favourite moments in life with your co-workers, your likes and dislikes, your take on a sporting event over the weekend. You put your family photo on your work desk. You “decorate” it with bottles of hand lotion.
The best place to have the best of both worlds online, professional and personal, is Twitter. No one will censure you if you share what you ate for the day (just don’t overdo it), and in the next tweet you can share a business idea or how to extract unsampled raw data from Google Analytics using an open-source tool.
Tinkering with tech: I successfully connected RStudio on my PC to GitHub and pushed my changes.
The idea behind this tinkering is somewhat similar to Dropbox: when you move a file into a synced folder, it is copied up to dropbox.com, so when you are away from your PC you can still access it from any machine.
In this case, instead of a file, any change to your R code is reflected on GitHub. This helps when you collaborate with other people: they can see what you have done when you use R to manipulate the data.
To get RStudio to push R code to GitHub, you have to install Git. RStudio can then act as a GUI front end for Git, which means one less piece of software, a separate Git client, to install.
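Under the hood, RStudio’s Git integration runs ordinary Git commands. Here is a rough sketch of the equivalent, driven from R with system2(); the commit message and branch name are placeholders, and it assumes Git is installed, the project is already a Git repository, and a GitHub remote named origin exists.

    # Sketch of what RStudio's Git pane effectively does, run from R.
    # Assumes Git is on the PATH and a GitHub remote "origin" is set up.
    system2("git", c("add", "-A"))                  # stage all changes
    system2("git", c("commit", "-m",
                     shQuote("Update R analysis"))) # commit; message is a placeholder
    system2("git", c("push", "origin", "master"))   # push to the GitHub branch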
This is a new learning milestone for me. It makes the geek in me happy.
This part of setting up Git for version control is not covered in NCIRL’s Higher Diploma in Science in Data Analytics. If you enroll in one of those Coursera data science courses, you are introduced to Git.