Introduction to Statistics & Machine Learning with R

15 & 22 July 2017, Saturdays (Course Code: SCIR0715)

10:00 a.m. to 1:00 p.m., Data Studio @ Science Park


Day 1

  • Introduction to the R language, its community and packages
  • Data analysis fundamentals: mathematical expectation, standard deviation, estimation, hypothesis testing, linear regression, central limit theorem
  • Probability fundamentals: frequencies and histograms, binomial distribution, normal distribution, Poisson distribution, exponential distribution, gamma distribution
  • Using the ‘rattle’ package for data mining

Day 2

  • Marketing case example: customer churn analysis for AT&T
  • Binary classification for customer retention
  • R packages for supervised classification: rpart, e1071, randomForest, nnet, xgboost
  • Evaluating prediction results: confusion matrix, training/testing split, cross-validation and ROC curves


Target Participants:

IT professionals, data analysts and aspiring data scientists who have a solid programming background and would like a refresher on emerging programming languages in the big data domain. Students with a science background or strong mathematics skills are encouraged to join this introductory course.


  • The course will be conducted in Cantonese, with course materials in English
  • A certificate will be presented to students with a 100% attendance rate
  • Students are required to bring their own notebook computers for the classwork sessions

Enquiry: email to or call +852 3188 7401

Supporting Organizer: Data Studio @ Hong Kong Science Park



Brain Tsang is a Data Analytics Consultant at Radica. He is an active data scientist on Kaggle, a global crowdsourcing platform for predictive modeling, and has competed in its big data analytics competitions since 2010 with a record of high rankings. He has over 5 years of professional training experience in designing and developing course content for Information and Communications Technology subjects.