Object Oriented Programming course | Assoc.Prof. Abzetdin Adamov | Computer Engineering Department | Qafqaz University

COURSE SYLLABUS FOR: Introduction to Big Data Analytics

COURSE ID: INFT 4836
CREDITS: 6
CLASS TERM: Fall, 2017
INSTRUCTOR: Dr. Abzetdin Adamov
CLASS SCHEDULE: see My Calendar
JOB TITLES: Chief Data Officer, Big Data Solution Architect, Big Data Platform Engineer, Big Data Analyst, Big Data Analytics Business Consultant, Big Data Software Designer, Big Data Consultant, Hadoop Architects, Consultant Hadoop Developer, Senior Analytics Manager.

Course Description:

The Internet services, web and mobile applications, and pervasive communication that are widely available today and meet many of our needs have stimulated the production of tremendous amounts of data (call metadata, texts, emails, social media updates, photos, videos, location, etc.). Even with the power of today’s computers, it is still a big challenge for business and government organizations to manage, search, analyze, and visualize this vast amount of data as information. Over 90% of this information is unstructured, which means the data does not have a predefined structure or model. Generally, unstructured data is useless unless data mining, data extraction, and advanced data analytics techniques are applied to it. In other words, data is worth something only if you can process and understand it; otherwise it is useless.

Big Data Analytics is the scientific process of transforming large amounts of data into actionable knowledge (intelligent insights), enabling data-driven decision making. Big Data Analytics is not a brand-new technology; it has existed for many years. But because a number of factors have now come together, it is becoming more and more important.

The "Introduction to Big Data Analytics" course is designed to provide students with fundamental knowledge of: the causes of the Big Data problem, Big Data use cases across industry sectors, and distributed architectures and platforms for storing and processing Big Data. Students will be introduced to the architecture of Hadoop, HDFS, and the concept of MapReduce. Several other key components of the Hadoop Ecosystem will be introduced as well. Real data analytics and visualization will be carried out using the R programming language.
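
To make the MapReduce concept mentioned above more concrete, here is a minimal single-machine sketch in Python. It is not Hadoop code and does not use any Hadoop API; it only imitates the three phases (map, shuffle, reduce) on a word-count task. The function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read its input from HDFS.
    sample = ["big data is big", "data drives decisions"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions run in parallel on many machines and the framework performs the shuffle; this sketch runs everything in one process so that the data flow is easy to follow.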

Course Objectives:

As the business environment becomes more sophisticated, software development is becoming increasingly complex (software engineering is about managing complexity). As one of the best programming paradigms for reducing the complexity of large projects, Object Oriented Programming (OOP) has become the predominant technique for writing software in the past decade. Many other important software development techniques are based on the fundamental ideas captured by object-oriented programming. The aim of this course is to introduce programming in Java in accordance with the Object-Oriented Programming concept.

Learning Outcomes and Competences:

At the end of this course, students will:
  • Understand the nature of Big Data and what drives it
  • Understand how Big Data Analytics can affect businesses and bring them new opportunities and benefits
  • Know the Big Data management and processing platforms that can handle the Volume, Velocity and Veracity of the data
  • Understand in depth the architecture of Hadoop, HDFS, MapReduce and other Hadoop Ecosystem components, and how they support Big Data solutions
  • Understand and be able to apply the Big Data Analytics lifecycle
  • Be able to identify Big Data problems across industry sectors and determine the right methodology, techniques and tools to solve these problems

Prerequisites:

No prerequisites are enforced for this course, but it is assumed that you have successfully passed, or have equivalent knowledge of or experience in, the following courses:
  • CSCI 2406 - Computer Organization & Architecture
  • CSC 105 - Programming Principles I
  • CSCI 3615 - Database Systems
  • CSCI 2303 - Introduction to Computer Networks

Methods of instruction:

The class will be taught through lectures, including discussion of class examples and case studies, laboratory assignments and homework. Discussions based on student contributions add a vital and dynamic element to the class. Students are expected to come to class with comments or questions from the course readings and to participate actively in in-class discussions. A final project will be assigned to groups of 2-3 students; the project and its presentation will give students experience in solving real data-driven problems across business sectors and in sharing that experience with their classmates.

Requirements

  • Exams and quizzes: Students will take 2 exams (midterm and final). These will be closed-book tests (no books, no laptops or other devices) consisting of a very limited number of multiple-choice questions, open-ended questions, problem solving by writing code, and questions on understanding written code. Time and place will be communicated during the term.
  • Assignment/problem sets/projects/reports/research papers: Homework in the form of weekly or monthly written team assignments will be given during the term. These will be software development documents for a hypothetical project. Each homework assignment will build on the previous assignments, reflecting subsequent phases of the project. Laboratory Assignments and Final Projects will be assigned to teams of 2-3 students. Detailed information and the exact dates will be communicated during the term. Students will submit the homework assignments online and in hard copy. The homework will be graded on clarity, technical soundness, thoroughness and coverage, relevance to the provided standards, and use of resources.
  • Knowledge and Skillset: Since this is not a beginning programming course, it is strongly recommended that students have fundamental programming knowledge and skills.
  • Other requirement: Academic honesty is required in all stages of exams, assignments, labs and projects.

Assessment Methods and Criteria:

Midterm Exam: 25%
Labs and quizzes: 20%
Final Project: 15%
Final Exam: 35%
Attendance: 5%

Technology requirements:

  • Equipment: Students are encouraged to use their own laptops to install the required software, configure the platform appropriately, and work on the class assignments.
  • Software: R and RStudio will mostly be used for data manipulation and analytics. Only limited time will be allocated for learning them, which is why having programming skills is important (for some examples we’ll also use Java).

Software Installations:

In order to run example code and to complete class tasks and exercises, homework assignments and final projects, students must install the following software...
Compulsory Software:

Tutorials:

Text Books and References:

Primary or required books/readings for the course:
  • Deepak Vohra, Practical Hadoop Ecosystem - A Definitive Guide to Hadoop-Related Frameworks and Tools, 2016, ISBN-13: 978-1-4842-2198-3, (electronic): 978-1-4842-2199-0
  • Dirk deRoos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, and Roman B. Melnyk, Hadoop for Dummies, 2014, ISBN: 978-1-118-60755-8
Supplemental or optional books/readings:
  • Jared P. Lander, R for Everyone: Advanced Analytics and Graphics, 2014, ISBN-13: 978-0-321-88803-7
  • Wes McKinney, Python for Data Analysis, 2013, ISBN 978-1-449-31979-3
  • Tom White, Hadoop: The Definitive Guide 4th ed., 2015, ISBN 978-1-491-90163-2
  • Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman, Big Data For Dummies, 2013, ISBN: 978-1-118-50422-2
Students are also free to use other books and reference guides of their own choice for OOP concept, Java programming, and Java technologies topics.

Course Syllabus (.PDF):

Download the .PDF version of the course syllabus: INFT-4836 Intro Big Data Analytics, Fall, 2017

Lab Assignments and Homework Tasks:

Each student must complete all of the following Lab and Homework Assignments and submit them to the email address (use the subject "??? - ") before the deadline. A submission deadline is the latest date and time for submission, so work submitted after the deadline will be awarded a zero.
 

Lab Assignments:

ANNOUNCEMENT:  

  • The first INFT-4836 Introduction to Big Data Analytics class will be held on September 11, 2017 at 08:30

Contents of the course (week topics):

Topic's Title Example Codes Documents
1 Introduction to Big Data Analytics
  • General Information and class policy
  • References and learning resources
  • Digital Universe Volume and trends
  • Why is data growth so high?
  • What is Big Data and when does it become Big?
  • Understanding the 5Vs of Big Data
  • Data concept and format of the available Data
  • Big Data Landscape
 
2 New Business Opportunities from Big Data Analytics (Use Cases)
  • Data Science and what a Data Scientist does
  • Key skill-sets for Data Engineer and Data Scientist
  • List Big Data and Data Science Use Cases
  • Define Standard Parameters for Use Cases
 
3 Big Data Platforms, Hadoop Distributed File System (HDFS)
  • Understanding the key differences between the Big Data approach and the traditional IT approach
  • Common Big Data Architecture
  • Schema-on-Read vs. Schema-on-Write
  • Why is Big Data becoming a hot topic right now?
  • Introduction to Hadoop
  • HDFS and MapReduce as key components of Hadoop
  • Understanding the architecture and operation of HDFS and MapReduce
4 Setting Virtual Environment for Data Science
  • Installing and configuring some of the recommended tools for data scientists
5 Hadoop Ecosystem (Primary components: Hadoop, YARN, MapReduce, HDFS, Pig, Hive, HBase, Zookeeper, Sqoop, Flume)
  • Key components of Hadoop Ecosystem
  • Relation between Components
  • Classification of Hadoop Ecosystem components by their assigned duties
  • Core Hadoop components
  • Data Access
  • Data Storage
  • Data Integration
6 Hadoop Installation and Configuration (Laboratory)
  • Minimum Technical Requirements
  • Step-by-Step Guide of Installation
7 Data Mining Techniques and Tools
8 Distributed Computing with Hadoop and MapReduce (YARN)
9 MapReduce Programming Concept
10 Brief introduction to programming in R
  • Installation, RStudio IDE, R packages for Data Management and Analytics.
  • R and Python – Which one to use?
11 Data Manipulation and Processing with R and Python
  • Standard Structures available in R and Python.
  • Advanced Data Structures.
12 Big Data Statistical Analytics and ML Algorithms
13 Text Analytics
14 Final Projects Presentations
 
15 Big Data Security and Privacy
 

Course Policies

General:

Respect due dates - No late projects, lab assignments, papers, or quizzes will be accepted unless you have made prior arrangements with the instructor. Quizzes will be announced at least one week before the actual date, and no quizzes will be allowed after the established deadline.
Since this course is about computer programming, it is strongly recommended that you have a working computer at home connected to the Internet. Most of the course materials are located online, and using the recommended additional online tutorials will also help you succeed.

Student contributions:

Students are expected to follow the instructor's recommendations when preparing homework, tasks, laboratory reports, tests, theses, etc.
  • Communicate in a professional manner with instructors and/or classmates (including online communication)
  • Participate actively in in-class discussions
  • Complete all in-class and homework tasks/projects on time
  • The most successful students are expected to propose an in-class seminar on an original topic that is not covered by the course
  • Preview the course materials for the topic to be covered that week before class starts

Bonus Project:

The bonus is an optional project; each student can propose their own unique topic within the bounds of the course or request one from the instructor. Each student who seeks bonus points should prepare a paper according to the announced standards and present it in class. Bonus points will be added to your final grade only if you pass the course without counting the bonus points. Part of the bonus points may come from participation points related to your attendance.

The Student Conduct Code:

Students are expected to follow the general rules of conduct and behavior. Students have to be familiar with the regulations described in the ADA University Student Handbook.
To avoid distractions, late students are asked NOT to enter the class after the doors are closed. In particular, excessive and loud talking, leaving and reentering class without permission, using cell phones, or other means of disrupting the class will not be tolerated, and students may be asked to leave the class. Students who constantly disrupt class may be asked to leave permanently and will fail the course. Be responsible for your own actions.
Important: All cell phones and other gadgets should be turned OFF; they may not be used in the classroom, and students are NOT allowed to leave the room to use their cell phones.

Attendance:

It is expected that students will attend and participate actively in all classes during the semester. One of the key factors for success in this course is to attend all scheduled classes and be actively involved in the learning process. Attendance is an indispensable element of the educational process. In compliance with Azerbaijani legislation, instructors are required to monitor attendance and inform the Registrar and the Dean of the respective School when students miss significant amounts of class time. Azerbaijani legislation mandates that students who fail to attend at least 75% of classes will fail the course.
ADA attendance policy excuses two (2) student absences, though these should reflect a serious need on the student’s part to be away from class. In case of involuntary and unpredictable serious disruption of normal life, students may appeal through a grievance procedure via the Office of the Dean of the School of Education.
Students are responsible for all work missed during their absences.

Academic Dishonesty - Cheating:

All graded projects must be your own work. My strong recommendation - "If you are not able to do more, do less, but do it right and by yourself". An act of academic dishonesty or plagiarism may result in failure for a project or for the course. Plagiarism involves representing another person's ideas or outcomes, including material from the Internet, as your own. Cheating or acts of academic dishonesty also include fabricating data and results, copying, and offering or receiving unauthorized assistance or information from another person. Students involved in activities such as cheating and/or plagiarism will be subject to disciplinary action in accordance with the ADA University Student Disciplinary Regulation.

Communication:

Email is the preferred communication tool. Be aware that only emails sent from your ADA University @ADA.EDU.AZ account will be considered.

Meeting Hours and Appointment:

  • For meetings, follow the Office Hours (check my online Calendar or the on-door calendar)
  • You can request a meeting by appointment outside the stated office hours via email
  • Only emails that have been sent from @ADA.EDU.AZ will be considered

Please let me know if you do not understand any topic, concept or project within the bounds of this course. Feel free to ask any questions you may have during class or via email.