SMACK Stack for Data Science Training Course

SMACK is a collection of data platform software, namely Apache Spark, Apache Mesos, Akka, Apache Cassandra, and Apache Kafka. Using the SMACK stack, users can create and scale data processing platforms.

This instructor-led, live training (online or onsite) is aimed at data scientists who wish to use the SMACK stack to build data processing platforms for big data solutions.

By the end of this training, participants will be able to:

Implement a data pipeline architecture for processing big data.
Develop a cluster infrastructure with Apache Mesos and Docker.
Analyze data with Spark and Scala.
Manage unstructured data with Apache Cassandra.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.

This course is available as onsite live training in Sri Lanka or online live training.

Course Outline

Introduction

SMACK Stack Overview

What is Apache Spark? Apache Spark features
What is Apache Mesos? Apache Mesos features
What is Akka? Akka features
What is Apache Cassandra? Apache Cassandra features
What is Apache Kafka? Apache Kafka features

Scala Language

Scala syntax and structure
Scala control flow

Preparing the Development Environment

Installing and configuring the SMACK stack
Installing and configuring Docker

Akka

Using actors

Apache Cassandra

Creating a database for read operations
Working with backups and recovery

Connectors

Creating a stream
Building an Akka application
Storing data with Cassandra
Reviewing connectors

Apache Kafka

Working with clusters
Creating, publishing, and consuming messages

Apache Mesos

Allocating resources
Running clusters
Working with Apache Aurora and Docker
Running services and jobs
Deploying Spark, Cassandra, and Kafka on Mesos

Apache Spark

Managing data flows
Working with RDDs and dataframes
Performing data analysis

Troubleshooting

Handling failure of services and errors

Summary and Conclusion

Requirements

An understanding of data processing systems

Audience

Data Scientists

Duration

14 Hours

Open Training Courses require 5+ participants.

Dates are subject to availability and take place between 09:30 and 16:30.

Testimonials (1)

very interactive...

Richard Langford

Course - SMACK Stack for Data Science

Upcoming Courses

Course | Start | Duration | Location | Price (Online) | Price (Classroom)
SMACK Stack for Data Science | 2025-09-25 09:30 | 14 hours | Colombo - Classroom | 210661 LKR | 228661 LKR
SMACK Stack for Data Science | 2025-10-09 09:30 | 14 hours | Dehiwala | 210661 LKR | 230661 LKR
SMACK Stack for Data Science | 2025-10-23 09:30 | 14 hours | Kotte | 210661 LKR | 230661 LKR
SMACK Stack for Data Science | 2025-11-06 09:30 | 14 hours | Kolonnawa Classroom | 210661 LKR | 230661 LKR

Related Courses

Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP

21 Hours

This course is aimed at developers and data scientists who wish to understand and implement artificial intelligence in their applications. Special focus will be given to data analysis, distributed artificial intelligence, and natural language processing.

Anaconda Ecosystem for Data Scientists

14 Hours

This instructor-led, live training in Sri Lanka (online or onsite) is aimed at data scientists who wish to use the Anaconda ecosystem to capture, manage, and deploy packages and data analysis workflows in a single platform.

By the end of this training, participants will be able to:

Install and configure Anaconda components and libraries.
Understand the core concepts, features, and benefits of Anaconda.
Manage packages, environments, and channels using Anaconda Navigator.
Use Conda, R, and Python packages for data science and machine learning.
Get to know some practical use cases and techniques for managing multiple data environments.

Big Data Business Intelligence for Telecom and Communication Service Providers

35 Hours

Overview

Communications service providers (CSPs) are facing pressure to reduce costs and maximize average revenue per user (ARPU) while ensuring an excellent customer experience, even as data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month.

Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored.

With Big Data & Analytics’ high-speed, scalable big data software, CSPs can mine all their data for better decision making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data, but implementing such tools requires a new kind of cloud-based database system such as Hadoop, or a massive-scale parallel computing processor (KPU etc.).

This course on Big Data BI for Telco covers the emerging areas in which CSPs are investing to gain productivity and open up new revenue streams. It provides a complete 360-degree overview of Big Data BI in Telco, so that decision makers and managers gain a wide and comprehensive view of its possibilities for productivity and revenue gain.

Course objectives

The main objective of the course is to introduce new Big Data business intelligence techniques in four sectors of the telecom business (Marketing/Sales, Network Operation, Financial Operation and Customer Relationship Management). Students will be introduced to the following:

Introduction to Big Data: what the 4Vs (volume, velocity, variety and veracity) are, and how data is generated, extracted and managed from a Telco perspective
How Big Data analytics differs from legacy data analytics
In-house justification of Big Data from a Telco perspective
Introduction to the Hadoop ecosystem: familiarity with Hadoop tools such as Hive, Pig and Spark, and when and how they are used to solve Big Data problems
How Big Data is extracted for analytics tools, and how business analysts can reduce the pain points of collecting and analyzing data through an integrated Hadoop dashboard approach
Basic introduction to insight analytics, visualization analytics and predictive analytics for Telco
Customer churn analytics and Big Data: how Big Data analytics can reduce customer churn and customer dissatisfaction in Telco, with case studies
Network failure and service failure analytics from network metadata and IPDR
Financial analysis: fraud, wastage and ROI estimation from sales and operational data
Customer acquisition: target marketing, customer segmentation and cross-selling from sales data
Introduction to and summary of all Big Data analytics products and where they fit into the Telco analytics space
Conclusion: how to take a step-by-step approach to introducing Big Data Business Intelligence in your organization

Target Audience

Network operations managers, financial managers, CRM managers and top IT managers in the Telco CIO office
Business Analysts in Telco
CFO office managers/analysts
Operational managers
QA managers

Data Science Programme

245 Hours

The explosion of information and data in today’s world is unparalleled, and our ability to innovate and push the boundaries of the possible is growing faster than it ever has. Data Scientist is one of the most in-demand roles across industry today.

We offer much more than learning through theory; we deliver practical, marketable skills that bridge the gap between the world of academia and the demands of industry.

This 7-week curriculum can be tailored to your specific industry requirements. Please contact us for further information or visit the NobleProg Institute website.

Audience:

This programme is aimed at post level graduates as well as anyone with the required prerequisite skills, which will be determined by an assessment and interview.

Delivery:

Delivery of the course will be a mixture of Instructor Led Classroom and Instructor Led Online; typically the 1st week will be 'classroom led', weeks 2 - 6 'virtual classroom' and week 7 back to 'classroom led'.

Data Science for Big Data Analytics

35 Hours

Big data refers to data sets so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

Introduction to Graph Computing

28 Hours

In this instructor-led, live training in Sri Lanka, participants will learn about the technology offerings and implementation approaches for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using a Graph Computing (also known as Graph Analytics) approach. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

Understand how graph data is persisted and traversed.
Select the best framework for a given task (from graph databases to batch processing frameworks).
Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel.
View real-world big data problems in terms of graphs, processes and traversals.

Jupyter for Data Science Teams

7 Hours

This instructor-led, live training in Sri Lanka (online or onsite) introduces the idea of collaborative development in data science and demonstrates how to use Jupyter to track and participate as a team in the "life cycle of a computational idea". It walks participants through the creation of a sample data science project based on top of the Jupyter ecosystem.

By the end of this training, participants will be able to:

Install and configure Jupyter, including the creation and integration of a team repository on Git.
Use Jupyter features such as extensions, interactive widgets, multiuser mode and more to enable project collaboration.
Create, share and organize Jupyter Notebooks with team members.
Choose from Scala, Python, or R to write and execute code against big data systems such as Apache Spark, all through the Jupyter interface.

Kaggle

14 Hours

This instructor-led, live training in Sri Lanka (online or onsite) is aimed at data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.

By the end of this training, participants will be able to:

Learn about data science and machine learning.
Explore data analytics.
Learn about Kaggle and how it works.

MATLAB Fundamentals, Data Science & Report Generation

35 Hours

In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform. Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script development, and object-oriented principles.

In the second part, we demonstrate how to use MATLAB for data mining, machine learning and predictive analytics. To provide participants with a clear and practical perspective of MATLAB's approach and power, we draw comparisons between using MATLAB and using other tools such as spreadsheets, C, C++, and Visual Basic.

In the third part of the training, participants learn how to streamline their work by automating their data processing and report generation.

Throughout the course, participants will put into practice the ideas learned through hands-on exercises in a lab environment. By the end of the training, participants will have a thorough grasp of MATLAB's capabilities and will be able to employ it for solving real-world data science problems as well as for streamlining their work through automation.

Assessments will be conducted throughout the course to gauge progress.

Format of the Course

Course includes theoretical and practical exercises, including case discussions, sample code inspection, and hands-on implementation.

Note

Practice sessions will be based on pre-arranged sample data report templates. If you have specific requirements, please contact us to arrange.

Accelerating Python Pandas Workflows with Modin

14 Hours

This instructor-led, live training in Sri Lanka (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.

By the end of this training, participants will be able to:

Set up the necessary environment to start developing Pandas workflows at scale with Modin.
Understand the features, architecture, and advantages of Modin.
Know the differences between Modin, Dask, and Ray.
Perform Pandas operations faster with Modin.
Implement the entire Pandas API and functions.

Python Programming for Finance

35 Hours

Python is a programming language that has gained huge popularity in the financial industry. Adopted by the largest investment banks and hedge funds, it is being used to build a wide range of financial applications ranging from core trading programs to risk management systems.

In this instructor-led, live training, participants will learn how to use Python to develop practical applications for solving a number of specific finance related problems.

By the end of this training, participants will be able to:

Understand the fundamentals of the Python programming language
Download, install and maintain the best development tools for creating financial applications in Python
Select and utilize the most suitable Python packages and programming techniques to organize, visualize, and analyze financial data from various sources (CSV, Excel, databases, web, etc.)
Build applications that solve problems related to asset allocation, risk analysis, investment performance and more
Troubleshoot, integrate, deploy, and optimize a Python application

Audience

Developers
Analysts
Quants

Format of the course

Part lecture, part discussion, exercises and heavy hands-on practice

Note

This training aims to provide solutions for some of the principal problems faced by finance professionals. However, if you have a particular topic, tool or technique that you wish to add or elaborate on further, please contact us to arrange.

GPU Data Science with NVIDIA RAPIDS

14 Hours

This instructor-led, live training in Sri Lanka (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, applying machine learning algorithms, such as XGBoost, cuML, etc.

By the end of this training, participants will be able to:

Set up the necessary development environment to build data models with NVIDIA RAPIDS.
Understand the features, components, and advantages of RAPIDS.
Leverage GPUs to accelerate end-to-end data and analytics pipelines.
Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
Learn how to perform machine learning tasks with XGBoost and cuML algorithms.
Build data visualizations and execute graph analysis with cuXfilter and cuGraph.

Python and Spark for Big Data (PySpark)

21 Hours

In this instructor-led, live training in Sri Lanka, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

Learn how to use Spark with Python to analyze Big Data.
Work on exercises that mimic real world cases.
Use different tools and techniques for big data analysis using PySpark.

Apache Spark MLlib

35 Hours

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It divides into two packages:

spark.mllib contains the original API built on top of RDDs.
spark.ml provides a higher-level API built on top of DataFrames for constructing ML pipelines.

Audience

This course is directed at engineers and developers seeking to use MLlib, the built-in machine learning library for Apache Spark.

Stratio: Rocket and Intelligence Modules with PySpark

14 Hours

Stratio is a data-centric platform that integrates big data, AI, and governance into a single solution. Its Rocket and Intelligence modules enable rapid data exploration, transformation, and advanced analytics in enterprise environments.

This instructor-led, live training (online or onsite) is aimed at intermediate-level data professionals who wish to use the Rocket and Intelligence modules in Stratio effectively with PySpark, focusing on looping structures, user-defined functions, and advanced data logic.

By the end of this training, participants will be able to:

Navigate and work within the Stratio platform using Rocket and Intelligence modules.
Apply PySpark in the context of data ingestion, transformation, and analysis.
Use loops and conditional logic to control data workflows and feature engineering tasks.
Create and manage user-defined functions (UDFs) for reusable data operations in PySpark.

Format of the Course

Interactive lecture and discussion.
Lots of exercises and practice.
Hands-on implementation in a live-lab environment.

Course Customization Options

To request a customized training for this course, please contact us to arrange.
