Introducing KitKat Series: A Hub to Practice Big Data Projects

Image: Introductory image of Big Data
  • KitKat is a series of technical blogs (published on Medium) containing mini Big Data Hadoop projects from various domains such as healthcare, telecom, government, and finance.
  • The name signifies short and sweet 🍫. No rocket science: I wanted the name to be relatable to its content.
  • First, every project starts for a reason: solving a problem, enhancing an existing process, and so on, all in order to make better business decisions.
  • Hence, all my blogs will start with a problem statement or a goal statement that enables business decisions.
  • We will then map out the available data sources and their origins, connect them to our data pipeline, and decide which component of the Hadoop ecosystem to use and where.
  • Next, we will build the data pipeline hands-on. The steps for each task will be briefly explained, and the source code will be available in my GitHub repo 💻: mihirdhakan93 (Mihir Dhakan) (github.com)
  • Follow the steps mentioned in the blog or the GitHub repo and perform them on your own.
  • Let’s be very honest here: the projects will not be as full-fledged as those in the industry, but you will be able to figure out the different aspects of data pipelines and how they work.
  • Willingness to learn and get your hands dirty. Just by reading the blog, you won’t learn; a bitter truth indeed.
  • Beginner knowledge of Hadoop, Hive, HDFS, Scala, Python, Sqoop, MySQL, etc. If you don’t have this, don’t worry, it’s not going to be that tough. Give it a try, mate; better ‘try’ than never.
  • Data: I will share some open-source data repositories in each blog, which we will use throughout.
  • Cluster: I will be using a Big Data cloud cluster offered by providers such as CloudxLab for a nominal charge. I am a big fan of the cloud, but you are free to run everything on your local machine, virtual machines, etc.
  • Tools and technologies: Hive, HDFS, Kafka, Spark, Scala, Python (libraries such as Pandas), MySQL, and file storage formats such as Avro, Parquet, and ORC.
  • Follow me (I mean, follow me on Medium 😉) and subscribe to get notified.

Data Evangelist | Blogger | Super Optimist

Mihir Dhakan
