My Experience on CCA-175 Spark and Hadoop Developer Exam [Feb-2021]
Allow me to share a checklist, tips, what to expect, and much more. Stay tuned!
Relieved, right? I can understand. Finally, you have found an article that sheds some light on the latest (2021) CCA-175 exam pattern and prepares you for those 60 minutes.
Firstly, let me assure you: you are at the right place. I am here strictly to jot down what I couldn’t find anywhere myself. So, let’s start, mate.
Maybe you are a techie with hands-on experience in Big Data technologies, or maybe you are a newbie to this whole new stream. Relax, either way I am going to lay out a simplified version of what to do, what not to do, and what to expect on your exam day.
Checklist:
- Motivation / Positivity / Mindfulness: I know, I know, this is not a health and motivation article, so why am I listing it? Simply because it will boost your confidence. Keep yourself surrounded by like-minded people. Be positive and affirm to yourself, “You have passed the CCA-175 exam.”
- Big Data Lab / VM Book : Download Cloudera CDH VM from the official sources.
- Learning Sources: I am a fan of SparkByExamples. Alternatively, if you are not a fan of reading, opt for the YouTube videos by Durga Gadiraju, or check out a Udemy Scala/Python CCA 175 course.
- Strategy: What’s your commitment? How much time can you spare in a day or a week? Whatever it is, open Excel right away and commit to which topics you will cover and, most importantly, by when. Deadlines are important, and so is discipline.
What to Expect:
- GUI: It is a remote desktop in your Chrome browser. Once you start the exam, you log in to the remote desktop, where Mozilla Firefox is already open with the questions. You need to open a terminal and get into Spark using spark-shell or pyspark. BEWARE! There is no timer; you need to ask the live agent every time you want to know the remaining time. Pro-tip: note the time when you start the exam, so it’s easy to work out the remaining time from the Windows clock. 60 minutes it is.
- Big Data Components: RDDs? DataFrames? Spark Streaming? Sqoop? Kafka? Hive? Impala? Out of these, just focus on DataFrames, Spark SQL, and Hive. That’s it. Nothing more, nothing less.
- Flavor of Questions: You need to be very familiar with:
- Spark Read [ Reading Files ]
- Spark Write [ you guessed it right ! ]
- Types of File Formats [ Avro, Parquet, JSON, CSV, Text ]
- Compression Methods [ gzip, snappy, and so on.. ]
- Hive [ Creating Tables ; Internal/External ; Custom Location Path ]
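For the Hive item in particular, it helps to have the table DDL at your fingertips. Here is a minimal sketch that builds the statement for an internal (managed) table and for an external table with a custom location. The helper name `hive_table_ddl`, the table names, the columns, and the path are all made up for practice; they are not from the exam.

```python
def hive_table_ddl(table, columns, delimiter=",", external=False, location=None):
    """Build a CREATE TABLE statement for a delimited text table.

    columns is a list of (name, type) pairs, e.g. [("order_id", "INT")].
    Hypothetical helper for practice, not part of Spark or Hive.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    kind = "EXTERNAL TABLE" if external else "TABLE"
    ddl = (
        f"CREATE {kind} IF NOT EXISTS {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS TEXTFILE"
    )
    if location:
        # Custom location path instead of the default warehouse directory
        ddl += f" LOCATION '{location}'"
    return ddl


# Practice schema (assumed): two columns of an orders data set
columns = [("order_id", "INT"), ("amount", "DOUBLE")]

managed = hive_table_ddl("practice.orders", columns)
external = hive_table_ddl(
    "practice.orders_ext",
    columns,
    delimiter="|",
    external=True,
    location="/user/practice/orders_ext",
)

print(managed)
print(external)
```

In the pyspark shell, where `spark` is already defined, you would run either statement with `spark.sql(managed)` or `spark.sql(external)`. Dropping an external table leaves the files at its location in place, which is the main difference to remember.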
Yup! Nothing more, mate. Relax. Believe me, you can make it.
- You will be asked to read a variety of files with different delimiters, either compressed or uncompressed.
- Produce the requested KPIs, such as a count or a sum, cast columns to a particular data type, and, most importantly, do JOINS.
- After this, you will be asked to write the result to a variety of file formats or to a Hive table, with different delimiters, either compressed or uncompressed.
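The read, KPI, and write steps above can be sketched in PySpark roughly like this. Treat it as a practice sketch under assumptions: the function names, column layout (order_id, customer_id, amount / customer_id, name), and paths are invented, and the functions simply take the `spark` session that the pyspark shell gives you.

```python
def read_delimited(spark, path, sep):
    """Read header-less delimited text into a DataFrame.

    Gzip files (.gz) are decompressed based on the file extension,
    so compressed and uncompressed inputs are read the same way.
    """
    return spark.read.option("sep", sep).option("header", "false").csv(path)


def customer_totals(orders, customers):
    """Cast the string columns, join on customer_id, then count and sum."""
    # Without a header, Spark names the columns _c0, _c1, _c2, ...
    typed_orders = orders.selectExpr(
        "CAST(_c0 AS INT) AS order_id",
        "CAST(_c1 AS INT) AS customer_id",
        "CAST(_c2 AS DOUBLE) AS amount",
    )
    typed_customers = customers.selectExpr(
        "CAST(_c0 AS INT) AS customer_id",
        "_c1 AS customer_name",
    )
    return (
        typed_orders.join(typed_customers, on="customer_id", how="inner")
        .groupBy("customer_id", "customer_name")
        .agg({"order_id": "count", "amount": "sum"})
        # Rename the generated names; Parquet rejects "(" and ")" in column names
        .withColumnRenamed("count(order_id)", "order_count")
        .withColumnRenamed("sum(amount)", "total_amount")
    )


def write_results(result, parquet_path, text_path):
    """Write once as snappy Parquet and once as gzip, tab-delimited text."""
    result.write.mode("overwrite").option("compression", "snappy").parquet(parquet_path)
    (
        result.write.mode("overwrite")
        .option("sep", "\t")
        .option("compression", "gzip")
        .csv(text_path)
    )


# Usage in the pyspark shell (paths are placeholders):
#   orders = read_delimited(spark, "/user/practice/orders", "|")
#   customers = read_delimited(spark, "/user/practice/customers", ",")
#   result = customer_totals(orders, customers)
#   write_results(result, "/user/practice/out_parquet", "/user/practice/out_text")
```

For a Hive target, the same `result` can be saved with `result.write.mode("overwrite").saveAsTable("practice.customer_totals")`, where the table name is again just a placeholder.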
Tips:
- Note it down: While you are learning, the first thing to do is take notes. This saves you from forgetting the little things that matter a lot. They are a blessing to go through before your exam: a fast-forward RECAP.
- Go GO SQL: Are you hands-on with SQL? Then, dear friend, it’s much easier to write SQL and get your output than to think about which method to use.
- Time flies: No intention to scare you, but trust me, time flies, so don’t be too relaxed while attempting the exam. Be alert. Tick.. Tick.. Tick..
- Screen Lags: You will notice a delay of a few milliseconds or so at first while working on the remote desktop, but you get used to it.
- Shortcut Keys?: Copy and paste work very well with Ctrl+Shift+C (copy) and Ctrl+Shift+V (paste). Practice this on your Linux CDH VM. Mind it: press Ctrl+C by mistake and boom! Your terminal session ends. So be careful. Go for Right-Click -> Copy to be on the safer side.
- Comments: There will be 7 to 12 questions, mostly around 9 to 10. I made a habit of noting which questions I had attempted using a “//” comment in the terminal, e.g. “//Completed 4,9” (“//” works in spark-shell; in pyspark the comment marker is “#”). Once I was done with another question, I would press “Ctrl+R” to search for this comment and update it to “//Completed 4,9,1”. You can attempt questions in any sequence. Go with the easy ones first.
- Validation: After attempting the questions, make sure you perform two validations: a) a count, and b) a check of the output using the spark.read method. I suggest doing this once you have attempted all the questions. If you validate in between and find an issue, you will go off fixing it, and that eats into your time.
- Score Report: It takes around 24 hours from the exam time to get your results in your inbox. The score report shows which problems you passed and which you (fingers crossed) failed.
- Certification: 3–4 days after the exam, you will get the most awaited email in your inbox. Unfortunately, as I write this, Cloudera has no digital badge, such as the “YourAcclaim” badges from AWS, Google, and IBM, which you can add to your credential profile. Hopefully, they will introduce this soon.
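One more thing before you go. For the Validation tip above, here is a rough PySpark sketch of the two checks, the count and the read-back. The helper name `validate_output` and the paths are my own placeholders, and `spark` is the session you already have in the pyspark shell.

```python
def validate_output(spark, result, path, fmt="parquet", **read_options):
    """Re-read what was written and compare row counts with the DataFrame.

    result is the DataFrame you wrote, path is where you wrote it.
    Extra keyword arguments (e.g. sep) are passed on as reader options.
    """
    written = spark.read.format(fmt).options(**read_options).load(path)
    expected, actual = result.count(), written.count()
    written.show(5, truncate=False)  # eyeball columns, delimiter and types
    print(f"{path}: expected {expected} rows, found {actual}")
    return expected == actual


# Usage in the pyspark shell (paths are placeholders):
#   validate_output(spark, result, "/user/practice/out_parquet")
#   validate_output(spark, result, "/user/practice/out_text", fmt="csv", sep="\t")
```

A matching count does not prove the delimiter or compression is right, so do glance at the rows printed by `show` as well.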
“Congratulations! You have passed the exam successfully!” Get ready to see such an email in your inbox soon. So there you go! All the very best!
Like it ? Claps please ! Suggestions ? Comments please.
“Follow” to get notified for more such upcoming articles !