What Skills Are Required To Be A Hadoop Developer?


Hadoop Developer Requirements

Different employers have different business cases and therefore different implementation requirements, so they need different skill sets from the Hadoop ecosystem.

This article gives a 360-degree view of Hadoop developer skills and then presents a few examples of the Hadoop developer skill sets that employers require.

What Tasks Do Employers Expect an Experienced, Strong Hadoop Developer to Perform? (A 360-Degree View)


– Design and Build distributed, scalable, and reliable data pipelines that ingest and process data at scale and in real-time.

– Source large volumes of data from diverse data platforms into the Hadoop platform

– Perform offline analysis of large data sets using components from the Hadoop ecosystem.

– Provide technical and development support to the government client to build and maintain a modernized Enterprise Data Warehouse (EDW) by expanding the current on-premises Hadoop cluster to accommodate an increased volume of data flowing into the enterprise data warehouse

– Lead the implementation (installation and configuration) of HDP 2.2, including the complete cluster deployment layout with replication factors, setting up the NFS Gateway to access HDFS data, ResourceManagers, NodeManagers, and the various phases of MapReduce jobs. Experience with configuring workflows and deployment using tools such as Apache Oozie is necessary.

– Monitor workflows and job execution using the Ambari UI, Ganglia, or equivalent tools. Assist administrators in commissioning and decommissioning nodes, and in backing up and recovering Hadoop data using snapshots and high availability. A good understanding of rack awareness and topology is preferred.

– Develop, implement, and participate in designing column family schemas for Hive and HBase within HDFS. Experience in designing Hadoop flat and star models with MapReduce impact analysis is necessary.

– Recommend and assist with the development and design of HDFS and Hive data partitioning, vectorization, and bucketing with Hortonworks Big Insights query tools. Perform day-to-day operational tasks using Flume and Sqoop to move data to and from different RDBMSs. Expertise in Java and UNIX shell scripts to support custom functions or steps is required.

– Develop guidelines and plans for performance tuning of a Hadoop/NoSQL environment, with underlying impact analysis of MapReduce jobs using CBO and analytical conversions. Implement a mixed batch / near-real-time architecture to analyze, index, and publish data for applications. Write a custom reducer that reduces the number of underlying MapReduce jobs generated from a Hive query. Help with cluster efficiency, capacity planning, and sizing.

– Develop efficient Hive scripts with joins on datasets using a variety of techniques, including map-side and sort-merge joins, with various analytical functions. Experience with advanced Hive features such as windowing, CBO, views, ORC files, and compression techniques is necessary. Develop jobs to capture CDC (Change Data Capture) from Hive-based internal, external, and managed systems.

– Perform data formatting, which involves cleaning up the data.

– Assign schemas and create Hive tables

– Apply other HDFS formats and structures (Avro, Parquet, etc.) to support fast retrieval of data, user analytics, and analysis

– Assess the suitability and quality of candidate data sets for the Data Lake and the EDW

– Write Java-based Crunch or Cascading pipelines; this requires proficiency in Java.

– Develop applications that can interact with the data in the most appropriate way, from batch to interactive SQL or low-latency access, using the latest tools (Hortonworks Data Platform (HDP) preferred).

– Troubleshoot and debug Hadoop ecosystem run-time issues

– Recover from node failures and troubleshoot common Hadoop cluster issues

– Automate operation, installation, and monitoring of Hadoop ecosystem components in our open source infrastructure stack, specifically HDFS, MapReduce, YARN, HBase, Oozie, Hive, Tez, Kafka, and Storm

– The ideal candidate will dream about distributed systems for the parallel processing of massive quantities of data, be familiar with Hadoop/Pig/HBase and MapReduce/Sawzall/Bigtable, and frequently think to themselves, ‘Yeah, that works for 500 MB of data; what about 500 TB?’
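
Of the techniques named in the list above, the map-side join is easy to illustrate outside a cluster. The idea is that the smaller table is loaded into memory by each mapper, so the larger table can be streamed past it and joined without a shuffle or reduce step. The following is a minimal plain-Python sketch of that idea, not Hive's actual implementation; the function name `map_side_join` and the sample tables are hypothetical.

```python
from collections import defaultdict


def map_side_join(small_rows, large_rows, key):
    """Inner-join two iterables of dicts on `key`, holding only the small side in memory."""
    # Build phase: index the small table by join key (each mapper would do this once).
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: stream the large table row by row; no shuffle or reduce step is needed.
    for row in large_rows:
        for match in lookup.get(row[key], []):
            yield {**match, **row}


# Hypothetical sample data: a small dimension table and a larger fact table.
countries = [
    {"country_id": 1, "country": "US"},
    {"country_id": 2, "country": "IN"},
]
clicks = [
    {"click_id": 10, "country_id": 2},
    {"click_id": 11, "country_id": 1},
    {"click_id": 12, "country_id": 3},  # no matching country: dropped by the inner join
]

joined = list(map_side_join(countries, clicks, "country_id"))
```

The trade-off is memory: the approach only works while the small side fits in each mapper's memory, which is why sort-merge joins are the usual alternative when both tables are large.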

Hands On Programming Related Skills Expected From A Hadoop Developer (e.g. Programming Language – Java)

– Spring Framework and the JVM
– Java XML and web service APIs: JAXP, JAXB, and JAX-WS
– Java Enterprise Ecosystem, Hadoop/Hive/MapReduce/Pig/Sqoop/Flume, SQL, XML, JSON, Unix, ESB, RESTful Web Services

Hands On Hadoop Developer Related Skills

– Ingesting data into Apache Hadoop or Apache Accumulo
– Scripting languages, e.g. Perl, Python, or Scala
– Experience as an open source software contributor, including on GitHub, code.google.com, or apache.org
– Elasticsearch, Hadoop, Kafka, ZooKeeper, Spark, NoSQL, Storm
– HDFS, MapReduce, Pig, Hive, Streaming, Cascading, Apache Spark
– Defining job flows with Hadoop workflow schedulers such as Apache Oozie and Apache Falcon, and working with related tools such as Apache ZooKeeper, Apache Flume, and Apache Storm
– Good working knowledge of NoSQL databases (MongoDB, HBase, Cassandra)
– Experience building ETL frameworks in Hadoop using Pig/Hive/MapReduce
– Understanding of various visualization platforms (Tableau, QlikView, others) and knowledge of ETL tools (Informatica, Talend, Pentaho, etc.)
– ETL (Syncsort DMX-h, Ab Initio, IBM InfoSphere Data Replication, etc.), mainframe skills, JCL
– OLTP and OLAP data modeling experience
– Cloud computing platforms and services, e.g. AWS.
– Strong knowledge of open source monitoring tools such as Nagios, Ganglia, OpenTSDB
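
To make the MapReduce and Streaming items above concrete: Hadoop Streaming lets any program that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. The sketch below shows a word count in that style, written as plain Python functions so that it runs without a cluster. The names `mapper`, `reducer`, and `run_locally` are hypothetical, and `run_locally` only imitates the sort-by-key step that the framework performs between the two phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a Streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts per word; the input must already be sorted by key."""
    parsed = (record.split("\t") for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Stand-in for the framework: map, sort by key (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


result = run_locally(["Hadoop Hive Pig", "hive hadoop", "Hadoop"])
```

In an actual Streaming job, the mapper and reducer would typically be two separate scripts that read `sys.stdin` and print their records, with Hadoop handling the sort and the distribution across nodes.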

Misc Software Engineering Skills

– Strong object-oriented design and analysis skills
– Excellent technical and organizational skills
– Excellent written and verbal communication skills
– Design Patterns
– Developing multithreaded applications
– Design and prepare technical specifications and guidelines.
– Ability to understand business problems and translate them into technical requirements
– Strong oral and written communication skills, including the ability to explain complex technical knowledge in layman's terms
– Comfortable with building multi-threaded, multi-server applications in Java utilizing distributed computation systems

Job Titles Offered by Employers

Hadoop ETL Engineer, Software Engineer Hadoop, Senior Software Engineer Hadoop, Senior Software Engineer Backend (custom ETL/Hadoop data), Hadoop Engineer, Hadoop Developer

It all depends on how you package your skill set in your resume and market it to different employers. Below are a few examples of Hadoop developer positions in Seattle, WA, USA.



Example -1

How Amazon defines a Hadoop Engineer position on its EMR (Elastic MapReduce) team, and the typical skill set it asks applicants for:

What is the Hadoop Engineer position?

EMR is looking for candidates strong in Java development, RPM packaging, Puppet, and Linux. We are building out our stack of Hadoop ecosystem applications and need to grow our team of Open Source Apache project experts. In this position you will bring new applications to EMR, keep us up to date with the latest versions, help customers use those applications, and develop and contribute open source code within the applications to add features and fix customer issues. The sky is the limit with what can be achieved.

• Experience with Apache Hadoop ecosystem applications: Hadoop, Hive, Oozie, Pig, Presto, Hue, Spark, Tachyon, Zeppelin and more!
• 3+ years Java development experience.
• AWS or other cloud vendor experience.
• Commits on Apache Hadoop ecosystem projects.

Example -2 

Another Big Data Engineer Opportunity

We’re looking for a Big Data/Hadoop Software Engineer to join our team.

In this role you will work on our big data analytics systems, having responsibility for processing user behavior and geospatial data throughout our platform based on Java, Python, Hadoop, and NoSQL. You’ll be responsible for systems that answer user-interactive analytic queries about how people behave in the real-world, segmenting that data, combining it with other internal and external data sources, and providing insights to front-end systems and customers.

As a Big Data Engineer At Placed, You Will:
• Build Hadoop processing pipelines to prepare data for user interactive analytics
• Build systems to pull meaningful insights from our data platform
• Integrate our analytics platform internally across products and teams
• Focus on performance, throughput, latency and drive these throughout our architecture
• Write test automation, conduct code reviews, and take end-to-end ownership of deployments to production

Our Ideal Engineer Will Have:
• 2+ years of software development experience
• Experience with Java and/or C#, web services, and Hadoop/EMR (or other Map/Reduce) data processing pipelines
• Experience working with analytics systems (e.g. OLAP, BI tools) and semi-structured data (e.g. NoSQL, MongoDB, etc.)
• Python and/or Ruby is a plus
• Experience with AWS, EMR, geospatial data, Python, or Ruby would be a bonus

Example -3

– Must be great with data, curious about how the world of digital marketing manifests itself in log files and other large data sources, and able to solve problems pragmatically while delivering scalable, reliable solutions.
– This role will be responsible for defining, designing, and developing big data applications, especially using Hadoop [Map/Reduce] by leveraging frameworks such as Cascading and Hive

Example -4

The Business Intelligence/Data Warehouse team is looking for a Big Data Hadoop Engineer in Platform Services to build a core set of our high-demand data services to form a service-oriented infrastructure to meet the company’s growth trajectories in both data volume and analytic needs. You will work with stakeholders to understand their information needs and translate these into business functional and technical requirements and solutions, and leave your mark in formulating the solutions using the newest technologies. These solutions shall be robust, reusable, and highly scalable, and as a critical part of our analytics backbone shall be used to drive business decisions for both internal clients and external partners. You will design and develop service oriented applications that encompass both ETL processes in the backend and the web service presentation layer that are performant and scalable. You will ensure data quality and data governance, create/maintain data dictionary and related metadata. You will own all phases of development lifecycle from gathering business requirements, design and modeling, development, deployment, and support.

• 2+ years’ experience in database development, reporting, and analytics.
• 2+ years’ experience in web service or middle tier development of data driven apps.
• Knowledge and experience in “big-data” technologies such as Hadoop, Hive, Impala
• Experience and ability to demonstrate advanced proficiency in writing complex SQL queries and stored procedures

For the last 15 years, across different geographical locations, Sumit has prepared hiring formats for several hiring managers and teams to help them hire balanced talent, and has interviewed candidates at various stages of their selection process. He has also been interviewed by hundreds of companies in different geographical locations. His main conclusion for hiring teams and candidates is to prepare in advance. Here ‘advance’ means keeping your interview book ready and continuing to update it even if you are not going to interview candidates or apply for any job in the next six months.