Important note: The worker can be remote but must support East Coast hours; local candidates are preferred. Remote workers will need to travel onsite roughly once a month.
We are looking for a Big Data Engineer who will work on collecting, storing, processing, and analyzing large data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.
Responsibilities
• Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
• Implementing ETL processes using Apache NiFi
• Monitoring performance and advising on any necessary infrastructure changes
• Defining data retention policies
Skills and Qualifications
• Proficient understanding of distributed computing principles
• Proficiency with ETL tools such as NiFi and Talend
• Management of Hadoop clusters, including services such as Hive, HBase, MapReduce, and Sqoop
• Ability to resolve ongoing cluster-operation issues and identify performance bottlenecks
• Proficiency with Hadoop v2, MapReduce, and HDFS
• Experience building stream-processing systems using solutions such as Storm or Spark Streaming
• Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
• Experience with Spark and SparkR
• Experience with integration of data from multiple data sources
• Experience with NoSQL databases, such as HBase, Cassandra, and MongoDB
• Knowledge of various ETL techniques and frameworks, such as Flume
• Experience with various messaging systems, such as Kafka
• Experience with Big Data ML toolkits, such as Mahout, Spark MLlib, or H2O
• Good understanding of Lambda Architecture, along with its advantages and drawbacks
• Experience with Hadoop distributions such as Cloudera, MapR, or Hortonworks