Data Engineer
About this position
We are looking for a savvy data engineer to join our team of data heroes. You will be responsible for designing and building big data pipelines and architectures for cloud data lakehouses, as well as optimizing and productionizing machine learning and predictive models.
Responsibilities
• Design and implement data ingestion and processing for various data sources using public cloud (MS Azure, AWS, GCP) big data technologies such as Databricks, AWS Glue, Azure Data Factory, Redshift, Kafka, Azure Event Hubs, AWS Step Functions, AWS Lambda, and Azure Functions.
• Collaborate with Business Intelligence consultants to assemble large, complex data sets that meet functional and non-functional business requirements for the data lakehouse.
• Support data science and analytics teams in deploying and optimizing AI / machine learning models and other data algorithms in services such as AWS SageMaker or Azure ML.
• Develop data pipelines that provide actionable insights into marketing automation, customer acquisition, and other key business areas.
• Develop DevOps automation for continuous development, test, and deployment processes.
• Document implemented data pipelines and logic in a structured manner in Confluence, and plan your activities in Jira using Agile methodology.
• Work with stakeholders to resolve data-related technical issues and support their data infrastructure needs, such as optimizing existing data delivery or redesigning infrastructure for greater scalability.
• Support pre-sales by proposing technical solutions and accurate effort estimates.
Requirements
• Experience in building and productionizing big data architectures, pipelines and data sets.
• Understanding of big data concepts and patterns: data lakes, lambda architecture, stream processing, data warehousing (DWH), and BI & reporting.
• At least 2 years of experience in a Data Engineer role, with hands-on experience in the following software/tools:
• Experience with big data tools such as Hadoop, Spark, and Kafka.
• Experience with object-oriented, functional, and scripting languages such as Python, Scala, Java, R, C++, Bash, and PowerShell.
• Experience with MS Azure (Databricks, Data Factory, Data Lake, Azure SQL, Event Hubs, etc.) or AWS (Glue, EC2, EMR, RDS, Redshift, SageMaker, etc.) cloud services.
• Experience implementing large-scale, data- and event-oriented pipelines/workflows using ETL tools.
• Extensive working experience with relational (MS SQL, Oracle, Postgres, Snowflake, etc.) and NoSQL (Cassandra, MongoDB, Elasticsearch, Redis, etc.) databases.