THE ROLE
Growth Protocol is hiring a Senior Data Engineer to play a foundational role in building the systems that power our AI platform. You will be at the heart of Growth Protocol's data infrastructure, directly influencing product features, client outcomes, and strategic business decisions.
You will collaborate with Data Scientists, Backend Engineers, Client IT, and business stakeholders to build and maintain scalable pipelines that serve billions of rows of structured and unstructured data weekly, enabling high-impact insights across multiple industries.
Ideal candidates are ambitious go-getters who welcome the challenge of meeting the needs of a hyper-growth startup and bring deep technical expertise across modern data infrastructure and ML operationalization.
OBJECTIVES OF THE ROLE
Collaboration
- Work closely with Data Scientists to translate business and ML requirements into robust data workflows
- Ensure timely delivery of clean, reliable data to support model development and production features
Technical Development
- Engineer and manage scalable ETL architecture using Airflow, Snowpark, Cloud Run, and Apache Beam
- Design and implement a high-performance data infrastructure for seamless processing and integration
- Extract data from diverse online platforms
- Operationalize machine learning models, focusing on deployment, reliability, and performance
Data Connectivity
- Partner with client IT teams to identify the most efficient and secure methods for data ingestion, including Snowflake Secure Data Sharing, Databricks Delta Sharing, Private Link, and VPN
- Work alongside the Platform Engineering team to define requirements for secure networking paths that support high-performance data transfers
- Perform end-to-end testing of client connections to ensure data integrity and connectivity
- Integrate customer databases with our platform
Monitoring and Reliability
- Create and manage real-time monitoring systems for data ingestion and transformation pipelines
- Proactively identify and resolve issues to maintain high levels of system reliability and data integrity
REQUIRED SKILLS AND QUALIFICATIONS
- 5+ years of experience in Data Engineering
- Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
- Experience building data pipelines with robust unit and integration testing
- Proficiency in distributed computing frameworks such as Apache Beam and Spark
- Functional understanding of enterprise networking including VPC peering, Private Link, and VPNs, with the ability to troubleshoot connectivity in a cloud environment
- Hands-on experience operationalizing ML models in production
- Familiarity with ML/AI, NLP, and Data Science workflows, including MLflow
- Deep understanding of ETL workflows, data modeling, and data architecture
- Strong debugging and problem-solving skills
- Excellent communication skills and experience collaborating across teams
Preferred Qualifications
- Experience working on enterprise products serving Fortune 500 clients across Financial Services, Industrials, and Consumer Products
- Prior startup experience
- Interest in current events, market dynamics, and emerging technologies
- Experience creating Agent Skills
- Familiarity with APIs and web scraping for data collection
- Familiarity with Graph Databases
TECH STACK
- Languages: Python, TypeScript
- Frameworks: Apache Beam, Spark, FastAPI, Airflow
- Cloud: Google Cloud Platform
- Data: Elasticsearch, Snowflake, Databricks, Neo4j, PostgreSQL, MongoDB, GCS
- Infrastructure and DevOps: Docker, Terraform, GitHub Actions, Cloud Run
- Frontend: Next.js
PERKS
- Competitive compensation and equity in a rapidly growing company
- 100% company-paid health, dental, and vision insurance plus 401(k)
- Pet-friendly office