About Us
At IR Labs, we are on a mission to revolutionize the way businesses harness the power of data. We are not just building products; we are shaping the future of business innovation. Our cutting-edge AI and analytics solutions are designed to unlock new insights, drive innovation, and create competitive advantages for our customers. We are a passionate team of innovators dedicated to building groundbreaking technology. Join us as we lead the way in AI and analytics, transforming visionary ideas into impactful solutions and redefining what it means to innovate and succeed in the digital age.
Job Description
Are you a talented Data Infrastructure Engineer looking to make a significant impact in a rapidly evolving AI and machine learning innovation lab? Do you thrive in a fast-paced setting where your work bridges data infrastructure, MLOps, and scalable cloud-native architectures? If you have a passion for building high-performance, real-time data systems that power cutting-edge AI applications, we want you on our team!
As a Data Infrastructure Engineer at IR Labs, you will play a foundational role in designing, implementing, and scaling mission-critical data systems and workflows. You’ll work closely with machine learning engineers, backend developers, and DevSecOps teams to create robust data pipelines, real-time streaming architectures, and automation frameworks that accelerate AI innovation. If this sounds exciting to you, then we need to talk!
What You’ll Do
- Serve as the foundational data infrastructure engineer, responsible for designing, implementing, and scaling the core data systems and workflows to empower machine learning engineers (MLEs) and software engineers to independently build and operate data pipelines.
- Build and maintain stream-first data architectures (Kappa) using tools like Apache Kafka, Apache Flink, or Spark Streaming, ensuring low-latency, real-time data processing at scale.
- Develop and implement infrastructure-as-code (IaC) solutions with tools like Terraform or AWS CloudFormation, ensuring scalable, repeatable deployments of data infrastructure on AWS.
- Automate infrastructure operations, data pipeline orchestration, and CI/CD workflows using GitHub Actions, ArgoCD, and configuration management tools like Ansible.
- Partner closely with MLEs, backend developers, and DevSecOps professionals to create a unified platform that supports data lake, data streaming, and MLOps pipelines.
- Enable self-service capabilities by building SDKs, APIs, and portal integrations (e.g., Backstage) for seamless onboarding and management of data workflows by other engineers.
- Ensure observability and reliability by deploying monitoring (Prometheus, Grafana), logging (ELK stack + Fluentd), and tracing (Jaeger) across data workflows and infrastructure.
- Design and implement robust data security, IAM policies, and secrets management solutions leveraging AWS IAM, AWS Secrets Manager, and AWS Security Hub.
- Research and evaluate emerging data and MLOps technologies, such as Flyte, Ray, Triton Inference Server, and Databricks, to ensure the platform evolves with industry best practices.
- Establish and promote best practices for data infrastructure automation, stream processing, and scaling workflows across batch and real-time use cases.
Qualifications
Data Infrastructure Expertise
- Extensive experience (8+ years) building and operating scalable, real-time data infrastructure with a focus on stream processing using tools like Kafka, Apache Flink, or Spark Streaming.
- Strong expertise in data lake/warehouse architectures using Delta Lake (Databricks) with S3 or similar backing stores.
- Proficiency with MLOps tooling, including orchestration (e.g., Flyte), experiment tracking (e.g., MLflow, Weights & Biases), and model serving (e.g., Triton, Ray, vLLM).
- Solid understanding of container orchestration with Kubernetes (AWS EKS) and containerization using Podman or Docker.
Cloud Data Services & Security
- Hands-on experience with AWS services for data infrastructure, including S3, Glue, Lambda, Redshift, and Athena.
- Proven ability to design and enforce data security best practices, including IAM, role-based access control (RBAC), and centralized secrets management (AWS Secrets Manager).
- Familiarity with metadata management solutions such as Unity Catalog (Databricks) for governance, lineage tracking, and compliance.
Automation & Observability
- Demonstrated ability to automate the end-to-end data pipeline lifecycle, including provisioning, monitoring, and scaling using tools like GitHub Actions, ArgoCD, and Terraform.
- Experience implementing observability for data workflows using Prometheus, Grafana, Jaeger, and Fluentd to enable proactive troubleshooting and ensure SLAs are met.
- Proficiency in defining and monitoring data SLAs, data quality metrics, and lineage to ensure reliable production pipelines.
Programming Skills
- Proficient in Python for building and orchestrating data workflows, with additional experience in Rust or C++ for performance-critical components.
- Strong understanding of distributed system design and algorithmic principles required for scalable, fault-tolerant data processing.
- Knowledge of TypeScript for creating developer portals or integrations, e.g., with Backstage.
Collaboration & Learning
- Experience mentoring engineers and establishing standards for data infrastructure development, monitoring, and security.
- Strong communication skills to articulate and document architectural decisions and complex workflows for technical and non-technical stakeholders.
Nice to Haves
- Educational Background: Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- Stream-First Systems: Deep knowledge of Kappa architecture principles, with hands-on experience operating streaming systems under high throughput and low latency.
- Big Data: Experience working with petabyte-scale data and optimizing infrastructure to handle both batch and real-time analytics workloads.
- Data Governance: Familiarity with advanced data governance and compliance solutions, particularly AWS Lake Formation and Unity Catalog.
- AI/ML Workflow Automation: Experience integrating MLOps pipelines with annotation tools (e.g., LabelStudio), LLM gateways (e.g., Kong AI Gateway), or agentic frameworks (e.g., LangGraph).
What We Offer
- Culture: Join a passionate, driven team that values collaboration, innovation, and having fun while making a difference.
- Impact: Be a key player in an early-stage innovation lab where your contributions directly influence the company's success and you help build from the ground up.
- Innovation: Work on cutting-edge AI solutions that solve real-world problems and shape the future of technology.
- Growth: Opportunity for personal and professional growth as the company scales.
- Flexible Work Culture: Benefit from a flexible work environment that promotes work-life balance and remote work.
- Competitive Compensation: Receive a competitive salary and benefits package, with eligibility for equity.
- Medical, Dental, Vision Insurance
- 401k with Employer Contributions
- Paid Time Off
- Health Savings Account (HSA) Contributions with High Deductible Health Plan
- Short-Term/Long-Term Disability Insurance
- And more!
Compensation Range:
- $170,000 - $180,000 base compensation
- $26,000 - $36,000 variable compensation
The actual compensation offered to a candidate may vary from the posted hiring range based on geographic location, work experience, education, and/or skill level. The ratio between base pay and target incentive (if applicable) will be finalized at offer.
At IR we celebrate, support, and thrive on difference for the benefit of our employees, our products, and our community. We are proud to be an Equal Employment Opportunity employer and encourage applications from all suitable candidates; we never discriminate based on race, religion, national origin, gender identity or expression, sexual orientation, age, or marital, veteran, or disability status.