Machine Learning/AI Engineer
100% Remote
Long Term
Job Description
Summary of the project/initiatives describing what's being done:
- Build, modernize, and maintain the AI/ML platform and related frameworks and solutions.
- Participate in and contribute to architecture and design reviews.
- Build and deploy the AI/ML platform in Azure with open-source applications (Argo, JupyterHub/Kubeflow) and/or cloud/SaaS solutions (Azure ML, Databricks).
- Design, develop, test, deploy, and maintain distributed, GPU-enabled machine learning pipelines using Kubernetes/AKS-based Argo workflow orchestration, collaborating with data scientists.
- Enable and support distributed data processing on the platform using Apache Spark and other distributed/scale-out technologies.
- Build ETL pipelines and ingress/egress methodologies for AI/ML use cases.
- Build highly scalable backend REST APIs for metadata management and other miscellaneous business needs.
- Deploy applications to Azure Kubernetes Service using GitLab, Jenkins, Docker, kubectl, Helm, and manifests.
- Manage branching, tagging, and versioning across different environments in GitLab.
- Review code developed by other developers and provide feedback to ensure best practices (e.g., design patterns, accuracy, testability, efficiency).
- Work with relevant engineering, operations, business-line, and infrastructure groups to ensure effective architectures and designs, and communicate findings clearly to technical and non-technical partners.
- Perform functional, benchmark, and performance testing and tuning to achieve performant AI/ML workflows, interactive notebook user experiences, and pipelines.
- Assess, design, and optimize resource capacity for resource-intensive (GPU) ML workloads.
- Communicate application processes and results to all parties involved in the product team, such as engineers, the product owner, the scrum master, and third-party vendors.
Top 5-10 Responsibilities For This Position
- Experience developing AI/ML platforms and frameworks (including core offerings such as model training, inference, and distributed/parallel programming), preferably on Kubernetes and cloud-native infrastructure.
- Highly skilled with the Python or Java programming languages
- Highly skilled with SQL and NoSQL databases
- Experience designing, developing, and deploying highly maintainable, extensible, and testable distributed applications using Python and other languages.
- Experience developing ETL pipelines and REST APIs in Python using Flask or Django
- Experience with technologies/frameworks including Kubernetes, Helm charts, notebooks, workflow orchestration tools, and CI/CD and monitoring frameworks.
Basic Qualifications
- Bachelor's or master's degree in Computer Science or Data Science
- 6-8 years of experience in software development and with data structures/algorithms
Required Technical Qualifications / Skills
- Experience with AI/ML open-source projects on large datasets using Jupyter, Argo, Spark, PyTorch, and TensorFlow
- Experience creating unit and functional test cases using pytest and unittest
- Experience training and tuning machine learning models
- Experience working with JupyterHub
- Experience with database management systems such as PostgreSQL
- Experience in searching, monitoring, and analyzing logs using Splunk/Kibana
- GraphQL/Swagger implementation knowledge
- Strong understanding of and experience with Kubernetes for availability and scalability of applications in Azure Kubernetes Service
- Experience building CI/CD pipelines using CloudBees Jenkins, Docker, Artifactory, Kubernetes, Helm charts, and GitLab
- Experience with tools such as JupyterHub, Kubeflow, MLflow, TensorFlow, scikit-learn, Apache Spark, and Kafka
- Experience with workflow orchestration tools such as Apache Airflow and Argo Workflows
- Familiarity with Conda, PyPI, and Node.js package builds