Cleveland, Ohio, 44113

Full Time
Remote
$160000.00 Annual

Job description

Data Lead
Remote
$160,000-$210,000

The Data Lead will design and maintain our data infrastructure, including ETL pipelines, vector databases, and retrieval systems for RAG-based applications. The role owns data quality, governance, and performance tuning so that our platform delivers accurate, scalable, and cost-effective data-driven solutions.

Responsibilities of the Data Lead

  • Data Engineering: Designing ETL workflows in SQL and Python, including normalizing and cleaning data.
  • Vector Databases & Retrieval: Working with platforms such as Pinecone, Weaviate, Milvus, or pgvector, and applying indexing strategies such as HNSW, IVF, and PQ.
  • RAG (Retrieval Augmented Generation): Crafting retrieval methodologies including chunking, embedding selection, and re-ranking.
  • Embedding Models: Selecting and evaluating embedding models for domain-specific applications.
  • Data Modeling & Knowledge Graphs: Linking structured and unstructured data (experience preferred but not required).
  • Data Quality & Governance: Establishing benchmarks for metadata management, access controls, data lineage, and data freshness.
  • Performance Optimization: Measuring and tuning latency and recall/precision, and balancing cost against performance.

Requirements for the Data Lead

  • Over 6 years of experience in data engineering, data platform management, or related ML data roles.
  • Exceptional skills in SQL and Python for ETL processes and data manipulation.
  • Experience with vector database technologies such as Pinecone, Weaviate, Milvus, or pgvector.
  • Demonstrated proficiency in developing retrieval pipelines for RAG applications.
  • In-depth knowledge of embedding models and how to evaluate them.
  • Awareness of data quality and governance principles.
  • Ability to tune systems for lower latency, higher accuracy, and lower cost.
