Cleveland, Ohio, 44113
Job description
Data Lead
Remote
$160,000-$210,000
The Data Lead will architect and maintain our data infrastructure, including ETL pipelines, vector databases, and retrieval systems for RAG-based applications. This position oversees data quality, governance, and performance optimization, ensuring our platform delivers accurate, scalable, and cost-effective data-driven solutions.
Responsibilities of the Data Lead
- Data Engineering: Design ETL workflows in SQL and Python, and normalize and clean data.
- Vector Databases & Retrieval: Work with platforms such as Pinecone, Weaviate, Milvus, or pgvector, and apply indexing strategies such as HNSW, IVF, and PQ.
- RAG (Retrieval-Augmented Generation): Design retrieval strategies, including chunking, embedding selection, and re-ranking.
- Embedding Models: Select and evaluate embedding models for domain-specific applications.
- Data Modeling & Knowledge Graphs: Strengthen connections between structured and unstructured data (preferred but not essential).
- Data Quality & Governance: Establish standards for metadata management, access controls, data lineage, and data freshness.
- Performance Optimization: Measure and tune latency and recall/precision, and balance cost against performance.
Requirements for the Data Lead
- Over 6 years of experience in data engineering, data platform management, or related ML data roles.
- Exceptional skills in SQL and Python for ETL processes and data manipulation.
- Experience with vector database technologies such as Pinecone, Weaviate, Milvus, or pgvector.
- Demonstrated proficiency in developing retrieval pipelines for RAG applications.
- In-depth knowledge of embedding models and their assessment criteria.
- Awareness of data quality and governance principles.
- Ability to tune systems for lower latency, higher accuracy, and better cost-effectiveness.