AI/ML Solutions Architect
Software Engineering, IT, Data Science
Remote
US Citizenship is required
An in-person interview is required.
The Judge Group is seeking an AI/ML Solutions Architect (Agentic Systems) to design and deliver secure, scalable, and cost-effective cloud-native AI solutions for federal clients. You will bridge complex mission needs and modern technology by owning end-to-end architectures and leading hands-on implementation—especially for agentic AI systems, RAG-based applications, and production-grade ML pipelines.
This role blends technical vision with practical delivery leadership: selecting tools, defining architectures, establishing engineering standards, and guiding implementation through prototypes, code reviews, and reference solutions.
What You Will Do:
- Design and implement agentic AI systems that enable autonomous decision-making, workflow orchestration, and mission process optimization—with appropriate guardrails and human oversight.
- Develop Generative AI applications for summarization, extraction, predictive insights, and conversational interfaces.
- Build and maintain scalable data pipelines that integrate structured and unstructured data to support analytics and AI workloads.
- Apply advanced statistical and machine learning techniques to decision support and policy/program evaluation.
- Lead AI initiatives spanning:
  - Retrieval-Augmented Generation (RAG) and evaluation
  - Re-ranking strategies and retrieval quality optimization
  - Prompt engineering, safety patterns, and defensive design
  - Knowledge graph integration and graph-enhanced retrieval
  - AI chatbots and conversational agents
- Fine-tune embeddings and LLMs (when appropriate) to improve domain performance, accuracy, robustness, and retrieval quality.
- Build entity graphs using entity resolution (matching, deduplication, linking, relationship discovery) to enable graph analytics and enhanced retrieval.
- Collaborate across engineering, security, and stakeholders to prototype rapidly, iterate responsibly, and deliver mission-ready outcomes.
- Lead deployment in AWS-first cloud environments, leveraging Infrastructure-as-Code, DevOps/DevSecOps, and operational excellence patterns.
You will own and drive the technical foundation and delivery rigor for mission AI solutions:
- End-to-end solution architecture: system boundaries, trust zones, data flows, integrations/APIs, security controls, observability, and cost models.
- Tooling and platform selection: LLMs, embeddings, vector stores, orchestration frameworks, graph technologies, data platforms—documenting tradeoffs and decisions.
- Engineering and delivery standards: secure SDLC, CI/CD quality gates, automated testing, code review practices, evaluation harnesses, and production readiness checklists.
- Hands-on technical leadership: prototypes, reference implementations, PR reviews, mentoring, and architecture governance to ensure delivery quality.
What You Will Need:
- Must be able to OBTAIN and MAINTAIN a Federal or DoD "PUBLIC TRUST"; candidates must obtain approved adjudication of their PUBLIC TRUST prior to onboarding. Candidates with an ACTIVE PUBLIC TRUST or SUITABILITY are preferred.
- Bachelor’s degree in Engineering, IT, Computer Science, or related field (or equivalent experience).
- Minimum of EIGHT (8) years of experience in solutions architecture, software engineering, data engineering, and/or applied ML, with a track record of delivering production systems. A Master’s degree may be substituted for up to 2 years of relevant professional experience.
- Strong Python proficiency and strong SQL skills (data modeling, query optimization).
- Experience designing and delivering cloud-based AI/ML solutions end-to-end (ingestion → modeling → deployment → monitoring) in secure environments.
- Hands-on experience with AI application frameworks such as LangChain, Haystack, crewAI, or similar.
- Strong knowledge of core Python ML/data libraries: NumPy, pandas, scikit-learn, NLTK, OpenCV.
- Familiarity with deep learning frameworks such as PyTorch or TensorFlow.
- Experience with search technologies such as Elasticsearch or OpenSearch.
- Experience with relational databases (e.g., PostgreSQL, Oracle) and in-process analytics engines (e.g., DuckDB).
- Experience using cloud SDKs (e.g., Boto3) and building reliable integrations with cloud services.
- Familiarity with agentic AI frameworks such as AWS Strands Agents, PydanticAI, and related orchestration patterns.
- Advanced prompt engineering skills for complex tasks beyond code generation (reasoning workflows, guardrails, evaluations).
- Experience with asynchronous Python development (asyncio patterns, concurrency, reliability).
- Experience with MCP servers and tool-calling within agentic workflows (tool governance, reliability, and security considerations).