We are looking for paper reviewers for the NeurIPS 2026 workshop “From Copilots to Co-scientists in Omics: Evaluating and Governing Tool-Using AI Agents”. If you are a researcher at any career level in ML, AI, Computer Science, Omics, Genomics, Systems Biology, or a related field and would like to help us review submissions during September 2026, please join our team! Reviewers will be asked to assess no more than 3 submissions, and reviewing is a great way to gain exposure to the boldest ideas shaping the next paradigm shift in how biological research is conducted.
Workshop description:
In 2026, with fast-paced advances in AI research, our power to derive insight from omics data is only increasing. Agentic tools can now act as co-scientists, planning and executing multi-step analyses that span the breadth of bioinformatics tools and synthesising across transcriptomics, single-cell and multi-omics workflows and datasets. However, they come with their own challenges: long-context handling, reproducibility, data privacy, the absence of ground truth, vast hypothesis spaces and incomplete data. At the cusp of this paradigm shift, when benchmarks have yet to be established and know-how is unevenly distributed and siloed, this workshop will bring together the omics community to address three core areas.
Submissions will fall into the following three main categories:
(1) Capabilities and Limitations of AI agents
• What AI innovations in other domains have yet to be applied to omics?
• How do agents expand the space of feasible analysis?
(2) Co-scientist: Integration into workflows
• How can human-in-the-loop workflows be designed so that agent-human collaboration delivers more than the sum of its parts?
• What risks do AI agents bring to current human workflows, and how can these be overcome?
• What is bottlenecking current adoption (in academia and in industry)?
(3) Benchmarking, Evaluation and Reporting Standards
• How should we evaluate end-to-end agent performance in scientific workflows?
• What metrics capture scientific correctness, not just code correctness?
• How can we design benchmarks that include task completion, robustness to dataset shift and calibration?
• What constitutes a reproducible agent-driven analysis (run-to-run reproducibility)?
• Which elements (prompts, tool versions, execution traces) must be reported?
• How can agents be encouraged to produce transparent and auditable outputs?
Please email Hanna Szafranska at hannas@mrclmb.ac.uk or n.ezaz-nikapy@imperial.ac.uk if you are interested or have any questions.