AI-Assisted Evaluations and EDI

How Temperature Settings Influence Reliability, Bias, and Inclusion

Authors

S. Shapiro, A. K. M. I. Khalid

DOI:

https://doi.org/10.18357/cjpe.2026.40.1.23

Keywords:

AI-assisted evaluation, temperature settings, bias mitigation, reliability and variability, equity, diversity, and inclusion

Abstract

This practice note examines the role of generative artificial intelligence (GenAI) temperature settings in program evaluation, particularly their impact on response consistency, variability, and implicit bias. Temperature is a sampling parameter that controls how much randomness enters text generation. It therefore shapes the consistency and variability of generated outputs and, in turn, how evaluators interpret and use AI-assisted content. Lower temperature settings produce more deterministic responses, which can enhance reliability but may inadvertently reinforce dominant narratives and limit diverse perspectives. Conversely, higher temperature settings introduce variability, which can foster creativity but also increases the risk of bias and inconsistency. Given the growing integration of AI-assisted methods in evaluation, this note explores the implications of temperature settings for equitable and inclusive analysis. By examining temperature settings through an evaluation lens, it identifies ways to mitigate bias, improve methodological rigor, and ensure alignment with Equity, Diversity, and Inclusion (EDI) principles. The discussion provides practical guidance for evaluators on calibrating AI parameters to enhance transparency, fairness, and reliability in data-driven decision-making. It also aligns these technical considerations with established evaluation principles, including rigor, credibility, and culturally responsive and equitable practice.
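Temperature is commonly described as a divisor applied to a model's raw scores (logits) before they are turned into probabilities with a softmax: p_i = exp(z_i / T) / Σ_j exp(z_j / T). The minimal Python sketch that follows is illustrative only and is not the authors' procedure. It shows why a lower temperature T concentrates probability on the top-scoring option (more deterministic output) while a higher T spreads it across alternatives (more varied output). The three candidate "themes", their scores, and the helper names (`softmax_with_temperature`, `entropy_bits`, `sample_many`) are hypothetical. Real models score tens of thousands of tokens, and a given provider may combine temperature with other settings, such as top-p filtering, that are not modelled here.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities, scaled by temperature.

    A temperature below 1 sharpens the distribution toward the top-scoring
    option; a temperature above 1 flattens it toward uniform.
    """
    if temperature <= 0:
        # Many tools treat temperature 0 as greedy (argmax) decoding,
        # which this sketch does not model.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits: a higher value means more varied outputs."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample_many(options, logits, temperature, n, seed=0):
    """Draw n samples, mimicking n re-runs of the same prompt."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    probs = softmax_with_temperature(logits, temperature)
    return Counter(rng.choices(options, weights=probs, k=n))


if __name__ == "__main__":
    # Hypothetical candidate themes and scores, for illustration only.
    options = ["dominant theme", "minority theme A", "minority theme B"]
    logits = [2.0, 1.0, 0.5]
    for t in (0.2, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(
            f"T={t}: probs={[round(p, 3) for p in probs]}, "
            f"entropy={entropy_bits(probs):.2f} bits, "
            f"samples={dict(sample_many(options, logits, t, n=100))}"
        )
```

At T=0.2 nearly all probability falls on the "dominant theme". This mirrors the concern that low temperatures can crowd out less common perspectives. Re-running `sample_many` with different seeds at a fixed temperature is one simple way an evaluator might gauge response consistency before relying on AI-assisted output. Entropy gives a single summary of how widely the alternatives are spread.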

Published

2026-05-21

How to Cite

Shapiro, S., & Khalid, A. K. M. I. (2026). AI-Assisted Evaluations and EDI: How Temperature Settings Influence Reliability, Bias, and Inclusion. Canadian Journal of Program Evaluation, 40(1), 1–11. https://doi.org/10.18357/cjpe.2026.40.1.23

Issue

Vol. 40 No. 1 (2026)

Section

Research and Practice Notes