Aiding Evaluator Ethical Decision Making About How and When to Use AI in Evaluation

Authors

S. Mason, O. Melvin, T. David, B. Montrosse-Moorhead

DOI:

https://doi.org/10.18357/cjpe.2026.40.1.1223

Keywords:

artificial intelligence, program evaluation, decision making, responsible AI, AIDEM

Abstract

The public launch of ChatGPT in late 2022 led to a surge of artificial intelligence (AI) models and tools being released on the market. Many of these large language models (e.g., ChatGPT-4, Gemini 2.0) and tools (e.g., Notebook LM, CoLoop) offer potential to support evaluation practice. The use of AI models and tools, however, also poses practical and ethical challenges. In this paper, we describe the AI in Evaluation Decision-Making (AIDEM) Framework, a six-step framework to help evaluators decide when and how to use AI tools. We first explore a range of decision-making frameworks, then describe the AIDEM Framework, and finally illustrate its use in a real-world case scenario.

Published

2026-05-21

How to Cite

Mason, S., Melvin, O., David, T., & Montrosse-Moorhead, B. (2026). Aiding Evaluator Ethical Decision Making About How and When to Use AI in Evaluation. Canadian Journal of Program Evaluation, 40(1), 1–33. https://doi.org/10.18357/cjpe.2026.40.1.1223

Section

Thematic Segment