Aiding Evaluator Ethical Decision Making About How and When to Use AI in Evaluation
DOI: https://doi.org/10.18357/cjpe.2026.40.1.1223
Keywords: artificial intelligence, program evaluation, decision making, responsible AI, AIDEM
Abstract
The public launch of ChatGPT in late 2022 led to a surge of artificial intelligence (AI) models and tools being released on the market. Many of these large language models (e.g., ChatGPT-4, Gemini 2.0) and tools (e.g., Notebook LM, CoLoop) offer potential to support evaluation practice. The use of AI models and tools, however, also poses practical and ethical challenges. In this paper, we describe the AI in Evaluation Decision-Making (AIDEM) Framework, a six-step decision-making framework to help evaluators decide when and how to use AI tools. We first explore a range of decision-making frameworks, then describe the AIDEM Framework, and finally illustrate its use in a real-world case scenario.
License
Copyright (c) 2026 Sarah Mason, Olivia Melvin, Tahirah David, Bianca Montrosse-Moorhead

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors contributing to The Canadian Journal of Program Evaluation agree to release their articles under the Creative Commons Attribution-Noncommercial 4.0 (CC-BY-NC) license. This license allows this work to be copied, distributed, remixed, transformed, and built upon for any purpose provided that appropriate attribution is given, a link is provided to the license, and any changes made are indicated.
Authors retain copyright of their work and grant the journal right of first publication.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.


