Applying LLMs to a Global Affairs Task

Danielle Goldfarb, Senior Fellow, Munk School of Global Affairs & Public Policy, Lecturer, Master of Global Affairs Program

Danielle Goldfarb teaches how AI and digital data can paint more detailed, real-time pictures of global issues. She integrated generative AI into her teaching by asking students to apply large language models to specific global affairs challenges and to observe both their strengths and vulnerabilities.

Assessment Objectives 

This assignment is part of a non-technical graduate seminar (Real Time Data and AI for Global Affairs Intelligence, Winter 2025) exploring the potential and risks of digital data combined with advances in artificial intelligence for understanding global challenges. Students select a specific global affairs task and experiment with applying free versions of generative AI tools such as ChatGPT, Gemini, and DeepSeek to help them with that task. 

The overall goal is to evaluate these tools’ usefulness and their limitations or vulnerabilities in analyzing real-world global issues. Students a) select an appropriate case or cases, b) prompt the LLMs to complete the task, c) compare the results to their own baseline and to at least one other model, and d) reflect on what they learned about the specific task and about global affairs analysis more generally. 

Instructors provide an exemplar of something they have tried themselves. Students work in small groups during class time, share how their group’s exploration is going along the way to allow for instructor and peer feedback, and submit individual reflections.  

Assignment Process 

  1. Select a global affairs challenge: Choose a specific task related to global affairs where LLMs might provide value and where their limitations can be meaningfully tested. Alternatively, pick two challenges: one to explore opportunities, one to probe limitations. You may have to experiment with a few different applications to find an appropriate task or challenge.
  2. Select and test LLMs: Pick 2-3 free generative AI tools. Test each tool for strengths and vulnerabilities across the qualities you think matter most for this challenge, adjusting approaches and refining prompts as needed. Share and refine as you work with your peers.
  3. Attempt adversarial testing or “red teaming”: Deliberately challenge the LLMs with difficult, ambiguous, or misleading inputs. For example, introduce a fictional trade alliance or an inaccurate claim and assess how the model responds.
  4. Analyze: Compare tools against each other and against your own baseline methods. Identify where LLMs added value, where they fell short, and how your approach evolved.
  5. Reflect: Submit an individual 500-word reflection. Describe the process, what you learned specific to your challenge, and what you learned more broadly about global affairs analysis. Include a sample prompt and response (not counted toward the word limit).

Future-Focused Student Skill Development  

This assignment aligns well with the University of Calgary’s STRIVE model for designing assessments that effectively incorporate generative AI. It emphasizes student-centeredness: students experiment with different generative AI tools applied to global affairs challenges they choose. It also emphasizes responsibility: students identify the qualities that matter in completing a particular global affairs task and determine where it is and is not appropriate to use generative AI tools. The assignment also encourages integrity: students test these models for vulnerabilities and strengths, pushing them to examine angles they may not otherwise have considered.

Student Feedback 

  1. Most students felt the assignment made them reconsider their views.

    Some students began the assignment skeptical of these tools’ utility, while others were overly confident in genAI capabilities. By the end, most recognized both strengths and limitations they hadn’t previously considered, as well as the continued importance of human judgment. One student concluded: “These tools are great assistants… but no matter how much I narrowed the topic, these tools could not solve the problem on their own.” 

  2. Selecting the right task to meaningfully test LLMs was the first challenge.

    “By zeroing in on a specific aspect, I was able to not only get clearer data but also generate more targeted questions that led to richer discussions in class.”  

    Students who initially approached broad, intractable, or complex topics struggled to get meaningful AI outputs. Those who narrowed their focus from, say, “the implications of the Russia-Ukraine conflict” to specific elements, such as “the impact of sanctions on Russian oil exports,” were able to test the tools’ capabilities more meaningfully. Permission to experiment, discussion with peers and the instructor, and shared exemplars were critical in helping students iterate, refine, or abandon and restart their approach. 

  3. Students learned how LLMs can accelerate research and help them spot patterns.  

    Rather than spending hours compiling background material, students could quickly gather overviews and shift to higher-order analysis. “Normally, I’d spend days reading economic reports and compiling data. With these tools, I got an overview in minutes and could focus on analysis.” 

    LLMs helped students spot patterns and connections that were not immediately obvious, such as Russia’s trade shift toward China and India or how aid cuts affected sector-specific funding in countries like Haiti. “ChatGPT helped me see cross-cutting themes like trade diversion and sectoral impacts that weren’t obvious from a single report.”

  4. Students discovered significant limitations and vulnerabilities.  

    “When I asked about the origins of COVID-19, ChatGPT, Claude, and Gemini acknowledged both zoonotic spillover and the possibility of lab-leak theory, while DeepSeek avoided the topic and focused on China’s response.” 

    “ChatGPT provided general policy options but didn’t incorporate the latest developments, like the recent sanctions imposed on Russia or the latest diplomatic efforts.” 

    “These AI tools lack contextual nuance and local-oriented solutions—especially when looking at problems in international development and the SDGs.” 

    “ChatGPT often listed sources without links, making verification difficult, and sometimes it referred to studies that I couldn’t locate online.” 

    “Even with identical prompts among group members, the timelines and data reported varied significantly, which raised concerns about the reliability of the information.” 

    “This taught me that while AI can assist in the research process, I must remain vigilant and critical of the information it produces.”  

    Students also identified very simplistic answers on some topics, which they thought could be due to the dominance of Western and English-language sources in the models’ training data. 

  5. Comparing the results of different tools forced students to reflect on which qualities were most relevant for their task.

    Comparing LLM results to one another and to a baseline of how students would typically do the analysis helped students clarify what qualities matter for their task. “I rated each genAI tool on a scale from 0 to 5… This helped me assess where the models were helpful, and where they clearly fell short—especially in structuring data and making meaningful predictions.” “ChatGPT handled data tables well (e.g., breaking down USAID funding by sector), but maps were outdated or generic.” “DeepSeek offered a more structured, quantitative assessment by assigning probabilities… ChatGPT provided a broader, narrative-style analysis.” “DeepSeek relied more on policy announcements and varied sources, while ChatGPT drew mainly from major news outlets (e.g., Reuters, AP News).”  

  6. The group format made the project fun and supported creativity and the refinement of approaches.

    “Working in a group helped me see other angles on my topic—one student’s prompts about water access in Rwanda helped me refine my own on energy infrastructure.” Students exchanged prompts and compared effective and ineffective strategies in real time with their group members. Students testing the models with their adversarial prompts frequently broke out in laughter as they invented falsehoods about their topics and watched how the models reacted.  