🌸 Digital Garden

Workshop 4


What I’ve done

We designed and constructed a dataset about how students use generative AI. The purpose was to engage critically with issues of data, ethics, and power.

Research Purpose: University-led data collection

Supporting a UK university to understand “digital engagement” and improve student services, considering:

  • Which aspects of students’ generative AI use we want to explore
  • How to narrow down the concept (e.g., writing support, translation tools, image generation)
  • Which single key theme to focus on

Difficulties: Ethical Considerations

We spent quite a lot of time discussing:

  • What ethical issues might arise from this research?
  • Might participants be reluctant to give truthful answers?
  • Could the data collection cause harm or discomfort to certain groups?

Designing the Survey (Using Microsoft Forms)

[Screenshots of the survey questions designed in Microsoft Forms]

More thoughts

In this workshop, our group worked through a "university-led data collection" scenario, focusing on designing a survey questionnaire to understand students' needs for guidance courses on the use of generative AI tools. During the design process, we quickly realized the importance of having a clear research purpose: a clear goal helps us determine which data are truly necessary, ensuring that each question directly serves the overall research and avoiding ambiguous or meaningless items.

Another major challenge we faced was the ethics of data collection, especially how to avoid sensitive or uncomfortable questions while still obtaining reliable feedback. Since students' use of generative AI may touch on personal learning habits and even academic integrity, we were particularly cautious in our wording. By using neutral expressions, numerical scales, and an explicit option not to answer, we tried to protect privacy while encouraging respondents to answer truthfully. This process made us more aware that a poorly designed questionnaire can easily introduce bias and place psychological pressure on respondents.
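To make this concrete, here is a minimal, purely illustrative sketch (in Python, not part of our actual Microsoft Forms questionnaire) of how one such item could be represented: neutral wording, a numeric scale, and an explicit "prefer not to answer" option that is tallied separately rather than discarded. All names and wording below are invented for illustration.

```python
# Hypothetical sketch of one survey item, illustrating the design choices
# described above: neutral wording, a numeric scale, and an explicit opt-out.
# The field names and question text are invented; the real questionnaire
# was built directly in Microsoft Forms.

from collections import Counter

question = {
    "id": "q3",
    "text": "How often do you use generative AI tools to support your coursework?",
    "scale": {1: "Never", 2: "Rarely", 3: "Sometimes", 4: "Often", 5: "Very often"},
    "opt_out": "Prefer not to answer",
    "required": False,  # respondents may skip the question entirely
}

def summarise(responses):
    """Tally responses, keeping opt-outs separate so they are not
    silently dropped or folded into the numeric scale."""
    numeric = [r for r in responses if isinstance(r, int)]
    opt_outs = sum(1 for r in responses if r == question["opt_out"])
    return {"counts": Counter(numeric), "opt_out": opt_outs}

# Example: five mock responses, one of which declines to answer.
print(summarise([2, 4, 3, question["opt_out"], 5]))
```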

Overall, this workshop made us understand that successful data collection depends not only on technology and methods but also on ethics, consent, representativeness, and participants' trust. Only by balancing research goals with ethical practice can meaningful and responsible datasets be constructed.

Reading references (Crawford, 2021)

  • When mug shots are used as training data, they function no longer as tools of identification but rather to fine-tune an automated form of vision. We might think of this as Galtonian formalism. They are used to detect the basic mathematical components of faces, to “reduce nature to its geometrical essence.”
  • Nor is it solely the invasion of privacy they represent, since suspects and prisoners have no right to refuse being photographed. It’s that the NIST databases foreshadow the emergence of a logic that has now thoroughly pervaded the tech sector: the unswerving belief that everything is data and is there for the taking. It doesn’t matter where a photograph was taken or whether it reflects a moment of vulnerability or pain or if it represents a form of shaming the subject. It has become so normalized across the industry to take and use whatever is available that few stop to question the underlying politics.
  • The early years of the twenty-first century marked a shift away from consent-driven data collection. In addition to dispensing with the need for staged photo shoots, those responsible for assembling datasets presumed that the contents of the internet were theirs for the taking, beyond the need for agreements, signed releases, and ethics reviews.
  • But training data is a brittle form of ground truth—and even the largest troves of data cannot escape the fundamental slippages that occur when an infinitely complex world is simplified and sliced into categories.