Your AI Is More Human Than You Think

In recent years, the rapid advancement of artificial intelligence (AI) has captivated the public imagination, with large language models (LLMs) like ChatGPT showcasing seemingly magical abilities. These systems have become synonymous with technological progress, promising a future where machines can understand and generate human-like text, solve complex problems, and even engage in creative tasks. Beneath the glossy surface, however, lies a complex and often troubling reality: these systems are made possible by human workers who perform data annotation and content moderation.
The Technical Necessity of Human Labor
At their core, LLMs require massive amounts of carefully curated and labeled data to function effectively. Training involves feeding the model text data and teaching it to understand and generate human language, a process supported by various forms of annotation. Data annotation encompasses labeling, categorizing, tagging, and adding attributes that help machine learning models process and analyze information. The training pipeline typically follows several key steps.
First, vast amounts of text data must be collected from diverse sources such as books, articles, and web content. This raw data then needs to be cleaned, preprocessed, and prepared for training through steps like tokenization and formatting. The most critical and labor-intensive aspect, however, is the human annotation required to label the data appropriately. Different types of annotation are used depending on the task: manual annotation involves humans directly reviewing and labeling data, while semi-automatic annotation combines human oversight with algorithmic assistance. For tasks requiring high accuracy and nuanced understanding, human annotators remain irreplaceable despite advances in automated systems.
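To make this pipeline concrete, here is a minimal sketch in Python of what a single annotated training record might look like, with naive whitespace tokenization standing in for a real subword tokenizer. The label taxonomy, record fields, and helper functions are illustrative assumptions, not any particular company's format.

```python
import json

# A minimal, illustrative sketch of the annotation step described above.
# The label set, record format, and guidelines are assumptions for
# illustration only; real pipelines vary by company and task.

LABELS = {"safe", "violent", "sexual", "hate_speech"}  # hypothetical taxonomy

def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenization; production systems use subword tokenizers."""
    return text.lower().split()

def annotate(text: str, label: str, annotator_id: str) -> dict:
    """Package one human judgment as a training record."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    return {
        "text": text,
        "tokens": tokenize(text),
        "label": label,             # supplied by a human reviewer
        "annotator": annotator_id,  # needed to audit consistency
    }

# A human annotator reads the passage and assigns one of the allowed labels.
record = annotate("Example passage scraped from the web.", "safe", "worker_001")
print(json.dumps(record, indent=2))
```

Multiply a record like this by millions, each one requiring a human judgment call, and the scale of the labor involved becomes clear.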
Exploitation and Neglect in the AI Workforce
To meet the massive demand for annotated training data, tech companies have built a vast network of workers, primarily in the Global South. These workers, often paid minimal wages, perform the crucial but mentally taxing work of labeling data and moderating content. Countries like Kenya, India, the Philippines, and Venezuela have become hubs for this digital labor, with large populations of well-educated but underemployed people providing a ready workforce. In Kenya, for instance, workers employed by firms like Sama have been paid as little as $1.32 to $2 per hour to review and label content for clients such as OpenAI and Meta. These workers are often required to review disturbing content, including violence, hate speech, and explicit material, for up to eight hours a day.
The psychological toll of this work is severe, with many workers reporting trauma, nightmares, and substance abuse as they struggle to cope with constant exposure to disturbing content. Naftali Wambalo, a Kenyan worker interviewed by CBS News, described his experience: "I looked at people being slaughtered, people engaging in sexual activity with animals. People abusing children physically, sexually. People committing suicide." He added, "Basically- yes, all day long. Eight hours a day, 40 hours a week." The impact of this work on Wambalo's personal life was profound, affecting even his intimate relationships.
The outsourcing model exacerbates these challenges by creating a layer of separation between tech companies and the workers they rely on. Companies like OpenAI or Meta contract third-party firms or use digital labor platforms to recruit annotators and moderators. This arrangement allows them to avoid direct responsibility for poor working conditions while benefiting from the labor that powers their AI systems. Workers are often hired as independent contractors rather than employees, stripping them of benefits such as healthcare, job security, and access to grievance mechanisms. The precarious nature of this employment makes it difficult for workers to advocate for better conditions without risking their livelihoods. For instance, employees at Sama who moderated content for OpenAI's ChatGPT project reported being required to label hundreds of text passages per shift under tight deadlines. Although OpenAI paid Sama $12.50 per hour per worker for this service, the annotators themselves received only a fraction of that amount: typically $1.32 to $2 per hour after taxes.
Adding to the exploitation is the lack of mental health support for workers exposed to traumatic content. While some companies offer wellness programs or counseling services, these efforts are often superficial and inadequate. Workers report being shuffled between tasks rather than receiving meaningful psychological care. The emotional toll is compounded by constant surveillance and performance monitoring, which create a high-pressure environment where even minor deviations can lead to penalties or termination.
Opacity and Accountability in AI Development
The lack of transparency in how AI systems are developed and maintained poses significant ethical concerns. Tech companies often present AI as fully automated systems that function independently of human input. In reality, these systems depend heavily on human labor for tasks such as data labeling, content moderation, and quality assurance. This "human-in-the-loop" model is rarely disclosed to the public, perpetuating the illusion of automation while obscuring the labor that makes it possible.
This opacity extends to the treatment of workers within the AI supply chain. Many companies fail to disclose basic information about their labor practices, such as how much they pay annotators or what steps they take to ensure fair working conditions. This lack of accountability allows companies to sidestep scrutiny while continuing exploitative practices. Even when issues like low wages or poor mental health support come to light, companies often deflect responsibility by pointing to their subcontractors.
The absence of transparency also undermines efforts to address systemic issues within the industry. Without clear reporting on labor practices, it becomes difficult for policymakers, researchers, or advocacy groups to hold companies accountable or propose meaningful reforms. Initiatives like the AI Labor Disclosure Initiative (AILDI) aim to address this gap by advocating for regular reporting on digital labor practices. However, such efforts remain voluntary and lack the enforcement mechanisms needed to drive widespread change.
Bias in Data Annotation
Bias in data annotation is another critical issue that stems from the human labor involved in training AI systems. Annotators bring their own cultural perspectives, experiences, and biases into their work, which can influence how data is labeled. An annotator's understanding of hate speech, for example, may vary depending on their cultural background or personal beliefs. Many annotation tasks are outsourced to regions where workers may not fully understand the cultural context of the data they are labeling, and instructions are often provided in English or other dominant languages, creating additional barriers for non-native speakers. This disconnect can result in inconsistent or inaccurate annotations that compromise the fairness and reliability of AI systems.
Annotation bias is particularly problematic because it often goes unnoticed until it manifests in real-world applications. For instance, facial recognition systems trained on datasets lacking diversity have been shown to perform poorly on individuals with darker skin tones. Similarly, natural language processing models can perpetuate stereotypes if their training data reflects societal prejudices.
A Structural Problem Across Industries
The exploitation of data workers is not an isolated issue but a symptom of broader structural problems within the tech industry. The push for rapid innovation and cost efficiency has created a system where human labor is undervalued and hidden from view. This dynamic mirrors historical patterns of industrial exploitation but is exacerbated by the globalized nature of today's economy.
Social media giants like Meta (formerly Facebook) also rely heavily on outsourced labor for content moderation. Workers at Sama who moderated Facebook content described similar conditions: low pay, exposure to traumatic material, and inadequate mental health support. Similar dynamics can be seen in industries like fast fashion or electronics manufacturing, where low-cost labor fuels high-margin products sold primarily in affluent markets.
Outsourcing plays a central role in this system by enabling companies to shift labor-intensive tasks to regions with lower wages and weaker labor protections. This model creates a race to the bottom where workers in developing countries compete for contracts by accepting substandard pay and conditions. At the same time, it allows tech companies in wealthier nations to reap enormous profits while avoiding accountability for exploitative practices.
Fair Compensation and Improved Working Conditions
The foundation of ethical AI development lies in ensuring fair compensation and dignified working conditions for the laborers who power these systems. Data workers are often paid wages that fail to meet basic living standards. Companies must commit to paying living wages that reflect the value of this labor while also providing essential benefits such as healthcare, job security, and paid time off. Beyond financial equity, improving working conditions is equally vital. Workers need realistic productivity targets that do not overburden them, adequate rest periods to prevent burnout, and safe environments free from exploitation or undue pressure. These changes would not only improve worker well-being but also enhance the quality of the AI systems they help create.
The psychological toll on data annotators and content moderators is one of the most pressing ethical issues in AI development. Companies must take proactive steps to mitigate these harms by offering comprehensive mental health support. This includes access to trauma-informed counselors, regular mental health check-ins, and wellness programs designed to alleviate stress. Employers should also explore technological solutions that limit human exposure to harmful content, such as using AI tools to pre-filter graphic material before it reaches human reviewers. By prioritizing mental health alongside operational goals, companies can create a more humane workplace for their employees.
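As a rough illustration of that pre-filtering idea, the sketch below routes items by an automated harm score so that only ambiguous cases reach a human review queue. The score_harm placeholder and the thresholds are hypothetical assumptions, not a description of any deployed system.

```python
from dataclasses import dataclass

# Illustrative sketch of pre-filtering: handle clear-cut items automatically
# and send only ambiguous cases to human reviewers.
# The score_harm() stand-in and the thresholds below are assumptions.

@dataclass
class Item:
    item_id: str
    text: str

def score_harm(item: Item) -> float:
    """Placeholder for an automated classifier returning a harm score in [0, 1]."""
    return 0.5  # stand-in value; a real system would run a trained model here

def route(item: Item, auto_block: float = 0.95, auto_allow: float = 0.05) -> str:
    """Decide whether a human needs to see this item at all."""
    score = score_harm(item)
    if score >= auto_block:
        return "blocked_automatically"    # never shown to a human reviewer
    if score <= auto_allow:
        return "allowed_automatically"
    return "queued_for_human_review"      # only uncertain cases reach workers

print(route(Item("ex-1", "example content")))
```

Even a simple routing rule like this can reduce how much graphic material reviewers see, though it cannot eliminate exposure entirely.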
Empowering Workers
Empowering data workers is essential for creating a more equitable AI ecosystem. A lack of transparency in AI supply chains enables companies to distance themselves from exploitative labor practices, so companies must adopt transparent supply chain practices that ensure ethical labor standards are upheld at every level. This includes engaging directly with workers or their representatives in shaping workplace policies and conducting regular audits to monitor compliance with labor standards. Companies should also establish safe channels for workers to report grievances or suggest improvements without fear of retaliation, and they should involve workers in decisions related to their roles, treating them as collaborators rather than expendable resources. Supporting unionization efforts and transnational organizing can give workers a collective voice to advocate for better wages, protections, and working conditions. Worker-led initiatives have already shown promise in amplifying the perspectives of those directly affected by exploitative systems.
Conclusion
Beyond immediate reforms, the industry must fundamentally rethink its approach to progress. AI systems are rapidly becoming commonplace and now dominate the technology conversation, yet the exploitation of data workers to build these models is not an inevitable cost of progress; it is a choice made by an industry that prioritizes profit over people. To build truly responsible AI systems, we must reimagine how we value human labor within the tech ecosystem. This means not only addressing immediate concerns around wages and working conditions but also fostering a culture of accountability that recognizes the dignity and humanity of all workers, whether they are coding algorithms in Silicon Valley or labeling data in Nairobi. By shining a light on this hidden workforce and demanding systemic change, we can ensure that the benefits of AI are shared equitably across society rather than built on the backs of its most vulnerable contributors.
Join us next week when we dive into the seemingly sudden rise of DeepSeek and what it means for the broader AI ecosystem! If you found this article through a direct link, please consider subscribing! Subscribing lets you express your support and give feedback for free, and it ensures that you are always first in line to receive our articles.
Citations
Bartholomew, J. (2023, August 29). Q&A: Uncovering the labor exploitation that powers AI. Columbia Journalism Review. https://www.cjr.org/tow_center/qa-uncovering-the-labor-exploitation-that-powers-ai.php
Chen, A. (2014, October 23). The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed. Wired. https://www.wired.com/2014/10/content-moderation/
Coldewey, D. (2024, July 8). Data workers detail exploitation by tech industry in DAIR report. TechCrunch. https://techcrunch.com/2024/07/08/data-workers-detail-exploitation-by-tech-industry-in-dair-report/
Dzieza, J. (2023, June 20). Inside the AI Factory. The Verge. https://www.theverge.com/features/23764584/ai-artificial-intelligence-data-notation-labor-scale-surge-remotasks-openai-chatbots
Gebrekidan, F. B. (2024). Content moderation: The harrowing, traumatizing job that left many African data workers with mental health issues and drug dependency. In M. Miceli, A. Dinika, K. Kauffman, C. Salim Wagner, & L. Sachenbacher (Eds.), The Data Workers' Inquiry. https://data-workers.org/fasica
Meaker, M. (2023, September 11). These Prisoners Are Training AI. Wired. https://www.wired.com/story/prisoners-training-ai-finland/
Okolo, C. T., & Tano, M. (2024, October 24). Moving toward truly responsible AI development in the global AI market. Brookings. https://www.brookings.edu/articles/moving-toward-truly-responsible-ai-development-in-the-global-ai-market/
Perrigo, B. (2023, January 18). Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer. TIME. https://time.com/6247678/openai-chatgpt-kenya-workers/
Ranta, B. N. D. (2024). The Unknown Women of Content Moderation. In M. Miceli, A. Dinika, K. Kauffman, C. Salim Wagner, & L. Sachenbacher (Eds.), The Data Workers' Inquiry. https://data-workers.org/ranta
Stahl, L. (2024, November 24). Labelers training AI say they’re overworked, underpaid and exploited by big American tech companies. CBS News. https://www.cbsnews.com/news/labelers-training-ai-say-theyre-overworked-underpaid-and-exploited-60-minutes-transcript/
Taylor, B. L. (2023, November 15). Long hours and low wages: The human labour powering AI’s development. The Conversation. http://theconversation.com/long-hours-and-low-wages-the-human-labour-powering-ais-development-217038
Bracy, C., & Dark, M. (2023, September 13). The Ghost Workforce the Tech Industry Doesn't Want You to Think About. Stanford Social Innovation Review. https://ssir.org/articles/entry/the_ghost_workforce_the_tech_industry_doesnt_want_you_to_think_about
Brandom, R. (2023, June 29). The industry behind the industry behind AI. Rest of World. https://restofworld.org/2023/exporter-industry-behind-ai/
Muñoz, F. M. (2023, October 9). Unmasking the Shadows of AI: Ethical Concerns in the Digital Workforce. Kognic. https://www.kognic.com/articles/shadows-of-ai
Williams, A., Miceli, M., & Gebru, T. (2022). The Exploited Labor Behind Artificial Intelligence. Noema. https://www.noemamag.com/the-exploited-labor-behind-artificial-intelligence