The Conversation

AI companies train language models on YouTube’s archive − making family-and-friends videos a privacy risk

Published on theconversation.com – Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst – 2024-06-27 07:23:53

Your kid’s silly video could be fodder for ChatGPT.
Halfpoint/iStock via Getty Images

Ryan McGrady, UMass Amherst and Ethan Zuckerman, UMass Amherst

The promised artificial intelligence revolution requires data. Lots and lots of data. OpenAI and Google have begun using YouTube videos to train their text-based AI models. But what does the YouTube archive actually include?

Our team of digital media researchers at the University of Massachusetts Amherst collected and analyzed random samples of YouTube videos to learn more about that archive. We published an 85-page paper about that dataset and set up a website called TubeStats for researchers and journalists who need basic information about YouTube.

Now, we’re taking a closer look at some of our more surprising findings to better understand how these obscure videos might become part of powerful AI systems. We’ve found that many YouTube videos are meant for personal use or for small groups of people, and a significant proportion were created by children who appear to be under 13.

Bulk of the YouTube iceberg

Most people’s experience of YouTube is algorithmically curated: Up to 70% of the videos users watch are recommended by the site’s algorithms. Recommended videos are typically popular content such as influencer stunts, news clips, explainer videos, travel vlogs and video game reviews, while content that is not recommended languishes in obscurity.

Some YouTube content emulates popular creators or fits into established genres, but much of it is personal: family celebrations, selfies set to music, homework assignments, video game clips without context and kids dancing. The obscure side of YouTube – the vast majority of the estimated 14.8 billion videos created and uploaded to the platform – is poorly understood.

Illuminating this aspect of YouTube – and social media generally – is difficult because big tech companies have become increasingly hostile to researchers.

We’ve found that many videos on YouTube were never meant to be shared widely. We documented thousands of short, personal videos that have few views but high engagement – likes and comments – implying a small but highly engaged audience. These were clearly meant for a small audience of friends and family. Such social uses of YouTube contrast with videos that try to maximize their audience, suggesting another way to use YouTube: as a video-centered social network for small groups.

Other videos seem intended for a different kind of small, fixed audience: recorded classes from pandemic-era virtual instruction, school board meetings and work meetings. While not what most people think of as social uses, they likewise imply that their creators have a different expectation about the audience for the videos than creators of the kind of content people see in their recommendations.

Fuel for the AI machine

It was with this broader understanding that we read The New York Times exposé on how OpenAI and Google turned to YouTube in a race to find new troves of data to train their large language models. An archive of YouTube transcripts makes an extraordinary dataset for text-based models.

There is also speculation, fueled in part by an evasive answer from OpenAI’s chief technology officer Mira Murati, that the videos themselves could be used to train AI text-to-video models such as OpenAI’s Sora.

The New York Times story raised concerns about YouTube’s terms of service and, of course, the copyright issues that pervade much of the debate about AI. But there’s another problem: How could anyone know what an archive of more than 14 billion videos, uploaded by people all over the world, actually contains? It’s not entirely clear that Google knows or even could know if it wanted to.

Kids as content creators

We were surprised to find an unsettling number of videos featuring kids or apparently created by them. YouTube requires uploaders to be at least 13 years old, but we frequently saw children who appeared to be much younger than that, typically dancing, singing or playing video games.

In our preliminary research, our coders determined nearly a fifth of random videos with at least one person’s face visible likely included someone under 13. We didn’t take into account videos that were clearly shot with the consent of a parent or guardian.

Our current sample size of 250 is relatively small – we are working on coding a much larger sample – but the findings thus far are consistent with what we’ve seen in the past. We’re not aiming to scold Google. Age validation on the internet is infamously difficult and fraught, and we have no way of determining whether these videos were uploaded with the consent of a parent or guardian. But we want to underscore what is being ingested by these large companies’ AI models.
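
To give a rough sense of what a sample of 250 can and cannot say, here is a minimal Python sketch of a 95% Wilson score interval for a proportion. The counts are hypothetical: it assumes, purely for illustration, that 50 of 250 coded videos were flagged. The exact counts are not given here, and the function name is illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a proportion (z=1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts chosen only to illustrate "about a fifth of 250".
low, high = wilson_interval(successes=50, n=250)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under those assumed counts, the interval runs from roughly 16% to 25%, which is why coding a much larger sample matters.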

Small reach, big influence

It’s tempting to assume OpenAI is using highly produced influencer videos or TV newscasts posted to the platform to train its models, but previous research on large language model training data shows that the most popular content is not always the most influential in training AI models. A virtually unwatched conversation between three friends could have much more linguistic value in training a chatbot language model than a music video with millions of views.

Unfortunately, OpenAI and other AI companies are quite opaque about their training materials: They don’t specify what goes in and what doesn’t. Most of the time, researchers can infer problems with training data through biases in AI systems’ output. But when we do get a glimpse at training data, there’s often cause for concern. For example, Human Rights Watch released a report on June 10, 2024, that showed that a popular training dataset includes many photos of identifiable kids.

The history of big tech self-regulation is filled with moving goal posts. OpenAI in particular is notorious for asking for forgiveness rather than permission and has faced increasing criticism for putting profit over safety.

Concerns over the use of user-generated content for training AI models typically center on intellectual property, but there are also privacy issues. YouTube is a vast, unwieldy archive, impossible to fully review.

Models trained on a subset of professionally produced videos could conceivably be an AI company’s first training corpus. But without strong policies in place, any company that ingests more than the popular tip of the iceberg is likely including content that violates the Federal Trade Commission’s Children’s Online Privacy Protection Rule, which prevents companies from collecting data from children under 13 without notice.

With last year’s executive order on AI and at least one promising proposal on the table for comprehensive privacy legislation, there are signs that legal protections for user data in the U.S. might become more robust.

When the Wall Street Journal’s Joanna Stern asked OpenAI CTO Mira Murati whether OpenAI trained its text-to-video generator Sora on YouTube videos, she said she wasn’t sure.

Have you unwittingly helped train ChatGPT?

The intentions of a YouTube uploader simply aren’t as consistent or predictable as those of someone publishing a book, writing an article for a magazine or displaying a painting in a gallery. But even if YouTube’s algorithm ignores your upload and it never gets more than a couple of views, it may be used to train models like ChatGPT and Gemini.

As far as AI is concerned, your family reunion video may be just as important as those uploaded by influencer giant Mr. Beast or CNN.

Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst and Ethan Zuckerman, Associate Professor of Public Policy, Communication, and Information, UMass Amherst

This article is republished from The Conversation under a Creative Commons license. Read the original article.

The Conversation

Colors are objective, according to two philosophers − even though the blue you see doesn’t match what I see

Published on theconversation.com – Elay Shech, Professor of Philosophy, Auburn University – 2025-04-25 07:55:00

What appear to be blue and green spirals are actually the same color.
Akiyoshi Kitaoka

Elay Shech, Auburn University and Michael Watkins, Auburn University

Is your green my green? Probably not. What appears as pure green to me will likely look a bit yellowish or bluish to you. This is because visual systems vary from person to person. Moreover, an object’s color may appear different against different backgrounds or under different lighting.

These facts might naturally lead you to think that colors are subjective – that, unlike features such as length and temperature, colors are not objective features. On that view, either nothing has a true color, or colors are relative to observers and their viewing conditions.

But perceptual variation has misled you. We are philosophers who study colors, objectivity and science, and we argue in our book “The Metaphysics of Colors” that colors are as objective as length and temperature.

Perceptual variation

There is a surprising amount of variation in how people perceive the world. If you offered a group of people a spectrum of color chips ranging from chartreuse to purple and asked them to pick the unique green chip – the chip with no yellow or blue in it – their choices would vary considerably. Indeed, there wouldn’t be a single chip that most observers would agree is unique green.

Generally, an object’s background can result in dramatic changes in how you perceive its colors. If you place a gray object against a lighter background, it will appear darker than if you place it against a darker background. This variation in perception is perhaps most striking when viewing an object under different lighting, where a red apple could look green or blue.

Of course, that you experience something differently does not prove that what is experienced is not objective. Water that feels cold to one person may not feel cold to another. And although we do not know who is feeling the water “correctly,” or whether that question even makes sense, we can know the temperature of the water and presume that this temperature is independent of your experience.

Similarly, that you can change the appearance of something’s color is not the same as changing its color. You can make an apple look green or blue, but that is not evidence that the apple is not red.

Apple under a gradient of red to blue light
Under different lighting conditions, objects take on different colors.
Gyozo Vaczi/iStock via Getty Images Plus

For comparison, the Moon appears larger when it’s on the horizon than when it appears near its zenith. But the size of the Moon has not changed, only its appearance. Hence, that the appearance of an object’s color or size varies is, by itself, no reason to think that its color and size are not objective features of the object. In other words, the properties of an object are independent of how they appear to you.

That said, given that there is so much variation in how objects appear, how do you determine what color something actually is? Is there a way to determine the color of something despite the many different experiences you might have of it?

Matching colors

Perhaps determining the color of something is to determine whether it is red or blue. But we suggest a different approach. Notice that squares that appear to be the same shade of pink against different backgrounds look different against the same background.

Green, purple and orange squares with smaller squares in shades of pink placed at their centers and at the bottom of the image
The smaller squares may appear to be the same color, but if you compare them with the strip of squares at the bottom, they’re actually different shades.
Shobdohin/Wikimedia Commons, CC BY-SA

It’s easy to assume that to prove colors are objective would require knowing which observers, lighting conditions and backgrounds are the best, or “normal.” But determining the right observers and viewing conditions is not required for determining the very specific color of an object, regardless of its name. And it is not required to determine whether two objects have the same color.

To determine whether two objects have the same color, an observer would need to view the objects side by side against the same background and under various lighting conditions. If you paint part of a room and find that you don’t have enough paint, for instance, finding a match might be very tricky. A color match requires that no observer under any lighting condition will see a difference between the new paint and the old.

Is the dress yellow and white or black and blue?

That two people can determine whether two objects have the same color even if they don’t agree on exactly what that color is – just as a pool of water can have a particular temperature without feeling the same to me and you – seems like compelling evidence to us that colors are objective features of our world.

Colors, science and indispensability

Everyday interactions with colors – such as matching paint samples, determining whether your shirt and pants clash, and even your ability to interpret works of art – are hard to explain if colors are not objective features of objects. But if you turn to science and look at the many ways that researchers think about colors, it becomes harder still.

For example, in the field of color science, scientific laws are used to explain how objects and light affect perception and the colors of other objects. Such laws, for instance, predict what happens when you mix colored pigments, when you view contrasting colors simultaneously or successively, and when you look at colored objects in various lighting conditions.

The philosophers Hilary Putnam and Willard van Orman Quine made famous what is known as the indispensability argument. The basic idea is that if something is indispensable to science, then it must be real and objective – otherwise, science wouldn’t work as well as it does.

For example, you may wonder whether unobservable entities such as electrons and electromagnetic fields really exist. But, so the argument goes, the best scientific explanations assume the existence of such entities and so they must exist. Similarly, because mathematics is indispensable to contemporary science, some philosophers argue that this means mathematical objects are objective and exist independently of a person’s mind.

Blue damselfish, seeming iridescent against a black background
The color of an animal can exert evolutionary pressure.
Paul Starosta/Stone via Getty Images

Likewise, we suggest that color plays an indispensable role in evolutionary biology. For example, researchers have argued that aposematism – the use of colors to signal a warning for predators – also benefits an animal’s ability to gather resources. Here, an animal’s coloration works directly to expand its food-gathering niche insofar as it informs potential predators that the animal is poisonous or venomous.

In fact, animals can exploit the fact that the same color pattern can be perceived differently by different perceivers. For instance, some damselfish have ultraviolet face patterns that help them be recognized by other members of their species and communicate with potential mates while remaining largely hidden to predators unable to perceive ultraviolet colors.

In sum, our ability to determine whether objects are colored the same or differently and the indispensable roles colors play in science suggest that colors are as real and objective as length and temperature.

Elay Shech, Professor of Philosophy, Auburn University and Michael Watkins, Professor of Philosophy, Auburn University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

The Conversation

Perfect brownies baked at high altitude are possible thanks to Colorado’s home economics pioneer Inga Allison

Published on theconversation.com – Tobi Jacobi, Professor of English, Colorado State University – 2025-04-22 07:47:00

Students work in the high-altitude baking laboratory.
Archives and Special Collections, Colorado State University

Tobi Jacobi, Colorado State University and Caitlin Clark, Colorado State University

Many bakers working at high altitudes have carefully followed a standard recipe only to reach into the oven to find a sunken cake, flat cookies or dry muffins.

Experienced mountain bakers know they need a few tricks to achieve the same results as their fellow artisans working at sea level.

These tricks are more than family lore, however. They originated in the early 20th century thanks to research on high-altitude baking done by Inga Allison, then a professor at Colorado State University. It was Allison’s scientific prowess and experimentation that brought us the possibility of perfect high-altitude brownies and other baked goods.

A recipe for brownies at high altitude.
Inga Allison’s high-altitude brownie recipe.
Archives and Special Collections, Colorado State University

We are two current academics at CSU whose work has been touched by Allison’s legacy.

One of us – Caitlin Clark – still relies on Allison’s lessons a century later in her work as a food scientist in Colorado. The other – Tobi Jacobi – is a scholar of women’s rhetoric and community writing, and an enthusiastic home baker in the Rocky Mountains, who learned about Allison while conducting archival research on women’s work and leadership at CSU.

That research grew into “Knowing Her,” an exhibition Jacobi developed with Suzanne Faris, a CSU sculpture professor. The exhibit highlights dozens of women from 100 years of work and leadership at CSU and will be on display through mid-August 2025 in Morgan Library on CSU’s Fort Collins campus.

A pioneer in home economics

Inga Allison is one of the fascinating and accomplished women who are part of the exhibit.

Allison was born in 1876 in Illinois and attended the University of Chicago, where she completed the prestigious “science course” work that heavily influenced her career trajectory. Her studies and research also set the stage for her belief that women’s education was more than preparation for domestic life.

In 1908, Allison was hired as a faculty member in home economics at Colorado Agricultural College, which is now CSU. She joined a group of faculty who were beginning to study the effects of altitude on baking and crop growth. The department was located inside Guggenheim Hall, a building that was constructed for home economics education but lacked lab equipment or serious research materials.

A sepia-toned photograph of Inga Allison, a white woman in dark clothes with her hair pulled back.
Inga Allison was a professor of home economics at Colorado Agricultural College, where she developed recipes that worked in high altitudes.
Archives and Special Collections, Colorado State University

Allison took seriously both the university’s land-grant mission, with its focus on teaching, research and extension, and her particular charge to prepare women for the future. She urged her students to move beyond simple conceptions of home economics as mere preparation for domestic life. She wanted them to engage with the physical, biological and social sciences to understand the larger context for home economics work.

Such thinking, according to CSU historian James E. Hansen, pushed women college students in the early 20th century to expand the reach of home economics to include “extension and welfare work, dietetics, institutional management, laboratory research work, child development and teaching.”

News articles from the early 1900s track Allison giving lectures like “The Economic Side of Natural Living” to the Colorado Health Club and talks on domestic science to ladies clubs and at schools across Colorado. One of her talks in 1910 focused on the art of dishwashing.

Allison became the home economics department chair in 1910 and eventually dean. In this leadership role, she urged then-CSU President Charles Lory to fund lab materials for the home economics department. It took 19 years for this dream to come to fruition.

In the meantime, Allison collaborated with Lory, who gave her access to lab equipment in the physics department. She pieced together equipment to conduct research on the relationship between cooking foods in water and atmospheric pressure, but systematic control of heat, temperature and pressure was difficult to achieve.

She sought other ways to conduct high-altitude experiments and traveled across Colorado where she worked with students to test baking recipes in varied conditions, including at 11,797 feet in a shelter house on Fall River Road near Estes Park.

Early 1900s car traveling in the Rocky Mountains.
Inga Allison tested her high-altitude baking recipes at 11,797 feet at the shelter house on Fall River Road, near Estes Park, Colorado.
Archives and Special Collections, Colorado State University

But Allison realized that recipes baked at 5,000 feet in Fort Collins and Denver simply didn’t work at higher altitudes. Little advancement in baking methods occurred until 1927, when the first altitude baking lab in the nation was constructed at CSU thanks to Allison’s research. The results were tangible – and tasty – as public dissemination of altitude-specific baking practices began.

A 1932 bulletin on baking at altitude offers hundreds of formulas for success at heights ranging from 4,000 feet to over 11,000 feet. Its author, Marjorie Peterson, a home economics staff person at the Colorado Experiment Station, credits Allison for her constructive suggestions and support in the development of the booklet.

Science of high-altitude baking

As a senior food scientist in a mountain state, one of us – Caitlin Clark – advises bakers on how to adjust their recipes to compensate for altitude. Thanks to Allison’s research, bakers at high altitude today can anticipate how the lower air pressure will affect their recipes and compensate by making small adjustments.

The first thing you have to understand before heading into the kitchen is that the higher the altitude, the lower the air pressure. This lower pressure has chemical and physical effects on baking.

Air pressure is a force that pushes back on all of the molecules in a system and prevents them from venturing off into the environment. Heat plays the opposite role – it adds energy and pushes molecules to escape.

When water is boiled, molecules escape by turning into steam. The less air pressure is pushing back, the less energy is required to make this happen. That’s why water boils at lower temperatures at higher altitudes – around 200 degrees Fahrenheit in Denver compared with 212 F at sea level.
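
That gap can be approximated with a common rule of thumb, which is not taken from Allison’s work: the boiling point of water drops by about 1 degree Fahrenheit for every 500 feet of elevation. The short Python sketch below applies it to elevations mentioned in this story. The function name and the constant are illustrative assumptions, and real boiling points also shift with the weather.

```python
# Rule-of-thumb estimate (an assumption, not a measured value): water's
# boiling point falls about 1 degree F per 500 feet of elevation gain.
SEA_LEVEL_BOILING_F = 212.0
FEET_PER_DEGREE_F = 500.0


def estimated_boiling_point_f(altitude_ft: float) -> float:
    """Return an approximate boiling point of water in degrees Fahrenheit."""
    return SEA_LEVEL_BOILING_F - altitude_ft / FEET_PER_DEGREE_F


if __name__ == "__main__":
    # Elevations as given in the story.
    sites = [
        ("Sea level", 0),
        ("Fort Collins and Denver", 5000),
        ("Fall River Road shelter house", 11797),
    ]
    for place, altitude_ft in sites:
        print(f"{place}: about {estimated_boiling_point_f(altitude_ft):.0f} F")
```

At 5,000 feet this gives about 202 F, in line with the roughly 200 F figure for Denver, and at 11,797 feet about 188 F.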

So, when baking is done at high altitude, steam is produced at a lower temperature and earlier in the baking time. Carbon dioxide produced by leavening agents also expands more rapidly in the thinner air. This causes high-altitude baked goods to rise too early, before their structure has fully set, leading to collapsed cakes and flat muffins. Finally, the rapid evaporation of water leads to over-concentration of sugars and fats in the recipe, which can cause pastries to have a gummy, undesirable texture.

Allison learned that high-altitude bakers could adjust to their environment by reducing the amount of sugar or increasing liquids to prevent over-concentration, and by using smaller amounts of leavening agents such as baking soda or baking powder to prevent dough from rising too quickly.

Allison was one of many groundbreaking women in the early 20th century who actively supported higher education for women and advanced research in science, politics, humanities and education in Colorado.

Others included Grace Espy-Patton, a professor of English and sociology at CSU from 1885 to 1896 who founded an early feminist journal and was the first woman to register to vote in Fort Collins. Miriam Palmer was an aphid specialist and master illustrator whose work crafting hyper-realistic wax apples in the early 1900s allowed farmers to confirm rediscovery of the lost Colorado Orange apple, a fruit that has been successfully propagated in recent years.

In 1945, Allison retired as both an emerita professor and emerita dean at CSU. She immediately stepped into the role of student and took classes in Russian and biochemistry.

In the fall of 1958, CSU opened a new dormitory for women that was named Allison Hall in her honor.

“I had supposed that such a thing happened only to the very rich or the very dead,” Allison told reporters at the dedication ceremony.

Tobi Jacobi, Professor of English, Colorado State University and Caitlin Clark, Senior Food Scientist at the CSU Spur Food Innovation Center, Colorado State University

This article is republished from The Conversation under a Creative Commons license. Read the original article.

The Conversation

Why don’t humans have hair all over their bodies? A biologist explains our lack of fur

Published on theconversation.com – Maria Chikina, Assistant Professor of Computational and Systems Biology, University of Pittsburgh – 2025-04-21 07:33:00

Some mammals are super hairy, some are not.
Ed Jones/AFP via Getty Images

Maria Chikina, University of Pittsburgh

Curious Kids is a series for children of all ages. If you have a question you’d like an expert to answer, send it to CuriousKidsUS@theconversation.com.


Why don’t humans have hair all over their bodies like other animals? – Murilo, age 5, Brazil


Have you ever wondered why you don’t have thick hair covering your whole body like a dog, cat or gorilla does?

Humans aren’t the only mammals with sparse hair. Elephants, rhinos and naked mole rats also have very little hair. It’s true for some marine mammals, such as whales and dolphins, too.

Scientists think the earliest mammals, which lived at the time of the dinosaurs, were quite hairy. But over hundreds of millions of years, a small handful of mammals, including humans, evolved to have less hair. What’s the advantage of not growing your own fur coat?

I’m a biologist who studies the genes that control hairiness in mammals. Why humans and a small number of other mammals are relatively hairless is an interesting question. It all comes down to whether certain genes are turned on or off.

Hair benefits

Hair and fur have many important jobs. They keep animals warm, protect their skin from the sun and injuries and help them blend into their surroundings.

They even assist animals in sensing their environment. Ever felt a tickle when something almost touches you? That’s your hair helping you detect things nearby.

Humans do have hair all over their bodies, but it is generally sparser and finer than that of our hairier relatives. A notable exception is the hair on our heads, which likely serves to protect the scalp from the sun. In human adults, the thicker hair that develops under the arms and between the legs likely reduces skin friction and aids in cooling by dispersing sweat.

So hair can be pretty beneficial. There must have been a strong evolutionary reason for people to lose so much of it.

Why humans lost their hair

The story begins about 7 million years ago, when humans and chimpanzees took different evolutionary paths. Although scientists can’t be sure why humans became less hairy, we have some strong theories that involve sweat.

Humans have far more sweat glands than chimps and other mammals do. Sweating keeps you cool. As sweat evaporates from your skin, heat energy is carried away from your body. This cooling system was likely crucial for early human ancestors, who lived in the hot African savanna.

Of course, there are plenty of mammals living in hot climates right now that are covered with fur. Early humans were able to hunt those kinds of animals by tiring them out over long chases in the heat – a strategy known as persistence hunting.

Humans didn’t need to be faster than the animals they hunted. They just needed to keep going until their prey got too hot and tired to flee. Being able to sweat a lot, without a thick coat of hair, made this endurance possible.

Genes that control hairiness

To better understand hairiness in mammals, my research team compared the genetic information of 62 different mammals, from humans to armadillos to dogs and squirrels. By lining up the DNA of all these different species, we were able to zero in on the genes linked to keeping or losing body hair.

Among the many discoveries we made, we learned humans still carry all the genes needed for a full coat of hair – they are just muted or switched off.

In the story of “Beauty and the Beast,” the Beast is covered in thick fur, which might seem like pure fantasy. But in real life a rare condition can cause people to grow a lot of hair all over their bodies. This condition, called hypertrichosis, is very unusual and has been called “werewolf syndrome” because of how people who have it look.

A detailed painting of a man and a woman standing next to one another in historical looking clothes. The man's face is covered in hair, while the woman's is not.
Petrus Gonsalvus and his wife, Catherine, painted by Joris Hoefnagel, circa 1575.
National Gallery of Art

In the 1500s, a Spanish man named Petrus Gonsalvus was born with hypertrichosis. As a child he was sent in an iron cage like an animal to Henry II of France as a gift. It wasn’t long before the king realized Petrus was like any other person and could be educated. In time, he married a lady, forming the inspiration for the “Beauty and the Beast” story.

While you will probably never meet someone with this rare trait, it shows how genes can lead to unique and surprising changes in hair growth.


Hello, curious kids! Do you have a question you’d like an expert to answer? Ask an adult to send your question to CuriousKidsUS@theconversation.com. Please tell us your name, age and the city where you live.

And since curiosity has no age limit – adults, let us know what you’re wondering, too. We won’t be able to answer every question, but we will do our best.

Maria Chikina, Assistant Professor of Computational and Systems Biology, University of Pittsburgh

This article is republished from The Conversation under a Creative Commons license. Read the original article.
