The Conversation

AI companies train language models on YouTube’s archive − making family-and-friends videos a privacy risk

Published on theconversation.com – Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst – 2024-06-27 07:23:53

Your kid’s silly video could be fodder for ChatGPT.
Halfpoint/iStock via Getty Images

Ryan McGrady, UMass Amherst and Ethan Zuckerman, UMass Amherst

The promised artificial intelligence revolution requires data. Lots and lots of data. OpenAI and Google have begun using YouTube videos to train their text-based AI models. But what does the YouTube archive actually include?

Our team of digital media researchers at the University of Massachusetts Amherst collected and analyzed random samples of YouTube videos to learn more about that archive. We published an 85-page paper about that dataset and set up a website called TubeStats for researchers and journalists who need basic information about YouTube.

Now, we’re taking a closer look at some of our more surprising findings to better understand how these obscure videos might become part of powerful AI systems. We’ve found that many YouTube videos are meant for personal use or for small groups of people, and a significant proportion were created by children who appear to be under 13.

Bulk of the YouTube iceberg

Most people’s experience of YouTube is algorithmically curated: Up to 70% of the videos users watch are recommended by the site’s algorithms. Recommended videos are typically popular content such as influencer stunts, news clips, explainer videos, travel vlogs and video game reviews, while content that is not recommended languishes in obscurity.

Some YouTube content emulates popular creators or fits into established genres, but much of it is personal: family celebrations, selfies set to music, homework assignments, video game clips without context and kids dancing. The obscure side of YouTube – the vast majority of the estimated 14.8 billion videos created and uploaded to the platform – is poorly understood.

Illuminating this aspect of YouTube – and social media generally – is difficult because big tech companies have become increasingly hostile to researchers.

We’ve found that many videos on YouTube were never meant to be shared widely. We documented thousands of short, personal videos that have few views but high engagement – likes and comments – implying a small but highly engaged audience. These were clearly meant for a small audience of friends and family. Such social uses of YouTube contrast with videos that try to maximize their audience, suggesting another way to use YouTube: as a video-centered social network for small groups.
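One way to picture the "few views but high engagement" pattern is a simple filter over video metadata. The sketch below is illustrative only: the `Video` fields, the `max_views` and `min_rate` thresholds and the sample records are assumptions made for this example, not the method or data used in the study.

```python
from dataclasses import dataclass


@dataclass
class Video:
    title: str
    views: int
    likes: int
    comments: int


def engagement_rate(video: Video) -> float:
    """Likes plus comments per view; 0.0 when a video has no views."""
    if video.views <= 0:
        return 0.0
    return (video.likes + video.comments) / video.views


def looks_small_audience(video: Video, max_views: int = 100,
                         min_rate: float = 0.1) -> bool:
    """Flag videos with few views but a high share of engaged viewers.

    Both thresholds are hypothetical values chosen for illustration.
    """
    return video.views <= max_views and engagement_rate(video) >= min_rate


# Invented sample records, not real YouTube data.
samples = [
    Video("family reunion", views=40, likes=9, comments=6),
    Video("music video", views=2_000_000, likes=50_000, comments=4_000),
]

for v in samples:
    print(v.title, round(engagement_rate(v), 3), looks_small_audience(v))
```

A real analysis would need to choose thresholds from the data and account for videos whose like or comment counts are hidden or disabled.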

Other videos seem intended for a different kind of small, fixed audience: recorded classes from pandemic-era virtual instruction, school board meetings and work meetings. While not what most people think of as social uses, they likewise imply that their creators have a different expectation about the audience for the videos than creators of the kind of content people see in their recommendations.

Fuel for the AI machine

It was with this broader understanding that we read The New York Times exposé on how OpenAI and Google turned to YouTube in a race to find new troves of data to train their large language models. An archive of YouTube transcripts makes an extraordinary dataset for text-based models.

There is also speculation, fueled in part by an evasive answer from OpenAI’s chief technology officer Mira Murati, that the videos themselves could be used to train AI text-to-video models such as OpenAI’s Sora.

The New York Times story raised concerns about YouTube’s terms of service and, of course, the copyright issues that pervade much of the debate about AI. But there’s another problem: How could anyone know what an archive of more than 14 billion videos, uploaded by people all over the world, actually contains? It’s not entirely clear that Google knows or even could know if it wanted to.

Kids as content creators

We were surprised to find an unsettling number of videos featuring kids or apparently created by them. YouTube requires uploaders to be at least 13 years old, but we frequently saw children who appeared to be much younger than that, typically dancing, singing or playing video games.

In our preliminary research, our coders determined nearly a fifth of random videos with at least one person’s face visible likely included someone under 13. We didn’t take into account videos that were clearly shot with the consent of a parent or guardian.

Our current sample size of 250 is relatively small – we are working on coding a much larger sample – but the findings thus far are consistent with what we’ve seen in the past. We’re not aiming to scold Google. Age validation on the internet is infamously difficult and fraught, and we have no way of determining whether these videos were uploaded with the consent of a parent or guardian. But we want to underscore what is being ingested by these large companies’ AI models.

Small reach, big influence

It’s tempting to assume OpenAI is using highly produced influencer videos or TV newscasts posted to the platform to train its models, but previous research on large language model training data shows that the most popular content is not always the most influential in training AI models. A virtually unwatched conversation between three friends could have much more linguistic value in training a chatbot language model than a music video with millions of views.

Unfortunately, OpenAI and other AI companies are quite opaque about their training materials: They don’t specify what goes in and what doesn’t. Most of the time, researchers can infer problems with training data through biases in AI systems’ output. But when we do get a glimpse at training data, there’s often cause for concern. For example, Human Rights Watch released a report on June 10, 2024, that showed that a popular training dataset includes many photos of identifiable kids.

The history of big tech self-regulation is filled with moving goal posts. OpenAI in particular is notorious for asking for forgiveness rather than permission and has faced increasing criticism for putting profit over safety.

Concerns over the use of user-generated content for training AI models typically center on intellectual property, but there are also privacy issues. YouTube is a vast, unwieldy archive, impossible to fully review.

Models trained on a subset of professionally produced videos could conceivably be an AI company’s first training corpus. But without strong policies in place, any company that ingests more than the popular tip of the iceberg is likely including content that violates the Federal Trade Commission’s Children’s Online Privacy Protection Rule, which prevents companies from collecting data from children under 13 without notice.

With last year’s executive order on AI and at least one promising proposal on the table for comprehensive privacy legislation, there are signs that legal protections for user data in the U.S. might become more robust.

When the Wall Street Journal’s Joanna Stern asked OpenAI CTO Mira Murati whether OpenAI trained its text-to-video generator Sora on YouTube videos, she said she wasn’t sure.

Have you unwittingly helped train ChatGPT?

The intentions of a YouTube uploader simply aren’t as consistent or predictable as those of someone publishing a book, writing an article for a magazine or displaying a painting in a gallery. But even if YouTube’s algorithm ignores your upload and it never gets more than a couple of views, it may be used to train models like ChatGPT and Gemini.

As far as AI is concerned, your family reunion video may be just as important as those uploaded by influencer giant Mr. Beast or CNN.

Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst and Ethan Zuckerman, Associate Professor of Public Policy, Communication, and Information, UMass Amherst

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Wildfire smoke’s health risks can linger long-term in homes that escape burning

Published on theconversation.com – Colleen E. Reid, Associate Professor of Geography, University of Colorado Boulder – 2024-12-23 11:00:00

The Marshall Fire spared some homes, shown here a day later, but smoke had blanketed the area.

Andy Cross/MediaNews Group/The Denver Post via Getty Images

Colleen E. Reid, University of Colorado Boulder

Three years ago, on Dec. 30, 2021, a wind-driven wildfire raced through two communities just outside Boulder, Colorado. In the span of about eight hours, more than 1,000 homes and businesses burned.

The fire left entire blocks in ash, but among them, pockets of houses survived, seemingly untouched. The owners of these homes may have felt relief at first. But fire damage can be deceiving, as many soon discovered.

When wildfires like the Marshall Fire reach the wildland-urban interface, they are burning both vegetation and human-made materials. Vehicles and buildings burn, along with all of the things inside them – electronics, paint, plastics, furniture.

Research shows that when human-made materials like these burn, the chemicals released are different from what is emitted when just vegetation burns. The smoke and ash can blow under doors and around windows in nearby homes, bringing in chemicals that stick to walls and other indoor surfaces and continue off-gassing for weeks to months, particularly in warmer temperatures.

An aerial view of burned neighborhoods with a few houses standing among burned lots and at the edges of the fire area.

The Marshall Fire swept through several neighborhoods in the towns of Louisville and Superior, Colo. In the homes that were left standing, residents dealt with lingering smoke and ash in their homes.

Michael Ciaglo/Getty Images

In a new study, my colleagues and I looked at the health effects people experienced when they returned to still-standing homes after the Marshall Fire. We also created a checklist for people to use after urban wildfires in the future to help them protect their health and reduce their risks when they return to smoke-damaged homes.

Tests in homes found elevated metals and VOCs

In the days after the Marshall Fire, residents quickly reached out to nearby scientists who study wildfire smoke and health risks at the University of Colorado Boulder and area labs. People wanted to know what was in the ash and causing the lingering smells inside their homes.

In homes we were able to test, my colleagues found elevated levels of metals and PAHs – polycyclic aromatic hydrocarbons – in the ash. We also found elevated VOCs – volatile organic compounds – in airborne samples. Some VOCs, such as dioxins, benzene, formaldehyde and PAHs, can be toxic to humans. Benzene is a known carcinogen.

People wanted to know whether the chemicals that got into their homes that day could harm their health.

At the time, we could find no information about physical health implications for people who have returned to smoke-damaged homes after a wildfire. To look for patterns, we surveyed residents affected by the fire six months, one year and two years afterward.

Symptoms 6 months after the fire

Even six months after the fire, we found that many people were reporting symptoms that aligned with health risks related to smoke and ash from fires.

More than half (55%) of the people who responded to our survey reported that they were experiencing at least one symptom six months after the blaze that they attributed to the Marshall Fire. The most common symptoms reported were itchy or watery eyes (33%), headache (30%), dry cough (27%), sneezing (26%) and sore throat (23%).

All of these symptoms, as well as having a strange taste in one’s mouth, were associated with people reporting that their home smelled different when they returned to it one week after the fire.

Many survey respondents said that the smells decreased over time. Most attributed the improvement in smell to the passage of time, cleaning surfaces and air ducts, replacing furnace filters, and removing carpet, textiles and furniture from the home. Despite this, many still had symptoms.

We found that living near a large number of burned structures was associated with these health symptoms. For every 10 additional destroyed buildings within 820 feet (250 meters) of a person’s home, there was a 21% increase in headaches and a 26% increase in having a strange taste in their mouth.

These symptoms align with what could be expected from exposure to the chemicals that we found in the ash and measured in the air inside the few smoke-damaged homes that we were able to study in depth.

Lingering symptoms and questions

There are still a lot of unanswered questions about the health risks from smoke- and ash-damaged homes.

For example, we don’t yet know what long-term health implications might look like for people living with lingering gases from wildfire smoke and ash in a home.

We found a significant decline in the number of people reporting symptoms one year after the fire. However, 33% of the people whose homes were affected still reported at least one symptom that they attributed to the fire. About the same percentage also reported at least one symptom two years after the fire.

We also could not measure the level of VOCs or metals that each person was exposed to. But we do think that reports of a change in the smell of a person’s home one week after the fire demonstrate the likely presence of VOCs in the home. That has health implications for people whose homes are exposed to smoke or ash from a wildfire.

Tips to protect yourself after future wildfires

Wildfires are increasingly burning homes and other structures as more people move into the wildland-urban interface, temperatures rise and fire seasons lengthen.

It can be confusing to know what to do if your home is one that survives a wildfire nearby. To help, my colleagues and I put together a website of steps to take if your home is ever infiltrated by smoke or ash from a wildfire.

Here are a few of those steps:

  • When you’re ready to clean your home, start by protecting yourself. Wear at least an N95 (or KN95) mask and gloves, goggles and clothing that covers your skin.

  • Vacuum floors, drapes and furniture. But avoid harsh chemical cleaners because they can react with the chemicals in the ash.

  • Clean your HVAC filter and ducts to avoid spreading ash further. Portable air cleaners with carbon filters can help remove VOCs.

A recent scientific study documents how cleaning all surfaces within a home can reduce reservoirs of VOCs and lower indoor air concentrations of VOCs.

Given that we don’t know much yet about the health harms of smoke- and ash-damaged homes, it is important to take care in how you clean so you can do the most to protect your health.

Colleen E. Reid, Associate Professor of Geography, University of Colorado Boulder

This article is republished from The Conversation under a Creative Commons license. Read the original article.

In Disney’s ‘Moana,’ the characters navigate using the stars, just like real Polynesian explorers − an astronomer explains how these methods work

Published on theconversation.com – Christopher Palma, Teaching Professor, Department of Astronomy & Astrophysics, Penn State – 2024-12-20 07:17:00

Wayfarers around the world have used the stars to navigate the sea.
Wirestock/iStock via Getty Images Plus

Christopher Palma, Penn State

If you have visited an island like one of the Hawaiian Islands, Tahiti or Easter Island, also known as Rapa Nui, you may have noticed how small these land masses appear against the vast Pacific Ocean. If you’re on Hawaii, the nearest island to you is more than 1,000 miles (1,600 kilometers) away, and the coast of the continental United States is more than 2,000 miles (3,200 kilometers) away. To say these islands are secluded is an understatement.

For me, watching the movie “Moana” in 2016 was eye-opening. I knew that Polynesian people traveled between a number of Pacific islands, but seeing Moana set sail on a canoe made me realize exactly how small those boats are compared with what must have seemed like an endless ocean. Yet our fictional hero went on this journey anyway, like the countless real-life Polynesian voyagers on whom she is based.

Oceania as shown from the ISS
Islands in Polynesia can be thousands of miles apart.
NASA

As an astronomer, I have been teaching college students and visitors to our planetarium how to find stars in our sky for more than 20 years. As part of teaching appreciation for the beauty of the sky and the stars, I want to help people understand that if you know the stars well, you can never get lost.

U.S. Navy veterans learned the stars in their navigation courses, and European cultures used the stars to navigate, but the techniques of Polynesian wayfinding shown in “Moana” brought these ideas to a very wide audience.

The movie “Moana” gave me a new hook – pun not intended – for my planetarium shows and lessons on how to locate objects in the night sky. With “Moana 2” out now, I am excited to see even more astronomy on the big screen and to figure out how I can build new lessons using the ideas in the movie.

The North Star

Have you ever found the North Star, Polaris, in your sky? I try to spot it every time I am out observing, and I teach visitors at my shows to use the “pointer stars” in the bowl of the Big Dipper to find it. These two stars in the Big Dipper point you directly to Polaris.

If you are facing Polaris, then you know you are facing north. Polaris is special because it is almost directly above Earth’s North Pole, and so everyone north of the equator can see it year-round in exactly the same spot in their sky.

It’s a key star for navigation because if you measure its height above your horizon, that tells you how far you are north of Earth’s equator. For the large number of people who live near 40 degrees north of the equator, you will see Polaris about 40 degrees above your horizon.

If you live in northern Canada, Polaris will appear higher in your sky, and if you live closer to the equator, Polaris will appear closer to the horizon. The other stars and constellations come and go with the seasons, though, so what you see opposite Polaris in the sky will change every month.

Look for the Big Dipper to find the North Star, Polaris.

You can use all of the stars to navigate, but to do that you need to know where to find them on every night of the year and at every hour of the night. So, navigating with stars other than Polaris is more complicated to learn.

Maui’s fishhook

At the end of June, around 11 p.m., a bright red star might catch your eye if you look directly opposite from Polaris. This is the star Antares, and it is the brightest star in the constellation Scorpius, the Scorpion.

If you are a “Moana” fan like me and the others in my family, though, you may know this group of stars by a different name – Maui’s fishhook.

If you are in the Northern Hemisphere, Scorpius may not fully appear above your horizon, but if you are on a Polynesian island, you should see all of the constellation rising in the southeast, hitting its highest point in the sky when it is due south, and setting in the southwest.

Astronomers and navigators can measure latitude using the height of the stars, which Maui and Moana did in the movie using their hands as measuring tools.

The easiest way to do this is to figure out how high Polaris is above your horizon. If you can’t see it at all, you must be south of the equator, but if you see Polaris 5 degrees (the width of three fingers at arm’s length) or 10 degrees above your horizon (the width of your full fist held at arm’s length), then you are 5 degrees or 10 degrees north of the equator.
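The hand-measurement rule above can be written out as a few lines of arithmetic. This is a minimal sketch: the fist and three-finger widths come from the text, while the function names and the figure of roughly 111 kilometers per degree of latitude are additions made for illustration. Hand sizes vary, so treat the result as a rough estimate.

```python
# Approximate angular widths at arm's length, as described in the text.
FIST_DEG = 10.0
THREE_FINGERS_DEG = 5.0

# Rough average distance covered by one degree of latitude.
# This value is an assumption added for this example.
KM_PER_DEGREE_LATITUDE = 111.0


def polaris_altitude_deg(fists: int = 0, three_finger_widths: int = 0) -> float:
    """Height of Polaris above the horizon from stacked hand measurements."""
    return fists * FIST_DEG + three_finger_widths * THREE_FINGERS_DEG


def latitude_north_deg(altitude_deg: float) -> float:
    """Polaris's height above the horizon is roughly your latitude north."""
    if not 0.0 <= altitude_deg <= 90.0:
        raise ValueError("altitude must be between 0 and 90 degrees")
    return altitude_deg


def km_north_of_equator(latitude_deg: float) -> float:
    """Approximate north-south distance from the equator."""
    return latitude_deg * KM_PER_DEGREE_LATITUDE


altitude = polaris_altitude_deg(fists=2)   # Polaris sits two fists up
latitude = latitude_north_deg(altitude)
print(latitude, km_north_of_equator(latitude))
```

Two fists correspond to about 20 degrees north, close to the latitude of the Hawaiian Islands.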

The other stars, like those in Maui’s fishhook, will appear to rise, set and hit their highest point at different locations in the sky depending on where you are on the Earth.

Polynesian navigators memorized where these stars would appear in the sky from the different islands they sailed between, and so by looking for those stars in the sky at night, they could determine which direction to sail and for how long to travel across the ocean.

Today, most people just pull out their phones and use the built-in GPS as a guide. Ever since “Moana” was in theaters, I see a completely different reaction to my planetarium talks about using the stars for navigation. By accurately showing how Polynesian navigators used the stars to sail across the ocean, “Moana” helps even those of us who have never sailed at night to understand the methods of celestial navigation.

The first “Moana” movie came out when my son was 3 years old, and he took an instant liking to the songs, the story and the scenery. There are many jokes about parents who dread having to watch a child’s favorite over and over again, but in my case, I fell in love with the movie too.

Since then, I have wanted to thank the storytellers who made this movie for being so careful to show the astronomy of navigation correctly. I also appreciated that they showed how Polynesian voyagers used the stars and other clues, such as ocean currents, to sail across the huge Pacific Ocean and land safely on a very small island thousands of miles from their home.

Christopher Palma, Teaching Professor, Department of Astronomy & Astrophysics, Penn State

This article is republished from The Conversation under a Creative Commons license. Read the original article.

Listening for the right radio signals could be an effective way to track small drones

Published on theconversation.com – Iain Boyd, Director of the Center for National Security Initiatives and Professor of Aerospace Engineering Sciences, University of Colorado Boulder – 2024-12-17 17:28:00

Small drones can be hard to track at night.
Kevin Carter/Getty Images

Iain Boyd, University of Colorado Boulder

The recent spate of unidentified drone sightings in the U.S., including some near sensitive locations such as airports and military installations, has caused significant public concern.

Some of this recent increase in activity may be related to a September 2023 change in U.S. Federal Aviation Administration regulations that now allow drone operators to fly at night. But most of the sightings are likely airplanes or helicopters rather than drones.

The inability of the U.S. government to definitively identify the aircraft in the recent incidents, however, has some people wondering, why can’t they?

I am an engineer who studies defense systems. I see radio frequency sensors as a promising approach to detecting, tracking and identifying drones, not least because drone detectors based on the technology are already available. But I also see challenges to using the detectors to comprehensively spot drones flying over American communities.

How drones are controlled

Operators communicate with drones from a distance using radio frequency signals. Radio frequency signals are widely used in everyday life such as in garage door openers, car key fobs and, of course, radios. Because the radio spectrum is used for so many different purposes, it is carefully regulated by the Federal Communications Commission.

Drone communications are only allowed in narrow bands around specific frequencies such as at 5 gigahertz. Each make and model of a drone uses unique communication protocols coded within the radio frequency signals to interpret instructions from an operator and to send data back to them. In this way, a drone pilot can instruct the drone to execute a flight maneuver, and the drone can inform the pilot where it is and how fast it is flying.

Identifying drones by radio signals

Radio frequency sensors can listen in to the well-known drone frequencies to detect communication protocols that are specific to each particular drone model. In a sense, these radio frequency signals represent a unique fingerprint of each type of drone.
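The fingerprint idea can be sketched as a lookup against a table of known protocol signatures. Everything specific below is hypothetical: the model names, the byte patterns and the `identify_drone` function are invented for illustration and do not describe any real drone protocol or detector product.

```python
from typing import Optional

# Hypothetical signature table. Real detectors would rely on
# protocol details supplied by manufacturers or measured in a lab.
KNOWN_SIGNATURES = {
    "ExampleDrone A": bytes.fromhex("a1b2c3"),
    "ExampleDrone B": bytes.fromhex("0f0e0d0c"),
}


def identify_drone(observed: bytes) -> Optional[str]:
    """Return the first model whose signature appears in the observed bytes.

    Returns None when no registered signature matches, which mirrors the
    case where a sensor does not know a drone's communication protocol.
    """
    for model, signature in KNOWN_SIGNATURES.items():
        if signature in observed:
            return model
    return None


print(identify_drone(bytes.fromhex("00a1b2c3ff")))  # matches ExampleDrone A
print(identify_drone(bytes.fromhex("0001")))        # no match: None
```

The second call shows the limitation: a signal from an unregistered protocol is received but cannot be attributed to a model.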

In the best-case scenario, authorities can use the radio frequency signals to determine the drone’s location, range, speed and flight direction. These radio frequency devices are called passive sensors because they simply listen out for and receive signals without taking any active steps. The typical range limit for detecting signals is about 3 miles (4.8 kilometers) from the source.

These sensors do not represent advanced technology, and they are readily available. So, why haven’t authorities made wider use of them?

Drones were all the buzz in the Northeast at the end of 2024.

Challenges to using radio frequency sensors

While the monitoring of radio frequency signals is a promising approach to detecting and identifying drones, there are several challenges to doing so.

First, it’s only possible for a sensor to obtain detailed information on drones that the sensor knows the communication protocols for. Getting sensors that can detect a wide range of drones will require coordination between all drone manufacturers and some central registration entity.

In the absence of information that makes it possible to decode the radio frequency signals, all that can be inferred about a drone is a rough idea of its location and direction. This situation can be improved by deploying multiple sensors and coordinating their information.
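To see why coordinating sensors helps, consider two sensors that each report only the direction a signal came from. Where the two direction lines cross gives a position estimate. The sketch below assumes a flat two-dimensional map, bearings measured clockwise from north and no measurement error; the function name and coordinates are made up for this example, and real deployments must cope with noisy, reflected signals.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def locate_from_bearings(sensor_a: Point, bearing_a_deg: float,
                         sensor_b: Point, bearing_b_deg: float) -> Optional[Point]:
    """Intersect two bearing lines to estimate a transmitter's position.

    Bearings are in degrees clockwise from north (the +y axis).
    Returns None if the bearings are parallel or the lines only
    cross behind one of the sensors.
    """
    ax, ay = sensor_a
    bx, by = sensor_b
    # Unit direction vectors for each bearing.
    dax = math.sin(math.radians(bearing_a_deg))
    day = math.cos(math.radians(bearing_a_deg))
    dbx = math.sin(math.radians(bearing_b_deg))
    dby = math.cos(math.radians(bearing_b_deg))

    denom = dax * dby - day * dbx
    if abs(denom) < 1e-9:
        return None  # parallel bearings never cross

    dx, dy = bx - ax, by - ay
    t_a = (dx * dby - dy * dbx) / denom  # distance along sensor A's bearing
    t_b = (dx * day - dy * dax) / denom  # distance along sensor B's bearing
    if t_a < 0 or t_b < 0:
        return None  # crossing point lies behind a sensor

    return (ax + t_a * dax, ay + t_a * day)


# Two sensors 2 km apart, one hearing the signal to the northeast
# and the other hearing it to the northwest.
print(locate_from_bearings((0.0, 0.0), 45.0, (2.0, 0.0), 315.0))
```

With a single sensor only the bearing is known; the second sensor turns a direction into a point, and additional sensors would help average out errors.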

Second, the detection approach works best in “quiet” radio frequency environments where there are no buildings, machinery or people. It’s not easy to confidently attribute the unique source of a radio frequency signal in urban settings and other cluttered environments. Radio frequency signals bounce off all solid surfaces, making it difficult to be sure where the original signal came from. Again, the use of multiple sensors around a particular location, and careful placement of those sensors, can help to alleviate this issue.

Third, a major part of the concern over the inability to detect and identify drones is that they may be operated by criminals or terrorists. If drone operators with malicious intent know that an area targeted for a drone operation is being monitored by radio frequency sensors, they may develop effective countermeasures. For example, they may use signal frequencies that lie outside the FCC-regulated parameters, and communication protocols that have not been registered. An even more effective countermeasure is to preprogram the flight path of a drone to completely avoid the use of any radio frequency communications between the operator and the drone.

Finally, widespread deployment of radio frequency sensors for tracking drones would be logistically complicated and expensive. There are likely thousands of locations in the U.S. alone that might require protection from hostile drone attacks. The cost of deploying a fully effective drone detection system would be significant.

There are other means of detecting drones, including radar systems and networks of acoustic sensors, which listen for the unique sounds drones generate. But radar systems are relatively expensive, and acoustic drone detection is a new technology.

The way forward

It was almost guaranteed that at some point the problem of unidentified drones would arise. People are operating drones more and more in regions of the airspace that have previously been very sparsely populated.

Perhaps the recent concerns over drone sightings are a wake-up call. The airspace is only going to become much more congested in the coming years as more consumers buy drones, drones are used for more commercial purposes, and air-taxis come into use. There’s only so much that drone detection technologies can do, and it might become necessary for the FAA to tighten regulation of the nation’s airspace by, for example, requiring drone operators to submit detailed flight plans.

In the meantime, don’t be too quick to assume those blinking lights you see in the night sky are drones.

Iain Boyd, Director of the Center for National Security Initiatives and Professor of Aerospace Engineering Sciences, University of Colorado Boulder

This article is republished from The Conversation under a Creative Commons license. Read the original article.
