
AI plus gene editing promises to shift biotech into high gear


theconversation.com – Marc Zimmer, Professor of Chemistry, Connecticut College – 2024-06-06 07:47:02


AI knowledge combined with gene-editing precision points the way to dial-a-protein.
KTSFotos/Moment via Getty Images

Marc Zimmer, Connecticut College

During her chemistry Nobel Prize lecture in 2018, Frances Arnold said, “We can for all practical purposes read, write and edit any sequence of DNA, but we cannot compose it.” That isn't true anymore.

Since then, science and technology have progressed so much that artificial intelligence has learned to compose DNA, and with genetically modified bacteria, scientists are on their way to designing and making bespoke proteins.

The goal is that with AI's designing talents and gene editing's engineering abilities, scientists can modify bacteria to act as mini factories producing new proteins that can reduce greenhouse gases, digest plastics or act as species-specific pesticides.


As a chemistry professor and computational chemist who studies molecular science and environmental chemistry, I believe that advances in AI and gene editing make this a realistic possibility.

Gene sequencing – reading life's recipes

All living things contain genetic materials – DNA and RNA – that provide the hereditary information needed to replicate themselves and make proteins. Proteins constitute 75% of human dry weight. They make up muscles, enzymes, hormones, blood, hair and cartilage. Understanding proteins means understanding much of biology. The order of nucleotide bases in DNA, or RNA in some viruses, encodes this information, and genomic sequencing technologies identify the order of these bases.

The Human Genome Project was an international effort that sequenced the entire human genome from 1990 to 2003. Thanks to rapidly improving technologies, it took seven years to sequence the first 1% of the genome and another seven years for the remaining 99%. By 2003, scientists had the complete sequence of the 3 billion nucleotide base pairs coding for 20,000 to 25,000 genes in the human genome.

However, understanding the functions of most proteins and correcting their malfunctions remained a challenge.


AI learns proteins

Each protein's shape is critical to its function and is determined by the sequence of its amino acids, which is in turn determined by the gene's nucleotide sequence. Misfolded proteins have the wrong shape and can cause illnesses such as neurodegenerative diseases, cystic fibrosis and Type 2 diabetes. Understanding these diseases and developing treatments requires knowledge of protein shapes.

Before 2016, the only way to determine the shape of a protein was through X-ray crystallography, a laboratory technique that uses the diffraction of X-rays by single crystals to determine the precise three-dimensional arrangement of atoms in a molecule. At that time, the structures of about 200,000 proteins had been determined by crystallography, costing billions of dollars.

AlphaFold, a machine learning program, used these crystal structures as a training set to determine the shape of proteins from their nucleotide sequences. In less than a year, the program calculated the structures of the proteins encoded by all 214 million genes that have been sequenced and published. The protein structures AlphaFold determined have all been released in a freely available database.
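For readers curious about what "freely available" means in practice, those predictions can be retrieved programmatically. The short Python sketch below is a minimal illustration, assuming the public EMBL-EBI AlphaFold database API, its prediction endpoint and its field names; the UniProt accession P69905 (human hemoglobin subunit alpha) is used only as an example, not something discussed in this article.

import requests

# Minimal sketch: fetch one AlphaFold-predicted structure from the public
# AlphaFold Protein Structure Database. The endpoint and field names below
# are assumptions based on the EMBL-EBI API, not part of the article.
accession = "P69905"  # UniProt accession for human hemoglobin subunit alpha (example)
url = f"https://alphafold.ebi.ac.uk/api/prediction/{accession}"

response = requests.get(url, timeout=30)
response.raise_for_status()
record = response.json()[0]  # the API returns a list of prediction records

# Save the predicted 3D coordinates (PDB format) for viewing in a molecular viewer.
pdb_text = requests.get(record["pdbUrl"], timeout=30).text
with open(f"{accession}_alphafold.pdb", "w") as handle:
    handle.write(pdb_text)

print("Saved prediction for", record.get("uniprotDescription", accession))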

To effectively address noninfectious diseases and design new drugs, scientists need more detailed knowledge of how proteins, especially enzymes, bind small molecules. Enzymes are protein catalysts that enable and regulate biochemical reactions.

The AI system AlphaFold3 allows scientists to make intricately detailed models of life's molecular machinery.

AlphaFold3, released May 8, 2024, can predict protein shapes and the locations where small molecules can bind to these proteins. In rational drug design, drugs are designed to bind proteins involved in a pathway related to the disease being treated. The small-molecule drugs bind to the protein's binding site and modulate its activity, thereby influencing the disease pathway. By being able to predict protein binding sites, AlphaFold3 will enhance researchers' drug design capabilities.

AI + CRISPR = composing new proteins

Around 2015, the development of CRISPR technology revolutionized gene editing. CRISPR can be used to find a specific part of a gene, change or delete it, make the cell express more or less of its gene product, or even add an utterly foreign gene in its place.

In 2020, Jennifer Doudna and Emmanuelle Charpentier received the Nobel Prize in chemistry “for the development of a method (CRISPR) for genome editing.” With CRISPR, gene editing, which once took years and was species specific, costly and laborious, can now be done in days and for a fraction of the cost.

AI and genetic engineering are advancing rapidly. What was once complicated and expensive is now routine. Looking ahead, the dream is of bespoke proteins designed and produced by a combination of machine learning and CRISPR-modified bacteria. AI would design the proteins, and bacteria altered using CRISPR would produce the proteins. Enzymes produced this way could potentially breathe in carbon dioxide and methane while exhaling organic feedstocks, or break down plastics into substitutes for concrete.


I believe that these ambitions are not unrealistic, given that genetically modified organisms already account for 2% of the U.S. economy in agriculture and pharmaceuticals.

Two groups have made functioning enzymes from scratch that were designed by differing AI approaches. David Baker's Institute for Protein Design at the University of Washington devised a new deep-learning-based protein design strategy it named “family-wide hallucination,” which the team used to make a unique light-emitting enzyme. Meanwhile, the biotech startup Profluent has used an AI trained on the sum of all CRISPR-Cas knowledge to design new functioning genome editors.

If AI can learn to make new CRISPR systems as well as bioluminescent enzymes that work and have never been seen on Earth, there is hope that pairing CRISPR with AI can be used to design other new bespoke enzymes. Although the CRISPR-AI combination is still in its infancy, once it matures it is likely to be highly beneficial and could even help the world tackle climate change.

It's important to remember, however, that the more powerful a technology is, the greater the risks it poses. Also, humans have not been very successful at engineering nature, due to the complexity and interconnectedness of natural systems, which often leads to unintended consequences.

Marc Zimmer, Professor of Chemistry, Connecticut College


This article is republished from The Conversation under a Creative Commons license. Read the original article.


Federal funding for major science agencies is at a 25-year low


theconversation.com – Chris Impey, University Distinguished Professor of Astronomy, University of Arizona – 2024-06-28 07:19:14
Support for science has traditionally been bipartisan, but fights over spending have affected research funding.
AP Photo/J. Scott Applewhite

Chris Impey, University of Arizona

Federal funding for science is usually immune from political gridlock and polarization in Congress. But for 2025, it is slated to drop.

Science research dollars are considered discretionary, which means the funding has to be approved by Congress every year. That puts science in a different budget category from larger entitlement programs like Medicare and Social Security, which are generally considered untouchable by politicians of both parties.

Federal investment in scientific research encompasses everything from large telescopes supported by the National Science Foundation to NASA satellites studying climate change, programs studying the use and governance of artificial intelligence at the National Institute of Standards and Technology, and research on Alzheimer's disease funded by the National Institutes of Health.


Studies show that increasing federal research spending benefits productivity and economic competitiveness.

I'm an astronomer and also a senior university administrator. As an administrator, I've been involved in lobbying for research funding as associate dean of the College of Science at the University of Arizona, and in encouraging government investment in astronomy as a vice president of the American Astronomical Society. I've seen the importance of this kind of funding as a researcher who has had federal grants for 30 years, and as a senior academic who helps my colleagues write grants to support their valuable work.

Bipartisan support

Federal funding for many programs is characterized by political polarization, meaning that partisanship and ideological divisions between the two main political parties can lead to gridlock. Science is usually a rare exception to this problem.

The public shows strong bipartisan support for federal investment in scientific research, and Congress has generally followed suit, passing bills in 2024 with bipartisan backing in April and June.


The House passed these bills, and after reconciliation with language from the Senate, the final bills directed US$460 billion in government spending.

However, policy documents produced by Congress reveal a partisan split in how Democratic and Republican lawmakers reference scientific research.

Congressional committees for both sides are citing more scientific papers, but there is only a 5% overlap in the papers they cite. That means that the two parties are using different evidence to make their funding decisions, rather than working from a scientific consensus. Committees under Democratic control were almost twice as likely to cite technical papers as panels led by Republicans, and they were more likely to cite papers that other scientists considered important.

Ideally, all the best ideas for scientific research would receive federal funds. But limited support for scientific research in the United States means that for individual scientists, getting funding is a highly competitive process.


At the National Science Foundation, only 1 in 4 proposals are accepted. Success rates for funding through the National Institutes of Health are even lower, with 1 in 5 proposals getting accepted. This low success rate means that the agencies have to reject many proposals that are rated excellent by the merit review process.

Scientists are often reluctant to publicly advocate for their programs, in part because they feel disconnected from the policymaking and appropriations process. Their academic training doesn't equip them to communicate effectively with legislators and policy experts.

Budgets are down

Research received steady funding for the past few decades, but this year Congress reduced appropriations for science at many top government agencies.


The National Science Foundation budget is down 8%, which led agency leaders to warn Congress that the country may lose its ability to attract and train a scientific workforce.

The cut to the NSF is particularly disappointing since Congress promised it an extra $81 billion over five years when the CHIPS and Science Act passed in 2022. A deal to limit government spending in exchange for suspending the debt ceiling made the act's goals hard to achieve.

NASA's science budget is down 6%, and the budget for the National Institutes of Health, whose research aims to prevent disease and improve public health, is down 1%. Only the Department of Energy's Office of Science got a bump, a modest 2%.

As a result, the major science agencies are nearing a 25-year low for their funding levels, as a share of U.S. gross domestic product.


Feeling the squeeze

Investment in research and development by the business sector is strongly increasing. In 1990, it was slightly higher than federal investment, but by 2020 it was nearly four times higher.

The distinction is important because business investment tends to focus on later stage and applied research, while federal funding goes to pure and exploratory research that can have enormous downstream benefits, such as for quantum computing and fusion power.

There are several causes of the science funding squeeze. Congressional intentions to increase funding levels, as with the CHIPS and Science Act, and the earlier COMPETES Act in 2007, have been derailed by fights over the debt limit and threats of government shutdowns.

The CHIPS Act aimed to spur investment and job creation in semiconductor manufacturing, while the COMPETES Act aimed to increase U.S. competitiveness in a wide range of high-tech industries, such as space exploration.

The CHIPS and Science Act aims to stimulate semiconductor production in the U.S. and fund research.

The budget caps for fiscal years 2024 and 2025 remove any possibility for growth. The budget caps were designed to rein in federal spending, but they are a very blunt tool. Also, nondefense discretionary spending is only 15% of all federal spending. Discretionary spending is up for a vote every year, while mandatory spending is dictated by prior laws.

Entitlement programs like Medicare and Social Security are mandatory forms of spending. Taken together, they are three times larger than the amount available for discretionary spending, so science has to fight over a small fraction of the overall budget pie.

Within that 15% slice, scientific research competes with K-12 education, public health, initiatives for small businesses and more.

Global competition

While government science funding in the U.S. is stagnant, America's main scientific rivals are rising fast.


Federal R&D funding as a percentage of GDP has dropped from 1.2% in 1987 to 1% in 2010 to under 0.8% currently. The United States is still the world's biggest spender on research and development, but in terms of government R&D as a fraction of GDP, the United States ranked 12th in 2021, behind South Korea and a set of European countries. In terms of science researchers as a portion of the labor force, the United States ranks 10th.

Meanwhile, America's main geopolitical rival is rising fast. China has eclipsed the United States in high-impact papers published, and China now spends more than the United States on university and government research.

If the U.S. wants to keep its status as the world leader in scientific research, it'll need to redouble its commitment to science by appropriately funding research.

Chris Impey, University Distinguished Professor of Astronomy, University of Arizona

This article is republished from The Conversation under a Creative Commons license. Read the original article.


AI companies train language models on YouTube’s archive − making family-and-friends videos a privacy risk


theconversation.com – Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst – 2024-06-27 07:23:53
Your kid's silly videos could be fodder for ChatGPT.
Halfpoint/iStock via Getty Images

Ryan McGrady, UMass Amherst and Ethan Zuckerman, UMass Amherst

The promised artificial intelligence revolution requires data. Lots and lots of data. OpenAI and Google have begun using YouTube to train their text-based AI models. But what does the YouTube archive actually include?

Our team of digital media researchers at the University of Massachusetts Amherst collected and analyzed random samples of YouTube videos to learn more about that archive. We published an 85-page paper about that dataset and set up a website called TubeStats for researchers and journalists who need basic information about YouTube.

Now, we're taking a closer look at some of our more surprising findings to better understand how these obscure videos might become part of powerful AI systems. We've found that many YouTube videos are meant for personal use or for small groups of people, and a significant proportion were created by children who appear to be under 13.


Bulk of the YouTube iceberg

Most people's experience of YouTube is algorithmically curated: Up to 70% of the videos users watch are recommended by the site's algorithms. Recommended videos are typically popular content such as influencer stunts, news clips, explainer videos, travel vlogs and video reviews, while content that is not recommended languishes in obscurity.

Some YouTube content emulates popular creators or fits into established genres, but much of it is personal: celebrations, selfies set to music, homework assignments, video game clips without context and kids dancing. The obscure side of YouTube – the vast majority of the estimated 14.8 billion videos created and uploaded to the platform – is poorly understood.

Illuminating this aspect of YouTube – and social media generally – is difficult because big tech companies have become increasingly hostile to researchers.

We've found that many videos on YouTube were never meant to be shared widely. We documented thousands of short, personal videos that have few views but high engagement – likes and comments – implying a small but highly engaged audience. These were clearly meant for a small audience of friends and family. Such social uses of YouTube contrast with videos that try to maximize their audience, suggesting another way to use YouTube: as a video-centered social network for small groups.
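To make that pattern concrete, here is a rough sketch of the kind of filter that can surface such videos. It is a Python illustration only; the field names, numbers and thresholds are hypothetical and are not the coding pipeline used in our study.

# Illustrative sketch, not the study's actual pipeline: flag videos whose
# engagement (likes plus comments) is high relative to a tiny view count,
# hinting at a small audience of friends and family. All values are hypothetical.
videos = [
    {"id": "a1", "views": 34, "likes": 12, "comments": 9},
    {"id": "b2", "views": 1_200_000, "likes": 40_000, "comments": 3_500},
    {"id": "c3", "views": 58, "likes": 15, "comments": 11},
]

def engagement_rate(video):
    # Likes plus comments per view; guard against division by zero.
    return (video["likes"] + video["comments"]) / max(video["views"], 1)

likely_personal = [
    v for v in videos
    if v["views"] < 200 and engagement_rate(v) > 0.2  # hypothetical cutoffs
]

for v in likely_personal:
    print(v["id"], round(engagement_rate(v), 2))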


Other videos seem intended for a different kind of small, fixed audience: recorded classes from pandemic-era virtual instruction, school board meetings and work meetings. While not what most people think of as social uses, they likewise imply that their creators have a different expectation about the audience for the videos than creators of the kind of content people see in their recommendations.

Fuel for the AI machine

It was with this broader understanding that we read The New York Times exposé on how OpenAI and Google turned to YouTube in a race to find new troves of data to train their large language models. An archive of YouTube transcripts makes an extraordinary dataset for text-based models.

There is also speculation, fueled in part by an evasive answer from OpenAI's chief technology officer Mira Murati, that the videos themselves could be used to train AI text-to-video models such as OpenAI's Sora.


The New York Times story raised concerns about YouTube's terms of service and, of course, the copyright issues that pervade much of the debate about AI. But there's another problem: How could anyone know what an archive of more than 14 billion videos, uploaded by people all over the world, actually contains? It's not entirely clear that Google knows or even could know if it wanted to.

Kids as content creators

We were surprised to find an unsettling number of videos featuring kids or apparently created by them. YouTube requires uploaders to be at least 13 years old, but we frequently saw children who appeared to be much younger than that, typically dancing, singing or playing video games.

In our preliminary research, our coders determined nearly a fifth of random videos with at least one person's face visible likely included someone under 13. We didn't take into account videos that were clearly shot with the consent of a parent or guardian.

Our current sample size of 250 is relatively small – we are working on coding a much larger sample – but the findings thus far are consistent with what we've seen in the past. We're not aiming to scold Google. Age validation on the internet is infamously difficult and fraught, and we have no way of determining whether these videos were uploaded with the consent of a parent or guardian. But we want to underscore what is being ingested by these large companies' AI models.
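To put that sample size in perspective, a back-of-the-envelope margin-of-error calculation shows roughly how much uncertainty surrounds a proportion estimated from 250 videos. This is a generic normal-approximation sketch that treats the full sample as the denominator for simplicity; it is not a statistic from our paper.

import math

# Generic uncertainty estimate for a proportion from a small sample.
# Not a result from the study; it simply illustrates why 250 is "relatively small."
p_hat = 0.20  # roughly "a fifth" of the coded videos
n = 250       # current sample size mentioned above

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = 1.96 * standard_error  # 95% confidence level

low, high = p_hat - margin_of_error, p_hat + margin_of_error
print(f"95% interval: roughly {low:.0%} to {high:.0%}")  # about 15% to 25%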


Small reach, big influence

It's tempting to assume OpenAI is using highly produced influencer videos or TV newscasts posted to the platform to train its models, but previous research on large language model training data shows that the most popular content is not always the most influential in training AI models. A virtually unwatched conversation between three friends could have much more linguistic value in training a chatbot language model than a music video with millions of views.

Unfortunately, OpenAI and other AI companies are quite opaque about their training materials: They don't specify what goes in and what doesn't. Most of the time, researchers can infer problems with training data through biases in AI systems' output. But when we do get a glimpse at training data, there's often cause for concern. For example, Human Rights Watch released a report on June 10, 2024, that showed that a popular training dataset includes many photos of identifiable kids.

The history of big tech self-regulation is filled with moving goal posts. OpenAI in particular is notorious for asking for forgiveness rather than permission and has faced increasing criticism for putting profit over safety.

Concerns over the use of user-generated content for training AI models typically center on intellectual property, but there are also privacy issues. YouTube is a vast, unwieldy archive, impossible to fully review.


A subset of professionally produced videos could conceivably serve as an AI company's first training corpus. But without strong policies in place, any company that ingests more than the popular tip of the iceberg is likely including content that violates the Federal Trade Commission's Children's Online Privacy Protection Rule, which prevents companies from collecting data from children under 13 without parental notice and consent.

With last year's executive order on AI and at least one promising proposal on the table for comprehensive privacy legislation, there are signs that legal protections for user data in the U.S. might become more robust.

When the Wall Street Journal's Joanna Stern asked OpenAI CTO Mira Murati whether OpenAI trained its text-to-video generator Sora on YouTube videos, she said she wasn't sure.

Have you unwittingly helped train ChatGPT?

The intentions of a YouTube uploader simply aren't as consistent or predictable as those of someone publishing a book, writing an article for a magazine or displaying a painting in a gallery. But even if YouTube's algorithm ignores your upload and it never gets more than a couple of views, it may be used to train models like ChatGPT and Gemini.

As far as AI is concerned, your family reunion video may be just as important as those uploaded by influencer giant Mr. Beast or CNN.

Ryan McGrady, Senior Researcher, Initiative for Digital Public Infrastructure, UMass Amherst and Ethan Zuckerman, Associate Professor of Public Policy, Communication, and Information, UMass Amherst


This article is republished from The Conversation under a Creative Commons license. Read the original article.


Lucy, discovered 50 years ago in Ethiopia, stood just 3.5 feet tall − but she still towers over our understanding of human origins


theconversation.com – Denise Su, Associate Professor of Human Evolution and Social Change, Arizona State University – 2024-06-27 07:23:34
The reconstructed skeleton of Lucy, found in Hadar, Ethiopia, in 1974, and Grace Latimer, then age 4, daughter of a research team member.
James St. John/Flickr, CC BY

Denise Su, Arizona State University

In 1974, on a survey in Hadar in the remote badlands of Ethiopia, U.S. paleoanthropologist Donald Johanson and graduate student Tom Gray found a piece of an elbow joint jutting from the dirt in a gully. It proved to be the first of 47 bones of a single individual – an early human ancestor whom Johanson nicknamed “Lucy.” Her discovery would overturn what scientists thought they knew about the evolution of our own lineage.

Lucy was a member of the species Australopithecus afarensis, an extinct hominin – a group that includes humans and our fossil relatives. Australopithecus afarensis lived from 3.8 million years ago to 2.9 million years ago, in the region that is now Ethiopia, Kenya and Tanzania. Dated to 3.2 million years ago, Lucy was the oldest and most complete human ancestor ever found at the time of her discovery.

Two features set humans apart from all other primates: big brains and standing and walking on two legs instead of four. Prior to Lucy's discovery, scientists thought that our large brains must have evolved first, because all known human fossils at the time already had large brains. But Lucy stood on two feet and had a small brain, not much larger than that of a chimpanzee.


This was immediately clear when scientists reconstructed her skeleton in Cleveland, Ohio. A photographer took a picture of 4-year-old Grace Latimer – who was visiting her father, Bruce Latimer, a member of the research team – standing next to Lucy. The two were roughly the same size, providing a simple illustration of Lucy's small stature and brain. And Lucy was not a young child: Based on her teeth and bones, scientists estimated that she was fully adult when she died.

The reconstruction also demonstrated how human Lucy was – especially her posture. Along with the 1978 discovery in Tanzania of fossilized footprint trails 3.6 million years old, made by members of her species, Lucy proved unequivocally that standing and walking upright was the first step in becoming human. In fact, large brains did not show up in our lineage until well over 1 million years after Lucy lived.

Part of Lucy's reconstructed skeleton, on display at the Cleveland Museum of Natural History in 2006.
James St. John/Flickr, CC BY

Lucy's bones show adaptations that allow for upright posture and bipedal locomotion. In particular, her femur, or upper leg bone, is angled; her spine is S-curved; and her pelvis, or hip bone, is short and bowl-shaped.

These features can also be found in modern human skeletons. They allow us, as they enabled Lucy, to stand, walk and run on two legs without falling over – even when balanced on one leg in mid-stride.

In the 50 years since Lucy's discovery, her impact on scientists' understanding of human origins has been immeasurable. She has inspired paleoanthropologists to survey unexplored regions, pose new hypotheses, and develop and use novel techniques and methodologies.


Even as new fossils are discovered, Lucy remains central to modern research on human origins. As an anthropologist and paleoecologist, I know that she is still the reference point for understanding the anatomy of early human ancestors and the evolution of our own bodies. Knowledge of the human fossil record and the evolution of our lineage has increased exponentially, building on the foundation of Lucy's discovery.

Denise Su, Associate Professor of Human Evolution and Social Change, Arizona State University

This article is republished from The Conversation under a Creative Commons license. Read the original article.
