Using machine learning to decode animal communication
New methods promise transformative insights and conservation benefits
CHRISTIAN RUTZ, MICHAEL BRONSTEIN, AZA RASKIN, SONJA C. VERNES, KATHERINE ZACARIAN, AND DAMIÁN E. BLASI
SCIENCE
13 Jul 2023
Vol 381, Issue 6654
pp. 152-155
DOI: 10.1126/science.adg7314
The past few years have seen a surge of interest in using machine learning (ML) methods for studying the behavior of nonhuman animals (hereafter “animals”) (1). A topic that has attracted particular attention is the decoding of animal communication systems using deep learning and other approaches (2). Now is the time to tackle challenges concerning data availability, model validation, and research ethics, and to embrace opportunities for building collaborations across disciplines and initiatives.
Researchers must infer the meaning, or function, of animal signals through observation and experimentation (3). This is a challenging task, not least because animals use a wide range of communication modalities, including visual, acoustic, tactile, chemical, and electrical signals, often in conjunction and beyond humans’ perceptive capabilities. Observational work focuses on recording the signals of interest as well as detailed contextual information, including the identity, state, and behavior of both the senders and receivers of signals, their relationships and past interactions, and relevant environmental conditions. Some signal types may be produced only under certain circumstances, eliciting a specific behavioral response; a classic example is a vervet monkey (Chlorocebus pygerythrus) giving an alarm call when it spots a predator, which causes group members to seek shelter. Establishing such correlations enables the formulation of hypotheses about signal function that can then be tested experimentally (e.g., with controlled playbacks).
Machine learning is helping to map and understand the vocal repertoire of tool-using Hawaiian crows (Corvus hawaiiensis). PHOTO: SAN DIEGO ZOO WILDLIFE ALLIANCE
Following this approach, decades of careful research have produced major advances in understanding animal communication (3). But there are considerable challenges, such as avoiding anthropocentric biases in data collection and interpretation, processing ever-increasing volumes of data, charting the full complexity of animals’ signaling behavior, and achieving comprehensive functional decoding. ML offers some potential solutions.
Animal signals can be investigated using a rich toolkit of increasingly powerful ML methods, which vary in their modeling objectives, data requirements, and reliance on expert annotation. This includes, among other approaches, supervised learning (e.g., for determining which features accurately predict human-labeled signal types) and unsupervised and self-supervised learning (e.g., for discovering signal repertoires of an individual, group, or population).
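As a rough illustration of the unsupervised case, the sketch below groups calls into candidate signal types without any human labels. It is a minimal k-means clustering in plain Python, not a method taken from the studies cited here. The two acoustic features (call duration and peak frequency), the toy values, and the function name `k_means` are all assumptions made for this example.

```python
from math import dist


def k_means(points, initial_centroids, iterations=10):
    """Cluster feature vectors into len(initial_centroids) groups."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each call joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Hypothetical calls described by (duration in s, peak frequency in kHz).
# Real features would normally be rescaled to comparable ranges first.
calls = [
    (0.10, 2.0), (0.12, 2.2), (0.11, 1.9),
    (0.50, 6.0), (0.55, 6.3), (0.52, 5.8),
]
labels, centroids = k_means(calls, initial_centroids=[calls[0], calls[3]])
print(labels)
```

The resulting clusters are only candidate call types. Whether they correspond to functionally distinct signals still has to be checked by species experts and, ideally, tested experimentally.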
Self-supervised deep-learning methods (4) are of interest because they require neither annotated datasets nor predefining features that are potentially relevant for communication. They are also the basis of “foundation models,” which are capable of remarkable generalizations across tasks (5). For example, large language models that have been trained to predict the next word from a given sequence of words can subsequently be used to carry out much more complex tasks, such as inferring the syntactic classes of, and relations between, linguistic units, or generating realistic text (5).
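The core idea, that the training signal comes from the data itself rather than from annotations, can be sketched with a deliberately simple stand-in. The example below counts which unit follows which in unlabeled sequences and predicts the most frequent successor. This is a count-based bigram model, not a deep-learning method, and the unit names, toy sequences, and function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_next_unit_model(sequences):
    """Count, for every unit, which units follow it in the sequences."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        # Each adjacent pair is a free training example:
        # the input is the current unit, the target is the next one.
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return counts


def predict_next(model, unit):
    """Return the most frequent successor of `unit`, or None if unseen."""
    if unit not in model:
        return None
    return model[unit].most_common(1)[0][0]


# Hypothetical sequences of discrete signal units (no labels required).
sequences = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "b", "c"],
    ["b", "c"],
]
model = train_next_unit_model(sequences)
print(predict_next(model, "a"), predict_next(model, "b"))
```

A deep-learning model would replace the count table with learned representations, which is what allows generalization to new tasks. The prediction objective itself is the same.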
Methods that can integrate different data modalities seem particularly promising for facilitating functional decoding because they can provide a fuller account of communication events. ML models have been developed that efficiently learn to link images to words, words to speech, and content across other modality combinations (5), and this approach could be applied productively to animal study systems, for example, by correlating vocalizations with specific behaviors. ML would effectively assist with the challenging task of detecting cross-modal associations (and structure) that can, in turn, inform the design of validation experiments to establish causality (see the figure).
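One simple way to picture the detection of cross-modal associations is to score how much more often a call type co-occurs with a behavior than chance would predict. The sketch below uses pointwise mutual information for this. It is an illustrative statistic chosen for this example, not the approach of any cited study, and the call names, behaviors, and counts are invented.

```python
from collections import Counter
from math import log2


def pointwise_mutual_information(events):
    """Return {(call, behavior): PMI in bits} for every observed pair."""
    total = len(events)
    pair_counts = Counter(events)
    call_counts = Counter(call for call, _ in events)
    behavior_counts = Counter(behavior for _, behavior in events)
    return {
        (call, behavior): log2(
            count * total / (call_counts[call] * behavior_counts[behavior])
        )
        for (call, behavior), count in pair_counts.items()
    }


# Hypothetical observations: (call type heard, receiver behavior seen).
events = (
    [("call_A", "flee_to_cover")] * 8
    + [("call_A", "forage")] * 2
    + [("call_B", "flee_to_cover")] * 2
    + [("call_B", "forage")] * 8
)
pmi = pointwise_mutual_information(events)
for pair, score in sorted(pmi.items()):
    print(pair, round(score, 2))
```

A positive score flags a pairing that occurs more often than expected if calls and behaviors were independent. Such a pattern is only a correlation, so it can suggest a hypothesis about signal function but cannot establish causality without a validation experiment.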
Because many ML methods were originally developed for natural language processing, exciting avenues have started opening up for exploring the much-debated potential similarities between human language and animal communication systems (6). Observations and experimental work suggest that at least some animals, such as southern pied babblers (Turdoides bicolor), exhibit some of the order sensitivity and compositionality that are characteristic of human language (7). ML approaches could leverage large datasets to search for subtlety and complexity that elude traditional methods, potentially expanding the known set of communication features shared across divergent taxa.
A growing number of studies are exploiting the potential of ML for investigating animal communication, including large collaborative initiatives, such as the Earth Species Project (ESP); Communication and Coordination Across Scales (CCAS); Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR); Interspecies Internet; and Project CETI (Cetacean Translation Initiative), which recently provided a detailed roadmap for ML-assisted work on sperm whale (Physeter macrocephalus) communication (2). Although efforts to tackle this grand research challenge are clearly intensifying, the field faces at least two main data-related obstacles: Most methods require vast amounts of data (4), and recordings of a single modality (e.g., vocalizations) are insufficient for functional decoding; additional context is required, including information on the animals’ behavior and environment.
Large volumes of audio and video data are held in community-sourced archives (such as the Macaulay Library or xeno-canto), are being accumulated by passive recording arrays, or can be scraped from the internet. Mining these data sources will provide fascinating glimpses of the richness of animal communication, but on its own, such work is unlikely to achieve breakthroughs in decoding signal function. This is chiefly because robust information on the identities and states of the senders and receivers, and the specific communication context, is usually lacking.
High-quality datasets are available for some taxa, enabling swift progress with core model-development objectives. But it is clear that community mobilization and appropriate resourcing are required to ensure that species experts are fully involved in the annotation and interpretation of existing recordings and can lead targeted efforts to collect new data at scale, in both the laboratory and field. For wild animals, a range of methods can be used to collect suitable datasets, including observation of focal subjects, autonomous cameras and audio recorders, drones and robots, and animal wearables (bio-loggers). Some bio-logging devices can collect audio and body-motion data simultaneously for the same individual, providing valuable input for multimodal ML models (see the figure).
The journey could be the biggest reward. Training the lens of ML on a broad range of taxa will likely uncover surprising degrees of previously hidden complexity in animals’ communicative behavior. Many of the species that appear to use only a handful of basic call types may turn out to possess rich vocal repertoires, and those that are renowned for their sophisticated communication may be shown to be more impressive still. There are early indications of the discovery potential of ML, as highlighted by a recent study that explored individual and group differences in the vocal behavior of zebra finches (Taeniopygia guttata) (8).
Using multimodal data and experiments to understand animal signals
Machine-learning (ML) methods can be used to integrate information on sender, receiver, and communication context, revealing patterns that may inform hypotheses about signal function and, in turn, the design of controlled experiments. ML-assisted research on animal communication will likely generate important benefits, such as improving animal conservation and welfare, but is not without its challenges; addressing ethical concerns is a top priority.
GRAPHIC: K. HOLOSKI/SCIENCE
The ability of ML to produce systematic inventories of vocal (or other signaling) output across a diverse range of taxa will enable unprecedented comparative analyses, helping researchers to pinpoint the evolutionary drivers, genomic signatures, life-history correlates, and cognitive and sensory foundations of different communication systems. At the same time, longitudinal recordings for individual subjects could reveal how communication skills arise and mature (9).
But perhaps most importantly, advances in this field could boost animal conservation and welfare. For example, in critically endangered species, such as the Hawaiian crow (Corvus hawaiiensis), comparisons with historical baseline data could generate a detailed record of how population bottlenecks have altered vocal repertoires, potentially leading to impoverished communicative capabilities (10); lost calls of high fitness relevance, such as those involved in foraging, courtship, or antipredator behavior, could then conceivably be reintroduced. Furthermore, there is increasing recognition that socially transmitted information may affect population viability (11), as illustrated by foraging specializations in killer whales (Orcinus orca) (12). Where vocal dialects can be established as “cultural markers,” ML approaches would enable automated mapping of social population structure and identification of animal groups at risk of losing critical knowledge.
ML could also be used to identify animal signals that are associated with stress, discomfort, pain, and evasion, or with positive states, such as arousal and playfulness. This could provide momentum for improving the living conditions of livestock and other captive animals and may even enable the assaying of wild populations to measure the impact of anthropogenic stressors. Ecological “soundscape” analyses are, at present, largely focused on species detection, but it should be possible to listen in on animals’ welfare at the landscape level (13). This idea could be developed further by looking beyond communication, for example, by developing ML tools that can examine satellite-recorded animal movement tracks for signatures of disease, distress, or human avoidance.
Despite manifold potential benefits, ML-assisted research on animal communication raises major ethical questions, such as under what circumstances it is acceptable to conduct playback experiments with wild animals. Advanced chatbots may enable researchers to initiate communication with animals before signal function is fully understood, potentially causing unintended harm. For example, broadcasting vocalizations to wild humpback whales (Megaptera novaeangliae) could inadvertently trigger changes in singing behavior on an ocean-basin scale. These issues must be tackled head-on and not as an afterthought. Cross-stakeholder consultation is urgently needed to develop best-practice guidelines and appropriate legislative frameworks (14).
Other challenges and opportunities lie ahead. For example, it is important to coordinate research efforts across existing initiatives and to enhance engagement of experts on animal communication, tracking, conservation, and welfare. Despite rapid technological advances, progress in this field will continue to depend on careful consideration of each study species’ biology, a detailed knowledge of communication contexts, and controlled behavioral experiments (3). This expertise is essential for informing and validating ML analyses and intensifying data interpretation and collection efforts. Professional societies and networks could help coordinate inclusive community-driven collaborations.
Workflows should be developed using study systems where data collection and experimental validation are relatively straightforward. In captive settings, researchers can ensure excellent experimental control as well as the highest ethical and welfare standards; good models include rodents, bats, and birds. Such work can be complemented with analyses of extensive field datasets that are already available for some species. Once methods have been established, they can be cautiously applied to the considerably more challenging problem of studying difficult-to-observe wild animals.
Present developments in ML are exceptionally fast paced. Beyond the use of deep-learning methods, there is scope for trialing other ML frameworks, such as reinforcement learning and meta-learning (i.e., learning from the output of other ML models). As models are developed, formal “benchmarking” will be key to improving the reliability and efficiency of analysis pipelines (15), although safeguards must be put in place to prevent the misuse of open resources, such as attempts to disturb, kill, or weaponize animals.
ML holds the potential to generate transformative advances in our understanding of animal communication systems, uncovering unimagined degrees of richness and sophistication. But it is essential that future advances are used to benefit the animals being studied.
Supplementary Materials
This PDF file includes: Author Contributions, Competing Interests, Funding Sources
The authors thank V. Baglione, D. Canestrari, M. Cusimano, F. Effenberger, D. Gruber, M. Hagiwara, B. Hoffman, M. Johnson, S. Keen, I. Kuksa, J. Lawton, J. Y. Liu, M. Miron, R. Patchett, L. Rendell, R. Seyfarth, P. Shaw, and additional colleagues from ESP and Project CETI for discussion, comments, and support. Conflicts of interest and funding sources are listed in the supplementary materials.
References and Notes
1. D. Tuia et al., Nat. Commun. 13, 792 (2022).
2. J. Andreas et al., iScience 25, 104393 (2022).
3. J. W. Bradbury, S. L. Vehrencamp, Principles of Animal Communication (Sinauer, 2011).
4. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (MIT Press, 2016).
5. R. Bommasani et al., arXiv:2108.07258 (2022).
6. A. R. Fishbein et al., Philos. Trans. R. Soc. London Ser. B 375, 20190042 (2020).
7. S. Engesser, S. W. Townsend, WIREs Cogn. Sci. 10, e1493 (2019).
8. J. Goffinet et al., eLife 10, e67855 (2021).
9. B. C. Roy et al., Proc. Natl. Acad. Sci. U.S.A. 112, 12663 (2015).
10. A. M. Tanimoto et al., Anim. Behav. 123, 427 (2017).
11. P. Brakes et al., Science 363, 1032 (2019).
12. A. D. Foote et al., Nat. Commun. 7, 11693 (2016).
13. B. Williams et al., Ecol. Indic. 140, 108986 (2022).
14. O. R. Wearn, R. Freeman, D. M. P. Jacoby et al., Nat. Mach. Intell. 1, 72 (2019).
15. M. Hagiwara et al., in 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2023), pp. 1–5.
https://www.science.org/doi/10.1126/science.adg7314