Shortcomings of natural language processing in the United States
September 23, 2022 05:44 Source: "China Social Sciences", September 23, 2022, Issue 2498 Author: Wang Youran (compiler)

The official website of Stanford University in the United States recently reported that Jazmia Henry, a researcher at the university's Institute for Human-Centered Artificial Intelligence, has created a corpus of African American Vernacular English that helps make natural language processing models more inclusive.

African American Vernacular English refers to the English spoken by African Americans. Its grammar, vocabulary, and accent are less formal than those of standard English. Standard English refers to the most formal and authoritative variety of English in English-speaking countries, such as the English used in school education, language assessment, and official publications. In the United States, speakers of standard English enjoy the conveniences that natural language processing models provide, such as voice navigation, digital assistants, and speech recognition software. African Americans may not receive the same treatment, because existing large natural language processing models cannot understand or generate the vocabulary of African American Vernacular English. Worse still, the data behind these models often comes from the Internet, which is full of racial bias and stereotypes. When a biased model is used to assist important decisions, speakers of African American Vernacular English may face discrimination such as restricted access to social media, rejection when buying a home or applying for a loan, and unfair treatment in the judicial system. To address this risk, Henry created an open-source corpus containing more than 141,000 African American Vernacular English words, intended to help researchers and model designers integrate the complexity and value of African American Vernacular English into natural language processing models.

Speaking about her motivation for creating the corpus, Henry said that her parents occasionally spoke an English-based dialect and a dialect from the southeastern coastal region of the United States, which others might not understand or might discriminate against. She could also sense the stigma attached to African American Vernacular English: anyone who speaks it outside the African American community is regarded as having low intelligence. After moving into data science research, Henry found that common natural language processing models not only fail to help African Americans but can even bring discrimination. These models usually struggle to understand or generate African American Vernacular English, and they contain the negative associations about African Americans found in standard English, so they deepen stereotypes about African Americans. Once commercialized, these models and the biases they carry may lead institutions to make decisions that disadvantage African Americans.

Henry's initial idea was to add African American Vernacular English data directly to natural language processing models, but she ran into many obstacles. African American Vernacular English evolves too quickly, and its word usage often differs sharply from standard English. For example, "mad" in standard English is usually an adjective meaning "crazy" or "angry", while "mad" in African American Vernacular English is often an adverb meaning "very". Moreover, the meaning of a word in African American Vernacular English depends to a large extent on the setting, the speaker, and the tone, all of which are factors that natural language processing models cannot take into account.

In the end, Henry decided to create an African American Vernacular English corpus. The corpus is divided into four parts according to the source of the text. The "Lyrics" part comes from 15,000 songs by 105 African American artists. The "Leadership" part comes from speeches by well-known African Americans, such as civil rights leader Martin Luther King, abolitionist and women's rights advocate Sojourner Truth, and current U.S. Supreme Court Justice Ketanji Brown Jackson. The "Book" part comes from African American book history archives held by American universities; this part was the hardest to collect, because African Americans are poorly represented in the literary canon. The "Social media" part comes from video transcripts, blog posts, and tweets by African American opinion leaders on social media, and this part of the corpus is rich and varied.

Henry noted that some of the natural language processing models in common use today are riddled with bias. Companies are trying to reduce their use of these models, but what follows is often risk mitigation rather than bias mitigation. Companies sometimes choose to avoid African American Vernacular English, or anything related to African Americans, instead of trying to find a solution. In Henry's view, to stop the harm to African Americans from continuing, there is now an urgent need to develop better models, improve processes, and explore new methods for processing African American Vernacular English. "I hope scholars in fields such as sociolinguistics, computational linguistics, anthropology, and computer science will examine this new corpus carefully, use it in their research, test its limits so that it fully reflects African American Vernacular English, and provide feedback and recommendations for algorithm development," Henry said.

(Compiled by Wang Youran)

Editor in charge: Changchang