"Pre-trained" language models, that is, generative pre-trained transformers (GPT), are deep-learning language models. They give artificial intelligence an efficient and convenient channel for communicating with the public in natural language. The representative product is OpenAI's ChatGPT, and other companies have launched their own products as well. ChatGPT has demonstrated powerful language abilities, and with them the potential to empower every area of human society. As "pre-trained" language models are upgraded to higher versions and combined with other smart devices in a "GPT+" form, they will be integrated into the everyday scenarios of various industries, and may become a "general-purpose technology" that, like electricity and the Internet, affects every aspect of human society. In this process, linguistics also has a part to play in shaping scientific and technological development; after all, research on natural language processing in artificial intelligence needs to be advanced by systematic methods of linguistic analysis. Current "pre-trained" language models are based mainly on text, but human language also includes a great deal of body language. For "pre-trained" language models, body language therefore needs to be added to the content of "pre-training".
Body language differs from the abstract symbolic language system expressed in text: it is a language system expressed through action. It can be used on its own, through posture, gesture, facial expression, gaze, and other specific actions, or together with spoken words, through tone, intonation, and similar forms, to convey the speaker's intention in the course of communication. Ni Haisu divides "language" into three types: situational language, sound language, and written language. Although body language is mainly a "situational language", it also belongs to "sound language" in the form of tone and intonation. Sometimes people do not even need words to communicate. Eye contact alone is enough, with all kinds of intentions carried in the changing expression of the eyes. If smart devices are to master human language completely, "pre-trained" language models need to learn and master human body language.
Language is a tool for conveying intention. Bronislaw Kaspar Malinowski proposed that "the most primitive function of language is as a mode of action, rather than a countersign of thought". In a broad sense, anything that can convey a subject's intention in a relatively stable form can be regarded as a form of "language". Looking back over the history of human language, body language is in fact a far older form of communication than symbolic language, and a more expressive and more delicate one. The creation of written symbols is one of the landmark events in the evolution of human civilization. Because written symbols are a more advanced and more complex form of language, they carry more information than body language, and they greatly improve the ability of words to travel across regions and across time. Nevertheless, body language can often convey concise, intuitive, and simple emotional information more richly, letting the other person feel the speaker's emotion directly. Body language has been internalized into the human system of expression. It lets us express emotion in a more personal and more instinctive way, so much so that even when speaking on the phone we sometimes habitually gesture with our hands and show all kinds of facial expressions.
Expression in text language
As a "pre-trained" language model, ChatGPT can currently communicate only in text language made up of written symbols. But grammatical analysis has its limitations. If one attends only to the sentence itself, without also taking into account the other person's expression and physical movements, the information interpreted is likely to be one-sided and incomplete. In everyday face-to-face communication, people often need to draw on the other party's body language in order to fully understand the rhetorical meaning contained in their words. For example, when the other party says "You are so talented", it is sometimes hard to judge, without also reading their eyes and tone of voice, whether they are offering praise or being sarcastic, and whether the remark is positive or negative. As a result, one cannot adopt a suitable strategy for adjusting the next part of the conversation or respond appropriately. So in everyday dialogue, people usually tend to watch the other person's body language closely and in real time, to confirm that they have inferred the other party's true meaning.
Natural language is full of rhetorical devices. Devices such as "irony", "sarcasm", and "exaggeration" carry meaning beyond the literal content of the words, and the meaning must be determined from context. For example, when parents scold a lying child with "You are really smart", or when people drenched in sweat complain that they are "dying of heat", words such as "smart" and "dying" do not carry their original meaning. In a purely textual environment, a "pre-trained" language model can only infer the real meaning by identifying the grammatical connections between the parts of a sentence. For example, it can identify the word "smart" as a possible marker of irony, attach an "irony" label to the sentence, and then reinterpret the sentence "You are really smart". In this way, the real meaning of the sentences above can be derived from tags such as "child", "lying", and "sweaty". When we read the ironic language in classics such as "The Romance of the Three Kingdoms" and "Dream of Red Mansions", we follow this same path from text recognition to intention derivation, and the same kind of approach is used when training artificial intelligence to process text.
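The path from text recognition to intention derivation described above can be sketched in a few lines of Python. This is only a minimal rule-based illustration, not how ChatGPT or any real model works: the `Rule` class, the `RULES` table, the function `derive_intent`, and the tag names are all hypothetical, chosen to mirror the article's two examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    trigger: str         # word whose literal meaning may not apply
    context: frozenset   # context tags that must all be present
    label: str           # rhetorical label attached to the sentence
    reading: str         # derived, non-literal reading


# Illustrative rules only (assumptions drawn from the article's examples).
RULES = (
    Rule("smart", frozenset({"child", "lying"}), "irony",
         "criticism rather than praise"),
    Rule("dying", frozenset({"sweaty"}), "exaggeration",
         "very hot, not literally dying"),
)


def derive_intent(sentence, context_tags):
    """Return (label, reading) for a sentence given its context tags."""
    # Step 1, text recognition: normalize the words of the sentence.
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    tags = set(context_tags)
    # Step 2, intention derivation: a rule fires only when its trigger
    # word and all of its required context tags are present.
    for rule in RULES:
        if rule.trigger in words and rule.context <= tags:
            return rule.label, rule.reading
    return "literal", "take the words at face value"


print(derive_intent("You are really smart", ["child", "lying"]))
print(derive_intent("You are really smart", []))
```

Without the context tags, the same sentence falls back to its literal reading, which is exactly the ambiguity the article attributes to pure text.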
"Pre-training" should cover body language
If artificial intelligence is to enter everyday application scenarios in a more fully human-like form, it needs a language model with better "understanding" as its supporting core. If smart devices are to understand human language fully, the "pre-training" of language models needs to expand from static text to dynamic video, so that it observes people's physical movements, facial expressions, and other body language. Humans have a very rich range of facial expressions. Those that express emotion alone can be roughly divided into anger, happiness, frustration, excitement, surprise, fondness, aversion, the sadness of farewell, calm, and other types. The expression of anger can itself be subdivided into degrees running from mild annoyance to rage, involving the combination and transition of a series of actions such as frowning, narrowing the eyes, and moving the lips. Outstanding actors often display their skill through rich and varied expressions, together with delicate and precise control of the tone and intonation of their lines. To understand the emotional information contained in facial expressions rigorously and scientifically, psychologists have been studying the subject from the perspectives of imagery and psychology since the early 20th century. For example, in 1918 Herbert Sidney Langfeld published the research paper "The Judgment of Emotions from Facial Expressions". Now that three-dimensional data-capture equipment is available, the field can be investigated comprehensively and systematically through facial modeling. Voice recognition, however, still has some technical problems to be dealt with, such as the "robustness" problem and the recognition distortion caused by distance.
GPT, now developed to version 4.0, has acquired a preliminary "digital vision" capability: it can identify many target objects in an image and relate them to one another in a certain logical order. For example, it can identify the ingredients and utensils in a photo of a kitchen and then give a corresponding recipe. If future versions can go further and analyze video by recognizing it frame by frame, they will in effect have a "digital eye". This means that a "pre-trained" language model with improved "digital vision" will be able to communicate with humans not only through text, but also by recognizing body language such as facial expressions and posture. Once voice-capture equipment has been further improved, "pre-trained" language models will also be able to use tone, intonation, and similar forms to grasp human intention more completely and accurately. In theory, there is no insurmountable obstacle to mastering body language through "pre-training". Unlike text language, which requires a large corpus, understanding body language only requires modeling body movements and facial expressions with computer vision and using speech analysis to classify and define tone and intonation. That is enough to cover most of the basic content.
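As a rough illustration of how frame-by-frame recognition and tone analysis might be combined, the following Python sketch takes per-frame expression labels and a tone label and uses them to disambiguate a literally complimentary remark. It is a toy under stated assumptions: the labels are assumed to come from some upstream vision and speech models that are not shown, and the cue sets and the functions `dominant_expression` and `interpret_compliment` are hypothetical names, not part of any existing system.

```python
from collections import Counter

# Assumed cue vocabularies; a real system would learn these from data.
NEGATIVE_CUES = {"eye_roll", "sneer", "frown", "mocking_tone"}
POSITIVE_CUES = {"smile", "nod", "warm_tone"}


def dominant_expression(frame_labels):
    """Majority vote over per-frame expression labels (None if empty)."""
    if not frame_labels:
        return None
    return Counter(frame_labels).most_common(1)[0][0]


def interpret_compliment(frame_labels, tone_label):
    """Classify a literally positive remark as praise, sarcasm, or ambiguous.

    frame_labels: expression labels, one per video frame (hypothetical input).
    tone_label:   a single label for the speaker's tone, or None.
    """
    cues = [c for c in (dominant_expression(frame_labels), tone_label) if c]
    negative = sum(c in NEGATIVE_CUES for c in cues)
    positive = sum(c in POSITIVE_CUES for c in cues)
    if negative > positive:
        return "sarcasm"
    if positive > negative:
        return "praise"
    # No cues, or conflicting cues: text alone cannot settle the meaning.
    return "ambiguous"


print(interpret_compliment(["smile", "smile", "frown"], "warm_tone"))
print(interpret_compliment(["sneer", "sneer", "smile"], "mocking_tone"))
print(interpret_compliment([], None))
```

The "ambiguous" result for an empty cue list corresponds to the text-only situation: without expression or tone, the remark cannot be classified either way.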
To sum up, body language is an important factor that "pre-trained" language models should take into account. The corpus on which smart devices depend cannot consist of text alone. It also needs to include body language, and the "digital vision" that identifies a subject's physical movements needs to be upgraded further. In this way, artificial intelligence will be able to understand human language more fully, and therefore to communicate more efficiently and accurately.
(The author is a professor at the School of Fine Arts and Design of Yangzhou University)
All rights reserved by China Social Sciences Magazine. This article may not be reprinted or used without permission.