Several Characteristics of the Current Development of Computational Linguistics
December 22, 2020 09:17 Source: "China Social Sciences" December 22, 2020, No. 2074 Author: Feng Zhiwei

Computational linguistics is an emerging interdisciplinary field that uses computers to study and process natural language. Over the course of its development, research in this field has been carried out across linguistics, computer science, mathematics, psychology, electronic engineering, cognitive science, and other areas, giving it a clearly interdisciplinary character.

Computational linguistics is foundational work for developing intelligent computers. Research on artificial intelligence is inseparable from the processing of natural language, so computational linguistics plays a pivotal role in artificial intelligence research.

People have already designed many artificial languages for computer software. Like natural languages, these artificial languages follow the laws and rules of formal languages. The formal language theory proposed by the American linguist N. Chomsky applies to artificial languages and to natural languages alike. This strong explanatory power shows that, in terms of formal description, the two kinds of language do share some common ground.

However, natural language differs from artificial language in at least four respects. First, natural language is full of ambiguity, whereas ambiguity in artificial language can be controlled. Second, the structure of natural language is complex and varied, whereas the structure of artificial language is relatively simple. Third, the semantic expression of natural language is endlessly variable and has no simple, general means of description, whereas the semantics of artificial language can be defined directly by its designers. Fourth, structure and semantics in natural language are intricately entangled and generally have no isomorphic correspondence, whereas in artificial language structure and semantics can often be processed separately and stand in a neat one-to-one isomorphic relationship. These distinctive properties make computer processing of natural language a major problem for artificial intelligence.

  Development trends place higher demands on researchers

Since the 1950s, scholars in China and abroad have explored this new field tirelessly and have now achieved gratifying results. The spread of the Internet has raised the bar for computer processing of natural language, and countries around the world attach growing importance to computational linguistics research. The current development of computational linguistics shows five main characteristics.

  First, the rationalist approach based on syntactic-semantic rules has been called into question. With the construction of corpora and the rise of corpus linguistics, processing large-scale real text has become the main strategic goal of computational linguistics research, and empirical methods based on linguistic big data have come to dominate the field.

An important weakness of the rationalist approach shows up in practice. The mainstream technique used by researchers in computational linguistics has been rule-based syntactic-semantic analysis. Although application systems built this way have achieved some success on limited "sublanguages", extending their coverage to handle large-scale real text remains extremely difficult. The reason is that the linguistic knowledge a natural language processing system needs, in both its sheer quantity and its fine granularity, goes far beyond anything earlier systems had to deal with. Moreover, as the amount of knowledge held by a system grows so dramatically in scale and degree, basic questions such as how the system acquires, represents, and manages that knowledge must be taken into account. This is how the problem of processing large-scale real text came to the fore in computational linguistics research. The current construction of corpora and the rise of corpus linguistics are an important sign of this shift in the field's strategic goal. As attention to large-scale real text has grown, more and more scholars have recognized that corpus-based analysis (the empirical approach) is at least an important complement to rule-based analysis (the rationalist approach). Only corpora that are both "large-scale" and "authentic" are ideal resources of linguistic knowledge.

This big-data-based empirical approach also affects how language materials are collected, organized, and processed, and so promotes a change in linguistic research methods. Theoretical linguistics must be grounded in linguistic facts; only with detailed and abundant material is it possible to draw reasonably reliable theoretical conclusions. The use of computers greatly reduces the labor of collecting, organizing, and processing language data.

  Second, natural language processing increasingly uses machine learning to acquire linguistic knowledge, and deep learning based on neural networks has become the mainstream method in computational linguistics.

Since the start of the 21st century, the empirical tendency in computational linguistics has accelerated at a striking pace. This acceleration has been driven largely by three mutually reinforcing trends. The first is the building of annotated corpora. The existence of these language resources has greatly encouraged the use of supervised machine learning on traditionally complex problems such as automatic parsing and automatic semantic analysis. These resources have also promoted the establishment of competitive evaluation mechanisms. The second is the trend toward statistical machine learning. Growing attention to machine learning has led computational linguists to interact more frequently with researchers in statistical machine learning. Research on support vector machines, maximum entropy techniques, the formally equivalent multinomial logistic regression, and Bayesian models has become standard research and practice in computational linguistics. The third is the development of high-performance computing systems. Their wide availability provides favorable conditions for large-scale training and effective use of machine learning systems.

Because building reliable annotated corpora is costly and difficult, researchers have been prompted to make more use of unsupervised machine learning, letting the computer automatically acquire accurate linguistic knowledge from vast corpora. The construction of machine-readable dictionaries and large-scale corpora has therefore become a focus of current computational linguistics. Since the start of the 21st century, traditional machine learning methods have developed further into deep learning methods based on neural networks. Deep learning is independent of any particular language: given enough language data, the computer can automatically learn the various features of the language, and analysis accuracy greatly exceeds that of traditional methods. This is a revolutionary change in the history of computational linguistics.

  Third, mathematical methods are receiving more and more attention.

Manual observation and introspection alone obviously cannot yield accurate and reliable linguistic knowledge from vast corpora, so statistical and mathematical methods are indispensable.

A language model is a mathematical model that describes the regularities of natural language, and constructing language models is at the core of computational linguistics research. Language models can be divided into traditional rule-based language models, statistics-based language models, and deep-learning-based language models. A rule-based language model consists of hand-written linguistic rules. These rules come mainly from the linguistic knowledge held by linguists, so they carry a degree of subjectivity and one-sidedness and have difficulty handling large-scale real text. A statistics-based language model is usually a probabilistic model. Using its probability parameters, the computer can estimate how likely a linguistic unit is to occur in natural language, instead of simply judging by linguistic rules, which makes the model more objective and comprehensive. A deep-learning-based language model does not require hand-designed linguistic features; the computer acquires them automatically from big data. Such models outperform probabilistic statistical language models and have greatly improved the results of machine learning.
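To make the idea of a statistics-based language model concrete, the following is a minimal sketch of a bigram model with add-one smoothing. It is only one simple instance of a probabilistic language model, not a method described in this article; the tiny corpus, the function names, and the choice of add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    # Vocabulary size includes the boundary markers (a simplification).
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_logprob(tokens, unigrams, bigrams):
    """Log-probability of a sentence as the sum of its bigram log-probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log(bigram_prob(prev, word, unigrams, bigrams))
        for prev, word in zip(padded, padded[1:])
    )

# Toy corpus, invented purely for illustration.
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]
unigrams, bigrams = train_bigram(corpus)

# A word order seen in the corpus scores higher than a scrambled one.
seen = sentence_logprob(["the", "dog", "sleeps"], unigrams, bigrams)
scrambled = sentence_logprob(["sleeps", "dog", "the"], unigrams, bigrams)
print(seen > scrambled)
```

The model estimates how likely each word is given the preceding word from counts alone, without any hand-written rule, which is the contrast with rule-based models drawn above.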

At present, deep learning language models in computational linguistics are quite mature, and the demands on researchers' mathematical proficiency are higher than ever.

  Fourth, the role of the lexicon in natural language processing is receiving more and more attention, and a strong "lexicalist" tendency has emerged.

Words are the main vehicle through which discourse is realized; the role of grammar is merely to manage meaning, combine constituents, and construct discourse. This tendency to emphasize the role of the lexicon, known as "lexicalism", has had a considerable impact on computational linguistics.

Natural language is full of ambiguity. Resolving it depends not only on probability and structure but often on the properties of individual words, so it must draw on lexical knowledge. Practice has shown that although probabilistic methods have been adopted in computational linguistics, they often fall short on problems involving lexical dependencies, so further improvements are needed, in particular introducing lexical information into probabilistic grammars.
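As a toy illustration of why lexical information matters for disambiguation, the sketch below decides whether a prepositional phrase attaches to the verb or to the object noun by comparing smoothed, word-specific attachment frequencies. It is loosely modeled on lexical-association approaches to prepositional-phrase attachment and is not a method taken from this article; the training tuples, the function names, and the 0.5 smoothing constant are all invented for illustration.

```python
from collections import Counter

# Hypothetical training tuples: (verb, object noun, preposition, attachment site).
TRAINING = [
    ("put", "book", "on", "verb"),     # put the book on the shelf
    ("put", "cup", "on", "verb"),      # put the cup on the table
    ("read", "book", "on", "noun"),    # read a book on history
    ("bought", "book", "on", "noun"),  # bought a book on art
    ("read", "report", "on", "noun"),  # read a report on trade
]

verb_total, noun_total = Counter(), Counter()
verb_prep, noun_prep = Counter(), Counter()
for verb, noun, prep, site in TRAINING:
    verb_total[verb] += 1
    noun_total[noun] += 1
    if site == "verb":
        verb_prep[(verb, prep)] += 1
    else:
        noun_prep[(noun, prep)] += 1

def attach(verb, noun, prep):
    """Choose the attachment site with the higher smoothed lexical preference."""
    verb_score = (verb_prep[(verb, prep)] + 0.5) / (verb_total[verb] + 1)
    noun_score = (noun_prep[(noun, prep)] + 0.5) / (noun_total[noun] + 1)
    return "verb" if verb_score > noun_score else "noun"

# Both inputs have the same verb + noun + preposition structure;
# only the identity of the verb changes the decision.
print(attach("put", "book", "on"))
print(attach("read", "book", "on"))
```

A purely structural rule would treat the two inputs identically, since both have the same shape; the decision differs only because the counts are tied to specific words.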

At present, the construction of lexical knowledge bases has received wide attention. The construction of various grammatical and semantic knowledge bases reflects this strong "lexicalist" tendency.

  Fifth, multilingual online natural language processing technology is developing rapidly. With the advance of network technology, the Internet is gradually becoming a multilingual online world, and computational linguistics research on machine translation, information retrieval, and information extraction for the Internet is becoming more urgent.

In the era of the "information explosion", science and technology advance by the day, and new information and knowledge pour out. At the same time, the rapid growth in the number of non-English users has broken the dominance of English on the Internet, which has truly become a "multilingual online world". This multilingual character makes the Internet rich and varied, but it also creates difficulties for exchange and communication between different languages. Translation between languages on the Internet is therefore becoming more and more urgent. Beyond computational linguistics research on individual languages, vigorously developing multilingual computational linguistics research is increasingly necessary, and how to bridge different natural languages on the Internet has become an important topic for the field.

Under these new circumstances, the interdisciplinary and boundary-crossing nature of computational linguistics stands out even more, and researchers in the field can even less afford to confine themselves to one narrow specialty. Without absorbing research results and methods from related disciplines, computational linguistics research will stall. The practical needs of the field's development place higher and broader demands on the scholars concerned.

(Author affiliation: Heilongjiang University)

Editor in charge: Zhang Jing