The national corpus is a major cultural resource
November 8, 2023, 10:54. Source: "China Social Sciences", November 8, 2023, Issue 2768

The National Philosophy and Social Sciences Development Plan for the "14th Five-Year Plan" Period, issued by the General Office of the Central Committee of the Communist Party of China in April 2022, emphasizes the need to promote the application of big data, cloud computing, and artificial intelligence in philosophy and the social sciences, to encourage cross-fertilization and integrated innovation between the social and natural sciences, to further expand the scope of the disciplines, and to innovate research methods and technical means. For a country, language is an important historical resource, cultural resource, and resource of real language life. A corpus is a database that brings together a large amount of language information for research and use. It carries basic information about a nation's language and culture and records the history of their development.

At present, corpora are widely used in Chinese teaching, language research, the formulation of language and script norms and standards, dictionary compilation, language information processing, and many other areas. In Chinese teaching, a corpus can provide a basis for drawing up the teaching syllabus, supply rich teaching materials, and make teaching more scientific. In language research, a corpus provides authentic language material for the study of language itself, supports the discovery and summary of theoretical viewpoints, and supplies data for verifying linguistic theories. In language norms and standards, a corpus can directly serve surveys of how the language is actually used, and it can also be used to analyze and verify standards that have already been formulated. In dictionary compilation, a corpus can provide candidate headwords, naturally occurring example sentences, and authentic language material from which definitions can be induced, so that compilation does not depend too heavily on the editors' personal intuition and experience and the quality of the dictionary improves. In language information processing, natural language processing has become a key common technology of the new generation of artificial intelligence. Breakthroughs in natural language processing depend not only on new algorithms. High-quality, deeply processed corpora that reflect the latest linguistic theory are also essential. It is fair to say that corpora play an increasingly important role in many fields.

Many countries regard corpus construction as an important foundational project and have built national corpora. For example, construction of the British National Corpus (BNC) began in 1991, and the first edition was completed in 1994. The second and third editions were released in 2001 and 2017, and the corpus contains 100 million words. Planning for the American National Corpus (ANC) began in 1998. The first edition was published in 2003 (11.1 million words) and the second edition in 2005 (22 million words). Since 2006 the focus has been on building the Open American National Corpus (OANC) and the Manually Annotated Sub-Corpus (MASC). The American National Corpus is benchmarked directly against the British National Corpus: its general portion is likewise designed to contain 100 million words, and it is still under construction. The South Korean government launched the "21st Century Sejong Plan" project in 1998 to build the Korean National Corpus (KNC), which has now been completed. Russia, Hungary, Thailand, Estonia, and other countries have also built and released their own national corpora. These national corpora are all balanced corpora, and all have undergone word segmentation, annotation, and other processing. They have played a positive role in promoting research on the languages concerned.

From this it can be seen that a national corpus is a major cultural project that is built and controlled by a state institution or an institution designated by the state, operates at the national level, and takes the national language as its object. A national corpus should be large in scale, well balanced, comprehensive, dynamically updated, richly annotated, diverse in its uses, openly shared, and convenient to use, so that it can truly reflect the full picture of the use and development of the national common language and script. Building China's national corpus has become an urgent task. The construction of Chinese corpora started in the 1970s, and many Chinese corpora have been built to date. These corpora play a positive role in the education and research of the national common language. However, because they were constrained to varying degrees at the outset by their temporary, local, short-term, or single-function nature, they lack long-term consideration and overall design and fail to reflect comprehensively the current state of use of the national common language and script. The main problems are the following.

First, in terms of corpus sampling, most of the material is written language, and spoken language is lacking. For example, in one corpus currently used in the Chinese academic community, newspapers and periodicals account for more than 70% of the contemporary material, while spoken language accounts for less than 0.3%. Some corpora use only the Weibo posts of a particular year as spoken material, and some large corpora include no spoken component at all. In fact, in terms of academic value, spoken language is an indispensable type of material for reflecting the real state of a language and a direct manifestation of the character of that language. In the more mature national corpora of some countries, spoken language makes up a substantial proportion, which is more scientifically sound. For example, 90% of the British National Corpus is written language and 10% is spoken language. Of the 11 million words in the first edition of the American National Corpus, 8 million are written and 3 million are spoken.
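The written-to-spoken proportions above are simple shares of total word count. As a minimal sketch (the function name `register_shares` and the dictionary layout are illustrative assumptions, not part of any corpus toolkit), the figures quoted for the 11-million-word first edition can be checked like this:

```python
def register_shares(word_counts):
    """Return each register's share of the total word count, as a fraction."""
    total = sum(word_counts.values())
    if total == 0:
        raise ValueError("corpus is empty")
    return {register: count / total for register, count in word_counts.items()}


# Word counts quoted in the text for the 11-million-word first edition.
first_edition = {"written": 8_000_000, "spoken": 3_000_000}

shares = register_shares(first_edition)
for register, share in shares.items():
    # Format each share as a percentage with one decimal place.
    print(f"{register}: {share:.1%}")
```

The same function can be applied to any breakdown by register or genre, which makes it easy to see when one source type, such as newspapers, dominates a corpus.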

Second, sample size is not controlled, so a corpus of a given size covers a narrow range of texts. Failing to control sample size also affects the balance and representativeness of a corpus. For example, some corpora include the complete works of modern and contemporary writers, whereas a standard balanced corpus should avoid including works by the same author that are too numerous, too long, or too large in proportion. Otherwise the balance of the corpus suffers. In this regard, the British National Corpus, for instance, draws only 45,000 words from the different parts of a single author's work.
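One way to picture a per-author cap is the sketch below. It is only an illustration under stated assumptions: the function name `sample_author`, the choice of three evenly spaced excerpts, and the use of a flat token list are all hypothetical, and this is not the actual sampling procedure of any national corpus. Only the 45,000-word cap comes from the text.

```python
def sample_author(tokens, cap=45_000, parts=3):
    """Take at most `cap` tokens from one author's text.

    Texts within the cap are kept whole. Longer texts are reduced to
    `parts` equal-length excerpts spaced evenly from start to end, so the
    sample draws on different parts of the work (an assumed strategy).
    """
    if len(tokens) <= cap:
        return list(tokens)
    excerpt_len = cap // parts
    # Space the start positions so the last excerpt ends at or before the end.
    stride = (len(tokens) - excerpt_len) // (parts - 1) if parts > 1 else 0
    sample = []
    for i in range(parts):
        start = i * stride
        sample.extend(tokens[start:start + excerpt_len])
    return sample


# A stand-in for a 200,000-word single-author text.
long_text = list(range(200_000))
sample = sample_author(long_text)
print(len(sample))
```

Because the excerpts never overlap and never exceed the cap, a prolific author contributes no more to the corpus than any other author of long texts.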

Third, regular update plans are absent or inadequate, which makes it difficult to conduct research based on a diachronic balanced corpus. A diachronic corpus has to be designed and planned early and followed up over the long term. Most corpora currently in operation still pay too little attention to this. A national corpus should have a long-term plan for regular updates. The American National Corpus, for example, in addition to its general corpus of 100 million words, plans to grow by 10% every five years.

Fourth, the media of the corpus material are relatively uniform. Most of the material in China's existing large corpora is in text form. Multimedia material is relatively small in scale and limited in content, so it cannot reflect the full picture of Chinese, especially the look of living spoken language. At the forefront of international research, work on "multimedia, multimodal" corpora is increasing, but the construction of multimedia and multimodal corpora for the national common language and script still lags behind.

Fifth, corpus application systems lack functionality. If a corpus application system lacks rich functions, it cannot give users the service they should receive, and the application value and significance of building the corpus are greatly reduced. Many foreign corpora have powerful application platforms that offer rich functions such as example retrieval, frequency statistics, lookup, and comparative analysis. Web-based corpus application platforms such as CQPweb and Sketch Engine represent the mainstream direction of future development. They can use computer clusters for complex operations and provide rich corpus application functions. By comparison, most domestic corpora provide only example retrieval, and only a few offer simple vocabulary statistics, which falls far short of the needs of in-depth linguistic research. In particular, for integrated querying and analysis of multimedia and multimodal corpus data, there is still a long way to go from theoretical exploration to the development of working application software.
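To make the two most basic functions named above concrete, here is a minimal sketch of example retrieval (a keyword-in-context concordance) and frequency statistics. It uses only the Python standard library and does not reproduce the interface of CQPweb or Sketch Engine. The function names, the sample sentence, and the regex tokenizer are illustrative assumptions, and Chinese text would need real word segmentation in place of `tokenize`.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters; a stand-in for real word segmentation."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text, used only to exercise the two functions.
text = ("The corpus records language use. "
        "A balanced corpus samples many registers of language.")
tokens = tokenize(text)

# Frequency statistics: how often each word form occurs.
freq = Counter(tokens)
print(freq["corpus"], freq["language"])

# Example retrieval: each hit shown with up to three words of context per side.
hits = concordance(tokens, "corpus")
for left, keyword, right in hits:
    print(f"{left:>20} [{keyword}] {right}")
```

A linear scan like this is adequate for a small sample. Platforms serving large corpora typically rely on prebuilt indexes instead, which is one reason indexing and query technology matters for a national corpus.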

In the long run, if corpora cannot meet the actual needs of language surveys and research, they will become obstacles to disciplinary development, scientific research, and exchange and cooperation. The value and significance of building a national corpus lie mainly in three aspects. First, a national corpus can reflect more comprehensively the full picture of the use and development of the national language. It is a manifestation of national soft power and an important resource that needs to be built. Second, building a national corpus helps fill the gap left by the lack of a large, dynamic, balanced corpus of the national language in the academic community, and so better serves language research. Third, building a national corpus will advance a number of research efforts, such as large-scale descriptive grammar research on the national language, multi-perspective surveys of language life, research on the diversified development of language, and interactive research between the study of language itself and language information processing. It can also serve digital humanities, public observation, and other work in literature, history, philosophy, and the social sciences.

Against this background, the Institute of Linguistics of the Chinese Academy of Social Sciences has launched the construction of the national corpus. Good timing, rich accumulated experience, and command of new technologies for acquisition and processing are the advantages of this project. But we are also clearly aware that this new corpus, begun in a new era and a new stage of development, has a new mission and faces new challenges. We are dealing with the language and script of the great Chinese cultural heritage, with a history of civilization of more than 5,000 years. How to learn from mature international experience in corpus building while, grounded in the subjectivity of our own language and culture, establishing a corpus classification system that is based on the characteristics of Chinese and reflects both the achievements of modern linguistics and the characteristics of Chinese culture is a major challenge. Ahead of us lie the theoretical task of systematically studying the principles of construction, the difficult operational tasks of re-integrating existing resources and re-collecting and sorting applicable resources under new principles and standards, and the task of cultivating interdisciplinary, versatile talent. As a "dynamic" corpus, it needs to support collaboration among multiple institutions and multiple users, support management of the editing process and dynamic updating of content, and deliver high-concurrency, low-latency responses for word retrieval under compound conditions and for statistical analysis data. Such goals place demands on corpus indexing and query technology and on the construction of the corpus application platform. In order to ensure the efficient advancement of teaching and research on the national common language, we are confident that we can build a Chinese national corpus that is large in scale, well balanced, comprehensive, dynamically updated, richly annotated, diverse in its uses, openly shared, and convenient to use, and that provides better guarantees and support for the education and research of the national common language.

Editor in charge: Zhang Jing