Children's corpora are an important foundation for studying children's behavior and exploring the mechanisms of language acquisition. The "Baby Behavior Diary" published in 1787 by the German psychologist and philosopher Dietrich Tiedemann is the earliest book to record children's language. In the 1960s, with the development of corpus linguistics and computer technology, the construction of children's corpora began to take off. The Okayama corpus, built in 1969, is among the world's first preschool children's corpora. According to incomplete statistics, there are now more than 350 preschool children's corpora worldwide. Their construction history and development trends can be understood at two levels: corpus technology and corpus content.
Construction technology of preschool children's corpora: "three trends"
Looking across the development of construction technology for preschool children's corpora, and especially the changes of the past 40 years, its characteristics can be summarized as three trends.
First, the digitalization of corpus collection technology. Collection technology for children's corpora has passed through three stages: written records, audio recording, and video recording. Early researchers mainly used handwritten notes and diaries to collect children's language data. Audio recording came next, and the first corpus to collect children's language with it was Japan's Okayama children's corpus. In the 1970s, the Menn children's corpus in the United States was the first to use video technology. From the 1980s to the 1990s, written records, audio recording, and video recording were all in use. In the 21st century, traditional written records and tape-based audio and video recording have gradually declined, while digital audiovisual recording has risen and become widely used. The popularity of portable recording equipment allows people to collect children's language data anytime and anywhere. According to statistics that exclude corpora of unknown construction date, from the 20th century to the 21st century the share of children's corpora built with written records fell from 10% to 1%, the share using audio recording fell from 42% to 30%, and the share using video rose from about 28% to 32%. Overall, collection technology for children's corpora has become digital and widely accessible, which also reflects the broader shift from print media and audio media toward multimedia. Compared with other, more mature areas of language resource construction, however, children's corpus construction still lags in applying advanced technologies such as artificial intelligence and cloud computing.
Second, the diversification of corpus acquisition methods. Methods for collecting children's language data include natural observation, interviews, experiments, and online collection. Before the 1970s, natural observation was the main and sometimes the only method. After that, interviews, experiments, and other methods came into use alongside natural observation. In the 21st century, with the development of Internet technology, corpora such as the CHCC children's corpus began to gather children's language data online. From the 20th century to the 21st century, the use of natural observation fell from 81% to 68%, the use of experiments rose from 16% to 37%, and the use of interviews also declined markedly. The collection method directly affects how natural and how spontaneous the subjects' language output is, so contemporary children's corpora should combine a variety of collection methods and adapt to the digital age.
Third, a growing consensus on corpus annotation. Annotation in current children's corpora shows three developments. The first is the move from manual annotation to automatic annotation. For example, the children's multimodal spoken corpus built by Linyi University in 2013 uses the multimodal annotation software ELAN for text transcription. The second is the move from simple linguistic annotation to discourse annotation. Codes for conversational acts, speech acts, paralanguage, language events, emotion, and other categories now appear in corpus annotation specifications, which shows that corpus construction has come to value children's discourse. The third is that annotation rules are gradually converging. At present, the most widely used specification and recording tool for children's corpora is CHILDES, the Child Language Data Exchange System, built by the International Children's Language Center at Carnegie Mellon University in the United States. The system also serves as a repository that gathers corpora: about 89% of the world's preschool children's corpora are included in CHILDES, and most preschool children's corpora also borrow from the annotation rules it provides. Some children's corpora follow other standards. Singapore's spoken Chinese corpus of preschool children aged 5 to 6, for example, adopts the word segmentation and tagging specification of the "973 Contemporary Chinese Text Corpus" and applies it to the Chinese spoken by ethnic Chinese children in Singapore.
Construction content of preschool children's corpora: "three kinds of awareness"
Corpus technology is the means by which children's corpora are built, while corpus content reflects the ideas behind their construction. Based on an examination of more than 300 official documents provided by the CHILDES platform and nearly a thousand studies of children's corpora, the development of the content of preschool children's corpora over recent decades can be summarized as "three kinds of awareness".
Awareness of integration. This awareness in the construction of preschool children's corpora shows mainly in two respects. First, the two areas of "language structure" and "language function" are gradually merging. Of the world's preschool children's corpora today, more than 230, or about 66%, concern language structure; they are built mainly around children's phonology, vocabulary, grammar, paralanguage (symbols), and literacy skills. More than 250, or about 72% of the total, concern language function; their content focuses on communicative intent, language socialization, discourse, paralanguage (communication), communication strategies, speech acts, and so on. Nearly 150 cover both language structure and language function, and they represent the direction in which the two areas are converging.
Second, academic research and social application are gradually merging. The two major fields of social application for preschool children's corpora are the research and development of intelligent technology for children's language and intervention therapy for children's language. Research and development of intelligent technology, covering both one-way recognition of speech and text and two-way human-machine interaction, is reflected in the use of such technology to develop children's language products. It is also the key to turning children's corpora into a language industry. Research on children's language disorders, combined with language intervention therapy, reflects an awareness that corpus construction should provide language services. It is also an important area in which children's corpus construction helps solve social language problems and benefits society.
Awareness of linguistic diversity. According to incomplete statistics, about 50 languages (including their regional and social variants) are represented in children's corpora. The top 10 by number of corpora are English, Spanish, Japanese, French, Dutch, German, Chinese, Italian, Portuguese, and Russian. Among them, Chinese children's corpora account for about 9% of the world total. In recent years a diversified pattern has emerged in which English, Spanish, Japanese, and French dominate while other languages are rapidly entering corpora.
Bilingual and multilingual preschool children's corpora are also developing steadily. At present there are more than 40 such corpora, involving about 30 languages (including their regional and social variants). The five languages with the most corpora are English, Spanish, French, Japanese, and Dutch. It should also be noted that the multilingual development of very young children, children who speak other languages, and bilingual children who use sign language has become a hot topic in the 21st century, yet current corpus construction pays too little attention to these issues. The languages represented in the world's preschool children's corpora amount to about 0.7% of human languages, and those represented in bilingual or multilingual preschool children's corpora to about 0.4%, so there is still a great deal of room for development.
Awareness of sharing. At present, more than 160 children's corpora provide sharing services, about 46% of the total. The largest sharing platform for children's corpora is the CHILDES system, and more than 150 of the shared children's corpora come from it. Over time, the sharing rate of children's corpora has been rising. Of the preschool children's corpora built in the 20th century, about 45% provided sharing services; in the 21st century, the proportion has risen to 54%. Sharing is also bound by ethical requirements. The Miyata corpus, for example, provides corpus sharing services while clearly stating that it will "protect the privacy of the subjects". Sharing corpora while observing basic ethical norms is in keeping with the demands of an era of jointly built and shared resources.
Overall, the construction of preschool children's corpora has flourished since the end of the 20th century. There is still room for improvement, however, in the application of advanced technology, academic awareness, openness, and the benefits gained from use. The construction of preschool children's corpora bears not only on the development of disciplines such as corpus linguistics, child linguistics, child studies, and computer science. It is also a long-term undertaking that concerns language education, the transmission of cultural heritage, and global information equality. Understanding children's corpus construction at only the two levels of technology and content is therefore not enough. Against today's background of digital living and intelligent transformation, it is necessary to examine the place of children's corpora within a larger disciplinary system and national strategy, to study and raise the level and standing of Chinese children's corpus construction, and to think about the future with questions such as "smart playmates" and "smart teachers" for children in mind.
(This article is an outcome of a National Social Science Fund research project on Chinese preschool children (19AYY010).)
(Author affiliations: Faculty of Linguistic Sciences, Beijing Language and Culture University; Institute of International Chinese Education, Beijing Language and Culture University)
All rights reserved by China Social Sciences Magazine. Reproduction or use without permission is prohibited.