Oracle bone inscriptions carry the deep cultural genes and long-standing cultural roots of China, and they are a non-renewable resource for exploring Chinese civilization. They record many aspects of ancient Chinese social development from 3,500 to 3,000 years ago and provide authentic first-hand historical material for tracing the origins of Chinese civilization and constructing the system of ancient Chinese history. General Secretary Xi Jinping has pointed out that the project to trace the origins of Chinese civilization "must strengthen overall planning and scientific layout; adhere to a multidisciplinary, multi-angle, multi-level, and all-round approach; closely combine archaeology and history, and the humanities and natural sciences, in joint research; broaden the temporal and spatial scope and coverage of research; and further answer major questions such as the origin, formation, and basic picture of development of Chinese civilization, its internal mechanisms, and the evolutionary path of civilization in each region." As oracle bone researchers in the new era, we should respect the texts and value history, apply scholarship to practical ends, serve the country's major strategies and its cultural development needs, and work calmly while holding to academic conscience, academic bottom lines, and the scientific spirit, so as to contribute to building the disciplinary, academic, and discourse systems for the study of Chinese civilization.
Comprehensive compilation and orderly transmission of oracle bone inscriptions
On October 30, 2017, UNESCO announced that oracle bone inscriptions had been inscribed on the Memory of the World Register. The recognition was well deserved: oracle bones have long received attention and protection from the state, the government, society, and individuals. Since their discovery in 1899, the oracle bones unearthed over the past 124 years have been scattered around the world and are held by nearly a hundred collectors. As unique and valuable historical material, they have been studied by generations of scholars: from Wang Yirong, Liu E, and Sun Yirang, to Luo Zhenyu, Wang Guowei, Dong Zuobin, and Guo Moruo ("the Four Halls of Oracle Bones"), and on to Tang Lan, Rong Geng, Ke Changji, and Shang Chengzuo ("the Four Young Masters of Oracle Bones"), as well as Wang Xiang, Yu Xingwu, Chen Mengjia, Hu Houxuan, and others. Thanks to their unremitting efforts, oracle bone inscriptions have attracted great attention and in-depth exploration in academic circles at home and abroad, and their study has become an internationally recognized field. For oracle bones scattered around the world, comprehensive and professional compilation is the most important task. As early as 1984, Hu Houxuan pointed out that about 150,000 inscribed oracle bones had been unearthed in total; our later count exceeds 160,000 pieces. In the mid and late 20th century, the landmark "Oracle Bone Inscriptions Collection", the "Oracle Bone Inscriptions Collection Supplement", and other comprehensive collections were published, creating good conditions for advancing research on oracle bone inscriptions and oracle bone studies. More than 40 years ago, Hu Houxuan observed that the many articles on oracle bones were scattered across various publications and hard to locate, and asked whether a compilation of oracle bone research literature could be produced. Regrettably, this wish was not realized in his lifetime. Carrying on Hu Houxuan's last wish, in the early 21st century, under the leadership of researcher Song Zhenhao of the Institute of Ancient History, Chinese Academy of Social Sciences, the "Oracle Bone Literature Collection" was finally compiled in 40 volumes. It is known as the companion to the 13-volume "Oracle Bone Inscriptions Collection": the former gathers research results and the latter gathers research materials, and together they provide basic and complete academic materials for the study of oracle bones.
Take the Institute of Ancient History of the Chinese Academy of Social Sciences as an example. It is compiling "The Third Collection of Oracle Bone Inscriptions", which gathers oracle bones omitted from the "Oracle Bone Inscriptions Collection" and its "Supplement", pieces that have since surfaced in scattered places, and some pieces held in public and private collections. With more than 30,000 pieces recorded in total, it will give the academic community a new comprehensive catalog of oracle bones. In recent years the Institute has also compiled 15 batches of holdings, more than 20,000 pieces in all, from the Lüshun Museum, Jinbo, the Shandong Museum, the Chongqing Three Gorges Museum, the Hermitage (Winter Palace) in Russia, and private collections. According to statistics, 11 public institutions nationwide each hold more than 1,700 oracle bones, 13 hold 200 to 900 pieces, and 12 hold 60 to 199 pieces. Of these 36 institutions, 12 (33.3%) have cataloged their holdings or are doing so, 5 (13.9%) have not organized them properly, and 19 (52.8%) have yet to catalog them. We therefore feel strongly that full-information compilation, research, and cataloging of oracle bone inscriptions should be launched comprehensively at the national level, with rescue and protection measures for the oracle bone heritage, so as to advance scientific research, cultural communication, and history education on oracle bones and to promote in-depth exploration of the roots of Chinese civilization.
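The shares quoted above follow directly from the counts. The short Python check below is only a consistency check on the figures in the text; the variable and function names are illustrative and not part of any published tool.

```python
# Cataloging status of the public institutions cited in the text.
status_counts = {
    "cataloged or being cataloged": 12,
    "not properly organized": 5,
    "awaiting cataloging": 19,
}

# Size tiers from the text: 11 + 13 + 12 institutions.
tier_counts = [11, 13, 12]
total = sum(tier_counts)


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


percentages = shares(status_counts, total)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The three status groups add up to the same 36 institutions as the three size tiers, which is why the percentages sum to 100.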
"Yin Qi Wenyuan" promotes the digitization of oracle bone inscriptions
Oracle bones are fragile, and loose, powdery, or damaged surfaces are common, so the bones are difficult to preserve, display, and use. Nearly 160,000 oracle bones are held by at least 174 institutions, including museums, libraries, research institutions, and universities at home and abroad, so the physical objects cannot be brought back together for study. Digital service resources for oracle bones, especially digitization projects based on big data and artificial intelligence, can preserve the original information of the bones and the text they carry to the greatest possible extent.
The primary task in building digital service resources for oracle bones is to digitize the primary materials, reference works, and research literature. The core work is database construction: the database should be able to denoise oracle bone rubbings or photographs and meet a variety of search needs. Several oracle bone databases have been built at home and abroad. For example, the Handa Ancient Books Database Retrieval System developed by the Chinese University of Hong Kong includes seven major oracle bone catalogs from China and abroad, among them the "Oracle Bone Inscriptions Collection Interpretation" and the "Oracle Bone Collection in the United Kingdom". The "Oracle Bone World" database developed by the National Library of China contains 5,932 oracle bone photographs and 3,177 rubbings. In addition, "Academia Sinica" in Taiwan, China, East China Normal University, the Institute of Oriental Culture at the University of Tokyo in Japan, and other institutions have developed oracle bone databases, and some databases developed by individuals, such as the Chinese Studies Master website, are also publicly available.
Although much has been achieved in building digital service resources for oracle bones, and these resources have greatly facilitated research, the completeness, standardization, and interconnection of oracle bone data remain low overall. In particular, multi-granularity retrieval is inefficient for users.
The home of oracle bone inscriptions is the Yin Ruins in Anyang, Henan. A scholar remarked in the 1980s that oracle bones are dispersed all over the world and expressed the hope that those held abroad might one day return to their homeland. Returning every oracle bone relic scattered abroad is probably unrealistic, but gathering all oracle bones in digital form in Anyang through big data is still possible. With this vision, the Key Laboratory of Oracle Bone Information Processing of the Ministry of Education at Anyang Normal University and the Research Center for Oracle Bone and Shang History of the Chinese Academy of Social Sciences launched "Yin Qi Wenyuan" in October 2019, using big data and cloud computing to make oracle bone resources digital and intelligent and working together to build an oracle bone big data platform. The platform consists of "three libraries and one platform": the catalog library, the glyph library, and the literature library, plus the oracle bone knowledge service platform. It currently holds 153 oracle bone catalogs, 239,289 images, more than 4,000 oracle bone head characters, and 34,234 academic works, and it is still being updated. By developing multi-dimensional information annotation, the platform links glyphs with glyphs and glyphs with related reference works, catalogs, and literature. This addresses the obstacles that difficult text input and cumbersome annotation had posed to the large-scale sharing and promotion of oracle bone inscription and literature resources. The platform is free and open to the world, and it also provides dedicated public datasets for artificial intelligence research as well as information resource integration services. It is currently in its fourth phase of research, development, and construction.
Digital intelligence empowers the protection and inheritance of oracle bones
As oracle bone research advances, it produces a large amount of oracle bone knowledge data: rubbings, photographs, and hand copies in catalogs; the three-dimensional data that has appeared in recent years; radicals, single characters, and variant forms of oracle bone glyphs; and a large body of research literature. These multi-dimensional, multimodal data are essential to oracle bone research and form the data foundation for oracle bone information processing in the new era. Here we mainly discuss digital and intelligent applications in organizing oracle bone catalog images, glyphs, and research literature, with the aim of promoting innovative compilation of oracle bones and helping oracle bone research achieve new breakthroughs.
Character detection and recognition in oracle bone catalogs. In digitizing existing oracle bone catalogs, the first problem to solve is the detection and recognition of oracle bone characters, which is the basis for automatic computer processing of oracle bone image data. On the one hand, digitizing existing catalogs improves the research efficiency of oracle bone experts; appropriate search techniques in particular (searching images by character, searching characters by character, and searching images by image) can multiply the efficiency with which scholars look up material. On the other hand, using computer vision to detect and recognize characters in catalog images not only speeds up the digitization of oracle bone documents but can also support research on other ancient scripts and the promotion and dissemination of oracle bone culture.
Traditional oracle bone character recognition is generally divided into feature extraction and feature classification. Feature extraction obtains the distinctive features of an oracle bone character image; feature classification uses the extracted features to decide which oracle bone character the image represents. Common feature extraction methods include the scale-invariant feature transform (SIFT), the histogram of oriented gradients (HOG), Gabor filters, and local binary patterns (LBP). The most common classifier is the support vector machine (SVM). As these traditional methods show, the design of feature extractors and classifiers depends heavily on the algorithm designer, and recognition results vary greatly with the choice of features and classifier. This is a problem of traditional image recognition in general.
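The two-stage pipeline described above can be sketched in a few lines of pure Python. This is a toy illustration under stated assumptions, not any published oracle bone recognizer: the feature is a basic 8-neighbour local binary pattern histogram, a nearest-neighbour rule stands in for the support vector machine to keep the example dependency-free, and the 7x7 "stroke" images and all function names are invented for the demonstration.

```python
def lbp_histogram(img):
    """Normalized 256-bin local binary pattern histogram of a 2D list of ints."""
    h, w = len(img), len(img[0])
    # Eight neighbours, clockwise from the top-left; each contributes one bit.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    n = (h - 2) * (w - 2)
    return [v / n for v in hist]


def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(feature, labelled_features):
    """Return the label of the closest training feature (nearest neighbour)."""
    return min(labelled_features, key=lambda item: l1_distance(feature, item[1]))[0]


def stroke(kind, pos, size=7):
    """Toy image: one horizontal or vertical line of 1s on a background of 0s."""
    img = [[0] * size for _ in range(size)]
    for i in range(size):
        if kind == "horizontal":
            img[pos][i] = 1
        else:
            img[i][pos] = 1
    return img


# Stage 1: extract features from labelled examples.
training = [
    ("horizontal", lbp_histogram(stroke("horizontal", 3))),
    ("vertical", lbp_histogram(stroke("vertical", 3))),
]

# Stage 2: classify unseen images whose strokes sit at different positions.
prediction_a = classify(lbp_histogram(stroke("horizontal", 2)), training)
prediction_b = classify(lbp_histogram(stroke("vertical", 4)), training)
print(prediction_a, prediction_b)
```

Swapping in a different feature (for example HOG) or a different classifier changes the results, which is exactly the designer dependence noted above.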
In recent years, recognition technology based on deep neural networks has made great progress. It requires no manual feature selection and can deliver end-to-end recognition. A representative result is the oracle bone character recognition method based on hierarchical representation proposed by Wang Changhu and colleagues at Microsoft Research Asia.
The oracle bone rubbing dataset OBC306, released by the Key Laboratory of Oracle Bone Information Processing of the Ministry of Education at Anyang Normal University together with South China University of Technology, improved the recognition rates of different convolutional neural networks (CNNs). Liu Chenglin's team at the Institute of Automation, Chinese Academy of Sciences, proposed a nearest-neighbor classification method based on deep metric learning that uses hand-copied glyphs to assist the recognition of rubbing glyphs. Yang Zhengfeng and colleagues at East China Normal University improved the VGG model and achieved a highest recognition accuracy of 99.5% on their self-built dataset OBI100. Wang Qiufeng and colleagues at Xi'an Jiaotong-Liverpool University proposed a hybrid augmentation strategy for oracle bone character recognition. Fu Yanwei and colleagues at Fudan University proposed a data augmentation method for few-shot learning of oracle bone characters. Meng Lin and colleagues at Ritsumeikan University in Japan proposed a dynamic data augmentation method that performs well on their self-built dataset OBI125. Compared with traditional oracle bone character recognition, these methods represent very clear progress.
Although these recognition methods report good results, most of them cover only the 100 to 300 glyph categories that occur most frequently, and they are evaluated on oracle bone rubbings for which relatively many samples exist. More attention is therefore needed to the poor performance of recognition models on low-frequency character categories, which is caused by the extremely imbalanced distribution of oracle bone data. To this end, we have worked with Tencent. Building on the annotation and processing of the oracle bone inscriptions in "Yin Qi Wenyuan", and using customized algorithms, we have continuously enriched and improved the oracle bone model library. So far we have built the world's largest oracle bone single-character database, covering 1.43 million characters, which improves the efficiency of recognizing and interpreting oracle bone inscriptions and of extracting content from oracle bone publications.
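One generic way to counter this kind of imbalance during training is to weight each class inversely to its frequency, so that rare characters contribute as much to the loss as common ones. The sketch below shows that heuristic only; it is a common general technique and is not claimed to be the customized algorithm used for the "Yin Qi Wenyuan" model library. The label names and counts are made up for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical long-tailed label distribution: one common, one uncommon, one rare character.
labels = ["common"] * 90 + ["uncommon"] * 9 + ["rare"] * 1
weights = inverse_frequency_weights(labels)

for label in ("common", "uncommon", "rare"):
    print(label, round(weights[label], 3))

# With these weights every class carries the same total weight in the training loss.
total_weight_per_class = {
    label: weights[label] * count for label, count in Counter(labels).items()
}
```

Reweighting does not create new information about rare characters, which is why the augmentation and larger annotated databases mentioned above remain necessary.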
Oracle bone character encoding and input methods. Although oracle bone script is already a relatively mature writing system, it has no standardized strokes, many variant forms, and a large number of undeciphered characters whose pronunciation is unknown, so computer input of oracle bone characters faces great challenges. The encoding and glyph problems of oracle bone script have always been a focus of oracle bone research and are among the key issues of oracle bone digitization. As for encoding schemes, neither mapping oracle bone characters to the code points of their modern Chinese counterparts nor re-encoding them in the Private Use Area of the Unicode code space can fully handle the many variant forms or the fact that the set of oracle bone glyphs grows and changes as research deepens. The urgent task is therefore to settle a basic glyph standard for oracle bone characters and to bring oracle bone script into the international Unicode encoding process. Once the script passes international standard review and its position in the Unicode code space is fixed, this will lay the foundation for building oracle bone glyph libraries, input methods, and digital publishing.
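To make the Private Use Area option concrete, the following Python sketch assigns private-use code points to a few glyph identifiers. The identifiers (`OB-0001` and so on) and the function name are hypothetical; only the range U+E000 to U+F8FF and the `Co` (private use) general category are properties of Unicode itself.

```python
import unicodedata

# Private Use Area of the Basic Multilingual Plane (6,400 code points).
PUA_START, PUA_END = 0xE000, 0xF8FF


def assign_private_codepoints(glyph_ids, start=PUA_START):
    """Map each glyph identifier to the next free private-use character."""
    mapping = {}
    for offset, glyph_id in enumerate(glyph_ids):
        code_point = start + offset
        if code_point > PUA_END:
            raise ValueError("ran out of BMP private-use code points")
        mapping[glyph_id] = chr(code_point)
    return mapping


# Hypothetical identifiers: a base glyph, one of its variants, and a second glyph.
glyph_ids = ["OB-0001", "OB-0001-var1", "OB-0002"]
mapping = assign_private_codepoints(glyph_ids)

for glyph_id, char in mapping.items():
    print(glyph_id, f"U+{ord(char):04X}", unicodedata.category(char))
```

Such a table is meaningful only to software and fonts that share it, and every newly identified variant needs a new private assignment, which illustrates why a standardized encoding is the more durable solution.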
An oracle bone input method is the basis for digital editing of oracle bone texts. Current use is no longer limited to personal computers; it increasingly involves text display on web pages and digital publishing workflows. Because oracle bone script differs greatly from modern Chinese characters, research on oracle bone input methods faces great challenges. The currently feasible approach is to compile a concise, easy-to-use code table; existing input methods draw on the pinyin commonly used by experts and scholars, on code-based entry, on handwriting, and on visual selection, and each has its own strengths. We believe that as oracle bone digitization deepens and a unified encoding standard is established, an oracle bone input method that is usable, easy to use, and sufficient will certainly be perfected, allowing oracle bone script to truly "come alive".
Fragmentation and heterogeneous-data processing of oracle bone research literature. Oracle bone literature is among the most complex of all document types and is very challenging in layout, text, and images. At present, the "Yin Qi Wenyuan" research group has collected 34,234 oracle bone research documents from the past 120 years. The digital platform supports searching by title, abstract, author, keywords, and other bibliographic information and downloading the corresponding PDF files, but full-text retrieval and image retrieval are not yet available.
As oracle bone research deepens, searching a database for isolated articles only by title, author, or keyword can no longer meet the needs of the field. Knowledge services such as intelligent retrieval and related-content recommendation, led by knowledge graph technology, are what more and more oracle bone scholars now ask for. However, existing oracle bone databases generally cannot extract content directly from documents that consist mainly of scanned images; these scans need deeper processing. Specifically, the scanned images of a document are converted into unstructured data made up of heterogeneous elements such as text, pictures, and charts, which are then split into fine-grained information units at the word level, finally forming an XML document composed of heterogeneous data.
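As a rough illustration of the last step, the sketch below assembles a sequence of text and image units into an XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`document`, `unit`, `type`, `src`), the sample sentence, and the file name are assumptions made for this example; the actual schema used by the platform is not described in the text.

```python
import xml.etree.ElementTree as ET


def build_document(title, units):
    """Build an XML tree from (kind, payload) units in reading order.

    kind is "text" (payload is the string) or "image" (payload is a file reference).
    """
    root = ET.Element("document", {"title": title})
    for index, (kind, payload) in enumerate(units):
        unit = ET.SubElement(root, "unit", {"id": str(index), "type": kind})
        if kind == "text":
            unit.text = payload
        else:
            # Glyphs that OCR cannot recognize are kept as image references.
            unit.set("src", payload)
    return root


# Hypothetical fragment: running text interrupted by one unrecognized glyph.
units = [
    ("text", "The inscription reads"),
    ("image", "glyph_0001.png"),
    ("text", "on the plastron."),
]
root = build_document("Sample article", units)
xml_string = ET.tostring(root, encoding="unicode")
print(xml_string)
```

Keeping each unit typed and ordered is what later allows text units to be indexed for full-text search while image units are routed to glyph recognition.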
Unlike modern documents, oracle bone literature published before the founding of New China was constrained by the printing technology of the time and affected by the New Culture Movement's impact on writing conventions. It usually follows no unified rules for layout, word usage, or punctuation, so conventional fragmentation tools cannot be applied to it directly. Moreover, whereas the heterogeneous data in modern documents usually consists only of illustrations, oracle bone literature frequently contains rare characters, transcribed (liding) characters, ancient script characters, and other glyphs that existing character recognition technology cannot handle reliably; these glyphs must also be stored as image data. As a result, image data makes up a far larger share of the heterogeneous data in oracle bone literature than in modern literature, and the need to process so many images multiplies the difficulty of manual compilation. At present, heterogeneous-data processing of oracle bone literature relies mainly on manual input, with OCR tools used as an aid for some articles that contain no oracle bone characters. Overall progress is slow, and only a small number of articles have been processed.
Digitizing oracle bone literature yields digital material that computers can retrieve, link, and analyze, laying the foundation for convenient and intelligent oracle bone research; using artificial intelligence to organize oracle bone literature has become the direction of future development. Document digitization can also offer a range of intelligent services to oracle bone researchers and enthusiasts, such as recognizing handwritten oracle bone characters in images and retrieving information related to characters on rubbings, continually expanding the breadth and depth of oracle bone research.
For automatic heterogeneous-data processing of oracle bone literature, artificial intelligence is used to perform document analysis and character recognition on page images. According to the compilation requirements, the data type of each part of the page image is identified, the content is extracted as text, pictures, or other heterogeneous data, and the result is stored in the database in XML format. This approach suits not only oracle bone literature but can be extended to any in-depth processing task involving documents on ancient scripts. Building on the heterogeneous-data and knowledge-oriented processing of the literature, the literature library will be linked with the oracle bone glyph library and catalog library so that the three libraries are associated with one another, and intelligent retrieval services based on knowledge reasoning will be provided using the extracted semantic information.
The oracle bone full-information data model and the digital revitalization of oracle bones. After systematically surveying the institutions that hold and study oracle bones, we found that digitization faces two major problems: how to achieve high-fidelity digital restoration of the physical objects, and how to achieve efficient digital search of the text. Since April 2022, we have worked with Tencent in a co-creation team to explore the use of artificial intelligence: "micro-trace analysis" is used for three-dimensional modeling of the physical objects, and "glyph matching" is used to search characters by character and search images by character. The aim is high-fidelity display of the objects, efficient text queries, and high-quality association between objects and text.
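The idea behind searching by glyph shape can be illustrated with a very simple similarity measure. The sketch below ranks library glyphs by the overlap (intersection over union) between binary bitmaps; this measure is a stand-in chosen for clarity, since the text does not describe the actual "glyph matching" algorithms, and the 5x5 shapes and names are invented for the demonstration.

```python
def to_pixels(rows):
    """Convert rows of '#' and '.' into a set of (row, col) ink coordinates."""
    return {(y, x) for y, row in enumerate(rows) for x, ch in enumerate(row) if ch == "#"}


def iou(a, b):
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(query, library, top_k=2):
    """Return the top_k (score, name) pairs most similar to the query glyph."""
    scored = [(iou(query, pixels), name) for name, pixels in library.items()]
    return sorted(scored, reverse=True)[:top_k]


# Toy glyph library with invented shapes.
library = {
    "cross": to_pixels(["..#..", "..#..", "#####", "..#..", "..#.."]),
    "bar": to_pixels([".....", ".....", "#####", ".....", "....."]),
    "box": to_pixels(["#####", "#...#", "#...#", "#...#", "#####"]),
}

# Query: a cross whose right arm is partly worn away, as on a damaged rubbing.
query = to_pixels(["..#..", "..#..", "####.", "..#..", "..#.."])
results = search(query, library)
for score, name in results:
    print(name, round(score, 3))
```

Real rubbings differ in scale, position, and noise, so a practical system would compare learned features rather than raw pixels; the ranking step, however, works the same way.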
To overcome the situation in which oracle bone data is scattered and hard to connect, the co-creation team developed the "oracle bone full-information data model", in which high-quality data such as three-dimensional models and text associations are layered and aligned by coordinates with traditional data such as hand copies and rubbings. Under this coordination mechanism, we propose integrating artificial intelligence: "micro-trace extraction" technology goes beyond rubbing, photography, and hand copying to restore the details of the oracle bones in high-fidelity display, while the oracle bone data is fused across multiple dimensions. The result is an extensible, cross-media "oracle bone full-information data model" with multi-layer information aligned by coordinates, which truly revitalizes the physical oracle bones. Some of the results were presented in the "Great Oracle Bone Inscriptions" WeChat mini program released on April 20, 2023, and received attention and praise from the field. In addition, through an authoritative, professional, practical, engaging, and co-created digital online platform, we hope to let more of the general public understand, experience, study, and use oracle bone inscriptions, so that the inheritance and dissemination of oracle bones proceed smoothly.
Auxiliary textual research on oracle bone inscriptions based on "Yin Qi Wenyuan 2.0". With the progress of artificial intelligence in the information age, advancing the textual study and interpretation of oracle bone inscriptions on the basis of big data is clearly a new idea and method. The questions are how to expand oracle bone data, how to create more data support, and how to use relatively mature artificial intelligence, especially deep learning, for auxiliary textual research. At present, in the fourth phase of "Yin Qi Wenyuan", we are mainly cleaning the underlying oracle bone data; updating the catalog library, glyph library, literature library, and rejoining library; building the "Yin Qi Wenyuan 2.0" oracle bone character model library; providing a data toolbox for searching characters by character and searching images by character based on a family of glyph matching algorithms; building an oracle bone knowledge graph; and using the "glyph matching" AI algorithms together with a "human-computer collaboration" model to assist in deciphering oracle bone inscriptions.
The construction of digital services for oracle bones has greatly advanced in-depth oracle bone research. In recent years in particular, the development of new artificial intelligence technologies, chiefly deep learning, has received great attention at the national level, which points to a bright future for oracle bone research empowered by digital intelligence. Although many technical difficulties and other challenges remain, we believe that integrating new technologies and new methods and carrying out more in-depth interdisciplinary research will bring oracle bone culture to life in modern society and promote the creative transformation and innovative development of oracle bone inscriptions and other ancient scripts.
(This article is a phased result of the planning projects "Yin Qi Wenyuan: Oracle Bone Inscriptions Data Platform" (G2812) and "Identification and Extraction Technology of Heterogeneous Data in Oracle Bone Documents" (G1806) of the Ancient Script and Chinese Civilization Inheritance and Development Project, and of the China University Industry-University-Research Innovation Fund project "Research on Personalized Teaching Service Platform for Oracle Bone Inheritance and Innovation" (2021RYA05002).)
(Authors' affiliations: Key Laboratory of Oracle Bone Information Processing, Ministry of Education, Anyang Normal University; Institute of Ancient History, Chinese Academy of Social Sciences; Chinese Character Civilization Research Center, Zhengzhou University)