Classified and graded governance of public data in the era of large models
September 14, 2023 15:49 Source: "China Social Sciences", September 14, 2023, Issue 2735 Authors: Huang Chengfeng, Ding Wanfu

New artificial intelligence technologies, represented by large language models, have achieved important breakthroughs. While they bring opportunities for economic and social development, they have also given rise to problems of data security, privacy, and infringement of personal information rights and interests. Recently, seven ministries and commissions led by the Cyberspace Administration of China issued the "Interim Measures for the Management of Generative Artificial Intelligence Services". For the first time, the Measures propose inclusive, prudent, and classified and graded supervision of generative AI services, clarify the requirements for training data processing activities and data labeling, and set out norms for generative AI services. The Measures also propose promoting the orderly opening of public data by category and grade and expanding high-quality public training data resources. In the era of large models, public data security will face great risks and challenges. The development of generative AI requires a stronger and more effective supply of high-quality public data, and it also requires stronger security protection for public data, so that public data is classified, graded, and used compliantly and the risks of AI services are prevented.

  New changes in data security risks in the era of large models

In recent years, China has successively introduced a series of laws and regulations, including the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law. These establish a classified and graded system for data security protection, protect personal information, and give equal weight to safeguarding data security and promoting the opening and use of data. It should be recognized that training generative AI involves the fused use of data from multiple sources. Research indicates that public data resources account for about 80% of the total data resources of society as a whole, so public data security and privacy will be an extremely important issue in the development and use of large models. Data security risks in the era of large models are undergoing new changes.

First, as data moves from text to multimodal data, classification and grading become more difficult. Although OpenAI has not disclosed the amount of data used for GPT-4, publicly available information shows that GPT-3 has 175 billion parameters, and GPT-4, which builds on it, clearly requires an even larger amount of data as support. Large models are accelerating toward multimodality, and data types are expanding from text to images, audio, and video. The demand for high-quality, large-scale, and diverse data poses challenges for data classification and grading and for its cost.

Second, as protection moves from static data to the full data life cycle, the data security environment is becoming increasingly complex. Traditional data security focuses mainly on the static protection of data entities. Large models rely on massive data for training and learning, so data shifts from static to flowing, and data security scenarios change accordingly. Data security must not only protect the data entity but also, on the basis of data classification and grading, regulate personal information processing and data protection across the full life cycle from training to use. For example, in the data collection phase, web crawling, direct collection from personal information subjects, data transactions, and other methods involve a large number of compliance risk points. In the data preprocessing phase, cleaning, standardizing, labeling, and extracting features from the collected data may raise infringement issues.

Third, as responsibility moves from a single entity to multiple entities, the situation for data security management and control is becoming more severe. Developing a generative AI model involves multiple steps, including data collection, data labeling, model training, and model optimization. Because model development involves a large amount of data processing, these steps are usually not carried out by the same entity. Instead, they are completed jointly through division of labor and collaboration among different entities in the industry. For example, enterprises usually contract out data labeling tasks to companies or individuals through their own platforms, which makes management more difficult and poses greater challenges to fulfilling compliance obligations. The many steps in model development, and the processing of data by multiple entities, make it harder to assign responsibility for data security risks and to trace those risks to their source.

  Path choices for classified and graded governance of public data

Classified and graded governance of public data is a complex systems project. It requires fully, accurately, and comprehensively implementing the decisions and deployments of the Party Central Committee, sorting out the security risks that data faces across the full life cycle of collection, pre-training, and result output, and taking classification and grading as the key lever for building a classified and graded governance system for public data in the era of large models.

First, at the institutional level, coordinate data security and development. Data classification and grading is the prerequisite for data security protection and for the market-based allocation of data as a factor of production in the era of large models. Only when classification and grading are done well can more refined data security management measures be adopted. At the national level, a framework for a classified and graded data protection system needs to be established as soon as possible, with a clear, specific, and operable requirements list and negative list for large-model training data, and with differentiated management measures for data of different types and risk levels. The scope of legal application of data classification and grading should be broadened in a timely manner. It cannot be confined to a security-protection perspective: data supervision and rules must be emphasized, and so must data development and use. Public data authorization should be accelerated, and data activities such as public data authorization, processing, operation, and security supervision should be standardized, so that public data resources enter the primary market in an orderly and compliant way.

Second, at the management level, establish a multi-party linkage mechanism. A linkage mechanism should be established in which government departments, industry organizations, and other entities participate, so that they can give timely feedback on and jointly govern the new risks, new developments, and new challenges facing AI data security, and achieve pluralistic co-governance of large-model data security and development. The government must play the leading role, applying classified and graded supervision to data operators, research support institutions, data trading institutions, and other entities participating in public data governance, and enforcing the relevant requirements on network security, data security, and personal information protection. Industry organizations may formulate data classification and grading standards for their industries in accordance with the law and, taking into account the industry's specific application scenarios and the attributes and importance of its data, formulate and promote data security specifications and group standards. At the same time, managers, developers, and researchers across the generative AI industry chain should be encouraged to discover data assets lawfully and compliantly, open up data resources, and participate in data transactions.

Third, at the technical level, innovate intelligent classification and grading methods. Public training data resource platforms should be built, focusing on public data domains such as finance, healthcare, transportation, and space, to provide secure and trusted environments for data cleaning and processing and to create high-quality AI training datasets and Chinese corpus data. Natural language processing, convolutional neural networks, and other technologies can be used to identify public data intelligently: a data-catalog recognition model performs quick grading, a sample-data recognition model performs in-depth grading, and data assets are scanned dynamically, so that public data is classified and graded in an intelligent, automated way. Based on word segmentation results, word attributes, and the different entities responsible for the data, secure multi-party computation, blockchain, and other new technologies should be explored to build a matching mechanism for the open sharing of public datasets, so that public data circulates in a trusted way on the premise that it remains secure and controllable.

  (The authors are, respectively, Senior Engineering Specialist at the Smart Social Governance Research Center, Zhijiang Lab; and Professor-level Senior Engineer at the Smart Social Governance Research Center, Zhijiang Lab, and Director of the Information Technology Department)

 

Editor in charge: Cui Bohan