Actively Pursue Research on Detecting Voice Deepfakes
April 6, 2021, 09:30 Source: "China Social Sciences", April 6, 2021, Issue 2140

Today, mobile phones, voice recorders, recording telephones, and similar devices have become common tools in people's lives and work. In particular, as WeChat voice messages, mobile phone recording, and similar functions have grown more capable, recordings appear more and more frequently as evidence in litigation. Recordings are formally listed as one of the statutory forms of evidence in China's "Criminal Procedure Law" and "Civil Procedure Law", and they play an increasingly important role in legal proceedings. As a result, determining "whether a recording was spoken by a particular person" (that is, speaker identification) has become an important part of forensic voice examination and research.

In recent years, with the rapid development of science and technology, artificial intelligence (hereinafter "AI") has been applied in more and more fields, including humanoid robots, automatic recognition (of fingerprints, faces, voices, and so on), and intelligent medical care. Voice deepfaking refers to a new class of techniques that use AI (such as machine learning algorithms and neural networks) to reenact, replace, edit, and synthesize a speaker's voice. The emergence of this technology means that "your voice no longer belongs to you": anyone's voice can be forged and replaced.

Because voice deepfake technology has enormous development potential and application value in medical rehabilitation (for example, "reconstructing" the voice of patients who have lost theirs) and in entertainment (for example, comedy videos), a great deal of effort has been invested worldwide in developing and promoting it, and the related techniques are steadily maturing.

Compared with face deepfakes, voice deepfakes emerged later, rising mainly in 2019. A voice deepfake system is, in essence, a text-to-speech system (hereinafter "TTS system"). Early TTS systems used speech synthesis to convert input text into a corresponding speech signal. However, as E. Helander and J. Nurminen pointed out, the speech synthesized by early TTS systems was unsatisfactory in naturalness, intelligibility, and continuity, and was often described as sounding like a machine (a "robotic voice"). In recent years, with continued progress in speech synthesis, the quality of the speech generated by TTS systems has improved greatly in all of these respects.

T. Chen points out that voice deepfake technology is a speech synthesis technique that combines a high-quality TTS system with voice conversion. First, a computer uses machine learning algorithms (such as Gaussian mixture models (GMM) or convolutional neural networks) to learn the characteristics of the target speaker from voice samples and builds a corresponding TTS system. Then, text obtained through typed input, voice conversion, or other means is converted into a speech signal, either in real time or with a delay.

At present, deepfake voices have improved greatly in human-likeness, realism, and naturalness. In addition, voice deepfake software for different languages (such as Chinese, English, and Vietnamese) has been released to the public, and the threshold and difficulty of using it are gradually falling.

 Potential threats deserve attention

Programs and software with voice deepfake functions were originally released for medical, entertainment, and similar applications. However, it cannot be ruled out that criminals will use such products for illegal purposes, for example by forging the voices of socially influential people to spread fake news, or by forging the voice of an acquaintance to commit fraud or obtain other people's information.

There is no doubt that voice deepfake technology has enormous development potential and application value. However, as the threshold and difficulty of using voice deepfake software fall, once criminals use such software for fraud, drug trafficking, or other illegal activities, it will pose a serious challenge to social trust, the authenticity of news, surveillance, and the collection of judicial evidence. Against this background, ensuring the authenticity and integrity of recordings, safeguarding judicial fairness, and protecting the integrity of news have become urgent needs of today's society.

 Detection technology is still in the exploratory stage

Compared with the detection of face deepfakes, the detection of deepfake voices has so far received less attention and is still in the exploratory stage. Early researchers used spectral features of speech (such as CQCC and MFCC) together with machine learning algorithms such as Gaussian mixture models and deep neural networks (DNN) to build automatic fake-voice detection systems, but their accuracy in telling genuine from fake was unsatisfactory (below 70%). Subsequently, M. Shan and T.-J. Tsai proposed a cross-verification method based on the Needleman-Wunsch algorithm, which aligns two recordings and then compares the differences frame by frame. Some researchers have also tried to apply face recognition techniques to automatic fake-voice detection. T. Chen et al. drew on face deepfake detection to build an automatic fake-voice detection system that uses the large margin cosine loss function (LMCL) to maximize the difference between genuine and fake voices while minimizing the variation within each of the two classes. Similarly, B. Thai also borrowed from face recognition, proposing to feed voice features extracted by signal processing or by convolutional neural networks into a long short-term memory (LSTM) model and then use a classification layer to decide whether the voice is forged.
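
The alignment-then-compare idea can be illustrated with a minimal sketch of Needleman-Wunsch global alignment applied to sequences of per-frame feature vectors. This is not the method of M. Shan and T.-J. Tsai as published; the similarity function, the `gap_penalty` and `match_threshold` values, and the function name `align_frames` are all assumptions chosen for illustration.

```python
from math import dist


def align_frames(frames_a, frames_b, gap_penalty=1.0, match_threshold=0.5):
    """Globally align two sequences of feature frames (Needleman-Wunsch).

    Returns (score, alignment). The alignment is a list of (i, j) index
    pairs; None on either side marks a gap (a frame with no counterpart).
    """
    n, m = len(frames_a), len(frames_b)

    def sim(x, y):
        # Assumed similarity: positive for close frames, negative for distant ones.
        return match_threshold - dist(x, y)

    # score[i][j] = best score aligning the first i frames of A with the first j of B.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(frames_a[i - 1], frames_b[j - 1]),
                score[i - 1][j] - gap_penalty,  # frame of A left unmatched
                score[i][j - 1] - gap_penalty,  # frame of B left unmatched
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    alignment = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + sim(frames_a[i - 1], frames_b[j - 1])):
            alignment.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] - gap_penalty):
            alignment.append((i - 1, None))
            i -= 1
        else:
            alignment.append((None, j - 1))
            j -= 1
    alignment.reverse()
    return score[n][m], alignment


# Toy one-dimensional "features": recording B is recording A with one frame removed.
rec_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
rec_b = [(0.0,), (1.0,), (3.0,)]
total, pairs = align_frames(rec_a, rec_b)
for i, j in pairs:
    print(i, j)
```

In a real comparison the frames would be spectral feature vectors rather than single numbers, and the gaps and large per-frame distances in the alignment would be the places to examine for possible editing or replacement.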

At present, research on detecting deepfake voices is seriously insufficient, and its results are not yet satisfactory. In particular, forensic practice generally relies primarily on examination by human experts, supplemented by computer-based quantitative analysis. Research on voice deepfake detection is exactly the opposite: it concentrates on automatic identification by computer. This gap will undoubtedly hamper both research on and the practice of fake-voice detection, and it deserves sufficient attention in the future.

  Research on the detection problem

To further raise the level of research on voice deepfake detection, to gradually establish a scientific, accurate, and comprehensive detection workflow, and to prepare for the potential threat of deepfake voices, research can be pursued in the following directions.

First, carry out detection research from the perspective of macro-level language characteristics. Unlike micro-level spectral features, speech characteristics (such as pet phrases, wording, dialect expressions, and pronunciation habits) reflect, from a macro perspective, the speaker's individual characteristics at the level of discourse. E. Sapir held that such characteristics are inseparable from factors such as the speaker's language acquisition, gender, social background, and occupation. Voice deepfake technology that works from acoustic features has difficulty simulating a speaker's macro-level language characteristics, which leaves ample room for expert examination of fake voices. Future research can analyze voices from the perspective of speech characteristics and look for effective carriers and salient features that reflect a speaker's individual way of speaking. The results would have high reference value for verifying the authenticity of a voice.

Second, explore the spectral differences between genuine and fake voices. Although a deepfake voice closely resembles the original voice, subtle differences between the two can still be found with professional software. Technical experts at the company NIOS used Spectrum3D software to compare the spectral characteristics of a deepfake voice and the original voice. They found that although the two sound very similar to the ear, the spectral distribution of the fake voice is less regular and shows repeated patterns in the high-frequency region. One possible cause is that, in order to increase its similarity to the original voice, the deepfake software synthesizes the voice from multiple audio channels. Furthermore, after the speech signal is amplified, weak background noise can be detected in the original voice, whereas the fake voice shows no trace of noise. Genuine and fake voices therefore differ to some degree in their frequency-domain distribution and in their background noise. Future research and practice should make full use of professional analysis software to compare the spectra of genuine and fake voices and distill the regularities.
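
The background-noise observation can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure used with Spectrum3D: it estimates a recording's noise floor as a low quantile of per-frame RMS levels, on the assumption that the quietest frames are pauses between speech. The function names, the 160-sample frame length, the 0.1 quantile, and the synthetic test signals are all hypothetical choices.

```python
import math
import random


def frame_levels_db(samples, frame_len=160, silence_db=-120.0):
    """RMS level in dB (relative to full scale 1.0) of each non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Digital silence has no finite dB value, so clamp it to silence_db.
        level = 20.0 * math.log10(rms) if rms > 0.0 else silence_db
        levels.append(max(level, silence_db))
    return levels


def noise_floor_db(samples, frame_len=160, quantile=0.1):
    """Estimate the noise floor as a low quantile of the frame levels."""
    levels = sorted(frame_levels_db(samples, frame_len))
    if not levels:
        raise ValueError("signal is shorter than one frame")
    return levels[int(quantile * (len(levels) - 1))]


def tone_bursts(noise_sigma, n_frames=40, frame_len=160, rate=8000, seed=0):
    """Synthetic stand-in for speech: tone bursts alternating with pauses."""
    rng = random.Random(seed)
    samples = []
    for i in range(n_frames * frame_len):
        speaking = (i // (frame_len * 10)) % 2 == 0
        value = 0.5 * math.sin(2 * math.pi * 200 * i / rate) if speaking else 0.0
        if noise_sigma > 0.0:
            value += rng.gauss(0.0, noise_sigma)  # weak room noise
        samples.append(value)
    return samples


with_room_noise = tone_bursts(noise_sigma=0.001)  # stands in for a real recording
digitally_clean = tone_bursts(noise_sigma=0.0)    # stands in for a synthesized voice
print("floor with room noise:", round(noise_floor_db(with_room_noise), 1), "dB")
print("floor without noise:  ", round(noise_floor_db(digitally_clean), 1), "dB")
```

A noise floor that sits at digital silence during pauses is only a hint, not proof of forgery: noise suppression in a phone or messaging app could produce the same picture, so an examiner would weigh it together with other evidence.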

Third, try to improve algorithms and explore new angles to further strengthen automatic detection by computer. Scholars such as M. Alzantot and B. Chettri have improved the machine learning algorithms involved (for example, by using 2-D convolutional neural networks) and raised the accuracy of distinguishing genuine from fake voices to about 75%. In addition, scholars such as T. Mittal et al. have proposed detection from the perspective of emotion recognition, using deep learning networks to verify the authenticity of the faces and voices in a video. They first used perception experiments to judge the emotions expressed by the face and by the voice, then extracted and learned the features of the different emotions based on the perception results. In the end, detection that used emotional features as the basis for judgment reached an accuracy of 84.4% or more. Improvements in algorithms and in perspective therefore do help computer-based detection, and they merit broader and deeper research in the future.

Voice deepfake technology is a technological innovation of the AI era, with important impact and significance for medical rehabilitation, entertainment, and other fields. However, its emergence will undoubtedly bring certain security risks to society. Facing the challenges that may arise, forensic examination work must make full use of existing resources and prepare in advance for such problems. At the same time, it should actively carry out related research, accumulate knowledge and experience in detecting fake voices, and promote the development of automatic computer systems for fake-voice detection.

  (Author affiliation: Voice and Electronic Data Appraisal Research Office, Institute of Judicial Appraisal Science)

Editor in charge: Zhang Jing