Intelligent speech technology improves the efficiency of spoken-language learning
December 22, 2020 09:22 Source: "Chinese Social Sciences", December 22, 2020, Issue 2074 Authors: Fang Qiang, Li Aijun, Wang Shijin

Language learning covers vocabulary, grammar, speaking, and other aspects. From the standpoint of communication, speaking may be the most critical. Traditional oral teaching requires teachers whose own pronunciation is standard and who can evaluate learners' pronunciation in real time and correct their errors. In actual teaching, however, teachers are few, students are many, and class time is limited, so it is hard for teachers to give each student one-to-one spoken guidance and feedback in class. To some extent this lowers the learning efficiency of some students and their enthusiasm for classroom interaction. Using modern technology to relieve these pain points in oral teaching and to make up for the shortcomings of classroom teaching therefore has both practical value and social significance. With the development of statistical machine learning and deep neural networks, speech technology has made great progress in key areas such as speech synthesis, speech recognition, and pronunciation reverse analysis (inferring the movements of the speech organs from the speech signal), which makes it possible to address these pain points with speech technology.

  Speech synthesis technology

Speech synthesis converts input text into speech. A traditional speech synthesis system usually consists of two modules, a front end and a back end. The front end analyzes the input text and extracts the linguistic information the back end needs. For a Chinese synthesis system, the front end generally contains sub-modules for text normalization, word segmentation, part-of-speech prediction, polyphone disambiguation, and prosody prediction. The back end takes the front-end analysis results and generates the speech waveform by some method.
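To make the front-end stages more concrete, here is a minimal sketch of three of them: text normalization, word segmentation, and polyphone disambiguation. It is a toy under stated assumptions, not how any production system works. The tiny lexicon, the digit table, and the function names (`normalize`, `segment`, `to_pinyin`, `front_end`) are all hypothetical, segmentation uses simple forward maximum matching, and part-of-speech and prosody prediction are left out.

```python
# Toy Chinese TTS front end (illustrative only; all data here is hypothetical).

# Text normalization table: written digits -> Chinese numerals.
DIGITS = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
          "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}

# Word-level pronunciation lexicon. The character 行 is a polyphone:
# it is read "hang2" in 银行 (bank) and "xing2" in 行走 (to walk).
LEXICON = {
    "我": ["wo3"],
    "去": ["qu4"],
    "银行": ["yin2", "hang2"],
    "行走": ["xing2", "zou3"],
}


def normalize(text):
    """Replace each digit with its Chinese numeral; keep other characters."""
    return "".join(DIGITS.get(ch, ch) for ch in text)


def segment(text, lexicon, max_len=4):
    """Forward maximum matching: take the longest lexicon word at each position."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop advances.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def to_pinyin(words, lexicon):
    """Look up each word; the word context resolves the polyphone."""
    syllables = []
    for word in words:
        # Unknown words are passed through unchanged in this sketch.
        syllables.extend(lexicon.get(word, [word]))
    return syllables


def front_end(text):
    return to_pinyin(segment(normalize(text), LEXICON), LEXICON)


print(front_end("我去银行"))
print(front_end("行走"))
```

Resolving polyphones by looking up whole words is only the simplest strategy; it is used here because it keeps the example short.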

In practice, speech quality is evaluated by subjective or objective methods. Subjective methods have listeners score the speech in listening tests, for example the Mean Opinion Score (MOS), the Diagnostic Rhyme Test (DRT), the Degradation Mean Opinion Score (DMOS), and the Diagnostic Acceptability Measure (DAM). Objective methods use algorithms to evaluate speech quality. Speech synthesis papers often assess the quality of synthesized speech by computing how closely it matches real speech. However, we still cannot build an objective evaluation system that fully imitates the human process of perceiving sound quality. Such a system can only make as accurate an evaluation as possible from the information it obtains, and the objective models built so far fall well short of human perception. In actual use, subjective and objective evaluation are usually combined: objective evaluation is commonly used during system design, tuning, and real-time on-site monitoring, while subjective evaluation serves as the final test of actual effect.
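As a small worked example of the subjective side, a MOS is the arithmetic mean of listener ratings, conventionally given on a scale from 1 (bad) to 5 (excellent). The sketch below computes the mean together with an approximate 95% confidence half-width. The ratings are invented for illustration, the function name `mean_opinion_score` is hypothetical, and the normal approximation used for the interval is crude for so few listeners.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, approximate 95% confidence half-width) for 1-5 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.fmean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")  # no spread can be estimated from one rating
    # Normal approximation: 1.96 standard errors on either side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from five listeners for one synthesized utterance.
mos, half_width = mean_opinion_score([4, 5, 4, 3, 4])
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the mean is a reminder that a MOS from a small listener panel is an estimate, not an exact property of the system.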

At present, mainstream speech synthesis systems can reach a MOS of 4.0 or higher and can synthesize speech that is sufficiently standard and natural. Such speech can serve as pronunciation examples in spoken-language learning, effectively easing problems such as the serious shortage of teaching staff.

  Speech evaluation technology still has weak links

Generally speaking, speech evaluation is a technology that uses computer algorithms to automatically score a learner's pronunciation, detect errors, and give feedback. It is one of the most important technologies in computer-assisted language learning and testing, and it plays an important role in language learning and speaking tests. Its goal is to stand in for experts and teachers by evaluating learners' pronunciation and detecting errors automatically and in real time, making up for the strong subjectivity and low efficiency of human evaluation.

Speech evaluation includes two key technologies: read-aloud evaluation and oral expression evaluation. The former mainly covers read-aloud question types such as words and phrases, and focuses on learners' pronunciation errors and pronunciation quality. The latter mainly covers question types such as oral translation, oral retelling, picture description, and topic-based expression, and mainly examines learners' logical thinking and their ability to organize language. Research on read-aloud evaluation started early, and the technology is now mature. Oral expression evaluation is far more difficult than read-aloud evaluation: assessing a learner's oral expression ability from audio alone is extremely hard.
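As a rough illustration of how read-aloud evaluation can locate pronunciation errors, the sketch below aligns the phone sequence a learner was supposed to read with the sequence a recognizer heard, and reports where they differ. The article does not describe a specific algorithm, so this is only an assumption-laden toy: real systems work from acoustic scores rather than clean symbol strings, and the function name `find_pronunciation_errors` and the phone labels are hypothetical.

```python
from difflib import SequenceMatcher


def find_pronunciation_errors(expected, recognized):
    """Align two phone sequences and list the spans where they differ.

    Each error is (kind, expected_phones, recognized_phones), where kind is
    'replace' (substitution), 'delete' (phone omitted by the learner),
    or 'insert' (extra phone produced by the learner).
    """
    matcher = SequenceMatcher(a=expected, b=recognized, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((tag, expected[i1:i2], recognized[j1:j2]))
    return errors


# Hypothetical case: the learner reads "ni hao" but merges n into l.
expected = ["n", "i", "h", "ao"]
recognized = ["l", "i", "h", "ao"]
print(find_pronunciation_errors(expected, recognized))
```

The same comparison says nothing about pronunciation quality when the right phone is produced badly, which is why practical scoring also needs acoustic evidence.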

  Pronunciation reverse analysis technology still has problems to be solved urgently

Pronunciation reverse analysis infers the position and shape of the articulators (the speech organs) from the acoustic speech signal. Combined with pronunciation visualization, it can play the role of a teacher, giving learners real-time visual feedback and pronunciation guidance. Inferring the shape and position of the articulators from the speech signal is a very difficult task. Studies based on articulatory models and studies based on measured articulatory data have both found that, for some speech sounds, the relationship between the acoustic signal and the position and shape of the articulators is one-to-many: different vocal tract shapes can produce speech signals with similar acoustic characteristics.

In recent years, with the development of deep neural networks and advances in the synchronous collection of speech and articulatory data, recurrent neural networks based on bidirectional long short-term memory (BiLSTM) units have been applied to pronunciation reverse analysis and have achieved better performance on average (the average position error of the articulators fell from about 2 mm to about 0.5 mm). However, several problems must be solved urgently before the technology can be applied to spoken-language learning. First, for each specific sound, it still needs to be tested whether the articulator shape and position produced by existing techniques preserve that sound's place-of-articulation characteristics. Second, most current models are built on the speech of one specific speaker. How to apply such speaker-dependent models to other speakers remains a topic that needs in-depth exploration.
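The position error quoted above can be read as the average distance, in millimetres, between where a model places an articulator point and where it was actually measured. The article does not say which error measure was used, so the sketch below simply assumes the mean Euclidean distance over frames. The function name `mean_position_error` and the coordinates are hypothetical.

```python
import math


def mean_position_error(predicted, measured):
    """Mean Euclidean distance (same unit as the input, e.g. mm) over frames.

    Each argument is a sequence of (x, y) positions of one articulator point,
    one entry per frame, with the two sequences aligned frame by frame.
    """
    if not predicted or len(predicted) != len(measured):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(math.dist(p, m) for p, m in zip(predicted, measured))
    return total / len(predicted)


# Hypothetical tongue-tip positions in mm for two frames.
predicted = [(0.0, 0.0), (3.0, 4.0)]
measured = [(0.0, 0.5), (3.0, 4.5)]
print(mean_position_error(predicted, measured))
```

An average of this kind can hide exactly the problem raised in the text: a small mean error does not guarantee that the place of articulation of each individual sound is preserved.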

  Pronunciation visualization technology is still in the exploration stage

Pronunciation visualization can be understood as combining the results of pronunciation reverse analysis and oral evaluation to present, in video form, information such as the position and shape of the articulators, the state of the airflow during pronunciation, and the vibration state of the vocal cords during pronunciation. It involves modeling the shape of the articulators, driving and displaying the articulators, analyzing and displaying the airflow state, and analyzing and displaying the fundamental frequency curve.

The modeling of the articulators, together with their driving and display, is mainly combined with the pronunciation reverse analysis model to show dynamically how the articulators move when a speaker pronounces, giving learners correct examples of articulatory movements and accurate pronunciation guidance. Work in this area has gone fairly deep: many research groups in China and abroad have proposed articulator models and models for driving articulator movement based on different data sets. However, existing articulator models and articulatory motion models still need to be combined with aerodynamic models of pronunciation to verify whether the visualized pronunciation model can accurately produce sounds at different places of articulation.

Airflow analysis and display technology is mainly used to visualize the state of the airflow during pronunciation, helping learners correctly master different manners of articulation. However, little work in this area has been reported.

Fundamental frequency curve analysis and display technology is mainly used to analyze the prosodic features of learners' speech and give feedback on them, helping learners master tone correctly. Work in this area is still in its infancy.
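To show what one point on a fundamental frequency curve involves, the sketch below estimates the fundamental frequency (F0) of a single short frame by autocorrelation, one common textbook approach. The article does not name a method, so this choice is an assumption, as are the function name `estimate_f0`, the 75-400 Hz search range, and the synthetic test tone. A full curve would repeat the estimate frame by frame over an utterance.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate F0 in Hz for one frame via autocorrelation, or None if silent."""
    min_lag = int(sample_rate / f0_max)            # shortest period considered
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Synthetic 40 ms frame: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(320)]
print(estimate_f0(frame, sample_rate))
```

Real speech is much less regular than a pure tone, so practical pitch trackers add voicing decisions and smoothing across frames before the curve is shown to a learner.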

  The future application prospects are broad

Today, intelligent speech technology applied to Chinese and English speaking instruction is becoming increasingly mature and stable, but it currently focuses only on locating and detecting oral errors, and the feedback it provides is rather limited in both form and content. For learners' pronunciation problems, especially problems of pronunciation and intonation, there is no intuitive visual feedback to help them correct errors effectively. As a result, learners know that a problem exists but, lacking a pronunciation demonstration they can imitate and refer to, do not know where to start in overcoming it and improving the accuracy of their pronunciation. Exploring and improving visual feedback is therefore an important research direction for intelligent speech technology in speaking instruction.

As offline teaching shifts to large-scale online teaching, the way people learn is changing enormously. Combining language learning with artificial intelligence opens a door for online language teaching. Organically combining intelligent speech technology with classroom teaching practice, and exploring scientific and effective online teaching methods, is also an urgent requirement for language teaching in the new era.

  (This article is an outcome of the National Social Science Foundation major project "Interdisciplinary Study of the Speech Acquisition Mechanism of English Learners in Chinese Dialect Regions" (15ZDB103))

(Author affiliations: Institute of Linguistics, Chinese Academy of Social Sciences; iFLYTEK AI Research Institute)

Editor in charge: Zhang Jing
