Uyghur Character Models with Shared Structure Information for Segmentation-free Recognition under Low Data Resource Conditions

Jiang Zhi-wei; Ding Xiao-qing; Peng Liang-rui; Liu Chang-song

doi:10.11999/JEIT150019

Volume 37 Issue 9

Sep. 2015


Jiang Zhi-wei, Ding Xiao-qing, Peng Liang-rui, Liu Chang-song. Uyghur Character Models with Shared Structure Information for Segmentation-free Recognition under Low Data Resource Conditions[J]. Journal of Electronics & Information Technology, 2015, 37(9): 2103-2109. doi: 10.11999/JEIT150019

doi: 10.11999/JEIT150019 cstr: 32379.14.JEIT150019

Received Date: 2015-01-06
Rev Recd Date: 2016-03-25
Publish Date: 2015-09-19

Abstract

Although segmentation-free recognition of Uyghur documents effectively avoids character segmentation errors, it performs poorly on new types of samples for which little training data is available. This paper proposes sharing stable character structure information among different Uyghur fonts and uses Bootstrap to make more efficient use of the available samples. Experiments are conducted on new-type book samples with only 1/5 as many training samples as the original. The proposed method achieves an average character recognition accuracy of 95.05% on the test samples, a relative reduction in recognition error rate of 55.76%~63.84% compared with the Maximum A Posteriori (MAP) method. The proposed method can therefore train accurate Uyghur character models under low data resource conditions.
Keywords: Character recognition; Hidden Markov Model (HMM); Statistical learning; Uyghur character
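
The relative error-rate figures in the abstract can be unpacked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes that both ends of the reported 55.76%~63.84% range are measured against the same 4.95% average error rate (100% minus the reported 95.05% accuracy). The paper may instead report the range across several test conditions, in which case the implied MAP error rates would differ.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate removed by the new method."""
    return (baseline_error - new_error) / baseline_error


def implied_baseline_error(new_error, reduction):
    """Baseline error rate implied by a new error rate and a relative reduction."""
    return new_error / (1.0 - reduction)


# Reported in the abstract: 95.05% average character recognition accuracy.
proposed_error = 1.0 - 0.9505

# Reported range of relative error-rate reduction versus the MAP method.
reductions = (0.5576, 0.6384)

# Assumption (not stated in the abstract): both ends of the range refer to
# the same average error rate of the proposed method.
implied_map_errors = [implied_baseline_error(proposed_error, r) for r in reductions]

for r, e in zip(reductions, implied_map_errors):
    print(f"relative reduction {r:.2%} -> implied MAP error rate {e:.2%}")
```

Under that assumption, the MAP baseline would have a character error rate of roughly 11% to 14%, which illustrates how a relative reduction of more than half corresponds to a gain of several percentage points in accuracy.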
