Segmentation and Recognition of Chinese Handwritten Address Character Strings Based on Multi-Information Fusion

Fu Qiang (付强); Ding Xiaoqing (丁晓青); Jiang Yan (蒋焰)

doi:10.3724/SP.J.1146.2007.00961


cstr: 32379.14.SP.J.1146.2007.00961

Funding:

Supported by the National Natural Science Foundation of China (60472002) and a cooperative project with Siemens (20030829-24022SI202).

Publication history
- Received: 2007-06-15
- Revised: 2007-10-16
- Published: 2008-12-19

Segmentation and Recognition Algorithm for Chinese Handwritten Address Character String

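Abstract

This paper proposes an effective method for segmenting and recognizing Chinese handwritten address character strings. First, stroke extraction and stroke merging are used to over-segment the string image into a sequence of radical images. Geometric, recognition, and semantic information are then combined to select the optimal radical merging path, which yields the optimal segmentation and the corresponding recognition result. The geometric information is computed from statistics of the current string itself, so it adapts to strings written in different styles. The recognition information comes from an isolated-character classifier, an MQDF classifier, and consists of 10 candidate results with their confidences. The semantic information is described by a character-based bi-gram model whose parameters are estimated from a database of 180,000 addresses. In experiments on 3,000 real handwritten address samples, the single-character recognition accuracy reaches 88.28%.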
Keywords: address recognition; character string segmentation; handwritten character string recognition
Abstract: An effective method for segmenting and recognizing Chinese handwritten address strings is proposed. First, the character string image is over-segmented by extracting strokes and merging them, which yields a sequence of radicals. Next, the optimal segmentation and recognition result is obtained by combining geometric analysis, an isolated-character classifier, and semantic information. The geometric information is estimated from the current character string, so it adapts to various writing styles. The isolated-character classifier is an MQDF classifier that returns ten candidate results with their confidences. The semantic information is described by a character-based bi-gram model whose parameters are estimated from a database of 180,000 address items. The algorithm is tested on 3,000 real handwritten address samples, and the single-character recognition rate is 88.28%.
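The path selection described in the abstract can be illustrated with a small sketch. The abstract does not say how the three information sources are weighted or how the merging paths are searched, so everything below is an assumption made for illustration: a Viterbi-style dynamic program over radical boundaries, a log-linear sum of the three scores, add-alpha smoothing for the bi-gram model, and toy candidate and geometry values. The names `train_bigram`, `best_path`, `candidates`, and `geometry` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

def train_bigram(addresses, alpha=1.0):
    """Character bi-gram log-probabilities with add-alpha smoothing (assumed)."""
    context, pairs, vocab = Counter(), Counter(), set()
    for addr in addresses:
        chars = ["<s>"] + list(addr)
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            context[prev] += 1
            pairs[(prev, cur)] += 1
    size = len(vocab)
    def logp(prev, cur):
        return math.log((pairs[(prev, cur)] + alpha) / (context[prev] + alpha * size))
    return logp

def best_path(n, candidates, geometry, logp, max_merge=3):
    """Pick the best way to merge n radicals into characters.

    candidates[(i, j)] lists (char, confidence) for radicals i..j-1 merged;
    geometry(i, j) is a plausibility in (0, 1] for that merged span.
    """
    # best[j][c] = (score, backpointer) over paths covering radicals [0, j)
    # whose last recognized character is c.
    best = [dict() for _ in range(n + 1)]
    best[0]["<s>"] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_merge), j):
            for ch, conf in candidates.get((i, j), []):
                local = math.log(geometry(i, j)) + math.log(conf)
                for prev, (score, _) in best[i].items():
                    total = score + local + logp(prev, ch)
                    if ch not in best[j] or total > best[j][ch][0]:
                        best[j][ch] = (total, (i, prev))
    if not best[n]:
        return None
    # Backtrack from the best final character.
    ch = max(best[n], key=lambda c: best[n][c][0])
    j, chars, cuts = n, [], []
    while j > 0:
        _, (i, prev) = best[j][ch]
        chars.append(ch)
        cuts.append((i, j))
        j, ch = i, prev
    return "".join(reversed(chars)), list(reversed(cuts))

# Toy data: three radicals, where the first two together form one character.
logp = train_bigram(["北京市海淀区", "北京市朝阳区", "南京市"])
candidates = {
    (0, 1): [("十", 0.4)],
    (1, 2): [("匕", 0.5)],
    (0, 2): [("北", 0.8), ("比", 0.6)],
    (2, 3): [("京", 0.9), ("凉", 0.2)],
    (1, 3): [("就", 0.1)],
}
geometry = lambda i, j: 0.9 if (i, j) in {(0, 2), (2, 3)} else 0.3
text, cuts = best_path(3, candidates, geometry, logp)
print(text, cuts)  # 北京 [(0, 2), (2, 3)]
```

Keying the dynamic-programming state on the last recognized character is what lets the bi-gram score be added exactly. In a real system the candidate lists would hold the classifier's ten candidates per merged span, and the geometry score would be derived from statistics of the string being processed.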


