A Distributed KBB Index Tree Multi-Keyword Fuzzy Sorting Search Scheme
-
摘要: 随着数字化医疗的快速发展,电子健康记录(EHR)已成为提升医疗服务质量的重要手段,医疗机构想要获取患者的EHR用于医疗研究。然而,传统的电子健康记录系统面临信息孤岛、数据安全共享困难、隐私泄露等问题。该文设计了分布式的KBB索引树多关键词模糊排序搜索方案,旨在解决以上问题,并实现安全、高效、可追溯的电子健康记录共享。该方案使用K-means聚类技术对EHR分类,在IPFS上分类存储加密的EHR,实现密文的分布式存储。使用波特词干提取法把关键词转换为统一形式,基于聚类的EHR构造KBB索引树,达到模糊搜索并提高搜索效率。此外,可以通过智能合约实现对用户身份的验证。搜索节点在KBB树中排序搜索,星际文件系统(IPFS)返回前k个加密EHR给用户,用户在区块链上验证数据的完整性后解密。性能分析表明该方案的算法具有安全性,在已知密文模型中是语义安全的,同时具有良好的实现效率。Abstract:
Objective The rapid advancement of information technology has driven significant transformation in the medical domain. As a cornerstone of medical informatization, Electronic Health Records (EHRs) play a critical role in improving healthcare service efficiency and supporting medical research. Effective use of patient EHRs can enhance diagnostic and treatment processes and offer reliable data support for disease prevention and novel drug discovery. However, conventional EHR systems face several challenges, including data silos, secure sharing complexity, and heightened privacy risks. Data silos hinder the seamless exchange and integration of EHRs across institutions, limiting the potential for collaborative medical practices. A central challenge in secure data sharing lies in ensuring data interoperability without compromising patient privacy—a problem of urgent concern in modern healthcare systems. Moreover, privacy breaches not only jeopardize patient welfare but also damage the credibility and trustworthiness of healthcare institutions. To address these issues, this paper proposes a distributed multi-keyword fuzzy sorting search scheme based on a Keywords Balanced Binary (KBB) index tree. The scheme leverages the decentralized, tamper-proof, and traceable features of blockchain technology to create a secure and efficient framework for EHR sharing. By eliminating data silos, the blockchain facilitates cross-institutional interoperability, while smart contracts maintain strict privacy protections throughout the data sharing process. Methods The scheme first applies a K-means clustering algorithm to categorize EHRs, followed by hierarchical encrypted storage via the Interplanetary File System (IPFS). This dual-layer approach ensures distributed data management while protecting user privacy and enhancing storage robustness and fault tolerance. To improve search performance, the Porter stemming algorithm standardizes query keywords, reducing ambiguity from semantic variations and enabling consistent semantic matching. A KBB index tree is then constructed over the clustered EHRs to support fuzzy keyword matching and high retrieval precision. This structure incorporates adaptive sorting to reduce search latency. Access control is enforced using smart contracts, which implement fine-grained, role-based authentication to ensure that only authorized users can retrieve EHRs, thereby minimizing the risk of data leaks. During query processing, ranked searches are conducted within the KBB index tree. Once matching encrypted EHRs are located, data retrieval is performed via IPFS. Blockchain-based hashing ensures the integrity and immutability of the retrieved data, protecting it from tampering or corruption. Finally, users decrypt the data with private keys to access the original EHR content, completing the secure retrieval process. Results and Discussions Simulation experiments demonstrate that the proposed scheme offers superior implementation efficiency compared to related methods in terms of both time and storage overhead. The index tree construction time is shorter than that reported in[Ref. 7] and[Ref. 8], with an efficiency improvement of 80%–85% ( Fig. 3 ). Although the trap generation time is longer than in[Ref. 7] and[Ref. 15] (Fig. 4 ), this increase stems from the scheme’s support for multiuser search, whereas[Ref. 7] only supports single-user search. The combined search time for indexing and trap generation is lower than in[Ref. 7],[Ref. 8], and[Ref. 15], with search efficiency improved by 50.5% and 75.8%, respectively (Fig. 5 ). The ciphertext generation and decryption times are also markedly better than those in the comparative literature, with improvements of 33.3% and 60%, respectively (Fig. 6 ,Fig. 7 ). Furthermore, the storage costs for keys and ciphertext in this scheme are lower than those of the other methods (Fig. 8 ,Fig. 9 ).Conclusions The proposed scheme substantially improves multi-keyword search efficiency, enabling users to rapidly locate target information within large-scale EHR datasets while enhancing the accuracy of search results. The fuzzy search functionality allows users to retrieve data even when the exact form of the keywords is unknown, increasing the flexibility and applicability of the search process. The encryption algorithm has been rigorously tested in experimental analyses, demonstrating that it maintains semantic security under the known ciphertext model, thereby ensuring EHR confidentiality. In addition, the algorithm performs well in terms of implementation efficiency, meeting the practical needs of EHR applications in medical contexts while maintaining strong security guarantees. By combining high-precision multi-keyword indexing with robust privacy protection, this scheme offers a scalable and secure framework for EHR sharing that meets the interoperability and confidentiality requirements of modern healthcare systems. -
1 文档处理
Input: $ D = ({d_1},{d_2}, \cdots ,{d_n}) $ Output: $ B = ({b_1},{b_2}, \cdots ,{b_K}) $ 1: Randomly select $K$documents $ ({d_1},{d_2}, \cdots ,{d_K}) $ from the
document collection $ D $ as the initial clustering centers2: for $ i = 1,i \le K + 1 $ do for $j = K + 1,j \le n$ do 3: Calculate the Euclidean distance from document $ {d_i} $ to the
clustering center $ {d_j} $4:$ {\text{dist(}}{d_i},{d_j}{\text{)}} = \sqrt {\displaystyle\sum\nolimits_{i = 1}^n {{{({x_i} - {y_i})}^2}} } $ 5: Based on the $ {\text{sim}} $value, document $ {d_i} $ is grouped into clusters
with the least distance6: end for 7: Calculate the mean of all documents in the cluster to get the
new center of mass $ K' $8:if $ K = K' $then 9: Output the set of $K$ clustered documents
$ B = ({b_1},{b_2}, \cdots ,{b_K}) $10: else 11: Repeat steps 2-6 12:end 2 参数匹配
Input:Index parameter $ \tilde I $, Trapdoor parameter $ {\text{TK}} $, constant
$ {\omega _x} $Output:$ F' $ 1:function calculateIndexPandTrapdoorP $ ({I_x},{I_1},{I_2},{T_x},T') $ 2: numerator=$ e\left(\prod\nolimits_{x \in S} {I_x^{{\omega _x}},T'} \right) \cdot e({I_2},T') $ 3: denominator=$e({I_1},{T_x})$ 4: return $ {F}^{\prime }=\frac{e\left({\displaystyle \prod _{x\in S}{I}_{x}^{{\omega }_{x}},{T}^{\prime }}\right)\cdot e({I}_{2},{T}^{\prime })}{e({I}_{1},{T}_{x})} $ 5:end if 合约1 验证合约 1: contract VerifyContract { 2: uint256 public $F'$; 3: function receiveAlgorithmResult(uint256
resultFromAlgorithm1) public {$F'$= resultFromAlgorithm1;} 4: function verifyAccess ( ) public view returns (bool) { 5: if ($ F' = e{(g,g)^{\alpha rs{u_e}}} $) { 6: return true;} 7: else { 8: return false;} 合约2 搜索合约 1: contract SearchContract { 2: event AlgorithmExecuted(uint256 input, uint256 result); 3: function AlgorithmGDFS(${\tilde T_i},{\tilde Q_e},{Z_i},k$) internal pure { 4: return $ {\text{Top}} - {\text{kCIDList}} $; } 5: function executeAlgorithm(uint256 input) external returns
(uint256){6: uint256 result = AlgorithmGDFS(input); 7: emit AlgorithmExecuted(input, result); 8: return result;}} 表 1 功能比较
表 2 计算成本比较
方案 索引生成 陷门生成 搜索 加密 解密 文献[7] $ O({m^2}n) $ $ O({m^2}) $ $ O(\mu {m^2}\lg n) $ - - 文献[8] $ O(m{n^2}) $ $ O(n'{m^2}) $ $ O(\mu n'{m^2}\lg n) $ $ 4E + (n + 2){E_T} $ $ 4P + (n + 2){E_T} $ 文献[11] - - - $ 5E + (n + 1){E_T} $ $ 3P + nE + n{E_T} $ 文献[15] $ O(mn) $ $ O(m) $ $ O(\mu {m^2}\lg n) $ - - 本文方案 $ O(mn) $ $ O(n'{m^2}) $ $ O(\mu n'm\lg n) $ $ 2E + (n + 2){E_T} $ $ 2P + (n + 3){E_T} $ -
[1] SINGH N, KUMAR J, SINGH A K, et al. Privacy-preserving multi-keyword hybrid search over encrypted data in cloud[J]. Journal of Ambient Intelligence and Humanized Computing, 2024, 15(1): 261–274. doi: 10.1007/s12652-022-03889-8. [2] CAO Ning, WANG Cong, LI Ming, et al. Privacy-preserving multi-keyword ranked search over encrypted cloud data[J]. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(1): 222–233. doi: 10.1109/TPDS.2013.45. [3] LI Jinguo, WEN Mi, LU Kejie, et al. PIMRS: Achieving privacy and integrity-preserving multi-owner ranked-keyword search over encrypted cloud data[J]. Security and Communication Networks, 2016, 9(16): 3765–3776. doi: 10.1002/sec.1482. [4] WU Zigang, WANG Haijiang, WAN Jian, et al. An inner product predicate-based medical data-sharing and privacy protection system[J]. IEEE Access, 2024, 12: 68680–68696. doi: 10.1109/ACCESS.2024.3400611. [5] TONG Qiuyun, MIAO Yinbin, CHEN Lei, et al. VFIRM: Verifiable fine-grained encrypted image retrieval in multi-owner multi-user settings[J]. IEEE Transactions on Services Computing, 2022, 15(6): 3606–3619. doi: 10.1109/TSC.2021.3083512. [6] ZHONG Hong, LI Zhanfei, CUI Jie, et al. Efficient dynamic multi-keyword fuzzy search over encrypted cloud data[J]. Journal of Network and Computer Applications, 2020, 149: 102469. doi: 10.1016/j.jnca.2019.102469. [7] XIA Zhihua, WANG Xinhui, SUN Xingming, et al. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 340–352. doi: 10.1109/TPDS.2015.2401003. [8] GUO Jiaqi, TIAN Cong, LU Xu, et al. Multi-keyword ranked search with access control for multiple data owners in the cloud[J]. Journal of Information Security and Applications, 2024, 82: 103742. doi: 10.1016/j.jisa.2024.103742. [9] GAO Sheng, CHEN Yuqi, ZHU Jianming, et al. BPMS: Blockchain-based privacy-preserving multi-keyword search in multi-owner setting[J]. IEEE Transactions on Cloud Computing, 2023, 11(3): 2260–2272. doi: 10.1109/TCC.2022.3196712. [10] CHEN Kai, LIN Zhongrui, WAN Jian, et al. Multi-owner secure encrypted search using searching adversarial networks[C]. Proceedings of the 18th International Conference on Cryptology and Network Security, Fuzhou, China, 2019: 184–195. doi: 10.1007/978-3-030-31578-8_10. [11] MIAO Yinbin, LIU Ximeng, CHOO K K R, et al. Privacy-preserving attribute-based keyword search in shared multi-owner setting[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(3): 1080–1094. doi: 10.1109/TDSC.2019.2897675. [12] YIN Hui, QIN Zheng, ZHANG Jixin, et al. Secure conjunctive multi-keyword ranked search over encrypted cloud data for multiple data owners[J]. Future Generation Computer Systems, 2019, 100: 689–700. doi: 10.1016/j.future.2019.05.001. [13] CHEN Lanxiang, LEE W K, CHANG C C, et al. Blockchain based searchable encryption for electronic health record sharing[J]. Future Generation Computer Systems, 2019, 95: 420–429. doi: 10.1016/j.future.2019.01.018. [14] WITTEN I H, MOFFAT A, and BELL T C. Managing Gigabytes: Compressing and Indexing Documents and Images[M]. 2nd ed. San Francisco: Morgan Kaufmann, 1999: 576. [15] XU Dequan, PENG Changgen, WANG Weizheng, et al. Privacy-preserving dynamic multi-keyword ranked search scheme in multi-user settings[J]. IEEE Transactions on Consumer Electronics, 2023, 69(4): 890–901. doi: 10.1109/TCE.2023.3269045. [16] LI Yang and ZHANG Haiyu. Big data technology for teaching quality monitoring and improvement in higher education-joint K-means clustering algorithm and Apriori algorithm[J]. Systems and Soft Computing, 2024, 6: 200125. doi: 10.1016/j.sasc.2024.200125. [17] ZUBAIR M, IQBAL A, SHIL A, et al. An improved K-means clustering algorithm towards an efficient data-driven modeling[J]. Annals of Data Science, 2024, 11(5): 1525–1544. doi: 10.1007/s40745-022-00428-2. [18] MANASRAH A M, NASIR M A, and SALEM M. A privacy-preserving multi-keyword search approach in cloud computing[J]. Soft Computing, 2020, 24(8): 5609–5631. doi: 10.1007/s00500-019-04033-z. [19] ZOU Yipeng, PENG Tao, WANG Guojun, et al. Blockchain-assisted multi-keyword fuzzy search encryption for secure data sharing[J]. Journal of Systems Architecture, 2023, 144: 102984. doi: 10.1016/j.sysarc.2023.102984. [20] ABIDIN Z and JUNAIDI A. Text stemming and lemmatization of regional languages in Indonesia: A systematic literature review[J]. Journal of Information Systems Engineering and Business Intelligence, 2024, 10(2): 217–231. doi: 10.20473/jisebi.10.2.217-231. [21] ZHAO Fanyi, ZHANG Mingxuan, ZHOU Shiji, et al. Detection of network security traffic anomalies based on machine learning KNN method[J]. Journal of Artificial Intelligence General Science JAIGS, 2024, 1(1): 209–218. doi: 10.60087/jaigs.v1i1.213. [22] SAIF M B, MIGLIORINI S, and SPOTO F. A survey on data availability in layer 2 blockchain rollups: Open challenges and future improvements[J]. Future Internet, 2024, 16(9): 315. doi: 10.3390/fi16090315. [23] XIA Zhihua, WANG Xinhui, SUN Xingming, et al. A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 340–352. doi: 10.1109/TPDS.2015.2401003. (查阅网上资料,本条文献与第7条文献重复,请确认). -