Compliance Analysis Method of Hive Data Operation Based on Subgraph Isomorphism
Abstract: Hive's existing audit function cannot judge whether a data operation is compliant with its declared purpose. To address this problem, this paper proposes a compliance analysis method for Hive data operations based on subgraph isomorphism. First, a graph-based modeling method for Hive data operations and compliance rules is proposed, which produces a data provenance graph and a compliance rule graph. Then, the compliance judgment of a data operation is modeled as a matching problem between the provenance graph and the compliance rule graph, and a solution algorithm based on subgraph isomorphism is proposed. Finally, the method is validated experimentally on the data governance platform Apache Atlas together with Hive. The experimental results show that the proposed method achieves higher compliance verification efficiency than set-based, VF2-based, and Ullmann-based compliance verification.
Key words:
- Hive data /
- Apache Atlas /
- Compliance analysis /
- Subgraph isomorphism
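The matching step described above can be illustrated with a small label-preserving subgraph matcher. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses a naive backtracking search rather than VF3 (it does not reproduce VF3's search ordering or pruning), and the graph encoding is hypothetical. Nodes are labeled with a (type, name) pair such as a user, an operation, or a table, with edges user → operation → table; the helper names `Graph`, `find_embedding`, `rule_graph`, and `operation_graph` are invented for illustration. A logged operation counts as permitted when its pattern graph embeds into the rule graph, which plays roughly the role of the "contains all nodes of $A_1$" test in Table 2.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Label = Tuple[str, str]  # (node type, name), e.g. ("table", "Customer")


class Graph:
    """Directed graph whose nodes carry a (type, name) label."""

    def __init__(self) -> None:
        self.labels: Dict[str, Label] = {}
        self.succ: Dict[str, Set[str]] = {}

    def add_node(self, node_id: str, kind: str, name: str) -> None:
        self.labels[node_id] = (kind, name)
        self.succ.setdefault(node_id, set())

    def add_edge(self, u: str, v: str) -> None:
        self.succ[u].add(v)


def find_embedding(pattern: Graph, target: Graph) -> Optional[Dict[str, str]]:
    """Return an injective, label-preserving map from pattern nodes to target
    nodes under which every pattern edge is also a target edge (non-induced
    subgraph isomorphism), or None if no such map exists."""
    order = list(pattern.labels)
    mapping: Dict[str, str] = {}
    used: Set[str] = set()

    def consistent(p: str, t: str) -> bool:
        if pattern.labels[p] != target.labels[t]:
            return False
        if p in pattern.succ[p] and t not in target.succ[t]:  # self-loop
            return False
        # Edges between p and already-mapped pattern nodes must be preserved.
        for q, tq in mapping.items():
            if q in pattern.succ[p] and tq not in target.succ[t]:
                return False
            if p in pattern.succ[q] and t not in target.succ[tq]:
                return False
        return True

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for t in target.labels:
            if t not in used and consistent(p, t):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


def rule_graph(user: str, ops: Iterable[str], tables: Iterable[str]) -> Graph:
    """Hypothetical encoding of one rule: user -> each op -> each table."""
    g = Graph()
    g.add_node(f"user:{user}", "user", user)
    table_list = list(tables)
    for tbl in table_list:
        g.add_node(f"table:{tbl}", "table", tbl)
    for op in ops:
        g.add_node(f"op:{op}", "op", op)
        g.add_edge(f"user:{user}", f"op:{op}")
        for tbl in table_list:
            g.add_edge(f"op:{op}", f"table:{tbl}")
    return g


def operation_graph(user: str, op: str, table: str) -> Graph:
    """Hypothetical provenance fragment for one logged call: user -> op -> table."""
    g = Graph()
    g.add_node("u", "user", user)
    g.add_node("o", "op", op)
    g.add_node("t", "table", table)
    g.add_edge("u", "o")
    g.add_edge("o", "t")
    return g
```

Using rule (1) of Table 3, `find_embedding(operation_graph("root1", "count", "Customer"), rule_graph("root1", ["rank", "avg", "sum", "count", "substr"], ["Customer", "Customer_address"]))` returns a full mapping, while the same call with `max` or with `Customer_demographics` returns `None`.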
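Table 1 Compliance requirements and their functions

| Compliance requirement | Function |
|---|---|
| Data-scope compliance | The data used does not exceed the declared scope of data use |
| Processing compliance | Operations on the data conform to the declared data operations |
| Permission compliance | The user of the data holds a role authorized to access that data |
| Purpose compliance | Data-scope, processing, and permission compliance all hold at the same time |

Table 2 Compliance verification algorithm based on subgraph isomorphism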
Input: $G_R$, $A_1$, $G-A_1$
Output: audit result
Parameters: $p_i \in V_{A_1}$, $t_i \in V_{A_1}$, $D_{U_i} \subseteq V_{A_1}$, $D_j \subseteq V_R$, $V_{A_1} \subseteq \text{DU} \cup \text{OP}$, $V_R \subseteq D_{U_r} \cup \text{OP}_R$, $\text{PURPOSEMAP} \subseteq (A_1, \text{PP}_1) \cup (A_2, \text{PP}_2) \cup (A_3, \text{PP}_3) \cup \cdots \cup (A_i, \text{PP}_i)$
(1) $M(S) = \text{VF3}(G_R, A_1)$
(2) if $M(S)$ contains all nodes of $A_1$ then
(3)     obtain the purpose $A$ of the $A_1$ operation from PURPOSEMAP
(4)     if $\text{OP} \notin V_{G-A_1}$ then
(5)         return purpose $A$
(6)     else
(7)         from $\text{OP} \in V_{G-A_1}$, obtain $G_{R_1}$ and $A_2$, where $V_{A_2} \subseteq \text{DU}_2 \cup \text{OP}_2$ and $V_{R_1} \subseteq D_{U_{r1}} \cup \text{OP}_{R_1}$
(8)         if $G_{R_1} = \varnothing$ then
(9)             return purpose $A$ ▷ compliant
(10)        else
(11)            $M_1(S) = \text{VF3}(G_{R_1}, A_2)$
(12)            if $M_1(S)$ contains all nodes of $A_2$ then
(13)                obtain the purpose $B$ of $A_2$ from PURPOSEMAP
(14)                if purpose $B$ = purpose $A$ then
(15)                    return purpose $A$ ▷ compliant
(16)                else
(17)                    return "-2" ▷ data minimization non-compliant
(18) else
(19)     if $M(S)$ contains one node of $A_1$ then
(20)         return "-3" ▷ operation and user identity verification non-compliant
(21)     else
(22)         return "-4" ▷ user identity verification failed

Table 3 Compliance rule information
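| No. | Rule |
|---|---|
| (1) | root1 is authorized, under StatisticalAnalyse, to execute only rank, avg, sum, count, and substr on tables Customer and Customer_address |
| (2) | root2 is authorized, under SensitiveAnalyse, to execute only count, sum, max, min, and avg on tables Customer and Customer_demographics |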
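The requirements of Table 1 can be read directly against the rules of Table 3. The sketch below checks them by plain set containment. It is an illustration under assumptions, not the paper's graph-based method and not a reproduction of its set-based baseline: the `Operation` shape and the `check` function are hypothetical, and treating "some rule authorizes this user under the declared purpose" as permission compliance is an assumed simplification of the role check described in Table 1. It also does not produce the return codes of Table 2.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """One row of Table 3: a user, a purpose, and what is allowed under it."""
    user: str
    purpose: str
    tables: FrozenSet[str]
    ops: FrozenSet[str]


@dataclass(frozen=True)
class Operation:
    """Hypothetical summary of one logged Hive operation."""
    user: str
    purpose: str             # declared purpose
    tables: FrozenSet[str]   # tables read
    ops: FrozenSet[str]      # functions applied


RULES: List[Rule] = [
    Rule("root1", "StatisticalAnalyse",
         frozenset({"Customer", "Customer_address"}),
         frozenset({"rank", "avg", "sum", "count", "substr"})),
    Rule("root2", "SensitiveAnalyse",
         frozenset({"Customer", "Customer_demographics"}),
         frozenset({"count", "sum", "max", "min", "avg"})),
]


def check(op: Operation, rules: List[Rule]) -> Dict[str, bool]:
    """Evaluate the four requirements of Table 1 by set containment."""
    rule: Optional[Rule] = next(
        (r for r in rules if r.user == op.user and r.purpose == op.purpose), None)
    # Permission (assumed): a rule authorizes this user under the declared purpose.
    permission = rule is not None
    # Data scope: every table touched lies inside the rule's declared tables.
    data_scope = rule is not None and op.tables <= rule.tables
    # Processing: every function applied is among the rule's declared operations.
    processing = rule is not None and op.ops <= rule.ops
    return {
        "permission": permission,
        "data_scope": data_scope,
        "processing": processing,
        # Purpose compliance requires all three at once.
        "purpose": permission and data_scope and processing,
    }
```

For example, root1 running count and avg on Customer under StatisticalAnalyse passes all four checks, while the same user reading Customer_demographics fails data-scope and therefore purpose compliance. This check treats tables and functions as independent sets, so it cannot tell which function was applied to which table; a graph encoding can record that pairing.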