Compliance Analysis Method of Hive Data Operation Based on Subgraph Isomorphism

CHEN Li; CHEN Xingshu; LUO Yonggang; YANG Lu; YUAN Daohua

doi: 10.11999/JEIT211081 cstr: 32379.14.JEIT211081

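1. College of Software, Sichuan University, Chengdu 610065, China
2. Cyber Science Research Institute, Sichuan University, Chengdu 610065, China
3. College of Computer, Sichuan University, Chengdu 610065, China

Funding: The National Natural Science Foundation of China (61802270)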

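Details

Author biographies:
CHEN Li: female, master's degree; research interest: big data security

CHEN Xingshu: female, professor and doctoral supervisor; research interests: cloud computing and big data security, trusted computing and information assurance

LUO Yonggang: male, assistant research fellow; research interests: big data and network security

YANG Lu: female, Ph.D.; research interests: information security, cloud computing security, big data

YUAN Daohua: male, professor; research interests: distributed processing and network computing, databases and information systems

Corresponding author:
LUO Yonggang, iamlyg98@scu.edu.cn

CLC number: TN915.08; TP311.13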
Publication history
- Received: 2021-10-08
- Revised: 2022-01-10
- Accepted: 2022-03-01
- Available online: 2022-03-05
- Published: 2022-12-16

Abstract: The existing audit function of Hive cannot judge whether the purpose of a data operation is compliant. To address this problem, this paper proposes a compliance analysis method for Hive data operations based on subgraph isomorphism. First, a graph-based modeling method for Hive data operations and compliance rules is proposed, which produces a data provenance graph and a compliance rule graph. Then, the compliance judgment of a data operation is modeled as a matching problem between the provenance graph and the compliance rule graph, and a solving algorithm based on subgraph isomorphism is proposed. Finally, the method is verified experimentally on the data governance platform Apache Atlas and on Hive. The experimental results show that the proposed method achieves higher compliance verification efficiency than compliance verification based on sets, VF2, or Ullmann.

Keywords: Hive database; Apache Atlas; Compliance analysis; Subgraph isomorphism

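Figure 1 Compliance verification model for Hive
Figure 2 Example of a compliance rule graph
Figure 3 Architecture of the compliance verification system
Figure 4 Statistics of compliance verification results
Figure 5 Compliance result for query35
Figure 6 Compliance result for query6
Figure 7 Compliance result for query94
Figure 8 Compliance result for query16
Figure 9 Provenance time overhead before and after adding compliance analysis
Figure 10 Effect of compliance rule graph size on compliance verification time
Figure 11 Effect of data provenance graph size on matching time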

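Table 1 Compliance requirements and their descriptions

Compliance requirement	Function
Data usage scope compliance	The data used does not exceed the declared data usage scope
Processing method compliance	The operations performed on the data conform to the declared data operations
Permission compliance	The user who uses the data holds a role authorized to access that data
Purpose compliance	Data usage scope compliance, processing method compliance, and permission compliance are all satisfied at the same time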
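The four requirements in Table 1 can be read as plain set relations. The following is a minimal Python sketch under assumed data shapes; the `Declaration` and `Operation` classes and all of their field names are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Declaration:
    """Hypothetical declared usage attached to one purpose."""
    tables: frozenset      # declared data usage scope
    operations: frozenset  # declared data operations
    roles: frozenset       # roles authorized to access the data


@dataclass(frozen=True)
class Operation:
    """Hypothetical observed data operation extracted from a Hive query."""
    tables: frozenset
    operations: frozenset
    user_roles: frozenset


def check_compliance(op: Operation, decl: Declaration) -> dict:
    """Evaluate the four requirements of Table 1 as set relations."""
    scope_ok = op.tables <= decl.tables               # data usage scope
    processing_ok = op.operations <= decl.operations  # processing method
    permission_ok = bool(op.user_roles & decl.roles)  # permission
    return {
        "scope": scope_ok,
        "processing": processing_ok,
        "permission": permission_ok,
        # Purpose compliance holds only when all three hold together.
        "purpose": scope_ok and processing_ok and permission_ok,
    }
```

A set-based check like this ignores how users, operations, and tables are connected, which is the information a graph model keeps.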

Table 2 Compliance verification algorithm based on subgraph isomorphism

Input: $G_R,\ A_1,\ G-A_1$
Output: audit result
Parameters: $p_i\in V_{A_1},\ t_i\in V_{A_1},\ D_{U_I}\subseteq V_{A_1},\ D_j\subseteq V_R,$
$V_{A_1}\subseteq \mathrm{DU}\cup \mathrm{OP},\ V_R\subseteq D_{U_r}\cup \mathrm{OP}_R,$
$\mathrm{PURPOSEMAP}\subseteq (A_1,\mathrm{PP}_1)\cup (A_2,\mathrm{PP}_2)\cup (A_3,\mathrm{PP}_3)\cup \cdots \cup (A_i,\mathrm{PP}_i)$
(1) $M(s)=\mathrm{VF3}(G_R, A_1)$
(2) if $M(s)$ contains all nodes of $A_1$ then
(3)     obtain purposeA, the purpose of operation $A_1$, from $\mathrm{PURPOSEMAP}$
(4)     if $\mathrm{OP}\notin V_{G-A_1}$ then
(5)         return purposeA
(6)     else
(7)         based on $\mathrm{OP}\in V_{G-A_1}$, obtain $G_{R_1}$ and $A_2$, with $V_{A_2}\subseteq \mathrm{DU}_2\cup \mathrm{OP}_2$ and $V_{R_1}\subseteq D_{U_{r1}}\cup \mathrm{OP}_{R_1}$
(8)         if $G_{R_1}=\varnothing$ then
(9)             return purposeA $\Delta$ compliant
(10)         else
(11)             $M_1(s)=\mathrm{VF3}(G_{R_1}, A_2)$
(12)             if $M_1(s)$ contains all nodes of $A_2$ then
(13)                 obtain purposeB for $A_2$ from $\mathrm{PURPOSEMAP}$
(14)                 if purposeB = purposeA then
(15)                     return purposeA $\Delta$ compliant
(16)                 else
(17)                     return "-2" $\Delta$ data minimization non-compliant
(18) else
(19)     if $M(s)$ contains one node of $A_1$ then
(20)         return "-3" $\Delta$ operation and user identity verification non-compliant
(21)     else
(22)         return "-4" $\Delta$ user identity verification not satisfied
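The control flow of Table 2 can be sketched in Python as follows. This is an illustration under several assumptions, not the authors' implementation: a naive backtracking matcher stands in for VF3, graphs are plain `(labels, edges)` tuples, the user -> operation -> table schema and the names `best_match`, `verify`, and `chain` are hypothetical, and the case where $A_2$ fails to match $G_{R_1}$ (not specified in Table 2) returns `None` as a placeholder.

```python
from typing import Dict, Optional, Set, Tuple

# A graph is (node id -> label, set of directed edges between node ids).
Graph = Tuple[Dict[str, str], Set[Tuple[str, str]]]


def best_match(pattern: Graph, target: Graph) -> Dict[str, str]:
    """Largest label- and edge-preserving injective partial mapping found
    from pattern nodes to target nodes (naive backtracking, not VF3)."""
    p_nodes, p_edges = pattern
    t_nodes, t_edges = target
    order = sorted(p_nodes)
    best: Dict[str, str] = {}

    def consistent(mapping: Dict[str, str], u: str, v: str) -> bool:
        if p_nodes[u] != t_nodes[v] or v in mapping.values():
            return False
        for a, b in p_edges:  # every edge to an already mapped node must exist
            if a == u and b in mapping and (v, mapping[b]) not in t_edges:
                return False
            if b == u and a in mapping and (mapping[a], v) not in t_edges:
                return False
        return True

    def extend(i: int, mapping: Dict[str, str]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        if i == len(order) or len(best) == len(order):
            return
        u = order[i]
        for v in sorted(t_nodes):
            if consistent(mapping, u, v):
                mapping[u] = v
                extend(i + 1, mapping)
                del mapping[u]
        extend(i + 1, mapping)  # skip u so partial matches are explored too

    extend(0, {})
    return best


def verify(g_r: Graph, a1: Graph, a2: Optional[Graph], g_r1: Optional[Graph],
           purpose_map: Dict[str, str]) -> Optional[str]:
    m = best_match(a1, g_r)                     # step (1)
    if len(m) == len(a1[0]):                    # step (2)
        purpose_a = purpose_map["A1"]           # step (3)
        if a2 is None:                          # step (4): nothing left in G - A1
            return purpose_a                    # step (5)
        if g_r1 is None or not g_r1[0]:         # step (8): empty rule graph
            return purpose_a                    # step (9)
        m1 = best_match(a2, g_r1)               # step (11)
        if len(m1) == len(a2[0]):               # step (12)
            purpose_b = purpose_map["A2"]       # step (13)
            return purpose_a if purpose_b == purpose_a else "-2"  # (14)-(17)
        return None  # assumption: this branch is not specified in Table 2
    return "-3" if m else "-4"                  # steps (19)-(22)


def chain(user: str, op: str, table: str) -> Graph:
    """Hypothetical operation graph: user -> operation -> table."""
    return {"u": user, "o": op, "t": table}, {("u", "o"), ("o", "t")}


# Rule graph for one user (assumed schema; node ids equal their labels).
G_R: Graph = (
    {n: n for n in ("root1", "avg", "sum", "Customer")},
    {("root1", "avg"), ("root1", "sum"), ("avg", "Customer"), ("sum", "Customer")},
)
PURPOSES = {"A1": "StatisticalAnalyse", "A2": "StatisticalAnalyse"}

print(verify(G_R, chain("root1", "avg", "Customer"), None, None, PURPOSES))  # StatisticalAnalyse
print(verify(G_R, chain("root1", "max", "Customer"), None, None, PURPOSES))  # -3
print(verify(G_R, chain("root9", "max", "Orders"), None, None, PURPOSES))    # -4
```

The matcher keeps the largest partial mapping because step (19) distinguishes a partial match of $A_1$ from no match at all; a production system would replace it with a real VF3 implementation.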

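Table 3 Compliance rule information

No.	Rule
(1)	Under StatisticalAnalyse, root1 is authorized to execute only rank, avg, sum, count, and substr on the tables Customer and Customer_address
(2)	Under SensitiveAnalyse, root2 is authorized to execute only count, sum, max, min, and avg on the tables Customer and Customer_demographics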
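As an illustration of how the two rules in Table 3 might be turned into compliance rule graphs, the sketch below builds one small directed graph per rule. The node typing and the user -> purpose -> operation -> table edge layout are assumptions made for this example (the paper's own rule graph layout is given in Figure 2, which is not reproduced here), and `build_rule_graph` is a hypothetical helper.

```python
from typing import Dict, Iterable, Set, Tuple

# A rule graph is (node name -> node type, set of directed edges).
RuleGraph = Tuple[Dict[str, str], Set[Tuple[str, str]]]


def build_rule_graph(user: str, purpose: str,
                     tables: Iterable[str],
                     operations: Iterable[str]) -> RuleGraph:
    """Assumed layout: user -> purpose -> operation -> table."""
    tables = list(tables)
    nodes: Dict[str, str] = {user: "user", purpose: "purpose"}
    edges: Set[Tuple[str, str]] = {(user, purpose)}
    for op in operations:
        nodes[op] = "operation"
        edges.add((purpose, op))
        for table in tables:
            nodes[table] = "table"
            edges.add((op, table))
    return nodes, edges


# Rules (1) and (2) from Table 3.
RULE_1 = build_rule_graph(
    "root1", "StatisticalAnalyse",
    ["Customer", "Customer_address"],
    ["rank", "avg", "sum", "count", "substr"],
)
RULE_2 = build_rule_graph(
    "root2", "SensitiveAnalyse",
    ["Customer", "Customer_demographics"],
    ["count", "sum", "max", "min", "avg"],
)

# "max" on Customer is allowed by rule (2) but not by rule (1).
print(("max", "Customer") in RULE_1[1])  # False
print(("max", "Customer") in RULE_2[1])  # True
```

Each rule yields 9 nodes and 16 edges under this layout, so the rule graphs stay small even though the provenance graph of a query may be much larger.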

References (20)

[1]	General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China, Standardization Administration of the People's Republic of China. GB/T 35273-2017 Information security technology: Personal information security specification[S]. Beijing: Standards Press of China, 2018.
[2]	SUI Zhenjun. Safe database audit subsystem[D]. [Master's Thesis], Fudan University, 2013.
[3]	SCHWAB P K, RÖCKL J, LANGOHR M S, et al. Performance evaluation of policy-based SQL query classification for data-privacy compliance[J]. Datenbank-Spektrum, 2021, 21(3): 191–201. doi: 10.1007/s13222-021-00385-9
[4]	LIU Chi, HU Baiqing, XIE Yi, et al. Big Data Governance and Security: From Theory to Open-Source Practice[M]. Beijing: China Machine Press, 2017.
[5]	GUO Hongbin. Research on safety protection technology of life cycle data integration platform for the EMU[D]. [Master's Thesis], Beijing Jiaotong University, 2017.
[6]	LAN Wen. Research on enterprise internal audit informatization under the background of big data: Taking Canny Elevator Co., Ltd. as an example[D]. [Master's Thesis], East China Jiaotong University, 2020.
[7]	CHEN Haoyu, TU Shanshan, ZHAO Chunye, et al. Provenance cloud security auditing system based on log analysis[C]. 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS), Chongqing, China, 2016: 155–159.
[8]	ALDECO-PÉREZ R and MOREAU L. A provenance-based compliance framework[M]. BERRE A J, GÓMEZ-PÉREZ A, TUTSCHKU K, et al. Future Internet - FIS 2010. Berlin, Heidelberg: Springer, 2010.
[9]	ALDECO-PÉREZ R and MOREAU L. Provenance-based auditing of private data use[C]. Visions of Computer Science - BCS International Academic Conference 2008, London, UK, 2008.
[10]	BUTIN D, DEMIREL D, and BUCHMANN J. Formal policy-based provenance audit[C]. The 11th International Workshop on Security Information and Computer Security, Tokyo, Japan, 2016.
[11]	PASQUIER T, SINGH J, POWLES J, et al. Data provenance to audit compliance with privacy policy in the Internet of Things[J]. Personal and Ubiquitous Computing, 2018, 22(2): 333–344. doi: 10.1007/s00779-017-1067-4
[12]	VADLAMUDI D and RAO K T. Provenance aware audit trail framework for SAAS providers efficiency assessment in cloud environment[J]. Journal of Critical Reviews, 2020, 7(12): 1191–1196.
[13]	ALI M. Provenance-based data traceability model and policy enforcement framework for cloud services[D]. [Ph. D. dissertation], University of Southampton, 2016.
[14]	LUO Chen, HE Fei, PENG Fei, et al. PSpec-SQL: Enabling fine-grained control for distributed data analytics[J]. IEEE Transactions on Dependable and Secure Computing, 2021, 18(2): 810–824. doi: 10.1109/TDSC.2019.2914209
[15]	PEI Hua, LIU Wei, and TANG Donglin. Unified auditing technical framework under big data[J]. Communications Technology, 2020, 53(9): 2252–2256. doi: 10.3969/j.issn.1002-0802.2020.09.027
[16]	SPIVEY B and ECHEVERRIA J. Hadoop Security: Protecting Your Big Data Platform[M]. Sebastopol: O'Reilly Media, Inc. , 2015.
[17]	DU Juan and SU Qiuyue. Hive data provenance method based on DAG[J]. Information Technology and Network Security, 2020, 39(11): 31–37. doi: 10.19358/j.issn.2096-5133.2020.11.005
[18]	CARLETTI V, FOGGIA P, SAGGESE A, et al. Challenging the time complexity of exact subgraph isomorphism for huge and dense graphs with VF3[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 804–818. doi: 10.1109/TPAMI.2017.2696940
[19]	CORDELLA L P, FOGGIA P, SANSONE C, et al. An improved algorithm for matching large graphs[C]. The 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, 2001.
[20]	ULLMANN J R. An algorithm for subgraph isomorphism[J]. Journal of the ACM, 1976, 23(1): 31–42. doi: 10.1145/321921.321925