Abstract:
Oblivious Storage has recently been proposed to prevent privacy leakage through user access patterns: fake accesses and probabilistic encryption obfuscate the access sequence and make it computationally indistinguishable from a random sequence, so the same data yields distinct ciphertexts. However, this seriously impedes cloud providers' ability to improve storage utilization by removing redundancy across users, a technique widely used in existing cloud storage. Inspired by the successful adoption of deduplication in cloud storage, we integrate obliviousness with redundancy removal and propose a practical oblivious storage system, PEO-Store. Instead of issuing fake accesses, PEO-Store introduces delegates to break the link between a valid access pattern and a specific client; the cloud interacts only with randomly authorized delegates. The design combines non-interactive zero-knowledge-based redundancy detection, discrete-logarithm-based key sharing, and a secure time-based delivery proof. Together, these components protect access pattern privacy, accurately eliminate redundancy, and prove data delivery between delegates and the cloud. A theoretical analysis shows that the probability of associating a valid access pattern with a specific client is negligible in our design. Experimental results show that PEO-Store outperforms state-of-the-art methods, achieving up to 3 times higher average throughput and saving 74% of storage space.
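As a rough illustration of the discrete-logarithm-based key sharing mentioned above, the sketch below shows a Diffie-Hellman-style exchange between a client and a delegate. The group parameters, helper names, and hashing step are assumptions for demonstration only; this is not PEO-Store's actual construction.

```python
# Hypothetical sketch: Diffie-Hellman-style key sharing whose hardness rests on
# the discrete logarithm problem. The prime below is a small toy value; a real
# deployment would use a standardized 2048-bit or larger group.
import hashlib
import secrets

P = 2**127 - 1  # toy Mersenne prime, illustration only
G = 5

def keygen():
    """Pick a random private exponent and derive the public value g^x mod p."""
    x = secrets.randbelow(P - 2) + 2
    return x, pow(G, x, P)

def shared_key(own_private, peer_public):
    """Derive a symmetric key from the shared group element g^(xy) mod p."""
    s = pow(peer_public, own_private, P)
    return hashlib.sha256(s.to_bytes((s.bit_length() + 7) // 8, "big")).digest()

# A client and a randomly authorized delegate each run keygen(), exchange public
# values, and then derive the same symmetric key independently.
c_priv, c_pub = keygen()
d_priv, d_pub = keygen()
assert shared_key(c_priv, d_pub) == shared_key(d_priv, c_pub)
```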
Abstract:
The concept of data locality is crucial for distributed systems (e.g., Spark and Hadoop) that process Big Data. Most existing research optimizes data locality from the perspective of task scheduling. However, the executors launched on different nodes, as the execution containers of Spark's tasks, directly affect the data locality that tasks can achieve. This article improves the data locality of tasks through executor allocation in the Spark framework. First, because stages use different communication modes, we separately model the communication cost of transferring the input data of tasks to the executors. Then, we formalize an optimal executor allocation problem that minimizes the total communication cost of transferring all input data, and prove it to be NP-hard. Finally, we present a greedy dropping heuristic algorithm that provides a solution to the executor allocation problem. Our proposals are implemented in Spark-3.4.0 and evaluated through representative micro-benchmarks (i.e., WordCount, Join, Sort) and macro-benchmarks (i.e., PageRank and LDA). Extensive experiments show that the proposed executor allocation strategy decreases network traffic and data access time by improving data locality during task scheduling. Its performance benefits are particularly significant for iterative applications.
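To make the greedy dropping idea concrete, here is a hypothetical Python sketch under assumed inputs: a per-node, per-task communication-cost table and an executor budget. It starts with executors on every candidate node and repeatedly drops the node whose removal increases the total cost of transferring task inputs the least. The cost model and names are illustrative, not the paper's exact algorithm.

```python
# Greedy dropping heuristic sketch for executor placement (illustrative).
def total_cost(active_nodes, cost, num_tasks):
    """Each task fetches its input from the cheapest node that still hosts an executor."""
    return sum(min(cost[n][t] for n in active_nodes) for t in range(num_tasks))

def greedy_dropping(cost, budget):
    """cost[node][task]: communication cost of serving that task's input from that node."""
    num_tasks = len(next(iter(cost.values())))
    nodes = set(cost)
    while len(nodes) > budget:
        # Drop the node whose removal hurts the total communication cost the least.
        victim = min(nodes, key=lambda n: total_cost(nodes - {n}, cost, num_tasks))
        nodes.remove(victim)
    return nodes

# Toy example: 3 candidate nodes, 4 tasks, budget of 2 executors.
cost = {
    "node1": [0, 5, 5, 5],
    "node2": [5, 0, 0, 5],
    "node3": [5, 5, 5, 0],
}
print(greedy_dropping(cost, budget=2))  # e.g. {'node2', 'node3'}
```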
Abstract:
U-Net is a classic convolutional neural network that is widely used in the medical image segmentation domain. However, it is not accurate enough at segmenting fine details, which leads to unsatisfactory segmentation results. To solve this problem, this paper proposes an enhanced U-network that combines an improved Pyramid Pooling Module (PPM) and a modified Convolutional Block Attention Module (CBAM). The overall network follows the U-Net architecture. The PPM is improved by reducing the number of bin types and increasing the pooling connection multiples; it is used in the downsampling part of the network to extract input image features at various scales. The CBAM is modified by using 1x1 convolutional layers instead of the original fully connected layers; it is used in the upsampling part of the network to combine convolution with an attention mechanism that attends to the image along both the spatial and channel dimensions. In addition, the network is trained with a novel RGB training scheme to further improve its segmentation ability. Experimental results show that our network outperforms traditional U-shaped segmentation networks by 30% to 40% on the Dice, IoU, MAE, and BFscore metrics. Moreover, it is better than U-Net++, U2-Net, ResU-Net, ResU-Net++, and UNeXt in terms of segmentation quality and training time.
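A minimal PyTorch-style sketch of the CBAM modification described above: the shared fully connected layers of the channel-attention branch are replaced with 1x1 convolutions. The reduction ratio and layer sizes are assumptions, not the paper's exact configuration.

```python
# Channel-attention block with 1x1 convolutions in place of fully connected layers.
import torch
import torch.nn as nn

class ChannelAttention1x1(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        # 1x1 convolutions stand in for the original fully connected layers.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x):
        attn = torch.sigmoid(self.mlp(self.avg_pool(x)) + self.mlp(self.max_pool(x)))
        return x * attn  # reweight the channels of the input feature map

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention1x1(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```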
Abstract:
It is crucial to detect high-severity defects, such as memory leaks that can cause system crashes or severe resource depletion, in order to reduce software development costs and ensure software quality and reliability. High-severity defects are usually caused by resource scheduling errors, and in program source code they have contextual features, so defect context is needed to confirm their existence. When machine learning methods are used for automatic defect confirmation, single-feature labels cannot achieve high-precision confirmation of high-severity defects. Therefore, a multi-feature fusion method for automatic defect confirmation is proposed. Its label generation method fuses strongly correlated features, which avoids the dimensionality disaster caused by multi-feature fusion and improves the classifier's performance. The method extracts node features and basic path features from the program dependency graph and, combined with contextual features, designs confirmation labels for high-severity contextual defects. Finally, an optimized Support Vector Machine is used to train the automatic detection model for high-severity defects. This study manually implants defects into open-source programs to verify high-severity defect confirmation. The experimental results show that, compared with existing methods, this model significantly improves the efficiency of confirming high-severity defects.
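As an illustrative sketch of the feature-fusion and classification step, node features and basic-path features can be concatenated into one vector per defect report and fed to an SVM. The synthetic data, feature dimensions, and hyperparameters below are assumptions for demonstration, not the paper's setup.

```python
# Toy multi-feature fusion + SVM confirmation sketch with synthetic data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
node_feats = rng.random((200, 8))        # hypothetical per-report node features
path_feats = rng.random((200, 6))        # hypothetical per-report basic-path features
X = np.hstack([node_feats, path_feats])  # simple feature-level fusion
y = rng.integers(0, 2, 200)              # 1 = confirmed high-severity defect, 0 = false positive

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```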
Abstract:
With the increasing number of big data applications, large amounts of valuable data are distributed across different organizations or regions. Federated Learning (FL) enables collaborative model training without sharing sensitive data and is widely used in AI medical diagnosis, economic, and autonomous driving scenarios. However, privacy can still leak through the gradient exchange in federated learning. Worse, state-of-the-art work such as BatchCrypt still suffers from considerable computation and communication overhead caused by homomorphic encryption. Therefore, we propose Sym-Fed, a novel symmetric-key-based homomorphic encryption scheme. To unleash the power of symmetric encryption in federated learning, we combine random masking with symmetric encryption while preserving the homomorphic property during the gradient exchange in the federated learning process. Finally, security analysis and experimental results on real workloads show that our design achieves a 6x to 668x performance improvement and reduces communication overhead by 1.2x to 107x compared with the state-of-the-art work BatchCrypt and FATE, without degrading model accuracy or compromising security.
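For intuition on how random masking can preserve an additive property during aggregation, here is a generic sketch of pairwise additive masking in which the masks cancel in the sum, so the server only learns the aggregate. This is a standard construction shown for illustration, not necessarily Sym-Fed's exact scheme; all names and values are assumptions.

```python
# Pairwise additive masking: the server sees only masked updates, yet the masks
# cancel in the sum, so the aggregate equals the sum of the true gradients.
import numpy as np

rng = np.random.default_rng(42)
gradients = [rng.normal(size=4) for _ in range(3)]  # one gradient vector per client

# Each ordered client pair (i, j) with i < j agrees on a random mask r_ij
# (e.g. derived from a shared key); client i adds it, client j subtracts it.
n = len(gradients)
masks = {(i, j): rng.normal(size=4) for i in range(n) for j in range(i + 1, n)}

def masked_update(i):
    m = gradients[i].copy()
    for (a, b), r in masks.items():
        if a == i:
            m += r
        elif b == i:
            m -= r
    return m

aggregate = sum(masked_update(i) for i in range(n))  # masks cancel pairwise
assert np.allclose(aggregate, sum(gradients))
print(aggregate / n)  # averaged model update seen by the server
```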
Corresponding institution:
[Zhigang Xu; Shiguang Zhang; Wenlong Tian; Hongmu Han; Xinhua Dong] S; [Haitao Wang; Zhiqiang Zheng] N; Narcotics Control Bureau of Department of Public Security of Guangdong Province, Guangzhou 510050, China; School of Computer Science and Technology, University of South China, Hengyang 421001, China; School of Computer Science, Hubei University of Technology, Wuhan 430068, China
Abstract:
In the Internet of Things (IoT), the security of data sharing is important to public security, and enabling more accurate and secure access to data by authorized users is a major challenge. Existing blockchain access control schemes are mostly one-way, which cannot meet the needs of ciphertext search, two-way confirmation between users and data, and secure data transmission. Thus, this paper proposes a blockchain-aided searchable encryption-based two-way attribute access control scheme (STW-ABE). The scheme combines ciphertext attribute access control, key attribute access control, and ciphertext search. In particular, two-way access control meets the requirement of mutual confirmation between users and data, and ciphertext search avoids information leakage during transmission, improving overall efficiency and security during data sharing. Moreover, user keys are generated by the coalition blockchain. Besides, ciphertext search and pre-decryption are outsourced to cloud servers, reducing the computational pressure on users and suiting the needs of lightweight users in the IoT. Security analysis proves that our scheme is secure under chosen-plaintext attacks and chosen-keyword attacks. Simulations show that the costs of encryption and decryption, keyword token generation, and ciphertext search in our scheme compare favorably with existing schemes.
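As a rough illustration of the ciphertext-search idea only (not the attribute-based STW-ABE construction itself), the toy sketch below derives keyword tokens with a keyed hash so that the cloud can match an authorized user's search token without learning the keyword. The key distribution and identifiers are assumptions.

```python
# Toy keyed-hash keyword index: the cloud matches tokens, never plaintext keywords.
import hashlib
import hmac

def keyword_token(key: bytes, keyword: str) -> bytes:
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

search_key = b"shared-search-key"           # assumed to be issued to authorized users
index = {keyword_token(search_key, w): "ciphertext-id-42" for w in ["iot", "telemetry"]}

query = keyword_token(search_key, "iot")    # token submitted by an authorized user
print(index.get(query))                     # 'ciphertext-id-42', without revealing "iot"
```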
Author affiliations:
[Ouyang, Chunping; Tian, Wenlong; Liu, Yongbin; Liu, Qifei; Li, Jing; Geng, Yuqing] Univ South China, Sch Comp Sci & Technol, Hengyang, Peoples R China.;[Tian, Wenlong] Nanyang Technol Univ, Sch Phys & Math Sci, Singapore, Singapore.;[Ouyang, Chunping; Tian, Wenlong; Liu, Yongbin] Hunan Prov Base Sci & Technol Innovat Cooperat, Hengyang, Peoples R China.;[Li, Ruixuan] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China.;[Xiao, Weijun] Virginia Commonwealth Univ, Elect & Comp Engn, Richmond, VA 23284 USA.
Conference name:
IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC) / 9th International Conference on Big Data Computing, Applications and Technologies (BDCAT)
Conference dates:
DEC 06-09, 2022
Conference location:
Vancouver, WA, USA
Conference organizer:
[Geng, Yuqing; Tian, Wenlong; Ouyang, Chunping; Liu, Yongbin; Liu, Qifei; Li, Jing] Univ South China, Sch Comp Sci & Technol, Hengyang, Peoples R China; [Tian, Wenlong] Nanyang Technol Univ, Sch Phys & Math Sci, Singapore, Singapore; [Tian, Wenlong; Ouyang, Chunping; Liu, Yongbin] Hunan Prov Base Sci & Technol Innovat Cooperat, Hengyang, Peoples R China; [Li, Ruixuan] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China; [Xiao, Weijun] Virginia Commonwealth Univ, Elect & Comp Engn, Richmond, VA 23284 USA; [Xu, Zhiyong] Suffolk Univ, Math & Comp Sci Dept, Boston, MA 02114 USA.
Keywords:
Cloud Storage;Resemblance Detection;Context-Aware;Deduplication Ratio Prediction
Abstract:
With the prevalence of cloud storage, people prefer to outsource their data to the cloud for flexibility and reliability, and there is undoubtedly a great deal of redundancy among these data. However, high-end storage with deduplication incurs heavy computation and increases data management complexity. Potential customers need to know the proportion of redundancy in their outsourced data to decide whether high-end storage with deduplication is worthwhile. Thus, many researchers have previously attempted to predict the redundancy ratio. However, existing mechanisms ignore the redundancy among similar chunks, which contain many duplicate data. Although resemblance detection, which detects the duplicate parts among similar data, has become a hot topic, it is hardly applied to conventional deduplication ratio estimation because of its unacceptable computation cost. Therefore, we analyze the limitations and challenges of deduplication ratio prediction in terms of prediction scope and response time, and further propose a novel prediction scheme. By leveraging context-aware resemblance detection and confidence interval theory, our method achieves faster estimation with higher deduplication ratio accuracy than the state-of-the-art work. Finally, experiments on real workloads show that our method can efficiently and effectively estimate the proportion of duplicate chunks and of redundant data among similar chunks.
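To illustrate the confidence-interval side of such an estimation, the hypothetical sketch below samples chunk fingerprints from a toy dataset, estimates the duplicate proportion, and attaches a normal-approximation 95% interval. The sampling scheme and dataset are assumptions; it does not reproduce the paper's context-aware resemblance detection pipeline.

```python
# Toy deduplication-ratio estimate with a normal-approximation confidence interval.
import math
import random

random.seed(1)
chunks = [random.randint(0, 2999) for _ in range(10_000)]  # toy fingerprints, heavy duplication

sample = random.sample(chunks, 1_000)
seen, dup = set(), 0
for fp in sample:
    if fp in seen:
        dup += 1
    seen.add(fp)

p_hat = dup / len(sample)                           # estimated duplicate proportion in the sample
z = 1.96                                            # 95% confidence level
half = z * math.sqrt(p_hat * (1 - p_hat) / len(sample))
print(f"estimated duplicate ratio: {p_hat:.3f} +/- {half:.3f}")
```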
Journal:
Proceedings of SPIE - The International Society for Optical Engineering, 2022, Vol. 12331, ISSN: 0277-786X
Corresponding author:
Li, Meng (mlemon@usc.edu.cn)
Author affiliations:
[He, Chen] School of Computer Science and Technology, University of South China, Hengyang 421001, Hunan, China; [Yang, Xiaohua; Li, Meng; Yan, Shiyu] CNNC Key Laboratory on High Trusted Computing, Computer School, University of South China, Hengyang, Hunan, China
Author affiliations:
[Shengfu Fu; Xiaohua Yang; Meng Li; Fan Wang] Engineering and Technology Research Center of Software Evaluation and Testing for Intellectual Equipment of Hunan Province, Hengyang, Hunan, China; CNNC Key Laboratory on High Trusted Computing, Hengyang, Hunan, China; School of Computer Science and Technology, University of South China, Hengyang, China
Conference name:
2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST)
Conference dates:
10 December 2021
Conference location:
Guangzhou, China
Conference proceedings:
2021 3rd International Academic Exchange Conference on Science and Technology Innovation (IAECST)
Keywords:
Metamorphic testing;Metamorphic relation;Output pattern;Burnup calculation program
Abstract:
It is challenging to adequately verify design and analysis software in the nuclear power field because of the inflated cost, the time-consuming development of benchmark problems, and the scarcity of verification examples. This situation is known as the test oracle problem. Metamorphic testing cleverly combines specific properties of physical equations and numerical algorithms with software verification. Without constructing numerical solutions or verification examples, it verifies the software by checking whether the inputs and outputs of multiple executions satisfy metamorphic relations. Since metamorphic relations capture the inherent rules of physical models and numerical algorithms, they are not restricted to a specific code implementation technology. Therefore, metamorphic testing is undoubtedly a promising technique for alleviating the test oracle problem. The metamorphic relation is the key to metamorphic testing and consists of an input pattern and an output pattern. The former can be obtained through manual analysis by industry experts, while the latter lacks a systematic identification method. An output pattern identification method based on solution figures is introduced here. Specifically, the input pattern is obtained by manually analyzing the physical properties of the program under test. Next, a group of test inputs is generated according to such patterns, and the calculation results are obtained by running the program on those inputs. Moreover, the functional form of the output pattern is guided by the plots of the results. Finally, the metamorphic relation is constructed from the input pattern and the output pattern. To demonstrate the details of this method, a nuclide burnup calculation code, NUIT, is employed. By investigating the result plots of 3820 isotopes, output pattern classification models are established according to burnup depth and nuclide density, comprising four categories and seventeen sub-categories. These metamorphic relations can be used directly to verify other burnup calculation codes. This technique effectively avoids blindness and randomness in the metamorphic relation identification process. Besides, it can be used for verification in other nuclear fields, such as thermal hydraulics and radiation protection, and would provide critical support for applying metamorphic testing.
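As a self-contained illustration of a metamorphic relation for a burnup-style calculation, the sketch below uses a toy single-nuclide decay model (not the NUIT code) and checks a composition relation that must hold without knowing the "correct" answer, which is exactly how metamorphic testing sidesteps the test oracle problem. The solver and tolerance are assumptions for demonstration only.

```python
# Metamorphic relation check for a toy decay calculation: solving over
# [0, t1 + t2] in one run must agree with solving over [0, t1] and then
# restarting from that result for another t2.
import math

def decay(n0, lam, t):
    """Analytic single-nuclide decay: N(t) = N0 * exp(-lambda * t)."""
    return n0 * math.exp(-lam * t)

def check_composition_mr(n0, lam, t1, t2, rtol=1e-9):
    one_run = decay(n0, lam, t1 + t2)              # source execution
    two_runs = decay(decay(n0, lam, t1), lam, t2)  # follow-up executions
    return math.isclose(one_run, two_runs, rel_tol=rtol)

print(check_composition_mr(n0=1.0e24, lam=1.21e-4, t1=100.0, t2=250.0))  # True
```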
Abstract:
Spark Streaming is an extension of the core Spark engine that enables scalable, high-throughput, fault-tolerant processing of live data streams. It treats a stream as a series of deterministic batches and handles them as regular jobs. However, for the stream job responsible for a batch, data skew (i.e., an imbalance in the amount of data allocated to each reduce task) can significantly degrade job performance because of load imbalance. In this paper, we propose an improved range partitioner (ImRP) to alleviate reduce-side skew for stream jobs in Spark Streaming. Unlike previous work, ImRP does not require any pre-run sampling of the input data; it generates the data partition scheme from the intermediate data distribution estimated during the previous batch's processing, using an EWMA (Exponentially Weighted Moving Average) prediction model. To mitigate data skew, ImRP introduces a novel method for calculating partition borders optimally and a mechanism for splitting border key clusters when the semantics of the shuffle operators permit. Besides, ImRP takes the integrated partition size and the heterogeneity of the computing environment into account when balancing the load among reduce tasks. We implement ImRP in Spark-3.0 and evaluate its performance on four representative benchmarks: WordCount, Sort, PageRank, and LDA. The results show that by mitigating data skew, ImRP substantially decreases the execution time of stream jobs compared with other partition strategies, especially when the skew of the input batch is severe.
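To make the two ideas above concrete, here is a minimal sketch, under illustrative assumptions, of an EWMA update of per-key weights from the previous batch followed by choosing range-partition borders with roughly equal cumulative weight. This is not ImRP's actual implementation; all names and values are assumptions.

```python
# EWMA key-distribution estimate plus equal-weight range borders (illustrative).
def ewma_update(prev_est, observed, alpha=0.5):
    """Blend the previous estimate with the latest batch's observed key counts."""
    keys = set(prev_est) | set(observed)
    return {k: alpha * observed.get(k, 0) + (1 - alpha) * prev_est.get(k, 0) for k in keys}

def range_borders(weights, num_partitions):
    """Cut the sorted key space into ranges of roughly equal total weight."""
    total = sum(weights.values())
    target = total / num_partitions
    borders, acc = [], 0.0
    for key in sorted(weights):
        acc += weights[key]
        if acc >= target * (len(borders) + 1) and len(borders) < num_partitions - 1:
            borders.append(key)  # keys <= this border go to the current partition
    return borders

prev = {"a": 10, "b": 40, "c": 5, "d": 45}   # estimate carried from earlier batches
obs = {"a": 20, "b": 80, "c": 10, "d": 10}   # counts observed in the latest batch
est = ewma_update(prev, obs)
print(range_borders(est, num_partitions=2))  # ['b']
```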