oalib
TDPA: Trend Detection and Predictive Analytics  [PDF]
M. Sakthi Ganesh,CH.Pradeep Reddy,N.Manikandan,DR.P.Venkata
International Journal on Computer Science and Engineering, 2011
Abstract: Text mining is the process of exploratory text analysis, by automatic or semi-automatic means, that helps uncover previously unknown information. It is a highly interdisciplinary research area, bringing together insights from data mining, natural language processing, machine learning, and information retrieval. The amount of textual data available is too large to be managed manually, so an automatic system is needed to analyze and interpret the text. Some systems are semi-automatic, requiring user input to begin processing; others are fully automatic, producing output from the input corpus without guidance. The literature on trend detection indicates that much progress has been made toward automating the detection of emerging trends, but there is room for improvement. In this work, we propose Trend Detection and Predictive Analytics (TDPA), which uses Living Analytics to detect emerging trends from live data and to cater to the needs of various users irrespective of their domain. The system is intended to serve as general-purpose software that helps users identify and visualize current happenings in any domain in an efficient and user-friendly way. The paper also aims at forecasting the future of the detected trends, helping users look ahead and make quick decisions.
CryptGraph: Privacy Preserving Graph Analytics on Encrypted Graph  [PDF]
Pengtao Xie,Eric Xing
Computer Science, 2014
Abstract: Many graph mining and analysis services have been deployed on the cloud, which can relieve users of the burden of implementing and maintaining graph algorithms. However, putting graph analytics on the cloud can invade users' privacy. To solve this problem, we propose CryptGraph, which runs graph analytics on encrypted graphs to preserve the privacy of both users' graph data and the analytic results. In CryptGraph, users encrypt their graphs before uploading them to the cloud. The cloud runs graph analysis on the encrypted graphs and obtains results that are also in encrypted form and that the cloud cannot decipher. During computation, the encrypted graphs are never decrypted on the cloud side. The encrypted results are sent back to the users, who decrypt them to obtain the plaintext results. In this process, users' graphs and the analytics results are both encrypted and the cloud knows neither of them, so users' privacy can be strongly protected. Meanwhile, with the help of homomorphic encryption, the results computed from the encrypted graphs are guaranteed to be correct. In this paper, we present how to encrypt a graph using homomorphic encryption and how to query the structure of an encrypted graph by computing polynomials. To handle operations that cannot be executed on encrypted graphs, we propose hard computation outsourcing, which seeks help from users. Using two graph algorithms as examples, we show how to apply our methods to perform analytics on encrypted graphs. Experiments on two datasets demonstrate the correctness and feasibility of our methods.
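The idea of querying an encrypted graph can be illustrated with a small sketch. It rests on an assumption made here for brevity: the abstract does not name the homomorphic scheme CryptGraph uses, so the code below substitutes a toy Paillier cryptosystem. Paillier supports only addition on ciphertexts, so it can evaluate only degree-one polynomials, such as the degree of a vertex computed from one adjacency row. The primes are deliberately tiny and insecure, and all function names are hypothetical.

```python
import math
import secrets

# Toy Paillier parameters with tiny, insecure primes (illustration only).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to g = N + 1


def encrypt(m: int) -> int:
    """User side: encrypt an integer 0 <= m < N under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """User side: recover the plaintext using the private values LAM and MU."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def encrypted_degree(encrypted_row: list[int]) -> int:
    """Cloud side: multiplying ciphertexts adds the hidden plaintexts."""
    acc = 1  # 1 is a trivial encryption of 0
    for c in encrypted_row:
        acc = (acc * c) % N2
    return acc


# Hypothetical adjacency row of one vertex: neighbours at positions 0, 2, 3.
row = [1, 0, 1, 1]
encrypted_row = [encrypt(bit) for bit in row]
degree = decrypt(encrypted_degree(encrypted_row))
```

The cloud-side function never sees `LAM` or `MU`, so it learns neither the adjacency bits nor the resulting degree; only the user can decrypt the returned ciphertext.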
Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics  [PDF]
Giuseppe D'Acquisto,Josep Domingo-Ferrer,Panayiotis Kikiras,Vicenç Torra,Yves-Alexandre de Montjoye,Athena Bourka
Computer Science, 2015, DOI: 10.2824/641480
Abstract: The extensive collection and processing of personal information in big data analytics has given rise to serious privacy concerns, related to wide scale electronic surveillance, profiling, and disclosure of private data. To reap the benefits of analytics without invading the individuals' private sphere, it is essential to draw the limits of big data processing and integrate data protection safeguards in the analytics value chain. ENISA, with the current report, supports this approach and the position that the challenges of technology (for big data) should be addressed by the opportunities of technology (for privacy). We first explain the need to shift from "big data versus privacy" to "big data with privacy". In this respect, the concept of privacy by design is key to identify the privacy requirements early in the big data analytics value chain and in subsequently implementing the necessary technical and organizational measures. After an analysis of the proposed privacy by design strategies in the different phases of the big data value chain, we review privacy enhancing technologies of special interest for the current and future big data landscape. In particular, we discuss anonymization, the "traditional" analytics technique, the emerging area of encrypted search and privacy preserving computations, granular access control mechanisms, policy enforcement and accountability, as well as data provenance issues. Moreover, new transparency and access tools in big data are explored, together with techniques for user empowerment and control. Achieving "big data with privacy" is no easy task and a lot of research and implementation is still needed. Yet, it remains a possible task, as long as all the involved stakeholders take the necessary steps to integrate privacy and data protection safeguards in the heart of big data, by design and by default.
CloudMine: Multi-Party Privacy-Preserving Data Analytics Service  [PDF]
Dinh Tien Tuan Anh,Quach Vinh Thanh,Anwitaman Datta
Computer Science, 2012
Abstract: An increasing number of businesses are replacing their data storage and computation infrastructure with cloud services. Likewise, there is an increased emphasis on performing analytics based on multiple datasets obtained from different data sources. While ensuring security of data and computation outsourced to a third party cloud is in itself challenging, supporting analytics using data distributed across multiple, independent clouds is even further from trivial. In this paper we present CloudMine, a cloud-based service which allows multiple data owners to perform privacy-preserved computation over the joint data using their clouds as delegates. CloudMine protects data privacy with respect to semi-honest data owners and semi-honest clouds. It furthermore ensures the privacy of the computation outputs from the curious clouds. It allows data owners to reliably detect if their cloud delegates have been lazy when carrying out the delegated computation. CloudMine can run as a centralized service on a single cloud, or as a distributed service over multiple, independent clouds. CloudMine supports a set of basic computations that can be used to construct a variety of highly complex, distributed privacy-preserving data analytics. We demonstrate how a simple instance of CloudMine (secure sum service) is used to implement three classical data mining tasks (classification, association rule mining and clustering) in a cloud environment. We experiment with a prototype of the service, the results of which suggest its practicality for supporting privacy-preserving data analytics as a (multi) cloud-based service.
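A secure sum can be sketched as follows. The abstract does not describe CloudMine's actual protocol, so this example substitutes plain additive secret sharing, a standard technique that protects inputs against semi-honest parties; the modulus and all function names are assumptions, and the laziness detection mentioned in the abstract is not modelled.

```python
import secrets

# Assumed public modulus; it must exceed any sum the parties could produce.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split value into n random shares that add up to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulate all parties locally; no single share reveals an input."""
    n = len(private_values)
    # Row i holds the shares produced by owner i; column j is sent to party j.
    share_matrix = [make_shares(v, n) for v in private_values]
    # Each party adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(row[j] for row in share_matrix) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Hypothetical private inputs of three data owners.
total = secure_sum([12, 30, 8])
```

Each published partial sum is uniformly random on its own, so only the final total is revealed. Counts and sums obtained this way are the kind of building block that classification, association rule mining, and clustering can be assembled from.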
Visual and Predictive Analytics on Singapore News: Experiments on GDELT, Wikipedia, and ^STI  [PDF]
Clifton Phua,Yuzhang Feng,Junyao Ji,Timothy Soh
Computer Science, 2014
Abstract: The open-source Global Database of Events, Language, and Tone (GDELT) is the most comprehensive and up-to-date Big Data source of important terms extracted from international news articles. We focus only on GDELT's Singapore events to better understand the data quality of its news articles, the accuracy of its term extraction, and its potential for prediction. To test news completeness and validity, we visually compared GDELT (Singapore news articles' terms from 1979 to 2013) to Wikipedia's timeline of Singaporean history. To test term extraction accuracy, we visually compared GDELT (CAMEO codes and the TABARI system of extraction from Singapore news articles' text from April to December 2013) to SAS Text Miner's term and topic extraction. To perform predictive analytics, we propose a novel feature engineering method that transforms row-level GDELT data from articles to a user-specified temporal resolution. For example, we apply a decision tree using daily counts of feature values from GDELT to predict the Singapore stock market's Straits Times Index (^STI). Of practical interest from the above results is SAS Visual Analytics' ability to highlight the various impacts of the June 2013 Southeast Asian haze and the December 2013 Little India riot on Singapore. Although Singapore is unique as a sovereign city-state and a leading financial centre with strong international influence and a highly multicultural population, the visual and predictive analytics reported here are highly applicable to another country's GDELT data.
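The step from row-level records to a fixed temporal resolution can be sketched as below. The abstract does not spell out the exact transformation, so this is only one plausible reading: count how often each feature value occurs per day. The field name, the values, and the function name are hypothetical placeholders, not real GDELT columns or CAMEO codes.

```python
from collections import Counter, defaultdict

# Hypothetical row-level events: (date, feature name, feature value).
rows = [
    ("2013-06-20", "event_code", "code_a"),
    ("2013-06-20", "event_code", "code_a"),
    ("2013-06-20", "event_code", "code_b"),
    ("2013-06-21", "event_code", "code_a"),
]


def daily_counts(events):
    """Aggregate row-level events into one count vector per day."""
    per_day = defaultdict(Counter)
    for date, feature, value in events:
        per_day[date][f"{feature}={value}"] += 1
    return {date: dict(counts) for date, counts in per_day.items()}


features = daily_counts(rows)
```

Each day then becomes a single training example whose columns are the counts, which is the shape a decision tree learner expects when paired with that day's index movement as the label.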
Building and Measuring Privacy-Preserving Predictive Blacklists  [PDF]
Luca Melis,Apostolos Pyrgelis,Emiliano De Cristofaro
Computer Science, 2015
Abstract: Collaborative approaches to network defense are being increasingly advocated, aiming to proactively predict and speed up detection of attacks. In particular, a lot of attention has recently been given to the problem of predictive blacklisting, i.e., forecasting attack sources based on Intrusion Detection Systems (IDS) alerts contributed by different organizations. While collaboration allows the discovery of groups of correlated attacks targeting similar victims, it also raises important privacy and security challenges, thus motivating privacy-preserving approaches to the problem. Although recent work provides encouraging results on the feasibility of collaborative predictive blacklisting via limited data sharing, a number of open problems remain, which this paper sets out to address. We introduce a privacy-friendly system for predictive blacklisting featuring a semi-trusted authority that clusters organizations based on the similarity of their logs, without access to these logs. Entities in the same cluster then securely share relevant logs with each other and build predictive blacklists. We present an extensive set of measurements as we experiment with prior work as well as with four different clustering algorithms and three privacy-preserving sharing strategies, using several million alerts collected from DShield.org over several months as our training and ground-truth datasets. Our results show that collaborating with similarly attacked organizations always significantly improves the prediction and that privacy protection does not actually limit this improvement. Finally, we discuss how different clustering and log sharing methods yield different trade-offs between precision and recall.
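The precision/recall trade-off mentioned at the end of the abstract can be made concrete with a minimal sketch. It uses the standard definitions of the two metrics; the addresses come from documentation-only ranges and, like the function name, are hypothetical rather than taken from the paper's data.

```python
def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Score a predicted blacklist against the sources that really attacked."""
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical blacklist built from shared logs, and the later ground truth.
blacklist = {"192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"}
attackers = {"192.0.2.1", "198.51.100.7", "203.0.113.50"}
precision, recall = precision_recall(blacklist, attackers)
```

A longer blacklist tends to raise recall while lowering precision, which is why the choice of clustering and sharing strategy shifts the balance between the two.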
ALOJA: A Framework for Benchmarking and Predictive Analytics in Big Data Deployments  [PDF]
Josep Ll. Berral,Nicolas Poggi,David Carrera,Aaron Call,Rob Reinauer,Daron Green
Computer Science, 2015, DOI: 10.1109/TETC.2015.2496504
Abstract: This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.
A Game-Theoretic Study on Non-Monetary Incentives in Data Analytics Projects with Privacy Implications  [PDF]
Michela Chessa,Jens Grossklags,Patrick Loiseau
Computer Science, 2015
Abstract: The amount of personal information contributed by individuals to digital repositories such as social network sites has grown substantially. The existence of this data offers unprecedented opportunities for data analytics research in various domains of societal importance, including medicine and public policy. The results of these analyses can be considered a public good that benefits data contributors as well as individuals who are not making their data available. At the same time, the release of personal information carries perceived and actual privacy risks to the contributors. Our research addresses this problem area. In our work, we study a game-theoretic model in which individuals take control over participation in data analytics projects in two ways: 1) individuals can contribute data at a self-chosen level of precision, and 2) individuals can decide whether they want to contribute at all. From the analyst's perspective, we investigate to what degree the research analyst has flexibility to set requirements for data precision, so that individuals are still willing to contribute to the project and the quality of the estimation improves. We study this tradeoff scenario for populations of homogeneous and heterogeneous individuals, and determine Nash equilibria that reflect the optimal level of participation and precision of contributions. We further prove that the analyst can substantially increase the accuracy of the analysis by imposing a lower bound on the precision of the data that users can reveal.
Managing large-scale scientific hypotheses as uncertain and probabilistic data with support for predictive analytics  [PDF]
Bernardo Gonçalves,Fabio Porto
Computer Science, 2014, DOI: 10.1109/MCSE.2015.102
Abstract: The sheer scale of high-resolution raw data generated by simulation has motivated non-conventional approaches to data exploration, referred to as 'immersive' and 'in situ' query processing of the raw simulation data. Another step towards supporting scientific progress is to enable data-driven hypothesis management and predictive analytics from simulation results. We present a synthesis method and tool for encoding and managing competing hypotheses as uncertain data in a probabilistic database that can be conditioned in the presence of observations.
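Conditioning competing hypotheses on an observation can be illustrated with a generic Bayesian update. This is not the paper's synthesis method, which the abstract does not detail; the hypothesis labels, probabilities, and function name are all hypothetical.

```python
def condition(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(observation | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {h: weight / total for h, weight in unnormalized.items()}


# Two competing hypotheses, equally likely before any observation.
prior = {"H1": 0.5, "H2": 0.5}
# Assumed probability of the observed data under each hypothesis.
likelihood = {"H1": 0.9, "H2": 0.3}
posterior = condition(prior, likelihood)
```

With these numbers the observation shifts belief toward H1, and repeating the update with further observations keeps re-ranking the hypotheses.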
Toward Trusted Sharing of Network Packet Traces Using Anonymization: Single-Field Privacy/Analysis Tradeoffs  [PDF]
William Yurcik,Clay Woolam,Greg Hellings,Latifur Khan,Bhavani Thuraisingham
Computer Science, 2007
Abstract: Network data needs to be shared for distributed security analysis. Anonymizing network data for sharing sets up a fundamental tradeoff between privacy protection and security analysis capability. This privacy/analysis tradeoff has been acknowledged by many researchers, but this is the first paper to provide empirical measurements that characterize it for an enterprise dataset. Specifically, we apply anonymization options to single fields within network packet traces and then make measurements using intrusion detection system alarms as a proxy for security analysis capability. Our results show that (1) two fields have a zero-sum tradeoff (more privacy lessens security analysis and vice versa) and (2) eight fields have a more complex tradeoff (that is not zero-sum) in which privacy and analysis can be accomplished simultaneously.
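A single-field anonymization option can be sketched as below. The abstract does not list which options were measured, so the example assumes one common choice, truncating an IPv4 address to a shorter prefix; the function names and the sample address (from a documentation-only range) are hypothetical.

```python
import ipaddress


def truncate_ip(address: str, keep_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping only the prefix."""
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


def anonymization_levels(address: str) -> dict[int, str]:
    """Show the same field at progressively stronger anonymization levels."""
    return {bits: truncate_ip(address, bits) for bits in (32, 24, 16, 8)}


levels = anonymization_levels("192.0.2.77")
```

Keeping fewer bits hides more about the host but leaves less for an intrusion detection system to match on, which is the tradeoff the paper measures field by field.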