[arXiv] BigDataFr recommends: Making problems tractable on big data via preprocessing with polylog-size output

To provide a dichotomy between those queries that can be made feasible on big data after appropriate preprocessing and those for which preprocessing does not help, Fan et al. developed the ⊓-tractability theory. This theory provides a formal foundation for understanding the […]

[arXiv] BigDataFr recommends: Big Data Analytics-Enhanced Cloud Computing: Challenges, Architectural Elements, and Future Directions

The emergence of cloud computing has made it possible to dynamically provision elastic capacity to applications on demand. Cloud data centers contain thousands of physical servers hosting orders of magnitude more virtual machines that can be allocated on demand to users in a pay-as-you-go […]

[arXiv] BigDataFr recommends: An Extended classification and Comparison of NoSQL Big Data Models

In the last few years, the volume of data has grown manyfold. Data stores have been inundated by various disparate potential data outlets, led by social media such as Facebook and Twitter. The existing data models are largely unable to illuminate the […]

[arXiv] BigDataFr recommends: Learning to Hash for Indexing Big Data – A Survey

‘The explosive growth in big data has attracted much attention in designing efficient indexing and search methods recently. In many critical applications such as large-scale search and pattern matching, finding the nearest neighbors to a query is a fundamental research problem. However, the […]

[arXiv] BigDataFr recommends: Train faster, generalize better – Stability of stochastic gradient descent #datascientist

‘We show that any model trained by a stochastic gradient method with few iterations has vanishing generalization error. We prove this by showing the method is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex […]

[arXiv] BigDataFr recommends: Deep Broad Learning – Big Models for Big Data

‘Deep learning has demonstrated the power of detailed modeling of complex high-order (multivariate) interactions in data. For some learning tasks there is power in learning models that are not only Deep but also Broad. […] The most accurate models will integrate all that information. […]

[arXiv] BigDataFr recommends: A Big Data Analyzer for Large Trace Logs #machine learning

‘Current generation of Internet-based services are typically hosted on large data centers that take the form of warehouse-size structures housing tens of thousands of servers. Continued availability of a modern data center is the result of a complex orchestration among many internal and external actors […]

[arXiv] BigDataFr recommends: A Flexible Coordinate Descent Method for Big Data Applications #datascientist #machinelearning

‘In this paper we present a novel randomized block coordinate descent method for the minimization of a convex composite objective function. The method uses (approximate) partial second-order (curvature) information, so that the algorithm performance is more robust when applied to highly nonseparable or ill […]

[arXiv] BigDataFr recommends: Big Data Analytics in Bioinformatics – A Machine Learning Perspective #machine-learning

‘Bioinformatics research is characterized by voluminous and incremental datasets and complex data analytics methods. The machine learning methods used in bioinformatics are iterative and parallel. These methods can be scaled to handle big data using the distributed and parallel computing technologies. Usually big […]

[arXiv] BigDataFr recommends: Predicting Regional Economic Indices using Big Data of Individual Bank Card Transactions #machine learning #datascientist

‘For centuries quality of life was a subject of studies across different disciplines. However, only with the emergence of a digital era, it became possible to investigate this topic on a larger scale. Over time it became clear that quality of […]