Small Summaries for Big Data
Graham Cormode, Ke Yi
The massive volume of data generated
in modern applications can overwhelm our ability to conveniently
transmit, store, and index it. In many scenarios, building a compact
summary that is vastly smaller than the dataset itself enables flexible
and efficient answers to a range of queries over the data, in exchange
for some approximation. This comprehensive introduction to data
summarization, aimed at practitioners and students, showcases the
algorithms, their behavior, and the mathematical underpinnings of their
operation. The coverage starts with simple sums and approximate counts,
building to more advanced probabilistic structures such as the Bloom
filter, distinct value summaries, sketches, and quantile summaries.
Summaries are also described for specific types of data, such as
geometric data, graphs, and vectors and matrices. The authors offer
detailed descriptions of, and pseudocode for, key algorithms that have
been incorporated in systems from companies such as Google, Apple,
Microsoft, Netflix, and Twitter.
Year: 2020
Publisher: Cambridge University Press
Language: English
Pages: 278
ISBN 10: 1108769934
ISBN 13: 9781108769938
File: PDF, 2.28 MB