Graph-Based Clustering and Data Visualization Algorithms [Vathy-Fogarassy & Abonyi 2013-06-05].pdf

(3907 KB) Pobierz

SPRINGER BRIEFS IN COMPUTER SCIENCE

Ágnes Vathy-Fogarassy

János Abonyi

Graph-Based

Clustering and

Data Visualization

Algorithms

SpringerBriefs in Computer Science

Series Editors

Stan Zdonik

Peng Ning

Shashi Shekhar

Jonathan Katz

Xindong Wu

Lakhmi C. Jain

David Padua

Xuemin Shen

Borko Furht

V. S. Subrahmanian

Martial Hebert

Katsushi Ikeuchi

Bruno Siciliano

For further volumes:

http://www.springer.com/series/10028

Ágnes Vathy-Fogarassy

János Abonyi

Graph-Based Clustering

and Data Visualization

Algorithms

123

Ágnes Vathy-Fogarassy

Computer Science and Systems Technology

University of Pannonia

Veszprém

Hungary

János Abonyi

Department of Process Engineering

University of Pannonia

Veszprém

Hungary

ISSN 2191-5768

ISBN 978-1-4471-5157-9

DOI 10.1007/978-1-4471-5158-6

ISSN 2191-5776 (electronic)

ISBN 978-1-4471-5158-6 (eBook)

Springer London Heidelberg New York Dordrecht

Library of Congress Control Number: 2013935484

János Abonyi 2013

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of

the material is concerned, speciﬁcally the rights of translation, reprinting, reuse of illustrations,

recitation, broadcasting, reproduction on microﬁlms or in any other physical way, and transmission or

information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar

methodology now known or hereafter developed. Exempted from this legal reservation are brief

excerpts in connection with reviews or scholarly analysis or material supplied speciﬁcally for the

purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the

work. Duplication of this publication or parts thereof is permitted only under the provisions of

the Copyright Law of the Publisher’s location, in its current version, and permission for use must

always be obtained from Springer. Permissions for use may be obtained through RightsLink at the

The use of general descriptive names, registered names, trademarks, service marks, etc. in this

publication does not imply, even in the absence of a speciﬁc statement, that such names are exempt

from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of

publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for

any errors or omissions that may be made. The publisher makes no warranty, express or implied, with

respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Clustering, as a special area of data mining, is one of the most commonly used

methods for discovering hidden structure of data. Clustering algorithms group a set

of objects in such a way that objects in the same cluster are more similar to each

other than to those in other clusters. Cluster analysis can be used to quantize data,

extract cluster prototypes for the compact representation of the data set, select

relevant features, segment data into homogeneous subsets, and to initialize

regression and classiﬁcation models.

Graph-based clustering algorithms are powerful in giving results close to the

human intuition [1]. The common characteristic of graph-based clustering methods

developed in recent years is that they build a graph on the set of data and then use

the constructed graph during the clustering process [2–9]. In graph-based clus-

tering methods objects are considered as vertices of a graph, while edges between

them are treated differently by the various approaches. In the simplest case, the

graph is a complete graph, where all vertices are connected to each other, and the

edges are labeled according to the degree of the similarity of the objects. Con-

sequently, in this case the graph is a weighted complete graph.

In case of large data sets the computation of the complete weighted graph

requires too much time and storage space. To reduce complexity many algorithms

work only with sparse matrices and do not utilize the complete graph. Sparse

similarity matrices contain information only about a small subset of the edges,

mostly those corresponding to higher similarity values. These sparse matrices

encode the most relevant similarity values and graphs based on these matrices

visualize these similarities in a graphical way.

Another way to reduce the time and space complexity is the application of a

vector quantization (VQ) method (e.g. k-means [10], neural gas (NG) [11], Self-

Organizing Map (SOM) [12]). The main goal of the VQ is to represent the entire

set of objects by a set of representatives (codebook vectors), whose cardinality is

much lower than the cardinality of the original data set. If a VQ method is used to

reduce the time and space complexity, and the clustering method is based on

graph-theory, vertices of the graph represent the codebook vectors and the edges

denote the connectivity between them.

Weights assigned to the edges express similarity of pairs of objects. In this book

Graph-Based Clustering and Data Visualization Algorithms [Vathy-Fogarassy & Abonyi 2013-06-05].pdf

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: