A Graph Distance Metric Based On The Maximal Common Subgraph
Abstract
Error-tolerant graph matching is a powerful concept that has various applications in pattern recognition and machine
vision. In the present paper, a new distance measure on graphs is proposed. It is based on the maximal common subgraph of
two graphs. The new measure is superior to edit distance based measures in that no particular edit operations together with
their costs need to be defined. It is formally shown that the new distance measure is a metric. Potential algorithms for the
efficient computation of the new measure are discussed. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Error-tolerant graph matching; Distance measure; Maximal common subgraph; Graph edit distance; Metric
Speaking more generally, it is often desired that the distance measure $d$ fulfills the properties of a metric:
1. $d(A,B) = 0 \Leftrightarrow A = B$,
2. $d(A,B) = d(B,A)$,
3. $d(A,B) + d(B,C) \geq d(A,C)$.

Edit distance measures are metrics only if the costs of the underlying edit operations satisfy certain conditions. But these conditions are sometimes too restrictive, or incompatible with the problem domain under consideration.

In the present paper, we propose a new graph distance measure that is based on the maximal common subgraph of two graphs. The main contribution of the paper is the formal proof that the new distance measure is a metric. An advantage of the new distance measure over graph edit distance is that it does not depend on edit costs. It is well known that any edit distance measure critically depends on the costs of the underlying edit operations, but the problem of how these edit costs are obtained is still unsolved. Using the new distance measure, this problem can be avoided.

In the next section of this paper we present basic definitions. The following section first defines the distance measure based on the maximal common subgraph and then shows that the measure is a metric. Concluding remarks make up the final section, including a discussion of potential algorithms for the computation of the new distance measure.

2. Basic definitions

... $\nu : E \to L_E$ is a function assigning labels to the edges.

If $V = \emptyset$ then $G$ is called the empty graph.

Definition 2. Given a graph $G = (V, E, \mu, \nu)$, a subgraph of $G$ is a graph $S = (V_S, E_S, \mu_S, \nu_S)$ such that
- $V_S \subseteq V$,
- $E_S = E \cap (V_S \times V_S)$,
- $\mu_S$ and $\nu_S$ are the restrictions of $\mu$ and $\nu$ to $V_S$ and $E_S$, respectively, i.e.,

$$\mu_S(v) = \begin{cases} \mu(v) & \text{if } v \in V_S, \\ \text{undefined} & \text{otherwise,} \end{cases} \qquad \nu_S(e) = \begin{cases} \nu(e) & \text{if } e \in E_S, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

The notation $S \subseteq G$ is used to indicate that $S$ is a subgraph of $G$.

Definition 3. A bijective function $f : V \to V'$ is a graph isomorphism from a graph $G = (V, E, \mu, \nu)$ to a graph $G' = (V', E', \mu', \nu')$ if
- $\mu(v) = \mu'(f(v))$ for all $v \in V$,
- for any edge $e = (v_1, v_2) \in E$ there exists an edge $e' = (f(v_1), f(v_2)) \in E'$ such that $\nu(e) = \nu'(e')$, and for any $e' = (v_1', v_2') \in E'$ there exists an edge $e = (f^{-1}(v_1'), f^{-1}(v_2')) \in E$ such that $\nu'(e') = \nu(e)$.

Definition 4. An injective function $f : V \to V'$ is a subgraph isomorphism from $G$ to $G'$ if there exists a subgraph $S \subseteq G'$ such that $f$ is a graph isomorphism from $G$ to $S$.
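Definitions 2 to 4 translate fairly directly into code. The following Python sketch is an illustration only and not part of the original paper: the `Graph` container and all function names are hypothetical, graphs are assumed to be directed with hashable node identifiers, and the candidate mapping `f` is assumed to be supplied by the caller as a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Labeled directed graph (hypothetical container, not from the paper)."""
    node_labels: dict  # node -> label, plays the role of mu
    edge_labels: dict  # (u, v) -> label, plays the role of nu


def induced_subgraph(g, nodes):
    """Subgraph in the sense of Definition 2: keep `nodes` and all edges between them."""
    keep = set(nodes)
    if not keep <= set(g.node_labels):
        raise ValueError("nodes must be a subset of the graph's nodes")
    return Graph(
        {v: g.node_labels[v] for v in keep},
        {(u, v): lab for (u, v), lab in g.edge_labels.items()
         if u in keep and v in keep},
    )


def is_graph_isomorphism(f, g, h):
    """Check the conditions of Definition 3 for a candidate mapping f."""
    # f must be a bijection between the node sets of g and h.
    if set(f) != set(g.node_labels) or set(f.values()) != set(h.node_labels):
        return False
    if len(set(f.values())) != len(f):
        return False
    # Node labels must be preserved.
    if any(g.node_labels[v] != h.node_labels[f[v]] for v in f):
        return False
    # Edges and edge labels must correspond in both directions.
    mapped = {(f[u], f[v]): lab for (u, v), lab in g.edge_labels.items()}
    return mapped == h.edge_labels


def is_subgraph_isomorphism(f, g, h):
    """Check Definition 4: f maps g isomorphically onto a subgraph of h."""
    if set(f) != set(g.node_labels) or not set(f.values()) <= set(h.node_labels):
        return False
    return is_graph_isomorphism(f, g, induced_subgraph(h, f.values()))


# Small example: g is a single labeled edge, h is a labeled path of three nodes.
g = Graph({1: "a", 2: "b"}, {(1, 2): "x"})
h = Graph({"p": "a", "q": "b", "r": "c"}, {("p", "q"): "x", ("q", "r"): "y"})
print(is_subgraph_isomorphism({1: "p", 2: "q"}, g, h))  # True
print(is_graph_isomorphism({1: "p", 2: "q"}, g, h))     # False
```

Because Definition 2 fixes $E_S = E \cap (V_S \times V_S)$, the sketch treats subgraphs as induced: an extra edge in `h` between two mapped nodes makes the subgraph isomorphism check fail.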
The maximal common subgraph of two graphs $G_1$ and $G_2$ will be denoted by $\mathrm{mcs}(G_1,G_2)$. Notice that $\mathrm{mcs}(G_1,G_2)$ is not necessarily unique for two given graphs $G_1$ and $G_2$. The number of nodes of a graph $G = (V, E, \mu, \nu)$ is given by $|V|$. For notational convenience, we also denote the number of nodes of $G$ by $|G|$.

3. Graph distance measure

Definition 7. The distance of two non-empty graphs $G_1$ and $G_2$ is defined as

$$d(G_1,G_2) = 1 - \frac{|\mathrm{mcs}(G_1,G_2)|}{\max(|G_1|,|G_2|)}.$$

An example is shown in Fig. 1. Here we have $|G_1| = 5$, $|G_2| = 4$ and $|\mathrm{mcs}(G_1,G_2)| = 3$. Hence, $d(G_1,G_2) = 0.4$.

Fig. 1. An example of Definition 7: (a) a graph $G_1$; (b) a graph $G_2$; (c) the maximal common subgraph, $\mathrm{mcs}(G_1,G_2)$, of $G_1$ and $G_2$. Here we have $d(G_1,G_2) = 0.4$.

Theorem 1. For any graphs $G_1$, $G_2$ and $G_3$, the following properties hold true:
1. $0 \leq d(G_1,G_2) \leq 1$,
2. $d(G_1,G_2) = 0 \Leftrightarrow G_1$ and $G_2$ are isomorphic to each other,
3. $d(G_1,G_2) = d(G_2,G_1)$,
4. $d(G_1,G_3) \leq d(G_1,G_2) + d(G_2,G_3)$.

Proof. Properties 1–3 follow directly from Definition 7. In the following proof of the triangle inequality we distinguish two cases:

Case A. The graphs $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ are disjoint, or speaking more strictly, the maximal common subgraph of $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ is empty. (For a Venn diagram illustration see Fig. 2a.)

Let $m_{12} = |\mathrm{mcs}(G_1,G_2)|$, $m_{23} = |\mathrm{mcs}(G_2,G_3)|$, and $m_{13} = |\mathrm{mcs}(G_1,G_3)|$. Then the following relation holds true:

$$m_{12} + m_{23} \leq |G_2|. \tag{1}$$

Property 4 in Theorem 1 is equivalent to the following inequality:

$$\left(1 - \frac{m_{12}}{\max(|G_1|,|G_2|)}\right) + \left(1 - \frac{m_{23}}{\max(|G_2|,|G_3|)}\right) \geq 1 - \frac{m_{13}}{\max(|G_1|,|G_3|)}. \tag{2}$$

We will show that the left-hand side of this inequality is always greater than or equal to 1, which is equivalent to

$$\max(|G_1|,|G_2|)\,\max(|G_2|,|G_3|) \geq m_{12}\,\max(|G_2|,|G_3|) + m_{23}\,\max(|G_1|,|G_2|). \tag{3}$$

We proceed by a simple case analysis.

Case A.1: $|G_1| \geq |G_2| \geq |G_3|$. Here Eq. (3) is equivalent to

$$|G_1| \cdot |G_2| \geq m_{12}\,|G_2| + m_{23}\,|G_1|. \tag{4}$$

From Eq. (1) we conclude that

$$|G_1|\,|G_2| \geq m_{12}\,|G_1| + m_{23}\,|G_1| \geq m_{12}\,|G_2| + m_{23}\,|G_1|.$$

Case A.2: $|G_1| \geq |G_3| \geq |G_2|$. Here Eq. (3) becomes

$$|G_1| \cdot |G_3| \geq m_{12} \cdot |G_3| + m_{23} \cdot |G_1|. \tag{5}$$

Using Eq. (1) again we conclude

$$|G_1|\,|G_3| \geq m_{12}\,|G_1| + m_{23}\,|G_1| \geq m_{12}\,|G_3| + m_{23}\,|G_1|.$$

The remaining four cases $|G_2| \geq |G_1| \geq |G_3|$, $|G_2| \geq |G_3| \geq |G_1|$, $|G_3| \geq |G_1| \geq |G_2|$ and $|G_3| \geq |G_2| \geq |G_1|$ can be shown similarly.

Case B. Here we assume that the maximal common subgraph of $\mathrm{mcs}(G_1,G_2)$ and $\mathrm{mcs}(G_2,G_3)$ is not empty (see Fig. 2b).
H. Bunke, K. Shearer / Pattern Recognition Letters 19 (1998) 255–259
$$\ldots + m_{23}\,\max(|G_1|,|G_2|)\,\max(|G_1|,|G_3|) - m\,\max(|G_1|,|G_2|)\,\max(|G_2|,|G_3|). \tag{8}$$

^1 Strictly speaking, this statement is only true if isomorphic graphs are regarded equal. But this assumption is certainly justified in most applications.
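Definition 7 can be evaluated directly once the size of a maximal common subgraph is known. The Python sketch below is a brute-force illustration under stated assumptions, not one of the algorithms cited in this paper: the function names and the `undirected` helper are hypothetical, common subgraphs are taken in the induced sense of Definition 2, edge labels are assumed never to be `None`, and the running time is exponential, so it is only usable for very small graphs. The graphs of Fig. 1 are not reproduced in the text, so the example uses hypothetical graphs that merely share its sizes (5 nodes, 4 nodes, common subgraph of 3 nodes).

```python
from itertools import combinations, permutations

# A graph is a pair (node_labels, edge_labels):
#   node_labels: dict node -> label
#   edge_labels: dict (u, v) -> label (directed; labels assumed not None)


def mcs_size(g1, g2):
    """Node count of a maximal common induced subgraph, by exhaustive search."""
    (nl1, el1), (nl2, el2) = g1, g2
    if len(nl1) > len(nl2):  # enumerate node subsets of the smaller graph
        (nl1, el1), (nl2, el2) = (nl2, el2), (nl1, el1)
    for k in range(len(nl1), 0, -1):  # try the largest size first
        for sub in combinations(nl1, k):
            for image in permutations(nl2, k):
                f = dict(zip(sub, image))  # injective candidate mapping
                same_nodes = all(nl1[v] == nl2[f[v]] for v in sub)
                same_edges = all(
                    el1.get((u, v)) == el2.get((f[u], f[v]))
                    for u in sub for v in sub
                )
                if same_nodes and same_edges:
                    return k
    return 0


def distance(g1, g2):
    """d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|) for non-empty graphs."""
    n1, n2 = len(g1[0]), len(g2[0])
    if n1 == 0 or n2 == 0:
        raise ValueError("Definition 7 covers non-empty graphs only")
    return 1 - mcs_size(g1, g2) / max(n1, n2)


def undirected(n_nodes, edges):
    """Hypothetical helper: uniformly labeled graph with symmetric edges."""
    nodes = {i: "x" for i in range(n_nodes)}
    labels = {}
    for u, v in edges:
        labels[(u, v)] = labels[(v, u)] = "e"
    return nodes, labels


p5 = undirected(5, [(0, 1), (1, 2), (2, 3), (3, 4)])  # path on 5 nodes
star = undirected(4, [(0, 1), (0, 2), (0, 3)])         # star with 3 leaves
tri = undirected(3, [(0, 1), (1, 2), (0, 2)])          # triangle

print(mcs_size(p5, star))            # 3
print(round(distance(p5, star), 3))  # 0.4
# Property 4 of Theorem 1 on this triple of graphs:
print(distance(p5, tri) <= distance(p5, star) + distance(star, tri))  # True
```

Returning at the first match of the largest size is enough here because only $|\mathrm{mcs}(G_1,G_2)|$ enters the distance, so the non-uniqueness of the maximal common subgraph does not matter.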
... behaved to allow sensible navigation of the database. The use of a metric, such as that proposed, for the distance measure ensures that the behaviour of the similarity retrieval will be consistent and comprehensible, aiding the user in their search task.

Classical algorithms for computing the maximal common subgraph of two graphs are based on maximal clique detection (Levi, 1972) or backtracking (McGregor, 1982). These algorithms are conceptually simple, but have a high computational complexity. For example, the worst case time complexity of the method described by Levi (1972) is $O((nm)^n)$, where $n$ and $m$ denote the number of nodes of the two graphs under consideration. Recently, however, a new algorithm has been developed which uses preprocessing of a database of model graphs to detect the maximal common subgraph from an input graph to the models in the database with worst case time complexity of $O(2^n)$ (Shearer et al., 1997). This algorithm has demonstrated near real-time behaviour in a video indexing application.

In a recent paper, it has been shown that maximal common subgraph computation can be regarded as a special case of graph edit distance computation under a particular cost function (Bunke, 1997). An immediate consequence is that any algorithm for graph edit distance computation can be used to compute the maximal common subgraph if it is run under the cost function given by Bunke (1997). This opens up additional possibilities for the computation of the distance measure proposed in this paper, particularly with respect to an efficient algorithm for graph edit distance computation reported by Bunke and Messmer (1997).

References

Bunke, H., 1997. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Lett. 18 (8), 689–694.
Bunke, H., Messmer, B., 1997. Recent advances in graph matching. Internat. J. Pattern Recognition Artif. Intell. 11 (1), 169–203.
Chang, S., Shi, Q., Yan, C., 1987. Iconic indexing by 2D strings. IEEE Trans. Pattern Anal. Machine Intell. 9 (3), 413–428.
Cho, C.J., Kim, J.J., 1992. Recognizing 3-D objects by forward checking constrained tree search. Pattern Recognition Lett. 13 (8), 587–597.
Christmas, W.J., Kittler, J., Petrou, M., 1995. Structural matching in computer vision using probabilistic relaxation. IEEE Trans. Pattern Anal. Machine Intell. 17 (8), 749–764.
Cordella, L., Foggia, P., Sansone, C., Vento, M., 1997. Subgraph transformations for the inexact matching of attributed relational graphs. In: Jolion, J.-M., Kropatsch, W. (Eds.), Preproceeding GbR'97: IAPR Workshop on Graph Based Representations, Lyon.
Horaud, R., Skordas, T., 1989. Stereo correspondence through feature grouping and maximal cliques. IEEE Trans. Pattern Anal. Machine Intell. 11 (11), 1168–1180.
Lee, S., Hsu, F., 1992. Spatial reasoning and similarity retrieval of images using 2D C-string knowledge representation. Pattern Recognition 25 (3), 305–318.
Lee, S.W., Kim, J.H., Groen, F.C.A., 1990. Translation-, rotation-, and scale invariant recognition of hand-drawn symbols in schematic diagrams. Internat. J. Pattern Recognition Artif. Intell. 4 (1), 1–15.
Levi, G., 1972. A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo 9, 341–354.
Levinson, R., 1992. Pattern associativity and the retrieval of semantic networks. Comput. Math. Appl. 23, 573–600.
Lu, S.W., Ren, Y., Suen, C.Y., 1991. Hierarchical attributed graph representation and recognition of handwritten Chinese characters. Pattern Recognition 24, 617–632.
McGregor, J.J., 1982. Backtrack search algorithms and the maximal common subgraph problem. Software Practice and Experience 12, 23–34.
Messmer, B., Bunke, H., 1996. Automatic learning and recognition of graphical symbols in engineering drawing. In: Kasturi, R., Tombre, K. (Eds.), Graphics Recognition, Lecture Notes in Computer Science, vol. 1072. Springer, Berlin, 1996, pp. 123–134.
Pearce, A., Caelli, T., Bischof, W.F., 1994. Rulegraphs for graph matching in pattern recognition. Pattern Recognition 27 (9), 1231–1246.
Read, R.C., Corneil, D.G., 1977. The graph isomorphism disease. J. Graph Theory 1, 339–363.
Shapiro, L.G., Haralick, R.M., 1981. Structural descriptions and inexact matching. IEEE Trans. Pattern Anal. Machine Intell. 3, 504–519.
Shearer, K., Bunke, H., Venkatesh, S., Kieronska, D., 1997. Efficient graph matching for video indexing. In: Jolion, J.-M., Kropatsch, W. (Eds.), Preproceeding GbR'97: IAPR Workshop on Graph based Representations, Lyon.
Ullman, J.R., 1976. An algorithm for subgraph isomorphism. J. ACM 23 (1), 31–42.
Wong, E.K., 1992. Model matching in robot vision by subgraph isomorphism. Pattern Recognition 25 (3), 287–304.