Invited Talk by Carlo Zaniolo
Mining Databases and Data Streams with Query Languages and Rules
The vision of inductive databases introduced by [Imielinski and Mannila: CACM 1996] has inspired vibrant research endeavors aiming at integrating data mining technology into the very kernel of database management systems. We are now witnessing similar challenges and opportunities in data stream mining applications, where there is a clear need to integrate their enabling technology into data stream management systems (DSMS). Therefore, our discussion will begin by comparing the requirements and problems facing inductive data stream management systems (IDSMS) with those of inductive database management systems (IDBMS); then we will propose a unified approach to solve these problems.
The two main steps in our approach are (i) support for query-based data mining, and (ii) support for rule-based data mining. Query-based data mining can be viewed as the middle ground between the low-road approach explored in [Sarawagi, Thomas and Agrawal: SIGMOD 1998] and the high-road approaches proposed by (among others) [Meo, Psaila and Ceri: ICDE 1998]. The low-road approach attempts to implant efficient data mining algorithms into off-the-shelf commercial Object-Relational DBMS through the efforts of skillful programmers, a Herculean task that proved successful only for specific algorithms.
The high-road approach takes a completely opposite view, since users are only expected to provide some high-level declarative mining templates (e.g., in the form of rules), which the system is then expected to turn into efficient algorithms, via suppositious techniques vaguely described as generalizations of the current optimization techniques of relational databases.
The query-based data mining approach instead assumes matching efforts from the system side and the user side, whereby the query language of the system is extended with powerful constructs that make it much easier for users to specify mining methods via the high-level query language. The effectiveness of the query-based approach based on SQL extensions has been demonstrated for databases in [Wang, Zaniolo: SIAM 2003], and for data streams in [Luo et al.: SIGMOD 2005]. We will show how a wide variety of mining algorithms, including classification, association, clustering, and sequence analysis, can be supported in this approach; common DSMS primitives for windows and stream sampling also dovetail with this approach.
Rule-based data mining represents the natural next step, inasmuch as it produces a formalism that unifies inductive and deductive reasoning and delivers the significant benefits described in [Giannotti et al.: AI*IA 1999]. In particular, rule-based IDBMS and IDSMS can support the integrated representation of database objects and KDD objects proposed in [Imielinski and Mannila: CACM 1996], and the goal-directed rule discovery via meta-level logic programming described in [Shen et al.: AAAI 1996]. We will discuss how the current technology for query-based mining can be extended to achieve rule-based mining, and explore longer-term challenges and opportunities.
Carlo Zaniolo received an E.E. Engineer degree at Padua University in 1968, and M.S. and Ph.D. degrees in Computer Science at UCLA in 1970 and 1976, respectively.
He is a professor of Computer Science at the University of California, Los Angeles, where he occupies the N. E. Freedmann Chair in Knowledge Science. Before joining UCLA in 1991, Carlo Zaniolo acquired twenty years of industrial experience in various positions. He served as associate director in the Software Technology Program of MCC (Austin, Texas) and, before that, as a Member of Technical Staff at AT&T Bell Labs (Murray Hill, New Jersey), at Sperry Research (Sudbury, Massachusetts), and at Burroughs Corporation (Pasadena, California).
He served as the program chair or co-chair of several database conferences, including the Very Large Data Bases (VLDB) Conference in 1980 and 1994, the ACM SIGMOD International Conference on Management of Data in 1986, and the Conference on Extending Database Technology (EDBT) in 2000.
He has published more than 130 papers in all major database and data mining conferences and journals. He has co-authored the book "Advanced Database Systems".
His research interests span several fields, including database systems, knowledge base systems, non-monotonic reasoning, spatio-temporal reasoning, internet information systems, and the mining of databases and data streams. He has worked on the integration of data mining and database technologies since the early days of data mining (first KDD workshop, 1994).