Data Mining And Knowledge Discovery

How is it different from Data Warehousing?

Origins of Data Mining

The goal of a data warehouse is to support decision making with data. Data mining can be used in conjunction with a data warehouse to help with certain types of decisions. It can also be applied to operational databases that hold individual transactions, but it is more efficient when it works on an aggregated or summarized collection of data. Data mining helps extract meaningful new patterns that cannot necessarily be found by merely querying or processing the data or metadata in the data warehouse, so data mining applications should be strongly considered.

Data Mining-Terminologies

  • Data Mining engine: It is essential to a data mining system and consists of a set of functional modules that perform tasks such as characterization, association and correlation analysis, classification, prediction, cluster analysis, outlier analysis, and evolution analysis.
  • Knowledge Base: This captures the system's knowledge of the specific problem domain, which the data mining system can apply to solve problems in that domain.
  • Knowledge Discovery: It is the broad process of finding knowledge in databases, and it emphasizes the high-level application of particular data mining methods.
  • User interface: It is the module of a data mining system that handles communication between users and the system. It lets users interact with the system by specifying a query or task, evaluate mined patterns, and visualize those patterns in different forms.

Knowledge Discovery

  • Data Integration: It is a data processing technique that merges data from multiple heterogeneous data sources into a coherent data store.
  • Data Cleaning: This technique is applied to remove noisy data and correct wrong data. It is a preprocessing step performed while preparing data for a data warehouse.
  • Data Selection: In this process, data relevant to the analysis task is retrieved from the database. Sometimes consolidation is performed before the data selection process.
  • Clusters: Refers to a group that contains similar kinds of objects.
  • Data Transformation: In this process, data is transformed into forms that are appropriate for mining by performing aggregation operations.
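
The preprocessing steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two record lists, their field names, and the rule that a missing or negative amount counts as noisy data are all made up for the example and do not come from any particular system. The steps run in the order of the list (integration, cleaning, selection, transformation).

```python
from collections import defaultdict

# Hypothetical records from two heterogeneous sources (made-up data).
store_sales = [
    {"customer": "alice", "region": "north", "amount": 120.0},
    {"customer": "bob", "region": "south", "amount": None},   # missing value
    {"customer": "carol", "region": "north", "amount": 80.0},
]
web_sales = [
    {"cust_name": "Alice", "area": "North", "total": "45.5"},
    {"cust_name": "dave", "area": "south", "total": "-10"},   # invalid amount
]

def integrate(store, web):
    """Data integration: map both sources onto one common schema."""
    merged = [dict(r, source="store") for r in store]
    for r in web:
        merged.append({
            "customer": r["cust_name"].lower(),
            "region": r["area"].lower(),
            "amount": float(r["total"]),
            "source": "web",
        })
    return merged

def clean(records):
    """Data cleaning: drop records whose amount is missing or negative."""
    return [r for r in records if r["amount"] is not None and r["amount"] >= 0]

def select(records, region):
    """Data selection: keep only the records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]

def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = defaultdict(float)
    for r in records:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

def prepare(store, web, region):
    """Run the four steps in sequence and return mining-ready totals."""
    return transform(select(clean(integrate(store, web)), region))

north_totals = prepare(store_sales, web_sales, "north")
print(north_totals)  # {'alice': 165.5, 'carol': 80.0}
```

Dropping bad records is only one possible cleaning policy; a real pipeline might correct or impute values instead.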

Data Mining- Query Language

The Data Mining Query Language (DMQL) is based on the Structured Query Language (SQL) and was proposed for the DBMiner data mining system. It supports interactive data mining and provides commands for specifying primitives. It can work with both databases and data warehouses.

Syntax for task-relevant data specification:

    use database database_name
    or
    use data warehouse data_warehouse_name
    in relevance to att_or_dim_list
    from relation(s)/cube(s) [where condition]
    order by order_list
    group by grouping_list
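
To make the clause order concrete, here is a small Python sketch that assembles a task-relevant data specification as text. It is a hypothetical helper, not part of DMQL or DBMiner: the function name `task_relevant_spec`, its parameters, and the example database, relation, and attribute names are all assumptions, and so is the use of comma-separated lists, since the syntax above only shows placeholders. The function only builds a string and does not send anything to a mining system.

```python
def task_relevant_spec(source_name, attributes, relations, *,
                       warehouse=False, condition=None,
                       order_by=None, group_by=None):
    """Assemble a DMQL-style specification as plain text (hypothetical helper)."""
    # "use database ..." or "use data warehouse ...", as in the syntax above.
    kind = "data warehouse" if warehouse else "database"
    lines = [
        f"use {kind} {source_name}",
        f"in relevance to {', '.join(attributes)}",
    ]
    # The where condition is optional and belongs to the from clause.
    from_clause = f"from {', '.join(relations)}"
    if condition:
        from_clause += f" where {condition}"
    lines.append(from_clause)
    # In this sketch the two trailing clauses are emitted only when given.
    if order_by:
        lines.append(f"order by {', '.join(order_by)}")
    if group_by:
        lines.append(f"group by {', '.join(group_by)}")
    return "\n".join(lines)

# Made-up names, purely for illustration.
spec = task_relevant_spec(
    "sales_db",
    ["age", "income"],
    ["customer"],
    condition="income > 50000",
    group_by=["age"],
)
print(spec)
```

With these inputs the helper prints four lines: the `use database` line, the `in relevance to` line, the `from ... where ...` line, and the `group by` line.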

Challenges in Web Mining

The web poses great challenges for resource and knowledge discovery, based on the following observations:

  • Web is huge: The web is very large and grows every day, so it appears too large for data warehousing and data mining.
  • Complexity of web pages: Web pages are complex and have no unifying structure. The web also holds a huge number of document collections (libraries) that are not arranged in any sorted order.
  • Web is a dynamic information source: Information on the web changes rapidly, and data such as news, shopping, weather, and sports is updated regularly.
  • Diversity of user communities: The user community is expanding rapidly, and users have different backgrounds, interests, and usage purposes.
  • Relevancy of information: Not every person is interested in every portion of the web. Each user looks only at the portions they choose, so the rest of the web contains information that is not relevant to that user.