Data Mining in Bioinformatics

Data mining may be defined as ‘the non-trivial retrieval of implicit, previously unknown, and potentially useful information from various sources of data’. Data mining complements and greatly expands bioinformatics. At present the two fields are distinct in nature and each has its own identity, but there is ample scope for them to merge, and sooner or later data mining and bioinformatics may well become indistinguishable.

Data mining is widely practiced in the field of biotechnology, which involves various branches of the life sciences, viz., biology, microbiology, agriculture, and above all the health care system. It may also be extended to areas not related to the life sciences, such as banking, database providers, engineering, financial institutions, government agencies, manufacturing, marketing, telecommunications, the travel industry, and service industries. The massive volumes of information generated by these sources can be exploited to the full by means of the many highly specialized software packages already in use across the globe.

The fast-developing biopharmaceutical industry makes heavy use of large databases holding a wealth of vital information, to which a variety of data mining methodologies are applied. These include:

  1. Annotated databases of the disease profiles
  2. Molecular pathways involved in serious human diseases
  3. Quantitative structure-activity relationships (QSARs)
  4. Precise chemical structures of combinatorial libraries of compounds
  5. Results of mandatory clinical trials of new molecules

Data mining helps the pharmaceutical industry in general, and the biopharmaceutical industry in particular, to put this valuable information to productive use.

Applications of Data Mining

The tremendous volume of valuable data now being generated and stored presents a big challenge to the biopharmaceutical industry: making critical and precise decisions about the development of viable targets and lead compounds. Data mining goes a long way towards simplifying these complex data sets and bringing them into focus in an efficient and intuitive manner. A considerable number of organizations offer data mining services for a variety of specialized applications. There are six predominant and well-known approaches to data mining applications, namely:

(a) Influence-based mining, i.e., an intensive search for cause-and-effect relationships between data sets, as in pharmacogenomics,

(b) Affinity-based mining, i.e., the data mining system identifies data points that tend to group together, which is important for distinguishing accidental or incidental motifs from those of definite biological significance,

(c) Time-delay data mining, i.e., identifying patterns that are confirmed or rejected as the data set grows more voluminous over time,

(d) Trends-based data mining, i.e., closely investigating the changes that take place in specific data sets over a certain period of time and examining the resulting trends,

(e) Comparative data mining, i.e., comparing data collected at different sites or over different time periods to detect and quantify the dissimilarities between them, and

(f) Predictive data mining, i.e., an approach that largely complements and expands traditional bioinformatics.
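
Three of the approaches listed above, namely (b), (d), and (e), can be illustrated with a minimal Python sketch on toy data. This is only an illustration under stated assumptions, not a description of any particular data mining product: the function names, the gene labels, and the numeric values are all hypothetical, affinity is approximated here by simple co-occurrence counting, and a trend is approximated by a least-squares slope.

```python
from collections import Counter
from itertools import combinations


def affinity_pairs(records, min_count=2):
    """Affinity-based sketch: count how often two items share a record."""
    counts = Counter()
    for items in records:
        # sorted(set(...)) gives each unordered pair one canonical form
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def trend_slope(values):
    """Trends-based sketch: least-squares slope over time steps 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two time points")
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den


def dissimilar_keys(site_a, site_b, tolerance=0.0):
    """Comparative sketch: keys that differ between two sites.

    A key is reported when it is present at only one site, or when its
    two values differ by more than `tolerance`.
    """
    keys = set(site_a) | set(site_b)
    return sorted(
        k for k in keys
        if k not in site_a
        or k not in site_b
        or abs(site_a[k] - site_b[k]) > tolerance
    )


if __name__ == "__main__":
    # Hypothetical records: genes reported together in the same experiment
    records = [
        ["geneA", "geneB", "geneC"],
        ["geneA", "geneB"],
        ["geneB", "geneC"],
        ["geneA", "geneD"],
    ]
    print(affinity_pairs(records))

    # Hypothetical expression level of one gene at four time points
    print(trend_slope([1.0, 2.0, 3.0, 4.0]))

    # Hypothetical measurements of the same genes at two sites
    site_1 = {"geneA": 1.0, "geneB": 2.0, "geneC": 0.5}
    site_2 = {"geneA": 1.1, "geneB": 3.5, "geneD": 0.2}
    print(dissimilar_keys(site_1, site_2, tolerance=0.5))
```

In this sketch the `min_count` and `tolerance` thresholds play the role of separating incidental patterns from potentially meaningful ones; in practice such cut-offs would have to be chosen and validated for the data at hand.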