Research World: Volume 4, Report R4.6 (2007)

Report R4.6

Statistical Learning Theory

Seminar Leader: Ramasubramanian Sundararajan, GE Global Research, John F. Welch Technology Centre, Bangalore
ramasubramanian.sundararajan[at]geind.ge.com

Algorithms are the foundation of any computer programme. With the increased use of computers, there has been greater emphasis on developing algorithms that mimic human decision-making processes. One of the fundamental tasks in decision making is classifying a given set of data. Algorithms have been designed to perform this classification task and to learn from repeated application. Such learning may be supervised or unsupervised. In supervised learning, the true classes of the data are known, whereas in unsupervised learning the classes must be inferred from the data.

The seminar focused on various issues involved in developing algorithms for supervised learning. The key questions were:

* What are the distinguishing features of a good classification algorithm that learns?
* What are the major challenges to a researcher involved in developing such algorithms?
* What are the approaches to developing such algorithms?
* What are the applications of learning algorithms?

One important feature that distinguishes a good learning algorithm is its performance on new data, i.e., its capacity for extracting generalisable "knowledge" from the given data. An algorithm developed on a larger sample of data is more likely to produce a generalisable classification, which gives greater confidence in the results. Another important feature is accuracy. Higher accuracy on the given data often comes at the cost of greater complexity of the algorithm, so a researcher has to make a trade-off between accuracy and simplicity. The same idea is expressed by Occam’s Razor, a principle of systematic inquiry which states that, among solutions that fit the data, the simpler one is preferable.
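The trade-off between fitting the given data and performing well on new data can be illustrated with a small experiment. The sketch below is not from the seminar: the toy data set, the noise level, and the two models (a single cut-off rule and a one-nearest-neighbour rule that memorises every training example) are assumptions chosen only to make the point concrete.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: one feature x in [0, 1]; the true class is x > 0.5,
    and each label is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def threshold_rule(train):
    """Simple model: choose the single cut-off that classifies the training data best."""
    best_cut, best_correct = 0.0, -1
    for cut, _ in train:
        correct = sum(int(x > cut) == y for x, y in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return lambda x: int(x > best_cut)

def nearest_neighbour(train):
    """Complex model: memorise every training point and copy the label of the closest one."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    """Fraction of examples in `data` that `model` classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

rng = random.Random(0)
train = make_data(200, rng)   # data the algorithm learns from
test = make_data(2000, rng)   # new data it has never seen

for name, fit in [("threshold", threshold_rule), ("1-nearest-neighbour", nearest_neighbour)]:
    model = fit(train)
    print(f"{name}: train={accuracy(model, train):.2f} test={accuracy(model, test):.2f}")
```

The memorising model reproduces every training label, noise included, so its training accuracy is perfect, yet on fresh data it will usually do no better, and often worse, than the simple cut-off. Comparing accuracy on held-out data rather than on the training data is one common way to judge how well a classifier generalises.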

The three main approaches to developing learning algorithms are the statistical, machine learning, and neural network approaches. Statistical approaches are generally characterised by having an "explicit underlying probability model, which provides a probability of being in each class rather than simply a classification" (Michie, Spiegelhalter, & Taylor, 1994, p. 2). The machine learning approach places more emphasis on the simplicity and time efficiency of the algorithms. "Neural network approaches combine the complexity of some of the statistical techniques with the machine learning objective of imitating human intelligence" (ibid., p. 3).
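To show what "a probability of being in each class rather than simply a classification" can look like in practice, here is a minimal sketch of one possible statistical classifier. It assumes a single numeric feature and a normal (Gaussian) distribution within each class; the class labels and measurements are made up for illustration and do not come from the seminar.

```python
from statistics import NormalDist

def fit_gaussian_classifier(samples):
    """samples: dict mapping class label -> list of feature values (at least two each).
    Returns, per class, its prior probability and a fitted normal distribution."""
    total = sum(len(values) for values in samples.values())
    return {
        label: (len(values) / total, NormalDist.from_samples(values))
        for label, values in samples.items()
    }

def class_probabilities(model, x):
    """Bayes' rule: P(class | x) is proportional to prior * density of x under the class."""
    scores = {label: prior * dist.pdf(x) for label, (prior, dist) in model.items()}
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

# Made-up measurements for two hypothetical classes "A" and "B"
model = fit_gaussian_classifier({
    "A": [1.0, 1.2, 0.8, 1.1],
    "B": [2.0, 2.3, 1.9, 2.2],
})

probs = class_probabilities(model, 1.4)
print(probs)                        # a probability for each class
print(max(probs, key=probs.get))    # the plain classification, if only a label is needed
```

The probabilities carry more information than the label alone: a decision maker can tell a borderline case from a clear-cut one and set the decision threshold accordingly.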

Learning algorithms have been used to predict the possible bankruptcy of a firm. The algorithm learns patterns from historical data, for example by comparing financial data on bankrupt and non-bankrupt firms. Given data about a new firm, the algorithm should predict the probability of that firm going bankrupt. This helps in decision making, for example in estimating the risk of lending to the firm.

Learning algorithms based on neural networks are also used in market segmentation, where they cluster customers according to certain criteria. This helps in planning product positioning, promotion, and other such decisions. Yet another application is spam detection, where learning algorithms automatically detect and filter messages based on certain words that are unlikely to appear in legitimate messages.
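The idea of filtering messages by the words they contain can be sketched as follows. This is an illustrative example only: it uses a naive Bayes style word score with add-one smoothing, which is one common technique and not necessarily the one discussed in the seminar, and the four training messages are invented.

```python
import math
from collections import Counter

def train_filter(messages):
    """messages: list of (text, is_spam) pairs.
    Returns per-class word counts and per-class message counts."""
    words = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_spam in messages:
        docs[is_spam] += 1
        words[is_spam].update(text.lower().split())
    return words, docs

def spam_log_odds(model, text):
    """Positive score: the words look more like spam; negative: more like legitimate mail."""
    words, docs = model
    vocab = set(words[True]) | set(words[False])
    totals = {c: sum(words[c].values()) for c in (True, False)}
    score = math.log(docs[True] / docs[False])  # prior odds of spam
    for w in text.lower().split():
        if w not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one smoothing avoids zero probabilities for words seen in one class only
        p_spam = (words[True][w] + 1) / (totals[True] + len(vocab))
        p_legit = (words[False][w] + 1) / (totals[False] + len(vocab))
        score += math.log(p_spam / p_legit)
    return score

# Invented training messages: True marks spam, False marks legitimate mail
model = train_filter([
    ("win free prize now", True),
    ("free money offer", True),
    ("meeting agenda for monday", False),
    ("please review the report", False),
])

for text in ["free prize inside", "monday meeting report"]:
    verdict = "spam" if spam_log_odds(model, text) > 0 else "legitimate"
    print(f"{text!r}: {verdict}")
```

Because the word counts are learned from labelled examples, the filter adapts when new examples are added, which is what distinguishes it from a fixed list of banned words.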

References

Michie, D., Spiegelhalter, D. J., & Taylor, C. C. (Eds.). (1994). Machine learning, neural and statistical classification. Upper Saddle River, NJ: Ellis Horwood.

Sundararajan, R. (2006). Modelling learning from examples: An introduction. Unpublished manuscript.

Reported by Madhavi Latha Nandi, with inputs from D. P. Dash and Jacob D. Vakkayil (Nov 24, 2006).

Copyleft: The article may be used freely, for a noncommercial purpose, as long as the original source is properly acknowledged.

Xavier Institute of Management, Xavier Square, Bhubaneswar 751013, India
Research World (ISSN 0974-2379) http://www1.ximb.ac.in/RW.nsf/pages/Home