VEPPAR

JA Newsflash

Newsflash 2
Yesterday all servers in the U.S. went out on strike in a bid to get more RAM and better CPUs. A spokes person said that the need for better RAM was due to some fool increasing the front-side bus speed. In future, busses will be told to slow down in residential motherboards.
You are here:Home arrow Basic principles arrow Parf
  • Decrease font size
  • Default font size
  • Increase font size
  • blue color
  • green color
  • default color
Parallel Random Forests – PARF PDF Print E-mail
Written by Viktor Bojović   
Thursday, 21 February 2008
Data mining commonly relies heavily on Pattern recognition, which aims to classify data based on either a priori knowledge or on statistical information extracted from the patterns. A range of classification algorithms has been devised.
The Random Forests algorithm is one of the best among the known classification algorithms, able to classify big quantities of data with great accuracy. For the Random Forests, in addition to a set of important statistical features, its loosely coupled structure allows the classifier training procedure to be readily parallelized. The RBI/CIC team reimplemented the original algorithm, and explored several parallelization strategies, ranging from SMP, through MPI, to Grid job schedulers. The creator of the algorithm, late Berkeley professor emeritus Leo Breiman, expressed a big interest in this idea in our correspondence. He has confirmed that no one was yet working on a parallel implementation of his algorithm, and promised his support and help. Leo Breiman is one
of the pioneers in the fields of machine learning and data mining, and a co-author of the first significant programs (CART – Classification and Regression Trees) in that field.
The present application is command line based, MPI-enabled, and if statically linked can be deployed to any system in the grid (or any system with MPICH support, if MPI execution is desired).

Last Updated ( Friday, 09 May 2008 )