Dr Dragoljub Pokrajac-Pokie
Assistant professor

 

 

 

20-310 Data Mining

Spring 2003

Course syllabus (.doc)

MWF 2:00-2:50PM SC 302

IMPORTANT NOTES: 

    

Textbook: Margaret Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2003, ISBN 0-13-088892-3

 
Book website: http://engr.smu.edu/~mhd/

 

          

 

Office hours:

          M, W, F 11AM-12 noon

Project

          Data Mining projects

Assignments

          Homework 2, due 2/26/2003 

                   pima-indians-diabetes.data

                   pima-indians-diabetes.names

          Homework 3, due 3/31/2003 

                  Download Tooldiag software

          Homework 4, due 4/7/2003 

          Homework 5, due 4/14/2003 

                       arff file for your homework

          Homework 6, due 4/29/2003 (new)

          Homework 7 (make-up homework), due at the exam

Lecture Notes

            Tentative Class Outline  1/27/2003

            Chapters 1 and 2    1/27/2003

            Data Mining, Part II   2/7/2003

            Data Mining, Part IIa 2/10/2003

            Brief Course in Probability 2/15/2003

            Feature Selection      2/15/2003

           Feature Extraction  2/16/2003

          Measuring Classification Performance 2/16/2003

           Construction of ROC curves 3/7/2003

           Classification techniques, part I 3/19/2003

           Classification techniques, part II 3/20/2003

           Classification techniques, part III 3/21/2003

           Classification techniques, part IV 3/22/2003

           Classification techniques, part V 3/31/2003

          Clustering  3/31/2003

          Incremental Clustering  4/9/2003

          Web mining  4/23/2003

 

 MATERIAL FROM the Institute of Photogrammetry and Remote Sensing, Finland (in English)

    Bayes Decision Theory  

    Nearest neighbor classifier  

    Linear classifiers and more  

   Feature selection algorithms  

   Feature selection: probability distance and feature extraction (.ps)

   Clustering  

   Estimation of Classifier Performance

WEB Mining Tutorial by Becher and Kohavi

   miningTutorialSlides SIAM KDD.pdf

Links

         WEKA

        http://www.cs.waikato.ac.nz/ml/weka/   Weka web-site

        weka-3-2-3jre.exe   Free Weka software installation (11,305,621 bytes, includes the Java Runtime Environment)

 

         DATA MINING SITES

        http://www.learningtheory.org/  Home page for Computational Learning Theory (look for Freund and Schapire and their boosting papers)

        http://www.kdnuggets.com/  Good Data Mining site, with job links and a lot of other stuff

     

     XELOPES GNU open source libraries

     XELOPES Version 1.1 for Java 

     XELOPES Version 1.1 Documentation        

     XELOPES Version 1.09 for C++ 

      

    DATASETS

       http://www.ics.uci.edu/~mlearn/MLRepository.html  UCI machine learning repository

       http://kdd.ics.uci.edu/ UCI KDD archive

      

    GHOSTSCRIPT AND GHOSTVIEW

   gs704w32.exe 

   gsv43w32.exe

 

    TOOLDIAG

   DOS version tooldiagDos.zip

  Linux/Unix version tooldiag-2.2.tar.gz, ToolUxIns.txt

 

    C4.5

   Unix version from Dr. Quinlan's site: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

   

   MATLAB

   Matlab Tutorial by D. Pokrajac

  

    Feature Selection paper

    Algorithms for feature selection: an evaluation, by A. Jain and D. Zongker