Apriori Algorithm

March 1st, 2010  |  Published in Uncategorized

I’ve been meaning to get into the Data Mining SDK on CodePlex for a while, as it has a couple of good items in it. The one I was really interested in was the Apriori algorithm.

Wikipedia describes Apriori:

In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of a website frequentation). Other algorithms are designed for finding association rules in data having no transactions (Winepi and Minepi), or having no timestamps (DNA sequencing).

The classic example: if you own a store and a customer buys milk, what is the probability that the same customer will also buy bread and eggs? Or, if voters in one state voted for one issue, what is the chance they also voted for another? The same question applies to almost any data that can be grouped into transactions.
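To make the milk-and-bread question concrete, here is a minimal sketch of the two measures that association rule mining rests on: support (the share of transactions that contain an itemset) and confidence (of the transactions that contain the left-hand side, the share that also contain the right-hand side). The baskets and the names SupportConfidenceDemo, Count, Support and Confidence are made up for illustration and are not part of the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SupportConfidenceDemo
{
    // Number of baskets that contain every item in 'items'.
    public static int Count(List<HashSet<string>> baskets, params string[] items)
    {
        return baskets.Count(b => items.All(i => b.Contains(i)));
    }

    // Fraction of all baskets that contain every item in 'items'.
    public static double Support(List<HashSet<string>> baskets, params string[] items)
    {
        return (double)Count(baskets, items) / baskets.Count;
    }

    // Estimate of P(rhs | lhs): baskets with both items divided by baskets with lhs.
    public static double Confidence(List<HashSet<string>> baskets, string lhs, string rhs)
    {
        return (double)Count(baskets, lhs, rhs) / Count(baskets, lhs);
    }

    // Hypothetical baskets, invented for this example.
    public static List<HashSet<string>> SampleBaskets()
    {
        return new List<HashSet<string>>
        {
            new HashSet<string> { "milk", "bread", "eggs" },
            new HashSet<string> { "milk", "bread" },
            new HashSet<string> { "milk", "eggs" },
            new HashSet<string> { "bread" },
            new HashSet<string> { "milk", "bread", "butter" },
        };
    }

    public static void Main()
    {
        var baskets = SampleBaskets();
        Console.WriteLine("support(milk, bread) = " + Support(baskets, "milk", "bread"));
        Console.WriteLine("confidence(milk -> bread) = " + Confidence(baskets, "milk", "bread"));
    }
}
```

In these made-up baskets, milk appears four times and milk together with bread three times, so the rule "milk -> bread" has a confidence of 3/4.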

The code in the SDK is pretty good, with two exceptions: there is little documentation, and it only supports XML files and OleDb data connections. I have reworked it so it will also connect to a MSSQL database.

For this test I created a simple C# console application and imported the “APriori” project into the solution. You will need to add the following two methods to classes in the APriori project:

Add this method to DataAccessLayer.cs

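        // Requires "using System.Data.SqlClient;" at the top of DataAccessLayer.cs.
        public Data GetTransactionsData(string rdbmsConnectionString, string dataSource)
        {
            myDatabase = new Data();

            // dataSource is concatenated into the SQL text, so only pass trusted table names.
            string query = "SELECT * FROM " + dataSource;

            // The using blocks dispose the connection and the adapter even if Fill throws.
            using (var myConn = new SqlConnection(rdbmsConnectionString))
            using (var myDBAdapter = new SqlDataAdapter(query, myConn))
            {
                // Fill opens the connection if it is closed and closes it again when done.
                myDBAdapter.Fill(myDatabase, "TransactionTable");
            }
            return myDatabase;
        }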

Add this method to DataMining.cs

public Data MarketBasedAnalysis(double supportCount, double minimumConfidence, string connectionString, string dataSource)
        {

            Database database = new Database();
            ItemsetCandidate Item = new ItemsetCandidate();

            this.AP = new APriori.Apriori();
            this.AP.ProgressMonitorEvent += new ProgressMonitorEventHandler(this.OnProgressMonitoringCompletedEvent);
            this.dataBase = database.GetTransactionsData(connectionString, dataSource);
            database.Transactions = this.dataBase;
            this.transactionsCount = this.dataBase.TransactionTable.Count;

            supportCount = ((supportCount / 100) * this.transactionsCount);

            minimumConfidence = (minimumConfidence / 100);

            string support = "SupportCount >= " + supportCount + " AND Level > 1";

            string sort = "SupportCount, Level";
            ItemsetCandidate uniqueItems = AP.CreateOneItemsets(database);
            AP.AprioriGenerator(uniqueItems, database, Convert.ToInt32(supportCount));
            ItemsetArrayList[] keys = database.GetItemset(support, sort);
            string msg = "Creating Frequent Subsets for Items";
            ProgressMonitorEventArgs e = new ProgressMonitorEventArgs(1, 100, 95, "DataMining.MarketBasedAnalysis(3)", msg);
            this.OnProgressMonitorEvent(e);

            for (int counter = 0; counter < keys.Length; counter++)
            {
                AP.CreateItemsetSubsets(0, keys[counter], null, database);
            }

            msg = "Completed C#.NET Data Mining Market Based Analysis";
            e = new ProgressMonitorEventArgs(1, 100, 100, "DataMining.MarketBasedAnalysis(3)", msg);
            this.OnProgressMonitorEvent(e);

            //Set the public properties of the class
            this.minimumSupportCount = supportCount;
            this.minimumConfidence = minimumConfidence;
            this.connectionString = connectionString;
            this.dataSource = dataSource;

            //return the database of transactions
            return this.dataBase;

        }
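The method above takes both thresholds as percentages and converts them before use. The sketch below replays just that arithmetic for an 11-row table, so you can see what a support value of 2 works out to. ThresholdDemo and its method names are illustrative and are not SDK members.

```csharp
using System;

public static class ThresholdDemo
{
    // Same arithmetic as the method above: percentage of transactions -> absolute count.
    public static double ToSupportCount(double supportPercent, int transactionsCount)
    {
        return (supportPercent / 100) * transactionsCount;
    }

    // Convert.ToInt32(double) rounds to the nearest integer;
    // a value exactly halfway between two integers goes to the even one.
    public static int ToGeneratorThreshold(double supportCount)
    {
        return Convert.ToInt32(supportCount);
    }

    public static void Main()
    {
        int transactions = 11; // number of rows assumed for this example

        double count = ToSupportCount(2, transactions);
        Console.WriteLine("2% of 11 transactions  = " + count);
        Console.WriteLine("rounded threshold      = " + ToGeneratorThreshold(count));
        Console.WriteLine("20% rounded threshold  = "
            + ToGeneratorThreshold(ToSupportCount(20, transactions)));
    }
}
```

With 11 transactions, a support of 2 comes to roughly 0.22 of a transaction, which rounds to 0 in the value passed to AprioriGenerator, while the filter string still compares against the unrounded number. A support of 20 rounds to 2. How the SDK treats a threshold of 0 is not shown here, so on a small table it is worth trying a larger percentage as well.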

Here is the class from my console application:

using System;
using System.Data;
using VISUAL_BASIC_DATA_MINING_NET;
using VISUAL_BASIC_DATA_MINING_NET.CustomEvents;

namespace APr2.classes
{
    internal class testrun
    {
        private Data _dataAnalysis;
        public event ProgressMonitorEventHandler ProgressMonitorEvent;

        /// <summary>
        /// Runs the Apriori.
        /// </summary>
        public void RunApriori()
        {
            // Create Data Mining Object
            var myDM = new DataMining();

            // Register Event
            myDM.ProgressMonitorEvent += OnProgressMonitorEvent;

            // Connect To Data Base & Process Items
            _dataAnalysis = myDM.MarketBasedAnalysis(2,             // Minimum support, as a percentage of all transactions
                                                     2,             // Minimum confidence, as a percentage
                                                     @"Data Source=(local);Initial Catalog=Apriori;Integrated Security=True;", // Connection String
                                                     "Example");    // Table in db

            // Copy to Data View
            var dataView = new ViewData();
            _dataAnalysis.Tables.Add(dataView.CreateViewRulesTable(2, _dataAnalysis).Copy());
            _dataAnalysis.Tables.Add(dataView.CreateViewSubsetTable(_dataAnalysis).Copy());

            // Spacer Line
            Console.WriteLine();

            // Print Items
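            foreach (DataRow row in dataView.ViewDataSet.Tables[1].Rows)
            {
                // Column 2 is text; drop its last character (assumed to be a percent sign) before parsing.
                string raw = row.ItemArray[2].ToString();
                double per = Convert.ToDouble(raw.Substring(0, raw.Length - 1));

                // The "%" format specifier multiplies by 100, so divide first.
                Console.WriteLine(row.ItemArray[0] + "\t" + row.ItemArray[1] + "\t" + String.Format("{0:###.##%}", per / 100));
            }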
        }

        /// <summary>
        /// Called when [progress monitor event].
        /// </summary>
        /// <param name="sender">The sender.</param>
        /// <param name="e">The <see cref="VISUAL_BASIC_DATA_MINING_NET.CustomEvents.ProgressMonitorEventArgs"/> instance containing the event data.</param>
        public void OnProgressMonitorEvent(object sender, ProgressMonitorEventArgs e)
        {
            // Prints Event Messages
            Console.Write("\r" + e.EventMessage);
        }
    }
}

The SQL Server table is created with this script:

GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Example](
	[TransactionID] [int] IDENTITY(1,1) NOT NULL,
	[Transactions] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
 CONSTRAINT [PK_Example] PRIMARY KEY CLUSTERED
(
	[TransactionID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

And these records:

1	Books, CD, Video
2	CD, Games
3	CD, DVD
4	Books, CD, Games
5	Books, DVD
6	CD, DVD
7	Books, DVD
8	Books, CD, DVD, Video
9	Books, CD, DVD
10	Books, Games
11	Games, Lasers
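As a cross-check that does not depend on the SDK, the following is a minimal sketch of the level-wise idea behind Apriori, run over the 11 records above: count single items, keep those that meet a minimum count, join the survivors into larger candidates, and repeat. It is not the SDK's implementation, and it skips the usual subset-pruning optimisation. MiniApriori, Parse, FrequentItemsets, the minimum count of 3, and the assumption that items are separated by commas are all chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MiniApriori
{
    // The 11 example records from the table above.
    public static readonly string[] Records =
    {
        "Books, CD, Video", "CD, Games", "CD, DVD", "Books, CD, Games",
        "Books, DVD", "CD, DVD", "Books, DVD", "Books, CD, DVD, Video",
        "Books, CD, DVD", "Books, Games", "Games, Lasers"
    };

    // Splits each record on commas and trims the item names.
    public static List<HashSet<string>> Parse(IEnumerable<string> records)
    {
        return records
            .Select(r => new HashSet<string>(r.Split(',').Select(s => s.Trim())))
            .ToList();
    }

    // Returns every itemset whose count is >= minCount,
    // keyed by its items sorted and joined with commas.
    public static Dictionary<string, int> FrequentItemsets(
        List<HashSet<string>> transactions, int minCount)
    {
        var result = new Dictionary<string, int>();

        // Level 1 candidates: each distinct item on its own.
        var candidates = transactions.SelectMany(t => t).Distinct()
            .Select(i => new List<string> { i }).ToList();

        while (candidates.Count > 0)
        {
            // Count step: keep the candidates that occur often enough.
            var frequent = new List<List<string>>();
            foreach (var candidate in candidates)
            {
                int count = transactions.Count(t => candidate.All(i => t.Contains(i)));
                if (count >= minCount)
                {
                    frequent.Add(candidate);
                    result[string.Join(",", candidate.ToArray())] = count;
                }
            }

            // Join step: merge pairs of frequent k-itemsets that differ in one item.
            var next = new Dictionary<string, List<string>>();
            for (int a = 0; a < frequent.Count; a++)
            {
                for (int b = a + 1; b < frequent.Count; b++)
                {
                    var union = frequent[a].Union(frequent[b])
                        .OrderBy(i => i, StringComparer.Ordinal).ToList();
                    if (union.Count == frequent[a].Count + 1)
                    {
                        next[string.Join(",", union.ToArray())] = union;
                    }
                }
            }
            candidates = next.Values.ToList();
        }
        return result;
    }

    public static void Main()
    {
        foreach (var entry in FrequentItemsets(Parse(Records), 3))
        {
            Console.WriteLine(entry.Key + "\t" + entry.Value);
        }
    }
}
```

With a minimum count of 3, this finds Books (7), CD (7), DVD (6) and Games (4), plus the pairs Books,CD, Books,DVD and CD,DVD with 4 transactions each. The triple Books,CD,DVD occurs only twice, so it appears only when the minimum count is lowered to 2.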

Call RunApriori() on the class above and it prints the rows of the second view table to the console, one per line, with the third column formatted as a percentage. Have fun.


