Approximate Parallel High Utility Itemset Mining

dc.contributor.advisor: An, Aijun
dc.creator: Chen, Yan
dc.date.accessioned: 2016-09-20T16:36:12Z
dc.date.available: 2016-09-20T16:36:12Z
dc.date.copyright: 2015-10-30
dc.date.issued: 2016-09-20
dc.date.updated: 2016-09-20T16:36:12Z
dc.degree.discipline: Computer Science
dc.degree.level: Master's
dc.degree.name: MSc - Master of Science
dc.description.abstract: High utility itemset mining discovers itemsets whose utility is above a given threshold, where utility measures the importance of an itemset. When the dataset is very large, memory and running-time limitations cause scalability problems. This thesis addresses the problem by proposing a distributed parallel algorithm, PHUI-Miner, and a sampling strategy, which can be used either separately or together. PHUI-Miner parallelizes the state-of-the-art high utility itemset mining algorithm HUI-Miner. The sampling strategy investigates the sample size of a dataset required to achieve a given accuracy. We also propose an approach combining sampling with PHUI-Miner, which provides better time performance. Our experiments show that PHUI-Miner achieves high performance and outperforms the state-of-the-art non-parallel algorithm, and that the sampling strategy achieves accuracies much higher than the guarantee. Extensive experiments also compare the time performance of PHUI-Miner with and without sampling.
dc.identifier.uri: http://hdl.handle.net/10315/32162
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: Data mining
dc.subject.keywords: High utility item set mining
dc.subject.keywords: Sampling
dc.subject.keywords: Parallel
dc.subject.keywords: Spark
dc.subject.keywords: Distributed
dc.subject.keywords: Approximate
dc.title: Approximate Parallel High Utility Itemset Mining
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Chen_Yan_2015_Masters.pdf
Size: 502.22 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text