Data Mining for Data Streams

July 20, 2017, WWW.ThesisScientist.com

Mining Data Streams
• What is stream data? Why stream data systems?
• Stream data management systems: issues and solutions
• Stream data cube and multidimensional OLAP analysis
• Stream frequent pattern analysis
• Stream classification
• Stream cluster analysis
• Sketching

Characteristics of Data Streams
• Data stream model
  • Data enters at a high rate
  • The system cannot store the entire stream, only a small fraction of it
  • How do you make critical calculations about the stream using a limited amount of memory?
• Characteristics
  • Huge volumes of continuous data, possibly infinite
  • Fast changing; requires fast, real-time response
  • Random access is expensive, so single-scan algorithms are needed (each element can be looked at only once)

Architecture: Stream Query Processing
[Figure: multiple streams feed the Stream Query Processor of an SDMS (Stream Data Management System), which uses scratch space (main memory and/or disk); the user/application issues a continuous query and receives results.]

Stream Data Applications
• Telecommunication calling records
• Business: credit card transaction flows
• Network monitoring and traffic engineering
• Financial market: stock exchange
• Engineering and industrial processes: power supply and manufacturing
• Sensors, monitoring, and surveillance: video streams, RFIDs
• Web logs and Web page click streams
• Massive data sets (even when saved, random access is too expensive)

DBMS versus DSMS
  DBMS                                          | DSMS
  Persistent relations                          | Transient streams
  One-time queries                              | Continuous queries
  Random access                                 | Sequential access
  "Unbounded" disk store                        | Bounded main memory
  Only current state matters                    | Historical data is important
  No real-time services                         | Real-time requirements
  Relatively low update rate                    | Possibly multi-GB arrival rate
  Data at any granularity                       | Data at fine granularity
  Assume precise data                           | Data stale/imprecise
  Access plan determined by query processor,    | Unpredictable/variable data arrival
  physical DB design                            | and characteristics
Ack.: from Motwani's PODS tutorial slides

Processing Stream Queries
• Query types
  • One-time query vs. continuous query (evaluated continuously as the stream continues to arrive)
  • Predefined query vs. ad hoc query (issued online)
• Unbounded memory requirements
  • For real-time response, main-memory algorithms should be used
  • The memory requirement is unbounded if a query must join with future tuples
• Approximate query answering
  • With bounded memory, it is not always possible to produce exact answers
  • High-quality approximate answers are desired
  • Data reduction and synopsis construction methods: sketches, random sampling, histograms, wavelets, etc.

Methodologies for Stream Data Processing
• Major challenges
  • Keep track of a large universe, e.g., pairs of IP addresses, not ages
• Methodology
  • Synopses (trade-off between accuracy and storage)
  • Use a synopsis data structure, much smaller (O(log^k N) space) than the base data set (O(N) space)
  • Compute an approximate answer within a small error range (factor ε of the actual answer)
• Major methods
  • Random sampling
  • Histograms
  • Sliding windows
  • Multi-resolution model
  • Sketches
  • Randomized algorithms

Stream Data Processing Methods (1)
• Random sampling (without knowing the total stream length in advance)
  • Reservoir sampling: maintain a set of s candidates in the reservoir, which form a true random sample of the elements seen so far in the stream. As the stream flows, each new element has probability s/N of replacing an old element in the reservoir, where N is the number of elements seen so far.
• Sliding windows
  • Make decisions based only on recent data, within a sliding window of size w
  • An element arriving at time t expires at time t + w
• Histograms
  • Approximate the frequency distribution of element values in a stream
  • Partition the data into a set of contiguous buckets
  • Equal-width (equal value range per bucket) vs. V-optimal (minimizing frequency variance within each bucket)
• Multi-resolution models
  • Popular models: balanced binary trees, micro-clusters, and wavelets

Stream Data Mining vs. Stream Querying
• Stream mining is a more challenging task in many cases
  • It shares most of the difficulties of stream querying
  • But it often requires less "precision", e.g., no join, grouping, or sorting
  • Patterns are hidden and more general than query answers
  • It may require exploratory analysis, not necessarily continuous queries
• Stream data mining tasks
  • Multi-dimensional online analysis of streams
  • Mining outliers and unusual patterns in stream data
  • Clustering data streams
  • Classification of stream data

Challenges for Mining Dynamics in Data Streams
• Most stream data are at a fairly low level or multi-dimensional in nature, so they need multi-level/multi-dimensional (ML/MD) processing
• Analysis requirements
  • Multi-dimensional trends and unusual patterns
  • Capturing important changes at multiple dimensions/levels
  • Fast, real-time detection and response
  • Comparison with data cubes: similarities and differences
• Stream (data) cube or stream OLAP: is this feasible?
  • Can we implement it efficiently?
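Returning to reservoir sampling from the Stream Data Processing Methods slide: the rule "each new element replaces an old one with probability s/N" can be sketched in a few lines. This is a minimal illustration, not taken from the slides; the function name `reservoir_sample` and its `rng` parameter are assumptions made for the example.

```python
import random


def reservoir_sample(stream, s, rng=None):
    """Keep a uniform random sample of size s from a stream of unknown length.

    Hypothetical helper for illustration; names are not from the slides.
    """
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            # The first s elements fill the reservoir directly.
            reservoir.append(item)
        elif rng.random() < s / n:
            # The n-th element enters with probability s/n and
            # evicts a uniformly chosen old element.
            reservoir[rng.randrange(s)] = item
    return reservoir


# Usage: sample 5 elements from a stream of 10,000 distinct values.
sample = reservoir_sample(range(1, 10001), s=5, rng=random.Random(42))
short = reservoir_sample(range(3), s=5)  # stream shorter than the reservoir
```

Only the reservoir itself is held in memory, so the space used is O(s) regardless of how long the stream runs.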
Multi-Dimensional Stream Analysis: Examples
• Analysis of Web click streams
  • Raw data at low levels: seconds, web page addresses, user IP addresses, …
  • Analysts want changes, trends, and unusual patterns at reasonable levels of detail
  • E.g., "Average clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours."
• Analysis of power consumption streams
  • Raw data: power consumption flow for every household, every minute
  • Patterns one may find: average hourly power consumption of manufacturing companies in Chicago surged 30% in the last 2 hours today compared with the same day a week ago

A Stream Cube Architecture
• A tilted time frame
  • Different time granularities: second, minute, quarter, hour, day, week, …
• Critical layers
  • Minimum interest layer (m-layer)
  • Observation layer (o-layer)
  • The user watches at the o-layer and occasionally needs to drill down to the m-layer
• Partial materialization of stream cubes
  • Full materialization: too space- and time-consuming
  • No materialization: slow response at query time
  • Partial materialization…

A Tilted Time Model
• Natural tilted time frame
  • Example: minimal unit is a quarter (of an hour), then 4 quarters → 1 hour, 24 hours → 1 day, …
  [Figure: time axis divided into 12 months, 31 days, 24 hours, 4 qtrs]
• Logarithmic tilted time frame
  • Example: minimal unit is 1 minute, then 1, 2, 4, 8, 16, 32, …
  [Figure: time axis divided into 64t, 32t, 16t, 8t, 4t, 2t, t, t]

Two Critical Layers in the Stream Cube
• (*, theme, quarter): o-layer (observation)
• (user-group, URL-group, minute): m-layer (minimal interest)
• (individual-user, URL, second): (primitive) stream data layer

On-Line Partial Materialization vs. OLAP Processing
• On-line materialization
  • Materialization takes precious space and time
    • Only incremental materialization (with tilted time frame)
    • Only materialize "cuboids" of the critical layers
  • On-line computation may take too much time
  • Preferred solution:
    • Popular-path approach: materialize the cuboids along the popular drilling paths
    • H-tree structure: such cuboids can be computed and stored efficiently using the H-tree structure
• On-line aggregation vs. query-based computation
  • On-line computing while streaming: aggregating stream cubes
  • Query-based computation: using computed cuboids

Mining Approximate Frequent Patterns
• Mining precise frequent patterns in stream data is unrealistic
  • Even if they are stored in a compressed form, such as an FP-tree
• Approximate answers are often sufficient (e.g., trend/pattern analysis)
  • Example: a router is interested in all flows
    • whose frequency is at least 1% (s) of the entire traffic stream seen so far
    • and considers an error of 1/10 of s (ε = 0.1%) acceptable
• How to mine frequent patterns with good approximation?
  • Lossy Counting Algorithm (Manku & Motwani, VLDB'02)
  • Based on majority voting…

Majority
• A sequence of N items.
• You have constant memory.
• In one pass, decide if some item is in the majority (occurs > N/2 times).
  2 9 9 9 7 6 4 9 9 9 3 9
  N = 12; item 9 is the majority

Misra-Gries Algorithm ('82)
• Keep a counter and an ID.
  • If the new item is the same as the stored ID, increment the counter.
  • Otherwise, decrement the counter.
  • If the counter is 0, store the new item with count = 1.
• After the pass, if the counter is > 0, its item is the only candidate for majority.
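The counter-and-ID rule above can be sketched as follows, run on the 12-item sequence from the Majority slide. The function names `majority_candidate` and `is_majority` are assumptions for this example; the second function is the verification pass that a one-pass candidate still needs.

```python
def majority_candidate(stream):
    """One pass, one counter, one stored ID (hypothetical helper name)."""
    candidate, count = None, 0
    trace = []  # (ID, count) after each item, kept only for inspection
    for item in stream:
        if count == 0:
            # Counter is empty: adopt the new item.
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
        trace.append((candidate, count))
    return candidate, count, trace


def is_majority(items, candidate):
    """Second pass: confirm the candidate really occurs more than N/2 times."""
    return items.count(candidate) > len(items) / 2


items = [2, 9, 9, 9, 7, 6, 4, 9, 9, 9, 3, 9]
candidate, count, trace = majority_candidate(items)
confirmed = is_majority(items, candidate)
```

The first pass only nominates a candidate; without the second pass, a stream with no majority item would still end with some ID stored.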
Example trace:
  item:   2  9  9  9  7  6  4  9  9  9  3  9
  ID:     2  2  9  9  9  9  4  4  9  9  9  9
  count:  1  0  1  2  1  0  1  0  1  2  1  2

A Generalization: Frequent Items
• Find k items, each occurring at least N/(k+1) times.
[Figure: k slots ID_1, ID_2, …, ID_k, each with its own count]
• Algorithm:
  • Maintain k items and their counters.
  • If the next item x is one of the k, increment its counter.
  • Else, if some counter is zero, put x there with count = 1.
  • Else (all counters nonzero), decrement all k counters.

Frequent Elements: Analysis
• A frequent item's count is decremented only if all counters are full: each such step erases k+1 items.
• If x occurs > N/(k+1) times, then it cannot be completely erased.
• Similarly, x must get inserted at some point, because there are not enough items to keep it away.

Problem of False Positives
• False positives in the Misra-Gries algorithm
  • It identifies all true heavy hitters, but not all reported items are necessarily heavy hitters.
  • How can we tell whether the non-zero counters correspond to true heavy hitters?
• A second pass is needed to verify.
• False positives are problematic if heavy hitters are used for billing or punishment.
• What guarantees can we achieve in one pass?

Approximation Guarantees
• Find heavy hitters with a guaranteed approximation error (Demaine et al., Manku-Motwani, Estan-Varghese, …)
• Manku-Motwani (Lossy Counting)
  • Suppose you want φ-heavy hitters: items with frequency > φN
  • An approximation parameter ε, where ε << φ (e.g., φ = .01 and ε = .0001; that is, φ = 1% and ε = .01%)
  • Identify all items with frequency > φN
  • No reported item has frequency < (φ - ε)N
• The algorithm uses O((1/ε) log(εN)) memory
G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB'02

Lossy Counting
• Step 1: Divide the stream into "windows" (Window 1, Window 2, Window 3, …)
• Is the window size a function of the support s? Will fix later…

Lossy Counting in Action
[Figure: the first window of the stream is added to the initially empty frequency counts.]
• At the window boundary, decrement all counters by 1

Lossy Counting (continued)
[Figure: the next window of the stream is added to the existing frequency counts.]
• At the window boundary, decrement all counters by 1

Error Analysis
• How much do we undercount?
  • If the current size of the stream is N and the window size is 1/ε, then frequency error ≤ number of windows = εN
• Rule of thumb: set ε = 10% of support s
  • Example: given support frequency s = 1%, set error frequency ε = 0.1%

Output: elements with counter values exceeding sN - εN
• Approximation guarantees
  • Frequencies are underestimated by at most εN
  • No false negatives
  • False positives have true frequency at least sN - εN
• How many counters do we need?
  • Worst case: (1/ε) log(εN) counters (see paper for proof)

Enhancements
• Frequency errors
  • For counter (X, c), the true frequency is in [c, c + εN]
• Trick: remember window IDs
  • For counter (X, c, w), the true frequency is in [c, c + w - 1]
  • If w = 1, there is no error
• Batch processing
  • Decrement after every k windows

Algorithm 2: Sticky Sampling
[Figure: a stream of numbered elements, some of which are sampled.]
• Create counters by sampling
• Maintain exact counts thereafter
• What rate should we sample at?
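Looking back at Algorithm 1, the Lossy Counting steps described on the slides (count every item, decrement all counters at each window boundary, report counters exceeding sN - εN) can be sketched as below. This follows the slides' simplified description rather than the exact bookkeeping in the Manku-Motwani paper; the function names and the synthetic stream are assumptions for illustration.

```python
import math


def lossy_count(stream, epsilon):
    """Simplified Lossy Counting as described on the slides (hypothetical helper)."""
    width = math.ceil(1 / epsilon)  # window size = 1/epsilon
    counts = {}
    n = 0
    for item in stream:
        n += 1
        counts[item] = counts.get(item, 0) + 1
        if n % width == 0:
            # Window boundary: decrement every counter by 1 and
            # drop counters that reach zero to free memory.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return counts, n


def frequent_items(counts, n, s, epsilon):
    """Report items whose counter exceeds sN - epsilon*N."""
    return {x: c for x, c in counts.items() if c > (s - epsilon) * n}


# Synthetic stream: per window of 10 items, 'a' appears 5 times, 'b' 3 times,
# plus two items that never repeat.
stream = []
for i in range(10):
    stream += ["a"] * 5 + ["b"] * 3 + [f"x{i}", f"y{i}"]

counts, n = lossy_count(stream, epsilon=0.1)
frequent = frequent_items(counts, n, s=0.2, epsilon=0.1)
```

In this run the true count of "a" is 50 and its counter ends at 40, so the undercount equals the number of windows (10 = εN), which is the bound stated in the error analysis.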
• For a finite stream of length N
  • Sampling rate = (2/(Nε)) log(1/(sδ)), where δ = probability of failure
• Output: elements with counter values exceeding sN - εN
• Approximation guarantees (probabilistic)
  • Frequencies are underestimated by at most εN
  • No false negatives
  • False positives have true frequency at least sN - εN
• Same error guarantees as Lossy Counting, but probabilistic
• Same rule of thumb: set ε = 10% of support s
  • Example: given support threshold s = 1%, set error threshold ε = 0.1% and failure probability δ = 0.01%

Sampling rate
• Finite stream of length N
  • Sampling rate: (2/(Nε)) log(1/(sδ))
• Infinite stream with unknown N
  • Gradually adjust the sampling rate (see paper for details)
• In either case, the expected number of counters = (2/ε) log(1/(sδ)), independent of N

[Figure: number of counters vs. stream length N (linear and Log10 scales) for support s = 1%, error ε = 0.1%. Sticky Sampling, expected: (2/ε) log(1/(sδ)). Lossy Counting, worst case: (1/ε) log(εN).]

From elements to sets of elements…

Frequent Itemsets Problem
[Figure: a stream of transactions.]
• Identify all subsets of items whose current frequency exceeds s = 0.1%.
• Frequent itemsets => association rules

Three Modules
• TRIE
• SUBSET-GEN
• BUFFER

Module 1: TRIE
• Compact representation of frequent itemsets in lexicographic order
[Figure: a trie of itemsets and its array form; sets with frequency counts.]

Module 2: BUFFER
[Figure: Window 1 through Window 6 held in main memory.]
• Compact representation as a sequence of ints
• Transactions sorted by item-id
• Bitmap for transaction boundaries

Module 3: SUBSET-GEN
[Figure: SUBSET-GEN reads the BUFFER and emits frequency counts of subsets in lexicographic order.]

Overall Algorithm
[Figure: SUBSET-GEN combines the BUFFER with the existing TRIE to produce a new TRIE.]
• Problem: the number of subsets is exponential

SUBSET-GEN Pruning Rules
• Apriori pruning rule
  • If set S is infrequent, every superset of S is infrequent.
• Lossy Counting pruning rule
  • At each "window boundary", decrement TRIE counters by 1.
  • Actually, "batch deletion": at each "main memory buffer" boundary, decrement all TRIE counters by b.
• See paper for details

Bottlenecks
[Figure: the same pipeline; the BUFFER consumes main memory and SUBSET-GEN consumes CPU time.]

Design Decisions for Performance
• TRIE: main memory bottleneck
  • Compact linear array: (element, counter, level) in preorder traversal, no pointers
  • Tries are on disk, so all of main memory is devoted to the BUFFER
  • Pair of tries: old and new (in chunks)
  • mmap() and madvise()
• SUBSET-GEN: CPU bottleneck
  • Very fast implementation (see paper for details)

Classification for Dynamic Data Streams
• Decision tree induction for stream data classification
  • VFDT (Very Fast Decision Tree) / CVFDT (Domingos, Hulten, Spencer, KDD'00/KDD'01)
• Is a decision tree good for modeling fast-changing data, e.g., stock market analysis?
• Other stream classification methods
  • Instead of decision trees, consider other models
    • Naïve Bayesian
    • Ensemble (Wang, Fan, Yu, Han. KDD'03)
    • K-nearest neighbors (Aggarwal, Han, Wang, Yu. KDD'04)
  • Tilted time framework, incremental updating, dynamic maintenance, and model construction
  • Comparing models to find changes

Hoeffding Tree
• With high probability, classifies tuples the same
• Uses only a small sample
  • Based on the Hoeffding bound principle
• Hoeffding bound (additive Chernoff bound)
  • r: random variable
  • R: range of r
  • n: number of independent observations
  • The mean of r is at least r_avg - ε, with probability 1 - δ, where
    $\varepsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$

Hoeffding Tree Algorithm
• Hoeffding tree input
  • S: sequence of examples
  • X: attributes
  • G( ): evaluation function
  • δ: desired accuracy
• Hoeffding tree algorithm
    for each example in S
        retrieve G(X_a) and G(X_b)    // the two highest G(X_i)
        if ( G(X_a) - G(X_b) > ε )
            split on X_a
            recurse to next node
            break

Decision-Tree Induction with Data Streams
[Figure: a decision tree grown incrementally from the data stream. The root tests Packets > 10 (yes/no) with a child testing Protocol = http; as more of the stream arrives, further nodes are added testing Bytes > 60K and Protocol = ftp.]
Ack.: from Gehrke's SIGMOD tutorial slides

Hoeffding Tree: Strengths and Weaknesses
• Strengths
  • Scales better than traditional methods
    • Sublinear with sampling
    • Very small memory utilization
  • Incremental
    • Makes class predictions in parallel
    • New examples are added as they come
• Weaknesses
  • Could spend a lot of time on ties
  • Memory used by tree expansion
  • Number of candidate attributes

VFDT (Very Fast Decision Tree)
• Modifications to the Hoeffding tree
  • Near-ties broken more aggressively
  • G computed only every n_min examples
  • Deactivates certain leaves to save memory
  • Poor attributes dropped
  • Initialize with a traditional learner (helps the learning curve)
• Compared to the Hoeffding tree: better time and memory
• Compared to a traditional decision tree
  • Similar accuracy
  • Better runtime with 1.61 million examples
    • 21 minutes for VFDT
    • 24 hours for C4.5

CVFDT (Concept-adapting VFDT)
• Concept drift
  • Time-changing data streams
  • Incorporate new examples and eliminate old ones
• CVFDT
  • Increments counts with each new example
  • Decrements counts for each old example
    • Sliding window
    • Nodes assigned monotonically increasing IDs
  • Grows alternate subtrees
  • When an alternate subtree is more accurate, it replaces the old one
  • O(w) better runtime than VFDT-window

Clustering Data Streams [GMMO01]
• Based on the k-median method
  • Data stream points come from a metric space
  • Find k clusters in the stream such that the sum of distances from data points to their closest center is minimized
• Constant-factor approximation algorithm
  • In small space, a simple two-step algorithm:
    1. For each set of M records, S_i, find O(k) centers in S_1, …, S_l
       • Local clustering: assign each point in S_i to its closest center
    2. Let S' be the centers for S_1, …, S_l, with each center weighted by the number of points assigned to it
       • Cluster S' to find k centers

Hierarchical Clustering Tree
[Figure: a tree with data points at the bottom, level-i medians above them, and level-(i+1) medians at the top.]

Hierarchical Tree and Drawbacks
• Method:
  • Maintain at most m level-i medians
  • On seeing m of them, generate O(k) level-(i+1) medians, each with weight equal to the sum of the weights of the intermediate medians assigned to it
• Drawbacks:
  • Low quality for evolving data streams (registers only k centers)
  • Limited functionality for discovering and exploring clusters over different portions of the stream over time

Clustering for Mining Stream Dynamics
• Network intrusion detection: one example
  • Detect bursts of activity or abrupt changes in real time, by on-line clustering
• Another approach:
  • Tilted time framework: otherwise dynamic changes cannot be found
  • Micro-clustering: better quality than k-means/k-median
    • Incremental, on-line processing and maintenance
  • Two stages: micro-clustering and macro-clustering
  • With limited "overhead", achieve high efficiency, scalability, quality of results, and power of evolution/change detection

CluStream: A Framework for Clustering Evolving Data Streams
• Design goals
  • High quality for clustering evolving data streams, with greater functionality
  • While keeping the stream mining requirements in mind
    • One pass over the original stream data
    • Limited space usage and high efficiency
• CluStream: a framework for clustering evolving data streams
  • Divide the clustering process into online and offline components
    • Online component: periodically stores summary statistics about the stream data
    • Offline component: answers various user questions based on the stored summary statistics

The CluStream Framework
• Micro-cluster
  • Statistical information about data locality
  • Temporal extension of the cluster-feature vector
    • Multi-dimensional points X_1 … X_k … with time stamps T_1 … T_k …
    • Each point contains d dimensions, i.e., X_i = (x_i^1 … x_i^d)
    • A micro-cluster for n points is defined as a (2·d + 3)-tuple (CF2^x, CF1^x, CF2^t, CF1^t, n)
• Pyramidal time frame
  • Decides at what moments the snapshots of the statistical information are stored away on disk

CluStream: Pyramidal Time Frame
• Pyramidal time frame
  • Snapshots of a set of micro-clusters are stored following the pyramidal pattern
    • They are stored at differing levels of granularity depending on recency
  • Snapshots are classified into different orders varying from 1 to log(T)
    • The i-th order snapshots occur at intervals of α^i, where α ≥ 1
    • Only the last (α + 1) snapshots are stored

CluStream: Clustering On-line Streams
• Online micro-cluster maintenance
  • Initial creation of q micro-clusters
    • q is usually significantly larger than the number of natural clusters
  • Online incremental update of micro-clusters
    • If a new point is within the max-boundary, insert it into the micro-cluster
    • Otherwise, create a new cluster
    • May delete an obsolete micro-cluster or merge the two closest ones
• Query-based macro-clustering
  • Based on a user-specified time horizon h and the number of macro-clusters K, compute macro-clusters using the k-means algorithm

References on Stream Data Mining (1)
• C. Aggarwal, J. Han, J. Wang, P. S. Yu. A Framework for Clustering Data Streams. VLDB'03
• C. C. Aggarwal, J. Han, J. Wang, P. S. Yu. On-Demand Classification of Evolving Data Streams. KDD'04
• C. Aggarwal, J. Han, J. Wang, P. S. Yu. A Framework for Projected Clustering of High Dimensional Data Streams. VLDB'04
• S. Babu, J. Widom. Continuous Queries over Data Streams. SIGMOD Record, Sept. 2001
• B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues in Data Stream Systems. PODS'02 (conference tutorial)
• Y. Chen, G. Dong, J. Han, B. W. Wah, J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. VLDB'02
• P. Domingos, G. Hulten. Mining High-Speed Data Streams. KDD'00
• A. Dobra, M. N. Garofalakis, J. Gehrke, R. Rastogi. Processing Complex Aggregate Queries over Data Streams. SIGMOD'02
• J. Gehrke, F. Korn, D. Srivastava. On Computing Correlated Aggregates over Continuous Data Streams. SIGMOD'01
• C. Giannella, J. Han, J. Pei, X. Yan, P. S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In Kargupta et al. (eds.), Next Generation Data Mining'04

References on Stream Data Mining (2)
• S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering Data Streams. FOCS'00
• G. Hulten, L. Spencer, P. Domingos. Mining Time-Changing Data Streams. KDD'01
• S. Madden, M. Shah, J. Hellerstein, V. Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD'02
• G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. VLDB'02
• A. Metwally, D. Agrawal, A. El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams. ICDT'05
• S. Muthukrishnan. Data Streams: Algorithms and Applications. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2003
• R. Motwani, P. Raghavan. Randomized Algorithms. Cambridge Univ. Press, 1995
• S. Viglas, J. Naughton. Rate-Based Query Optimization for Streaming Information Sources. SIGMOD'02
• Y. Zhu, D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. VLDB'02
• H. Wang, W. Fan, P. S. Yu, J. Han. Mining Concept-Drifting Data Streams using Ensemble Classifiers. KDD'03