SlideShare une entreprise Scribd logo
1  sur  4
Télécharger pour lire hors ligne
R Reference Card for Data Mining                                               Performance Evaluation                                                               apclusterK() affinity propagation clustering to get K clusters (apcluster)
                                                                                     performance() provide various measures for evaluating performance of pre-            cclust() Convex Clustering, incl. k-means and two other clustering algo-
by Yanchang Zhao, yanchang@rdatamining.com, January 3, 2013                                diction and classification models (ROCR)                                             rithms (cclust)
The latest version is available at http://www.RDataMining.com. Click the link        roc() build a ROC curve (pROC)                                                       KMeansSparseCluster() sparse k-means clustering (sparcl)
also for document R and Data Mining: Examples and Case Studies.                      auc() compute the area under the ROC curve (pROC)                                    tclust(x,k,alpha,...) trimmed k-means with which a proportion
The package names are in parentheses.                                                ROC() draw a ROC curve (DiagnosisMed)                                                     alpha of observations may be trimmed (tclust)
                                                                                     PRcurve() precision-recall curves (DMwR)
 Association Rules & Frequent Itemsets                                               CRchart() cumulative recall charts (DMwR)                                            Hierarchical Clustering
                                                                                     Packages                                                                             a hierarchical decomposition of data in either bottom-up (agglomerative) or top-
APRIORI Algorithm                                                                    rpart recursive partitioning and regression trees                                    down (divisive) way
a level-wise, breadth-first algorithm which counts transactions to find frequent       party recursive partitioning                                                         hclust(d, method, ...) hierarchical cluster analysis on a set of dissim-
itemsets                                                                             randomForest classification and regression based on a forest of trees using ran-              ilarities d using the method for agglomeration
apriori() mine associations with APRIORI algorithm (arules)                                  dom inputs                                                                   birch() the BIRCH algorithm that clusters very large data with a CF-tree
                                                                                     rpartOrdinal ordinal classification trees, deriving a classification tree when the             (birch)
ECLAT Algorithm                                                                              response to be predicted is ordinal                                          pvclust() hierarchical clustering with p-values via multi-scale bootstrap re-
                                                                                     rpart.plot plots rpart models with an enhanced version of plot.rpart in the                  sampling (pvclust)
employs equivalence classes, depth-first search and set intersection instead of
                                                                                             rpart package                                                                agnes() agglomerative hierarchical clustering (cluster)
counting
                                                                                     ROCR visualize the performance of scoring classifiers                                 diana() divisive hierarchical clustering (cluster)
eclat() mine frequent itemsets with the Eclat algorithm (arules)
                                                                                     pROC display and analyze ROC curves                                                  mona() divisive hierarchical clustering of a dataset with binary variables only
                                                                                                                                                                                  (cluster)
Packages                                                                              Regression                                                                          rockCluster() cluster a data matrix using the Rock algorithm (cba)
arules mine frequent itemsets, maximal frequent itemsets, closed frequent item-                                                                                           proximus() cluster the rows of a logical matrix using the Proximus algorithm
        sets and association rules. It includes two algorithms, Apriori and Eclat.
                                                                                     Functions                                                                                    (cba)
arulesViz visualizing association rules                                              lm() linear regression                                                               isopam() Isopam clustering algorithm (isopam)
                                                                                     glm() generalized linear regression                                                  LLAhclust() hierarchical clustering based on likelihood linkage analysis
 Sequential Patterns                                                                 nls() non-linear regression                                                                  (LLAhclust)
                                                                                     predict() predict with models                                                        flashClust() optimal hierarchical clustering (flashClust)
Functions
                                                                                     residuals() residuals, the difference between observed values and fitted val-         fastcluster() fast hierarchical clustering (fastcluster)
cspade() mining frequent sequential patterns with the cSPADE algorithm                      ues
     (arulesSequences)                                                                                                                                                    cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchi-
                                                                                     gls() fit a linear model using generalized least squares (nlme)                               cal clustering dendrograms (dynamicTreeCut)
seqefsub() searching for frequent subsequences (TraMineR)                            gnls() fit a nonlinear model using generalized least squares (nlme)                   HierarchicalSparseCluster() hierarchical sparse clustering (sparcl)
Packages                                                                             Packages
arulesSequences add-on for arules to handle and mine frequent sequences              nlme linear and nonlinear mixed effects models                                       Model based Clustering
TraMineR mining, describing and visualizing sequences of states or events
                                                                                      Clustering                                                                          Mclust() model-based clustering (mclust)
 Classification & Prediction                                                                                                                                               HDDC() a model-based method for high dimensional data clustering (HDclas-
                                                                                     Partitioning based Clustering                                                             sif )
Decision Trees                                                                       partition the data into k groups first and then try to improve the quality of clus-   fixmahal() Mahalanobis Fixed Point Clustering (fpc)
ctree() conditional inference trees, recursive partitioning for continuous, cen-     tering by moving objects from one group to another                                   fixreg() Regression Fixed Point Clustering (fpc)
      sored, ordered, nominal and multivariate response variables in a condi-        kmeans() perform k-means clustering on a data matrix                                 mergenormals() clustering by merging Gaussian mixture components (fpc)
      tional inference framework (party)                                             kmeansCBI() interface function for kmeans (fpc)
rpart() recursive partitioning and regression trees (rpart)
                                                                                                                                                                          Density based Clustering
                                                                                     kmeansruns() call kmeans for the k-means clustering method and includes              generate clusters by connecting dense regions
mob() model-based recursive partitioning, yielding a tree with fitted models                   estimation of the number of clusters and finding an optimal solution from
      associated with each terminal node (party)                                                                                                                          dbscan(data,eps,MinPts,...) generate a density based clustering of
                                                                                              several starting points (fpc)                                                       arbitrary shapes, with neighborhood radius set as eps and density thresh-
Random Forest                                                                        pam() the Partitioning Around Medoids (PAM) clustering method (cluster)                      old as MinPts (fpc)
cforest() random forest and bagging ensemble (party)                                 pamk() the Partitioning Around Medoids (PAM) clustering method with esti-            pdfCluster() clustering via kernel density estimation (pdfCluster)
randomForest() random forest (randomForest)                                                   mation of number of clusters (fpc)
varimp() variable importance (party)                                                 cluster.optimal() search for the optimal k-clustering of the dataset
                                                                                                                                                                          Other Clustering Techniques
importance() variable importance (randomForest)                                               (bayesclust)
                                                                                     clara() Clustering Large Applications (cluster)                                      mixer() random graph clustering (mixer)
Neural Networks                                                                                                                                                           nncluster() fast clustering with restarted minimum spanning tree (nnclust)
                                                                                     fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (clus-
nnet() fit single-hidden-layer neural network (nnet)                                                                                                                       orclus() ORCLUS subspace clustering (orclus)
                                                                                              ter)
Support Vector Machine (SVM)                                                         kcca() k-centroids clustering (flexclust)                                             Plotting Clustering Solutions
svm() train a support vector machine for regression, classification or density-       ccfkms() clustering with Conjugate Convex Functions (cba)                            plotcluster() visualisation of a clustering or grouping in data (fpc)
      estimation (e1071)                                                             apcluster() affinity propagation clustering for a given similarity matrix (ap-        bannerplot() a horizontal barplot visualizing a hierarchical clustering (clus-
ksvm() support vector machines (kernlab)                                                      cluster)                                                                         ter)
Cluster Validation                                                                       Packages                                                                     SnowballStemmer() Snowball word stemmers (Snowball)
silhouette() compute or extract silhouette information (cluster)                         extremevalues detect extreme values in one-dimensional data                  LDA() fit a LDA (latent Dirichlet allocation) model (topicmodels)
cluster.stats() compute several cluster validity statistics from a cluster-              mvoutlier multivariate outlier detection based on robust methods             CTM() fit a CTM (correlated topics model) model (topicmodels)
     ing and a dissimilarity matrix (fpc)                                                outliers some tests commonly used for identifying outliers                   terms() extract the most likely terms for each topic (topicmodels)
clValid() calculate validation measures for a given set of clustering algo-              Rlof a parallel implementation of the LOF algorithm                          topics() extract the most likely topics for each document (topicmodels)
     rithms and number of clusters (clValid)
                                                                                          Time Series Analysis                                                        Packages
clustIndex() calculate the values of several clustering indexes, which can                                                                                            tm a framework for text mining applications
     be independently used to determine the number of clusters existing in a             Construction & Plot                                                          lda fit topic models with LDA
     data set (cclust)                                                                   ts() create time-series objects (stats)                                      topicmodels fit topic models with LDA and CTM
NbClust() provide 30 indices for cluster validation and determining the num-             plot.ts() plot time-series objects (stats)                                   RTextTools automatic text classification via supervised learning
     ber of clusters (NbClust)                                                           smoothts() time series smoothing (ast)                                       tm.plugin.dc a plug-in for package tm to support distributed text mining
Packages                                                                                 sfilter() remove seasonal fluctuation using moving average (ast)              tm.plugin.mail a plug-in for package tm to handle mail
cluster cluster analysis                                                                 Decomposition                                                                RcmdrPlugin.TextMining GUI for demonstration of text mining concepts and
fpc various methods for clustering and cluster validation                                                                                                                       tm package
                                                                                         decomp() time series decomposition by square-root filter (timsac)
mclust model-based clustering and normal mixture modeling                                                                                                             textir a suite of tools for inference about text documents and associated sentiment
                                                                                         decompose() classical seasonal decomposition by moving averages (stats)
birch clustering very large datasets using the BIRCH algorithm                                                                                                        tau utilities for text analysis
                                                                                         stl() seasonal decomposition of time series by loess (stats)
pvclust hierarchical clustering with p-values                                                                                                                         textcat n-gram based text categorization
                                                                                         tsr() time series decomposition (ast)
apcluster Affinity Propagation Clustering                                                                                                                              YjdnJlp Japanese text analysis by Yahoo! Japan Developer Network
                                                                                         ardec() time series autoregressive decomposition (ArDec)
cclust Convex Clustering methods, including k-means algorithm, On-line Up-                                                                                             Social Network Analysis and Graph Mining
         date algorithm and Neural Gas algorithm and calculation of indexes for          Forecasting
         finding the number of clusters in a data set                                     arima() fit an ARIMA model to a univariate time series (stats)                Functions
cba Clustering for Business Analytics, including clustering techniques such as           predict.Arima() forecast from models fitted by arima (stats)                  graph(), graph.edgelist(), graph.adjacency(),
         Proximus and Rock                                                               auto.arima() fit best ARIMA model to univariate time series (forecast)             graph.incidence() create graph objects respectively from edges,
bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for         forecast.stl(), forecast.ets(), forecast.Arima()                                  an edge list, an adjacency matrix and an incidence matrix (igraph)
         clustering high-dimensional data                                                     forecast time series using stl, ets and arima models (forecast)         plot(), tkplot() static and interactive plotting of graphs (igraph)
biclust algorithms to find bi-clusters in two-dimensional data                            Packages                                                                     gplot(), gplot3d() plot graphs (sna)
clue cluster ensembles                                                                   forecast displaying and analysing univariate time series forecasts           V(), E() vertex/edge sequence of igraph (igraph)
clues clustering method based on local shrinking                                         timsac time series analysis and control program                              are.connected() check whether two nodes are connected (igraph)
clValid validation of clustering results                                                 ast time series analysis                                                     degree(), betweenness(), closeness() various centrality scores
clv cluster validation techniques, contains popular internal and external cluster        ArDec time series autoregressive-based decomposition                              (igraph, sna)
         validation methods for outputs produced by package cluster                      ares a toolbox for time series analyses using generalized additive models    add.edges(), add.vertices(), delete.edges(),
bayesclust tests/searches for significant clusters in genetic data                        dse tools for multivariate, linear, time-invariant, time series models            delete.vertices() add and delete edges and vertices (igraph)
clustvarsel variable selection for model-based clustering                                                                                                             neighborhood() neighborhood of graph vertices (igraph, sna)
clustsig significant cluster analysis, tests to see which (if any) clusters are statis-    Text Mining                                                                 get.adjlist() adjacency lists for edges or vertices (igraph)
         tically different                                                               Functions                                                                    nei(), adj(), from(), to() vertex/edge sequence indexing (igraph)
clusterfly explore clustering interactively                                                                                                                            cliques() find cliques, ie. complete subgraphs (igraph)
                                                                                         Corpus() build a corpus, which is a collection of text documents (tm)
clusterSim search for optimal clustering procedure for a data set                                                                                                     clusters() maximal connected components of a graph (igraph)
                                                                                         tm map() transform text documents, e.g., stemming, stopword removal (tm)
clusterGeneration random cluster generation                                                                                                                           %->%, %<-%, %--% edge sequence indexing (igraph)
                                                                                         tm filter() filtering out documents (tm)
clusterCons calculate the consensus clustering result from re-sampled clustering                                                                                      get.edgelist() return an edge list in a two-column matrix (igraph)
                                                                                         TermDocumentMatrix(), DocumentTermMatrix() construct a
         experiments with the option of using multiple algorithms and parameter                                                                                       read.graph(), write.graph() read and writ graphs from and to files
                                                                                               term-document matrix or a document-term matrix (tm)
gcExplorer graphical cluster explorer                                                                                                                                      (igraph)
                                                                                         Dictionary() construct a dictionary from a character vector or a term-
hybridHclust hybrid hierarchical clustering via mutual clusters                                                                                                       Packages
                                                                                               document matrix (tm)
Modalclust hierarchical modal Clustering
                                                                                         findAssocs() find associations in a term-document matrix (tm)                 sna social network analysis
iCluster integrative clustering of multiple genomic data types
                                                                                         findFreqTerms() find frequent terms in a term-document matrix (tm)            igraph network analysis and visualization
EMCC evolutionary Monte Carlo (EMC) methods for clustering
                                                                                         stemDocument() stem words in a text document (tm)                            statnet a set of tools for the representation, visualization, analysis and simulation
rEMM extensible Markov Model (EMM) for data stream clustering
                                                                                         stemCompletion() complete stemmed words (tm)                                          of network data
 Outlier Detection                                                                       termFreq() generate a term frequency vector from a text document (tm)        egonet ego-centric measures in social network analysis
                                                                                         stopwords(language) return stopwords in different languages (tm)             snort social network-analysis on relational tables
Functions
                                                                                         removeNumbers(), removePunctuation(), removeWords() re-                      network tools to create and modify network objects
boxplot.stats()$out list data points lying beyond the extremes of the                          move numbers, punctuation marks, or a set of words from a text docu-   bipartite visualising bipartite networks and calculating some (ecological) indices
      whiskers                                                                                 ment (tm)                                                              blockmodelinggeneralized and classical blockmodeling of valued networks
lofactor() calculate local outlier factors using the LOF algorithm (DMwR                 removeSparseTerms() remove sparse terms from a term-document matrix          diagram visualising simple graphs (networks), plotting flow diagrams
      or dprep)                                                                                (tm)                                                                   NetCluster clustering for networks
lof() a parallel implementation of the LOF algorithm (Rlof )                             textcat() n-gram based text categorization (textcat)                         NetData network data for McFarland’s SNA R labs
NetIndices estimating network indices, including trophic structure of foodwebs   Packages                                                                     googleVis an interface between R and the Google Visualisation API to create
         in R                                                                    nlme linear and nonlinear mixed effects models                                        interactive charts
NetworkAnalysis statistical inference on populations of weighted or unweighted                                                                                lattice a powerful high-level data visualization system, with an emphasis on mul-
         networks
                                                                                  Graphics                                                                             tivariate data
tnet analysis of weighted, two-mode, and longitudinal networks                   Functions                                                                    vcd visualizing categorical data
triads triad census for networks                                                 plot() generic function for plotting (graphics)                              denpro visualization of multivariate, functions, sets, and data
 Spatial Data Analysis                                                           barplot(), pie(), hist() bar chart, pie chart and histogram (graph-          iplots interactive graphics

Functions
                                                                                      ics)                                                                     Data Manipulation
                                                                                 boxplot() box-and-whisker plot (graphics)
geocode() geocodes a location using Google Maps (ggmap)                          stripchart() one dimensional scatter plot (graphics)                         Functions
qmap() quick map plot (ggmap)                                                    dotchart() Cleveland dot plot (graphics)                                     transform() transform a data frame
get map() queries the Google Maps, OpenStreetMap, or Stamen Maps server          qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot (stats)             scale() scaling and centering of matrix-like objects
      for a map at a certain location (ggmap)                                    coplot() conditioning plot (graphics)                                        t() matrix transpose
gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(),                                splom() conditional scatter plot matrices (lattice)                          aperm() array transpose
      gvisMap() Google geo charts and maps (googleVis)                           pairs() a matrix of scatterplots (graphics)                                  sample() sampling
GetMap() download a static map from the Google server (RgoogleMaps)              cpairs() enhanced scatterplot matrix (gclus)                                 table(), tabulate(), xtabs() cross tabulation (stats)
ColorMap() plot levels of a variable in a colour-coded map (RgoogleMaps)         parcoord() parallel coordinate plot (MASS)                                   stack(), unstack() stacking vectors
PlotOnStaticMap() overlay plot on background image of map tile                   cparcoord() enhanced parallel coordinate plot (gclus)                        split(), unsplit() divide data into groups and reassemble
      (RgoogleMaps)                                                              paracoor() parallel coordinates plot (denpro)                                reshape() reshape a data frame between “wide” and “long” format (stats)
TextOnStaticMap() plot text on map (RgoogleMaps)                                 parallelplot() parallel coordinates plot (lattice)                           merge() merge two data frames; similar to database join operations
Packages                                                                         densityplot() kernel density plot (lattice)                                  aggregate() compute summary statistics of data subsets (stats)
plotGoogleMaps plot spatial data as HTML map mushup over Google Maps             contour(), filled.contour() contour plot (graphics)                          by() apply a function to a data frame split by factors
RgoogleMaps overlay on Google map tiles in R                                     levelplot(), contourplot() level plots and contour plots (lattice)           melt(), cast() melt and then cast data into the reshaped or aggregated
plotKML visualization of spatial and spatio-temporal objects in Google Earth     smoothScatter() scatterplots with smoothed densities color representation;         form you want (reshape)
ggmap Spatial visualization with Google Maps and OpenStreetMap                        capable of visualizing large datasets (graphics)                        complete.cases() find complete cases, i.e., cases without missing values
clustTool GUI for clustering data with spatial information                       sunflowerplot() a sunflower scatter plot (graphics)                           na.fail, na.omit, na.exclude, na.pass handle missing values
SGCS Spatial Graph based Clustering Summaries for spatial point patterns         assocplot() association plot (graphics)                                      Packages
spdep spatial dependence: weighting schemes, statistics and models               mosaicplot() mosaic plot (graphics)
                                                                                                                                                              reshape flexibly restructure and aggregate data
                                                                                 matplot() plot the columns of one matrix against the columns of another
 Statistics                                                                                                                                                   data.table extension of data.frame for fast indexing, ordered joins, assignment,
                                                                                      (graphics)
                                                                                                                                                                      and grouping and list columns
Summarization                                                                    fourfoldplot() a fourfold display of a 2 × 2 × k contingency table (graph-
                                                                                                                                                              gdata various tools for data manipulation
                                                                                      ics)
summary() summarize data
describe() concise statistical description of data (Hmisc)
                                                                                 persp() perspective plots of surfaces over the x?y plane (graphics)           Data Access
                                                                                 cloud(), wireframe() 3d scatter plots and surfaces (lattice)                 Functions
boxplot.stats() box plot statistics
                                                                                 interaction.plot() two-way interaction plot (stats)
Analysis of Variance                                                             iplot(), ihist(), ibar(), ipcp() interactive scatter plot, his-              save(), load() save and load R data objects
aov() fit an analysis of variance model (stats)                                        togram, bar plot, and parallel coordinates plot (iplots)                read.csv(), write.csv() import from and export to .CSV files
anova() compute analysis of variance (or deviance) tables for one or more        pdf(), postscript(), win.metafile(), jpeg(), bmp(),                          read.table(), write.table(), scan(), write() read and
      fitted model objects (stats)                                                     png(), tiff() save graphs into files of various formats                       write data
                                                                                                                                                              write.matrix() write a matrix or data frame (MASS)
Statistical Test                                                                 gvisAnnotatedTimeLine(), gvisAreaChart(),
                                                                                                                                                              readLines(), writeLines() read/write text lines from/to a connection,
t.test() student’s t-test (stats)                                                     gvisBarChart(), gvisBubbleChart(),
                                                                                      gvisCandlestickChart(), gvisColumnChart(),                                   such as a text file
prop.test() test of equal or given proportions (stats)                                                                                                        sqlQuery() submit an SQL query to an ODBC database (RODBC)
binom.test() exact binomial test (stats)                                              gvisComboChart(), gvisGauge(), gvisGeoChart(),
                                                                                      gvisGeoMap(), gvisIntensityMap(),                                       sqlFetch() read a table from an ODBC database (RODBC)
Mixed Effects Models                                                                  gvisLineChart(), gvisMap(), gvisMerge(),                                sqlSave(), sqlUpdate() write or update a table in an ODBC database
lme() fit a linear mixed-effects model (nlme)                                          gvisMotionChart(), gvisOrgChart(),                                           (RODBC)
nlme() fit a nonlinear mixed-effects model (nlme)                                      gvisPieChart(), gvisScatterChart(),                                     sqlColumns() enquire about the column structure of tables (RODBC)
                                                                                                                                                              sqlTables() list tables on an ODBC connection (RODBC)
Principal Components and Factor Analysis                                              gvisSteppedAreaChart(), gvisTable(),
                                                                                      gvisTreeMap() various interactive charts produced with the Google       odbcConnect(), odbcClose(), odbcCloseAll() open/close con-
princomp() principal components analysis (stats)                                                                                                                   nections to ODBC databases (RODBC)
prcomp() principal components analysis (stats)                                        Visualisation API (googleVis)
                                                                                 gvisMerge() merge two googleVis charts into one (googleVis)                  dbSendQuery execute an SQL statement on a given database connection
Other Functions                                                                                                                                                    (DBI)
var(), cov(), cor() variance, covariance, and correlation (stats)
                                                                                 Packages                                                                     dbConnect(), dbDisconnect() create/close a connection to a DBMS
density() compute kernel density estimates (stats)                               ggplot2 an implementation of the Grammar of Graphics                              (DBI)
Packages                                                                     snowfall usability wrapper around snow for easier development of parallel R      gWidgets a toolkit-independent API for building interactive GUIs
RODBC ODBC database access                                                           programs                                                                 Red-R An open source visual programming GUI interface for R
DBI a database interface (DBI) between R and relational DBMS                 snowFT extension of snow supporting fault tolerant and reproducible applica-     R AnalyticFlow a software which enables data analysis by drawing analysis
RMySQL interface to the MySQL database                                               tions, and easy-to-use parallel programming                                        flowcharts
RJDBC access to databases through the JDBC interface                         Rmpi interface (Wrapper) to MPI (Message-Passing Interface)                      latticist a graphical user interface for exploratory visualisation
RSQLite SQLite interface for R                                               rpvm R interface to PVM (Parallel Virtual Machine)
                                                                             nws provide coordination and parallel execution facilities
                                                                                                                                                              Other R Reference Cards
ROracle Oracle database interface (DBI) driver
                                                                             foreach foreach looping construct for R                                          R Reference Card, by Tom Short
RpgSQL DBI/RJDBC interface to PostgreSQL database
                                                                             doMC foreach parallel adaptor for the multicore package                           http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/
RODM interface to Oracle Data Mining
                                                                             doSNOW foreach parallel adaptor for the snow package                              R-refcard.pdf or
xlsReadWrite read and write Excel files
                                                                             doMPI foreach parallel adaptor for the Rmpi package                               http://cran.r-project.org/doc/contrib/Short-refcard.pdf
WriteXLS create Excel 2003 (XLS) files from data frames
                                                                             doParallel foreach parallel adaptor for the multicore package                    R Reference Card, by Jonathan Baron
Big Data                                                                     doRNG generic reproducible parallel backend for foreach Loops                     http://cran.r-project.org/doc/contrib/refcard.pdf
                                                                             GridR execute functions on remote hosts, clusters or grids                       R Functions for Regression Analysis, by Vito Ricci
Functions                                                                                                                                                      http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.
as.ffdf() coerce a dataframe to an ffdf (ff )                                fork R functions for handling multiple processes
                                                                                                                                                               pdf
read.table.ffdf(), read.csv.ffdf() read data from a flat file to                Generating Reports                                                              R Functions for Time Series Analysis, by Vito Ricci
     an ffdf object (ff )                                                                                                                                      http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf
write.table.ffdf(), write.csv.ffdf() write an ffdf object to a               Sweave() mixing text and R/S code for automatic report generation (utils)
     flat file (ff )                                                           knitr a general-purpose package for dynamic report generation in R               RDataMining Website, Package, Twitter & Groups
ffdfappend() append a dataframe or an ffdf to an existing ffdf (ffdf )       R2HTML making HTML reports
                                                                             R2PPT generating Microsoft PowerPoint presentations                               RDataMining Website:     http://www.rdatamining.com
big.matrix() create a standard big.matrix, which is constrained to avail-                                                                                      Group on LinkedIn:       http://group.rdatamining.com
     able RAM (bigmemory)                                                     Interface to Weka                                                                Group on Google:         http://group2.rdatamining.com
read.big.matrix() create a big.matrix by reading from an ASCII file           Package RWeka is an R interface to Weka, and enables to use the following Weka    Twitter:                 http://twitter.com/rdatamining
     (bigmemory)                                                                      functions in R.                                                          RDataMining Package:     http://www.rdatamining.com/package
write.big.matrix() write a big.matrix to a file (bigmemory)                   Association rules:                                                                                         http://package.rdatamining.com
filebacked.big.matrix() create a file-backed big.matrix, which may                     Apriori(), Tertius()
     exceed available RAM by using hard drive space (bigmemory)              Regression and classification:
mwhich() expanded “which”-like functionality (bigmemory)                              LinearRegression(), Logistic(), SMO()
Packages                                                                     Lazy classifiers:
ff memory-efficient storage of large data on disk and fast access functions            IBk(), LBR()
ffbase basic statistical functions for package ff                            Meta classifiers:
filehash a simple key-value database for handling large data                           AdaBoostM1(), Bagging(), LogitBoost(),
g.data create and maintain delayed-data packages                                      MultiBoostAB(), Stacking(),
BufferedMatrix a matrix data storage object held in temporary files                    CostSensitiveClassifier()
biglm regression for data too large to fit in memory                          Rule classifiers:
bigmemory manage massive matrices with shared memory and memory-mapped                JRip(), M5Rules(), OneR(), PART()
        files                                                                 Regression and classification trees:
biganalytics extend the bigmemory package with various analytics                      J48(), LMT(), M5P(), DecisionStump()
bigtabulate table-, tapply-, and split-like functionality for matrix and     Clustering:
        big.matrix objects                                                            Cobweb(), FarthestFirst(), SimpleKMeans(),
                                                                                      XMeans(), DBScan()
Parallel Computing                                                           Filters:
Functions                                                                             Normalize(), Discretize()
foreach(...)        %dopar% looping in parallel (foreach)                    Word stemmers:
registerDoSEQ(), registerDoSNOW(), registerDoMC() regis-                              IteratedLovinsStemmer(), LovinsStemmer()
     ter respectively the sequential, SNOW and multicore parallel backend    Tokenizers:
     with the foreach package (foreach, doSNOW, doMC)                                 AlphabeticTokenizer(), NGramTokenizer(),
sfInit(), sfStop() initialize and stop the cluster (snowfall)                         WordTokenizer()
sfLapply(), sfSapply(), sfApply()                parallel  versions    of     Editors/GUIs
     lapply(), sapply(), apply() (snowfall)                                  Tinn-R a free GUI for R language and environment
Packages                                                                     RStudio a free integrated development environment (IDE) for R
multicore parallel processing of R code on machines with multiple cores or   rattle graphical user interface for data mining in R
       CPUs                                                                  Rpad workbook-style, web-based interface to R
snow simple parallel computing in R                                          RPMG graphical user interface (GUI) for interactive R analysis sessions

Contenu connexe

Tendances

Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query LanguageJulian Hyde
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data StructureSakthi Dasans
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examplesDennis
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in REshwar Sai
 
IR-ranking
IR-rankingIR-ranking
IR-rankingFELIX75
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Laura Hughes
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching moduleSander Timmer
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiUnmesh Baile
 

Tendances (17)

Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
 
Statistics lab 1
Statistics lab 1Statistics lab 1
Statistics lab 1
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Morel, a Functional Query Language
Morel, a Functional Query LanguageMorel, a Functional Query Language
Morel, a Functional Query Language
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R learning by examples
R learning by examplesR learning by examples
R learning by examples
 
3 R Tutorial Data Structure
3 R Tutorial Data Structure3 R Tutorial Data Structure
3 R Tutorial Data Structure
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examples
 
Data Analysis and Programming in R
Data Analysis and Programming in RData Analysis and Programming in R
Data Analysis and Programming in R
 
IR-ranking
IR-rankingIR-ranking
IR-ranking
 
R교육1
R교육1R교육1
R교육1
 
R gráfico
R gráficoR gráfico
R gráfico
 
R language introduction
R language introductionR language introduction
R language introduction
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)
 
Presentation R basic teaching module
Presentation R basic teaching modulePresentation R basic teaching module
Presentation R basic teaching module
 
Sql
SqlSql
Sql
 
Best corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbaiBest corporate-r-programming-training-in-mumbai
Best corporate-r-programming-training-in-mumbai
 

En vedette

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with RYanchang Zhao
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientistsAjay Ohri
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with RYanchang Zhao
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RYanchang Zhao
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with RYanchang Zhao
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with RYanchang Zhao
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with RYanchang Zhao
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataYanchang Zhao
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheetGil Cohen
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat SheetGlowTouch
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with RYanchang Zhao
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slidesYanchang Zhao
 
Python at yhat (august 2013)
Python at yhat (august 2013)Python at yhat (august 2013)
Python at yhat (august 2013)Austin Ogilvie
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.Dr. Volkan OBAN
 
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)Building a Beer Recommender with Yhat (PAPIs.io - November 2014)
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)Austin Ogilvie
 
Analyze this
Analyze thisAnalyze this
Analyze thisAjay Ohri
 
Ggplot in python
Ggplot in pythonGgplot in python
Ggplot in pythonAjay Ohri
 

En vedette (20)

Regression and Classification with R
Regression and Classification with RRegression and Classification with R
Regression and Classification with R
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientists
 
Data Exploration and Visualization with R
Data Exploration and Visualization with RData Exploration and Visualization with R
Data Exploration and Visualization with R
 
Advanced R cheat sheet
Advanced R cheat sheetAdvanced R cheat sheet
Advanced R cheat sheet
 
Introduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in RIntroduction to Data Mining with R and Data Import/Export in R
Introduction to Data Mining with R and Data Import/Export in R
 
Data Clustering with R
Data Clustering with RData Clustering with R
Data Clustering with R
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
Association Rule Mining with R
Association Rule Mining with RAssociation Rule Mining with R
Association Rule Mining with R
 
Time Series Analysis and Mining with R
Time Series Analysis and Mining with RTime Series Analysis and Mining with R
Time Series Analysis and Mining with R
 
Text Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter DataText Mining with R -- an Analysis of Twitter Data
Text Mining with R -- an Analysis of Twitter Data
 
Python
PythonPython
Python
 
Python3 cheatsheet
Python3 cheatsheetPython3 cheatsheet
Python3 cheatsheet
 
Python Cheat Sheet
Python Cheat SheetPython Cheat Sheet
Python Cheat Sheet
 
An Introduction to Data Mining with R
An Introduction to Data Mining with RAn Introduction to Data Mining with R
An Introduction to Data Mining with R
 
Time series-mining-slides
Time series-mining-slidesTime series-mining-slides
Time series-mining-slides
 
Python at yhat (august 2013)
Python at yhat (august 2013)Python at yhat (august 2013)
Python at yhat (august 2013)
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)Building a Beer Recommender with Yhat (PAPIs.io - November 2014)
Building a Beer Recommender with Yhat (PAPIs.io - November 2014)
 
Analyze this
Analyze thisAnalyze this
Analyze this
 
Ggplot in python
Ggplot in pythonGgplot in python
Ggplot in python
 

Similaire à R Reference Card for Data Mining

RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-cardYanchang Zhao
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collectionsKnoldus Inc.
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collectionsKnoldus Inc.
 
Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindingsDmitriy Lyubimov
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Clustering and Visualisation using R programming
Clustering and Visualisation using R programmingClustering and Visualisation using R programming
Clustering and Visualisation using R programmingNixon Mendez
 
Duchowski Scanpath Comparison Revisited
Duchowski Scanpath Comparison RevisitedDuchowski Scanpath Comparison Revisited
Duchowski Scanpath Comparison RevisitedKalle
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_publicLong Nguyen
 
BIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_ProjectBIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_ProjectSagar Nikam
 
Tables - TMS320C30/C40 DSP Manual
Tables - TMS320C30/C40 DSP ManualTables - TMS320C30/C40 DSP Manual
Tables - TMS320C30/C40 DSP ManualSethCopeland
 
R Machine Learning packages( generally used)
R Machine Learning packages( generally used)R Machine Learning packages( generally used)
R Machine Learning packages( generally used)Dr. Volkan OBAN
 
Recursion schemes in Scala
Recursion schemes in ScalaRecursion schemes in Scala
Recursion schemes in ScalaArthur Kushka
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Acunu & OCaml: Experience Report, CUFP
Acunu & OCaml: Experience Report, CUFPAcunu & OCaml: Experience Report, CUFP
Acunu & OCaml: Experience Report, CUFPAcunu
 

Similaire à R Reference Card for Data Mining (20)

RDataMining-reference-card
RDataMining-reference-cardRDataMining-reference-card
RDataMining-reference-card
 
R refcard-data-mining
R refcard-data-miningR refcard-data-mining
R refcard-data-mining
 
Introduction
IntroductionIntroduction
Introduction
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collections
 
Scala parallel-collections
Scala parallel-collectionsScala parallel-collections
Scala parallel-collections
 
Mahout scala and spark bindings
Mahout scala and spark bindingsMahout scala and spark bindings
Mahout scala and spark bindings
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Clustering and Visualisation using R programming
Clustering and Visualisation using R programmingClustering and Visualisation using R programming
Clustering and Visualisation using R programming
 
Duchowski Scanpath Comparison Revisited
Duchowski Scanpath Comparison RevisitedDuchowski Scanpath Comparison Revisited
Duchowski Scanpath Comparison Revisited
 
EROSについて
EROSについてEROSについて
EROSについて
 
ML-CheatSheet (1).pdf
ML-CheatSheet (1).pdfML-CheatSheet (1).pdf
ML-CheatSheet (1).pdf
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_public
 
BIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_ProjectBIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_Project
 
Tables - TMS320C30/C40 DSP Manual
Tables - TMS320C30/C40 DSP ManualTables - TMS320C30/C40 DSP Manual
Tables - TMS320C30/C40 DSP Manual
 
R Machine Learning packages( generally used)
R Machine Learning packages( generally used)R Machine Learning packages( generally used)
R Machine Learning packages( generally used)
 
Recursion schemes in Scala
Recursion schemes in ScalaRecursion schemes in Scala
Recursion schemes in Scala
 
Collections forceawakens
Collections forceawakensCollections forceawakens
Collections forceawakens
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Acunu & OCaml: Experience Report, CUFP
Acunu & OCaml: Experience Report, CUFPAcunu & OCaml: Experience Report, CUFP
Acunu & OCaml: Experience Report, CUFP
 
2.3 implicits
2.3 implicits2.3 implicits
2.3 implicits
 

Plus de Yanchang Zhao

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisYanchang Zhao
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classificationYanchang Zhao
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programmingYanchang Zhao
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationYanchang Zhao
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rYanchang Zhao
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rYanchang Zhao
 

Plus de Yanchang Zhao (8)

RDataMining slides-time-series-analysis
RDataMining slides-time-series-analysisRDataMining slides-time-series-analysis
RDataMining slides-time-series-analysis
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
RDataMining slides-regression-classification
RDataMining slides-regression-classificationRDataMining slides-regression-classification
RDataMining slides-regression-classification
 
RDataMining slides-r-programming
RDataMining slides-r-programmingRDataMining slides-r-programming
RDataMining slides-r-programming
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
RDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisationRDataMining slides-data-exploration-visualisation
RDataMining slides-data-exploration-visualisation
 
RDataMining slides-clustering-with-r
RDataMining slides-clustering-with-rRDataMining slides-clustering-with-r
RDataMining slides-clustering-with-r
 
RDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-rRDataMining slides-association-rule-mining-with-r
RDataMining slides-association-rule-mining-with-r
 

Dernier

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 

Dernier (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 

R Reference Card for Data Mining

  • 1. R Reference Card for Data Mining Performance Evaluation apclusterK() affinity propagation clustering to get K clusters (apcluster) performance() provide various measures for evaluating performance of pre- cclust() Convex Clustering, incl. k-means and two other clustering algo- by Yanchang Zhao, yanchang@rdatamining.com, January 3, 2013 diction and classification models (ROCR) rithms (cclust) The latest version is available at http://www.RDataMining.com. Click the link roc() build a ROC curve (pROC) KMeansSparseCluster() sparse k-means clustering (sparcl) also for document R and Data Mining: Examples and Case Studies. auc() compute the area under the ROC curve (pROC) tclust(x,k,alpha,...) trimmed k-means with which a proportion The package names are in parentheses. ROC() draw a ROC curve (DiagnosisMed) alpha of observations may be trimmed (tclust) PRcurve() precision-recall curves (DMwR) Association Rules & Frequent Itemsets CRchart() cumulative recall charts (DMwR) Hierarchical Clustering Packages a hierarchical decomposition of data in either bottom-up (agglomerative) or top- APRIORI Algorithm rpart recursive partitioning and regression trees down (divisive) way a level-wise, breadth-first algorithm which counts transactions to find frequent party recursive partitioning hclust(d, method, ...) hierarchical cluster analysis on a set of dissim- itemsets randomForest classification and regression based on a forest of trees using ran- ilarities d using the method for agglomeration apriori() mine associations with APRIORI algorithm (arules) dom inputs birch() the BIRCH algorithm that clusters very large data with a CF-tree rpartOrdinal ordinal classification trees, deriving a classification tree when the (birch) ECLAT Algorithm response to be predicted is ordinal pvclust() hierarchical clustering with p-values via multi-scale bootstrap re- rpart.plot plots rpart models with an enhanced version of plot.rpart in the sampling (pvclust) employs equivalence classes, depth-first search and set intersection instead of rpart package agnes() agglomerative hierarchical clustering (cluster) counting ROCR visualize the performance of scoring classifiers diana() divisive hierarchical clustering (cluster) eclat() mine frequent itemsets with the Eclat algorithm (arules) pROC display and analyze ROC curves mona() divisive hierarchical clustering of a dataset with binary variables only (cluster) Packages Regression rockCluster() cluster a data matrix using the Rock algorithm (cba) arules mine frequent itemsets, maximal frequent itemsets, closed frequent item- proximus() cluster the rows of a logical matrix using the Proximus algorithm sets and association rules. It includes two algorithms, Apriori and Eclat. Functions (cba) arulesViz visualizing association rules lm() linear regression isopam() Isopam clustering algorithm (isopam) glm() generalized linear regression LLAhclust() hierarchical clustering based on likelihood linkage analysis Sequential Patterns nls() non-linear regression (LLAhclust) predict() predict with models flashClust() optimal hierarchical clustering (flashClust) Functions residuals() residuals, the difference between observed values and fitted val- fastcluster() fast hierarchical clustering (fastcluster) cspade() mining frequent sequential patterns with the cSPADE algorithm ues (arulesSequences) cutreeDynamic(), cutreeHybrid() detection of clusters in hierarchi- gls() fit a linear model using generalized least squares (nlme) cal clustering dendrograms (dynamicTreeCut) seqefsub() searching for frequent subsequences (TraMineR) gnls() fit a nonlinear model using generalized least squares (nlme) HierarchicalSparseCluster() hierarchical sparse clustering (sparcl) Packages Packages arulesSequences add-on for arules to handle and mine frequent sequences nlme linear and nonlinear mixed effects models Model based Clustering TraMineR mining, describing and visualizing sequences of states or events Clustering Mclust() model-based clustering (mclust) Classification & Prediction HDDC() a model-based method for high dimensional data clustering (HDclas- Partitioning based Clustering sif ) Decision Trees partition the data into k groups first and then try to improve the quality of clus- fixmahal() Mahalanobis Fixed Point Clustering (fpc) ctree() conditional inference trees, recursive partitioning for continuous, cen- tering by moving objects from one group to another fixreg() Regression Fixed Point Clustering (fpc) sored, ordered, nominal and multivariate response variables in a condi- kmeans() perform k-means clustering on a data matrix mergenormals() clustering by merging Gaussian mixture components (fpc) tional inference framework (party) kmeansCBI() interface function for kmeans (fpc) rpart() recursive partitioning and regression trees (rpart) Density based Clustering kmeansruns() call kmeans for the k-means clustering method and includes generate clusters by connecting dense regions mob() model-based recursive partitioning, yielding a tree with fitted models estimation of the number of clusters and finding an optimal solution from associated with each terminal node (party) dbscan(data,eps,MinPts,...) generate a density based clustering of several starting points (fpc) arbitrary shapes, with neighborhood radius set as eps and density thresh- Random Forest pam() the Partitioning Around Medoids (PAM) clustering method (cluster) old as MinPts (fpc) cforest() random forest and bagging ensemble (party) pamk() the Partitioning Around Medoids (PAM) clustering method with esti- pdfCluster() clustering via kernel density estimation (pdfCluster) randomForest() random forest (randomForest) mation of number of clusters (fpc) varimp() variable importance (party) cluster.optimal() search for the optimal k-clustering of the dataset Other Clustering Techniques importance() variable importance (randomForest) (bayesclust) clara() Clustering Large Applications (cluster) mixer() random graph clustering (mixer) Neural Networks nncluster() fast clustering with restarted minimum spanning tree (nnclust) fanny(x,k,...) compute a fuzzy clustering of the data into k clusters (clus- nnet() fit single-hidden-layer neural network (nnet) orclus() ORCLUS subspace clustering (orclus) ter) Support Vector Machine (SVM) kcca() k-centroids clustering (flexclust) Plotting Clustering Solutions svm() train a support vector machine for regression, classification or density- ccfkms() clustering with Conjugate Convex Functions (cba) plotcluster() visualisation of a clustering or grouping in data (fpc) estimation (e1071) apcluster() affinity propagation clustering for a given similarity matrix (ap- bannerplot() a horizontal barplot visualizing a hierarchical clustering (clus- ksvm() support vector machines (kernlab) cluster) ter)
  • 2. Cluster Validation Packages SnowballStemmer() Snowball word stemmers (Snowball) silhouette() compute or extract silhouette information (cluster) extremevalues detect extreme values in one-dimensional data LDA() fit a LDA (latent Dirichlet allocation) model (topicmodels) cluster.stats() compute several cluster validity statistics from a cluster- mvoutlier multivariate outlier detection based on robust methods CTM() fit a CTM (correlated topics model) model (topicmodels) ing and a dissimilarity matrix (fpc) outliers some tests commonly used for identifying outliers terms() extract the most likely terms for each topic (topicmodels) clValid() calculate validation measures for a given set of clustering algo- Rlof a parallel implementation of the LOF algorithm topics() extract the most likely topics for each document (topicmodels) rithms and number of clusters (clValid) Time Series Analysis Packages clustIndex() calculate the values of several clustering indexes, which can tm a framework for text mining applications be independently used to determine the number of clusters existing in a Construction & Plot lda fit topic models with LDA data set (cclust) ts() create time-series objects (stats) topicmodels fit topic models with LDA and CTM NbClust() provide 30 indices for cluster validation and determining the num- plot.ts() plot time-series objects (stats) RTextTools automatic text classification via supervised learning ber of clusters (NbClust) smoothts() time series smoothing (ast) tm.plugin.dc a plug-in for package tm to support distributed text mining Packages sfilter() remove seasonal fluctuation using moving average (ast) tm.plugin.mail a plug-in for package tm to handle mail cluster cluster analysis Decomposition RcmdrPlugin.TextMining GUI for demonstration of text mining concepts and fpc various methods for clustering and cluster validation tm package decomp() time series decomposition by square-root filter (timsac) mclust model-based clustering and normal mixture modeling textir a suite of tools for inference about text documents and associated sentiment decompose() classical seasonal decomposition by moving averages (stats) birch clustering very large datasets using the BIRCH algorithm tau utilities for text analysis stl() seasonal decomposition of time series by loess (stats) pvclust hierarchical clustering with p-values textcat n-gram based text categorization tsr() time series decomposition (ast) apcluster Affinity Propagation Clustering YjdnJlp Japanese text analysis by Yahoo! Japan Developer Network ardec() time series autoregressive decomposition (ArDec) cclust Convex Clustering methods, including k-means algorithm, On-line Up- Social Network Analysis and Graph Mining date algorithm and Neural Gas algorithm and calculation of indexes for Forecasting finding the number of clusters in a data set arima() fit an ARIMA model to a univariate time series (stats) Functions cba Clustering for Business Analytics, including clustering techniques such as predict.Arima() forecast from models fitted by arima (stats) graph(), graph.edgelist(), graph.adjacency(), Proximus and Rock auto.arima() fit best ARIMA model to univariate time series (forecast) graph.incidence() create graph objects respectively from edges, bclust Bayesian clustering using spike-and-slab hierarchical model, suitable for forecast.stl(), forecast.ets(), forecast.Arima() an edge list, an adjacency matrix and an incidence matrix (igraph) clustering high-dimensional data forecast time series using stl, ets and arima models (forecast) plot(), tkplot() static and interactive plotting of graphs (igraph) biclust algorithms to find bi-clusters in two-dimensional data Packages gplot(), gplot3d() plot graphs (sna) clue cluster ensembles forecast displaying and analysing univariate time series forecasts V(), E() vertex/edge sequence of igraph (igraph) clues clustering method based on local shrinking timsac time series analysis and control program are.connected() check whether two nodes are connected (igraph) clValid validation of clustering results ast time series analysis degree(), betweenness(), closeness() various centrality scores clv cluster validation techniques, contains popular internal and external cluster ArDec time series autoregressive-based decomposition (igraph, sna) validation methods for outputs produced by package cluster ares a toolbox for time series analyses using generalized additive models add.edges(), add.vertices(), delete.edges(), bayesclust tests/searches for significant clusters in genetic data dse tools for multivariate, linear, time-invariant, time series models delete.vertices() add and delete edges and vertices (igraph) clustvarsel variable selection for model-based clustering neighborhood() neighborhood of graph vertices (igraph, sna) clustsig significant cluster analysis, tests to see which (if any) clusters are statis- Text Mining get.adjlist() adjacency lists for edges or vertices (igraph) tically different Functions nei(), adj(), from(), to() vertex/edge sequence indexing (igraph) clusterfly explore clustering interactively cliques() find cliques, ie. complete subgraphs (igraph) Corpus() build a corpus, which is a collection of text documents (tm) clusterSim search for optimal clustering procedure for a data set clusters() maximal connected components of a graph (igraph) tm map() transform text documents, e.g., stemming, stopword removal (tm) clusterGeneration random cluster generation %->%, %<-%, %--% edge sequence indexing (igraph) tm filter() filtering out documents (tm) clusterCons calculate the consensus clustering result from re-sampled clustering get.edgelist() return an edge list in a two-column matrix (igraph) TermDocumentMatrix(), DocumentTermMatrix() construct a experiments with the option of using multiple algorithms and parameter read.graph(), write.graph() read and writ graphs from and to files term-document matrix or a document-term matrix (tm) gcExplorer graphical cluster explorer (igraph) Dictionary() construct a dictionary from a character vector or a term- hybridHclust hybrid hierarchical clustering via mutual clusters Packages document matrix (tm) Modalclust hierarchical modal Clustering findAssocs() find associations in a term-document matrix (tm) sna social network analysis iCluster integrative clustering of multiple genomic data types findFreqTerms() find frequent terms in a term-document matrix (tm) igraph network analysis and visualization EMCC evolutionary Monte Carlo (EMC) methods for clustering stemDocument() stem words in a text document (tm) statnet a set of tools for the representation, visualization, analysis and simulation rEMM extensible Markov Model (EMM) for data stream clustering stemCompletion() complete stemmed words (tm) of network data Outlier Detection termFreq() generate a term frequency vector from a text document (tm) egonet ego-centric measures in social network analysis stopwords(language) return stopwords in different languages (tm) snort social network-analysis on relational tables Functions removeNumbers(), removePunctuation(), removeWords() re- network tools to create and modify network objects boxplot.stats()$out list data points lying beyond the extremes of the move numbers, punctuation marks, or a set of words from a text docu- bipartite visualising bipartite networks and calculating some (ecological) indices whiskers ment (tm) blockmodelinggeneralized and classical blockmodeling of valued networks lofactor() calculate local outlier factors using the LOF algorithm (DMwR removeSparseTerms() remove sparse terms from a term-document matrix diagram visualising simple graphs (networks), plotting flow diagrams or dprep) (tm) NetCluster clustering for networks lof() a parallel implementation of the LOF algorithm (Rlof ) textcat() n-gram based text categorization (textcat) NetData network data for McFarland’s SNA R labs
  • 3. NetIndices estimating network indices, including trophic structure of foodwebs Packages googleVis an interface between R and the Google Visualisation API to create in R nlme linear and nonlinear mixed effects models interactive charts NetworkAnalysis statistical inference on populations of weighted or unweighted lattice a powerful high-level data visualization system, with an emphasis on mul- networks Graphics tivariate data tnet analysis of weighted, two-mode, and longitudinal networks Functions vcd visualizing categorical data triads triad census for networks plot() generic function for plotting (graphics) denpro visualization of multivariate, functions, sets, and data Spatial Data Analysis barplot(), pie(), hist() bar chart, pie chart and histogram (graph- iplots interactive graphics Functions ics) Data Manipulation boxplot() box-and-whisker plot (graphics) geocode() geocodes a location using Google Maps (ggmap) stripchart() one dimensional scatter plot (graphics) Functions qmap() quick map plot (ggmap) dotchart() Cleveland dot plot (graphics) transform() transform a data frame get map() queries the Google Maps, OpenStreetMap, or Stamen Maps server qqnorm(), qqplot(), qqline() QQ (quantile-quantile) plot (stats) scale() scaling and centering of matrix-like objects for a map at a certain location (ggmap) coplot() conditioning plot (graphics) t() matrix transpose gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), splom() conditional scatter plot matrices (lattice) aperm() array transpose gvisMap() Google geo charts and maps (googleVis) pairs() a matrix of scatterplots (graphics) sample() sampling GetMap() download a static map from the Google server (RgoogleMaps) cpairs() enhanced scatterplot matrix (gclus) table(), tabulate(), xtabs() cross tabulation (stats) ColorMap() plot levels of a variable in a colour-coded map (RgoogleMaps) parcoord() parallel coordinate plot (MASS) stack(), unstack() stacking vectors PlotOnStaticMap() overlay plot on background image of map tile cparcoord() enhanced parallel coordinate plot (gclus) split(), unsplit() divide data into groups and reassemble (RgoogleMaps) paracoor() parallel coordinates plot (denpro) reshape() reshape a data frame between “wide” and “long” format (stats) TextOnStaticMap() plot text on map (RgoogleMaps) parallelplot() parallel coordinates plot (lattice) merge() merge two data frames; similar to database join operations Packages densityplot() kernel density plot (lattice) aggregate() compute summary statistics of data subsets (stats) plotGoogleMaps plot spatial data as HTML map mushup over Google Maps contour(), filled.contour() contour plot (graphics) by() apply a function to a data frame split by factors RgoogleMaps overlay on Google map tiles in R levelplot(), contourplot() level plots and contour plots (lattice) melt(), cast() melt and then cast data into the reshaped or aggregated plotKML visualization of spatial and spatio-temporal objects in Google Earth smoothScatter() scatterplots with smoothed densities color representation; form you want (reshape) ggmap Spatial visualization with Google Maps and OpenStreetMap capable of visualizing large datasets (graphics) complete.cases() find complete cases, i.e., cases without missing values clustTool GUI for clustering data with spatial information sunflowerplot() a sunflower scatter plot (graphics) na.fail, na.omit, na.exclude, na.pass handle missing values SGCS Spatial Graph based Clustering Summaries for spatial point patterns assocplot() association plot (graphics) Packages spdep spatial dependence: weighting schemes, statistics and models mosaicplot() mosaic plot (graphics) reshape flexibly restructure and aggregate data matplot() plot the columns of one matrix against the columns of another Statistics data.table extension of data.frame for fast indexing, ordered joins, assignment, (graphics) and grouping and list columns Summarization fourfoldplot() a fourfold display of a 2 × 2 × k contingency table (graph- gdata various tools for data manipulation ics) summary() summarize data describe() concise statistical description of data (Hmisc) persp() perspective plots of surfaces over the x?y plane (graphics) Data Access cloud(), wireframe() 3d scatter plots and surfaces (lattice) Functions boxplot.stats() box plot statistics interaction.plot() two-way interaction plot (stats) Analysis of Variance iplot(), ihist(), ibar(), ipcp() interactive scatter plot, his- save(), load() save and load R data objects aov() fit an analysis of variance model (stats) togram, bar plot, and parallel coordinates plot (iplots) read.csv(), write.csv() import from and export to .CSV files anova() compute analysis of variance (or deviance) tables for one or more pdf(), postscript(), win.metafile(), jpeg(), bmp(), read.table(), write.table(), scan(), write() read and fitted model objects (stats) png(), tiff() save graphs into files of various formats write data write.matrix() write a matrix or data frame (MASS) Statistical Test gvisAnnotatedTimeLine(), gvisAreaChart(), readLines(), writeLines() read/write text lines from/to a connection, t.test() student’s t-test (stats) gvisBarChart(), gvisBubbleChart(), gvisCandlestickChart(), gvisColumnChart(), such as a text file prop.test() test of equal or given proportions (stats) sqlQuery() submit an SQL query to an ODBC database (RODBC) binom.test() exact binomial test (stats) gvisComboChart(), gvisGauge(), gvisGeoChart(), gvisGeoMap(), gvisIntensityMap(), sqlFetch() read a table from an ODBC database (RODBC) Mixed Effects Models gvisLineChart(), gvisMap(), gvisMerge(), sqlSave(), sqlUpdate() write or update a table in an ODBC database lme() fit a linear mixed-effects model (nlme) gvisMotionChart(), gvisOrgChart(), (RODBC) nlme() fit a nonlinear mixed-effects model (nlme) gvisPieChart(), gvisScatterChart(), sqlColumns() enquire about the column structure of tables (RODBC) sqlTables() list tables on an ODBC connection (RODBC) Principal Components and Factor Analysis gvisSteppedAreaChart(), gvisTable(), gvisTreeMap() various interactive charts produced with the Google odbcConnect(), odbcClose(), odbcCloseAll() open/close con- princomp() principal components analysis (stats) nections to ODBC databases (RODBC) prcomp() principal components analysis (stats) Visualisation API (googleVis) gvisMerge() merge two googleVis charts into one (googleVis) dbSendQuery execute an SQL statement on a given database connection Other Functions (DBI) var(), cov(), cor() variance, covariance, and correlation (stats) Packages dbConnect(), dbDisconnect() create/close a connection to a DBMS density() compute kernel density estimates (stats) ggplot2 an implementation of the Grammar of Graphics (DBI)
  • 4. Packages snowfall usability wrapper around snow for easier development of parallel R gWidgets a toolkit-independent API for building interactive GUIs RODBC ODBC database access programs Red-R An open source visual programming GUI interface for R DBI a database interface (DBI) between R and relational DBMS snowFT extension of snow supporting fault tolerant and reproducible applica- R AnalyticFlow a software which enables data analysis by drawing analysis RMySQL interface to the MySQL database tions, and easy-to-use parallel programming flowcharts RJDBC access to databases through the JDBC interface Rmpi interface (Wrapper) to MPI (Message-Passing Interface) latticist a graphical user interface for exploratory visualisation RSQLite SQLite interface for R rpvm R interface to PVM (Parallel Virtual Machine) nws provide coordination and parallel execution facilities Other R Reference Cards ROracle Oracle database interface (DBI) driver foreach foreach looping construct for R R Reference Card, by Tom Short RpgSQL DBI/RJDBC interface to PostgreSQL database doMC foreach parallel adaptor for the multicore package http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/ RODM interface to Oracle Data Mining doSNOW foreach parallel adaptor for the snow package R-refcard.pdf or xlsReadWrite read and write Excel files doMPI foreach parallel adaptor for the Rmpi package http://cran.r-project.org/doc/contrib/Short-refcard.pdf WriteXLS create Excel 2003 (XLS) files from data frames doParallel foreach parallel adaptor for the multicore package R Reference Card, by Jonathan Baron Big Data doRNG generic reproducible parallel backend for foreach Loops http://cran.r-project.org/doc/contrib/refcard.pdf GridR execute functions on remote hosts, clusters or grids R Functions for Regression Analysis, by Vito Ricci Functions http://cran.r-project.org/doc/contrib/Ricci-refcard-regression. as.ffdf() coerce a dataframe to an ffdf (ff ) fork R functions for handling multiple processes pdf read.table.ffdf(), read.csv.ffdf() read data from a flat file to Generating Reports R Functions for Time Series Analysis, by Vito Ricci an ffdf object (ff ) http://cran.r-project.org/doc/contrib/Ricci-refcard-ts.pdf write.table.ffdf(), write.csv.ffdf() write an ffdf object to a Sweave() mixing text and R/S code for automatic report generation (utils) flat file (ff ) knitr a general-purpose package for dynamic report generation in R RDataMining Website, Package, Twitter & Groups ffdfappend() append a dataframe or an ffdf to an existing ffdf (ffdf ) R2HTML making HTML reports R2PPT generating Microsoft PowerPoint presentations RDataMining Website: http://www.rdatamining.com big.matrix() create a standard big.matrix, which is constrained to avail- Group on LinkedIn: http://group.rdatamining.com able RAM (bigmemory) Interface to Weka Group on Google: http://group2.rdatamining.com read.big.matrix() create a big.matrix by reading from an ASCII file Package RWeka is an R interface to Weka, and enables to use the following Weka Twitter: http://twitter.com/rdatamining (bigmemory) functions in R. RDataMining Package: http://www.rdatamining.com/package write.big.matrix() write a big.matrix to a file (bigmemory) Association rules: http://package.rdatamining.com filebacked.big.matrix() create a file-backed big.matrix, which may Apriori(), Tertius() exceed available RAM by using hard drive space (bigmemory) Regression and classification: mwhich() expanded “which”-like functionality (bigmemory) LinearRegression(), Logistic(), SMO() Packages Lazy classifiers: ff memory-efficient storage of large data on disk and fast access functions IBk(), LBR() ffbase basic statistical functions for package ff Meta classifiers: filehash a simple key-value database for handling large data AdaBoostM1(), Bagging(), LogitBoost(), g.data create and maintain delayed-data packages MultiBoostAB(), Stacking(), BufferedMatrix a matrix data storage object held in temporary files CostSensitiveClassifier() biglm regression for data too large to fit in memory Rule classifiers: bigmemory manage massive matrices with shared memory and memory-mapped JRip(), M5Rules(), OneR(), PART() files Regression and classification trees: biganalytics extend the bigmemory package with various analytics J48(), LMT(), M5P(), DecisionStump() bigtabulate table-, tapply-, and split-like functionality for matrix and Clustering: big.matrix objects Cobweb(), FarthestFirst(), SimpleKMeans(), XMeans(), DBScan() Parallel Computing Filters: Functions Normalize(), Discretize() foreach(...) %dopar% looping in parallel (foreach) Word stemmers: registerDoSEQ(), registerDoSNOW(), registerDoMC() regis- IteratedLovinsStemmer(), LovinsStemmer() ter respectively the sequential, SNOW and multicore parallel backend Tokenizers: with the foreach package (foreach, doSNOW, doMC) AlphabeticTokenizer(), NGramTokenizer(), sfInit(), sfStop() initialize and stop the cluster (snowfall) WordTokenizer() sfLapply(), sfSapply(), sfApply() parallel versions of Editors/GUIs lapply(), sapply(), apply() (snowfall) Tinn-R a free GUI for R language and environment Packages RStudio a free integrated development environment (IDE) for R multicore parallel processing of R code on machines with multiple cores or rattle graphical user interface for data mining in R CPUs Rpad workbook-style, web-based interface to R snow simple parallel computing in R RPMG graphical user interface (GUI) for interactive R analysis sessions