Functional Programming (Best Tutorial 2019)

Functional Programming


Functional Programming has its roots in the lambda calculus, originally developed in the 1930s to explore computability. Many Functional Programming languages can thus be considered elaborations on this lambda calculus.

 

This tutorial introduces Functional Programming languages such as Lisp, Clojure, and Scala with examples, and also explains the advantages and disadvantages of functional programming.

 

Programming paradigms can be of two fundamental types, namely, imperative programming and functional programming. Imperative programming is what is currently perceived as traditional programming; it is the style of programming used in languages such as C, C++, Java, and C#.

 

In these languages a programmer tells the computer what to do, for example, x = y + z, and so on. Imperative programming is thus oriented around control statements, looping constructs, and assignments. In contrast, functional programming aims to describe the solution.

 

Functional Programming is defined as a programming paradigm, a style of building the structure and elements of computer programs, that treats computation as the evaluation of mathematical functions and avoids state and mutable data.

 

Functions compute a new output based only on their input data; they do not rely on any side effects and do not depend on the current state of the program.

 

Characteristics of functional programming:


1. Functional Programming aims to avoid side effects: functions disallow any hidden side effects; the only observable output allowed is the return value, and the only dependencies allowed are the arguments, which are fully determined before any output is generated.

 

The lack of hidden side effects makes it easier to understand what a program is doing, which in turn makes comprehension, development, and maintenance easier.
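
As a small sketch of the difference (written here in Scala, the language used for the later examples; the function and variable names are illustrative):

var total = 0
def addToTotal(x: Int): Int = { total += x; total }   // impure: hides a side effect on external state

def add(a: Int, b: Int): Int = a + b                   // pure: the result depends only on the arguments

The pure version always returns the same result for the same arguments, no matter when or how often it is called.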

 

2. Functional Programming avoids the concept of state. If an operation depends upon the state of the program or of some element of the program, then its behavior may differ depending upon that state; this can make it harder to comprehend, implement, test, and debug.

 

As all of this impacts the stability and reliability of a system, state-based operations may result in less reliable software being developed. Because functions do not rely on any given state, only on the data they are given, their results are easier to understand, implement, test, and debug.

 

3. Functional Programming promotes declarative programming (and is, in fact, a subtype of declarative programming), which means that programming is oriented around expressions that describe the solution rather than around the imperative approach of most procedural programming languages.

Imperative languages, by contrast, emphasize how the solution is derived.

 

For example, an imperative approach to looping through some container and printing out each element in turn (sketched here in Scala, which the later examples also use) would look like this:

val sizeOfCarton = carton.length
for (i <- 0 until sizeOfCarton) {
  val element = carton(i)
  print(element)
}

Whereas a functional programming approach would look like this:

carton.foreach(print)

 

4. Functional Programming promotes immutable data. Immutability means that once created, data cannot be changed. In Scala, Strings are immutable.

 

Once you create a string, you cannot modify it. Any function applied to a string that would conceptually alter its contents instead results in a new String being generated.

 

Scala takes this further with a presumption of immutability, meaning that by default all data-holding types are immutable. This ensures that functions cannot have hidden side effects and thus simplifies programming in general.
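
A minimal Scala sketch of this behaviour (the value names are arbitrary):

val original = "scala"
val shouted = original.toUpperCase   // produces a new String, "SCALA"
println(original)                    // still prints "scala"; the original value is unchanged
// original = "other"                // would not compile: original is an immutable val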

 

5. Functional Programming promotes recursion as a natural control structure. Functional languages tend to emphasize recursion as a way of processing structures that would use some form of looping construct in an imperative language.

 

While recursion is very expressive and is a great way for a programmer to write a solution to a problem, it is not as efficient at runtime as looping.

 

However, any expression that can be written as a recursive routine can also be written using looping constructs.

 

Functional programming languages often incorporate tail-call optimization to convert recursive routines into iterative ones; this applies when the last thing a routine does before it returns is to call another routine (or itself).

 

Rather than actually invoking the routine and having to set up the context for that routine, it should be possible to reuse the current context and to treat it in an iterative manner as a loop around that routine.

 

This means that both the programmer benefits of an expressive recursive construct and the runtime benefits of an iterative solution can be achieved from the same source code. This option is typically not available in imperative languages.
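
A minimal Scala sketch of a tail-recursive routine; the @tailrec annotation asks the compiler to verify that the recursive call can be compiled into a loop (the function itself is just an illustration):

import scala.annotation.tailrec

@tailrec
def sum(values: List[Int], acc: Int = 0): Int = values match {
  case Nil          => acc                   // finished: return the accumulated total
  case head :: tail => sum(tail, acc + head) // tail call: no work remains after the call
}

sum(List(1, 2, 3, 4))   // 10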

 

Advantages of functional programming are as follows:

a. Good for prototyping solutions: solutions can be created very quickly for algorithmic or behavioral problems in a functional language, allowing ideas and concepts to be explored in a rapid application development style.

 

b. Modular functionality: Functional Programming is modular in terms of functionality (where object-oriented languages are modular in the dimension of components). Functional languages are thus well suited to situations where it is natural to reuse or componentize the behavior of a system.

 

c. The avoidance of state-based behavior: As functions only rely on their inputs and outputs (and avoid accessing any other stored state) they exhibit a cleaner and simpler style of programming.

 

This avoidance of state-based behavior makes many difficult or challenging areas of programming simpler (such as those used in concurrency applications).

 

d. Additional control structures: a strong emphasis on additional control structures such as pattern matching, managing variable scope, and tail-recursion optimization.

 

e. Concurrency and immutable data: as functional programming systems advocate immutable data structures, it is simpler to construct concurrent systems. Because the data being exchanged and accessed is immutable, multiple executing threads or processes cannot adversely affect one another.

 

f. Partial Evaluation: Since functions do not have side effects, it also becomes practical to bind one or more parameters to a function at compile time and to reuse these functions with bound values as new functions that take fewer parameters.
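
A minimal Scala sketch of reusing a function with one argument already bound (partial application; the function and values are invented for illustration):

def applyRate(rate: Double, amount: Double): Double = amount * rate

// Bind the rate once; the result is a new one-argument function.
val vat: Double => Double = applyRate(0.20, _)

vat(100.0)   // 20.0
vat(250.0)   // 50.0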

 

Disadvantages of functional programming are as follows:

 

a. Input-output is harder in a purely functional language. Input-output flows naturally align with stream style processing, which does not neatly fit into the “data in, results out” nature of functional systems.

 

b. Interactive applications are harder to develop. Interactive applications are constructed via “request-response” cycles initiated by a user action. Again, these do not naturally sit within the purely functional paradigm.

 

c. Not data oriented. A purely functional language does not really align with the primarily data-oriented nature of many of today’s systems.

 

Many commercial systems are oriented around the need to retrieve data from a database, manipulate it in some way and store that data back into a database: such data can be naturally (and better) represented via objects in an Object Oriented language.

 

d. Continuously running programs such as services or controllers may be more difficult to develop, as they are naturally based upon the idea of a continuous loop that does not naturally sit within the purely functional paradigm.

 

e. Functional programming languages have tended to be less efficient on current hardware platforms.

 

This is partly because current hardware platforms are not designed with functional programming in mind and also because many of the systems previously available were focused on the academic community for whom performance was not the primary focus per se.

 

However, this has changed to a large extent with Scala and the functional language Haskell.

 

Clojure


Clojure was forged out of a unique blend of the best features of a number of different programming languages—including various Lisp implementations, Ruby, Python, Java, Haskell, and others.

 

Clojure provides a set of capabilities suited to address many of the most frustrating problems programmers struggle with today and those we can see barreling toward us over the horizon.

 

And, far from requiring a sea-change to a new or unfamiliar architecture and runtime (typical of many otherwise promising languages over the years), Clojure is hosted on the Java Virtual Machine, a fact that puts to bed many of the most pressing pragmatic and legacy concerns raised when a new language is considered.

 

Characteristics of Clojure are as follows:

 

1. Clojure is hosted on the JVM: Clojure code can use any Java library, Clojure libraries can, in turn, be used from Java, and Clojure applications can be packaged just like any other Java application and deployed anywhere that other Java applications can be deployed: to web application servers; to desktops with Swing, SWT, or command-line interfaces; and so on.

 

This also means that Clojure’s runtime is Java’s runtime, one of the most efficient and operationally reliable in the world.

 

2. Clojure is a Lisp: Unlike Java, Python, Ruby, C++, and other members of the Algol family of programming languages, Clojure is part of the Lisp family.

 

However, forget everything you know about Lisps: Clojure retains the best of Lisp heritage, but is unburdened by the shortcomings and sometimes anachronistic aspects of many other Lisp implementations.

 

Also, being a Lisp, Clojure has macros, an approach to metaprogramming and syntactic extension that has been the benchmark against which other such systems have been measured for decades.

 

3. Clojure is a functional programming language: Clojure encourages the use of first-class and higher-order functions with values and comes with its own set of efficient immutable data structures.

 

The focus on a strong flavor of functional programming encourages the elimination of common bugs and faults due to the use of unconstrained mutable state and enables Clojure’s solutions for concurrency and parallelization.

 

4. Clojure offers innovative solutions to the challenges inherent in concurrency and parallelization: The realities of multicore, multi-CPU, and distributed computing demand that we use languages and libraries that have been designed with these contexts in mind.

 

Clojure’s reference types enforce a clean separation of state and identity, providing defined concurrency semantics that are to manual locking and threading strategies what garbage collection is to manual memory management.

 

5. Clojure is a dynamic programming language: Clojure is dynamically and strongly typed (and therefore similar to Python and Ruby), yet function calls are compiled down to (fast!) Java method invocations.

 

Clojure is also dynamic in the sense that it deeply supports updating and loading new code at runtime, either locally or remotely. This is particularly useful for enabling interactive development and debugging or even instrumenting and patching remote applications without downtime.

 

Python

Python is a versatile programming language that has been widely adopted across the data science sector over the last decade. Although popular programming languages like Java and C++ are better for developing standalone desktop applications, Python is terrific for processing, analyzing, and visualizing data.

 

The two most relevant Python characteristics are its ability to integrate with other languages and its mature package system, well embodied by PyPI (the Python Package Index; https://pypi.python.org/pypi), a common repository for the majority of Python packages.

 

These packages are strongly analytical and together offer a complete data science toolbox made up of highly optimized functions, with careful memory handling, ready to perform scripting operations with optimal performance.

 

NumPy

NumPy, which is Travis Oliphant’s creation, is the true analytical workhorse of the Python language.

 

It provides the user with multidimensional arrays, along with a large set of functions to perform a multiplicity of mathematical operations on these arrays. Arrays are blocks of data arranged along multiple dimensions, which implement mathematical vectors and matrices.

 

Arrays are useful not just for storing data, but also for fast matrix operations (vectorization), which are indispensable when you wish to solve ad hoc data science problems. As a convention, when importing NumPy, it is aliased as np: import numpy as np

 

SciPy

SciPy completes NumPy’s functionalities, offering a larger variety of scientific algorithms for linear algebra, sparse matrices, signal and image processing, optimization, fast Fourier transformation, and much more.

 

Pandas

The pandas package deals with everything that NumPy and SciPy cannot do. Thanks to its specific object data structures, DataFrames and Series, pandas allows you to handle complex tables of data of different types (which is something that NumPy’s arrays cannot do) and time series.

 

pandas enables you to easily and smoothly load data from a variety of sources. You can then slice, dice, handle missing elements, add, rename, aggregate, reshape, and finally visualize this data at will.

pandas is imported as pd: import pandas as pd

 

Scikit-Learn

Scikit-learn is the core of data science operations on Python. It offers all that you may need in terms of data preprocessing, supervised and unsupervised learning, model selection, validation, and error metrics.

 

IPython

A scientific approach requires the fast experimentation of different hypotheses in a reproducible fashion.

 

IPython was created by Fernando Perez in order to address the need for an interactive Python command shell (which is based on a shell, web browser, and the application interface), with graphical integration, customizable commands, rich history (in the JSON format), and computational parallelism for enhanced performance.

 

Matplotlib

Matplotlib is the library that contains all the building blocks that are required to create quality plots from arrays and to visualize them interactively.

import matplotlib.pyplot as plt

 

Stats Models

Previously part of SciKits, statsmodels was thought of as a complement to SciPy’s statistical functions.

 

It features generalized linear models, discrete choice models, time-series analysis, and a series of descriptive statistics as well as parametric and nonparametric tests.

 

Beautiful Soup

Beautiful Soup, a creation of Leonard Richardson, is a great tool for scraping data out of HTML and XML files retrieved from the Internet. It works incredibly well, even in the case of tag soups (hence the name), which are collections of malformed, contradictory, and incorrect tags.

 

NetworkX

NetworkX is a package specialized in the creation, manipulation, analysis, and graphical representation of real-life network data (it can easily operate with graphs made up of a million nodes and edges).

 

Besides specialized data structures for graphs and fine visualization methods (2D and 3D), it provides the user with many standard graph measures and algorithms, such as the shortest path, centrality, components, communities, clustering, and PageRank.

 

NLTK

The Natural Language Toolkit (NLTK) provides access to corpora and lexical resources and to a complete suite of functions for statistical natural language processing (NLP), ranging from tokenizers to part-of-speech taggers and from tree models to named-entity recognition.

 

Initially, the package was created by Steven Bird and Edward Loper as an NLP teaching infrastructure for CIS-530 at the University of Pennsylvania. It is a fantastic tool that you can use to prototype and build NLP systems.

 

Gensim

Gensim, programmed by Radim, is an open source package that is suitable for the analysis of large textual collections with the help of parallel distributable online algorithms.

 

Among advanced functionalities, it implements Latent Semantic Analysis (LSA), topic modeling by Latent Dirichlet Allocation (LDA), and Google’s word2vec, a powerful algorithm that transforms text into vector features that can be used in supervised and unsupervised machine learning.

 

PyPy


PyPy is not a package; it is an alternative implementation of Python 2.7.8 that supports most of the commonly used Python standard packages (unfortunately, NumPy is currently not fully supported).

 

As an advantage, it offers enhanced speed and memory handling. Thus, it is very useful for heavy-duty operations on large chunks of data and it should be part of your big data handling strategies.

 

Scala

Scala is a relatively new programming language developed by Martin Odersky and his team at the EPFL (École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland) and now supported by Typesafe.

 

The name Scala is derived from Sca(lable) La(nguage); it is a multiparadigm language, combining object-oriented approaches with Functional Programming.


Like any other object-oriented language (such as Java, C#, or C++), Scala can exploit inheritance, polymorphism, abstraction, and encapsulation techniques.

 

However, you can also develop solutions using purely functional programming principles, in a similar manner to languages such as Haskell or Clojure; in such an approach, programs are written purely in terms of functions that take inputs and generate outputs without any side effects.

 

Thus, it is possible to combine the best of both worlds when creating a software system: you can exploit object-oriented principles to structure your solution but integrate functional aspects when appropriate.

 

One of the design goals of the Scala development team was to create a scalable language suitable for the construction of component-based software within highly concurrent environments.

 

This means that it has several features integrated into it that support the development of large software systems.

 

For example, the Actor model of concurrency greatly simplifies the development of concurrent applications. In addition, the syntax reduces the amount of code that must be written by a developer (at least compared with Java).

 

Scala can be compiled to Java bytecode. This means that a Scala system can run on any environment that supports the Java Virtual Machine (JVM). There are already several languages that compile to Java bytecode, including Ada, JavaScript, Python, Ruby, Tcl, and Prolog.

 

This has the additional advantage that Scala can be integrated with any existing Java code base that a project may have. It also allows Scala to exploit the huge library of Java projects available both for free and for commercial use.

 

Characteristics of Scala are as follows:


Provides Object Oriented concepts including classes, objects, inheritance, and abstraction.

 

Extends these (at least with reference to Java) to include Traits that represent data and behavior that can be mixed into classes and objects (a small sketch follows this list). 

 

Includes functional concepts, such as functions as first-class entities in the language, as well as concepts such as partially applied functions and currying that allow new functions to be constructed from existing functions.

 

Has interoperability (mostly) with Java.

Uses statically typed variables and constants with type inference used whenever possible to avoid unnecessary repetition.
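
A small sketch of a trait being mixed into a class (the names are invented for illustration):

trait Logging {
  def log(msg: String): Unit = println(s"[log] $msg")   // behavior that can be mixed in
}

class OrderService extends Logging {
  def place(id: Int): Unit = log(s"placing order $id")
}

new OrderService().place(42)   // prints: [log] placing order 42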

 

Scala Advantages

Scala’s strong type system, preference for immutability, functional capabilities, and parallelism abstractions make it easy to write reliable programs and minimize the risk of unexpected behavior.

 

Interoperability with Java

Scala runs on the Java virtual machine; the Scala compiler compiles programs to Java bytecode.

 

Thus, Scala developers have access to Java libraries natively. Given the phenomenal number of applications written in Java, both open source and as part of the legacy code in organizations, the interoperability of Scala and Java helps explain the rapid popularity of Scala.

 


 

Parallelism

Parallel programming is difficult because we, as programmers, tend to think sequentially. Reasoning about the order in which different events can happen in a concurrent program is very challenging.

 

Scala provides several abstractions that greatly facilitate the writing of parallel code. These abstractions work by imposing constraints on the way parallelism is achieved.

 

For instance, parallel collections force the user to phrase the computation as a sequence of operations (such as map, reduce, and filter) on collections.
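
A minimal sketch of a parallel collection in Scala 2.12, where .par is part of the standard library (in Scala 2.13 and later it lives in the separate scala-parallel-collections module); the computation itself is arbitrary:

val evenSquaresSum = (1 to 100000).par   // switch to a parallel view of the range
  .map(x => x.toLong * x)                // the mapping is spread across a pool of threads
  .filter(_ % 2 == 0)                    // filtering also runs in parallel
  .sum                                   // partial results are combined at the end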

 

Actor systems require the developer to think in terms of actors that encapsulate the application state and communicate by passing messages.

 

Static Typing and Type Inference

Scala’s static typing system is very versatile. A lot of information as to the program’s behavior can be encoded in types, allowing the compiler to guarantee a certain level of correctness. This is particularly useful for code paths that are rarely used.

 

A dynamic language cannot catch errors until a particular branch of execution runs, so a bug can persist for a long time until the program runs into it. In a statically typed language, any bug that can be caught by the compiler will be caught at compile time, before the program has even started running.
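
A trivial Scala sketch of the kind of error the compiler catches, and of type inference at work:

val count: Int = 10
// count = "ten"           // rejected at compile time: type mismatch, and count is an immutable val
val doubled = count * 2    // no annotation needed: doubled is inferred to be an Int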

 

Immutability

Having immutable objects removes a common source of bugs. Knowing that some objects cannot be changed once instantiated reduces the number of places bugs can creep in. Instead of considering the lifetime of the object, we can narrow in on the constructor.

 

Scala encourages the use of immutable objects. In Scala, it is very easy to define an attribute as immutable:

val amountExpnd = 200

The default collections are immutable:

val rollIds = List("123", "456")   // List is immutable
rollIds(1) = "589"                 // compile-time error: a List cannot be updated in place

 

Scala and Functional Programs

Scala encourages functional code. A lot of Scala code consists of using higher-order functions to transform collections. The developer does not have to deal with the details of iterating over the collection.

 

Consider the problem of locating, in a list, the position of every occurrence of a given element. In Scala, we first build a new collection, collection.zipWithIndex, whose elements are pairs of the collection’s elements and their indexes, that is, (collection(0), 0), (collection(1), 1), and so on. We then tell Scala that we want to iterate over this collection, binding the currentElem variable to the current element and index to the index. We apply a filter on the iteration, selecting only those elements for which currentElem == elem. We then tell Scala to just return the index variable.

A sample occurrencesOf function would be

def occurrencesOf[A](elem: A, collection: List[A]): List[Int] =
  for {
    (currentElem, index) <- collection.zipWithIndex
    if currentElem == elem
  } yield index

 

We did not need to deal with the details of the iteration process in Scala. The syntax is very declarative: we tell the compiler that we want the index of every element equal to an element in the collection and let the compiler worry about how to iterate over the collection.

 

Null Pointer Uncertainty

Scala, like other functional languages, introduces the Option[T] type to represent an attribute that might be absent. We might then write the following:

class User {
  ...
  val email: Option[Email]
  ...
}

Thus, Scala goes further in achieving a higher degree of provable correctness. By never using null, we know that we will never run into null pointer exceptions.
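
A small sketch of how client code might consume such an attribute; the user value and the Email type are assumed for illustration:

val message = user.email match {
  case Some(address) => s"Sending mail to $address"    // the compiler forces both cases to be handled
  case None          => "No e-mail address on record"
}

// Or, more compactly, supply a default:
val display = user.email.map(_.toString).getOrElse("none")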

 

Achieving the same level of correctness in languages without Option[T] requires writing unit tests on the client code to verify that it behaves correctly when the e-mail attribute is null.

 

Scala Benefits

Increased Productivity

Having a compiler that performs a lot of type checking and works as a personal assistant is, in our opinion, a significant advantage over languages that check types dynamically at runtime, and the fact that Java is a statically typed language is probably one of the main reasons that made it so popular in the first place.

 

The Scala compiler belongs to this category as well and goes even further by finding out many of the types automatically, often relieving the programmer from specifying these types explicitly in the code.

 

Moreover, the compiler in your Integrated Development Environment (IDE) gives instant feedback, and therefore, increases your productivity.

 

Natural Evolution from Java

Scala integrates seamlessly with Java, which is a very attractive feature, to avoid reinventing the wheel. You can start running Scala today in a production environment.

 

Large corporations such as Twitter, LinkedIn, or Foursquare have done that on large-scale deployments for many years now, followed recently by other big players such as Intel and Amazon.

 

Scala compiles to Java bytecode, which means that performance will be comparable. Most of the code that you are running while executing the Scala program is probably Java code, the major difference being what programmers see and the advanced type checking while compiling the code.

 

Better Fit for Asynchronous and Concurrent Code

To achieve better performance and handle more load, modern Java frameworks and libraries for web development are now tackling difficult problems that are tied to multicore architectures and the integration with unpredictable external systems.

 

Scala’s encouragement of immutable data structures and functional programming constructs, as well as its support for parallel collections, gives developers a better chance of writing concurrent code that behaves correctly.

 

Moreover, Scala’s superior type system and macro support enable DSLs for trivially safe asynchronous constructs—for example, composable futures and asynchronous language extensions.
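
A minimal sketch of composing futures with the standard scala.concurrent library (the global execution context and the toy computations are used only for illustration):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val priceEur = Future { 100.0 }   // each future may run on a separate thread
val rate     = Future { 1.1 }

val priceUsd: Future[Double] =    // compose the results without blocking
  for {
    p <- priceEur
    r <- rate
  } yield p * r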

 

R

R is also known as GNU S, as it is basically an open source derivative and descendant of the S language. In various forms and avatars, R has been around for almost two decades now, with an ever-growing library of specialized data visualization, data analysis, and data manipulation packages.

 

With around two million users, R has one of the largest libraries of statistical algorithms and packages. While R was initially a statistical computing language, by now it has evolved into a complete analytical environment.

 

Analytical Features of R


1. The latest and broadest range of statistical algorithms is available in R. This is due to R’s package structure, in which it is easier for developers to create new packages than in any other comparable analytics platform.

 

2. It is easy to migrate from other analytical platforms to the R platform. It is relatively easy for a non-R platform user to migrate to R, and there is no danger of vendor lock-in due to the GPL nature of the source code and the open community; the GPL can be seen at http://www.gnu.org/copyleft/gpl.html.

 

3. R offers flexible programming for your data environment. This includes packages that ensure compatibility with Java, Python, and C++.

 

4. R offers the best data visualization tools in analytical software (apart from Tableau Software’s latest version). The extensive data visualization available in R comprises a wide variety of customizable graphics as well as animation.

 

The principal reason why third-party software initially started creating interfaces to R is that the graphical library of packages in R was more advanced and was acquiring more features by the day.

 

5. A wide range of training material in the form of books is available for the R analytical platform.

 

6. R’s source code is designed to ensure complete custom solutions and embedding for a particular application. Open source code has the advantage of being extensively peer-reviewed in journals and the scientific literature; this means bugs will be found, information about them shared, and solutions delivered transparently.

 

Business Dashboard and Reporting


1. R offers data visualization through ggplot, and GUIs such as Deducer, Grapher, and Red-R can help even business analysts who know little or no R to create a metrics dashboard.

 

2. For online dashboards, R has packages like RWeb, RServe, and R Apache that, in combination with data visualization packages, offer powerful dashboard capabilities. Well-known examples of these will be shown later.

 

3. R can also be combined with Microsoft Excel using the R Excel package to enable R capabilities for importing within Excel. Thus an Excel user with no knowledge of R can use the GUI within the R Excel plug-in to take advantage of the powerful graphical and statistical capabilities.

 

Data Mining


1. R has a vast array of packages covering standard regression, decision trees, association rules, cluster analysis, machine learning, neural networks, and exotic specialized algorithms like those based on chaos models.

 

2. R provides flexibility in tweaking a standard algorithm by allowing one to see the source code.

 

3. The Rattle GUI remains the standard GUI for data miners using R. This GUI offers easy access to a wide variety of data mining techniques. It was created and developed in Australia by Prof. Graham Williams. Rattle offers a very powerful and convenient free and open source alternative to data mining software.

 

Business Analytics


1. It has the open source code for customization as per GPL and adequate intellectual protection for developers wanting to create commercial packages.

 

2. It also has a flexible option for enterprise users from commercial vendors like Revolution Analytics (who support 64-bit Windows and now Linux) as well as big data processing through its RevoScaleR package.

 

3. It has interfaces to almost all other analytical software including SAS, SPSS, JMP, Oracle Data Mining, and RapidMiner. A huge library of packages is available for regression, time series, finance, and modeling.

 

4. High-quality data visualization packages are available for use with R.

 

5. R is one of the few analytical platforms that work on Mac OS.

Additional analytical features of R:

 

1. A wide range of solutions from the R package library for statistical, analytical, data mining, dashboard, data visualization, and online applications makes it the broadest analytical platform in the field.

 

2. Largest and fastest growing open source statistical library: The current number of statistical packages and the rate of growth at which new packages continue to be upgraded ensure the continuity of R as a long-term solution to analytical problems.

 

3. Extensive data visualization capabilities: these include much better animation and graphing than other software.

 

4. Interoperability of data: data from various file formats as well as various databases can be used directly in R, connected via a package, or reduced to an intermediate format for importing into R.

 

5. Software compatibility: Official commercial interfaces to R have been developed by numerous commercial vendors including software makers who had previously thought of R as a challenger in the analytical space.

 

Oracle, ODBC, Microsoft Excel, PostgreSQL, MySQL, SPSS, Oracle Data Miner, SAS/IML, JMP, Pentaho Kettle, and Jaspersoft BI are just a few examples of commercial software that are compatible with R usage.

 

In terms of the basic SAS language, a WPS software reseller offers a separate add-on called the Bridge to R.

 

6. Multiple platforms and interfaces to input commands: R has multiple interfaces ranging from the command line to numerous specialized graphical user interfaces (GUIs) for working on desktops. For clusters, cloud computing, and remote server environments, R now has extensive packages including SNOW, RApache, Rmpi, RWeb, and Rserve.

 

SAS


SAS was originally the acronym for Statistical Analysis System, which is an integrated software system that utilizes fourth-generation programming language to perform tasks like data management, report writing, statistical analysis, data warehousing, and application development.

 

The core component of the SAS system is Base SAS software, which consists of different modules such as DATA steps, SAS Base procedures, SAS macro facility, and Output Delivery System (ODS).

 

The SAS system is divided into two main areas: procedures to perform an analysis and the fourth-generation language called DATA Step that allows users to manipulate data.

 

SAS consists of the following:

1. A data handling language (DATA step)

2. A library of prewritten procedures (PROC step)

 

SAS/STAT software, a component of the SAS System, provides comprehensive statistical tools for a wide range of statistical analyses, including analysis of variance, regression, categorical data analysis, multivariate analysis, survival analysis, psychometric analysis, cluster analysis, and nonparametric analysis.

 

The features provided by SAS/STAT software are in addition to the features provided by Base SAS software; many data management and reporting capabilities required commonly are already part of the Base SAS software.

 

SAS DATA Step


The DATA step is your primary tool for reading and processing data in the SAS System. The DATA step provides a powerful general purpose programming language that enables you to perform all kinds of data processing tasks.

 

Base SAS Procedures

Base SAS software includes many useful SAS procedures. Base SAS procedures are documented in the SAS Procedures Guide.

 

The following is a list of base SAS procedures:

  • CORR: computes correlations
  • RANK: computes rankings or order statistics
  • STANDARD: standardizes variables to a fixed mean and variance
  • MEANS: computes descriptive statistics and summarizes or collapses data over cross sections
  • TABULATE: prints descriptive statistics in tabular format
  • UNIVARIATE: computes descriptive statistics

Other SAS software products are as follows:

 

1. SAS/ETS Software: SAS/ETS software provides SAS procedures for econometrics and time-series analysis.

 

It includes capabilities for forecasting, systems modeling and simulation, seasonal adjustment, and financial analysis and reporting. In addition, SAS/ETS software includes an interactive time-series forecasting system.

 

2. SAS/GRAPH Software: SAS/GRAPH software includes procedures that create two- and three-dimensional high-resolution color graphics plots and charts. You can generate output that graphs the relationship of data values to one another, enhance existing graphs, or simply create graphics output that is not tied to data.

 

3. SAS/IML Software: SAS/IML software gives you access to a powerful and flexible programming language (Interactive Matrix Language) in a dynamic, interactive environment.

 

The fundamental object of the language is a data matrix. You can use SAS/IML software interactively (at the statement level) to see results immediately, or you can store statements in a module and execute them later.

 

The programming is dynamic because necessary activities such as memory allocation and dimensioning of matrices are done automatically. SAS/IML software is of interest to users of SAS/STAT software because it enables you to program your methods in the SAS System.

 

4. SAS/INSIGHT Software: SAS/INSIGHT software is a highly interactive tool for data analysis. You can explore data through a variety of interactive graphs including bar charts, scatter plots, box plots, and three-dimensional rotating plots.

 

You can examine distributions and perform parametric and nonparametric regression, analyze general linear models and generalized linear models, examine correlation matrixes, and perform principal component analyses.

 

Any changes you make to your data show immediately in all graphs and analyses. You can also configure SAS/INSIGHT software to produce graphs and analyses tailored to the way you work.

 

SAS/INSIGHT software may be of interest to users of SAS/STAT software for interactive graphical viewing of data, editing data, exploratory data analysis, and checking distributional assumptions.

 

5. SAS/OR Software: SAS/OR software provides SAS procedures for operations research and project planning and includes a point-and-click interface to project management. Its capabilities include the following:

 

  • a. Solving transportation problems
  • b. Linear, integer, and mixed-integer programming
  • c. Nonlinear programming
  • d. Scheduling projects
  • e. Plotting Gantt charts
  • f. Drawing network diagrams
  • g. Solving optimal assignment problems
  • h. Network flow programming

 

SAS/OR software may be of interest to users of SAS/STAT software for its mathematical programming features. In particular, the NLP procedure in SAS/OR software solves nonlinear programming problems, and it can be used for constrained and unconstrained maximization of user-defined likelihood functions.

 

6. SAS/QC Software: SAS/QC software provides a variety of procedures for statistical quality control and quality improvement. SAS/QC software includes procedures for Shewhart control charts, as well as the following:

  • a. Cumulative sum control charts
  • b. Moving average control charts
  • c. Process capability analysis
  • d. Ishikawa diagrams
  • e. Pareto charts
  • f. Experimental design

 

SAS/QC software also includes the ADX interface for experimental design.

 

Summary

This blog introduced the Spark analytics tool, which is suitable for iterative and interactive analytics.

After an introduction to functional programming and its benefits, the blog describes important functional languages and tool environments including Clojure, Python, Scala, and R. The last part of the blog describes the SAS analytics solution.

 

There are three takeaways from this blog. First, Spark is expected to eclipse MapReduce for most functionality. Spark is already the best choice for machine-learning applications because of its ability to perform iterative operations on data cached in memory. Spark additionally offers SQL, graph processing, and streaming frameworks.

 

Second, Scala is a scalable language suitable for the construction of component-based software within highly concurrent environments.

 

Scala is the only language that is statically typed, runs on the JVM and is totally Java compatible, is both object-oriented and functional, and is not verbose, thereby leading to better productivity and, therefore, less maintenance.

 

Third, Python is a versatile programming language that has been widely adopted across the data science sector over the last decade. Python is also terrific for processing, analyzing, and visualizing data.
