Author: admin
-
Installation
Spark is commonly deployed alongside Hadoop, so it is best installed on a Linux-based system. The following steps show how to install Apache Spark. Step 1: Verifying Java Installation. A Java installation is mandatory for installing Spark. Run the following command to verify the Java version. $ java -version If Java is already,…
-
RDD
Resilient Distributed Datasets (RDDs) are a fundamental data structure of Spark. An RDD is an immutable distributed collection of objects. Each RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes.…
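The idea of an immutable collection split into logical partitions can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use Spark, and the helper names `split_into_partitions` and `map_partitions` are hypothetical, not part of any Spark API.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def split_into_partitions(items: Sequence[T], num_partitions: int) -> List[Tuple[T, ...]]:
    """Split items into contiguous, immutable chunks (hypothetical helper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    base, extra = divmod(len(items), num_partitions)
    partitions = []
    start = 0
    for index in range(num_partitions):
        # The first `extra` partitions each take one additional element.
        size = base + (1 if index < extra else 0)
        partitions.append(tuple(items[start:start + size]))
        start += size
    return partitions


def map_partitions(
    partitions: List[Tuple[T, ...]], func: Callable[[T], U]
) -> List[Tuple[U, ...]]:
    """Build new partitions instead of mutating the existing ones."""
    return [tuple(func(item) for item in part) for part in partitions]


numbers = list(range(1, 8))
partitions = split_into_partitions(numbers, 3)
squared = map_partitions(partitions, lambda x: x * x)
print(partitions)
print(squared)
```

In a real cluster each partition could be processed on a different node; here they are simply processed one after another in a single process.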
-
Introduction
Industries use Hadoop extensively to analyze their data sets. The reason is that the Hadoop framework is based on a simple programming model (MapReduce) and enables a computing solution that is scalable, flexible, fault-tolerant and cost-effective. Here, the main concern is to maintain speed in processing large datasets in terms of waiting time…
-
Deployment
To switch over from a development environment to a full-fledged production environment, the application needs to be deployed on a real web server. Depending on what you have, there are different options available to deploy a TurboGears web application. Apache with mod_wsgi: mod_wsgi is an Apache module developed by Graham Dumpleton. It allows WSGI programs…
-
Pluggable Applications
If your extension needs to expose models and controllers, you probably want to have a look at Pluggable Applications, which are meant to create reusable TurboGears applications that can be plugged inside other applications to extend their features. Use the following gearbox command to create a pluggable application − gearbox quickstart-pluggable plugtest These pluggable applications can…
-
Writing Extensions
TurboGears extensions are identified by the tgext.* package prefix. The Gearbox toolkit provides the tgext command to create a sample extension. For example − Other optional parameters for this command are − This will create a tgext.myextension directory, which has a simple sample extension inside. Run the setup.py inside the directory − python setup.py install The __init__.py file inside the tgext/myextension folder contains − Brief version…
-
Hooks
TurboGears offers three ways to plug behaviors into existing applications. This chapter discusses how to use hooks inside an existing application. Hooks are events registered in the application’s configuration file app_cfg.py. Any controller is then hooked to these events by event decorators. The following hooks are defined in…
-
Scaffolding
The Gearbox toolkit contains the scaffold command, which is useful for quickly creating new components of a TurboGears application. An application generated by the quickstart command of gearbox has a skeleton template in the model folder (model.py.template), the templates folder (template.html.template) and the controllers folder (controller.py.template). These ‘.template’ files are used as the basis for creating new scaffolds for…
-
Pagination
TurboGears provides a convenient decorator called paginate() to divide output into pages. This decorator is combined with the expose() decorator. The @paginate() decorator takes as its argument the name of the dictionary key that holds the query result. In addition, the number of records per page is set by the items_per_page argument. Ensure that you import the paginate function from…
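What splitting a result set by items_per_page amounts to can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the TurboGears implementation: the function `paginate_records` and the sample `students` list are hypothetical and exist only for illustration.

```python
from math import ceil
from typing import Any, Dict, List


def paginate_records(records: List[Any], page: int = 1, items_per_page: int = 10) -> Dict[str, Any]:
    """Return one page of records plus basic paging information (hypothetical helper)."""
    if items_per_page < 1:
        raise ValueError("items_per_page must be at least 1")
    # An empty result set still counts as a single (empty) page.
    page_count = max(1, ceil(len(records) / items_per_page))
    # Clamp out-of-range page numbers to the nearest valid page.
    page = min(max(page, 1), page_count)
    start = (page - 1) * items_per_page
    return {
        "items": records[start:start + items_per_page],
        "page": page,
        "page_count": page_count,
    }


students = ["student-%d" % number for number in range(1, 24)]
last_page = paginate_records(students, page=3, items_per_page=10)
print(last_page)
```

With 23 records and 10 items per page, the third page holds the remaining three records.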
-
CRUD Operations
The following session methods perform CRUD operations − You can filter the retrieved record set by using the filter attribute. For instance, in order to retrieve records with city = ’Hyderabad’ in the students table, use the following statement − We shall now see how to interact with the models through controller URLs. First…
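The kind of filter described above (records with city = 'Hyderabad' in the students table) can be illustrated with a self-contained sketch. Note the substitution: it uses Python's standard-library sqlite3 module and plain SQL rather than the session methods the chapter refers to. The column names `id` and `name`, the sample rows, and the helper `students_in_city` are assumptions made for illustration.

```python
import sqlite3
from typing import List


def students_in_city(connection: sqlite3.Connection, city: str) -> List[str]:
    """Return the names of students in the given city (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT name FROM students WHERE city = ? ORDER BY name",
        (city,),  # a bound parameter avoids building SQL by string formatting
    )
    return [row[0] for row in cursor]


# In-memory database with an assumed schema and made-up sample rows.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
connection.executemany(
    "INSERT INTO students (name, city) VALUES (?, ?)",
    [("Asha", "Hyderabad"), ("Ravi", "Pune"), ("Kiran", "Hyderabad")],
)

hyderabad_students = students_in_city(connection, "Hyderabad")
print(hyderabad_students)
```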