
Monday, February 1, 2016

Logical Operators

There are three logical operators in the C language: &&, || and !.
  • && ('Logical AND' operator): returns true (1) if both of its operands evaluate to true (non-zero), else returns false (0). Example: a && b returns true if both a and b are non-zero.
  • || ('Logical OR' operator): returns true if either of its operands evaluates to true (non-zero), else returns false. Example: a || b returns true if either a or b is non-zero.
  • ! ('Logical NOT' operator): a unary operator (it operates on only one operand). It returns true if the operand is false, and false if the operand is true. Example: !a returns true if a is zero, else returns false.

Examples of Logical Operators

#include <stdio.h>

int main(void) {
    int a = 5, b = 6;
    printf("(a == 5) && (b == 7) : %d\n", (a == 5) && (b == 7));
    printf("(a == 5) || (b == 7) : %d\n", (a == 5) || (b == 7));
    printf("!(a == 5) : %d", !(a == 5));
    return 0;
}

Output:
(a == 5) && (b == 7) : 0
(a == 5) || (b == 7) : 1
!(a == 5) : 0
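
One property of these operators worth noting: '&&' and '||' evaluate their right operand only when the result is still undecided (short-circuit evaluation). A minimal sketch of this, using a made-up helper bump() whose side effect lets us observe whether it was called:

#include <stdio.h>

static int calls = 0;

/* Illustrative helper with a visible side effect. */
static int bump(void) {
    calls++;
    return 1;
}

int main(void) {
    int a = 0;
    int r1 = a && bump();  /* a is 0, so bump() is never called */
    int r2 = 1 || bump();  /* the left side already decides the result */
    printf("r1=%d r2=%d calls=%d\n", r1, r2, calls);  /* prints: r1=0 r2=1 calls=0 */
    return 0;
}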

Monday, January 18, 2016

Types of Type Casting in C: Upcasting and Downcasting

There are two types of casting available in the C language, known as upcasting and downcasting. Upcasting means converting a lower data type (like int) to a higher one (float, long int); the reverse is called downcasting.
Upcasting results in no information loss, but downcasting can lose information, because the lower data type has fewer bits and can hold less information/data. One data type is considered higher than another if the maximum value it can store is greater. For example, float is lower compared to double because double can store more precision.

The second program in this chapter is an example of upcasting (an int value is converted to float), and the third program is an example of downcasting (a float is converted to an int).
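
Those programs are not reproduced here; as a minimal sketch of both directions (the variable names are my own illustration):

#include <stdio.h>

int main(void) {
    int whole = 7;
    float up = whole;       /* upcast: int -> float, no information loss */
    float ratio = 7.9f;
    int down = (int)ratio;  /* downcast: float -> int, the fraction .9 is lost */
    printf("upcast:   %d -> %f\n", whole, up);
    printf("downcast: %f -> %d\n", ratio, down);
    return 0;
}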
C Type Casting with examples for Float to Int/Int to Char

Tuesday, January 12, 2016

C Qualifiers - 'const' and 'volatile' Type Qualifiers

C has some basic data types like 'int', 'float' and 'char'. Each of them can hold data of a specific type and has a set of specific properties defined for it. But what if you need to modify those properties? For example, you may want to make an 'int' variable constant, meaning its value shouldn't be changed anywhere in the program except at the variable's definition. C type qualifiers can be used for such purposes.

'Const' Qualifier in C

The two type qualifiers available in C are 'const' and 'volatile'.
The 'const' qualifier imposes a restriction on a variable such that its value can't be changed or modified. The main use of 'const' is to define constants in your program, so that their values can't be changed. Please take a look at the example given below to understand the use case of the 'const' qualifier.
#include <stdio.h>

// PI is defined as 'const' so that you may not change its value accidentally.
const float PI = 3.14;

double find_area(float radius) {
    return PI * radius * radius;
}

int main() {
    printf("Area: %f", find_area(5.5));
    return 0;
}

Output:
Area: 94.985001

'Volatile' Type Qualifier in C

If a variable is declared 'volatile', its value can be changed from outside the program (for example by hardware, another thread, or a signal handler). Declaring a variable 'volatile' is a hint to the compiler not to perform any optimizations on accesses to that variable. Let's take a look at it with the help of an example using the 'volatile' keyword.
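
The original example is not preserved in this archive view; a minimal sketch of the idea, using a signal handler as the "outside" writer (the names stop and handle_int are my own):

#include <signal.h>
#include <stdio.h>

/* 'volatile' tells the compiler that 'stop' may change outside normal
   program flow (here, inside a signal handler), so each read in the
   loop below must actually reload it from memory. */
volatile sig_atomic_t stop = 0;

void handle_int(int sig) {
    (void)sig;  /* unused */
    stop = 1;
}

int main(void) {
    signal(SIGINT, handle_int);
    while (!stop) {
        /* spin until the user presses Ctrl+C */
    }
    printf("Signal received, exiting.\n");
    return 0;
}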

Monday, January 11, 2016

Variables

Void Type

The void type is defined using the keyword 'void'. It is used to represent the absence of data. For example, if a function is declared with a 'void' return type, it means that the function doesn't return any value. You cannot define a variable of type 'void', though; the program will not compile. You can declare void pointers, however.
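
A small sketch of these rules (the names greet, n and p are my own illustration):

#include <stdio.h>

/* A function declared with 'void' return type returns no value. */
void greet(void) {
    printf("Hello!\n");
}

int main(void) {
    int n = 42;
    /* void v;  -- error: a variable of type 'void' will not compile */
    void *p = &n;               /* a void pointer is allowed */
    printf("%d\n", *(int *)p);  /* cast back to int* before dereferencing */
    greet();
    return 0;
}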
Variables
In C programs, you may sometimes need to store values so that you can later do some calculation or operation on them. To hold these values we need variables. A variable can be considered a kind of identifier used to represent some value or information in a designated portion of a program. A variable can be assigned different values in the course of its life span, i.e. the value stored in a variable may change. However, the type of data associated with a variable can never be changed.
Variable declaration
Declaration means associating a variable with a data type. Declaring a variable enforces that the variable can only represent values of the specified data type. A declaration has a specific format. Let's consider the following declaration as an example.
int number;
By this declaration, we specify that number is a variable of data type integer. So far, the variable number does not contain the value we want. A value can be assigned to the variable as follows
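(The snippet itself is missing from the archived post; the assignment presumably looked something like the line below, with 5 as an example value.)

number = 5;   /* the variable number now holds the integer value 5 */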

Sunday, January 10, 2016

Enumeration Types

Enumeration types are used to hold a value that belongs to a particular set. The keyword 'enum' is used to define enumeration types. Each element in an enumeration type is assigned a consecutive integer constant, starting with 0 by default.
The syntax for declaring an enum is given below
enum type_name {value_1, value_2, value_3, ...};
Each of the values in the value set (i.e. value_1, value_2, etc.) is an integer constant. By default, they start with 0 and increment by 1. You can override these values yourself. Let's take a look at it with the help of an example.
#include <stdio.h>

enum day_of_week { sunday, monday, tuesday, wednesday, thursday, friday, saturday };

int main() {
    enum day_of_week today;
    // Assigns enum value sunday to the variable today.
    today = sunday;
    printf("%d", today);
    return 0;
}

Output:
0
In the above program, the enum type 'day_of_week' consists of 7 values, sunday through saturday. Each of them is internally assigned an integer constant starting with 0 by default: 'sunday' gets 0, 'monday' gets 1, and so on. Here the 'printf' statement outputs the value 0, because we've assigned the enum value 'sunday' to the variable 'today' of type 'day_of_week'.
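
As mentioned above, the default values can be overridden; a small sketch of that (the enum name and values are my own illustration):

#include <stdio.h>

// An explicit value restarts the numbering; later members continue from it.
enum http_status { ok = 200, created, not_found = 404 };

int main(void) {
    // created follows ok, so it becomes 201.
    printf("%d %d %d\n", ok, created, not_found);  /* prints: 200 201 404 */
    return 0;
}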

Thursday, January 7, 2016

Other Basic Type Specifiers

  • Signed : the 'signed' specifier lets us declare signed integer or character type variables, which can hold negative as well as positive values. By default, integer type variables are signed (for plain 'char', whether it is signed is implementation-defined).
  • Unsigned : on the other hand, the 'unsigned' specifier lets us declare variables which can hold only non-negative values, starting from 0. If you are working with values that can't be less than 0, like the number of cars or a visitor count, you should use 'unsigned', as it gives one extra benefit: an unsigned int can hold roughly double the maximum value supported by a signed int.
  • Long and Short : the 'long' and 'short' specifiers let us declare integers with different lengths ('long' is also supported with double; 'short' is not). Short integers are at least 16 bits long, with a range of at least -32768 to 32767; long integers have a greater range, at least -2147483647 to +2147483647. The sketch after this list prints the actual ranges on your platform.
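
A minimal sketch that prints the actual ranges, using the standard constants from <limits.h> (exact values vary by platform):

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* The ranges of the types named by the specifiers above. */
    printf("short:        %d to %d\n", SHRT_MIN, SHRT_MAX);
    printf("int:          %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int: 0 to %u\n", UINT_MAX);
    printf("long:         %ld to %ld\n", LONG_MIN, LONG_MAX);
    return 0;
}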

Wednesday, January 6, 2016

Data Types in C Language

In a C program, data types are used to specify the type of data held by a variable, or the type of data returned/accepted by a function. If you remember the HelloWorld program we wrote earlier, you might have noticed that the main function returns an 'int' type value, which represents an integer number.
Data types in C can be categorised into the following sections:
  • Basic Data Types : these are the basic built-in data types of the C programming language. They include 'int', 'char', 'float' and 'double'.
  • Enumerated Types : these are types which can hold only a specific set of values for the variables defined using them. The name of the datatype is 'enum'.
  • Void Type : the type 'void' indicates lack of data.
  • User Defined : this includes the types that can be created by the programmer using the basic types, such as 'struct', 'union', arrays and pointers.

Basic data types

Basic data types are the built-in data types supported by the C language. You can find the explanation and usage of them below.
  • Char : this can be used to store a single character. It can hold only one byte of data.
    char var = 'a';
    In the above example, the variable 'var' holds the single character 'a'.
  • Int : this data type can be used to hold integer data. It may use 2 or 4 bytes (and on some platforms 8) depending on the architecture of the platform you're running the program on: on a 16-bit platform, int typically uses 2 bytes, while on most modern 32-bit and 64-bit platforms it uses 4 bytes. An 'int' variable cannot hold a floating-point number. A quick way to check the sizes on your own platform is shown below.
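
A minimal sketch using the standard sizeof operator to check the widths on your own platform:

#include <stdio.h>

int main(void) {
    /* sizeof reports the width, in bytes, on the platform you compile for. */
    printf("sizeof(char) = %zu\n", sizeof(char));  /* always 1 */
    printf("sizeof(int)  = %zu\n", sizeof(int));   /* typically 4 today */
    return 0;
}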

Tuesday, January 5, 2016

Character Sets Supported by C Programs

C uses alphabets (uppercase 'A' to 'Z' and lowercase 'a' to 'z'), digits ('0' to '9') and certain special characters, shown in the figure below, to construct the basic elements of a C program (constants, variables, expressions, operators, etc.).
C Character Set

Identifiers and Keywords

Identifiers are the names by which we distinguish different program elements (such as variables, constants and functions). Identifiers consist of letters (uppercase or lowercase) and digits in any possible combination, as long as they follow one rule of identifiers.

The rule states that every identifier must have a letter as its first character; no digit is allowed as the first character. So, while "first" is a valid identifier, "1st" is not. Also, no identifier may contain any special character except '_' (underscore). Given below is a list of examples of valid and invalid identifiers, with the reason why the invalid ones are invalid.

  • 1'st : Invalid (the first character must be a letter)
  • first : Valid
  • total_area : Valid
  • phone-no : Invalid ('-' is not allowed)
  • national flag : Invalid (a blank space is not allowed)
  • f1car : Valid

There is another restriction on identifiers: some reserved words can't be used as identifiers. These reserved words are predefined by the C language, have predefined meanings, and can only be used for their intended purposes. They are called keywords. The list of keywords (as defined by C89) is: auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.

Monday, January 4, 2016

C Programming Tutorial

To learn any new programming language, for example C, C++ or Java, the most commonly used approach is to write a basic HelloWorld program first and then learn the concepts from it. Let's follow the same approach here: first write a C program, then learn the basic syntax of C programming.

HelloWorld in C:

#include <stdio.h>

int main()
{
    printf("Hello world!");
    return 0;
}

How to Run the HelloWorld C Program:

To compile and run the above C program,
  • You can install a C compiler such as GCC and build the program locally, e.g. gcc hello.c -o hello && ./hello (the file name hello.c is just an example).
  • Or you can make use of online services like IdeOne. For learning purposes, this is the better option.

Sunday, January 3, 2016

Pig Use Case in Telecom Industry

The telecom industry generates a huge amount of data (call details). To process this data, we use Pig to de-identify the user call data.

The first step is to store the data in HDFS; Pig scripts are then applied to the loaded data, refining the user call data and fetching important call information like time rate, repetition rate and some important log info. Once the de-identified information comes out, the result is stored in HDFS.
In this way, a huge amount of data comes into the system servers, is stored in HDFS, and is processed using scripts. During this process Pig filters the data, iterates over it and produces results.
IT companies which use Pig to process their data include Yahoo, Twitter, LinkedIn and eBay; they use Pig to run most of their MR jobs. Pig is mainly used for web log processing, typical data mining situations, and image processing.

Wednesday, December 30, 2015

Pig Architecture

The Pig architecture consists of the Pig Latin interpreter, which is executed on a client machine. It takes Pig Latin scripts and converts them into a series of MR jobs, then executes the MR jobs and saves the output into HDFS. In between, it performs different operations such as parsing, compiling, optimizing and planning the execution of the data that comes into the system.
Pig Architecture

Job Execution Flow

When a Pig programmer develops scripts, they are stored in the local file system in the form of user-defined functions. When we submit a Pig script, it comes in contact with the Pig Latin compiler, which splits the task and runs a series of MR jobs; meanwhile the Pig compiler fetches data from HDFS (i.e. the input file present there). After the MR jobs have run, the output file is stored in HDFS.

Tuesday, December 29, 2015

Hadoop Pig

Pig is developed on top of Hadoop. It provides a dataflow environment for processing large sets of data. Pig provides a high-level language and is an alternative abstraction on top of MapReduce (MR). Pig programs support a parallelization mechanism. For scripting, Pig provides the Pig Latin language.
Pig takes Pig Latin scripts and turns them into a series of MR jobs. Pig scripting has its own advantages for running applications on Hadoop from the client side. The programming is easy compared to low-level languages such as Java. It provides simple parallel execution, and users can write and use their own custom-defined functions to perform unique processing.
Pig Latin provides several operators, such as LOAD, FILTER, SORT, JOIN, GROUP, FOREACH and STORE, for performing relational data operations. These operators implement data transformation tasks in a few lines of code. Compared to MR code, Pig Latin code is much shorter and gives better flexibility in some IT industrial use cases.

Monday, December 28, 2015

Hive Data Modeling

In Hive data modeling, tables, partitions and buckets come into the picture.
Coming to tables, they are just like the tables we create in traditional relational databases. Functionality such as filtering and joins can be performed on them. Hive deals with two types of table structures, internal and external, depending on the design of the schema and on how the data is loaded into Hive.
An internal table is tightly coupled with its data. First we create the table, then load the data; we can call this "data on schema". Dropping the table removes both the data and the schema. This table is stored under /user/hive/warehouse.
An external table is loosely coupled with its data. The data is already available in HDFS, and the table is created over that HDFS data; we can say that it creates a schema over the data. When the table is dropped, only the schema is dropped; the data remains in HDFS as before. External tables provide an option to create multiple schemas for the data stored in HDFS, instead of deleting the data every time the schema updates.
Partitions
Partitions come into play when a table has one or more partition keys, which determine how the data is stored. For example: "A client has some e-commerce data belonging to India operations, in which the operations of each of the 29 states are recorded as a whole. If we take the state as the partition key and partition that India data, we get 29 partitions, equal to the number of states present in India. Each state's data can then be viewed separately in the partition tables."

Sunday, December 27, 2015

Hive Vs Relational Databases

By using Hive, we can achieve some functionality that can't be achieved by relational databases. For huge amounts of data, in the petabyte range, querying it and getting results quickly is important. In this scenario, Hive achieves fast querying and produces results in seconds.
Some key differences between Hive and relational databases are the following:
  • Relational databases follow "schema on write": a table is created first, and then data is inserted into it and validated against the schema. Insertions, updates and modifications can be performed on a relational database table.
  • Hive follows "schema on read": updates and modifications won't work, because a Hive query in a typical cluster runs on multiple data nodes, so it is not possible to update and modify data across multiple nodes. Hive provides "write once, read many".

Thursday, December 24, 2015

Job Execution Inside Hive

Hive query processing life cycle

HIVESERVER is an API that allows clients (e.g. JDBC) to execute queries on the Hive data warehouse and get the desired results. Under Hive services, the driver, compiler and execution engine interact with each other to process the query.
The client submits the query via a GUI. The driver receives the query first from the GUI and defines session handles, which fetch the required APIs designed for the different interfaces like JDBC or ODBC. The compiler creates the plan for the job to be executed. The compiler in turn is in contact with the metastore and gets metadata from it.

Wednesday, December 23, 2015

Hive Architecture

Hive Architecture

There are 3 major components in Hive, as shown in the architecture diagram: Hive clients, Hive services and the metastore. Under Hive clients, we have different ways to connect to HIVE SERVER in Hive services.
These are the Thrift client, the ODBC driver and the JDBC driver. Coming to the Thrift client, it provides an easy environment to execute Hive commands from a vast range of programming languages: Thrift client bindings for Hive are available for C++, Java, PHP, Python and Ruby. Similarly, the JDBC and ODBC drivers can be used for communication between Hive clients and Hive servers where those options are compatible.

Tuesday, December 22, 2015

Hadoop Hive

Hive is developed on top of Hadoop as its data warehouse framework for querying and analyzing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop. Hive makes it easy to perform operations like data encapsulation, ad-hoc queries, and analysis of huge datasets.
Hive's design reflects its targeted use as a system for managing and querying structured data. For structured data in general, MapReduce doesn't have optimization and usability features, but the Hive framework provides them. Hive's SQL-inspired language separates the user from the complexity of MapReduce programming. It reuses familiar concepts from the relational database world, such as tables, rows, columns and schemas, to ease learning.
Hadoop programming works on flat files. Hive can use directory structures to "partition" data to improve performance on certain queries. To support these enhanced features, a new and important component of Hive, the metastore, is used for storing schema information. This metastore typically resides in a relational database.
We can interact with Hive using several methods, namely a web GUI and the Java Database Connectivity (JDBC) interface. Most interactions tend to take place over a command-line interface (CLI). Hive provides a CLI to write Hive queries using the Hive Query Language (HQL). Generally, HQL syntax is similar to the SQL syntax that most data analysts are familiar with.

Tuesday, December 8, 2015

Big Data Analytics

Big Data Analytics (BDA) comes into the picture when we deal with the enormous amount of data that has been generated over the past 10 years with the advancement of science and technology in different fields. Processing this large amount of data and extracting valuable meaning from it in a short span of time is a really challenging task, especially given the four V's that come into the picture when we discuss BDA: Volume, Velocity, Variety and Veracity of data.

Why and When to go for Big Data Analytics

Big data is a revolutionary term that describes the very large amount (volume) of unstructured (text, images, videos), structured (tabular) and semi-structured (JSON, XML) data that has the potential to be mined for information.

Volume (data at scale)

Volume refers to the large amount of data generated daily from different types of sources: social media data (Facebook, Twitter, Google), satellite images, mining and sensor data, and different types of network logs generated by servers.
Integrating and processing these huge volumes of data, stored across a scalable and distributed environment, poses a huge challenge to business analysts. Big IT giants like Yahoo and Google generate petabytes of data in a short span of time. In the IT industry, the increase in data volume is exponential compared to the past.