Friday, December 25, 2015

Spark Scala : Setting up the Environment and a Project with Scala IDE

I have been exploring how to connect to a standalone Spark cluster from Scala and how to set up the project. As a .NET developer without much knowledge of the Java environment, I found it difficult.

Prerequisites:

I assume you have installed Spark and the Hadoop ecosystem on a standalone cluster and are trying to connect to it from your machine.

We can code in Java, Python or Scala. Scala is preferable because the Spark core engine is written in Scala.

We can build the code using sbt (simple build tool), but here I'm going with Maven (even though I don't know much about it :-p). I tried sbt but was unable to fix a few issues, which remained unanswered:
sbt unresolved exception.

Installations Required:

Scala Language
Scala IDE (a JDK is also required)

1) Open the IDE you downloaded (Scala IDE).



2) Create a new project:

Create a new Maven project with the steps below:
Right-click in the Package Explorer --> New --> Project --> type maven in the text box, or just expand the Maven folder below --> Maven Project (and, as usual, Next, Next :-p; anyway, don't forget to give your Group Id and Artifact Id).

To add Scala files, classes or objects, you have to add the Scala nature to the project: right-click on the project -->
Configure --> Add Scala Nature.

You can see the pom.xml file, which holds the entire configuration of the project. This is where you list the project's dependencies, and Maven downloads them for you.




You will end up with the issues below after the build (you can see the "Building workspace" progress at the bottom right).

These errors can be resolved as follows:
Download Maven and add it to the PATH environment variable.
Right-click on the project in Eclipse -> Maven -> Update Project. On this screen, select the check box Force Update of Snapshots/Releases.

Why update: sometimes a proxy blocks some of the dependency downloads, leaving them incomplete. You can spot this by the red cross mark on pom.xml, where the exact error says "Multiple annotations found at this line".

This clears the error, but the next build ends up with the error below...
You can fix this Scala version incompatibility by right-clicking on the project --> Properties --> Scala Compiler and checking Use Project Settings.


Now go ahead and write your "Hello World" program:
In your project, right-click on the folder src/main/java --> New --> Package --> give it a name. Then create a Scala object in that package, write the program as below and run it.




There you go... your first program in Scala!


P.S.: In the next post, we will see how to connect to a standalone Hadoop cluster and write a word count program in Scala.

If you face any issues or feel that any of the content here is wrong, feel free to contact me at sudhir9p@gmail.com







Saturday, November 14, 2015

C# : Part2 [Basics]


In the previous article, we looked at the intricacies of a few predefined keywords, such as access modifiers, class definitions, parameters, and mutable and immutable objects. Today, let us look at other topics like delegates, events and flow control statements.

Delegate: a type that references methods with a particular signature.
eg: public delegate void DisplayMessage(string message);

Now we declare a variable of this type and make it point to a method, which can be a static or an instance method.
--> The method it points to must have the same signature as the delegate: return type void and a single string parameter.

Let's say we have a class Display:

class Display
 {
    public void showMyMessage(string message)
     {
       Console.WriteLine(message);
     }
 }

Display objDisplay1 = new Display();
Now we can invoke showMyMessage by creating a variable of the declared delegate type and making it point to this method, as below:

DisplayMessage objDisplayMessage = new DisplayMessage(objDisplay1.showMyMessage);

objDisplayMessage("Hi sudhirr");

Why delegates: abstraction. The caller does not need to know which method actually runs, which is useful when we expose functionality through services. Delegates are also multicast: we can add methods to the invocation list, so one delegate can invoke any number of methods.
eg: objDisplayMessage += objDisplay1.showMyMessage;
    objDisplayMessage += objDisplay2.showMyMessage;
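Here is a minimal, self-contained sketch of the invocation list in action. It reuses the DisplayMessage delegate and Display class from above. The Prefix and Log members, DelegateDemo and the second instance are hypothetical additions. Messages are collected in a list instead of printed, so the order of the calls is easy to check.

```csharp
using System;
using System.Collections.Generic;

public delegate void DisplayMessage(string message);

public class Display
{
    public string Prefix = "";        // hypothetical: tells the two instances apart
    public List<string> Log;          // hypothetical: shared log instead of Console output

    public void showMyMessage(string message)
    {
        Log.Add(Prefix + message);
    }
}

public static class DelegateDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var objDisplay1 = new Display { Prefix = "1: ", Log = log };
        var objDisplay2 = new Display { Prefix = "2: ", Log = log };

        DisplayMessage objDisplayMessage = new DisplayMessage(objDisplay1.showMyMessage);
        objDisplayMessage += objDisplay2.showMyMessage;   // add a second method to the invocation list

        objDisplayMessage("Hi sudhirr");                  // one call invokes both methods, in the order they were added
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
        // 1: Hi sudhirr
        // 2: Hi sudhirr
    }
}
```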

Sometimes we may run into trouble when something like objDisplayMessage = null; happens. Any code that can access a public delegate variable can overwrite it or wipe out its whole invocation list, whether maliciously or by accident. This is where events come into the picture. An event is declared like a delegate, but outside code can only attach and detach handlers. It cannot assign the event, so it cannot set it to null, and it cannot invoke it.

Events:
Events are members of a delegate type declared with the event keyword.
We attach handlers with += and detach them with -=.

Using a named method:
btnGo.Click += buttonclick;

void buttonclick(object sender, RoutedEventArgs e)
{
    objPerson.Name = "Teja";
}

Using an anonymous method (the delegate keyword):
btnGo.Click += delegate(object sender, RoutedEventArgs e)
{
    objPerson.Name = "Teja";
};

i.e. the handler is written inline as an anonymous method.
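This short sketch shows why an event is safer than a bare delegate field. The Notifier class, its MessagePosted event and the Post method are hypothetical names made up for this example. The commented-out lines show what the compiler rejects from outside the class.

```csharp
using System;
using System.Collections.Generic;

public delegate void DisplayMessage(string message);

public class Notifier
{
    // 'event' keyword: outside this class, only += and -= are allowed on this member
    public event DisplayMessage MessagePosted;

    public void Post(string message)
    {
        // only the declaring class can raise the event; ?.Invoke skips the call when no handler is attached
        MessagePosted?.Invoke(message);
    }
}

public static class EventDemo
{
    public static List<string> Run()
    {
        var received = new List<string>();
        var notifier = new Notifier();

        DisplayMessage handler = m => received.Add("handler: " + m);
        notifier.MessagePosted += handler;                                                  // attach a named handler
        notifier.MessagePosted += delegate (string m) { received.Add("anonymous: " + m); }; // attach an anonymous method

        // notifier.MessagePosted = null;   // compile error: the event can only appear on the left of += or -= outside Notifier
        // notifier.MessagePosted("x");     // compile error: the event cannot be invoked from outside Notifier

        notifier.Post("first");              // both handlers run
        notifier.MessagePosted -= handler;   // detach the named handler
        notifier.Post("second");             // only the anonymous handler runs
        return received;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```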

Constructors:
We have static constructors and instance constructors.
Static constructors: declared with the static keyword. A class can have only one static constructor (no overloading, because it cannot take parameters), and it cannot have an access modifier. For a given type, it is executed only once at run time, no matter how many times the type is used or instantiated. The runtime calls it automatically before the first instance is created or any static member is accessed.
--> The static constructor executes before the instance constructor.
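This small sketch lets you observe the order. The hypothetical Counter class logs which constructor ran. Creating two instances should show the static constructor exactly once, before the first instance constructor.

```csharp
using System;
using System.Collections.Generic;

public class Counter
{
    // static field initializers run before the static constructor body
    public static List<string> Log = new List<string>();

    static Counter()              // no access modifier, no parameters
    {
        Log.Add("static constructor");
    }

    public Counter()
    {
        Log.Add("instance constructor");
    }
}

public static class ConstructorDemo
{
    public static List<string> Run()
    {
        new Counter();
        new Counter();
        return Counter.Log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
        // static constructor
        // instance constructor
        // instance constructor
    }
}
```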

Constructor chaining:
public Person()
    : this("Test")
{
}
This constructor just passes control to the constructor that accepts a string parameter. That constructor runs first, and then the (empty) body of Person() runs.
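Here is a minimal sketch of the chaining order. It assumes a Person class with a string constructor. The Steps list is a hypothetical addition that records which constructor body ran.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Name { get; }
    public List<string> Steps { get; } = new List<string>();   // hypothetical: records constructor bodies in order

    public Person()
        : this("Test")             // run the string constructor first
    {
        Steps.Add("parameterless body");
    }

    public Person(string name)
    {
        Name = name;
        Steps.Add("string body");
    }
}

public static class Program
{
    public static void Main()
    {
        var p = new Person();
        Console.WriteLine(p.Name);                          // Test
        Console.WriteLine(string.Join(" -> ", p.Steps));   // string body -> parameterless body
    }
}
```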

Object initializers:
Generally, we create an instance of a class and then assign values to its properties like this:
Person objPerson = new Person();
objPerson.Name = "Mahesh";
objPerson.Id = 1;

With an object initializer, we can write the same code in the following way:
Person objPerson = new Person
{
  Name = "Mahesh",
  Id = 1

};

ForEach:
Syntax:  
int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
foreach (int i in numbers)
{
  Console.WriteLine("Number is " + i);

}

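What happens internally (conceptually; for arrays, the compiler actually emits an index-based loop, but for other collections it uses the enumerator like this):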
int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
IEnumerator ObjEnumerator = numbers.GetEnumerator();
while (ObjEnumerator.MoveNext())
  {
  Console.WriteLine("Number is " + (int)ObjEnumerator.Current);
  }

So to use foreach on an object, its type should implement IEnumerable (strictly, it needs a public GetEnumerator() method). Internally, the compiler turns foreach into code like the above.
If we apply foreach to an object whose type does not provide GetEnumerator, we get a compile-time error saying that the type does not contain a public definition for 'GetEnumerator'.
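This sketch shows how to make your own type work with foreach, using a hypothetical NumberBag collection. Implementing IEnumerable<int> supplies the GetEnumerator method. SumByHand shows roughly what the compiler generates for the foreach loop. Generic enumerators are also disposed at the end.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// hypothetical collection type: implementing IEnumerable<int> makes it usable in foreach
public class NumberBag : IEnumerable<int>
{
    private readonly int[] items;

    public NumberBag(params int[] items) { this.items = items; }

    public IEnumerator<int> GetEnumerator()
    {
        // reuse the array's own enumerator, typed as IEnumerator<int>
        return ((IEnumerable<int>)items).GetEnumerator();
    }

    // non-generic version required by IEnumerable
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class ForeachDemo
{
    public static int SumWithForeach(NumberBag bag)
    {
        int sum = 0;
        foreach (int n in bag)
            sum += n;
        return sum;
    }

    // roughly what the compiler generates for the foreach above
    public static int SumByHand(NumberBag bag)
    {
        int sum = 0;
        using (IEnumerator<int> e = bag.GetEnumerator())   // generic enumerators are IDisposable
        {
            while (e.MoveNext())
                sum += e.Current;
        }
        return sum;
    }

    public static void Main()
    {
        var bag = new NumberBag(16, 53, 2);
        Console.WriteLine(SumWithForeach(bag));   // 71
        Console.WriteLine(SumByHand(bag));        // 71
    }
}
```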

Jump statements:
break: we use break after a case in a switch to come out of the switch. In the same way, we use it in any loop to come out of that loop (only the innermost loop) and continue executing the code after it.
continue: like break, it skips the code below it in the loop body, but then it continues with the next iteration. In contrast, break stops all further iterations and leaves the loop.
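This quick sketch compares the two, using the numbers array from above. OddsBefore is a hypothetical helper: it skips even numbers with continue and stops at a given value with break.

```csharp
using System;
using System.Collections.Generic;

public static class JumpDemo
{
    // hypothetical helper: collects odd numbers until stopAt is reached
    public static List<int> OddsBefore(int[] numbers, int stopAt)
    {
        var result = new List<int>();
        foreach (int n in numbers)
        {
            if (n == stopAt) break;      // leave the loop; execution continues after it
            if (n % 2 == 0) continue;    // skip the rest of this iteration, move on to the next number
            result.Add(n);
        }
        return result;
    }

    public static void Main()
    {
        int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
        Console.WriteLine(string.Join(", ", OddsBefore(numbers, 56)));   // 53, 31, 51
    }
}
```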

goto: jumps to a labeled statement, skipping the lines in between.


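foreach (int i in numbers)
{
    if (i == 53)
        goto DisplayMe;
}
DisplayMe:
    Console.WriteLine("My Name");
The DisplayMe label can be inside the foreach or elsewhere in the same method, as long as the goto can see it (goto can jump out of a block but not into one). Here the label is outside the loop, so as soon as i == 53 the iteration stops and execution continues at the label. If no element is 53, the loop simply finishes and execution still reaches the label.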

return: exits the method immediately. We can use it in a void method too, as just return;
We can use yield return to build up a collection, i.e. an IEnumerable object, in a lazy manner:

foreach (int n in GetEnumerable())
{
    Console.WriteLine(n);
}

public static IEnumerable<int> GetEnumerable()
{
    yield return 10;
    yield return 20;
    yield return 30;
}

We get the output 10 20 30. A method that uses yield return returns an IEnumerable, which is what foreach operates on, but it is evaluated lazily. GetEnumerable does not run to completion and return 10, 20, 30 all at once as a collection. After yielding 10, control goes back to the foreach body. When the loop asks for the next item, GetEnumerable resumes right where it stopped.
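Here is a sketch that logs every step, so you can watch the lazy behavior. The log parameter and YieldDemo class are hypothetical additions. Nothing inside GetEnumerable runs until foreach asks for the first item.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDemo
{
    public static IEnumerable<int> GetEnumerable(List<string> log)
    {
        log.Add("start");
        yield return 10;
        log.Add("resumed after 10");
        yield return 20;
        log.Add("resumed after 20");
        yield return 30;
        log.Add("end");
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        IEnumerable<int> numbers = GetEnumerable(log);
        log.Add("before foreach");          // GetEnumerable has not started running yet
        foreach (int n in numbers)
            log.Add("loop body " + n);
        return log;
    }

    public static void Main()
    {
        foreach (string step in Run())
            Console.WriteLine(step);
    }
}
```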

throw: raises an exception.
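This is a minimal sketch of throw together with try/catch. ParseAge and TryDescribe are hypothetical helpers. ArgumentOutOfRangeException is just one of the standard exception types we could raise here.

```csharp
using System;

public static class ThrowDemo
{
    // hypothetical helper: rejects negative ages by throwing
    public static int ParseAge(string text)
    {
        int age = int.Parse(text);   // int.Parse may itself throw a FormatException
        if (age < 0)
            throw new ArgumentOutOfRangeException(nameof(text), "Age cannot be negative.");
        return age;
    }

    public static string TryDescribe(string text)
    {
        try
        {
            return "age " + ParseAge(text);
        }
        catch (ArgumentOutOfRangeException ex)
        {
            return "rejected: " + ex.ParamName;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryDescribe("42"));   // age 42
        Console.WriteLine(TryDescribe("-1"));   // rejected: text
    }
}
```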


Sunday, October 25, 2015

C# : Part1 [Basics]

In this article, we are going to look at some of the predefined keywords used in C#, access modifiers, value types and reference types, and other basic concepts.

Intro:
--> C# is standardized by internationally recognized standards organizations (ECMA-334 and ISO/IEC 23270), where we can read the specification.

--> C# code can be written in Notepad and compiled from the command line with the C# compiler (csc.exe) that ships with the .NET Framework, at the path below:
     C:\Windows\Microsoft.NET\Framework\v4.0.30319
We can compile our code with this compiler as follows:
          C:\Windows\Microsoft.NET\Framework\v4.0.30319\csc.exe yourfile.cs
This produces an executable of your code. You can see yourfile.exe by running the dir command, and you can run it by typing yourfile.exe.

The Main method can take a string[] args parameter. We can pass arguments to it on the command line after the exe name.
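This is a minimal sketch of reading those command-line arguments. Describe is a hypothetical helper that lets us check the logic without a console. If you compile it with csc.exe yourfile.cs and run it as yourfile.exe first second, it should print 2 argument(s): first, second.

```csharp
using System;

public static class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine(Describe(args));
    }

    // hypothetical helper: summarizes whatever was passed on the command line
    public static string Describe(string[] args)
    {
        if (args.Length == 0)
            return "No arguments";
        return args.Length + " argument(s): " + string.Join(", ", args);
    }
}
```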

--> When projects are added in Solution Explorer, each project is compiled into an assembly, either a DLL or an executable file.

Class:
A class defines a type, which contains:
 --> State: properties, fields, etc., i.e. the data an object can hold
 --> Behavior: methods
We can define the accessibility of each member of a class.

Objects are instances of a type:
We can create multiple instances. Each instance holds its own state, and all instances share the same behavior.

Classes create reference types. Objects are stored on the heap, and variables reference the object instance.
By default, classes inherit from the System.Object class.

Employee e1 = new Employee();

e1 references an Employee object on the heap, but not forever. We can make it reference another Employee object:

e1 = new Employee();
e1.Name = "Sudhir";

An important point here is that multiple variables can reference the same object:
eg:
Employee e2; // we are not initializing e2 to a new Employee object
e2 = e1; // e2 now references the same object as e1; object.ReferenceEquals(e1, e2) returns true because both reference the same memory

e2.Name = "Paturu";

When we print e1.Name, we get "Paturu" (not "Sudhir"), because e1 and e2 both reference the same object on the heap.
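Here is the same walkthrough as a runnable sketch. ReferenceDemo is a hypothetical wrapper, and Employee has only the Name property used above.

```csharp
using System;

public class Employee
{
    public string Name { get; set; }
}

public static class ReferenceDemo
{
    public static string Run()
    {
        Employee e1 = new Employee { Name = "Sudhir" };
        Employee e2 = e1;                                   // copies the reference, not the object
        bool sameBefore = object.ReferenceEquals(e1, e2);   // True

        e2.Name = "Paturu";                                 // change through e2...
        string seenThroughE1 = e1.Name;                     // ...is visible through e1

        e1 = new Employee();                                // e1 now points at a different object
        bool sameAfter = object.ReferenceEquals(e1, e2);    // False; e2 still points at the original

        return $"{sameBefore} {seenThroughE1} {sameAfter} {e2.Name}";
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // True Paturu False Paturu
    }
}
```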

Mutable vs Immutable:

As said above, most reference-type objects are mutable. They can be changed at any point in time without creating another object on the heap. String, however, is immutable even though it is a reference type.

string s1 = "Hi";

string s2 = "Hello";

s2 = s1;   // Same as the class example above: object.ReferenceEquals(s1, s2) returns true
But when we then assign s1 = "Sudhir"; and check object.ReferenceEquals(s1, s2) again, it returns false. Now s1 references a different string object on the heap, while s2 still references "Hi". Note that this part is plain reassignment, exactly like e1 = new Employee(); above. What immutability really means is that the "Hi" object itself can never be changed. Every operation that seems to modify a string, such as s1 += "!", creates a new string object instead.
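This small sketch shows what immutability actually guarantees. After s2 = s1, appending to s1 produces a brand-new string, and the object s2 points to is left unchanged. StringDemo is a hypothetical name.

```csharp
using System;

public static class StringDemo
{
    public static string Run()
    {
        string s1 = "Hi";
        string s2 = s1;                                     // both reference the same string object
        bool sameBefore = object.ReferenceEquals(s1, s2);   // True

        s1 += " there";                                     // "Hi" cannot be modified in place: a new string is created
        bool sameAfter = object.ReferenceEquals(s1, s2);    // False

        return $"{sameBefore} {sameAfter} {s1} / {s2}";
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // True False Hi there / Hi
    }
}
```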

Inheritance: the ability to define a class that inherits state (properties) and behavior from another class.
    This is one approach to code reuse (but code reuse alone is not the reason to use inheritance).
Encapsulation: the ability to hide the inner workings of a class.

Polymorphism: works with inheritance to reuse code through an extensibility mechanism. A derived class can customize the behavior it inherits from another class, while code written against the base class keeps working.

Access Modifiers: 

Public: can be applied to a class or a member of a class, and it gives open access. Classes declared public can be accessed from other projects, provided those projects add a reference to this one.

Protected: can be applied to members of a class. Access is limited to the class that declares the protected member and any derived (inherited) class, both in the current assembly and outside it. However, code outside that class hierarchy cannot access the member through an instance of the class.

Internal: can be applied to a class or a member of a class. Access is limited to the current assembly.
This is the default accessibility of a (top-level) class in C#, so internal classes can be used only inside their own project. Even if another project references this project, it cannot access such classes because they are outside its assembly.

Protected Internal: applies to members of a class. Its access differs inside the current assembly and in other projects that reference it:
1. Within the current assembly, it acts as internal, i.e. we can access it through an instance of the class.
2. Outside the assembly, it can be accessed only from a derived class, i.e. it acts as protected.

private: can be applied to members and is the default accessibility of a member. Access is limited to the containing class.

Abstract:
This can be applied to a class and also to its members: methods, properties, etc.
An abstract class cannot be instantiated, so we cannot use the new operator to create an object of it. It is designed to be used as a base class, i.e. it can only be inherited (for reuse).
Abstract members in the base class have no implementation, just a declaration (an abstract class can still contain ordinary, implemented members).

If an abstract class has abstract methods, they must be implemented in the (non-abstract) derived class, which we do with the override keyword.

Virtual: this keyword creates virtual members, whose behavior a derived class can change if required, using the override keyword. Sometimes, while overriding, we need to keep the original definition and add some extra features. In that case, we can call base.ThatMethod(); inside the override before our custom implementation.
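This sketch combines abstract, virtual, override and base, using hypothetical Shape and Square classes:

```csharp
using System;

public abstract class Shape
{
    // abstract member: declaration only; every non-abstract derived class must override it
    public abstract double Area();

    // virtual member: has a default implementation that derived classes may override
    public virtual string Describe()
    {
        return "Shape with area " + Area();
    }
}

public class Square : Shape
{
    private readonly double side;

    public Square(double side) { this.side = side; }

    public override double Area() => side * side;

    public override string Describe()
    {
        // keep the base behavior, then add something of our own
        return base.Describe() + " (a square)";
    }
}

public static class Program
{
    public static void Main()
    {
        // Shape s = new Shape();   // compile error: cannot create an instance of an abstract class
        Shape shape = new Square(3);
        Console.WriteLine(shape.Describe());   // Shape with area 9 (a square)
    }
}
```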

Static: static members belong to the type itself, e.g. Math.PI, Colors.Red, etc. We cannot invoke them through an object instance, and a static class cannot be instantiated. Only one copy of a static member is maintained, no matter how many instances are created.


Sealed: sealed classes cannot be inherited (to prevent extension or misuse, e.g. for security purposes).
eg: System.String is a sealed class, so we cannot inherit from it. (Sealing can also improve performance.)

The same idea applies to members. When a method is declared virtual in a base class, a derived class can override it with the sealed keyword (sealed override). Then, when this derived class is inherited by some other class, that class cannot override the method again.
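This sketch shows sealed at both levels, with hypothetical Animal, Dog, Puppy and Cat classes. The commented-out lines are the ones the compiler would reject.

```csharp
using System;

public class Animal
{
    public virtual string Speak() => "...";
}

public class Dog : Animal
{
    // overrides Animal.Speak and prevents any further overriding down the hierarchy
    public sealed override string Speak() => "Woof";
}

public class Puppy : Dog
{
    // public override string Speak() => "Yip";   // compile error: cannot override 'Dog.Speak()' because it is sealed
    public string Play() => "Plays";
}

public sealed class Cat : Animal                  // sealed class: nothing can inherit from Cat
{
    public override string Speak() => "Meow";
}

// public class Kitten : Cat { }                 // compile error: cannot derive from sealed type 'Cat'

public static class Program
{
    public static void Main()
    {
        Animal a = new Puppy();
        Console.WriteLine(a.Speak());           // Woof (Dog's sealed override)
        Console.WriteLine(new Cat().Speak());   // Meow
    }
}
```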

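Partial: this keyword is used to split a class definition across multiple files. We can also have partial methods, just like partial classes. One part of the partial class declares the method without a body (somewhat like an abstract method), and another part, typically in another file, can provide the implementation. If no implementation is provided, the compiler removes the method and all calls to it.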
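This sketch shows a partial class with a partial method. Both parts are in one file here so the example is self-contained, but normally they would be in separate files. Person, Name, Changes and OnNameChanged are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// first part (e.g. in one file)
public partial class Person
{
    private string name;
    public List<string> Changes { get; } = new List<string>();

    public string Name
    {
        get { return name; }
        set { name = value; OnNameChanged(value); }   // call the partial method
    }

    // partial method declaration: no body here (returns void, implicitly private)
    partial void OnNameChanged(string newName);
}

// second part (e.g. in another file)
public partial class Person
{
    // implementation; if this part were omitted, the call in the setter would be removed at compile time
    partial void OnNameChanged(string newName)
    {
        Changes.Add("Name changed to " + newName);
    }
}

public static class Program
{
    public static void Main()
    {
        var p = new Person();
        p.Name = "Teja";
        p.Name = "Mahesh";
        foreach (string change in p.Changes)
            Console.WriteLine(change);
    }
}
```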

Reference types:
Multiple variables can point to the same object, and a single variable can point to multiple objects over its lifetime. The objects are usually mutable, i.e. their state can be changed.
--> Objects are allocated on the heap by the new operator (whereas local variables of value types are typically stored on the stack).


Value types:
Variables hold the value itself, with no pointers or references, and assigning one variable to another copies the value. No objects are allocated on the heap, so they are lightweight.
Many built-in primitives (int, double, bool, etc.) are value types. Value types should store only a small amount of data (a common guideline is no more than 16 bytes), because copying a variable copies all of its data. When we assign an Int32 x to an Int32 y, all 32 bits are copied. That is cheap for small types but causes performance issues for large ones.

Struct: structs are also value types. Like a class, a struct can have properties, fields and methods, but a struct cannot be inherited from. It should be kept small (less than 16 bytes as a guideline), ideally containing only small value-type fields.
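This sketch shows the copy semantics that separate structs from classes. PointStruct, PointClass and CopyDemo are hypothetical names. Changing the copy of a struct leaves the original alone, while changing an object through a copied class reference affects the shared object.

```csharp
using System;

public struct PointStruct
{
    public int X;
}

public class PointClass
{
    public int X;
}

public static class CopyDemo
{
    public static string Run()
    {
        PointStruct s1 = new PointStruct { X = 1 };
        PointStruct s2 = s1;    // copies the whole value
        s2.X = 99;              // changes only the copy

        PointClass c1 = new PointClass { X = 1 };
        PointClass c2 = c1;     // copies the reference
        c2.X = 99;              // changes the one shared object

        return $"struct: {s1.X}, class: {c1.X}";
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // struct: 1, class: 99
    }
}
```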

Parameters: in C#, all parameters are passed by value by default. For reference types, the value passed is a copy of the reference, so changes made to the object through it are visible to the caller.

eg:
class Dog
{
    public string name { get; set; }
}

static void Main(string[] args)
{
    Dog obj = new Dog();
    obj.name = "Main Obj";
    Console.WriteLine("Original : " + obj.name);
    TestingParams(obj);
    Console.WriteLine("After Calling Method : " + obj.name);
    Console.Read();
}

public static void TestingParams(Dog objDog)
{
    objDog.name = "Modified";   // changes the same object that obj in Main references
}

Here, the second line prints "Modified" even though Main itself never modified the object. For reference-type parameters, changes to the object affect the caller's object, because a copy of the reference (not a copy of the object) is passed.

Note: this does not apply to string, because it is immutable, as described above. A method cannot change the caller's string; it can only make its own parameter refer to a new string.

Parameter keywords: for cases where a method cannot change the caller's variable, e.g. value types, we can use the keywords 'ref' and 'out'.

Both are used for the same purpose, but there is one major difference. When a parameter is declared with 'ref', the compiler checks that the variable is initialized in the calling method, because the called method may read it. When it is declared with 'out', it need not be initialized in the calling method, but the called method must assign it before returning.

eg:
public static void Method1()
{
    string Name = "Sudhir";   // must be initialized because it is passed with ref
    int Id;                    // need not be initialized because it is passed with out
    Dog obj = new Dog();
    obj.name = "alpha";
    ChildMethod(ref Name, out Id, obj);   // after this call, obj.name is "Beta"
    ChildMethod2(obj);   // obj.name is NOT "TestDog" after this call; to replace the caller's object, we would need to pass it as ref obj
}

public static void ChildMethod(ref string Name, out int id, Dog objDog)
{
    id = 0;   // must be assigned here, otherwise we get an error: the out parameter must be assigned before control leaves the current method
    objDog.name = "Beta";
}

public static void ChildMethod2(Dog objDog)
{
    objDog = new Dog();   // objDog now points to a new instance, no longer the object referenced by obj in Method1
    objDog.name = "TestDog";
}


params: when we want to send a set of values of the same type, we can declare an array parameter and pass an array. The params keyword gives the flexibility of passing the values directly (any number of them).

eg:
public void Testing(int[] values)
{

}

To call this version, we have to pass an integer array, e.g. Testing(new int[] { 2, 3 });.
When we declare the method with the params keyword, as below, we can pass the values directly instead, like Testing(2, 3); or Testing(4, 5, 6, 7); or Testing(9);
public void Testing(params int[] values)
{

}
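Here is a minimal sketch with a body filled in. Sum is a hypothetical method, but the calls match the ones described above. Passing an explicit array still works too.

```csharp
using System;

public static class ParamsDemo
{
    // params lets callers pass zero or more ints directly; inside, values is an ordinary int[]
    public static int Sum(params int[] values)
    {
        int total = 0;
        foreach (int v in values)
            total += v;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(2, 3));                 // 5
        Console.WriteLine(Sum(4, 5, 6, 7));           // 22
        Console.WriteLine(Sum(9));                    // 9
        Console.WriteLine(Sum());                     // 0 (values is an empty array)
        Console.WriteLine(Sum(new int[] { 1, 2 }));   // 3 (an explicit array still works)
    }
}
```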


enum: creates a value type that is a set of named constants (why: to improve readability).
--> The underlying data type is int by default.
By default, the values in an enumeration start at zero and increase by one. However, we can define the values explicitly.
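This sketch shows default and explicit enum values. Color, HttpStatus and Small are hypothetical enums. Small also shows how to change the underlying type from int to byte.

```csharp
using System;

public enum Color { Red, Green, Blue }                  // 0, 1, 2 by default

public enum HttpStatus { Ok = 200, NotFound = 404 }     // explicit values

public enum Small : byte { A = 1, B, C }                // underlying type byte; B = 2, C = 3

public static class Program
{
    public static void Main()
    {
        Console.WriteLine((int)Color.Blue);                                       // 2
        Console.WriteLine(HttpStatus.NotFound + " = " + (int)HttpStatus.NotFound); // NotFound = 404
        Console.WriteLine((byte)Small.C);                                         // 3
        Console.WriteLine(Enum.GetUnderlyingType(typeof(Color)));                 // System.Int32
    }
}
```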

Arrays: a simple data structure for managing a collection of variables. All elements have the same data type, and they are indexed from zero. Searching for an item that is not in the array, e.g. with Array.IndexOf, returns -1.
Array implements many interfaces, such as ICloneable and IList. So in a method's parameters, instead of an array type, we can also use IList or another of these interfaces (the method then accepts any object that implements IList).
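This sketch covers both points. Array.IndexOf returns -1 for a missing item. A parameter typed as the generic IList<int> accepts both an array and a List<int>, because single-dimensional int arrays implement that interface. CountAbove is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayDemo
{
    // hypothetical helper: accepts an int[], a List<int>, or anything else implementing IList<int>
    public static int CountAbove(IList<int> items, int threshold)
    {
        int count = 0;
        for (int i = 0; i < items.Count; i++)
            if (items[i] > threshold)
                count++;
        return count;
    }

    public static void Main()
    {
        int[] numbers = { 16, 53, 2, 31 };
        Console.WriteLine(Array.IndexOf(numbers, 31));                  // 3
        Console.WriteLine(Array.IndexOf(numbers, 99));                  // -1: not in the array
        Console.WriteLine(CountAbove(numbers, 20));                     // 2 (an array)
        Console.WriteLine(CountAbove(new List<int> { 5, 50 }, 20));     // 1 (a List<int>)
    }
}
```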