Sunday, October 2, 2016

LINQ - Method Syntax

LINQ (Language Integrated Query) is a technology that gives you the ability to query data from C#. Here "data" means objects in memory on the heap, objects from a database, XML, or any other data source.

It provides 50+ query operators, covering sorting, filtering, joining, grouping, partitioning, projecting, aggregating, and so on.

So on what types of data can we apply querying?

The lowest-level type that was chosen is IEnumerable<T>.

But IEnumerable<T> has no interesting methods for querying data; it only has what is needed to loop through it. To add new methods to IEnumerable<T>, Microsoft used extension methods. An extension method is a method that looks like a member of a given type but is in reality a static method on a different type.

Why extension methods when we have inheritance? Take DateTime: it is a struct, so it is implicitly sealed and cannot be inherited. So what do we do if we need additional functionality on it, such as a calculation or custom logic used throughout the code? An extension method lets us add it.

What can LINQ do?

We can easily answer questions such as:
what is the average invoice amount per customer, or which customers have a value greater than 35 or a name starting with A. All these SQL-style queries can be run against data sources.

We can apply this to any data source that has a LINQ provider.
Data sources that have LINQ providers include:

SQL data sources, Entities (EF), Objects (in-memory data), XML

There are two types of LINQ syntax: query syntax and method syntax.



The code in this example looks as if the query iterates over the list of customers, finds all the customers that match the id, and then uses the First method to return the first element of the sequence. But that is not what happens.



LINQ uses something called deferred execution: the code that defines the LINQ statement does only that, it defines the statement. The query itself is not executed until the result is required.

Calling any operator or method on the query that must iterate over the result causes the query to execute. In the example above, it is the call to .First() that executes the query, which is worth keeping in mind for performance.
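A minimal sketch of deferred execution, assuming a plain list of ids rather than the customer list from the post. The names DeferredDemo, Matches and Checks are made up for illustration; the counter only exists to show when the where clause actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Illustrative counter: how many ids the where clause has inspected.
    public static int Checks;

    private static bool Matches(int id, int wanted)
    {
        Checks++;
        return id == wanted;
    }

    // Only defines the query. No id is inspected here.
    public static IEnumerable<int> BuildQuery(List<int> ids, int wanted)
    {
        return from id in ids
               where Matches(id, wanted)
               select id;
    }

    public static void Main()
    {
        var ids = new List<int> { 1, 2, 3, 4 };

        var query = BuildQuery(ids, 3);
        Console.WriteLine("Checks after defining the query: " + Checks); // 0

        int found = query.First(); // iteration starts here and stops at the first match
        Console.WriteLine("Found " + found + ", checks after First(): " + Checks); // 3
    }
}
```

Defining the query costs nothing; the three checks (ids 1, 2 and 3) happen only once First() asks for an element.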

The above is LINQ query syntax, which is built into the language (C# or VB) and not into the .NET CLR. The compiler therefore translates query syntax into method calls when the code is compiled. Instead of using query syntax we can call those methods directly using LINQ method syntax, since they are implemented as extension methods.
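To make the two syntaxes concrete, here is a small sketch that writes the same query both ways. The class name SyntaxDemo and the sample names are assumptions for illustration, not code from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SyntaxDemo
{
    // Query syntax: the compiler rewrites this into the method calls below.
    public static List<string> WithQuerySyntax(IEnumerable<string> names)
    {
        var query = from n in names
                    where n.StartsWith("A")
                    orderby n
                    select n.ToUpper();
        return query.ToList();
    }

    // Method syntax: the same extension methods called directly.
    public static List<string> WithMethodSyntax(IEnumerable<string> names)
    {
        return names.Where(n => n.StartsWith("A"))
                    .OrderBy(n => n)
                    .Select(n => n.ToUpper())
                    .ToList();
    }

    public static void Main()
    {
        var names = new[] { "Bob", "Anna", "Alex" };
        Console.WriteLine(string.Join(", ", WithQuerySyntax(names)));  // ALEX, ANNA
        Console.WriteLine(string.Join(", ", WithMethodSyntax(names))); // ALEX, ANNA
    }
}
```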

Extension methods: an extension method adds functionality to a type without modifying or recompiling the original type.
An extension method is a static method on a static class. Its first parameter must be of the type being extended and must be prefixed with the this keyword.
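A minimal sketch of these rules, assuming we want a weekend check on DateTime. DateTimeExtensions and IsWeekend are hypothetical names chosen for illustration; they are not part of the .NET library.

```csharp
using System;

// Hypothetical helper: a static class holding the extension method.
public static class DateTimeExtensions
{
    // "this DateTime date" makes the method callable as date.IsWeekend().
    public static bool IsWeekend(this DateTime date)
    {
        return date.DayOfWeek == DayOfWeek.Saturday
            || date.DayOfWeek == DayOfWeek.Sunday;
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        var postDate = new DateTime(2016, 10, 2); // a Sunday

        // Looks like an instance method call...
        Console.WriteLine(postDate.IsWeekend());

        // ...but it is an ordinary static method call underneath.
        Console.WriteLine(DateTimeExtensions.IsWeekend(postDate));
    }
}
```

Both calls do the same thing; the first form is only syntax that the compiler turns into the second.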


















Usage:













In Visual Studio IntelliSense, extension methods are marked with a small down-arrow icon.

Sorting:

Ordering Operators:
 OrderBy,
 OrderByDescending,
 ThenBy,
 ThenByDescending,
 Reverse.

Execution of a query expression that uses these operators is deferred until the code requests an item from the resulting sequence.

return customerList.OrderBy(c => c.LastName)
  .ThenBy(c=> c.FirstName);

To order by two columns we have to use ThenBy. If you call OrderBy twice, the second call re-sorts the whole sequence, so the result is ordered primarily by the last OrderBy key. In such cases use ThenBy for the secondary key.
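The difference can be sketched with a few made-up (LastName, FirstName) pairs. OrderingDemo and its sample data are assumptions for illustration only.

```csharp
using System;
using System.Linq;

public static class OrderingDemo
{
    // Hypothetical sample data as (LastName, FirstName) pairs.
    private static readonly Tuple<string, string>[] People =
    {
        Tuple.Create("Smith", "Zoe"),
        Tuple.Create("Adams", "Mia"),
        Tuple.Create("Smith", "Amy"),
    };

    // Primary key LastName, secondary key FirstName.
    public static string[] WithThenBy()
    {
        return People.OrderBy(p => p.Item1)
                     .ThenBy(p => p.Item2)
                     .Select(p => p.Item1 + ", " + p.Item2)
                     .ToArray();
    }

    // The second OrderBy re-sorts everything by FirstName.
    public static string[] WithTwoOrderBys()
    {
        return People.OrderBy(p => p.Item1)
                     .OrderBy(p => p.Item2)
                     .Select(p => p.Item1 + ", " + p.Item2)
                     .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", WithThenBy()));
        // Adams, Mia | Smith, Amy | Smith, Zoe
        Console.WriteLine(string.Join(" | ", WithTwoOrderBys()));
        // Smith, Amy | Adams, Mia | Smith, Zoe
    }
}
```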

Similarly, for descending order use OrderByDescending and ThenByDescending.

We can handle nulls by making the data type nullable, and we can control where the null values appear, i.e. at the start or at the end:


return customerList.OrderByDescending(c => c.CustomerTypeId.HasValue)   // items with a value sort first, so nulls come at the end
  .ThenBy(c=>c.CustomerTypeId); 
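A small sketch of the same idea on a bare array of nullable ints, so the effect on null placement is easy to see. NullOrderingDemo and its method names are hypothetical.

```csharp
using System;
using System.Linq;

public static class NullOrderingDemo
{
    public static int?[] NullsLast(int?[] ids)
    {
        return ids.OrderByDescending(id => id.HasValue) // true sorts before false
                  .ThenBy(id => id)
                  .ToArray();
    }

    public static int?[] NullsFirst(int?[] ids)
    {
        return ids.OrderBy(id => id.HasValue)           // false sorts before true
                  .ThenBy(id => id)
                  .ToArray();
    }

    public static string Show(int?[] ids)
    {
        return string.Join(", ", ids.Select(id => id.HasValue ? id.Value.ToString() : "null"));
    }

    public static void Main()
    {
        int?[] typeIds = { 3, null, 1, null, 2 };
        Console.WriteLine(Show(NullsLast(typeIds)));  // 1, 2, 3, null, null
        Console.WriteLine(Show(NullsFirst(typeIds))); // null, null, 1, 2, 3
    }
}
```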

Creating:

Sometimes we need to create an enumerable sequence of values. The LINQ generation operators are:
Range -- creates a sequence of integral numbers within a specified range
Repeat -- generates a sequence of one repeated value
These operators are static methods on the Enumerable class, not extension methods. Like the other operators they are deferred: the values are produced only when the sequence is enumerated. Using them we can create sequential series or random series.

var seq1 = Enumerable.Range(0, 10);  // a sequence of 10 integers, 0 through 9
var seq2 = Enumerable.Range(0, 10)
  .Select(i => i * i); // custom logic can be applied to each value

We can write multiple statements inside a lambda expression, as below:
Enumerable.Range(0, 10)
  .Select(i =>
  {
    var k = 0;
    return i * i + k;
  });

var integers = Enumerable.Repeat(-1, 10);  // a sequence containing the value -1 ten times

Random rand = new Random();
// rand.Next(0, 26) returns a random integer from 0 to 25 (the upper bound is exclusive)
var strings = Enumerable.Range(0, 10)
  .Select(i => ((char)('A' + rand.Next(0, 26))).ToString());

Comparing:

These operators compare sequences of objects, for example comparing a processed list to the original list, determining which records differ, or performing data synchronization. All of them are extension methods.

Intersect -- produces the set intersection of two sequences, i.e. the elements present in both: seq1.Intersect(seq2)
Except -- produces the elements of the first sequence that are not in the second: seq1.Except(seq2)
Concat -- merges two sequences, keeping duplicates: seq1.Concat(seq2)
Distinct -- removes duplicate elements: seq1.Concat(seq2).Distinct()
Union -- merges two sequences, keeping only the unique items: seq1.Union(seq2)
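The operators above can be tried on two small integer arrays. This is only a sketch: SetOperatorDemo, Seq1 and Seq2 are illustrative names and values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetOperatorDemo
{
    public static readonly int[] Seq1 = { 1, 2, 3, 4 };
    public static readonly int[] Seq2 = { 3, 4, 5 };

    public static string Show(IEnumerable<int> seq)
    {
        return string.Join(" ", seq);
    }

    public static void Main()
    {
        Console.WriteLine(Show(Seq1.Intersect(Seq2)));         // 3 4
        Console.WriteLine(Show(Seq1.Except(Seq2)));            // 1 2
        Console.WriteLine(Show(Seq1.Concat(Seq2)));            // 1 2 3 4 3 4 5
        Console.WriteLine(Show(Seq1.Concat(Seq2).Distinct())); // 1 2 3 4 5
        Console.WriteLine(Show(Seq1.Union(Seq2)));             // 1 2 3 4 5
    }
}
```

Note that Concat followed by Distinct gives the same result as Union here.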

Projection: refers to the operation of transforming an object into a new form, using the Select or SelectMany operators.
If we do not provide a projection operator, the query returns a sequence of items of the type defined in the original sequence.

If we provide a projection operator, the result is a sequence shaped differently from the sequence we passed in.

Select -- projects each element into a new form
SelectMany -- projects each element to a sequence based on a transform function and flattens the results into one sequence


Examples of Select:

public IEnumerable<string> GetNames(List<Customer> customerList)
        {
            var query = customerList.Select(c => c.LastName + ", " + c.FirstName);
            return query;
        }

Defining anonymous types in LINQ using the new keyword:

public dynamic GetNamesAndEmail(List<Customer> customerList)
        {
            var query = customerList.Select(c => new
                            {
                                Name = c.LastName + ", " + c.FirstName,
                                c.EmailAddress
                            });
           foreach (var item in query)
            {
                Console.WriteLine(item.Name + ":" + item.EmailAddress);
            }
            return query;
        }

Using JOINS

var query = customerList.Join(customerTypeList,
                                c => c.CustomerTypeId,
                                ct => ct.CustomerTypeId,
                                (c, ct) => new
                                {
                                    Name = c.LastName + ", " + c.FirstName,
                                    CustomerTypeName = ct.TypeName
                                });

SelectMany: when the items of a collection have a field that is itself a collection, and you want to filter the parent (master) collection using the child (inner field) values, we can use SelectMany.

A parent object has a collection of related or child objects.

e.g. a customer (parent object) has a set of invoices (which is also a collection), or a shopping cart has the items purchased.
We may need to find information about the parent object using properties of the child data.

e.g. the total price of items in the cart, or customers based on invoice information (not paid, etc.)

We can try this with Select as well.

For example, say we have customers and need to find those with overdue (unpaid) invoices, so we have to check a condition on the child objects.
Doing so with Select returns an enumerable of enumerables of invoices, not customers, as follows:

public IEnumerable<IEnumerable<Invoice>> GetOverdueCustomers1(List<Customer> customerList)
        {
            var query = customerList
                        .Select(c => c.InvoiceList
                                    .Where(i => (i.IsPaid ?? false) == false));
            return query;
        }

But we want a single flat IEnumerable rather than a nested one. SelectMany flattens the result, although it still returns invoices and not customers:

public IEnumerable<Invoice> GetOverdueCustomers2(List<Customer> customerList)
        {
            var query = customerList
                        .SelectMany(c => c.InvoiceList
                                    .Where(i => (i.IsPaid ?? false) == false));
            return query;
        }

To get the list of customers, we use the SelectMany overload that takes a result selector and make the selector return the customer:

public IEnumerable<Customer> GetOverdueCustomers(List<Customer> customerList)
        {
            var query = customerList
                        .SelectMany(c => c.InvoiceList
                                    .Where(i => (i.IsPaid ?? false) == false),
                                    (c,i)=> c).Distinct();
            return query;
        }
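Since the post does not show the Customer and Invoice classes, here is a self-contained sketch with minimal stand-in versions of them (only the properties the query needs; these are assumptions, not the real models). It uses the same SelectMany-with-result-selector shape as above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal models; the post's real classes are not shown.
public class Invoice
{
    public bool? IsPaid { get; set; }
}

public class Customer
{
    public string LastName { get; set; }
    public List<Invoice> InvoiceList { get; set; }
}

public static class SelectManyDemo
{
    public static List<string> OverdueCustomerNames(List<Customer> customers)
    {
        return customers
            .SelectMany(c => c.InvoiceList.Where(i => (i.IsPaid ?? false) == false),
                        (c, i) => c)   // keep the parent, not the invoice
            .Distinct()                // one entry per customer
            .Select(c => c.LastName)
            .ToList();
    }

    public static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { LastName = "Baggins", InvoiceList = new List<Invoice>
                { new Invoice { IsPaid = false }, new Invoice { IsPaid = null } } },
            new Customer { LastName = "Gamgee", InvoiceList = new List<Invoice>
                { new Invoice { IsPaid = true } } },
        };

        // Baggins has two unpaid invoices but appears once thanks to Distinct.
        Console.WriteLine(string.Join(", ", OverdueCustomerNames(customers)));
    }
}
```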

Imp: LINQ uses deferred execution, i.e. the query is not executed until it is used (an IEnumerable is lazily evaluated). When we assign it directly to a data source we will not get any data, so we have to call .ToList():

CustomerGridView.DataSource = customerRepository.SortByName(customerList).ToList();

Imp: When the query produces an anonymous type and we assign the result directly to a data source, it throws an error saying it cannot convert object to IEnumerable, so we need to call .ToList() on the query before passing it.

.ToList() should be called only at the point of assignment, because it causes the query to execute.


GroupBy :

public dynamic GetInvoiceTotalByIsPaid(List<Invoice> invoiceList)
        {
            var query = invoiceList.GroupBy(inv =>inv.IsPaid ?? false,
                                            inv => inv.TotalAmount,
                                            (groupKey, invTotal) => new
                                            {
                                                Key= groupKey,
                                                InvoicedAmount = invTotal.Sum()
                                            });
            foreach ( var item in query)
            {
                Console.WriteLine(item.Key + ": " + item.InvoicedAmount);
            }

            return query;
        }
Here we want the total amount of invoices split by paid and not paid, i.e. we group the invoices by the IsPaid property.
The first parameter is the property to group by (the key selector), the second is the value to aggregate for each element (the element selector), and the third builds the result for each group (the result selector).

The result selector is simple: in a join it receives an item from the first data source and an item from the second, whereas here its parameters are the group key and the sequence of values we selected for aggregation.

To group by two columns, the first parameter returns an object built with the new keyword (an anonymous type):

 var query = invoiceList.GroupBy(inv => new
                            {
                                IsPaid = inv.IsPaid ?? false,
                                InvoiceMonth = inv.InvoiceDate.ToString( "MMMM")
                            },
                                            inv => inv.TotalAmount,
                                            (groupKey, invTotal) => new
                                            {
                                                Key = groupKey,
                                                InvoicedAmount = invTotal.Sum()
                                            });
            foreach ( var item in query)
            {
                Console.WriteLine(item.Key.IsPaid + "/" +
                                    item.Key.InvoiceMonth + ": " + item.InvoicedAmount);
            }


            return query;

Friday, December 25, 2015

Spark Scala : Setting up environment and Project with Scala-IDE

I have been exploring various ways of connecting to standalone Spark from Scala and setting up the project, and found it difficult as a .NET developer without much experience of the Java environment.

Pre-Requisites:

I expect you have installed Spark and the Hadoop ecosystem in a standalone cluster and are trying to connect from your machine.

We can code in Java, Python and Scala; Scala is preferable as the Spark core engine is built in Scala.

We can build the code using sbt (simple build tool). Here I'm going with Maven (even I don't know what it is :-p .... ). I tried sbt but was unable to fix a few issues that went unanswered,
 e.g. the sbt unresolved exception.

Installations Required:

Scala Language
Scala IDE (note that a JDK is required)

1) Open the IDE that came with the Scala IDE download.



2)Creating a new project:

Create a new Maven project with the steps below:
Right click in the Package Explorer --> New --> Project --> type maven in the textbox or just expand the Maven folder below --> Maven Project (and as usual next, next :-p; anyway, don't forget to give your Artifact Id and Group Id).

To add Scala files, classes or objects you have to add the Scala nature to the project, by right clicking on the project:
Configure --> Add Scala Nature.

You can see the pom.xml file, which holds the entire configuration for the project and where you can list the project's dependencies (which will be downloaded).




You will end up with the issues below after the build (you can see "Building workspace" at the bottom right).

These errors can be resolved by:
Downloading Maven and adding it to the environment variables
Right clicking on the project in Eclipse -> Maven -> Update Project, and on that screen selecting the check box Force Update for Snapshots/Releases

Why update: sometimes, while dependencies are being downloaded, a proxy may block the download and leave it incomplete. You will see a red cross mark on pom.xml, and the exact error says: multiple annotations found at this line in pom.xml.

This will clear the error, and now when you build you will end up with the error below...
You can fix this Scala version incompatibility by right clicking on the project --> Properties --> Scala Compiler and checking Use Project Settings.


Now go ahead and write your "Hello World" program.
In your project --> right click on the folder src/main/java --> New --> Package --> give it a name. Then create an object in it, write the program as below, and run it.




There you go...with your 1st program in scala...!


P.S: In the next post we will see how to connect to a Hadoop standalone cluster and write a word count program in Scala.

If you face any issues or feel the content is wrong anywhere, feel free to contact me @ sudhir9p@gmail.com







Saturday, November 14, 2015

C# : Part2 [Basics]


In the previous article we saw the intricacies of a few predefined keywords and concepts, such as access modifiers, class definitions, parameters, and mutable and immutable objects. Today let us see other topics: delegates, events and flow control statements.

Delegate: a type that references methods
eg: public delegate void DisplayMessage(string message);

Now we have to declare a variable of this type and make it point to a method, which can be a static or an instance method.
-->The method we point to must have the return type void (i.e. the same as the delegate) and accept a string as its input parameter.

Let's say we have a class Display:

class Display
 {
    public void showMyMessage(string message)
     {
       Console.WriteLine(message);
     }
 }

Display objDisplay1 = new Display();
Now we can invoke 'showMyMessage' by creating a variable of the declared delegate type and making it point to this method, as below:

DisplayMessage objDisplayMessage = new DisplayMessage(objDisplay1.showMyMessage);

objDisplayMessage("Hi sudhirr");

Why delegates: abstraction, as the caller does not need to know what happens behind the delegate, for example when we expose functionality through services. A delegate also has an invocation list, i.e. a single delegate can invoke any number of methods.
eg: objDisplayMessage +=objDisplay1.showMyMessage;
      objDisplayMessage +=objDisplay2.showMyMessage;

This can cause problems: if a delegate is exposed publicly, any caller can write objDisplayMessage = null; and destroy the invocation list. This is where events come into the picture: an event is like a delegate, but outside code cannot assign to it, so there is no chance of it being set to null from outside.

Events:
Events are variables of a delegate type declared with the event keyword.
We can attach handlers with += and detach them with -=.
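A minimal sketch of the difference, reusing the DisplayMessage delegate from above. The Publisher class, the MessageReady event and the EventDemo class are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public delegate void DisplayMessage(string message);

public class Publisher
{
    // With "event", outside code may only use += and -=.
    // Assigning null or invoking it from outside does not compile.
    public event DisplayMessage MessageReady;

    public void Publish(string message)
    {
        // Invokes every subscribed handler in order, if there are any.
        MessageReady?.Invoke(message);
    }
}

public static class EventDemo
{
    public static List<string> Run()
    {
        var log = new List<string>();
        var publisher = new Publisher();

        DisplayMessage second = m => log.Add("second: " + m);
        publisher.MessageReady += m => log.Add("first: " + m);
        publisher.MessageReady += second;
        publisher.Publish("Hi");

        publisher.MessageReady -= second; // detach one handler
        publisher.Publish("Bye");

        // publisher.MessageReady = null; // compile error outside Publisher
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
        {
            Console.WriteLine(line);
        }
        // first: Hi
        // second: Hi
        // first: Bye
    }
}
```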

using event with a named method:
btnGo_Click+= buttonclick;
void buttonclick(object sender, RoutedEventArgs e)
{
    objPerson.Name ="Teja";
}
using delegate (anonymous method):

btnGo_Click+=delegate(object sender, RoutedEventArgs e)
{
     objPerson.Name ="Teja";

};

i.e. supplying the handler inline by creating an anonymous method.

Constructors:
There are static constructors and instance constructors.
Static constructors: have the static keyword; there can be only one static constructor per type (i.e. no overloading) and it cannot have an access modifier or parameters. For any given type it is executed only once at run time, before the type is first used, no matter how many times the type is used or instantiated.
-->The static constructor executes before the instance constructor.

Constructor chaining, just to mention it:
public Person()
            : this("Test")
  {

  }
This constructor just passes control to the constructor that accepts a string parameter, which runs first.
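The order of execution can be sketched by recording each constructor call. The Trace list and the ConstructorDemo class are assumptions added for illustration; only the Person() : this("Test") shape comes from the post.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    // Records the order in which the constructors run.
    public static readonly List<string> Trace = new List<string>();

    public string Name { get; private set; }

    // Runs once, before the first instance is created.
    static Person()
    {
        Trace.Add("static constructor");
    }

    // Chains to the string overload, which runs first.
    public Person() : this("Test")
    {
        Trace.Add("Person()");
    }

    public Person(string name)
    {
        Name = name;
        Trace.Add("Person(string)");
    }
}

public static class ConstructorDemo
{
    public static void Main()
    {
        var first = new Person();
        var second = new Person("Mahesh");

        Console.WriteLine(first.Name + ", " + second.Name);
        // Test, Mahesh
        Console.WriteLine(string.Join(" -> ", Person.Trace));
        // static constructor -> Person(string) -> Person() -> Person(string)
    }
}
```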

Object initializers:
Generally we create an instance of a class and then assign values to its properties, like:
Person objPerson = new Person();
objPerson.Name = "Mahesh";

objPerson.Id = 1;

With an object initializer the same code is written as below:
Person objPerson = new Person
{
  Name = "Mahesh",
  Id = 1

};

ForEach:
Syntax:  
int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
foreach (int i in numbers)
{
  Console.WriteLine("Number is " + i);

}

What happens internally :
int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
IEnumerator ObjEnumerator = numbers.GetEnumerator();
while (ObjEnumerator.MoveNext())
  {
  Console.WriteLine("Number is " + (int)ObjEnumerator.Current);
  }

So to use foreach on an object, its type should implement IEnumerable (or at least expose a suitable GetEnumerator method); internally, C# iterates through the object using code like the above.
If we try to apply foreach to an object that does not provide this, we get a compile error saying the type does not contain a public definition for GetEnumerator.
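As a sketch of what a type needs in order to work with foreach, here is a small made-up collection that implements IEnumerable<int>. Countdown and ForeachDemo are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical collection type used only for illustration.
public class Countdown : IEnumerable<int>
{
    private readonly int _start;

    public Countdown(int start)
    {
        _start = start;
    }

    // foreach calls this to obtain the enumerator it loops over.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = _start; i >= 1; i--)
        {
            yield return i;
        }
    }

    // The non-generic interface member forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class ForeachDemo
{
    public static void Main()
    {
        foreach (int n in new Countdown(3))
        {
            Console.WriteLine(n); // 3, then 2, then 1
        }
    }
}
```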

Jump statements:
break: just as we use break after a case in a switch to come out of the switch, we use it in looping statements to come out of the loop (only the innermost enclosing loop) and continue executing the code below it.
continue: like break, it skips the rest of the code in the loop body, but it then carries on with the next iteration (whereas break stops all further iterations and comes out of the loop).
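A short sketch using the same numbers array as the foreach example: continue skips odd numbers, and break leaves the loop at the first even number above 50. The JumpDemo class and the chosen conditions are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class JumpDemo
{
    public static List<int> Collect(int[] numbers)
    {
        var kept = new List<int>();
        foreach (int n in numbers)
        {
            if (n % 2 != 0)
            {
                continue; // skip odd numbers and move to the next iteration
            }
            if (n > 50)
            {
                break;    // leave the loop entirely; later items are never seen
            }
            kept.Add(n);
        }
        return kept;
    }

    public static void Main()
    {
        int[] numbers = { 16, 53, 2, 31, 51, 56, 22, 28, 13, 42 };
        Console.WriteLine(string.Join(", ", Collect(numbers))); // 16, 2
    }
}
```

The loop stops at 56, so 22, 28 and 42 are never added even though they would pass both checks.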

goto: skips the lines below it and jumps to the statement marked with a label


foreach (int i in numbers)
 {
   if (i == 53)
   goto DisplayMe;
 }
  DisplayMe:

      Console.WriteLine("My Name");
This DisplayMe label may be inside the foreach or elsewhere in the method. If it is outside the loop, the iteration stops and execution continues from the label.

return: forces control to leave the method; we can use it in a void method too, as just return;
We can use yield return to build up a collection, i.e. an IEnumerable object, in a lazy manner:

foreach (int n in GetEnumerable())
 {
   Console.WriteLine(n);

 }

public static IEnumerable GetEnumerable()
{
  yield return 10;
  yield return 20;
  yield return 30;
}

We get the output 10 20 30, i.e. a method using 'yield return' returns a collection, an IEnumerable (which is what foreach operates on). Lazy here means that it does not run GetEnumerable to completion and hand back 10, 20, 30 all at once as a collection. After returning 10 it executes the body of the foreach loop, then goes back to where it stopped in GetEnumerable and continues.
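The interleaving can be made visible by logging each step on both sides. This is a sketch only; LazyDemo, GetNumbers and the trace list are assumptions added for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Each step is recorded so the interleaving becomes visible.
    public static IEnumerable<int> GetNumbers(List<string> trace)
    {
        trace.Add("producing 10");
        yield return 10;
        trace.Add("producing 20");
        yield return 20;
    }

    public static List<string> Run()
    {
        var trace = new List<string>();
        foreach (int n in GetNumbers(trace))
        {
            trace.Add("consumed " + n);
        }
        return trace;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run()));
        // producing 10, consumed 10, producing 20, consumed 20
    }
}
```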

throw: raises an exception