Friday, December 25, 2015

Spark with Scala: Setting up the environment and a project with Scala IDE

I have been exploring various aspects of connecting to a standalone Spark cluster from Scala and setting up the project, and I found it difficult as a .NET developer, since we don't have much experience working in a Java environment.

Prerequisites:

I assume you have installed Spark and the Hadoop ecosystem on a standalone cluster and are trying to connect from your machine.

We can code in Java, Python, or Scala; Scala is preferable as the Spark core engine is built in Scala.

We can build the code using sbt (Simple Build Tool), but here I'm going with Maven (even though I don't know much about it :-p ....). I tried sbt but was unable to fix a few issues around an unresolved dependency exception, and the questions I found on it were left unanswered.

Installations Required:

Scala language
Scala IDE (however, a JDK is also required)

1) Open up the IDE that came with the Scala IDE download.



2) Create a new project:

Create a new Maven project with the steps below:
Right-click in the Package Explorer --> New --> Project --> type "maven" in the textbox (or just expand the Maven folder) --> Maven Project. (And as usual, Next, Next :-p. Anyway, don't forget to give your Group ID and Artifact ID.)

To add Scala files, classes, or objects, you have to add the Scala nature to the project by right-clicking on the project -->
Configure --> Add Scala Nature.

You can see the pom.xml file, where the entire configuration for the project is kept and where you declare the dependencies of the project (which Maven will download).
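As a rough sketch, the dependencies section of pom.xml could look like the snippet below. The Spark version (1.5.2) and the Scala suffix (_2.10) are assumptions on my part; use whatever matches your cluster.

    <dependencies>
      <!-- Spark core; the artifact suffix (_2.10) must match your Scala version -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.5.2</version>
      </dependency>
    </dependencies>

Once you save pom.xml, Maven resolves and downloads these jars into your local repository, so they show up under Maven Dependencies in the project.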




You will end up with the issues below after the build (you can see "Building workspace" at the bottom right).

These errors can be resolved by:
Downloading Maven and adding it to the PATH environment variable
Right-clicking on the project in Eclipse -> Maven -> Update Project; on this screen, select the checkbox "Force Update of Snapshots/Releases"

Why update? Sometimes while downloading dependencies a proxy may deny requests and leave the downloads incomplete. You can see a red cross mark on pom.xml, and when you look at the exact error it says: "Multiple annotations found at this line" in pom.xml.

This will clear the error, but now when you build you will end up with the error below...
You can fix this Scala version incompatibility by right-clicking on the project --> Properties --> Scala Compiler, checking "Use Project Settings", and picking the Scala version your Spark dependency expects.


Now go ahead and write your "Hello World" program.
In your project --> right-click on the folder src/main/java --> New --> Package --> give it a name; then create a Scala object in that package, write the program as below, and run it.
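The screenshot of the program isn't reproduced here, so here is a minimal sketch of such a Scala object. The package name com.example.spark is just a placeholder; use whatever package name you created above.

    package com.example.spark // placeholder package name

    // A minimal Scala "Hello World" object; run it from the IDE as a Scala Application
    object HelloWorld {
      def main(args: Array[String]): Unit = {
        println("Hello, World from Scala!")
      }
    }

Right-click the object --> Run As --> Scala Application, and the message should appear in the Console view.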




There you go... your first program in Scala...!


P.S.: In the next post we will see how to connect to the Hadoop standalone cluster and write a word count program in Scala.

If you face any issues or feel any of the content here is wrong, feel free to contact me at sudhir9p@gmail.com