Cloudera CCA175 - CCA Spark and Hadoop Developer Exam

Problem Scenario 83 : In continuation of the previous question, please accomplish the following activities.

1. Select all the records with quantity >= 5000 and whose name starts with 'Pen'

2. Select all the records with quantity >= 5000, price less than 1.24, and name starting with 'Pen'

3. Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'

4. Select all the products whose name is 'Pen Red' or 'Pen Black'

5. Select all the products whose price is BETWEEN 1.0 AND 2.0 AND quantity is BETWEEN 1000 AND 2000.
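
A minimal spark-shell sketch for these selections, assuming the previous scenario registered a temporary table named products with columns name, quantity, and price (the table and column names are assumptions carried over from that question):

// 1. quantity >= 5000 and name starting with 'Pen'
sqlContext.sql("SELECT * FROM products WHERE quantity >= 5000 AND name LIKE 'Pen%'").show()
// 2. additionally, price below 1.24
sqlContext.sql("SELECT * FROM products WHERE quantity >= 5000 AND price < 1.24 AND name LIKE 'Pen%'").show()
// 3. neither condition holds
sqlContext.sql("SELECT * FROM products WHERE quantity < 5000 AND name NOT LIKE 'Pen%'").show()
// 4. exact name matches
sqlContext.sql("SELECT * FROM products WHERE name IN ('Pen Red', 'Pen Black')").show()
// 5. both range conditions
sqlContext.sql("SELECT * FROM products WHERE price BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000").show()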

Problem Scenario 25 : You have been given the comma-separated employee information below, which needs to be appended to the file /home/cloudera/flumetest/in.txt (to act as a tail source).

sex,name,city

1,alok,mumbai

1,jatin,chennai

1,yogesh,kolkata

2,ragini,delhi

2,jyotsana,pune

1,valmiki,banglore

Create a Flume conf file using the fastest non-durable channel, which writes the data into the Hive warehouse directory, in two separate tables called flumemaleemployee1 and flumefemaleemployee1

(Create the Hive tables as well for the given data.) Please use a tail source with the /home/cloudera/flumetest/in.txt file.

flumemaleemployee1 : will contain only male employees' data

flumefemaleemployee1 : will contain only female employees' data
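
A minimal agent configuration sketch under stated assumptions: the agent name agent1, the warehouse paths, and the mapping of sex=1 to male and sex=2 to female are assumptions; the memory channel serves as the fastest non-durable channel, and the Hive CREATE TABLE statements are omitted:

agent1.sources = tailsrc
agent1.channels = mem1 mem2
agent1.sinks = malesink femalesink

# tail source reading the input file
agent1.sources.tailsrc.type = exec
agent1.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt

# pull the leading sex field (1 or 2) into a header named "sex"
agent1.sources.tailsrc.interceptors = i1
agent1.sources.tailsrc.interceptors.i1.type = regex_extractor
agent1.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent1.sources.tailsrc.interceptors.i1.serializers = s1
agent1.sources.tailsrc.interceptors.i1.serializers.s1.name = sex

# route on the header: 1 -> male channel, 2 -> female channel
agent1.sources.tailsrc.selector.type = multiplexing
agent1.sources.tailsrc.selector.header = sex
agent1.sources.tailsrc.selector.mapping.1 = mem1
agent1.sources.tailsrc.selector.mapping.2 = mem2
agent1.sources.tailsrc.channels = mem1 mem2

# fastest non-durable channels
agent1.channels.mem1.type = memory
agent1.channels.mem2.type = memory

# HDFS sinks writing plain text into the Hive warehouse directories
agent1.sinks.malesink.type = hdfs
agent1.sinks.malesink.channel = mem1
agent1.sinks.malesink.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent1.sinks.malesink.hdfs.fileType = DataStream
agent1.sinks.femalesink.type = hdfs
agent1.sinks.femalesink.channel = mem2
agent1.sinks.femalesink.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent1.sinks.femalesink.hdfs.fileType = DataStream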

Problem Scenario 27 : You need to implement a near-real-time solution for collecting information as it is submitted in files, with the information below.

Data

echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt

echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt

mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt

After a few minutes:

echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt

echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt

mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt

Requirements:

You have been given the directory location /tmp/spooldir (if not available, then create it). You have a financial subscription for getting stock prices from Bloomberg as well as Reuters, and using FTP you download new files every hour from their respective FTP sites into the directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.

As soon as a file is committed in either directory, it needs to be available in HDFS in the /tmp/flume/finance location, in a single directory.

Write a Flume configuration file named flume7.conf and use it to load data into HDFS with the following additional properties.

1. Spool /tmp/spooldir/bb and /tmp/spooldir/dr

2. The file prefix in HDFS should be events

3. The file suffix should be .log

4. If a file is not committed and still in use, then it should have _ as a prefix.

5. Data should be written as text to HDFS
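
A minimal flume7.conf sketch; the agent name agent1 is an assumption, while the paths, prefixes, and text output come from the requirements above:

agent1.sources = bbsrc drsrc
agent1.channels = mem1
agent1.sinks = hdfssink

# 1. spool both download directories
agent1.sources.bbsrc.type = spooldir
agent1.sources.bbsrc.spoolDir = /tmp/spooldir/bb
agent1.sources.bbsrc.channels = mem1
agent1.sources.drsrc.type = spooldir
agent1.sources.drsrc.spoolDir = /tmp/spooldir/dr
agent1.sources.drsrc.channels = mem1

# fastest non-durable channel
agent1.channels.mem1.type = memory

# one HDFS directory with the required naming rules:
# 2. file prefix, 3. file suffix, 4. in-use prefix, 5. plain text output
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.channel = mem1
agent1.sinks.hdfssink.hdfs.path = /tmp/flume/finance
agent1.sinks.hdfssink.hdfs.filePrefix = events
agent1.sinks.hdfssink.hdfs.fileSuffix = .log
agent1.sinks.hdfssink.hdfs.inUsePrefix = _
agent1.sinks.hdfssink.hdfs.fileType = DataStream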

Problem Scenario 10 : You have been given the following MySQL database details as well as other info.

user=retail_dba

password=cloudera

database=retail_db

jdbc URL = jdbc:mysql://quickstart:3306/retail_db

Please accomplish the following.

1. Create a database named hadoopexam and then create a table named departments in it, with the following fields: department_id int, department_name string

e.g. the location should be hdfs://quickstart.cloudera:8020/user/hive/warehouse/hadoopexam.db/departments

2. Please import data into the existing table created above, from retail_db.departments into the Hive table hadoopexam.departments.

3. Please import data into a non-existing table; that is, while importing, create a Hive table named hadoopexam.departments_new.
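
A minimal command sketch under stated assumptions: Sqoop's Hive import appends to the existing table in step 2 and creates the table itself in step 3 (if the intermediate HDFS directory from step 2 survives, it may need clearing before step 3):

# 1. create the database and table in Hive
hive -e "CREATE DATABASE hadoopexam; CREATE TABLE hadoopexam.departments (department_id INT, department_name STRING);"

# 2. import into the existing Hive table
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import --hive-table hadoopexam.departments

# 3. import while creating hadoopexam.departments_new
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import --create-hive-table --hive-table hadoopexam.departments_new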

Problem Scenario 44 : You have been given 4 files, with the content as given below:

spark11/file1.txt

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework

spark11/file2.txt

The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.

spark11/file3.txt

This approach takes advantage of data locality nodes manipulating the data they have access to to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking

spark11/file4.txt

Apache Storm is focused on stream processing or what some call complex event processing. Storm implements a fault tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format

Write a Spark program which will give you the highest-occurring words in each file, along with their file names.
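
A minimal spark-shell sketch, assuming the four files sit under a spark11/ directory on HDFS, that splitting on whitespace is acceptable tokenization, and that a tie for the top count may resolve to any one of the tied words:

val topWords = sc.wholeTextFiles("spark11/*")
  .map { case (fileName, content) =>
    // count each word within this file
    val counts = content.toLowerCase.split("\\s+").filter(_.nonEmpty)
      .groupBy(identity).mapValues(_.size)
    // keep the word with the highest count
    val (word, count) = counts.maxBy(_._2)
    (fileName, word, count)
  }
topWords.collect().foreach(println)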

Problem Scenario 84 : In continuation of the previous question, please accomplish the following activities.

1. Select all the products which have a null product code

2. Select all the products whose name starts with Pen; the results should be ordered by price in descending order.

3. Select all the products whose name starts with Pen; the results should be ordered by price in descending order and quantity in ascending order.

4. Select the top 2 products by price
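
A minimal spark-shell sketch against the same assumed products table, here additionally assuming a nullable column named code:

// 1. null product code
sqlContext.sql("SELECT * FROM products WHERE code IS NULL").show()
// 2. names starting with Pen, price descending
sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC").show()
// 3. price descending, then quantity ascending
sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC, quantity ASC").show()
// 4. top 2 products by price
sqlContext.sql("SELECT * FROM products ORDER BY price DESC LIMIT 2").show()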

Problem Scenario 70 : Write a Spark application using Python which reads a file "Content.txt" (on HDFS) with the following content. Do the word count and save the results in a directory called "problem85" (on HDFS).

Content.txt

Hello this is ABCTECH.com

This is XYZTECH.com

Apache Spark Training

This is Spark Learning Session

Spark is faster than MapReduce
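
A minimal PySpark sketch; the input and output paths come from the problem statement, while the application name and whitespace tokenization are assumptions:

from pyspark import SparkConf, SparkContext

# application name is an assumption
sc = SparkContext(conf=SparkConf().setAppName("problem85"))

# read the file, split lines into words, count each word, save as text
counts = (sc.textFile("Content.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("problem85")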

Problem Scenario 53 : You have been given the below code snippet.

val a = sc.parallelize(1 to 10, 3)

operation1

b.collect

Output 1

Array[Int] = Array(2, 4, 6, 8, 10)

operation2

Output 2

Array[Int] = Array(1, 2, 3)

Write correct code snippets for operation1 and operation2 which will produce the desired output shown above.
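
A minimal sketch of one possible answer; other operations can produce the same arrays, so the take(3) choice for operation2 is an assumption:

// operation1: keep only the even numbers, so b.collect yields Array(2, 4, 6, 8, 10)
val b = a.filter(_ % 2 == 0)
b.collect
// operation2: take(3) returns the first three elements as Array(1, 2, 3);
// a.glom().first() (the contents of the first of the three partitions) would also work
a.take(3)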