Cloudera CCA175 - CCA Spark and Hadoop Developer Exam
Problem Scenario 83 : In continuation of the previous question, please accomplish the following activities (a solution sketch follows the list).
1. Select all the records with quantity >= 5000 and name starting with 'Pen'
2. Select all the records with quantity >= 5000, price less than 1.24, and name starting with 'Pen'
3. Select all the records which do not have quantity >= 5000 and whose name does not start with 'Pen'
4. Select all the products whose name is 'Pen Red' or 'Pen Black'
5. Select all the products whose price is BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000.
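A minimal sketch of one possible answer, run in the spark-shell. Because the "previous question" is not reproduced here, the temp table name products and its columns (product_code, name, quantity, price) are assumptions inferred from the tasks above.

val r1 = sqlContext.sql("SELECT * FROM products WHERE quantity >= 5000 AND name LIKE 'Pen%'")
val r2 = sqlContext.sql("SELECT * FROM products WHERE quantity >= 5000 AND price < 1.24 AND name LIKE 'Pen%'")
// Literal reading of task 3: neither condition holds
val r3 = sqlContext.sql("SELECT * FROM products WHERE NOT (quantity >= 5000) AND name NOT LIKE 'Pen%'")
val r4 = sqlContext.sql("SELECT * FROM products WHERE name IN ('Pen Red', 'Pen Black')")
val r5 = sqlContext.sql("SELECT * FROM products WHERE price BETWEEN 1.0 AND 2.0 AND quantity BETWEEN 1000 AND 2000")
Seq(r1, r2, r3, r4, r5).foreach(_.show())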
Problem Scenario 25 : You have been given the below comma-separated employee information, which needs to be appended to the file /home/cloudera/flumetest/in.txt (to be used with a tail source).
sex,name,city
1,alok,mumbai
1,jatin,chennai
1,yogesh,kolkata
2,ragini,delhi
2,jyotsana,pune
1,valmiki,banglore
Create a Flume conf file using the fastest non-durable channel (the memory channel), which writes data into the Hive warehouse directory, into two separate tables called flumemaleemployee1 and flumefemaleemployee1
(create the Hive tables for the given data as well). Please use a tail source with the /home/cloudera/flumetest/in.txt file; a sample configuration sketch follows.
flumemaleemployee1 : will contain only male employees' data. flumefemaleemployee1 : will contain only female employees' data.
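A sketch of one possible answer, assuming sex code 1 = male and 2 = female as in the sample data; the agent name agent1, the conf file name flume25.conf, and the channel settings are illustrative choices, not requirements from the question.

First create the backing Hive tables:

hive -e "CREATE TABLE flumemaleemployee1 (sex int, name string, city string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; CREATE TABLE flumefemaleemployee1 (sex int, name string, city string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"

Then the Flume configuration (flume25.conf):

# Tail the input file, extract the leading sex code into a header,
# and route events to one of two memory channels via a multiplexing selector.
agent1.sources = tailsrc
agent1.channels = mem1 mem2
agent1.sinks = hdfs1 hdfs2

agent1.sources.tailsrc.type = exec
agent1.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent1.sources.tailsrc.channels = mem1 mem2
agent1.sources.tailsrc.interceptors = i1
agent1.sources.tailsrc.interceptors.i1.type = regex_extractor
agent1.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent1.sources.tailsrc.interceptors.i1.serializers = s1
agent1.sources.tailsrc.interceptors.i1.serializers.s1.name = sex
agent1.sources.tailsrc.selector.type = multiplexing
agent1.sources.tailsrc.selector.header = sex
agent1.sources.tailsrc.selector.mapping.1 = mem1
agent1.sources.tailsrc.selector.mapping.2 = mem2

# Fastest non-durable channel: memory
agent1.channels.mem1.type = memory
agent1.channels.mem2.type = memory

agent1.sinks.hdfs1.type = hdfs
agent1.sinks.hdfs1.channel = mem1
agent1.sinks.hdfs1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent1.sinks.hdfs1.hdfs.fileType = DataStream

agent1.sinks.hdfs2.type = hdfs
agent1.sinks.hdfs2.channel = mem2
agent1.sinks.hdfs2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent1.sinks.hdfs2.hdfs.fileType = DataStream

Start the agent with: flume-ng agent --conf /etc/flume-ng/conf --conf-file flume25.conf --name agent1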
Problem Scenario 27 : You need to implement a near-real-time solution for collecting information as it is submitted in files, using the below information.
Data
echo "IBM,100,20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM,103,20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After a few minutes
echo "IBM,100.2,20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM,103.1,20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Requirements:
You have been given the below directory location (if not available then create it): /tmp/spooldir. You have a financial subscription for getting stock prices from Bloomberg as well as
Reuters, and using FTP you download new files every hour from their respective FTP sites into the directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.
As soon as a file is committed in these directories, it needs to be available in HDFS at the /tmp/flume/finance location, in a single directory.
Write a Flume configuration file named flume7.conf and use it to load data into HDFS with the following additional properties (a sample configuration sketch follows the list).
1. Spool /tmp/spooldir/bb and /tmp/spooldir/dr
2. The file prefix in HDFS should be events
3. The file suffix should be .log
4. If a file is not committed and is still in use, it should have _ as a prefix.
5. Data should be written as text to HDFS
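A sketch of one possible flume7.conf; the agent name agent1 and the use of a file channel are illustrative choices, since the question does not constrain them.

# Two spooling-directory sources feeding a single HDFS sink
agent1.sources = source1 source2
agent1.channels = channel1
agent1.sinks = sink1

agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir/bb
agent1.sources.source1.channels = channel1

agent1.sources.source2.type = spooldir
agent1.sources.source2.spoolDir = /tmp/spooldir/dr
agent1.sources.source2.channels = channel1

agent1.channels.channel1.type = file

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.channel = channel1
agent1.sinks.sink1.hdfs.path = /tmp/flume/finance
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream

Start it with: flume-ng agent --conf /etc/flume-ng/conf --conf-file flume7.conf --name agent1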
Problem Scenario 10 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following (a sample command sequence follows the list).
1. Create a database named hadoopexam and then create a table named departments in it, with the following fields: department_id int, department_name string.
e.g. the location should be hdfs://quickstart.cloudera:8020/user/hive/warehouse/hadoopexam.db/departments
2. Import data into the existing table created above, from retail_db.departments into the Hive table hadoopexam.departments.
3. Import data into a non-existing table; that is, while importing, create a Hive table named hadoopexam.departments_new.
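A sketch of one possible sequence on the quickstart VM; the flags follow standard Sqoop usage, and -m 1 (a single mapper) is an illustrative choice.

# Step 1: create the database and table in Hive
hive -e "CREATE DATABASE hadoopexam; CREATE TABLE hadoopexam.departments (department_id int, department_name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"

# Step 2: import into the existing Hive table
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import --hive-table hadoopexam.departments \
  --fields-terminated-by ',' -m 1

# Step 3: import and create the new Hive table in one go
sqoop import \
  --connect jdbc:mysql://quickstart:3306/retail_db \
  --username retail_dba --password cloudera \
  --table departments \
  --hive-import --create-hive-table \
  --hive-table hadoopexam.departments_new \
  --fields-terminated-by ',' -m 1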
Problem Scenario 44 : You have been given 4 files, with the content as given below:
spark11/file1.txt
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework
spark11/file2.txt
The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.
spark11/file3.txt
This approach takes advantage of data locality, where nodes manipulate the data they have access to, to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking
spark11/file4.txt
Apache Storm is focused on stream processing or what some call complex event processing. Storm implements a fault tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format
(spark11/file1.txt)
(spark11/file2.txt)
(spark11/file3.txt)
(spark11/file4.txt)
Write a Spark program which will give you the highest-occurring words in each file, along with their file name.
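A minimal sketch in the spark-shell, assuming the four files sit under a spark11 directory as listed above; it keeps every word that ties for the highest count in a file.

val files = sc.wholeTextFiles("spark11/*")
val topWords = files.map { case (fileName, content) =>
  // Count words in this file (lower-cased, split on non-word characters)
  val counts = content.toLowerCase.split("\\W+").filter(_.nonEmpty)
    .groupBy(identity).mapValues(_.length)
  val max = counts.values.max
  (fileName, counts.filter(_._2 == max).keys.mkString(", "))
}
topWords.collect().foreach(println)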
Problem Scenario 84 : In continuation of the previous question, please accomplish the following activities (a solution sketch follows the list).
1. Select all the products which have product code as null
2. Select all the products whose name starts with 'Pen'; results should be ordered by price in descending order.
3. Select all the products whose name starts with 'Pen'; results should be ordered by price in descending order and quantity in ascending order.
4. Select the top 2 products by price.
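Again a sketch against the assumed products temp table from the earlier scenarios:

val r1 = sqlContext.sql("SELECT * FROM products WHERE product_code IS NULL")
val r2 = sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC")
val r3 = sqlContext.sql("SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC, quantity ASC")
val r4 = sqlContext.sql("SELECT * FROM products ORDER BY price DESC LIMIT 2")
Seq(r1, r2, r3, r4).foreach(_.show())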
Problem Scenario 70 : Write a Spark application using Python which reads a file "Content.txt" (on HDFS) with the following content, does the word count, and saves the results in a directory called "problem85" (on HDFS); a sketch follows the sample content.
Content.txt
Hello this is ABCTECH.com
This is XYZTECH.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce
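A minimal sketch (this one block is PySpark, since the problem explicitly asks for Python); the paths are taken verbatim from the question.

# wordcount.py - run with: spark-submit wordcount.py
from pyspark import SparkContext

sc = SparkContext(appName="Problem85WordCount")

# Read Content.txt from HDFS, split lines into words, and count each word
counts = (sc.textFile("Content.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("problem85")
sc.stop()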
Problem Scenario 53 : You have been given the below code snippet.
val a = sc.parallelize(1 to 10, 3)
operation1
b.collect
Output 1
Array[Int] = Array(2, 4, 6, 8, 10)
operation2
Output 2
Array[Int] = Array(1, 2, 3)
Write a correct code snippet for operation1 and operation2 which will produce the desired output shown above.
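One possible answer, as a sketch: operation1 keeps the even numbers; operation2 is not fully pinned down by the snippet, so the version below (halving b and keeping values up to 3) is just one transformation that yields Array(1, 2, 3).

val a = sc.parallelize(1 to 10, 3)
// operation1: keep the even numbers -> Array(2, 4, 6, 8, 10)
val b = a.filter(_ % 2 == 0)
b.collect
// operation2 (illustrative): halve b and keep values <= 3 -> Array(1, 2, 3)
val c = b.map(_ / 2).filter(_ <= 3)
c.collect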