Questions tagged [hadoop]

Hadoop is a framework for the distributed storage and processing of large data sets across clusters of machines.

Apache Hadoop includes the following modules:

  • Hadoop Common: common utilities that support the other Hadoop modules
  • Hadoop Distributed File System (HDFS): distributed storage for application data
  • Hadoop YARN: job scheduling and cluster resource management
  • Hadoop MapReduce: YARN-based parallel processing of large data sets
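
A minimal sketch of how these modules fit together in practice: store data in HDFS, then run a MapReduce job on it via YARN. Paths and the bundled examples jar name are assumptions and vary by distribution.

    # store input in HDFS (distributed storage)
    hdfs dfs -mkdir -p /user/demo/input
    hdfs dfs -put local.txt /user/demo/input/
    # submit a MapReduce job to the YARN resource manager
    yarn jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/demo/input /user/demo/output
    # read the reducer output back out of HDFS
    hdfs dfs -cat /user/demo/output/part-r-00000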

It is used by such heavyweights as Facebook and Twitter.

Apache projects built on Hadoop should probably use their own tags, e.g. Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, Spark, Tez, and ZooKeeper.

69 questions
4 votes · 3 answers

bind failure, address in use: Unable to use a TCP port for both source and destination?

I'm debugging Hadoop DataNodes that won't start. We are also running SaltStack and Elasticsearch on the machines. The Hadoop DataNode error is pretty clear: java.net.BindException: Problem binding to [0.0.0.0:50020] java.net.BindException:…
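
For the question above, one hedged first step (assuming the conflict is another local process rather than a second DataNode instance) is to find what already owns port 50020, then either stop it or move the DataNode's IPC endpoint, which is the dfs.datanode.ipc.address property in hdfs-site.xml:

    # find the process already bound to the DataNode IPC port
    ss -lntp | grep ':50020'     # or: lsof -i :50020
    # if that process has to keep the port, relocate the DataNode IPC
    # endpoint in hdfs-site.xml (port 50021 here is an arbitrary example):
    #   <property>
    #     <name>dfs.datanode.ipc.address</name>
    #     <value>0.0.0.0:50021</value>
    #   </property>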
1 vote · 0 answers

jps process issue when installing Hadoop

I'm trying to install Hadoop in fully distributed mode on CentOS 6.4 (using 4 VirtualBox machines): server1 NameNode; server2 SecondaryNameNode, DataNode; server3 DataNode; server4 DataNode. I think I'm almost done, but on server1 I typed…
Seung • 11 • 1
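
A common first check for a question like this is to run jps on every node and compare the running daemons against the roles each machine should host. The hostnames below come from the question; the expected process lists are an assumption based on a standard fully distributed layout.

    # expected, roughly: server1 -> NameNode; server2 -> SecondaryNameNode, DataNode;
    # server3 and server4 -> DataNode
    for h in server1 server2 server3 server4; do
        echo "== $h =="
        ssh "$h" jps | grep -v Jps
    done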
1 vote · 2 answers

Delete files older than 10 days from HDFS

I am writing a ksh script to clean up HDFS directories and files that are at least 10 days old. I am testing the deletion command in a terminal, but it keeps failing: $ hdfs dfs -find "/file/path/file" -depth -type d -mtime +10 -exec rm -rf {}…
Misha • 13
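
The likely cause of the error above is that HDFS's find supports only a small set of expressions (roughly -name, -iname, and -print), so GNU find options such as -depth, -mtime, and -exec are rejected. A hedged ksh-style workaround is to parse the modification date out of hdfs dfs -ls and delete anything older than the cutoff; the path placeholder and the 10-day window come from the question, and the field layout assumes stock hdfs dfs -ls output.

    #!/bin/ksh
    cutoff=$(date -d "10 days ago" +%Y-%m-%d)   # GNU date syntax; adjust elsewhere
    hdfs dfs -ls /file/path | while read -r perms repl owner group size mdate mtime path; do
        # field 6 of hdfs dfs -ls is the modification date (YYYY-MM-DD);
        # the "Found N items" header leaves $path empty and is skipped
        [ -n "$path" ] || continue
        if [[ "$mdate" < "$cutoff" ]]; then
            echo "deleting $path (last modified $mdate)"
            hdfs dfs -rm -r -skipTrash "$path"
        fi
    done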
0 votes · 1 answer

bash: pig: command not found

I am trying to find out what version of pig I am using. I thought I had already installed it: # yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\* When I try to run a pig script, the terminal returns the following: # pig…
ubliat • 1
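
For the question above, a hedged first pass is to confirm that a Pig package actually landed and that its launcher is on PATH; the package and path names below are assumptions and vary by distribution.

    # was a pig package actually installed?
    rpm -qa | grep -i pig            # or: yum list installed 'pig*'
    # where did the package put the launcher?
    rpm -ql pig 2>/dev/null | grep 'bin/pig$'
    # if the script exists but is not on PATH, call it directly
    /usr/bin/pig -version            # path is a guess; use the rpm -ql result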
0 votes · 1 answer

Ambari and Spark can't start from CLI

From the Ambari GUI we cannot start the Spark service, so we want to start it from the command line as follows: [spark@mas01 spark2]$ ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf…
yael • 13,106
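
One hedged avenue for the failure above, assuming Spark 2.x: the yarn-client master string is deprecated in Spark 2 in favor of an explicit deploy mode, and start-thriftserver.sh writes the real error to the log file it names at startup, so checking both is a cheap first step. The log path below is only an example; use whatever path the script prints.

    # Spark 2.x prefers an explicit deploy mode over the old yarn-client shorthand
    ./sbin/start-thriftserver.sh --master yarn --deploy-mode client --executor-memory 512m
    # the script prints "logging to <file>"; the real failure is usually in that file
    tail -n 100 /var/log/spark2/spark-*-HiveThriftServer2-*.out   # path is an assumption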