Questions tagged [hadoop]

Hadoop is a framework for the distributed storage and processing of large data sets across clusters of machines.

Apache Hadoop includes the following modules:

  • Hadoop Common: common utilities that support the other Hadoop modules
  • Hadoop Distributed File System (HDFS): distributed storage for application data
  • Hadoop YARN: job scheduling and cluster resource management
  • Hadoop MapReduce: YARN-based parallel processing of large data sets
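
A minimal sketch of how these modules fit together in practice: store data in HDFS, then run a MapReduce job on it via YARN. Paths and the bundled examples jar name are assumptions and vary by distribution.

    # store input in HDFS (distributed storage)
    hdfs dfs -mkdir -p /user/demo/input
    hdfs dfs -put local.txt /user/demo/input/
    # submit a MapReduce job to the YARN resource manager
    yarn jar "$HADOOP_HOME"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/demo/input /user/demo/output
    # read the reducer output back out of HDFS
    hdfs dfs -cat /user/demo/output/part-r-00000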

It is used by such heavyweights as Facebook and Twitter.

Apache projects built on Hadoop should probably use their own tags, e.g. Ambari, Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, Spark, Tez, and ZooKeeper.

69 questions
4 votes · 3 answers

bind failure, address in use: Unable to use a TCP port for both source and destination?

I'm debugging Hadoop DataNodes that won't start. We are also running SaltStack and Elasticsearch on the machines. The Hadoop DataNode error is pretty clear: java.net.BindException: Problem binding to [0.0.0.0:50020] java.net.BindException:…
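
For the question above, one hedged first step (assuming the conflict is another local process rather than a second DataNode instance) is to find what already owns port 50020, then either stop it or move the DataNode's IPC endpoint, which is the dfs.datanode.ipc.address property in hdfs-site.xml:

    # find the process already bound to the DataNode IPC port
    ss -lntp | grep ':50020'     # or: lsof -i :50020
    # if that process has to keep the port, relocate the DataNode IPC
    # endpoint in hdfs-site.xml (port 50021 here is an arbitrary example):
    #   <property>
    #     <name>dfs.datanode.ipc.address</name>
    #     <value>0.0.0.0:50021</value>
    #   </property>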
1 vote · 0 answers

jps process issue when installing Hadoop

I'm trying to install Hadoop in fully distributed mode on CentOS 6.4 (using 4 VirtualBox machines): server1 NameNode; server2 SecondaryNameNode, DataNode; server3 DataNode; server4 DataNode. I think I'm almost done, but on server1 I typed…
Seung • 11 • 1
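
A common first check for a question like this is to run jps on every node and compare the running daemons against the roles each machine should host. The hostnames below come from the question; the expected process lists are an assumption based on a standard fully distributed layout.

    # expected, roughly: server1 -> NameNode; server2 -> SecondaryNameNode, DataNode;
    # server3 and server4 -> DataNode
    for h in server1 server2 server3 server4; do
        echo "== $h =="
        ssh "$h" jps | grep -v Jps
    done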
1 vote · 2 answers

Delete files older than 10 days from HDFS

I am writing a ksh script to clean up HDFS directories and files that are at least 10 days old. I am testing the deletion command in a terminal, but it keeps failing: $ hdfs dfs -find "/file/path/file" -depth -type d -mtime +10 -exec rm -rf {}…
Misha • 13
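
The likely cause of the error above is that HDFS's find supports only a small set of expressions (roughly -name, -iname, and -print), so GNU find options such as -depth, -mtime, and -exec are rejected. A hedged ksh-style workaround is to parse the modification date out of hdfs dfs -ls and delete anything older than the cutoff; the path placeholder and the 10-day window come from the question, and the field layout assumes stock hdfs dfs -ls output.

    #!/bin/ksh
    cutoff=$(date -d "10 days ago" +%Y-%m-%d)   # GNU date syntax; adjust elsewhere
    hdfs dfs -ls /file/path | while read -r perms repl owner group size mdate mtime path; do
        # field 6 of hdfs dfs -ls is the modification date (YYYY-MM-DD);
        # the "Found N items" header leaves $path empty and is skipped
        [ -n "$path" ] || continue
        if [[ "$mdate" < "$cutoff" ]]; then
            echo "deleting $path (last modified $mdate)"
            hdfs dfs -rm -r -skipTrash "$path"
        fi
    done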
0 votes · 1 answer

bash: pig: command not found

I am trying to find out what version of pig I am using. I thought I had already installed it: # yum install hadoop\* mahout\* oozie\* hbase\* hive\* hue\* pig\* zookeeper\* When I try to run a pig script, the terminal returns the following: # pig…
ubliat • 1
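
For the question above, a hedged first pass is to confirm that a Pig package actually landed and that its launcher is on PATH; the package and path names below are assumptions and vary by distribution.

    # was a pig package actually installed?
    rpm -qa | grep -i pig            # or: yum list installed 'pig*'
    # where did the package put the launcher?
    rpm -ql pig 2>/dev/null | grep 'bin/pig$'
    # if the script exists but is not on PATH, call it directly
    /usr/bin/pig -version            # path is a guess; use the rpm -ql result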
0 votes · 1 answer

Ambari and Spark can't start from CLI

From the Ambari GUI we cannot start the Spark service, so we want to start it from the command line as follows: [spark@mas01 spark2]$ ./sbin/start-thriftserver.sh --master yarn-client --executor-memory 512m --hiveconf…
yael • 13,106
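
One hedged avenue for the failure above, assuming Spark 2.x: the yarn-client master string is deprecated in Spark 2 in favor of an explicit deploy mode, and start-thriftserver.sh writes the real error to the log file it names at startup, so checking both is a cheap first step. The log path below is only an example; use whatever path the script prints.

    # Spark 2.x prefers an explicit deploy mode over the old yarn-client shorthand
    ./sbin/start-thriftserver.sh --master yarn --deploy-mode client --executor-memory 512m
    # the script prints "logging to <file>"; the real failure is usually in that file
    tail -n 100 /var/log/spark2/spark-*-HiveThriftServer2-*.out   # path is an assumption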