Hadoop: Difference between revisions
Jump to navigation
Jump to search
Line 18: | Line 18: | ||
<name>hadoop.tmp.dir</name> | <name>hadoop.tmp.dir</name> | ||
<value>/home/hadoop/tmp</value> | <value>/home/hadoop/tmp</value> | ||
</property> | </property> | ||
<property> | <property> | ||
Line 52: | Line 51: | ||
<configuration> | <configuration> | ||
<property> | <property> | ||
<name>mapreduce.framework.name</name> | <name>mapreduce.framework.name</name> | ||
<value>yarn</value> | <value>yarn</value> | ||
</property> | |||
</configuration> | |||
EOF | |||
</source> | |||
---- | |||
<source lang="bash"> | |||
sudo tee -a $HADOOP_HOME/etc/hadoop/yarn-site.xml >/dev/null <<EOF | |||
<configuration> | |||
<property> | |||
<name>yarn.nodemanager.aux-services</name> | |||
<value>mapreduce_shuffle</value> | |||
</property> | </property> | ||
</configuration> | </configuration> |
Revision as of 16:15, 19 October 2022
Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open source project in the big data playing field and is sponsored by the Apache Software Foundation. Hadoop is comprised of four main layers:
- Hadoop Common is the collection of utilities and libraries that support other Hadoop modules.
- HDFS, which stands for Hadoop Distributed File System, is responsible for persisting data to disk.
- YARN, short for Yet Another Resource Negotiator, is the "operating system" for HDFS.
- MapReduce is the original processing model for Hadoop clusters. It distributes work within the cluster or map, then organizes and reduces the results from the nodes into a response to a query. Many other processing models are available for the 3.x version of Hadoop
Configuration
sudo tee -a $HADOOP_HOME/etc/hadoop/core-site.xml >/dev/null <<EOF
<property>
<name>hadoop.tmp.dir</name>
<value>${HADOOP_HOME}/tmp</value>
<description>Parent directory for other temporary directories.</description>
</property>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://0.0.0.0:9000</value>
<description>The default file system URI</description>
</property>
</configuration>
EOF
sudo tee -a $HADOOP_HOME/etc/hadoop/hdfs-site.xml >/dev/null <<EOF
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hdfs/datanode</value>
</property>
</configuration>
EOF
sudo tee -a $HADOOP_HOME/etc/hadoop/mapred-site.xml >/dev/null <<EOF
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
EOF
sudo tee -a $HADOOP_HOME/etc/hadoop/yarn-site.xml >/dev/null <<EOF
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
EOF
Knowledge
ssh-keygen -b 4096 -t rsa -f ~/.ssh/id_rsa -q -N "[email protected]" readlink -f /usr/bin/java | sed "s:bin/java::" sudo apt-get install pdsh sudo apt-get install ssh su -h hadoop | |
| |
sudo apt dist-upgrade sudo do-release-upgrade sudo apt --fix-broken install sudo apt install ubuntu-desktop |
[Service]
User=hadoop
Group=hadoop
Type=forking
SuccessExitStatus=143
|