October 2012 – CHEN Jian's Java Blog

Hadoop: Why do I get "Retrying connect to server: localhost/127.0.0.1:8020"?

Architecture / October 31, 2012

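I clearly set the port to 8021, so why does the client still try to connect to 8020? Most likely your NameNode never came up. Run jps and check for a line like this:

    4245 NameNode

If it is missing, look through the NameNode logs under $HADOOP_INSTALL/logs. In my experience, if you are running in pseudo-distributed mode and have not set hadoop.tmp.dir explicitly, the HDFS data has probably been damaged: HDFS keeps its files under /tmp by default, and /tmp is not reliable storage. In that case, reformat the HDFS file system.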
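One likely reason the client dials 8020 at all: when the file system URI (fs.default.name) carries no port, the NameNode's default RPC port is used, and in Hadoop 1.x that default is 8020. A port such as 8021 set on a different property (for example mapred.job.tracker) does not change it. The sketch below only illustrates that fallback with plain java.net.URI; the class and method names are hypothetical and this is not Hadoop's own code.

```java
import java.net.URI;

class NameNodePortSketch {
    // Assumption: mirrors the NameNode's default RPC port in Hadoop 1.x.
    static final int DEFAULT_NAMENODE_PORT = 8020;

    // Returns the port a client would dial for a given fs.default.name value.
    static int effectivePort(String fsDefaultName) {
        int port = URI.create(fsDefaultName).getPort(); // -1 when the URI names no port
        return port == -1 ? DEFAULT_NAMENODE_PORT : port;
    }

    public static void main(String[] args) {
        // No port in the URI: falls back to the default.
        System.out.println(effectivePort("hdfs://localhost/"));
        // Explicit port in the URI: that port wins.
        System.out.println(effectivePort("hdfs://localhost:9000/"));
    }
}
```

So an 8020 in the retry message does not by itself mean the configuration was ignored; it is simply where the client expects the NameNode to be listening.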

Backing up the HDFS NameNode

Architecture / October 30, 2012

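According to the elephant book (Hadoop: The Definitive Guide), there are two approaches:

1. Configure it directly, so that every change on the NameNode is also written to other storage systems. This guarantees strong consistency.
2. Use a Secondary NameNode that copies the data periodically. Because the copy is only periodic, some data will be lost when the NameNode crashes.

More details to be added later.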

Interaction diagram between HDFS nodes

Architecture / October 29, 2012

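Mostly copied from the elephant book; I only added a few lines to the diagrams.

Note on reads: the NameNode only serves metadata. The data itself moves directly between the client and the DataNodes, so the NameNode does not become a bottleneck.

Note on writes: by default, written data is stored as three replicas spread across two racks (don't put all your eggs in one basket).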
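The "three replicas on two racks" rule can be sketched roughly as follows. This is a simplified illustration of the default placement described in the book (first replica on the writer's node, second on a node in another rack, third on a different node in that same remote rack). The Node class, demoCluster and the deterministic "first match" choice are all assumptions made for the example; real HDFS picks nodes at random and considers load.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReplicaPlacementSketch {
    // Hypothetical stand-in for a DataNode: a name plus the rack it sits in.
    static final class Node {
        final String name;
        final String rack;
        Node(String name, String rack) { this.name = name; this.rack = rack; }
    }

    // Picks up to three targets: the writer's node, a node on another rack,
    // and a second node on that same remote rack.
    static List<Node> chooseTargets(List<Node> cluster, Node writer) {
        List<Node> targets = new ArrayList<>();
        targets.add(writer);
        Node second = null;
        for (Node n : cluster) {
            if (!n.rack.equals(writer.rack)) { second = n; break; }
        }
        if (second == null) return targets; // single-rack cluster: the sketch stops here
        targets.add(second);
        for (Node n : cluster) {
            if (n != second && n.rack.equals(second.rack)) { targets.add(n); break; }
        }
        return targets;
    }

    // Collects the distinct racks used by a set of targets.
    static Set<String> racksOf(List<Node> nodes) {
        Set<String> racks = new HashSet<>();
        for (Node n : nodes) racks.add(n.rack);
        return racks;
    }

    // A made-up four-node, two-rack cluster for the demo.
    static List<Node> demoCluster() {
        return Arrays.asList(new Node("n1", "rack1"), new Node("n2", "rack1"),
                             new Node("n3", "rack2"), new Node("n4", "rack2"));
    }

    public static void main(String[] args) {
        List<Node> cluster = demoCluster();
        for (Node n : chooseTargets(cluster, cluster.get(0))) {
            System.out.println(n.name + " on " + n.rack);
        }
    }
}
```

Losing one whole rack therefore still leaves at least one replica, while only one of the copies has to cross racks twice during the write.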

HDFS in MapReduce

Architecture / October 29, 2012

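1. Map input data normally lives in HDFS.
2. Map output is written to the local disk: it is only an intermediate result and needs no redundancy, so HDFS is not used for it.
3. Reduce output is written to HDFS so that it is stored redundantly.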

Setting up a Hadoop pseudo-distributed mode environment

Architecture / October 29, 2012

For copy-and-paste only.

Edit the configuration files:

    <!-- conf/core-site.xml -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value> <!-- the default file system is HDFS on this machine -->
      </property>
    </configuration>

    <!-- conf/hdfs-site.xml -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value> <!-- with a single DataNode, pseudo-distributed mode cannot replicate -->
      </property>
    </configuration>

    <!-- conf/mapred-site.xml -->
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:8021</value>
      </property>
    </configuration>

Let the machine log in to itself without a password:

    $ ssh-add
    $ ssh localhost    # check whether a password is still requested

Format the HDFS file system:

    $ hadoop namenode -format    # in my test the file system was created in /tmp/hadoop-kent/dfs/name

Start the Hadoop daemons:

    $ start-dfs.sh
    $ start-mapred.sh

Check the status in a browser:

    http://localhost:50070/
    http://localhost:50030/

Work with files in HDFS:

    $ hadoop fs -copyFromLocal 1k.log hdfs://localhost/firsttry/1k.log
    $ hadoop fs -ls /    # list the HDFS root directory

Stop the Hadoop services:

    $ stop-dfs.sh
    $ stop-mapred.sh

HDFS API example code

Architecture / October 29, 2012

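Nothing of substance here, for copy-and-paste only.

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsExample {
        public static void main(String[] args) throws IOException {
            String dir = "/home/kent";
            // Timestamp prefix keeps the file name unique across runs.
            String fileUrl = "hdfs://localhost" + dir + "/" + System.currentTimeMillis() + "hdfsExample.txt";
            FileSystem fs = FileSystem.get(URI.create(fileUrl), new Configuration());

            // create a file
            System.out.println("Creating hdfs file : " + fileUrl);
            Path path = new Path(fileUrl);
            FSDataOutputStream out = fs.create(path);
            …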


The basic Hadoop MapReduce flow

Architecture / October 29, 2012

Taken from Hadoop: The Definitive Guide, with a line or two added.

[Figure: data flow with a single reduce task]

[Figure: data flow with multiple reduce tasks]

Note: all values for a key are sent to the same reducer.
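Why every value for a key ends up at the same reducer can be shown with a small sketch of hash partitioning, which is the idea behind Hadoop's default HashPartitioner: the partition is a pure function of the key and the number of reduce tasks. The class below is a hypothetical stand-in that uses plain String keys instead of Hadoop's Text, so the actual partition numbers would differ from a real job.

```java
class HashPartitionSketch {
    // Maps a key to a reducer index in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"777", "888", "777", "999", "888"};
        for (String key : keys) {
            // The same key always prints the same reducer index.
            System.out.println(key + " -> reducer " + partitionFor(key, 4));
        }
    }
}
```

Because the function ignores which map task emitted the pair, two mappers that both produce the key "777" send it to the same reduce task.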

Hadoop MapReduce: introductory example code

Architecture / October 26, 2012

Nothing of substance here, for copy-and-paste only.

What the program does:

1. It analyzes an application's access log and counts the number of visits per user ID. Log lines look roughly like this: "2012-10-26 14:41:30,748 userNameId-777 from IP-10.232.25.144 invoked URL-http://xxx/hello.jsonp"
2. It runs in standalone mode, directly on the Hadoop library that the Maven project depends on, so you do not need a separate Hadoop installation.

    <!-- pom.xml -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>1.0.4</version>
    </dependency>

    // Mapper
    public class Coupon11LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            String line = value.toString();
            String accessRegex = ".*userNameId\\-(\\d+).*";
            Pattern pattern …
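Since the excerpt above is cut off, here is a rough plain-Java simulation of what such a job computes, with no Hadoop dependency. It is not the original Mapper or Reducer: the class name and the map, reduce and run methods are hypothetical, and the in-memory grouping merely stands in for the shuffle that Hadoop performs between the two phases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VisitCountSketch {
    private static final Pattern USER_ID = Pattern.compile(".*userNameId\\-(\\d+).*");

    // "Map" step: return the user ID found in one log line, or null if there is none.
    static String map(String line) {
        Matcher m = USER_ID.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    // "Reduce" step: sum all the 1s collected for one user ID.
    static long reduce(List<Long> values) {
        long sum = 0;
        for (long v : values) sum += v;
        return sum;
    }

    // Runs map, groups the emitted (userId, 1) pairs by key, then reduces each group.
    static Map<String, Long> run(List<String> lines) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            String userId = map(line);
            if (userId == null) continue; // lines without a user ID emit nothing
            List<Long> values = grouped.get(userId);
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(userId, values);
            }
            values.add(1L);
        }
        Map<String, Long> counts = new TreeMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2012-10-26 14:41:30,748 userNameId-777 from IP-10.232.25.144 invoked URL-http://xxx/hello.jsonp",
            "2012-10-26 14:41:31,002 userNameId-888 from IP-10.232.25.145 invoked URL-http://xxx/hello.jsonp",
            "2012-10-26 14:41:32,310 userNameId-777 from IP-10.232.25.144 invoked URL-http://xxx/hello.jsonp");
        System.out.println(run(log));
    }
}
```

In a real job the Mapper would write (userId, 1) to the Context and the Reducer would receive the grouped values from the framework; only the per-key summing logic carries over unchanged.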


Java regex "group" example

Java / October 26, 2012

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String accessRegex = ".*userNameId\\-(\\d+).*";
    String text = "2012-10-26 14:41:30,748 userNameId-777 from IP-10.232.25.144 invoked URL-http://xxx/hello.jsonp";
    Pattern pattern = Pattern.compile(accessRegex);
    Matcher matcher = pattern.matcher(text);
    if (matcher.find()) {
        // group(1) is the text captured by the first parenthesized group, (\d+)
        System.out.println(matcher.group(1));
    }
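As a small extension of the same idea, a pattern can carry more than one group, and with find() the surrounding ".*" is not needed. The sketch below pulls both the user ID and the IP address out of the sample log line. The class name, the extract method and the IP sub-pattern are assumptions made for this illustration, not part of the original snippet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogFieldsSketch {
    // Group 1: the digits after "userNameId-"; group 2: the dotted address after "IP-".
    private static final Pattern FIELDS =
        Pattern.compile("userNameId-(\\d+) from IP-([\\d.]+)");

    // Returns {userId, ip}, or null when the line does not contain both fields.
    static String[] extract(String text) {
        Matcher matcher = FIELDS.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String text = "2012-10-26 14:41:30,748 userNameId-777 from IP-10.232.25.144 invoked URL-http://xxx/hello.jsonp";
        String[] fields = extract(text);
        System.out.println("user=" + fields[0] + " ip=" + fields[1]);
    }
}
```

Groups are numbered by the position of their opening parenthesis, and group(0) always refers to the whole match.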

Checking the number of connections on an application server

Linux/Unix/Windows / October 24, 2012

Something like this may work (replace LOCAL_IP with the server's own IP address):

    netstat -an | grep " LOCAL_IP:80\b" | wc -l
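What the grep pattern matches is easy to get wrong (port 80 versus 8080, for instance), so here is a minimal Java sketch of the same filter applied to captured netstat output. The class and method names and the sample addresses are made up for the example. Like the shell pipeline, it counts every line containing the address, including a LISTEN line if there is one.

```java
import java.util.regex.Pattern;

class ConnectionCountSketch {
    // Counts lines containing " <localIp>:<port>" followed by a word boundary,
    // mirroring: grep " LOCAL_IP:80\b" | wc -l
    static int countConnections(String netstatOutput, String localIp, int port) {
        Pattern p = Pattern.compile(" " + Pattern.quote(localIp + ":" + port) + "\\b");
        int count = 0;
        for (String line : netstatOutput.split("\n")) {
            if (p.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample in the usual netstat column layout.
        String sample =
            "tcp        0      0 10.0.0.5:80        10.0.0.9:51234     ESTABLISHED\n" +
            "tcp        0      0 10.0.0.5:8080      10.0.0.9:51235     ESTABLISHED\n" +
            "tcp        0      0 10.0.0.5:80        10.0.0.7:40000     TIME_WAIT\n" +
            "tcp        0      0 10.0.0.5:22        10.0.0.8:40100     ESTABLISHED\n";
        System.out.println(countConnections(sample, "10.0.0.5", 80));
    }
}
```

The word boundary after the port is what keeps 8080 out of the count, and the leading space keeps an address such as 110.0.0.5 from matching.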
