Monthly Archives: September 2011

Precise Definitions of the Five Thread States

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

1. New: created but not yet started

2. Runnable: either Running (currently executing) or Ready (waiting to be assigned a time slice)

3. Waiting: receives no CPU time until it is woken up, at which point it re-enters Runnable

4. Timed Waiting: receives no CPU time until it is woken up or the time is up

5. Blocked: waiting for an exclusive lock; once the lock is acquired, it re-enters Runnable
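The states above map almost directly onto `java.lang.Thread.State` (which also defines a sixth value, TERMINATED, for threads that have finished). A minimal sketch, with a class name of my choosing, that observes a few of them:

```java
public class ThreadStates {
    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(200);   // puts the thread into TIMED_WAITING
            } catch (InterruptedException ignored) { }
        });

        System.out.println(t.getState());   // NEW: created but not yet started

        t.start();
        // Poll until the thread is observed sleeping (or already finished).
        while (t.getState() != Thread.State.TIMED_WAITING
                && t.getState() != Thread.State.TERMINATED) {
            Thread.sleep(1);
        }
        System.out.println(t.getState());   // typically TIMED_WAITING

        t.join();
        System.out.println(t.getState());   // TERMINATED
    }
}
```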

Learning JVM Internals - 19. The Java Memory Model and Concurrency

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

The JMM (Java Memory Model) is a unified memory model that hides the memory-access differences between various hardware architectures and operating systems.

It stipulates that:

  1. All variables are stored in Main Memory.

  2. Each thread has its own Working Memory, which holds copies of the variables in Main Memory; a thread may operate only on these copies and can never manipulate the variables in Main Memory directly.

Concurrency under the JMM involves three concerns: Atomicity, Visibility, and Ordering.

  1. Atomicity: reads and writes of primitive types are, by and large, atomic; beyond that, atomicity can be guaranteed by locking with synchronized.

  2. Visibility: a modification made to a variable by one thread is immediately visible to other threads. volatile provides visibility, because a volatile variable modified by a thread is immediately synchronized back to Main Memory; the synchronized keyword generally works as well, because it stipulates that "before unlocking a variable, the variable must first be synchronized back to Main Memory".

  3. Ordering: the JVM may reorder instructions, so code that appears later can execute before code that appears earlier. Ordering means that however the instructions are reordered, the final result is unaffected. volatile provides ordering because it forbids instruction reordering; the synchronized keyword does as well, because its locking serializes threads' access to the same resource.
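The Visibility point is the classic use case for volatile. A minimal sketch (class and field names are mine): without volatile on the flag, the worker's Working Memory copy might never be refreshed and the loop could spin forever.

```java
public class VisibilityDemo {
    // volatile: every write is flushed to Main Memory,
    // every read fetches a fresh copy from Main Memory
    static volatile boolean stop = false;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait; the volatile read guarantees the update becomes visible
            }
        });
        worker.start();

        Thread.sleep(50);
        stop = true;      // volatile write: published to Main Memory immediately
        worker.join();    // returns promptly; without volatile this join could hang
        System.out.println("worker observed the update and exited");
    }
}
```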

Bookmarking a book: 《深入理解计算机系统》

Its original title: Computer Systems: A Programmer's Perspective

The Chinese title really ought to have "从程序员的角度" ("from a programmer's perspective") added in front.

http://book.douban.com/subject/1230413/

preface excerpt 3

preface 7

about the authors 21

1 a tour of computer systems 35

1.1 information is bits + context 37

1.2 programs are translated by other programs into different forms 38

1.3 it pays to understand how compilation systems work 40

1.4 processors read and interpret instructions stored in memory 41

1.4.1 hardware organization of a system 41

1.4.2 running the hello program 44

1.5 caches matter 46

1.6 storage devices form a hierarchy 47

1.7 the operating system manages the hardware 48

1.7.1 processes 50

1.7.2 threads 51

1.7.3 virtual memory 51

1.7.4 files 53

1.8 systems communicate with other systems using networks 54

1.9 important themes 55

1.9.1 concurrency and parallelism 55

1.9.2 the importance of abstractions in computer systems 58

1.10 summary 59

bibliographic notes 60

part i program structure and execution

2 representing and manipulating information 63

2.1 information storage 67

2.1.1 hexadecimal notation 68

2.1.2 words 72

2.1.3 data sizes 72

computer systems: a programmer's perspective, 2e

2.1.4 addressing and byte ordering 73

2.1.5 representing strings 80

2.1.6 representing code 81

2.1.7 introduction to boolean algebra 82

2.1.8 bit-level operations in c 85

2.1.9 logical operations in c 88

2.1.10 shift operations in c 88

2.2 integer representations 90

2.2.1 integral data types 91

2.2.2 unsigned encodings 92

2.2.3 two’s-complement encodings 94

2.2.4 conversions between signed and unsigned 99

2.2.5 signed vs. unsigned in c 103

2.2.6 expanding the bit representation of a number 105

2.2.7 truncating numbers 109

2.2.8 advice on signed vs. unsigned 110

2.3 integer arithmetic 113

2.3.1 unsigned addition 113

2.3.2 two’s-complement addition 117

2.3.3 two’s-complement negation 121

2.3.4 unsigned multiplication 122

2.3.5 two’s-complement multiplication 123

2.3.6 multiplying by constants 126

2.3.7 dividing by powers of two 129

2.3.8 final thoughts on integer arithmetic 132

2.4 floating point 133

2.4.1 fractional binary numbers 134

2.4.2 ieee floating-point representation 137

2.4.3 example numbers 139

2.4.4 rounding 144

2.4.5 floating-point operations 147

2.4.6 floating point in c 148

2.5 summary 152

bibliographic notes 153

homework problems 153

solutions to practice problems 168

3 machine-level representation of programs 187

3.1 a historical perspective 190

3.2 program encodings 193

3.2.1 machine-level code 194

3.2.2 code examples 196

3.2.3 notes on formatting 199

3.3 data formats 201

3.4 accessing information 202

3.4.1 operand specifiers 203

3.4.2 data movement instructions 205

3.4.3 data movement example 208

3.5 arithmetic and logical operations 211

3.5.1 load effective address 211

3.5.2 unary and binary operations 212

3.5.3 shift operations 213

3.5.4 discussion 214

3.5.5 special arithmetic operations 216

3.6 control 219

3.6.1 condition codes 219

3.6.2 accessing the condition codes 221

3.6.3 jump instructions and their encodings 223

3.6.4 translating conditional branches 227

3.6.5 loops 231

3.6.6 conditional move instructions 240

3.6.7 switch statements 247

3.7 procedures 253

3.7.1 stack frame structure 253

3.7.2 transferring control 255

3.7.3 register usage conventions 257

3.7.4 procedure example 258

3.7.5 recursive procedures 263

3.8 array allocation and access 266

3.8.1 basic principles 266

3.8.2 pointer arithmetic 267

3.8.3 nested arrays 269

3.8.4 fixed-size arrays 271

3.8.5 variable-size arrays 272

3.9 heterogeneous data structures 275

3.9.1 structures 275

3.9.2 unions 278

3.9.3 data alignment 282

3.10 putting it together: understanding pointers 286

3.11 life in the real world: using the gdb debugger 288

3.12 out-of-bounds memory references and buffer overflow 290

3.12.1 thwarting buffer overflow attacks 295

3.13 x86-64: extending ia32 to 64 bits 301

3.13.1 history and motivation for x86-64 302

3.13.2 an overview of x86-64 304

3.13.3 accessing information 307

3.13.4 control 313

3.13.5 data structures 324

3.13.6 concluding observations about x86-64 325

3.14 machine-level representations of floating-point programs 326

3.15 summary 327

bibliographic notes 328

homework problems 328

solutions to practice problems 342

4 processor architecture 367

4.1 the y86 instruction set architecture 370

4.1.1 programmer-visible state 370

4.1.2 y86 instructions 371

4.1.3 instruction encoding 373

4.1.4 y86 exceptions 378

4.1.5 y86 programs 379

4.1.6 some y86 instruction details 384

4.2 logic design and the hardware control language hcl 386

4.2.1 logic gates 387

4.2.2 combinational circuits and hcl boolean expressions 388

4.2.3 word-level combinational circuits and hcl integer expressions 389

4.2.4 set membership 394

4.2.5 memory and clocking 395

4.3 sequential y86 implementations 398

4.3.1 organizing processing into stages 398

4.3.2 seq hardware structure 409

4.3.3 seq timing 413

4.3.4 seq stage implementations 417

4.4 general principles of pipelining 425

4.4.1 computational pipelines 426

4.4.2 a detailed look at pipeline operation 427

4.4.3 limitations of pipelining 428

4.4.4 pipelining a system with feedback 432

4.5 pipelined y86 implementations 434

4.5.1 seq+: rearranging the computation stages 434

4.5.2 inserting pipeline registers 435

4.5.3 rearranging and relabeling signals 439

4.5.4 next pc prediction 440

4.5.5 pipeline hazards 442

4.5.6 avoiding data hazards by stalling 447

4.5.7 avoiding data hazards by forwarding 449

4.5.8 load/use data hazards 452

4.5.9 exception handling 454

4.5.10 pipe stage implementations 457

4.5.11 pipeline control logic 465

4.5.12 performance analysis 478

4.5.13 unfinished business 480

4.6 summary 483

4.6.1 y86 simulators 484

bibliographic notes 485

homework problems 485

solutions to practice problems 491

5 optimizing program performance 507

5.1 capabilities and limitations of optimizing compilers 510

5.2 expressing program performance 514

5.3 program example 516

5.4 eliminating loop inefficiencies 520

5.5 reducing procedure calls 524

5.6 eliminating unneeded memory references 525

5.7 understanding modern processors 530

5.7.1 overall operation 531

5.7.2 functional unit performance 534

5.7.3 an abstract model of processor operation 536

5.8 loop unrolling 543

5.9 enhancing parallelism 547

5.9.1 multiple accumulators 548

5.9.2 reassociation transformation 552

5.10 summary of results for optimizing combining code 558

5.11 some limiting factors 559

5.11.1 register spilling 559

5.11.2 branch prediction and misprediction penalties 560

5.12 understanding memory performance 565

5.12.1 load performance 565

5.12.2 store performance 566

5.13 life in the real world: performance improvement techniques 573

5.14 identifying and eliminating performance bottlenecks 574

5.14.1 program profiling 574

5.14.2 using a profiler to guide optimization 576

5.14.3 amdahl’s law 579

5.15 summary 581

bibliographic notes 582

homework problems 583

solutions to practice problems 586

6 the memory hierarchy 593

6.1 storage technologies 595

6.1.1 random-access memory 595

6.1.2 disk storage 604

6.1.3 solid state disks 615

6.1.4 storage technology trends 617

6.2 locality 620

6.2.1 locality of references to program data 621

6.2.2 locality of instruction fetches 622

6.2.3 summary of locality 623

6.3 the memory hierarchy 625

6.3.1 caching in the memory hierarchy 626

6.3.2 summary of memory hierarchy concepts 629

6.4 cache memories 630

6.4.1 generic cache memory organization 631

6.4.2 direct-mapped caches 633

6.4.3 set associative caches 640

6.4.4 fully associative caches 642

6.4.5 issues with writes 645

6.4.6 anatomy of a real cache hierarchy 646

6.4.7 performance impact of cache parameters 648

6.5 writing cache-friendly code 649

6.6 putting it together: the impact of caches on program performance 654

6.6.1 the memory mountain 655

6.6.2 rearranging loops to increase spatial locality 659

6.6.3 exploiting locality in your programs 663

6.7 summary 663

bibliographic notes 664

homework problems 665

solutions to practice problems 676

part ii running programs on a system

7 linking 687

7.1 compiler drivers 689

7.2 static linking 691

7.3 object files 691

7.4 relocatable object files 692

7.5 symbols and symbol tables 694

7.6 symbol resolution 697

7.6.1 how linkers resolve multiply defined global symbols 698

7.6.2 linking with static libraries 701

7.6.3 how linkers use static libraries to resolve references 704

7.7 relocation 706

7.7.1 relocation entries 706

7.7.2 relocating symbol references 707

7.8 executable object files 712

7.9 loading executable object files 713

7.10 dynamic linking with shared libraries 715

7.11 loading and linking shared libraries from applications 717

7.12 position-independent code (pic) 721

7.13 tools for manipulating object files 724

7.14 summary 725

bibliographic notes 725

homework problems 726

solutions to practice problems 732

8 exceptional control flow 735

8.1 exceptions 737

8.1.1 exception handling 738

8.1.2 classes of exceptions 740

8.1.3 exceptions in linux/ia32 systems 742

8.2 processes 746

8.2.1 logical control flow 746

8.2.2 concurrent flows 747

8.2.3 private address space 748

8.2.4 user and kernel modes 748

8.2.5 context switches 750

8.3 system call error handling 751

8.4 process control 752

8.4.1 obtaining process ids 753

8.4.2 creating and terminating processes 753

8.4.3 reaping child processes 757

8.4.4 putting processes to sleep 763

8.4.5 loading and running programs 764

8.4.6 using fork and execve to run programs 767

8.5 signals 770

8.5.1 signal terminology 772

8.5.2 sending signals 773

8.5.3 receiving signals 776

8.5.4 signal handling issues 779

8.5.5 portable signal handling 786

8.5.6 explicitly blocking and unblocking signals 787

8.5.7 synchronizing flows to avoid nasty concurrency bugs 789

8.6 nonlocal jumps 793

8.7 tools for manipulating processes 796

8.8 summary 797

bibliographic notes 797

homework problems 798

solutions to practice problems 805

9 virtual memory 809

9.1 physical and virtual addressing 811

9.2 address spaces 812

9.3 vm as a tool for caching 813

9.3.1 dram cache organization 814

9.3.2 page tables 814

9.3.3 page hits 816

9.3.4 page faults 816

9.3.5 allocating pages 817

9.3.6 locality to the rescue again 818

9.4 vm as a tool for memory management 819

9.5 vm as a tool for memory protection 820

9.6 address translation 821

9.6.1 integrating caches and vm 825

9.6.2 speeding up address translation with a tlb 825

9.6.3 multi-level page tables 826

9.6.4 putting it together: end-to-end address translation 828

9.7 case study: the intel core i7/linux memory system 833

9.7.1 core i7 address translation 834

9.7.2 linux virtual memory system 837

9.8 memory mapping 841

9.8.1 shared objects revisited 841

9.8.2 the fork function revisited 843

9.8.3 the execve function revisited 844

9.8.4 user-level memory mapping with the mmap function 844

9.9 dynamic memory allocation 846

9.9.1 the malloc and free functions 848

9.9.2 why dynamic memory allocation? 850

9.9.3 allocator requirements and goals 851

9.9.4 fragmentation 853

9.9.5 implementation issues 854

9.9.6 implicit free lists 854

9.9.7 placing allocated blocks 856

9.9.8 splitting free blocks 857

9.9.9 getting additional heap memory 857

9.9.10 coalescing free blocks 858

9.9.11 coalescing with boundary tags 858

9.9.12 putting it together: implementing a simple allocator 861

9.9.13 explicit free lists 869

9.9.14 segregated free lists 870

9.10 garbage collection 872

9.10.1 garbage collector basics 873

9.10.2 mark&sweep garbage collectors 874

9.10.3 conservative mark&sweep for c programs 876

9.11 common memory-related bugs in c programs 877

9.11.1 dereferencing bad pointers 877

9.11.2 reading uninitialized memory 877

9.11.3 allowing stack buffer overflows 878

9.11.4 assuming that pointers and the objects they point to are the same size 878

9.11.5 making off-by-one errors 879

9.11.6 referencing a pointer instead of the object it points to 879

9.11.7 misunderstanding pointer arithmetic 880

9.11.8 referencing nonexistent variables 880

9.11.9 referencing data in free heap blocks 881

9.11.10 introducing memory leaks 881

9.12 summary 882

bibliographic notes 882

homework problems 883

solutions to practice problems 887

part iii interaction and communication between programs

10 system-level i/o 895

10.1 unix i/o 896

10.2 opening and closing files 897

10.3 reading and writing files 899

10.4 robust reading and writing with the rio package 901

10.4.1 rio unbuffered input and output functions 901

10.4.2 rio buffered input functions 902

10.5 reading file metadata 907

10.6 sharing files 909

10.7 i/o redirection 911

10.8 standard i/o 913

10.9 putting it together: which i/o functions should i use? 914

10.10 summary 915

bibliographic notes 916

homework problems 916

solutions to practice problems 917

11 network programming 919

11.1 the client-server programming model 920

11.2 networks 921

11.3 the global ip internet 925

11.3.1 ip addresses 927

11.3.2 internet domain names 929

11.3.3 internet connections 933

11.4 the sockets interface 934

11.4.1 socket address structures 935

11.4.2 the socket function 936

11.4.3 the connect function 937

11.4.4 the open_clientfd function 937

11.4.5 the bind function 938

11.4.6 the listen function 939

11.4.7 the open_listenfd function 939

11.4.8 the accept function 941

11.4.9 example echo client and server 942

11.5 web servers 945

11.5.1 web basics 945

11.5.2 web content 946

11.5.3 http transactions 948

11.5.4 serving dynamic content 950

11.6 putting it together: the tiny web server 953

11.7 summary 961

bibliographic notes 962

homework problems 962

solutions to practice problems 963

12 concurrent programming 967

12.1 concurrent programming with processes 969

12.1.1 a concurrent server based on processes 970

12.1.2 pros and cons of processes 971

12.2 concurrent programming with i/o multiplexing 973

12.2.1 a concurrent event-driven server based on i/o multiplexing 976

12.2.2 pros and cons of i/o multiplexing 980

12.3 concurrent programming with threads 981

12.3.1 thread execution model 982

12.3.2 posix threads 982

12.3.3 creating threads 984

12.3.4 terminating threads 984

12.3.5 reaping terminated threads 985

12.3.6 detaching threads 985

12.3.7 initializing threads 986

12.3.8 a concurrent server based on threads 986

12.4 shared variables in threaded programs 988

12.4.1 threads memory model 989

12.4.2 mapping variables to memory 990

12.4.3 shared variables 990

12.5 synchronizing threads with semaphores 991

12.5.1 progress graphs 994

12.5.2 semaphores 997

12.5.3 using semaphores for mutual exclusion 998

12.5.4 using semaphores to schedule shared resources 1000

12.5.5 putting it together: a concurrent server based on prethreading 1004

12.6 using threads for parallelism 1008

12.7 other concurrency issues 1013

12.7.1 thread safety 1013

12.7.2 reentrancy 1014

12.7.3 using existing library functions in threaded programs 1016

12.7.4 races 1017

12.7.5 deadlocks 1019

12.8 summary 1022

bibliographic notes 1023

homework problems 1023

solutions to practice problems 1028

a error handling 1033

a.1 error handling in unix systems 1034

a.2 error-handling wrappers 1035

references 1039

index 1045

Learning JVM Internals - 18. Where Exactly Does Java's Performance Fall Short of C/C++?

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

It is often said that Java is slower than C/C++ because it is interpreted. Now that mainstream JVMs all ship with a JIT compiler, is Java still slower than C/C++?

Answer: yes. The remaining drawbacks:

  1. The JIT compiles at run time, adding overhead and competing with the program itself for resources.

  2. The JVM has to check for null pointers, out-of-bounds accesses, and the like, which costs time.

  3. Polymorphism is used far more heavily in Java; with so many virtual methods, inlining is hard to do at compile time.

  4. Java can load classes dynamically and change the program's inheritance relationships, which rules out many whole-program optimizations.

  5. Java's rules for stack versus heap allocation are fixed; C/C++ is far more flexible and can allocate as needed for better performance.

  6. Java needs GC, and GC consumes resources.

Learning JVM Internals - 16. The Data Structures Inside the VM Stack

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

The VM Stack is a stack whose elements are called Stack Frames. A Stack Frame is the data structure that supports the virtual machine in invoking and executing methods.

Executing a method is, in essence, the process of one Frame being pushed onto and later popped off the stack.

A Stack Frame consists of:

  1. Local Variable Table: holds the method's local variables.

  2. Operand Stacks: remember the example from a microcomputer-principles course of doing addition and subtraction with a stack? That is exactly the role this stack plays.

  3. Dynamic Linking

  4. Return Address
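Point 2 can be made concrete with javap. For the method below (class name is mine), running `javap -c` on the compiled class shows that `a + b` is evaluated entirely on the Operand Stack:

```java
// Bytecode of add(int, int), as typically emitted by javac:
//   iload_0   // push local variable 0 (a) onto the operand stack
//   iload_1   // push local variable 1 (b) onto the operand stack
//   iadd      // pop two ints, push their sum
//   ireturn   // pop the sum and return it
public class OperandStackDemo {
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));   // prints 5
    }
}
```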

Learning JVM Internals - 17. Ahead-of-Time Compilation vs. Runtime JIT Compilation

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

Although compilers exist that turn *.java directly into native executable code, Java is mainly compiled in two ways:

  1. Ahead-of-time compilation: Sun's javac compiles Java into bytecode, which an interpreter executes at run time.

  2. Runtime compilation: a JIT (Just-in-Time) compiler compiles Hot Spots (frequently executed code) into native code at run time. (Note that JIT is not part of the JVM specification; some JVM implementations have no JIT.)

At run time, a Java program is executed by the interpreter and the JIT compiler working together (you can disable the compiler with -Xint). When the interpreter detects a Hot Spot, it hands execution over to the compiled native code if it already exists; otherwise it keeps interpreting while the JIT compiles in the background. Sometimes, when the JIT finds that the native code cannot proceed, it hands the work back to the interpreter.

In addition, both compilers perform some optimizations:

  1. javac: one example is removing the syntactic sugar of generics, since bytecode knows nothing about them; in bytecode, the type of a List&lt;String&gt; is simply List.

  2. JIT: inlining, for example.
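The javac point can be checked at run time (class name is mine): after erasure, lists with different type parameters share a single runtime class.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: javac erases generics, so at the bytecode level
// List<String> and List<Integer> are the same raw type List.
public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();

        // Both objects share exactly one runtime class.
        System.out.println(strings.getClass() == ints.getClass());   // prints true
        System.out.println(strings.getClass().getName());            // java.util.ArrayList
    }
}
```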

    

Learning JVM Internals - If a constant literal changes, every other class that uses it must be recompiled

Suppose Foo defines a constant literal (a literal meaning a String, int, or similar compile-time constant), and Bar uses that constant.

Once the constant's value in Foo changes, if Bar is not recompiled, the value inside Bar remains whatever it was before Foo changed.

Why? Because when Bar is compiled, the constant's value from Foo is written directly into the Bar.class bytecode file; when Bar's code runs, the JVM does not go to Foo for the value dynamically but reads it straight out of the Bar.class bytecode, so the change to Foo never affects the running Bar.

Therefore, when a constant value in Foo changes, Bar should be recompiled so that Foo's new value is written into Bar's bytecode file.
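A minimal sketch of the Foo/Bar scenario above; the field name GREETING and its value are invented for illustration.

```java
class Foo {
    // A compile-time constant: static final and initialized with a literal.
    static final String GREETING = "hello";
}

class Bar {
    public static void main(String[] args) {
        // javac copies "hello" into Bar's own constant pool here (an ldc of
        // Bar's own string entry), so the running Bar never consults Foo.
        // Change Foo's value and recompile Foo alone: this still prints "hello".
        System.out.println(Foo.GREETING);
    }
}
```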

Learning JVM Internals - 15. Class Initialization and Inheritance

Excerpted from 《深入理解Java虚拟机》 (Understanding the Java Virtual Machine) by 周志明

class Bean {
	static {
		System.out.println("Bean inited");
	}
}

class SubBean extends Bean {
	static int subBeanValue = 1; // field added so the example compiles

	static {
		System.out.println("SubBean inited");
	}
}

class Main { // wrapper class added so the snippet compiles
	public static void main(String[] args) {
		int i = SubBean.subBeanValue;
		/* Prints "Bean inited" first, then "SubBean inited" */
		/* The JVM guarantees that the superclass's <clinit>() method has
		   finished executing before the subclass's <clinit>() method runs */
	}
}