Bookmarking a book: 《深入理解计算机系统》 (Computer Systems: A Programmer's Perspective)

The original English title is Computer Systems: A Programmer's Perspective (2nd Edition).

The Chinese title would be more faithful with "from a programmer's perspective" added in front.

http://book.douban.com/subject/1230413/

preface excerpt 3

preface 7

about the authors 21

1 a tour of computer systems 35

1.1 information is bits + context 37

1.2 programs are translated by other programs into different forms 38

1.3 it pays to understand how compilation systems work 40

1.4 processors read and interpret instructions stored in memory 41

1.4.1 hardware organization of a system 41

1.4.2 running the hello program 44

1.5 caches matter 46

1.6 storage devices form a hierarchy 47

1.7 the operating system manages the hardware 48

1.7.1 processes 50

1.7.2 threads 51

1.7.3 virtual memory 51

1.7.4 files 53

1.8 systems communicate with other systems using networks 54

1.9 important themes 55

1.9.1 concurrency and parallelism 55

1.9.2 the importance of abstractions in computer systems 58

1.10 summary 59

bibliographic notes 60

part i program structure and execution

2 representing and manipulating information 63

2.1 information storage 67

2.1.1 hexadecimal notation 68

2.1.2 words 72

2.1.3 data sizes 72


2.1.4 addressing and byte ordering 73

2.1.5 representing strings 80

2.1.6 representing code 81

2.1.7 introduction to boolean algebra 82

2.1.8 bit-level operations in c 85

2.1.9 logical operations in c 88

2.1.10 shift operations in c 88

2.2 integer representations 90

2.2.1 integral data types 91

2.2.2 unsigned encodings 92

2.2.3 two’s-complement encodings 94

2.2.4 conversions between signed and unsigned 99

2.2.5 signed vs. unsigned in c 103

2.2.6 expanding the bit representation of a number 105

2.2.7 truncating numbers 109

2.2.8 advice on signed vs. unsigned 110

2.3 integer arithmetic 113

2.3.1 unsigned addition 113

2.3.2 two’s-complement addition 117

2.3.3 two’s-complement negation 121

2.3.4 unsigned multiplication 122

2.3.5 two’s-complement multiplication 123

2.3.6 multiplying by constants 126

2.3.7 dividing by powers of two 129

2.3.8 final thoughts on integer arithmetic 132

2.4 floating point 133

2.4.1 fractional binary numbers 134

2.4.2 ieee floating-point representation 137

2.4.3 example numbers 139

2.4.4 rounding 144

2.4.5 floating-point operations 147

2.4.6 floating point in c 148

2.5 summary 152

bibliographic notes 153

homework problems 153

solutions to practice problems 168

3 machine-level representation of programs 187

3.1 a historical perspective 190

3.2 program encodings 193


3.2.1 machine-level code 194

3.2.2 code examples 196

3.2.3 notes on formatting 199

3.3 data formats 201

3.4 accessing information 202

3.4.1 operand specifiers 203

3.4.2 data movement instructions 205

3.4.3 data movement example 208

3.5 arithmetic and logical operations 211

3.5.1 load effective address 211

3.5.2 unary and binary operations 212

3.5.3 shift operations 213

3.5.4 discussion 214

3.5.5 special arithmetic operations 216

3.6 control 219

3.6.1 condition codes 219

3.6.2 accessing the condition codes 221

3.6.3 jump instructions and their encodings 223

3.6.4 translating conditional branches 227

3.6.5 loops 231

3.6.6 conditional move instructions 240

3.6.7 switch statements 247

3.7 procedures 253

3.7.1 stack frame structure 253

3.7.2 transferring control 255

3.7.3 register usage conventions 257

3.7.4 procedure example 258

3.7.5 recursive procedures 263

3.8 array allocation and access 266

3.8.1 basic principles 266

3.8.2 pointer arithmetic 267

3.8.3 nested arrays 269

3.8.4 fixed-size arrays 271

3.8.5 variable-size arrays 272

3.9 heterogeneous data structures 275

3.9.1 structures 275

3.9.2 unions 278

3.9.3 data alignment 282

3.10 putting it together: understanding pointers 286

3.11 life in the real world: using the gdb debugger 288

3.12 out-of-bounds memory references and buffer overflow 290

3.12.1 thwarting buffer overflow attacks 295


3.13 x86-64: extending ia32 to 64 bits 301

3.13.1 history and motivation for x86-64 302

3.13.2 an overview of x86-64 304

3.13.3 accessing information 307

3.13.4 control 313

3.13.5 data structures 324

3.13.6 concluding observations about x86-64 325

3.14 machine-level representations of floating-point programs 326

3.15 summary 327

bibliographic notes 328

homework problems 328

solutions to practice problems 342

4 processor architecture 367

4.1 the y86 instruction set architecture 370

4.1.1 programmer-visible state 370

4.1.2 y86 instructions 371

4.1.3 instruction encoding 373

4.1.4 y86 exceptions 378

4.1.5 y86 programs 379

4.1.6 some y86 instruction details 384

4.2 logic design and the hardware control language hcl 386

4.2.1 logic gates 387

4.2.2 combinational circuits and hcl boolean expressions 388

4.2.3 word-level combinational circuits and hcl integer expressions 389

4.2.4 set membership 394

4.2.5 memory and clocking 395

4.3 sequential y86 implementations 398

4.3.1 organizing processing into stages 398

4.3.2 seq hardware structure 409

4.3.3 seq timing 413

4.3.4 seq stage implementations 417

4.4 general principles of pipelining 425

4.4.1 computational pipelines 426

4.4.2 a detailed look at pipeline operation 427

4.4.3 limitations of pipelining 428

4.4.4 pipelining a system with feedback 432

4.5 pipelined y86 implementations 434

4.5.1 seq+: rearranging the computation stages 434


4.5.2 inserting pipeline registers 435

4.5.3 rearranging and relabeling signals 439

4.5.4 next pc prediction 440

4.5.5 pipeline hazards 442

4.5.6 avoiding data hazards by stalling 447

4.5.7 avoiding data hazards by forwarding 449

4.5.8 load/use data hazards 452

4.5.9 exception handling 454

4.5.10 pipe stage implementations 457

4.5.11 pipeline control logic 465

4.5.12 performance analysis 478

4.5.13 unfinished business 480

4.6 summary 483

4.6.1 y86 simulators 484

bibliographic notes 485

homework problems 485

solutions to practice problems 491

5 optimizing program performance 507

5.1 capabilities and limitations of optimizing compilers 510

5.2 expressing program performance 514

5.3 program example 516

5.4 eliminating loop inefficiencies 520

5.5 reducing procedure calls 524

5.6 eliminating unneeded memory references 525

5.7 understanding modern processors 530

5.7.1 overall operation 531

5.7.2 functional unit performance 534

5.7.3 an abstract model of processor operation 536

5.8 loop unrolling 543

5.9 enhancing parallelism 547

5.9.1 multiple accumulators 548

5.9.2 reassociation transformation 552

5.10 summary of results for optimizing combining code 558

5.11 some limiting factors 559

5.11.1 register spilling 559

5.11.2 branch prediction and misprediction penalties 560

5.12 understanding memory performance 565

5.12.1 load performance 565

5.12.2 store performance 566


5.13 life in the real world: performance improvement techniques 573

5.14 identifying and eliminating performance bottlenecks 574

5.14.1 program profiling 574

5.14.2 using a profiler to guide optimization 576

5.14.3 amdahl’s law 579

5.15 summary 581

bibliographic notes 582

homework problems 583

solutions to practice problems 586

6 the memory hierarchy 593

6.1 storage technologies 595

6.1.1 random-access memory 595

6.1.2 disk storage 604

6.1.3 solid state disks 615

6.1.4 storage technology trends 617

6.2 locality 620

6.2.1 locality of references to program data 621

6.2.2 locality of instruction fetches 622

6.2.3 summary of locality 623

6.3 the memory hierarchy 625

6.3.1 caching in the memory hierarchy 626

6.3.2 summary of memory hierarchy concepts 629

6.4 cache memories 630

6.4.1 generic cache memory organization 631

6.4.2 direct-mapped caches 633

6.4.3 set associative caches 640

6.4.4 fully associative caches 642

6.4.5 issues with writes 645

6.4.6 anatomy of a real cache hierarchy 646

6.4.7 performance impact of cache parameters 648

6.5 writing cache-friendly code 649

6.6 putting it together: the impact of caches on program performance 654

6.6.1 the memory mountain 655

6.6.2 rearranging loops to increase spatial locality 659

6.6.3 exploiting locality in your programs 663

6.7 summary 663

bibliographic notes 664

homework problems 665

solutions to practice problems 676


part ii running programs on a system

7 linking 687

7.1 compiler drivers 689

7.2 static linking 691

7.3 object files 691

7.4 relocatable object files 692

7.5 symbols and symbol tables 694

7.6 symbol resolution 697

7.6.1 how linkers resolve multiply defined global symbols 698

7.6.2 linking with static libraries 701

7.6.3 how linkers use static libraries to resolve references 704

7.7 relocation 706

7.7.1 relocation entries 706

7.7.2 relocating symbol references 707

7.8 executable object files 712

7.9 loading executable object files 713

7.10 dynamic linking with shared libraries 715

7.11 loading and linking shared libraries from applications 717

7.12 position-independent code (pic) 721

7.13 tools for manipulating object files 724

7.14 summary 725

bibliographic notes 725

homework problems 726

solutions to practice problems 732

8 exceptional control flow 735

8.1 exceptions 737

8.1.1 exception handling 738

8.1.2 classes of exceptions 740

8.1.3 exceptions in linux/ia32 systems 742

8.2 processes 746

8.2.1 logical control flow 746

8.2.2 concurrent flows 747

8.2.3 private address space 748

8.2.4 user and kernel modes 748

8.2.5 context switches 750


8.3 system call error handling 751

8.4 process control 752

8.4.1 obtaining process ids 753

8.4.2 creating and terminating processes 753

8.4.3 reaping child processes 757

8.4.4 putting processes to sleep 763

8.4.5 loading and running programs 764

8.4.6 using fork and execve to run programs 767

8.5 signals 770

8.5.1 signal terminology 772

8.5.2 sending signals 773

8.5.3 receiving signals 776

8.5.4 signal handling issues 779

8.5.5 portable signal handling 786

8.5.6 explicitly blocking and unblocking signals 787

8.5.7 synchronizing flows to avoid nasty concurrency bugs 789

8.6 nonlocal jumps 793

8.7 tools for manipulating processes 796

8.8 summary 797

bibliographic notes 797

homework problems 798

solutions to practice problems 805

9 virtual memory 809

9.1 physical and virtual addressing 811

9.2 address spaces 812

9.3 vm as a tool for caching 813

9.3.1 dram cache organization 814

9.3.2 page tables 814

9.3.3 page hits 816

9.3.4 page faults 816

9.3.5 allocating pages 817

9.3.6 locality to the rescue again 818

9.4 vm as a tool for memory management 819

9.5 vm as a tool for memory protection 820

9.6 address translation 821

9.6.1 integrating caches and vm 825

9.6.2 speeding up address translation with a tlb 825

9.6.3 multi-level page tables 826

9.6.4 putting it together: end-to-end address translation 828

9.7 case study: the intel core i7/linux memory system 833


9.7.1 core i7 address translation 834

9.7.2 linux virtual memory system 837

9.8 memory mapping 841

9.8.1 shared objects revisited 841

9.8.2 the fork function revisited 843

9.8.3 the execve function revisited 844

9.8.4 user-level memory mapping with the mmap function 844

9.9 dynamic memory allocation 846

9.9.1 the malloc and free functions 848

9.9.2 why dynamic memory allocation? 850

9.9.3 allocator requirements and goals 851

9.9.4 fragmentation 853

9.9.5 implementation issues 854

9.9.6 implicit free lists 854

9.9.7 placing allocated blocks 856

9.9.8 splitting free blocks 857

9.9.9 getting additional heap memory 857

9.9.10 coalescing free blocks 858

9.9.11 coalescing with boundary tags 858

9.9.12 putting it together: implementing a simple allocator 861

9.9.13 explicit free lists 869

9.9.14 segregated free lists 870

9.10 garbage collection 872

9.10.1 garbage collector basics 873

9.10.2 mark&sweep garbage collectors 874

9.10.3 conservative mark&sweep for c programs 876

9.11 common memory-related bugs in c programs 877

9.11.1 dereferencing bad pointers 877

9.11.2 reading uninitialized memory 877

9.11.3 allowing stack buffer overflows 878

9.11.4 assuming that pointers and the objects they point to are the same size 878

9.11.5 making off-by-one errors 879

9.11.6 referencing a pointer instead of the object it points to 879

9.11.7 misunderstanding pointer arithmetic 880

9.11.8 referencing nonexistent variables 880

9.11.9 referencing data in free heap blocks 881

9.11.10 introducing memory leaks 881

9.12 summary 882

bibliographic notes 882

homework problems 883

solutions to practice problems 887


part iii interaction and communication between programs

10 system-level i/o 895

10.1 unix i/o 896

10.2 opening and closing files 897

10.3 reading and writing files 899

10.4 robust reading and writing with the rio package 901

10.4.1 rio unbuffered input and output functions 901

10.4.2 rio buffered input functions 902

10.5 reading file metadata 907

10.6 sharing files 909

10.7 i/o redirection 911

10.8 standard i/o 913

10.9 putting it together: which i/o functions should i use? 914

10.10 summary 915

bibliographic notes 916

homework problems 916

solutions to practice problems 917

11 network programming 919

11.1 the client-server programming model 920

11.2 networks 921

11.3 the global ip internet 925

11.3.1 ip addresses 927

11.3.2 internet domain names 929

11.3.3 internet connections 933

11.4 the sockets interface 934

11.4.1 socket address structures 935

11.4.2 the socket function 936

11.4.3 the connect function 937

11.4.4 the open_clientfd function 937

11.4.5 the bind function 938

11.4.6 the listen function 939

11.4.7 the open_listenfd function 939

11.4.8 the accept function 941

11.4.9 example echo client and server 942


11.5 web servers 945

11.5.1 web basics 945

11.5.2 web content 946

11.5.3 http transactions 948

11.5.4 serving dynamic content 950

11.6 putting it together: the tiny web server 953

11.7 summary 961

bibliographic notes 962

homework problems 962

solutions to practice problems 963

12 concurrent programming 967

12.1 concurrent programming with processes 969

12.1.1 a concurrent server based on processes 970

12.1.2 pros and cons of processes 971

12.2 concurrent programming with i/o multiplexing 973

12.2.1 a concurrent event-driven server based on i/o multiplexing 976

12.2.2 pros and cons of i/o multiplexing 980

12.3 concurrent programming with threads 981

12.3.1 thread execution model 982

12.3.2 posix threads 982

12.3.3 creating threads 984

12.3.4 terminating threads 984

12.3.5 reaping terminated threads 985

12.3.6 detaching threads 985

12.3.7 initializing threads 986

12.3.8 a concurrent server based on threads 986

12.4 shared variables in threaded programs 988

12.4.1 threads memory model 989

12.4.2 mapping variables to memory 990

12.4.3 shared variables 990

12.5 synchronizing threads with semaphores 991

12.5.1 progress graphs 994

12.5.2 semaphores 997

12.5.3 using semaphores for mutual exclusion 998

12.5.4 using semaphores to schedule shared resources 1000

12.5.5 putting it together: a concurrent server based on prethreading 1004

12.6 using threads for parallelism 1008


12.7 other concurrency issues 1013

12.7.1 thread safety 1013

12.7.2 reentrancy 1014

12.7.3 using existing library functions in threaded programs 1016

12.7.4 races 1017

12.7.5 deadlocks 1019

12.8 summary 1022

bibliographic notes 1023

homework problems 1023

solutions to practice problems 1028

a error handling 1033

a.1 error handling in unix systems 1034

a.2 error-handling wrappers 1035

references 1039

index 1045
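As a small taste of what Chapter 2 ("representing and manipulating information") covers, here is my own illustrative sketch, not code from the book: Python's struct module makes it easy to see byte ordering (section 2.1.4) and signed/unsigned reinterpretation (section 2.2.4) in action.

```python
import struct

# Signed/unsigned reinterpretation (cf. section 2.2.4):
# the 32-bit two's-complement pattern for -1 reads back as 0xFFFFFFFF
# when the same bytes are interpreted as an unsigned int.
u = struct.unpack("<I", struct.pack("<i", -1))[0]
print(hex(u))  # 0xffffffff

# Byte ordering (cf. section 2.1.4): the same 32-bit value laid out
# in little-endian vs. big-endian byte order.
print(struct.pack("<I", 0x01234567).hex())  # little-endian: 67452301
print(struct.pack(">I", 0x01234567).hex())  # big-endian:    01234567
```

The book works these examples out at the bit level in C; the Python version above is just a quick way to poke at the same ideas interactively.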
