UNIVERSITI TEKNOLOGI MALAYSIA

Contents

1. Functions   Outputs on LCD display (in HEX)
cos         0x3F5DB600
sin         0xBF000200
cosh        0x3F909F00
sinh        0x3F05A800
exp         0x3FD37300
Table 6.6 Results collected from LCD display outputs

Based on the results in Table 6.6, the values displayed on the LCD are identical to the simulation values, which verifies that the interface circuit is working.

CHAPTER 7 CONCLUSION AND FUTURE WORKS

This chapter concludes the results of the floating point math module and states the future work for further improvement of this project.

7.1 Conclusion

As concluded from the simulation results, the designs for the floating point adder, subtractor, multiplier and divider are working, and their results are precise to 4 to 6 decimal places. This is achieved by using the IEEE 754 single precision floating point format. Meanwhile, the CORDIC module combines the trigonometric CORDIC module, the hyperbolic CORDIC module and the binary to IEEE 754 converter. The computation speed is increased by using the fixed point format, but the precision of the results becomes lower. Therefore, there is a trade-off between the IEEE 754 format and the fixed point format: the IEEE 754 format gives higher precision results but needs more time to process, while the fixed point format shortens the processing time but it re
2. Iteration i   Rotational angle   2^-i
5    0.0312601785   1/32
6    0.0156262718   1/64
7    0.0078126580   1/128
8    0.0039062690   1/256
9    0.0019531270   1/512
10   0.0009765620   1/1024
11   0.0004882810   1/2048
12   0.0002441400   1/4096
13   0.0001220700   1/8192
14   0.0000610350   1/16384
15   0.0000305170   1/32768
Table 4.7 Look up Table for Rotational Angles from 1 to 15 iterations (CORDIC_Hyperbolic)

Thus the name of this design is CORDIC_Hyperbolic and the block diagram is shown in Figure 4.7.

Figure 4.7 Block diagram of CORDIC_Hyperbolic (inputs: clk, angle (32 bits); outputs: cosh_eff, sinh_eff)

Table 4.8 describes all the inputs and outputs of this block with a brief description of their functions.

Signal Name   Width   Type     Description
clk           1       Input    System clock
hyper_in      32      Input    Input angle in Q format (Q2.30). Note: conversion equation is desired hyperbolic angle × 2^30
cosh_eff      17      Output   Output value for hyperbolic cosine in Q format (Q2.15). Note: conversion equation is value / 2^15
sinh_eff      17      Output   Output value for hyperbolic sine in Q format (Q2.15). Note: conversion equation is value / 2^15
Table 4.8 I/O Interface description for CORDIC_Hyperbolic

Since the data is in Q format, all the data inside this design, as well as the look up table of rotational angles, need to be converted to this format. The algorithm to implement this module is as follows: 1
3. A 5 Trigonometric CORDIC CORDIC_Circular module cordic_Circular input clk input 31 0 angle output 16 0 cos_eff sin_eff wire signed 15 0 Xin 16 b0100110110111010 wire signed 15 0 Yin 16 D0000000000000000 arctan table wire signed 31 0 atan_table 0 30 assign atan_table 00 assign atan_table 01 assign atan_table 02 assign atan_table 03 assign atan_table 04 assign atan_table 05 assign atan_table 06 assign atan_table 07 assign atan_table 08 assign atan_table 09 assign atan_table 10 assign atan_table 11 assign atan_table 12 assign atan_table 13 assign atan_table 14 assign atan_table 15 assign atan_table 16 assign atan_table 17 assign atan_table 18 assign atan_table 19 assign atan_table 20 assign atan_table 21 assign atan_table 22 assign atan_table 23 assign atan_table 24 assign atan_table 25 assign atan_table 26 assign atan_table 27 assign atan_table 28 assign atan_table 29 assign atan_table 30 32 b00100000000000000000000000000000 32 b00010010111001000000010100011101 32 b00001001111110110011100001011011 32 D00000101000100010001000111010100 32 b00000010100010110000110101000011 32 b00000001010001011101011111100001 32 b00000000101000101111011000011110 32 b00000000010100010111110001010101 32 b00000000001010001011111001010011 32 b00000000000101000101111100101110
4. 0 03126018 assign atan_table 05 32 b0000000100000000000001010101001 1 0 01562627 assign atan_table 06 32 b00000000 10000000000000001010101 1 0 0078 1266 assign atan_table 07 32 b00000000010000000000000000010101 0 00390627 assign atan_table 08 32 b0000000000 1000000000000000000101 0 00195313 assign atan_table 09 32 b00000000000011111111111111111101 0 00097656 assign atan_table 10 32 b00000000000001111111111111111110 0 00048828 assign atan_table 11 32 b00000000000000111111111111111111 7 0 00024414 assign atan_table 12 32 b00000000000000011111111111111111 0 00012207 assign atan_table 13 32 b00000000000000010000000000000101 0 00006104 assign atan_table 14 32 b0000000000000000 1000000000000010 0 00003052 assign atan_table 15 32 b00000000000000000100000000000001 0 00001526 assign atan_table 16 32 b0000000000000000001000001 1010111 0 00000783 assign atan_table 17 32 b000000000000000000001 11111111010 0 00000381 stage outputs reg 16 0 X 0 15 reg 16 0 Y 0 15 reg signed 31 0 Z 0 15 always posedge clk begin if data 31 begin X 0 lt Xin Y 0 lt Yin Z 0 lt data end else begin X 0 lt Xin Y 0 lt Yin Z 0 lt data end end genvar i generate for G 0 i lt 15 i 1 1 begin XYZ wire Z_sign wire 16 0 X_shr Y_shr assign X_shr X i gt gt gt i 1 signed shift right assign Y_shr
5. 1 b0 m_norm frac6 remainder6 26 rem Isb assign sign op1 31 op2 31 lto give the desired output always posedge clk begin if rst exp_out lt 0 else exp out lt opl_zero 12 b0 expf_temp4 end counters always posedge clk begin if rst count_out lt 0 else if en_reg count_out lt preset else if count_nonzero count_out lt count_out 1 end to output the desired quotient and remainder always posedge clk begin if rst begin quotient_out lt 0 remainder_out lt 0 end else begin quotient_out lt quotient remainder_out lt remainder end end lto calculate the quotient always posedge clk begin if rst quotient lt 0 else if count_nonzero_reg quotient count_index lt divisor_reg gt dividend_reg end to calculate the remainder always posedge clk begin if rst begin remainder lt 0 remainder_msb lt 0 end else if count_nonzero_reg amp count nonzero reg2 begin remainder lt dividend reg remainder_msb lt lt divisor reg gt dividend reg 0 1 end end to calculate dividend and divisor always posedge clk begin if rst begin dividend reg lt 0 divisor reg lt 0 end elseif en reg e begin dividend reg lt dividend temp divisor reg lt divisor temp end elseif count nonzero reg dividend reg lt divisor reg gt dividend reg dividend reg lt lt 1 dividend re
6. 5.2 Gantt Chart of FYP2
6.1 Simulation result of fpu_add
6.2 Simulation result of fpu_sub
6.3 Simulation result of fpu_mul
6.4 Simulation result of fpu_div
6.5 Simulation result of CORDIC module
6.6 I/O interface circuit on donut board with working LCD display

LIST OF ABBREVIATIONS
FPGA     Field Programmable Gate Array
FPU      Floating Point Unit
CORDIC   Coordinate Rotational Digital Computer
ROM      Read-Only Memory
LCD      Liquid Crystal Display
LUT      Look-up Table
I/O      Input/Output
HDL      Hardware Description Language
ASIC     Application-Specific Integrated Circuit
SoC      System on Chip
HPS      Hard Processor System
SDRAM    Synchronous Dynamic Random Access Memory
PLL      Phase Locked Loop
GPIO     General-Purpose Input/Output
FSM      Finite State Machine

LIST OF APPENDICES
APPENDIX A  FLOATING POINT MATH MODULE VERILOG CODE LISTS
A.1 Floating Point Adder (fpu_add)
A.2 Floating Point Subtractor (fpu_sub)
A.3 Floating Point Multiplier (fpu_mul)
A.4 Floating Point Divider (fpu_div)
A.5 Trigonometric CORDIC (CORDIC_Circular)
A.6 Hyperbolic CORDIC (CORDIC_hyperbolic)
A.7 Q format to IEEE 754 format converter
A.8 CORDIC Top Module
APPENDIX B  INTERFACE CIRCUIT VERILOG CODE LISTS
B.1 De-bouncer
B.2 Keypad scanner (keypad encoder)
B.3 LCD Top Module
7. Y i gt gt gt il the sign of the current rotation angle assign Z sign Z i 31 Z_sign 1 if Zi lt O always posedge clk begin add subtract shifted data X i 1 lt Z_sign X i Y_shr X i Y_shr Yfi 1 lt Z_sign Y i X_shr Y i X_shr Zli 1 lt Z sign Z i atan_table i ZG atan_table i end end endgenerate output assign cosh_eff X 15 assign sinh_eff Y 15 endmodule 89 A 7 Q format to IEEE 754 format converter module binary_to_ieee input clk input 16 0 data output 31 0 ieee_data reg 4 0 leadl wire sign data 16 wire 7 0 exponent 8 d127 leadl reg 23 0 mantissa reg 23 0 frac reg reg 4 0 count 0 reg done 0 always posedge clk begin frac_reg lt mantissa lt lt lead 1 end always begin if sign mantissa data 15 0 8 1 b0O else mantissa data 15 0 1 8 1 b0O end always mantissa begin if mantissa 23 leadl lt 0 else if mantissa 22 lead lt 1 else if mantissa 21 lead1 lt 2 else if mantissa 20 lead1 lt 3 else if mantissa 19 lead1 lt 4 else if mantissa 18 lead1 lt 5 else if mantissa 17 lead1 lt 6 else if mantissa 16 lead1 lt 7 else if mantissa 15 lead1 lt 8 else if mantissa 14 lead1 lt 9 else if mantissa 13 lead lt 10 else if mantissa 12 lead lt 11 else if mantissa 11 lead lt 12
8. 1 or 0
E = biased exponent (0 to 255)
Bias = 127

However, there are five distinct numerical ranges that single precision floating point numbers are unable to represent [2], as shown in the following table.

No.   Specific name for the invalid range   Range of corresponding value
1     Negative overflow                     < -(2 - 2^-23) × 2^127
2     Negative underflow                    > -2^-149
3     Zero                                  0
4     Positive underflow                    < 2^-149
5     Positive overflow                     > (2 - 2^-23) × 2^127
Table 2.1 List of invalid ranges for IEEE 754 single precision format

Thus, overflow means that the value is too large to be represented correctly, while underflow means that the value is so small that it becomes inexact. These conditions are the exceptions that need to be handled, as discussed in the next subsection.

2.3.2 IEEE 754 Rounding Modes

Rounding is sometimes necessary since the precision of the result is not infinite. Furthermore, rounding can also be used to handle the underflow exception, where the number is rounded toward zero. The standard specifies five rounding modes [1, 2, 4], as follows:
a) Round to the nearest, ties to even (default), which rounds to the nearest value with an even (zero) least significant bit if the number falls midway.
b) Round to the nearest, ties away from zero, which rounds to the nearest value above for positive numbers or below for negative numbers.
c) Round toward zero, which roun
9. 2^32
• Decimal value: … × 360 = 330

hyper_in in Q2.30 (unsigned binary): 00.100000000000000000000000000000 = 536870912
• Decimal value = 0.5

Output operands

cos in IEEE 754 (binary): 0 01111110 10111011011011000000000
• Decimal value = (-1)^0 × 2^(126-127) × 1.7321167 = 0.86605835
• Actual value by scientific calculator = 0.8660254038

sin in IEEE 754 (binary): 1 01111110 00000000000001000000000
• Decimal value = (-1)^1 × 2^(126-127) × 1.0000610 = -0.5000305
• Actual value by scientific calculator = -0.5

cosh in IEEE 754 (binary): 0 01111111 00100001001111100000000
• Decimal value = (-1)^0 × 2^(127-127) × 1.1298523 = 1.1298523
• Actual value by scientific calculator = 1.127625965

sinh in IEEE 754 (binary): 0 01111110 00001011010100000000000
• Decimal value = (-1)^0 × 2^(126-127) × 1.0441895 = 0.5220947
• Actual value by scientific calculator = 0.5210953055

exp in IEEE 754 (binary): 0 01111111 10100110111001100000000
• Decimal value = (-1)^0 × 2^(127-127) × 1.6519470 = 1.651947
• Actual value by scientific calculator = 1.648721271

Table 6.5 The detailed description of input and output operands from the output waveform of CORDIC module

Based on the results in Table 6.5, the output results for cos and sin closely match the results calculated with the scientific calculator. A precision of up to 4 decimal places was achieved when comparing these two sets of results
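The decoding steps in Table 6.5 can be cross-checked in software. The following is a minimal sketch, assuming Python's standard `struct` module as the reference decoder; the helper name `decode_ieee754` is illustrative and not part of the thesis design. The five words are the binary patterns of Table 6.5 written in hexadecimal, and the reference values assume an angle of 330 degrees and a hyperbolic input of 0.5.

```python
import math
import struct


def decode_ieee754(word):
    """Interpret a 32-bit integer as an IEEE 754 single precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


# (output word, reference value) for each function; the angle and the
# hyperbolic input are taken from the decimal values quoted in Table 6.5.
outputs = {
    "cos": (0x3F5DB600, math.cos(math.radians(330))),
    "sin": (0xBF000200, math.sin(math.radians(330))),
    "cosh": (0x3F909F00, math.cosh(0.5)),
    "sinh": (0x3F05A800, math.sinh(0.5)),
    "exp": (0x3FD37300, math.exp(0.5)),
}

errors = {}
for name, (word, reference) in outputs.items():
    value = decode_ieee754(word)
    errors[name] = abs(value - reference)
    print(f"{name:4s} {value:.7f}  reference {reference:.7f}  error {errors[name]:.1e}")
```

Under these assumptions the cos and sin words differ from the library values by about 3e-5, while the cosh, sinh and exp words differ by roughly 1e-3 to 3e-3.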
10. 7 0 exps expb reg 7 0 temp_exp reg 22 0 frac_opl frac_op2 reg 22 0 fracs fracb reg 26 0 fracb_n fracs_n allign_fracs_n final fracs_n reg 26 0 temp_sum wire allign_fracs_n_nonzero lallign_fracs_n 26 0 wire fracs_n_nonzero exps gt 0 I Ifracs 22 0 wire small_frac_en fracs n nonzero amp lallign fracs_n nonzero wire 26 0 special_fracs_n 26 b0 1 bl wire overflow temp_sum 26 wire lead1 temp_sum 25 wire opl_lt_op2 exp_opl gt exp_op2 wire s_denorm exps gt 0 wire b_denorm expb gt 0 wire b_norm_s_denorm s_denorm amp amp b_denorm wire denorm_to_norm leadl amp b_denorm 12 always posedge clk begin if rst begin exp_opl lt 0 exp_op2 lt 0 frac_opl lt 0 frac op2 lt 0 exps lt 0 expb lt 0 fracs lt 0 fracb lt 0 exp_diff lt 0 fracb_n lt 0 fracs_n lt 0 allign_fracs_n lt 0 final_fracs_n lt 0 temp_exp lt 0 temp_sum lt 0 end else if en begin exp_opl lt op1 30 23 exp_op2 lt 0p2 30 23 frac_opl lt op1 22 0 frac_op2 lt 0p2 22 0 if opl_lt_op2 begin exps lt exp_op2 expb lt exp_opl fracs lt frac_op2 fracb lt frac_opl end else if op1_It_op2 begin exps lt exp_opl expb lt exp_op2 fracs lt frac_opl fracb lt frac_op2 end 73 exp diff lt expb exps b_norm_s_denorm fracb_n
11. CHAPTER 1 INTRODUCTION

This chapter introduces the project. It starts with the project overview, followed by the motivations, problem statements and objectives. After that, the scope of work is identified from several aspects. Lastly, the organization of the report is briefly discussed.

1.1 Project Overview

This project focuses on designing and implementing FPGA based floating point math hardware modules, based on the conventional architecture of the FPU and the CORDIC algorithm, to solve typical arithmetic operations as well as transcendental functions: addition, subtraction, multiplication, division, exponential, trigonometric and hyperbolic functions. Normally, a floating point number is represented according to the IEEE 754 technical standard with single precision (32 bits). Meanwhile, the fixed point format, or Q format, can be used as an alternative representation, which has higher speed but lower precision. Therefore, we can make use of the speed advantage of the fixed point format for low precision calculations and then convert its output to the IEEE 754 format so that the output data complies with this standard. Apart from that, an efficient hardware algorithm, namely the COordinate Rotational DIgital Computer (CORDIC), was developed in the design to realize the solution for some transcendental functions such as exponential, trigonometric and hyperbolic functions. Theoretic
12. Thus, set X to the value of X - dY, set Y to the value of Y + dX, and set Z to the value of Z - dZ in order to update the values for X, Y and Z.
• If Z < 0, rotate the angle in the clockwise direction for the next iteration. Thus, set X to the value of X + dY, set Y to the value of Y - dX, and set Z to the value of Z + dZ in order to update the values for X, Y and Z.

Thus, the algorithm for the linear and hyperbolic modes is similar to the algorithm for trigonometry, with only some modifications to the LUT data and iteration equations, as given in Table 2.2. Meanwhile, the value of the exponential can be determined once the values of sinh and cosh are known, since the sum of sinh x and cosh x equals the exponential of x (e^x = cosh x + sinh x).

2.6 Related Works

Several previous works relate to this project, and some of them are highlighted here for improvement. In a thesis entitled "An Efficient IEEE 754 Compliant Floating Point Unit using Verilog" by Lipsa Sahu and Ruby Dev (2012) [1], the FPUs were implemented according to the IEEE 754 standard. They built the FPU using efficient algorithms with several modifications [1]. In this work, they designed the FPUs with the most essential functions, such as addition, subtraction, multiplication, division, shifting, square root and trigonometry. The trigonometric function is computed using the CORDIC algorit
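The relation between sinh, cosh and the exponential can be illustrated with a small floating point model of hyperbolic CORDIC in rotation mode. This is a sketch under stated assumptions, not the thesis design: the function name `cordic_hyperbolic` is hypothetical, it works in floating point rather than Q format, and it repeats iterations 4 and 13, which is the usual way to guarantee convergence of the hyperbolic mode but is not confirmed by the source.

```python
import math


def cordic_hyperbolic(angle, iterations=15):
    """Return (cosh, sinh) of `angle` (|angle| < about 1.1) by hyperbolic CORDIC."""
    # Iteration indices start at 1; 4 and 13 are repeated (assumption, see lead-in).
    sequence = []
    for i in range(1, iterations + 1):
        sequence.append(i)
        if i in (4, 13):
            sequence.append(i)

    # Each micro-rotation scales the vector by sqrt(1 - 2^-2i); pre-divide by the product.
    gain = 1.0
    for i in sequence:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in sequence:
        d = 1.0 if z >= 0 else -1.0          # direction from the sign of Z
        x, y = x + d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atanh(2.0 ** -i)       # look-up table entry for this iteration
    return x, y


cosh_x, sinh_x = cordic_hyperbolic(0.5)
exp_x = cosh_x + sinh_x                      # e^x = cosh x + sinh x
print(cosh_x, sinh_x, exp_x)
```

With 15 iteration indices the residual angle is bounded by atanh(2^-15), about 3e-5, so this floating point model agrees with `math.exp(0.5)` to roughly four decimal places; a fixed point implementation adds quantization error on top of that.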
13. begin count lt 0 if cordic mode 2 b01 state lt 11 else if cordic mode 2 b10 state lt 12 end else count lt count 1 end end display the selection of trigonometry function 11 begin LCD DATA lt trigo_msg Isend msg to LCD LCD RS lt 1 bl set to data mode if count lt setup_delay LCD EN lt 1 b1 enable LCD else LCD EN lt 1 b0 104 if count big_delay begin count lt 0 addr3 lt addr3 1 state lt 11 if addr3 7 h38 begin state lt 13 db_en lt 1 end end else count lt count 1 end display the selection of hyperbolic function 12 begin LCD DATA hyper msg _ send msg to LCD LCD RS lt 1 bl set to data mode if count lt setup_delay LCD EN lt 1 b1 enable LCD else LCD EN lt 1 b0 if count big_delay begin count lt 0 addr4 lt addr4 1 state lt 12 if addr4 7 h38 begin state lt 13 db_en lt 1 end end else count lt count 1 end wait user to choose the output type to display 13 begin if db_level begin displayNo lt 2 b00 state lt 13 if data_in 8 h31 begin displayNo lt 2 b01 state lt 14 end else if data_in 8 h32 begin displayNo lt 2 b10 state lt 14 end 105 else if data_in 8 h33 if cordic_mode 2 b10 begin displayNo lt 2 b11 state lt 14 end end else state lt 13 end clear the screen after the selec
14. else if mantissa 10 lead lt 13 else if mantissa 9 lead1 lt 14 else if mantissa 8 lead1 lt 15 else if mantissa 7 lead1 lt 16 else if mantissa 6 lead1 lt 17 else if mantissa 5 leadl lt 18 90 91 else if mantissa 4 leadl lt 19 else if mantissa 3 leadl lt 20 else if mantissa 2 lead1 lt 21 else if mantissa 1 lead1 lt 22 else leadl lt 23 end always posedge clk begin if count 5 d18 done lt 1 else count lt count 1 end assign ieee_data done sign exponent frac_reg 22 0 32 hzzzzzzzz endmodule A 8 CORDIC Top Module module CORDIC input clk rst_n input 31 0 angle data output 31 0 cos_ieee sin_ieee cosh_ieee sinh_ieee exponent_ieee output ready_cos ready_sin ready_cosh ready_sinh ready_exp wire 16 0 cos_eff sin_eff cosh_eff sinh eff cordic_Circular u0 clk angle cos_eff sin_eff binary_to_ieee ul clk rst_n cos_eff cos_ieee ready_cos binary_to_ieee u2 clk rst_n sin_eff sin_ieee ready_sin cordic_hyperbolic u3 clk data cosh_eff sinh_eff binary_to_ieee u4 clk rst_n cosh_eff cosh_ieee ready_cosh binary_to_ieee u5 clk rst_n sinh_eff sinh_ieee ready_sinh fpu addsub u6 clk rst_n cosh ieee sinh_ieee exponent ieee ready exp endmodule APPENDIX B INTERFACE CIRCUIT VERILOG CODE LISTS B 1 De bouncer 92 module debounce input clk rst_n
15. frac2 guotient_out 23 1 wire 22 0 frac3 guotient_out 22 0 wire guotient_msb guotient_out 24 wire 22 0 frac4 guotient_msb frac2 frac3 wire expf_temp3_et0 expf_temp3 0 wire 22 0 frac5 expf_temp3 1 frac2 frac4 wire 22 0 frac6 expf_temp3_et0 fracl frac5 wire 23 0 dividend_denorm fdivided opl sh 1 b0 wire opl_norm lexp_opl wire op2_norm lexp_op2 wire 24 0 dividend temp opl_norm 2 b01 divided _opl 1 b0 dividend_denorm wire 23 0 divisor_denorm divisor_op2_sh 1 b0 wire 24 0 divisor temp op2_norm 2 b01 divisor_op2 1 b0 divisor_denorm wire 26 0 remainderl remainder_op2 49 23 wire 26 0 remainder2 guotient_out 0 remainder_msb remainder_out 23 0 1 b0 wire 26 0 remainder3 remainder_msb remainder_out 23 0 2 b0 wire 26 0 remainder4 quotient_msb remainder2 remainder3 wire 26 0 remainder5 expf_temp3 1 remainder2 remainder4 wire 26 0 remainder6 expf_temp3_et0 remainderl remainder5 wire 49 0 remainder_op1 quotient_out 24 0 remainder_msb remainder_out 23 0 wire exp_ufl exp_op2 gt exp_term wire exp uf2 expsh_opl gt expf_templ wire exp_uf_gt_maxshift exp_uf_term3 gt 22 wire count_nonzero count_index 0 80 wire op1 zero lop1 30 0 wire m_norm lexpf_temp4 wire rem Isb Iremainder6 25 0 assign frac out
16. lt 1 b0 b_denorm fracb 2 b0 fracs_n lt 1 b0 s_denorm fracs 2 b0 allign_fracs_n lt fracs_n gt gt exp_diff final_fracs_n lt small_frac_en special_fracs_n allign_fracs_n temp_sum lt fracb_n final_fracs_n temp_exp lt overflow expb 1 expb end end assign sign op1 31 assign final_sum overflow temp_sum gt gt 1 temp_sum assign final_exp denorm_to_norm temp_exp 1 temp_exp endmodule A 2 Floating Point Subtractor fpu_sub module fpu_sub input clk rst en input 31 0 opl op2 input 2 0 fpu_mode output sign output 7 0 final_exp output 25 0 final_diff reg 4 0 lead0 reg 7 0 exp_opl exp_op2 exps expb exp_diff exp reg 22 0 frac opl frac_op2 fracs fracb reg 25 0 minuend subtrahend allign subtra final subtra diff temp_diff wire expl_lt_exp2 exp_opl gt exp_op2 wire expl_et_exp2 exp_opl exp_op2 wire fracl_ltet_frac2 frac_opl gt frac_op2 wire opl_ltet_op2 expl_lt_exp2 expl_et_exp2 amp fracl_ltet_frac2 wire s_denorm exps gt 0 wire b_denorm Nexpb gt 0 wire b_norm_s_denorm s_denorm amp amp b_denorm wire fracs nonzero exps gt 0 I lfracs 22 0 wire allign subtra nonzero lallign_subtra 25 0 wire subtra_frac_en fracs_nonzero amp allign_subtra_nonzero wire 25 0 special_subtra 25 b0 1 bl 1 wire lead0_lt_exp lead0 gt exp
17. to interface the DE1 board with external peripherals such as a character LCD and a keypad, the 40-pin expansion headers can be used with proper pin assignment according to the datasheet. The DE1 board provides two 40-pin expansion headers. Each header connects to 36 pins on the Cyclone II FPGA, and the remaining 4 pins provide DC +5 V, DC +3.3 V and two GND pins [18]. For protection purposes, each pin on the expansion headers is connected to a resistor. The schematic diagram of the expansion headers is shown in Figure 2.3.

Figure 2.3 The schematic diagram for expansion headers (GPIO 0 and GPIO 1)

2.2 Floating Point Units (FPUs)

A floating point unit (FPU), colloquially a math or numeric coprocessor, is specially designed to perform floating point operations [1]. The term coprocessor refers to a special set of circuits in a microprocessor chip that is designed to speed up the manipulation of numbers. Meanwhile, a floating point number is a binary number that includes a radix point and is stored in three parts according to the IEEE 754 standard [1]: the sign (either plus or minus), the mantissa (sequence of meaningful digits) and the exponent (power or order of magnitude). FPUs have several functions. Typically, FPUs are used to perform addition, subtraction, multiplication and di
18. 32 b000000000000 10100010111110011000 32 b00000000000001010001011111001100 32 b00000000000000101000101111100110 32 D00000000000000010100010111110011 32 D00000000000000001010001011111001 32 b00000000000000000101000101111101 32 b00000000000000000010100010111110 32 b00000000000000000001010001011111 32 b00000000000000000000101000101111 32 b00000000000000000000010100011000 32 b00000000000000000000001010001 100 32 b000000000000000000000001010001 10 32 b0000000000000000000000001010001 1 32 b00000000000000000000000001010001 32 b00000000000000000000000000101000 32 b00000000000000000000000000010100 32 b00000000000000000000000000001010 32 b00000000000000000000000000000101 32 b00000000000000000000000000000010 32 b00000000000000000000000000000001 atan 2 29 32 b00000000000000000000000000000000 85 Istage outputs reg signed 16 0 X 0 15 reg signed 16 0 Y 0 15 reg signed 31 0 Z 0 15 wire 1 0 quadrant assign quadrant angle 31 30 always posedge clk begin make sure the rotation angle is in the pi 2 to pi 2 range If not then pre rotate case quadrant 2 b00 2 bll no pre rotation needed for these guadrants begin X n Y n is 1 bit larger than Xin Yin but Verilog handles the assignments properly X 0 lt Xin Y 0 lt Yin Z 0 lt angle end 2 b01 begin X 0 lt Yin Y 0 lt Xin Z O lt 2 b00 angle 29 0 subtract pi 2 from angle for this
19. Clock
rst        1    Input    Reset values for initializing
en         1    Input    Enable signal
op1        32   Input    Operand 1 in IEEE 754 format
op2        32   Input    Operand 2 in IEEE 754 format
sign       1    Output   Sign bit for output in IEEE 754 format
exp_out    9    Output   Exponent for output in IEEE 754 format
frac_out   27   Output   Mantissa for output in IEEE 754 format, with extra 4 bits for specific purposes
Table 4.4 I/O interface description for fpu_div

The division is performed by several shifting and subtracting operations, similar to the hand calculation method for division. The IEEE 754 format has a 24-bit mantissa if the hidden bit is included. Therefore, the shift and subtract operation needs to be performed for 24 iterations to compute the result bit by bit. The algorithm for this design is as follows:
1. Determine the number of leading zeroes for both operands
• Count the number of leading zeroes in the mantissa of both operands and store them into registers.
2. Shifting left
• Shift left the mantissas of both operands by the corresponding number of leading zeroes.
3. Division
• Initialize and start the counter for iteration. Create a counter that counts down from 24 to 0 to indicate the start and end of the operation.
• Determine the value of the result bit by bit. This is done by shift and subtract while the counter is valid.
• The sign of the result is determined by exclusive O
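The shift and subtract step can be sketched as a restoring division on the two 24-bit mantissas. This is a minimal model under assumptions: the function name `divide_mantissas` is hypothetical, both mantissas are taken as already normalized (hidden bit set), and the loop runs 25 steps (one integer bit plus 24 fraction bits), which is one reading of a counter that counts from 24 down to 0.

```python
def divide_mantissas(m1, m2, bits=24):
    """Divide two normalized `bits`-wide mantissas by shift and subtract.

    Returns (quotient, sticky): the quotient holds one integer bit followed by
    `bits` fraction bits, and sticky is True when a non-zero remainder is left.
    """
    remainder = m1
    quotient = 0
    for _ in range(bits + 1):
        quotient <<= 1
        if remainder >= m2:          # the divisor fits: subtract and record a 1
            remainder -= m2
            quotient |= 1
        remainder <<= 1              # move on to the next lower quotient bit
    return quotient, remainder != 0


# 1.5 / 1.0: mantissas with the hidden bit at position 23.
q, sticky = divide_mantissas(0xC00000, 0x800000)
print(hex(q), sticky)
```

The quotient equals floor(m1 × 2^24 / m2), so the top bit tells whether the result is already normalized, and the sticky flag can feed a rounding stage.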
20. Fixed Point Representation & Fractional Math. Oberstar Consulting. [online] Available: http://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf
[14] Pong P. (2008). FPGA Prototyping by Verilog Examples. New Jersey: A John Wiley & Sons, Inc. Publication.
[15] Wayne W. (2004). FPGA-Based System Design. New Jersey: Prentice Hall.
[16] ALTERA (2007). Cyclone II Architecture. Altera Corporation, retrieved from official website: www.altera.com
[17] ALTERA (2012). Altera's User-Customizable ARM-Based SoC FPGAs. Altera Corporation, retrieved from official website: www.altera.com
[18] ALTERA and Terasic. DE1 Development and Education Board User Manual. Retrieved from Terasic official website: www.terasic.com
[19] Cytron Technologies. 4x4 Keypad User's Manual. Retrieved from Cytron product page: http://www.cytron.com.my/viewProduct.php?pcode=SW-KEYPAD-4X4&name=Keypad%204x4
[20] Julyan I. (1997). How to use Intelligent L.C.D.s, Part One. Wimborne Publishing Ltd, publishers of Everyday Practical Electronics Magazine. [online] Available: http://www.wizard.org/auction_support/lcd1.pdf

APPENDIX A
FLOATING POINT MATH MODULE VERILOG CODE LISTS

A.1 Floating Point Adder (fpu_add)

module fpu_add(
    input         clk, rst, en,
    input  [31:0] op1, op2,
    output        sign,
    output [7:0]  final_exp,
    output [26:0] final_sum
);
reg [7:0] exp_op1, exp_op2, exp_diff;
reg
21. HD44780 controller chip. This module has a fairly basic interface for several platforms such as microprocessors, microcontrollers and even FPGAs. Although it is not as advanced as the latest generation, it is still extensively used in commercial and industrial equipment. There are 14 pins in the standard interface, as shown in Table 2.3.

Pin Number   Name   Function
1            Vss    Ground
2            Vdd    Positive supply
3            Vee    Contrast
4            RS     Register Select
5            R/W    Read/Write
6            E      Enable
7            D0     Data bit 0
8            D1     Data bit 1
9            D2     Data bit 2
10           D3     Data bit 3
11           D4     Data bit 4
12           D5     Data bit 5
13           D6     Data bit 6
14           D7     Data bit 7
Table 2.3 Pin layout functions for all character LCDs [18]

To interface the character LCD module with the DE1 board, the LCD pins are connected to the GPIO pins of the DE1 board with proper pin assignment. Then, specific 1-byte command data is sent to the LCD in command mode (RS = 0) to perform operations such as clear display, set entry mode, set display address and so forth, as shown in Table 2.4.

Commands: Clear Display; Character Entry Mode; Display/Cursor Shift; Set CGRAM Address; Set Display Address
Legend: 1 = Increment, 0 = Decrement; R/L: 1 = Right shift, 0 = Left shift; 1 = Display shift on, 0 = Off; 8/4: 1 = 8-bit interface, 0 = 4-bit interface; 1 = Display on, 0 = Off; 2/1: 1 = 2 line mode, 0 = 1 line mod
22. However, if E2 > E1, the mantissas of these two operands are swapped. Then set the larger exponent as the tentative exponent of the result.
2. Stage 2: Pre-alignment
• Pre-align the mantissas by shifting the smaller mantissa to the right by d bits.
3. Stage 3: Addition or subtraction
• Perform addition or subtraction between M1 and M2 to get the tentative mantissa.
4. Stage 4: Rounding
• Round the mantissa of the result according to the rounding mode. If the result overflows due to rounding, shift it right by 1 bit and increment the exponent by 1.
5. Stage 5: Normalization
• Check the number of leading zeroes in the tentative result, then shift the result to the left and decrement the exponent by the number of leading zeroes. However, if the tentative result overflows, shift it right by 1 bit and increment the exponent by 1.

The pre-alignment and normalization stages require large shift registers. The pre-alignment stage needs a right shift register that is twice the number of mantissa bits, because the shifted-out bits have to be kept to generate the guard, round and sticky bits, which are required for the rounding operation. Meanwhile, the normalization stage needs a left shift register whose width equals the number of mantissa bits plus 1, to shift in the guard bit. The flowchart for the floating point addition or subtraction algorithm is shown in Figure 2.5.
23. Stage 2: Shifting left
• Shift left M1 and M2 by the corresponding number of leading zeroes.
3. Stage 3: Division
• Divide M1 by M2. The sign of the result is determined by the exclusive OR of S1 and S2. Meanwhile, the exponent of the result is calculated with the following equation:
Resulted E = E1 - E2 + 127 - Z1 + Z2
4. Stage 4: Rounding
• Round the mantissa of the result according to the rounding mode. If the result overflows due to rounding, shift it right by 1 bit and increment the exponent by 1.
5. Stage 5: Normalization
• Check the number of leading zeroes in the tentative result, then shift the result to the left and decrement the exponent by the number of leading zeroes. However, if the tentative result overflows, shift it right by 1 bit and increment the exponent by 1.

The flowchart for floating point division is shown in Figure 2.7.

Figure 2.7 The flowchart for the conventional floating point division [4] (steps: count leading zeroes in both fractions; shift both fractions left; compute exponent, fraction and sign; round; signal exception or normalize; output sign, exponent and fraction)

2.5.4 Transcendental Functions

Basically, a transcendental function is a function that
24. act as the inputs of the controller. LCD_RW, LCD_BLON, LCD_DATA, LCD_EN and LCD_RS are the outputs of the controller, which are connected to the LCD to display the messages. Apart from that, the controller also sends the address to the ROM to access its memory, and it controls an enable signal for the de-bouncer. Meanwhile, on the input side, the system retrieves the input from the user based on the keypad button that has been pressed, and then scans it to generate an appropriate signal for the controller to process.

CHAPTER 5 PROJECT MANAGEMENT

A project represents a collection of tasks aimed toward a single set of objectives, culminating in a definable end point and having a finite life span and budget. Normally, a project is a one-of-a-kind activity aimed at producing some product or outcome that has never existed before. Therefore, there are two essential considerations in project management: time (the project schedule) and cost.

5.1.1 Project Schedule

Planning a project's progress is essential, so all the important works were scheduled into Gantt charts for FYP1 and FYP2 before the project started, as shown in Figure 5.1 and Figure 5.2.

Gantt Chart FYP1 (columns: Wk 3 to Wk 7). Activities: FYP1 Briefing Section; Identifying the FYP supervisor and meeting with him; Deciding the FYP title and discussing with supervisor; Weekly short report writing; Outlining
25. does not satisfy a polynomial equation whose coefficients are themselves polynomials [1]. Thus, it is a function that is not algebraic, which means that it cannot be expressed in terms of a finite sequence of algebraic operations such as addition and multiplication. Examples include the exponential, trigonometric and hyperbolic functions. Normally, implementing these operations in a hardware design requires large memory storage, a large number of clock cycles and a high hardware cost, since the calculation process for transcendental functions is more complex. To minimize this problem, the CORDIC algorithm, which is an efficient hardware algorithm, can be used to realize the solution for several transcendental functions. Thus, this algorithm can be developed on the FPU to solve some transcendental functions more efficiently.

2.5.4.1 Coordinate Rotational Digital Computer (CORDIC) Algorithm

Based on the research done by Shrugal V., Dr. Nisha S. and Richa U. [12], this algorithm was specially developed for real time digital computers where the computations are mainly related to elementary functions. The algorithm needs only shift registers, adder/subtractors and a ROM to store data derived from a look-up table. So the advantages of this algorithm are low cost, low hardware requirement and relatively simple hardware implementation. Historically, it was first proposed by Jack Volder in 1959
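To show how shifts, add/subtract operations and a small arctangent table are enough to obtain sine and cosine, here is a floating point sketch of circular CORDIC in rotation mode. The names `cordic_gain` and `cordic_circular` are hypothetical, the model assumes the angle has already been reduced to the range -pi/2 to pi/2, and multiplications by 2^-i stand in for the hardware right shifts.

```python
import math


def cordic_gain(iterations=16):
    """Product of sqrt(1 + 2^-2i): the growth of the vector over all iterations."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_circular(angle, iterations=16):
    """Return (cos, sin) of `angle` in radians, for |angle| <= pi/2."""
    x, y, z = 1.0 / cordic_gain(iterations), 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate toward Z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)        # look-up table entry for this iteration
    return x, y


cos_a, sin_a = cordic_circular(math.radians(-30))   # same point as 330 degrees
print(cos_a, sin_a)
```

Starting X at 1/gain removes the need for a final multiplication. In 15 fractional bits this start value rounds to 19898, which is the constant 16'b0100110110111010 assigned to Xin in the circular CORDIC listing.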
26. data type representation for the fixed point number. It is also useful for representing fractional values by scaling them to a fixed-point number. A value of a fixed-point data type is therefore an integer that is scaled by a specific factor determined by the type [3]. For example, the value 12.25 can be represented as 49 in fixed-point data with a scaling factor of 4, and it becomes 98 with a scaling factor of 8. Unlike the floating-point format, the scaling factor of a fixed-point type stays fixed during the entire computation. The scaling factor is usually a power of 2 so that binary data can be computed efficiently in a digital design.

2.4.1 Q format

To improve mathematical throughput or increase the execution rate, calculations on fractional values can be performed using unsigned fixed-point representations or two's complement signed fixed-point representations [13]. This requires the programmer to create a virtual binary point at a given position in the data word. The Q format can be used for this purpose. The convention is as follows:

Qm.n

where
m = number of integer bits (including the sign bit for a signed number)
n = number of fractional bits
m + n = total number of bits in the representation (number of integer bits + number of fractional bits)

The values of m and n are therefore set based on the number of bits required by the system and the range of the computed data. Meanwhile in order to scale a floatin
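The scaling described in this excerpt can be sketched in a few lines of Python. This is only an illustration: the helper names `to_fixed` and `from_fixed` are hypothetical (they do not appear in the thesis), and round-to-nearest is an assumption.

```python
def to_fixed(value, frac_bits):
    """Scale a real value to an integer that carries `frac_bits` fractional bits."""
    return int(round(value * (1 << frac_bits)))


def from_fixed(raw, frac_bits):
    """Recover the real value from its scaled integer form."""
    return raw / float(1 << frac_bits)


# 12.25 with a scaling factor of 4 (2 fractional bits) and of 8 (3 fractional bits)
print(to_fixed(12.25, 2), to_fixed(12.25, 3))
print(from_fixed(49, 2))
```

A scaling factor of 2^n corresponds to n fractional bits, so a Qm.n value is the stored integer divided by 2^n.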
27. each 1 ms time interval. At the same time it checks the state of each row within that 1 ms. The output therefore carries specific data that indicates which button is pressed. This scanner is useful because it can send specific data to the system whenever any button is pressed. The block diagram of a working keypad scanner is shown in Figure 4.9.

[Figure 4.9: Block diagram of Keypad Scanner. Blocks: Keypad Scan, Check for Button Pressed, 1 ms Counter. Signals: row[3:0], col[3:0], data[3:0].]

4.3.2 De-bouncer

Initially, some tests were done by pressing a button to send keypad data to the system and displaying a character on the LCD based on the received data. However, the LCD did not receive the data properly every time. Sometimes the data was sent more than once for a single press, and sometimes it was not received at all or was received incorrectly. The system appeared to be unstable, so the cause was investigated. The problem was found to be the bouncing glitches of the push buttons [14]. A de-bouncer therefore has to be designed to filter out the glitches associated with switch transitions. The design is based on an FSM approach and uses a free-running 10 ms timer. The timer generates a one-clock-cycle enable tick every 10 ms, and the FSM uses this tick to keep track of whether the input has stabilized. However the FSM ignores the short bounces and chan
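The tick-driven FSM idea described above can be modelled in software as follows. This is a hedged sketch, not the thesis design: it assumes that a level change is accepted only after three consecutive timer ticks (suggested by the state names wait1_1 to wait1_3 and wait0_1 to wait0_3 in the appendix listing), and the function name `debounce` and the `(key, tick)` sample format are hypothetical.

```python
ZERO, ONE = 0, 4  # states 1..3 confirm a press, states 5..7 confirm a release


def debounce(samples):
    """Return the debounced level for each (key, tick) sample, one per clock."""
    state, levels = ZERO, []
    for key, tick in samples:
        levels.append(1 if state >= ONE else 0)  # Moore output of the current state
        if state == ZERO:
            if key:
                state = 1
        elif state < ONE:        # waiting to confirm a press
            if not key:
                state = ZERO     # bounce: fall back to the stable low state
            elif tick:
                state += 1       # the third tick reaches ONE
        elif state == ONE:
            if not key:
                state = 5
        else:                    # waiting to confirm a release
            if key:
                state = ONE
            elif tick:
                state = ZERO if state == 7 else state + 1
    return levels


bounce = [(1, 0), (0, 0), (1, 0), (0, 0)]       # short glitches, no tick elapses
held = [(1, 0), (1, 1), (1, 1), (1, 1), (1, 0)]  # key held across three ticks
print(debounce(bounce), debounce(held))
```

The short glitches never reach the stable high state, while the held key raises the output once three ticks have passed.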
28. else begin
      case (state)
        0: begin
          if (db_level) begin
            data_out <= key_data;
            state <= 1;
          end
          else state <= 0;
        end
        1: begin
          if (!db_level) state <= 0;
          else state <= 1;
        end
      endcase
    end
  end

  always @* begin
    case (data)
      4'h0: key_data <= 8'h30;
      4'h1: key_data <= 8'h31;
      4'h2: key_data <= 8'h32;
      4'h3: key_data <= 8'h33;
      4'h4: key_data <= 8'h34;
      4'h5: key_data <= 8'h35;
      4'h6: key_data <= 8'h36;
      4'h7: key_data <= 8'h37;
      4'h8: key_data <= 8'h38;
      4'h9: key_data <= 8'h39;
      4'hA: key_data <= 8'h2B;
      4'hB: key_data <= 8'h2D;
      4'hC: key_data <= 8'h78;
      4'hD: key_data <= 8'hFD;
      4'hE: key_data <= 8'h2E;
      4'hF: key_data <= 8'h3D;
    endcase
  end
endmodule

B.3 LCD Top Module

module LCD_CORDIC (
  input clk, rst_n,
  input ins,
  input [3:0] col,
  output [3:0] row,
  output reg [7:0] LCD_DATA,
  output LCD_RW, LCD_BLON,
  output reg LCD_EN, LCD_RS,
  output reg LED
);
  wire ready_cos, ready_sin, ready_cosh, ready_sinh, ready_exp;
  wire [31:0] cos_ieee, sin_ieee, cosh_ieee, sinh_ieee, exponent_ieee;

  // angle = 330 degree (-30 degree); angle / 360 * 2^32
  wire [31:0] angle = 32'b11101010101010101010101010101011;
  // hyperIn = 0.5; hyperIn * 2^30
  wire [31:0] hyperIn = 32'b00100000000000000000000000000000;

  CORDIC u10 (clk, rst_n, angle, hyperIn, cos_ieee, sin_ieee, cosh_ieee, sinh_ieee, exponent_ieee, ready_cos, ready_sin, ready_cosh, ready_sinh, ready_exp, s
29. en,
  input key,
  output reg db
);
  // symbolic state declaration
  parameter [2:0] zero    = 3'b000,
                  wait1_1 = 3'b001,
                  wait1_2 = 3'b010,
                  wait1_3 = 3'b011,
                  one     = 3'b100,
                  wait0_1 = 3'b101,
                  wait0_2 = 3'b110,
                  wait0_3 = 3'b111;
  // number of counter bits
  parameter N = 19;

  // signal declaration
  reg [N-1:0] q_reg;
  wire [N-1:0] q_next;
  wire m_tick;
  reg [2:0] state_reg, state_next;

  always @(posedge clk)
    q_reg <= q_next;
  assign q_next = q_reg + 1;
  assign m_tick = (q_reg == 0) ? 1'b1 : 1'b0;

  // state register
  always @(posedge clk, negedge rst_n)
    if (!rst_n) state_reg <= zero;
    else if (en) state_reg <= state_next;

  // next-state logic and output logic
  always @* begin
    state_next = state_reg; // default state: the same
    db = 1'b0;              // default output: 0
    case (state_reg)
      zero: if (key) state_next = wait1_1;
      wait1_1: begin
        if (~key) state_next = zero;
        else if (m_tick) state_next = wait1_2;
      end
      wait1_2: begin
        if (~key) state_next = zero;
        else if (m_tick) state_next = wait1_3;
      end
      wait1_3: begin
        if (~key) state_next = zero;
        else if (m_tick) state_next = one;
      end
      one: begin
        db = 1'b1;
        if (~key) state_next = wait0_1;
      end
      wait0_1: begin
        db = 1'b1;
        if (key) state_next = one;
        else if (m_tick) state_next = wait0_2;
      end
      wait0_2: begin
        db = 1'b1;
        if (key) state_next = one;
        else if (m_tick) state_next = wait0_3;
      end
      wait0_3: begin
        db = 1'b1;
        if (key) state_next = one
30. programmable switch can be customized to provide interconnections among the logic cells [14]. A complex design can therefore be implemented by properly setting the function of each logic block and the connections of the interconnection switches through programming. The generic structure of an FPGA fabric is shown in Figure 2.1.

[Figure 2.1: Generic structure of an FPGA fabric [14], showing logic blocks and interconnection switches]

The FPGA configuration is generally defined using a hardware description language (HDL) such as Verilog HDL or VHDL, similar to the languages used for an application-specific integrated circuit (ASIC). FPGAs can therefore perform any logical function that an ASIC can. Furthermore, FPGAs suit a wide range of applications because their functionality can be updated after shipping, a portion of the design can be partially reconfigured, and their non-recurring engineering (NRE) costs are low relative to an ASIC design [15]. Compared with ASICs, FPGAs offer design advantages such as rapid prototyping, shorter time to market, reprogrammability for debugging, lower NRE costs and a longer product life cycle. With the evolution of FPGA technology the devices have become more integrated, and a new technology, the SoC FPGA, was introduced [17]. It integrates an ARM-based hard processor system (HPS) with the FPGA fabric using a high bandwidth interconnect bac
31. quadrant
      end
      2'b10: begin
        X[0] <= Yin;
        Y[0] <= -Xin;
        Z[0] <= {2'b11, angle[29:0]}; // add pi/2 to angle for this quadrant
      end
    endcase
  end

  genvar i;
  generate
    for (i = 0; i < 15; i = i + 1) begin: XYZ
      wire Z_sign;
      wire signed [16:0] X_shr, Y_shr;

      assign X_shr = X[i] >>> i; // signed shift right
      assign Y_shr = Y[i] >>> i;

      // the sign of the current rotation angle
      assign Z_sign = Z[i][31]; // Z_sign = 1 if Z[i] < 0

      always @(posedge clk) begin
        // add/subtract shifted data
        X[i+1] <= Z_sign ? X[i] + Y_shr : X[i] - Y_shr;
        Y[i+1] <= Z_sign ? Y[i] - X_shr : Y[i] + X_shr;
        Z[i+1] <= Z_sign ? Z[i] + atan_table[i] : Z[i] - atan_table[i];
      end
    end
  endgenerate

  // output
  assign cos_eff = X[15];
  assign sin_eff = Y[15];
endmodule

A.6 Hyperbolic CORDIC (CORDIC_hyperbolic)

module cordic_hyperbolic (
  input clk,
  input signed [31:0] data,
  output [16:0] cosh_eff, sinh_eff
);
  wire [15:0] Xin = 16'b1001101010010000;
  wire [15:0] Yin = 16'b0000000000000000;

  // arctan table
  wire signed [31:0] atan_table [0:17];
  assign atan_table[00] = 32'b00100011001001111101010011110000; // 0.54930614
  assign atan_table[01] = 32'b00010000010110001010111011111000; // 0.25541281
  assign atan_table[02] = 32'b00001000000010101100010010001001; // 0.12565721
  assign atan_table[03] = 32'b00000100000000010101011000100001; // 0.06258157
  assign atan_table[04] = 32'b00000010000000000010101010110010
32. trigonometry faster and at lower cost, as only shift registers, adders and a look-up table ROM are required. Finally, the design is implemented on the Altera FPGA board with an external circuit soldered on a donut board, which consists of a 16x2 character LCD, a 4x4 matrix keypad and some other important electronic components. The matrix keypad is used as the input interface and the LCD as the output interface. This interface circuit can be used to test the functionality of the design without referring to the simulation waveform. In addition, the output results displayed on the LCD are in the hexadecimal form of the 32-bit IEEE 754 format so that the designer can read the result easily.

ABSTRAK

This project aims to design and build an FPGA-based floating-point math hardware module. The design of the module is based on general FPU architecture theory and the CORDIC algorithm. The design can therefore be used to solve various mathematical operations such as addition, subtraction, multiplication, division, exponential, trigonometric and hyperbolic operations. Accordingly, the 32-bit single-precision IEEE 754 format and the fixed-point format are used to represent floating-point numbers in this design. The relationship between the two formats is then discussed in terms of output precision and design performance. In addition, an efficient algorithm named the Coordinate Rotation Digital
33. which are addition, subtraction, multiplication and division in IEEE 754 format. The CORDIC algorithm is then developed further to solve some transcendental functions efficiently, such as trigonometric, hyperbolic and exponential functions. After the hardware architecture was designed, the external I/O interface circuit was designed and its schematic was drawn. Before the whole circuit was soldered onto the donut board, it was tested on a breadboard to ensure that it functioned well. The working circuit was then soldered carefully onto a piece of donut board. After the interface circuit was constructed, the controllers for interfacing the 4x4 matrix keypad and the 16x2 character LCD were developed in Verilog HDL. They interface with the external I/O circuit through the 40-pin GPIO port of the Altera DE board.

3.1.3 Design Testing and Verification

In this stage, behavioral simulation is performed to test and verify the functionality of the design through waveforms. This requires waveform simulator software, namely Altera ModelSim. First, the project file is simulated using Altera ModelSim, which is invoked from Quartus II. Signal tracing is then carried out to check the signals against the desired functionality and to perform verification. The verification is done by comparing the simulation result with the result computed by a scientific calculator. If the res
34. zero prod temp2 gt gt 1 prod_temp2 78 always product casex product endcase endmodule prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt prodshift lt 0 1 2 3 4 3 6 T 8 9 A 3 Floating Point Divider fpu_div module fpu_div input clk rst en input 31 0 opl input 31 0 op2 output sign output reg 8 0 exp_out output 26 0 frac_out parameter preset 24 79 reg en reg en_reg2 en_reg_a en_reg_b en_reg_c en_reg_d en_reg_e reg remainder msb count nonzero reg count nonzero reg2 reg expf temp3 term reg 5 0 dividend sh divisor sh dividend sh2 divisor sh2 count out reg 6 0 remainder sh term reg 8 0 expf_templ expf temp2 expf temp3 expf_temp4 reg 8 0 expsh opl expsh_op2 reg 8 0 exp term exp uf terml exp uf term2 exp uf term3 exp uf term4 reg 22 0 fracl reg 22 0 divided opl divided_op1_sh divisor_op2 divisor op2 sh reg 24 0 guotient guotient out remainder remainder out reg 24 0 reg 49 0 remainder op2 dividend reg divisor reg wire 5 0 count index count out wire 8 0 exp opl 1 b0 op1 30 23 wire 8 0 exp_op2 1 b0 0p2 30 23 wire 22 0 frac_opl op1 22 0 wire 22 0 frac_op2 op2 22 0 wire 22 0
35. 1 Soldering Iron 25W | 10.00 | 10.00
12 | Solder Stand ZD 10 | 8.00 | 8.00
13 | Solder Lead 1.0 mm 250 gm | 29.50 | 29.50
14 | Pro'sKit Desoldering Pump | 16.00 | 16.00
Total | 165.50

Table 5.1: List of Components and Materials needed

The total cost is RM165.60, which is within the budget. Cost is therefore not a concern, and the work can focus on the implementation.

CHAPTER 6 RESULTS AND ANALYSIS

In this chapter all the results obtained in this project are verified and analyzed. The results from the LCD are verified by comparing them with the simulation results. In addition, the performance of the design is investigated in terms of the number of clock cycles, or latency, needed to complete a computation.

6.1 Simulation Results

The design units explained in the previous chapter were coded in Verilog HDL and simulated using the ModelSim-Altera software, which is invoked from the Quartus II software. The output waveforms for each floating-point math hardware module are shown in the following subsections. The output is also compared with the actual result calculated by a scientific calculator.

6.1.1 Floating Point Adder

The output waveform generated by fpu_add is shown in Figure 6.1. It performs the floating-point addition of op1 and op2 and gives the result in add_out. All the data are represented in IEEE 754 single-precision floating-point format. This design requires 12 clock
36. [6]. This algorithm is derived from the general rotation transform shown below:

$$x_n = x_0 \cos\theta - y_0 \sin\theta$$
$$y_n = y_0 \cos\theta + x_0 \sin\theta$$

The simplified equations are:

$$x_n = \cos\theta \, (x_0 - y_0 \tan\theta)$$
$$y_n = \cos\theta \, (y_0 + x_0 \tan\theta)$$

By assuming that $\tan\theta = \pm 2^{-i}$, where $i$ is the iteration number, the multiplication in the above equations is replaced with a simple shift operation. The iteration equations become:

$$x_{i+1} = K_i \, (x_i - y_i \, d_i \, 2^{-i})$$
$$y_{i+1} = K_i \, (y_i + x_i \, d_i \, 2^{-i})$$

where $K_i = \cos(\tan^{-1} 2^{-i})$ and $d_i = \pm 1$.

If the scaling factor $K_i$ is removed, the resulting equations consist of simple shift and add operations only. The product of the $K_i$ approaches 0.607252935 as the number of iterations approaches infinity. The final iteration equations for the CORDIC algorithm are therefore:

$$x_{i+1} = x_i - y_i \, d_i \, 2^{-i}$$
$$y_{i+1} = y_i + x_i \, d_i \, 2^{-i}$$
$$z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$$
$$d_i = \begin{cases} -1, & z_i < 0 \\ +1, & \text{otherwise} \end{cases}$$

Since the equations above can only solve trigonometric functions, J. S. Walther [7] modified the original CORDIC equations into a unified CORDIC algorithm that generalizes several transcendental functions into a single algorithm. This algorithm defines a set of iteration equations that solve trigonometric, hyperbolic and exponential functions using the same hardware resources. The iteration equations are as follows:

$$x_{i+1} = x_i - m \, d_i \, 2^{-i} \, y_i$$
$$y_{i+1} = y_i + d_i \, 2^{-i} \, x_i$$
$$z_{i+1} = z_i - d_i \, e_i$$

where m i
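The final iteration equations for the circular case can be checked with a small floating-point model. This is a sketch under stated assumptions rather than the thesis implementation: it works in floating point instead of Q format, the function names `cordic_gain` and `cordic_cos_sin` are hypothetical, and the input angle is assumed to lie inside the convergence range of roughly ±1.74 rad.

```python
import math


def cordic_gain(iterations):
    """Product of K_i = cos(atan(2^-i)) over the given number of iterations."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.cos(math.atan(2.0 ** -i))
    return gain


def cordic_cos_sin(theta, iterations=16):
    """Rotation-mode CORDIC: returns (cos(theta), sin(theta)) approximately."""
    # Starting from x = gain pre-compensates the removed scaling factor.
    x, y, z = cordic_gain(iterations), 0.0, theta
    for i in range(iterations):
        d = -1.0 if z < 0 else 1.0
        x, y, z = (x - y * d * 2.0 ** -i,
                   y + x * d * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
    return x, y


print(cordic_gain(16))
print(cordic_cos_sin(math.radians(30.0)))
```

In hardware the multiplications by 2^-i become right shifts and the arctangent values come from a look-up table, so each iteration needs only shifts and additions.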
37. CD_EN <= 1'b1; // enable LCD
      else LCD_EN <= 1'b0;
      if (count == big_delay) begin
        count <= 0;
        addr1 <= addr1 + 1;
        state <= 6;
        if (addr1 == 7'h38) state <= 7;
      end
      else count <= count + 1;
    end

    // clear the screen after 2s delay
    7: begin
      if (count2 == delay2s) begin
        LCD_DATA <= CLR;
        LCD_RS <= 1'b0;
        if (count < setup_delay) LCD_EN <= 1'b1;
        else LCD_EN <= 1'b0;
        if (count == long_delay) begin
          state <= 8;
          count <= 0;
        end
        else count <= count + 1;
      end
      else count2 <= count2 + 1;
    end

    // display the mode selection screen
    8: begin
      LCD_DATA <= fpu_op; // send msg to LCD
      LCD_RS <= 1'b1;     // set to data mode
      count <= 0;
      if (count < setup_delay) LCD_EN <= 1'b1; // enable LCD
      else LCD_EN <= 1'b0;
      if (count == big_delay) begin
        count <= 0;
        addr2 <= addr2 + 1;
        state <= 8;
        if (addr2 == 7'h38) begin
          state <= 9;
          db_en <= 1;
        end
      end
      else count <= count + 1;
    end

    // wait user to select a cordic_mode
    9: begin
      if (db_level) begin
        state <= 9;
        if (trigo) begin
          cordic_mode <= 2'b01;
          state <= 10;
        end
        else if (hyper) begin
          cordic_mode <= 2'b10;
          state <= 10;
        end
      end
      else state <= 9;
    end

    // clear the screen after selection chosen
    10: begin
      if (db_level) begin
        LCD_DATA <= CLR;
        LCD_RS <= 1'b0;
        db_en <= 0;
        if (count < setup_delay) LCD_EN <= 1'b1;
        else LCD_EN <= 1'b0;
        if (count == long_delay)
38. Computer (CORDIC) algorithm is used in this design to solve elementary functions such as trigonometry faster and at lower cost, because it requires only shifters, adders and ROM. Finally, the design is implemented on an Altera FPGA board with an external circuit soldered on a donut board. The important components include a 16x2 character LCD, a 4x4 matrix keypad and others. The matrix keypad is used as the input interface and the LCD as the output interface. The circuit can also be used to test the functionality of the design without referring to the simulation results. The output results displayed on the LCD are in hexadecimal form in the 32-bit single-precision format so that a result can be obtained more quickly.

TABLE OF CONTENTS

CHAPTER TITLE

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF APPENDICES

1 INTRODUCTION
1.1 Project Overview
1.2 Motivations
1.3 Problem Statements
1.4 Project Objectives
1.5 Scope of Works
1.6 Organization of the Project

2 LITERATURE REVIEW
2.1 Field Programmable Gate Array (FPGA)
2.1.1 Altera Cyclone II FPGA
2.1.2 Altera DE1 Development and Education Board
2.2 Floating Point Units (FPUs)
2.3 IEEE Standard for Fl
39. However, the output results for cosh, sinh and exp did not achieve high precision relative to the actual values. They are precise to only 1 to 2 decimal places. This is due to the low precision of the number representation format used for the hyperbolic operation, which is the Q2.30 format. To achieve higher precision, the hyperbolic CORDIC module needs to be designed to perform the computation in a higher-precision floating-point format such as the IEEE 754 format. Although only low precision was achieved for some parts of the design, the design generally works and gives acceptable results.

6.2 Interface Circuit Results from LCD display

The CORDIC module is further interfaced with an I/O interface circuit that displays the results, so that they can be checked more easily without tracing the simulation waveform. Figure 6.6 shows the completed I/O interface circuit built on the donut board.

[Figure 6.6: I/O interface circuit on donut board with working LCD display]

With this module, the results are displayed in hexadecimal form, converted from the binary value of the 32-bit single-precision IEEE 754 floating-point format. Using the same inputs to the CORDIC module as discussed in the previous section, angle = 330° and hyper_in = 0.5, the outputs shown on the LCD for cos, sin, cosh, sinh and exp were recorded as shown in Table 6.6.
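To read the hexadecimal LCD values back as decimal numbers, a short Python check can be used. This is a sketch only: the hexadecimal words are the ones listed in Table 6.6, the helper name `hex_to_float` is hypothetical, and the reference values are computed here with Python's math library rather than taken from the thesis.

```python
import math
import struct


def hex_to_float(word):
    """Reinterpret a 32-bit word as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


# LCD readings from Table 6.6 (angle = 330 degrees, hyper_in = 0.5)
lcd = {"cos": 0x3F5DB600, "sin": 0xBF000200,
       "cosh": 0x3F909F00, "sinh": 0x3F05A800, "exp": 0x3FD37300}

# Reference values computed in double precision
ideal = {"cos": math.cos(math.radians(330.0)),
         "sin": math.sin(math.radians(330.0)),
         "cosh": math.cosh(0.5), "sinh": math.sinh(0.5), "exp": math.exp(0.5)}

for name, word in lcd.items():
    value = hex_to_float(word)
    print(name, value, abs(value - ideal[name]))
```

The printed absolute errors give a direct view of the decimal-place agreement discussed for the trigonometric and hyperbolic outputs.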
40. IC Circular and the block diagram is shown in Figure 4.5.

[Figure 4.6: Block diagram of CORDIC_Circular. Inputs: angle (32 bits), clk. Outputs: cos_eff, sin_eff.]

Table 4.6 describes all the inputs and outputs of this block with a brief description of their functions.

Signal Name | Width | Type | Description
clk | 1 | Input | System clock
angle | 32 | Input | Input angle in Q format (Q0.32). Conversion equation: (desired angle in degrees / 360) × 2^32
cos_eff | 17 | Output | Output value for cosine in Q format (Q2.15). Conversion equation: value × 2^15
sin_eff | 17 | Output | Output value for sine in Q format (Q2.15). Conversion equation: value × 2^15

Table 4.6: I/O interface description for CORDIC_Circular

Since the data is in Q format, all the data inside this design, as well as the look-up table of rotational angles, need to be converted to this format. The algorithm that implements this module is as follows:

1. Set two initial values for Xin and Yin.
- Xin and Yin are the initial values of cos_eff and sin_eff. These values become the answers for cos_eff and sin_eff after 15 iterations.
- Set Xin = 0.607252935 = 16'b0100110110111010 (in Q1.15 format). Note that the conversion equation is value × 2^15.
- Set Yin = 0 = 16'b0000000000000000 (in Q1.15 format).

2. Construct look up table for rotational angle from 0 to 15 iterat
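The two conversion equations in this excerpt can be checked numerically. The sketch below is illustrative: the helper names `encode_angle` and `to_q15` are hypothetical, round-to-nearest is an assumption, and the 330 degree constant used for comparison is the one that appears in the LCD top module listing.

```python
def encode_angle(degrees):
    """Map an angle in degrees onto a 32-bit word: (degrees / 360) * 2^32."""
    return int(round(degrees / 360.0 * (1 << 32))) & 0xFFFFFFFF


def to_q15(value):
    """Scale a value by 2^15 and keep the low 16 bits."""
    return int(round(value * (1 << 15))) & 0xFFFF


print(format(encode_angle(330), "032b"))   # input angle word
print(format(to_q15(0.607252935), "016b"))  # initial Xin
```

Because a full turn maps onto 2^32, the angle wraps around naturally in 32-bit arithmetic, and the two most significant bits identify the quadrant.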
41. PSZ 19:16 (Pind. 1/07)

UNIVERSITI TEKNOLOGI MALAYSIA

DECLARATION OF THESIS / UNDERGRADUATE PROJECT REPORT AND COPYRIGHT

Author's full name: CHEN KEAN TACK
Date of birth: 14TH JANUARY 1989
Title: DESIGN AND IMPLEMENTATION OF FPGA BASED FLOATING POINT MATH HARDWARE MODULE
Academic session: 2012/2013

I declare that this thesis is classified as:
- CONFIDENTIAL (contains confidential information under the Official Secret Act 1972)
- RESTRICTED (contains restricted information as specified by the organization where research was done)
- OPEN ACCESS (I agree that my thesis be published as online open access, full text)

I acknowledge that Universiti Teknologi Malaysia reserves the right as follows:
1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia has the right to make copies for the purpose of research only.
3. The Library has the right to make copies of the thesis for academic exchange.

Certified by:
SIGNATURE | SIGNATURE OF SUPERVISOR
890114 08 5549 (NEW IC NO. / PASSPORT NO.) | ASSOC. PROF. DR. MUHAMMAD NASIR BIN IBRAHIM (NAME OF SUPERVISOR)
Date: 24TH JUNE 2013 | Date: 24TH JUNE 2013

NOTES: If the thesis is CONFIDENTIAL or RESTRICTED, please attach the letter from the organization stating the period and reasons for confidentiality or restriction.

I hereby declare that I have read this thesis and in my/our opinion this thesis is suffic
42. R the sign for both operands.
- The resulting exponent is calculated from the following equation:

$$E_{\text{result}} = E_{op1} - E_{op2} + 127 - LZ_{op1} + LZ_{op2}$$

where $E$ is the exponent of an operand and $LZ$ is its number of leading zeros.

4. Normalization
- Normalize the value by checking the number of leading zeros of the tentative result, then shift the result left and decrement the exponent by the number of leading zeros. However, if the tentative result overflows, shift the mantissa right by 1 bit and increment the exponent by 1.

4.1.5 Rounding Logic

The outputs of the above modules are not yet rounded and concatenated into the 32-bit IEEE 754 format. Each of these outputs should therefore be connected to rounding logic that rounds the result and then concatenates the sign, exponent and mantissa into the 32-bit IEEE 754 format. In my design, the round-to-nearest mode is used for rounding the result.

4.2 Efficient Floating Point Math Module

For this project, an efficient hardware algorithm, the CORDIC algorithm proposed by Volder [6], is also used to realize the trigonometric and hyperbolic functions. Based on my findings, this algorithm is simple and inexpensive to implement in hardware because it requires only shift registers, adders and ROM. Two modules were therefore designed on the basis of the CORDIC algorithm: one for the trigonometric functions and one for the hyperbolic and exponential functions. Meanwhile th
43. Set two initial values for Xin and Yin.
- Xin and Yin are the initial values of cosh_eff and sinh_eff. These values become the answers for cosh_eff and sinh_eff after 15 iterations.
- Set Xin = 1.20753406 = 16'b1001101010010000 (in Q1.15 format). Note that the conversion equation is value × 2^15.
- Set Yin = 0 = 16'b0000000000000000 (in Q1.15 format).

2. Construct the look-up table of rotational angles for iterations 1 to 15.
- Convert all the values in Table 4.7 to Q2.30 format and store them in the atan_table ROM.
- Conversion equation: rotational angle × 2^30.

3. Set the values of the shifted X (X_shr) and the shifted Y (Y_shr).
- Set X_shr and Y_shr by shifting X and Y right by i places, where i is the iteration number.

4. Determine the rotation direction and the values of X, Y and Angle for the next iteration.
- If Angle > 0, rotate the angle in the anti-clockwise direction for the next iteration: set X to X + Y_shr, set Y to Y + X_shr and set Angle to Angle - atan_table[i].
- If Angle < 0, rotate the angle in the clockwise direction for the next iteration: set X to X - Y_shr, set Y to Y - X_shr and set Angle to Angle + atan_table[i].

4.2.4 Q format to IEEE 754 Format Converter

Since the outputs of the CORDIC module are in Q format (Q2.15), they need to be converted to the IEEE 754 single preci
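The steps above can be sketched as a floating-point model of rotation-mode hyperbolic CORDIC. Several points are assumptions and not taken from the thesis: the model uses floats rather than Q format, the names `hyperbolic_schedule` and `cordic_cosh_sinh` are hypothetical, a zero angle is treated like a positive one, and the default schedule repeats iterations 4 and 13, which is the usual textbook measure for guaranteeing convergence (the steps above list iterations 1 to 15 only).

```python
import math


def hyperbolic_schedule(last=15, repeats=(4, 13)):
    """Iteration indices 1..last, with the indices in `repeats` executed twice."""
    seq = []
    for i in range(1, last + 1):
        seq.append(i)
        if i in repeats:
            seq.append(i)
    return seq


def cordic_cosh_sinh(angle, schedule=None):
    """Rotation-mode hyperbolic CORDIC: returns (cosh, sinh) approximately."""
    if schedule is None:
        schedule = hyperbolic_schedule()
    schedule = list(schedule)
    # The gain depends on exactly which iterations are executed.
    gain = 1.0
    for i in schedule:
        gain *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, angle
    for i in schedule:
        x_shr, y_shr = x * 2.0 ** -i, y * 2.0 ** -i
        if z >= 0:
            x, y, z = x + y_shr, y + x_shr, z - math.atanh(2.0 ** -i)
        else:
            x, y, z = x - y_shr, y - x_shr, z + math.atanh(2.0 ** -i)
    return x, y


print(cordic_cosh_sinh(0.5))
```

Passing `schedule=range(1, 16)` models the non-repeated sequence of 15 iterations; the initial X value is recomputed from whichever schedule is used, since the scale constant has to match the iterations actually performed.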
44. ally this algorithm is an iterative algorithm for calculating the rotation of a two-dimensional vector in linear, circular and hyperbolic coordinate systems. Since it does not use calculus-based methods such as polynomial approximation, it calculates all of these functions in a simple and elegant way. Furthermore, it requires only shift registers, adders and a look-up table ROM, which lowers the cost of the design. Finally, to implement the design for real-time verification, an external circuit was built on a donut board; it consists of a 16x2 character LCD, a 4x4 matrix keypad and some electronic components. The keypad acts as the input interface that lets the user command the system to perform a specific operation. The LCD acts as the output interface that displays messages to the user and then the desired output. The output results are displayed in hexadecimal as 32-bit IEEE 754 floating-point values.

1.2 Motivations

First of all, a Field Programmable Gate Array (FPGA) provides a convenient hardware environment in which the dedicated processor is reconfigurable and suitable for functionality testing [10]. An FPGA thus provides a versatile and inexpensive way to implement a design. Furthermore, an FPGA can perform multiple operations concurrently, which accelerates a system to a level of performance that a simple microprocessor cannot reach [10]. Secondl
45. and Communication Engineering, Thapar University.
[5] Aziz, I. (2012). Binary Floating Point Fused Multiply Add Unit. Degree Thesis. Egypt: Faculty of Engineering, Cairo University, Giza.
[6] Volder, J. E. (1959). The CORDIC trigonometric computing technique. IRE Trans. Electronic Computers, vol. EC-8, no. 3, pp. 330-334, Sept. 1959.
[7] Walther, J. S. (1971). A unified algorithm for elementary functions. AFIPS Spring Joint Computer Conference, vol. 38, pp. 379-385, 1971.
[8] Yi-Jun, D. and Zhuo, B. (2011). CORDIC algorithm based on FPGA. Journal of Shanghai University, vol. 15, issue 4, pp. 304-309, Aug. 2011.
[9] Vikas, S. (2009). FPGA Implementation of EEAS CORDIC Based Sine and Cosine Generator. Master Thesis. India: Department of Electronics and Communication Engineering, Thapar University.
[10] Rohit, K. J. (2011). Design and FPGA Implementation of CORDIC-based 8-point 1D DCT Processor. Degree Thesis. India: Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela.
[11] Boudabous, A., Ghozzi, F., Kharrat, M. W. and Masmoudi, N. (2004). Implementation of Hyperbolic Functions Using CORDIC Algorithm. The 16th International Conference on, pp. 738-741, 6-8 Dec. 2004.
[12] Shrugal, V., Nisha, S. and Richa, U. (2013). Hardware Implementation of Hyperbolic Tan Using Cordic On FPGA. International Journal of Engineering Research and Applications (IJERA), Vol. 3, Issue 2, pp. 696-699, March-April 2013.
[13] Erick, L. (2007)
46. and writing the project proposal; input on the introduction section; proposal submission; input on the literature review section; input on the research methodology section; discussing the problems faced with the supervisor; implementing the project; obtaining the preliminary results; demonstrating the preliminary results to the supervisor; writing the conclusion and recommendations; seminar presentation draft slide submission; feedback from the supervisor on the draft slides; seminar presentation; final report submission. (Remaining chart columns: Wk 9 to Wk 16.)

Figure 5.1: Gantt Chart of FYP1

Gantt Chart FYP2. Activities (chart columns Wk 3 to Wk 19): weekly report; discussing the problems faced with the supervisor; implementing the project; testing and verifying the results; analyzing and discussing the final results; writing the conclusion and future recommendations; seminar presentation draft slide submission; feedback from the supervisor on the draft slides; seminar presentation; writing the thesis; thesis draft submission; submitting and publishing a journal-style paper; hard-bound thesis submission.

Figure 5.2: Gantt Chart of FYP2

5.1.2 Project Cost

Basi
47. ay to mount the keypad in a variety of applications. It uses a combination of four rows and four columns, as shown in Figure 2.8, to provide button states to the host device. Underneath each key is a push button with one end connected to one row and the other end connected to one column. There is no connection between the rows and the columns until a button is pressed, which connects its row to its column.

[Figure 2.8: 4x4 Matrix Keypad columns and rows (Column 1 to Column 4)]

To interface the keypad with the DE1 board, the row and column pins are connected to the GPIO pins of the DE1 board with the proper pin assignment. To find out which button is pressed, the keypad has to be scanned column by column and row by row at a short, regular interval. The row pins should be connected to an input port and the column pins to an output port. The row pins also need pull-up or pull-down resistors to avoid a floating input [17]. The basic connection diagram for the 4x4 matrix keypad is shown in Figure 2.9.

[Figure 2.9: 4x4 Matrix Keypad Basic Connection Diagram [19]]

2.8 16x2 Character LCD Module

Many recent projects use a character LCD as the output interface because it can display numbers, letters, symbols and even user-defined (custom) symbols [20]. Basically this LCD module uses the Hitachi
48. b wire lead0_et_26 lead0 5 d26 wire in norm out denorm expb gt 0 amp exp 0 always posedge clk begin if rst begin exp_opl lt 0 exp_op2 lt lt 0 frac_opl lt 0 frac op2 lt 0 exps lt 0 expb lt 0 fracs lt 0 fracb lt 0 exp_diff lt 0 minuend lt 0 subtrahend lt 0 allign_subtra lt 0 final_subtra lt 0 diff lt 0 temp_diff lt 0 exp lt 0 end else if en begin exp_opl lt op1 30 23 exp_op2 lt 0p2 30 23 frac_opl lt op1 22 0 frac_op2 lt 0p2 22 0 if opl_ltet_op2 begin exps lt exp_op2 expb lt exp_opl fracs lt frac_op2 fracb lt frac_opl end else if opl_ltet_op2 begin exps lt exp_opl expb lt exp_op2 fracs lt frac_opl fracb lt frac_op2 end 75 end always end exp_diff lt expb exps b_norm_s_denorm minuend lt b_denorm fracb 2 b00 subtrahend lt s_denorm fracs 2 b00 allign subtra lt subtrahend gt gt exp_diff final_subtra lt subtra_frac_en special_subtra diff lt minuend final_subtra if lead0 It exp begin temp diff lt diff lt lt expb exp lt 0 end else if lead0_lt_exp begin temp_diff lt diff lt lt lead0 exp lt expb lead0 end diff begin if diff 25 leadO 5 d0 else if diff 24 leadO 5 d1 else if diff 23 else if diff 22 else if diff 21 else if diff 20 else if diff 19 el
49. by a number of bits equal to the number of leading zeros.

4.1.3 Floating Point Multiplier

A simple floating-point multiplier is designed in Verilog HDL. This module computes multiplication in IEEE 754 single-precision floating point. The module is named fpu_mul and its block diagram is shown in Figure 4.3.

[Figure 4.3: Block diagram of floating point multiplier (fpu_mul). Inputs: op1 (32 bits), op2 (32 bits), en, rst, clk. Outputs: sign, final_exp, final_prod.]

Table 4.3 describes all the inputs and outputs of this block with a brief description of their functions.

Signal Name | Width | Type | Description
clk | 1 | Input | System clock
rst | 1 | Input | Reset values for initializing
en | 1 | Input | Enable signal
op1 | 32 | Input | Operand 1 in IEEE 754 format
op2 | 32 | Input | Operand 2 in IEEE 754 format
sign | 1 | Output | Sign bit for output in IEEE 754 format
final_exp | 9 | Output | Exponent for output in IEEE 754 format, with an extra bit for specific purpose
final_prod | 27 | Output | Mantissa for output in IEEE 754 format, with 4 extra bits for specific purposes

Table 4.3: I/O interface description for fpu_mul

The algorithm of this design is similar to the design by Mahendra Kumar Soni [4], but it is modified with some additional steps. The algorithm for my design is as follows:

1. Determine the va
50. cally projects have a budget and limited resources. The budget for this project is RM200, and the resources are limited to the hardware logic elements on the Altera DE1 board. The project was therefore developed in two parts: the first part is the programming and the second part is the implementation. For the first part, the hardware resources required by the design need to be considered. For the second part, the cost of implementing the design with the I/O interface circuit on the Altera DE1 board needs to be calculated. An Altera DE1 board was borrowed from Dreamcatcher by participating in the Innovate Competition 2013. All the electronic components and materials needed to construct the I/O interface circuit are listed with their prices in Table 5.1. All of these components are available from Cytron Technologies Sdn Bhd.

No. | Component Names | Quantity (Set) | Unit Price (RM) | Amount (RM)
1 | Female to Female Jumper Wires | 3 | 4.50 | 13.50
2 | Resistor 0.25W 5% 1K | 4 | 0.05 | 0.20
3 | Resistor 0.25W 5% 10K | 2 | 0.05 | 0.10
4 | Preset 5K | 1 | 0.50 | 0.50
5 | Transistor 2N2222 | 1 | 0.40 | 0.40
6 | Straight Pin Header (Male) 1x40 Ways | 1 | 0.60 | 0.60
7 | LCD (16x2) | | 18.00 | 18.00
8 | Keypad 4x4 | | 25.00 | 25.00
9 | Donut Board (Fiber 1 mm) 10x22cm | | 8.00 | 8.00
10 | Rainbow Cable 20 Ways (meter) | | 8.00 | 8.00
10 | Atten 830L Digital Multimeter | | 28.00 | 28.00
1
51. cycles to complete the addition operation, as shown in Figure 6.1. The output is zero before the computation is done.

Figure 6.1: Simulation result of fpu_add

The given inputs and the generated output are described in detail in Table 6.1.

Input operands
op1 in IEEE 754 binary: 0 10000011 01010001101010000111000
- Decimal value: $(-1)^0 \times 2^{131-127} \times 1.3189764 = 21.1036224$
op2 in IEEE 754 binary: 0 10000010 11011111111011100111001
- Decimal value: $(-1)^0 \times 2^{130-127} \times 1.8747321 = 14.9978568$

Output operand
add_out in IEEE 754 binary: 0 10000100 00100000110011111101010
- Decimal value: $(-1)^0 \times 2^{132-127} \times 1.1281712 = 36.1014784$
- Actual value by scientific calculator: 36.1014792

Table 6.1: The detailed description of input and output operands from the output waveform of fpu_add

Based on Table 6.1, the output of fpu_add closely matches the result from the scientific calculator. Comparing the two results, a precision of up to 5 decimal places was achieved. This module is therefore working as desired and the result is verified.

6.1.2 Floating Point Subtractor

The output waveform generated by fpu_sub is shown in Figure 6.2. It performs the floating point subtraction between op1 and op2 and gives the result in sub_out. All the data are represented in IEEE 754 single precision floating point format. Similar
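The figures in Table 6.1 can be cross-checked in software. The following is a minimal sketch in Python (not part of the thesis design) that decodes the three bit patterns with the standard `struct` module and compares the hardware sum with a double precision sum. The helper name `bits_to_float` is hypothetical.

```python
import struct

def bits_to_float(bits: str) -> float:
    """Decode a 32-bit IEEE 754 single precision bit string (spaces allowed)."""
    word = int(bits.replace(" ", ""), 2)
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

# Bit patterns taken from Table 6.1
op1 = bits_to_float("0 10000011 01010001101010000111000")
op2 = bits_to_float("0 10000010 11011111111011100111001")
add_out = bits_to_float("0 10000100 00100000110011111101010")

# Difference between the hardware result and the sum computed in double precision
error = abs((op1 + op2) - add_out)
print(op1, op2, add_out, error)
```

The difference stays below one unit in the last place of a single precision value near 36, which is consistent with the 5 decimal places of agreement reported above.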
52. d Mnitial clear screen 2 begin LCD_DATA lt CLR LCD RS lt 1 b0 if count lt setup_delay LCD EN lt 1 bl enable LCD else LCD EN lt 1 b0 if count long_ delay begin state lt 3 count lt 0 end else count lt count 1 end Mnitialize the entry mode 3 begin LCD_DATA lt SEM LCD RS lt 1 b0 if count lt setup_delay LCD EN lt 1 bl enable LCD Else LCD_EN lt 1 b0 if count small delay begin state lt 4 count lt 0 end else count lt count 1 end 101 display the startup message on LCD for 2s 4 begin LCD_DATA lt startup Isend msg to LCD LCD RS lt 1 bl set to data mode if count lt setup_delay LCD EN lt 1 bl enable LCD else LCD EN lt 1 b0 if count big_delay begin count lt 0 addr0 lt addr0 1 state lt 4 if addr0 7 h38 state lt 5 end else count lt count 1 end clear the screen after 2s delay 5 begin if count2 delay2s begin LCD_DATA lt CLR LCD RS lt 1 b0 if count lt setup_delay LCD EN lt 1 b1 else LCD EN lt 1 b0 if count long_delay begin state lt 6 count lt 0 end else count lt count 1 end else count2 lt count2 1 end display the instruction to ask user to select mode for 2s 6 begin LCD DATA lt mode msg send msg to LCD LCD RS lt 1 bl set to data mode count lt 0 if count setup delay L
53. d to compute it. The algorithm of this design is similar to the design by Mahendra Kumar Soni [12], but it is modified with some additional steps. The algorithm for my design is as follows:

1. Sort the input operands by comparing the value of op1 with op2.
- Store the exponent and mantissa of the bigger number and of the smaller number in two different registers.
2. Determine the exponent difference between op1 and op2.
- Subtract the exponent of the smaller number from the exponent of the bigger number.
3. Expand the mantissa bits of op1 and op2 to 26 bits.
- Concatenate an extra bit to the left of the MSB of the mantissa: a leading one for a normalized number or a leading zero for a denormalized number.
- Append two zero bits to the right of the LSB of the mantissa.
- Resulting mantissa = {leading 0/1, mantissa, 2'b00}
4. Pre-align the mantissa of the smaller number.
- Shift it to the right by a number of bits equal to the exponent difference.
5. Subtract the pre-aligned mantissa of the smaller number from the mantissa of the bigger number to get the tentative result.
6. Count the number of leading zeros in the mantissa of the tentative result.
- If the number of leading zeros > the exponent of the larger number, shift the mantissa of the tentative result left by 1 bit and set the exponent of the result to 0.
- If the number of leading zeros < the exponent of the larger number, shift the mantissa left and decrement the exponent
54. d_data reg 7 0 data_in wire key amp col reg 16 0 count reg 26 0 count2 reg 11 0 initcount wire 7 0 startup mode_msg fpu_op trigo_msg hyper_msg ans_cos ans_sin ans_cosh ans_sinh ans_exp 99 reg 6 0 addrO addr1 addr2 addr3 addr4 addr5 wire 40 0 intpart1 intpart2 fracpartl fracpart2 assign LCD RW 1 b0 assign LCD BLON 1 bl always posedge clk negedge rst_n begin if rst_n begin db en lt 0 state lt 0 count lt 0 count lt 0 initcount lt 0 addr0 lt 0 addrl lt 0 addr2 lt 0 addr3 lt 0 addr4 lt 0 addr5 lt 0 cordic_mode lt 0 displayNo lt 0 end else begin case state Mnitialize for function set 0 begin if initcount lt big_delay create delay at beginning initcount lt initcount 1 else begin send SET instruction LCD DATA lt SET LCD_RS lt 1 b0 if count lt setup_delay LCD EN lt 1 bl enable LCD else LCD_EN lt 1 b0 when count small delay go onto the next state if count small delay begin state lt 1 count lt 0 end else else increment the count count lt count 1 end end 100 Mnitialize for display on 1 begin LCD_DATA lt DON LCD RS lt 1 b0 if count lt setup_delay LCD EN lt 1 bl enable LCD else LCD_EN lt 1 b0 if count small_delay begin state lt 2 count lt 0 end else count lt count 1 en
55. ds directly to zero.
(d) Round toward positive infinity, which rounds directly towards positive infinity.
(e) Round toward negative infinity, which rounds directly towards negative infinity.

2.3.3 IEEE 754 Exception Handling

Exception handling determines how the system reacts when an exception occurs, so that a system error or crash is prevented. A corresponding status flag indicates whether the exception has occurred, and the exception is then handled so that a valid output is returned. The IEEE 754 standard defines five possible exceptions [9, 10, 13]:

(a) Invalid operation: an operation that has no defined result, for example the square root of a negative number. It returns NaN by default.
(b) Division by zero: an operation on finite operands that gives an exact infinite result. It returns positive or negative infinity by default, depending on the signs of the operands.
(c) Overflow: the result is too large to be represented correctly. It returns positive or negative infinity by default.
(d) Underflow: the result is too small to be represented correctly. It returns a denormalized value by default.
(e) Inexact: the result of an arithmetic operation is not exact because of the restricted precision range. It returns the correctly rounded value by default.

2.4 Fixed point Format

Basically the fixed point format is a real
56. e; 1 = Cursor underline on, 0 = Off; 1 = 5x10 dot format, 0 = 5x7 dot format; 1 = Cursor blink on, 0 = Off; 1 = Display shift, 0 = Cursor move; x = Don't care; Initialization settings

Table 2.4: The command control codes [20]

Meanwhile, to write specific characters or symbols on the LCD, the operation is made in write mode (RS = 1). The 1-byte ASCII codes of the characters and symbols are then sent to the LCD one by one, one for each LCD address. Table 2.5 shows the standard character LCD ASCII table.

Table 2.5: Standard LCD ASCII Character Table [20] (character chart not reproduced)

CHAPTER 3

DESIGN METHODOLOGY

This chapter describes the design methodology of this project. The project work is divided into three stages: design specification, design implementation, and design testing and verification. All of the design stages are briefly discussed in the fol
57. e algorithm of this design is similar to the design by Mahendra Kumar Soni [12], but it is modified with some additional steps. The algorithm for my design is as follows:

1. Sort the input operands by comparing the value of op1 with op2.
- Store the exponent and mantissa of the bigger number and of the smaller number in two different registers.
2. Determine the exponent difference between op1 and op2.
- Subtract the exponent of the smaller number from the exponent of the bigger number.
3. Expand the mantissa bits of op1 and op2 to 27 bits.
- Concatenate an extra bit in front of the MSB of the mantissa: a leading one for a normalized number or a leading zero for a denormalized number. Then append one more zero to the left of it.
- Append two zero bits after the LSB of the mantissa.
- Resulting mantissa = {1'b0, leading 0/1, mantissa, 2'b00}
4. Pre-align the mantissa of the smaller number.
- Shift it to the right by a number of bits equal to the exponent difference.
5. Add the mantissa of the bigger number to the pre-aligned mantissa of the smaller number to get the tentative result.
6. Check whether the mantissa of the tentative result has overflowed.
- If overflow occurs, shift the mantissa right by 1 bit and increment the exponent by 1.

4.1.2 Floating Point Subtractor

A simple floating point subtractor was designed using Verilog HDL. This module computes the subtraction operation in IEEE 754 single precision floati
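The six adder steps listed for fpu_add can be traced with a small behavioural model. This is a sketch in Python, not the thesis's Verilog: the function name `fp_add_same_sign` is hypothetical, it assumes finite, normalized operands of the same sign, and it simply truncates the two guard bits instead of rounding (an assumption, since the rounding stage is not described in this passage).

```python
def fp_add_same_sign(a: int, b: int) -> int:
    """Add two same-sign IEEE 754 single precision words (32-bit integers)."""
    sign = a >> 31

    # Step 1: sort the operands by magnitude
    big, small = (a, b) if (a & 0x7FFFFFFF) >= (b & 0x7FFFFFFF) else (b, a)
    exp_big = (big >> 23) & 0xFF
    exp_small = (small >> 23) & 0xFF

    # Step 2: exponent difference
    diff = exp_big - exp_small

    # Step 3: expand to 27 bits = {1'b0, leading 0/1, mantissa, 2'b00}
    def expand(word: int, exp: int) -> int:
        leading = 1 if exp != 0 else 0
        return (leading << 25) | ((word & 0x7FFFFF) << 2)

    man_big = expand(big, exp_big)
    man_small = expand(small, exp_small)

    # Step 4: pre-align the smaller mantissa
    man_small >>= diff

    # Step 5: tentative sum
    total = man_big + man_small

    # Step 6: overflow check (bit 26 set means the sum reached 2.0 or more)
    exp = exp_big
    if total >> 26:
        total >>= 1
        exp += 1

    # Drop the two guard bits (truncation, an assumption of this sketch)
    mantissa = (total >> 2) & 0x7FFFFF
    return (sign << 31) | (exp << 23) | mantissa


result = fp_add_same_sign(0x41A8D438, 0x416FF739)
print(hex(result))
```

The two inputs are the op1 and op2 bit patterns of Table 6.1 written in hexadecimal. For these operands the sketch returns 0x421067EA, which is the add_out pattern reported in that table.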
58. else if m_tick state_next zero end default state_next zero endcase B 2 Keypad Scanner keypad encoder module keypad_encoder input clk rst_n input 3 0 col output 3 0 row output reg 7 0 data_out output db_level debouncer ul clk rst_n key db level 95 wire key amp col reg state reg 3 0 data reg 7 0 key_data reg 13 0 msCnt wire clk1ms always posedge clk negedge rst_n if rst_n msCnt 14 h0 else if clk1ms msCnt 14 h0 else msCnt msCnt 1 bl assign clk1ms msCnt 14 d10000 reg 3 0 rowt always posedge clk negedge rst_n if rst_n rowt 4 h8 else if clk1ms rowt rowt 0 rowt 3 1 assign row rowt wire 3 0 column col always posedge clk or negedge rst_n begin if rst_n data lt 4 h0 else begin case row 4 h8 case column 4 h1 data 4 hE 4 h2 data lt 4 h0 4 h4 data lt 4 hF 4 h8 data lt 4 hD endcase 4 h4 case column 4 h1 data lt 4 h7 4 h2 data lt 4 h8 4 h4 data lt 4 h9 4 h8 data lt 4 hC endcase 96 4 h2 case column 4 h1 data lt 4 h4 4 h2 data lt 4 h5 4 h4 data lt 4 h6 4 h8 data lt 4 hB endcase 4 hl case column 4 hl data lt 4 h1 4 h2 data lt 4 h2 4 h4 data lt 4 h3 4 h8 data lt 4 h A endcase endcase end end always posedge clk negedge rst_n begin if rst_n data out lt 0
59. fic character is sent. After that, a pulse is sent to the enable pin, followed by a delay.

4.4 Overall Design

Although several floating point math hardware modules were successfully designed, only the CORDIC module was chosen to be implemented on the interface circuit. This is because the hardware resources on the Altera DE1 board are limited and might not be able to hold all modules in one design, so the most useful math modules had to be selected for interfacing. The CORDIC module is considered very useful because it can solve elementary functions such as trigonometric and hyperbolic functions, which are applicable in the field of digital signal processing [12]. Therefore, all the related I/O interface modules are integrated with the CORDIC trigonometric and hyperbolic modules to build a simple generator that computes cos, sin, cosh, sinh and the exponential function and then displays the answer on the LCD. The architecture of the overall design is shown in Figure 4.11.

[Figure 4.11: Design architecture of the overall design. Blocks: input from user, CORDIC Circular, Binary_to_IEEE converter 1, Binary_to_IEEE converter 2, multiple RAMs or ROMs, controller FSM. LCD outputs: LCD_RW, LCD_BLON, LCD_DATA, LCD_EN, LCD_RS.]

From Figure 4.11, the design is controlled by an FSM based controller. The outputs of the CORDIC module, keypad scanner, de-bouncer and multiple ROM
60. g divisor reg lt lt 1 end always posedge clk begin if rst begin exp term lt U expsh_opl lt 0 expsh_op2 lt 0 exp uf terml lt 0 exp uf term2 lt 0 exp uf term3 lt 0 exp uf term4 lt 0 expf_templ lt 0 expf temp2 lt 0 expf_temp3 lt 0 expf_temp3_term lt 0 expf_temp4 lt 0 divided_op1 lt 0 divisor_op2 lt 0 dividend_sh2 lt 0 82 end remainder_sh_term lt 0 remainder_op2 lt 0 divided_op1_sh lt 0 divisor_op2_sh lt 0 fracl lt 0 else if en_reg2 begin expf_temp3_term end end exp term lt exp_opl 8 d127 expsh_opl lt opl_norm 0 dividend_sh2 expsh_op2 lt op2_norm 0 divisor_sh2 exp uf terml lt exp_ufl exp_op2 exp_term 0 exp uf term2 lt exp_uf2 expsh_opl expf_templ 0 exp_uf_term3 lt exp_uf_term2 exp_uf_terml exp_uf_term4 lt exp_uf_gt_maxshift 23 exp_uf_term3 expf_templ lt exp_ufl 0 exp term exp_op2 expf_temp2 lt exp_uf2 0 expf_templ expsh_opl expf_temp3 lt expf_temp2 expsh_op2 expf temp3 term lt expf_temp3_et0 0 1 expf_temp4 lt quotient_msb expf_temp3 expf_temp3 divided_opl lt frac_opl divisor_op2 lt frac_op2 dividend_sh2 lt dividend_sh divisor_sh2 lt divisor_sh remainder_sh_term lt 5 d23 exp_uf_term4 remainder_op2 lt remainder_op1 lt lt remainder_sh_term divided_opl_sh lt di
61. g point to fixed point number, we need to scale up the floating point number by a factor of $2^n$. The operation is based on the following equation [13]:

$\text{Fixed point number} = (\text{floating point number} \times 2^n)$, rounded towards 0

where n = number of fractional bits.

2.5 Algorithms of Floating Point Arithmetic in FPU

Since the data in the FPU is based on the IEEE 754 standard, the algorithms for floating point computation are quite different from basic fixed point arithmetic, because they need to manipulate the sign, exponent and mantissa separately. The algorithms take various forms depending on the desired operation. Typical FPU operations are addition, subtraction, multiplication and division. In addition, some transcendental functions can also be implemented inside the FPU by using an efficient algorithm to reduce the cost.

2.5.1 Addition and Subtraction

Based on the design by Mahendra Kumar Soni [4], the conventional floating point addition and subtraction algorithms consist of five basic stages: exponent difference, pre-alignment, addition or subtraction, rounding, and normalization. Given two operands Op1 = (S1, E1, M1) and Op2 = (S2, E2, M2), the steps to add or subtract them are as follows:

1. Stage 1: Exponent difference
- Determine the difference between the exponents of these two operands: d = E1 - E2 if E1 > E2
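The scaling equation in Section 2.4 can be illustrated with a few lines of Python. This is a minimal sketch; the function names `to_fixed` and `from_fixed` are hypothetical, and Python's `int()` is used because it truncates towards zero, which matches the rounding stated in the equation.

```python
def to_fixed(value: float, n: int) -> int:
    """Scale by 2**n and round towards 0 (int() truncates towards zero)."""
    return int(value * 2**n)

def from_fixed(fixed: int, n: int) -> float:
    """Inverse scaling, used to see how much precision was lost."""
    return fixed / 2**n

q = to_fixed(0.3, 4)        # 0.3 * 16 = 4.8, truncated to 4
back = from_fixed(q, 4)     # 4 / 16 = 0.25
print(q, back)
```

With only 4 fractional bits, 0.3 is stored as 0.25. This is the precision cost of the fixed point format that the thesis weighs against its speed.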
62. gements throughout the whole journey of supervision during my final year project. The supervision and support that he gave truly helped the progression and smoothness of my final year project. Apart from that, an honorable mention goes to my friends, who always supported me and were willing to share their knowledge and assist me when I faced problems. Without the help of the people mentioned above, I would have faced many difficulties while doing the project. Special thanks to En Muhammad Arif bin Abdul Rahim and Dr Usman Ullah Sheikh, who briefed me on the final year project and research methodology. Finally, I would like to thank all the seminar panels for their valuable comments.

ABSTRACT

This project aims to design and implement an FPGA based floating point math hardware module based on the conventional FPU architecture and the CORDIC algorithm. The design can be used to solve various mathematical operations such as addition, subtraction, multiplication, division, exponential, trigonometric and hyperbolic functions. The 32-bit single precision IEEE 754 format and the fixed point format are used to represent floating point numbers in the design, and the trade-off between these two formats is discussed based on result precision and design performance. An efficient algorithm, namely the Coordinate Rotational Digital Computer (CORDIC) algorithm, is developed in the design to realize the solutions for elementary functions such as
63. ges the value of the debounced output only. The state diagram used to construct the FSM is shown in Figure 4.10.

[Figure 4.10: State diagram of De-bouncer FSM [14]. Transitions are labelled with the sw and m_tick signals.]

4.3.3 LCD Controller

In order to display a string or character on the LCD, a controller is needed to control the LCD operations. This design is also built using the FSM approach. First, RS is set to 0 and the initialization commands are sent. The initialization commands sent in my design and their descriptions are shown in Table 4.10.

Command data in binary | Description
00111000 | Function set for 8 bits data transfer and 2 line display
00001111 | Display on without cursor
00000001 | Clear screen
00000110 | Entry mode set: increment cursor automatically after each character is displayed
00000010 | Return the cursor to home address

Table 4.10: Initialization command data and description

According to the LCD operating theory, to send data to the LCD successfully, a pulse needs to be sent to the enable pin after the data is placed on the bus, followed by a delay that gives the LCD time to receive and process the data. Different types of command data might need different delay intervals. Similarly, to display a character on the LCD, RS is set to 1 and the ASCII code for the speci
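The write sequence described in Section 4.3.3 (place the byte, pulse the enable pin, then wait) can be sketched as a tick-by-tick model. This Python sketch is illustrative only: the command bytes come from Table 4.10, while the function name `lcd_write` and the tick counts `setup_ticks` and `total_ticks` are hypothetical placeholders, not the delays used in the thesis.

```python
# Initialization bytes from Table 4.10, in the order listed there
INIT_COMMANDS = [0b00111000, 0b00001111, 0b00000001, 0b00000110, 0b00000010]

def lcd_write(byte: int, rs: int, setup_ticks: int = 2, total_ticks: int = 5):
    """Yield (rs, data, en) once per clock tick for a single LCD write.

    EN is held high for the first setup_ticks ticks (the enable pulse) and
    low for the remaining ticks (the processing delay).
    """
    for tick in range(total_ticks):
        en = 1 if tick < setup_ticks else 0
        yield (rs, byte, en)

# Commands use RS = 0; characters use RS = 1 with the ASCII code as data
init_trace = [step for cmd in INIT_COMMANDS for step in lcd_write(cmd, rs=0)]
char_trace = list(lcd_write(ord("A"), rs=1))
print(init_trace[0], char_trace[-1])
```

In hardware each command would use its own delay length, as noted above; a single default is used here only to keep the sketch short.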
64. hm. In the end, they achieved a modest improvement over previous FPU designs, with less memory requirement, less delay, a comparable clock cycle count and low code complexity [1]. However, the capability to solve transcendental functions was not much developed in that project. Some of the more advanced operations, such as exponential and hyperbolic functions, can therefore be added to the FPU by using the unified CORDIC algorithm proposed by Walther [7]. In addition, further work can be done to implement the FPU in a real time application in order to test its functionality in real time.

In a journal paper entitled Implementation of Hyperbolic Functions Using CORDIC Algorithm by Anis, Fahmi, M. Wajdi and Nouri (2004) [11], the precision of computing hyperbolic functions using the CORDIC algorithm was studied. They also implemented the exponential and logarithm functions using the CORDIC algorithm. They verified that the relative error of computing exponential and logarithm functions with the CORDIC algorithm is small and acceptable. For further work, this approach to solving hyperbolic functions should be integrated into the FPU design to realize high precision floating point computation using the IEEE 754 standard.

2.7 4x4 Matrix Keypad Module

A 4x4 matrix keypad provides a useful human interface component for many electronic projects. Convenient adhesive backing provides a simple w
65. hmetic formats, which consist of sets of binary and decimal floating point numbers: finite numbers (including subnormal numbers and signed zero), infinity, and a special value called not a number (NaN).
(b) Interchange formats, which are the bit strings used to exchange floating point data in a compact and efficient form.
(c) Rounding rules, which are the properties that should be satisfied during arithmetic operations and conversions of numbers in the arithmetic formats.
(d) Exception handling, which indicates exceptional conditions arising from the operations, for example division by zero, overflow and underflow.

2.3.1 Single Precision Floating Point Formats

The IEEE 754 standard defines several basic formats which differ in precision and in the number of bits used. One of the commonly used formats is the single precision floating point format, which occupies 32 bits in computer memory. According to the IEEE 754 standard, data in this format has 1 sign bit (S), 8 bits of biased exponent (E) and 23 bits of mantissa (M) [1, 2, 4], as shown in Figure 2.4.

[Figure 2.4: IEEE 754 single precision format. Bit 31: sign bit; bits 30 to 23: exponent; bits 22 to 0: mantissa.]

This format represents a floating point number based on the following equations:

$\text{Value} = (-1)^S \times 2^{E-\text{Bias}} \times (1.M)$ (normalized)
$\text{Value} = (-1)^S \times 2^{1-\text{Bias}} \times (0.M)$ (denormalized)

where $M = m_{22} 2^{-1} + m_{21} 2^{-2} + m_{20} 2^{-3} + \dots + m_1 2^{-22} + m_0 2^{-23}$, S = Sign bit
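The value equations for the single precision format can be evaluated field by field. The sketch below, in Python, extracts S, E and M from a 32-bit word and applies the normalized or denormalized formula. The function name `decode_single` is hypothetical, and the infinity and NaN encodings (E = 255) are deliberately left out.

```python
BIAS = 127

def decode_single(word: int) -> float:
    """Evaluate a 32-bit IEEE 754 single precision word using its S, E, M fields."""
    s = (word >> 31) & 0x1             # sign bit, bit 31
    e = (word >> 23) & 0xFF            # biased exponent, bits 30..23
    m = (word & 0x7FFFFF) / 2**23      # fraction 0.M built from bits 22..0
    if e == 0xFF:
        raise ValueError("infinity and NaN are not handled in this sketch")
    if e == 0:                         # denormalized: no implicit leading one
        return (-1) ** s * 2.0 ** (1 - BIAS) * m
    return (-1) ** s * 2.0 ** (e - BIAS) * (1 + m)

print(decode_single(0x3F800000))   # 1.0
print(decode_single(0xC0000000))   # -2.0
print(decode_single(0x00000001))   # smallest denormalized value, 2**-149
```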
66. ient in terms of scope and quality for the award of the degree of Bachelor of Engineering (Electrical - Microelectronics).

Signature:
Name of Supervisor: ASSOC. PROF. DR. MUHAMMAD NASIR BIN IBRAHIM
Date: 24 JUNE 2013

DESIGN AND IMPLEMENTATION OF FPGA BASED FLOATING POINT MATH HARDWARE MODULE

CHEN KEAN TACK

A thesis submitted in fulfillment of the requirements for the award of the degree of Bachelor of Engineering (Electrical - Microelectronics)

Faculty of Electrical Engineering
Universiti Teknologi Malaysia

JUNE 2013

I declare that this thesis entitled Design and Implementation of FPGA based Floating Point Math Hardware Module is the result of my own research except as cited in the references. The thesis has not been accepted for any degree and is not concurrently submitted in candidature of any other degree.

Signature:
Name: CHEN KEAN TACK
Date: 24 JUNE 2013

All glory be to the God above. Special thanks to my beloved family members who are always there for me: father, mother and my brothers; my friends, who never complained much while accompanying me until the end of the research; and also my supervisor, who guided me through the hardships of the research.

ACKNOWLEDGEMENT

First and foremost, I would like to express my sincere gratitude towards my supervisor, Associate Professor Dr Muhammad Nasir bin Ibrahim, for his invaluable guidance, advice, comments and encoura
67. ing point format. Meanwhile, this design requires 14 clock cycles to complete the multiplication operation, as shown in Figure 6.3. The output is zero before the computation is done.

Figure 6.3: Simulation Result of Floating Point Multiplier

The given inputs and the generated output are described in detail in Table 6.3.

Input operands
op1 in IEEE 754 binary: 0 10000110 00100011001000101011101
- Decimal value: $(-1)^0 \times 2^{134-127} \times 1.1372486 = 145.5678208$
op2 in IEEE 754 binary: 0 10000011 01111001000011000000000
- Decimal value: $(-1)^0 \times 2^{131-127} \times 1.4728394 = 23.5654304$

Output operand
mul_out in IEEE 754 binary: 0 10001010 10101100110010111100101
- Decimal value: $(-1)^0 \times 2^{138-127} \times 1.6749846 = 3430.368461$
- Actual value by scientific calculator: 3430.36835

Table 6.3: The detailed description of input and output operands from the output waveform of fpu_mul

Based on Table 6.3, the output of fpu_mul closely matches the result from the scientific calculator. Comparing the two results, a precision of up to 3 decimal places was achieved. This module is therefore working as desired and the result is verified.

6.1.4 Floating Point Divider

The out
68. ions
- Convert all the values in Table 4.5 to Q0.32 format and store them in the atan_table RAM.
- Conversion equation: $(\text{rotational angle} / 360) \times 2^{32}$
3. Set the value of shifted X (X_shr) and shifted Y (Y_shr).
- Set X_shr and Y_shr by right shifting X and Y by i (iteration number) places.
4. Determine the rotation direction and the values of X, Y and Angle for the next iteration.
- If Angle > 0, rotate the angle in the anti-clockwise direction for the next iteration. Update the values as X = X - Y_shr, Y = Y + X_shr and Angle = Angle - atan_table[i].
- If Angle < 0, rotate the angle in the clockwise direction for the next iteration. Update the values as X = X + Y_shr, Y = Y - X_shr and Angle = Angle + atan_table[i].

4.2.3 Hyperbolic CORDIC Module

From Table 2.2, to find the values of sinh, cosh and exp, the CORDIC algorithm needs to be implemented in hyperbolic rotational mode. Similar to the trigonometric CORDIC module, a look up table needs to be constructed. Table 4.7 shows the look up table of rotational angles for iterations 1 to 15, which is used to evaluate the hyperbolic functions.

Iteration number, i | Rotation angle, $\tanh^{-1}(2^{-i})$ | $2^{-i}$
1 | 0.5493061443 | 1/2
2 | 0.2554128119 | 1/4
3 | 0.1256572141 | 1/8
4 | 0.0625815715 | 1/16
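The shift-and-add update rules of the trigonometric CORDIC module (steps 3 and 4 of Section 4.2.2) can be sketched in software as follows. This Python model makes several assumptions that differ from the thesis design: angles are in radians with 30 fractional bits rather than Q0.32 fractions of a full turn, the look up table is computed at run time, and the names `cordic_cos_sin`, `FRAC` and `GAIN` are hypothetical. The starting value of X is pre-scaled by the usual CORDIC gain so that no final multiplication is needed.

```python
import math

ITERATIONS = 16                      # i = 0 .. 15, as in the trigonometric table
FRAC = 30                            # fractional bits of this sketch (assumption)
ONE = 1 << FRAC

# Look up table of atan(2^-i) in fixed point radians
ATAN_TABLE = [int(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Product of 1/sqrt(1 + 2^-2i): compensates the growth of the rotations
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_cos_sin(angle_rad: float):
    """Rotation mode CORDIC; valid for roughly |angle| <= 1.74 rad."""
    x = int(GAIN * ONE)
    y = 0
    angle = int(angle_rad * ONE)
    for i in range(ITERATIONS):
        x_shr = x >> i               # arithmetic right shifts
        y_shr = y >> i
        if angle >= 0:               # rotate anti-clockwise
            x, y, angle = x - y_shr, y + x_shr, angle - ATAN_TABLE[i]
        else:                        # rotate clockwise
            x, y, angle = x + y_shr, y - x_shr, angle + ATAN_TABLE[i]
    return x / ONE, y / ONE          # (cos, sin)

cos_v, sin_v = cordic_cos_sin(-math.pi / 6)
print(cos_v, sin_v)
```

After 16 iterations the residual angle is at most about $\tan^{-1}(2^{-15})$, so the results agree with the library cosine and sine to roughly 4 decimal places.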
69. is computation for the design is made in fixed point format, but the result is converted back to IEEE 754 single precision format to form the output of the floating point math module.

4.2.1 The Architecture of CORDIC Algorithm

The general architecture of the CORDIC algorithm is illustrated in Figure 4.5.

Figure 4.5: Basic Architecture of CORDIC Algorithm

4.2.2 Trigonometric CORDIC Module

From Table 2.2, to find the values of sine and cosine, the CORDIC algorithm needs to be implemented in circular rotational mode. It performs a rotation through a series of incremental rotation angles, using shift and add or subtract operations over a limited number of iterations. In my design the angle is rotated 15 times (iteration number i = 15). Table 4.5 shows the look up table of rotational angles for iterations 0 to 15, which is used to evaluate the trigonometric functions.

Iteration number, i | Rotation angle in degrees, $\tan^{-1}(2^{-i})$ | $2^{-i}$
0 | 45.00000000 | 1
1 | 26.56505118 | 1/2
2 | 14.03624347 | 1/4
3 | 7.12501635 | 1/8
4 | 3.57633437 | 1/16
5 | 1.78991061 | 1/32
6 | 0.89517371 | 1/64
7 | 0.44761417 | 1/128
8 | 0.22381050 | 1/256
9 | 0.11190568 | 1/512
10 | 0.05595289 | 1/1024
11 | 0.02797645 | 1/2048
12 | 0.01398823 | 1/4096
13 | 0.00699411 | 1/8192
14 | 0.00349706 | 1/16384
15 | 0.00174853 | 1/32768

Table 4.5: Look up Table for Rotational Angles from 0 to 15 iterations

The name of this design is CORD
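The entries of the trigonometric look up table can be regenerated and converted to a Q0.32 word in a few lines. The sketch below is in Python and assumes the conversion (angle / 360) x 2^32 for Q0.32; the function name `atan_table` is hypothetical, and `round()` is used for the final integer, which may differ from the rounding used in the actual design.

```python
import math

def atan_table(iterations: int = 16):
    """Return (i, angle in degrees, Q0.32 word) for i = 0 .. iterations - 1."""
    rows = []
    for i in range(iterations):
        angle_deg = math.degrees(math.atan(2.0 ** -i))
        q0_32 = round(angle_deg / 360.0 * 2**32)   # fraction of a full turn
        rows.append((i, angle_deg, q0_32))
    return rows

rows = atan_table()
for i, angle_deg, word in rows:
    print(f"{i:2d}  {angle_deg:.8f}  0x{word:08X}")
```

In this encoding the first entry, 45 degrees, is one eighth of a turn and therefore maps to 0x20000000.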
70. ister Select
R/W | PIN_B14 | GPIO_0 pin 3 | LCD Read/Write
E | PIN_B15 | GPIO_0 pin 5 | LCD Enable
DB0 | PIN_B16 | GPIO_0 pin 7 | LCD Data bit 0
DB1 | PIN_B17 | GPIO_0 pin 9 | LCD Data bit 1
DB2 | PIN_B18 | GPIO_0 pin 11 | LCD Data bit 2
DB3 | PIN_B19 | GPIO_0 pin 13 | LCD Data bit 3
DB4 | PIN_B20 | GPIO_0 pin 15 | LCD Data bit 4
DB5 | PIN_C21 | GPIO_0 pin 17 | LCD Data bit 5
DB6 | PIN_D21 | GPIO_0 pin 18 | LCD Data bit 6
DB7 | PIN_B21 | GPIO_0 pin 20 | LCD Data bit 7
Port | PIN_G21 | GPIO_0 pin 24 | LCD backlight control
Input 1 (C1) | PIN_K20 | GPIO_0 pin 33 | Keypad Column 1
Input 2 (C2) | PIN_L19 | GPIO_0 pin 34 | Keypad Column 2
Input 3 (C3) | PIN_J19 | GPIO_0 pin 30 | Keypad Column 3
Input 4 (C4) | PIN_K21 | GPIO_0 pin 28 | Keypad Column 4
Output 1 (R1) | PIN_A18 | GPIO_0 pin 10 | Keypad Row 1
Output 2 (R2) | PIN_A16 | GPIO_0 pin 6 | Keypad Row 2
Output 3 (R3) | PIN_A14 | GPIO_0 pin 2 | Keypad Row 3
Output 4 (R4) | PIN_A13 | GPIO_0 pin 0 | Keypad Row 4
- | PIN_L1 | Clock_50MHz | Internal Clock Source (50 MHz)
- | PIN_R22 | Key_0 | Reset Button

Table 4.9: Pin assignments on Altera DE1 Board

4.3.1 Matrix Keypad Scanner

In order to determine which button on the matrix keypad is pressed, a keypad scanner has to be designed to scan the state of all buttons, row by row and column by column, at a small time interval. In my design the keypad is scanned by switching the number of column at
71. itecture has been used to code the design without optimization. Hence, advanced techniques such as loop unrolling, chaining and multicycling can be used to optimize the area and performance of the design. Apart from that, the precision of the floating point numbers can be enhanced by using the double precision (64 bits) or quad precision (128 bits) IEEE 754 format instead of the single precision IEEE 754 format. Furthermore, the design can be extended by using the NIOS II processor and integrating the hardware and software designs to build a marketable embedded system.

REFERENCES

[1] Lipsa S. and Ruby D. (2012). An Efficient IEEE 754 Compliant Floating Point Unit Using Verilog. Degree Thesis. India: Department of Computer Science and Engineering, National Institute of Technology Rourkela.
[2] Ridhi S. (2010). Design and Implementation of Low power High Speed Floating Point Adder and Multiplier. Master Thesis. India: Department of Electronics and Communication Engineering, Thapar University.
[3] B. Sreenivasa, J. E. N. Abhilash, G. Rajesh Kumar (2012). Design and Implementation of Floating Point Multiplier for Better Timing Performance. International Journal of Advanced Research in Computer Science & Technology (URCET), Vol. 1, Issue 7, September 2012.
[4] Mahendra K. S. (2009). FPGA Implementation of IEEE 754 Standard Based Arithmetic Unit for Floating Point Numbers. Master Thesis. India: Department of Electronics
72. kbone. The ARM based HPS consists of a processor, peripherals and memory interfaces [17]. In addition, it makes use of intellectual property (IP) blocks and the flexibility of programmable logic, which can widen its applications while reducing power, cost and board size [17].

2.1.1 Altera Cyclone II FPGA

In this project, the design was implemented on an Altera Cyclone II FPGA, which is one of Altera's most successful low-cost FPGA families [16]. It uses TSMC 90nm process technology and delivers high performance and low power consumption with a core voltage of 1.2V. Furthermore, it was designed with a high density architecture with up to 68,416 logic elements, a small die size and high volume fabrication. Apart from that, it contains dedicated 18x18 or 9x9 embedded multipliers with an operating frequency of up to 250MHz at the fastest performance [16]. It also has dedicated external memory interface circuitry for DDR, DDR2, SDR SDRAM and QDRII SRAM. In addition, it has up to 4 enhanced phase locked loops (PLLs) that provide advanced clock management capabilities such as frequency synthesis, programmable phase shift, external clock output, programmable duty cycle, lock detection, spread spectrum input clocking and high speed differential support on the input and output clocks [16]. Timing issues can therefore be resolved by using the PLLs. Figure 2.2 shows the block diagram of the PLL for Cycl
73. lowing sections.

3.1 Design Stages

This project is divided into three stages: design specification, design implementation, and design testing and verification. The project started by determining the design specifications, followed by implementing the design on the Altera DE1 board with an external I/O interface circuit. Finally, the functionality of the design was tested and verified using Altera ModelSim, both through the simulated waveforms and through the output of the interface circuit.

3.1.1 Design Specifications

For this stage, a review of previous works is needed to determine the design specifications. The design specifications should be able to solve the problem stated in the problem statement and achieve the objective of this project. The design specifications for this project are listed below:

(a) A floating point math hardware module that is able to realize the solutions of addition, subtraction, multiplication, division, trigonometric, hyperbolic and exponential functions.
(b) A conventional floating point unit algorithm developed based on the single precision IEEE 754 standard.
(c) The CORDIC algorithm, used in rotational mode to solve the transcendental functions efficiently.
(d) A 16x2 character LCD as the output interface to display the command messages and answers.
(e) A 4x4 matrix keypad as the input interface for the user to give the input.

3.1.2 Design Implementation

Basical
74. lue of exponent
- Add the exponents of the two operands and then subtract 127 to obtain the biased exponent.
2. Expand the mantissa of both operands to 24 bits.
- Append a leading zero or leading one bit to the left of the mantissa.
3. Multiplication
- Multiply the mantissas of the two operands after they are expanded to 24 bits. The multiplication gives a 48-bit result.
- The sign is determined by the exclusive OR of the signs of both operands.
4. Normalization
- Normalize the value by counting the leading zeros of the tentative result, then shift the result left and decrement the exponent by an amount equal to the number of leading zeros. However, if the tentative result overflows, shift the mantissa right by 1 bit and increment the exponent by 1.

4.1.4 Floating Point Divider

A simple floating point divider was designed using Verilog HDL. This module computes the division operation in IEEE 754 single precision floating point format. The module is named fpu_div and its block diagram is shown in Figure 4.4.

[Figure 4.4: Block diagram of floating point divider (fpu_div). Inputs: op1 (32 bits), op2 (32 bits), en, rst, clk. Outputs: sign, exp_out, frac_out (27 bits).]

Table 4.4 describes the inputs and outputs of this block with a brief description of their functions.

Signal Name | Width | Type | Description
clk | 1 | Input | System
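The four multiplier steps above (exponent addition, 24-bit mantissa expansion, 48-bit product, normalization) can be traced with a small behavioural model. This Python sketch is not the fpu_mul Verilog: the function names are hypothetical, only finite normalized operands without exponent overflow or underflow are handled, and a round to nearest even step is added as an assumption because the rounding stage is not described in this passage.

```python
import struct

def bits_to_float(word: int) -> float:
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

def float_to_bits(value: float) -> int:
    return int.from_bytes(struct.pack(">f", value), "big")

def fp_mul(a: int, b: int) -> int:
    """Multiply two IEEE 754 single precision words (normalized operands only)."""
    # Sign: exclusive OR of the operand signs
    sign = (a >> 31) ^ (b >> 31)

    # Step 1: add the biased exponents and subtract the bias once
    exp = ((a >> 23) & 0xFF) + ((b >> 23) & 0xFF) - 127

    # Step 2: expand both mantissas to 24 bits with the leading one
    man_a = (a & 0x7FFFFF) | 0x800000
    man_b = (b & 0x7FFFFF) | 0x800000

    # Step 3: 24 x 24 bit multiplication gives a 48-bit product
    product = man_a * man_b

    # Step 4: if bit 47 is set the product is 2.0 or more, so shift one extra bit
    shift = 23
    if product >> 47:
        shift = 24
        exp += 1

    # Round to nearest even on the discarded bits (assumption of this sketch)
    q = product >> shift
    rem = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (q & 1)):
        q += 1
        if q >> 24:                  # rounding carried out of the mantissa
            q >>= 1
            exp += 1

    return (sign << 31) | (exp << 23) | (q & 0x7FFFFF)

op1, op2 = 0x4311915D, 0x41BC8600    # operands of Table 6.3 in hexadecimal
result = fp_mul(op1, op2)
print(hex(result), bits_to_float(result))
```

For the operands of Table 6.3 the sketch returns 0x455665E5, the mul_out pattern shown in the simulation waveform, and it agrees with the product computed by Python and rounded to single precision.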
75. ly the project consists of two implementation parts: the design of the hardware architecture and the design of the I/O interface circuit. The project was implemented using the FPGA on the Altera DE1 board. The design was written in Verilog HDL (Verilog Hardware Description Language) and compiled using Altera Quartus II software. The general implementation steps of the floating point math hardware module are summarized in Figure 3.1. [Figure 3.1: General Design Implementation Steps. Part 1, design for hardware architecture: all designs were written in Verilog HDL using Altera Quartus II software; develop the design for the conventional floating point algorithms based on the single precision IEEE 754 standard; develop the design for the CORDIC algorithm to increase the efficiency of solving the transcendental functions. Part 2, design for I/O interface circuit: develop the controller design, written in Verilog HDL, to interface the 4x4 matrix keypad and LCD; construct the interface circuit on the donut board by soldering; connect the interface circuit to the Altera DE1 board through the GPIO ports (40-pin expansion header).] According to Figure 3.1, the project implementation started by developing the conventional floating point algorithm in Verilog HDL to build a simple floating point math module that is able to solve the typical operations
76. modules comply with the IEEE 754 single precision floating point format. 4.1.1 Floating Point Adder. A simple floating point adder module was designed using Verilog HDL. This module computes the addition operation in IEEE 754 single precision floating point format. The name of this module is fpu_add and its block diagram is shown in Figure 4.1. [Figure 4.1: Block diagram of floating point adder fpu_add (signals shown: op1, op2, en, rst, clk, sign, final_exp, final_sum)] Table 4.1 describes all the inputs and outputs of this block with a brief description of their functions. Signal Name (Width, Type): Description. clk (1, Input): System clock. rst (1, Input): Reset values for initializing. en (1, Input): Enable signal. op1 (32, Input): Operand 1 in IEEE 754 format. op2 (32, Input): Operand 2 in IEEE 754 format. sign (1, Output): Sign bit for output in IEEE 754 format. final_exp (8, Output): Exponent for output in IEEE 754 format. final_sum (27, Output): Mantissa for output in IEEE 754 format with 4 extra bits for specific purposes. Table 4.1: I/O interface description for fpu_add. This module only performs addition when both operands have the same sign, either both positive or both negative. If the two input operands have different signs, this module cannot be used and the floating point subtractor must be used instead. Basically th
77. [Figure 2.5: The flowchart for the conventional floating point addition or subtraction [4]] 2.5.2 Multiplication. Based on the design by Mahendra Kumar Soni [4], in order to comply with the IEEE 754 standard, the two mantissas are multiplied and the two exponents are added. A simple algorithm for floating point multiplication therefore has four stages, described as follows. 1. Stage 1, determine the value of the exponent: Simply add the exponents of the two operands and then subtract 127 to obtain the biased exponent. 2. Stage 2, multiplication: Multiply the mantissas of the two operands. At the same time, determine the sign of the result, where 1 represents a negative value and 0 represents a positive value. 3. Stage 3, rounding: Round the mantissa of the result according to the rounding mode. If the result overflows due to rounding, shift right and increment the exponent by 1. 4. Stage 4, normalization: Normalize the resulting value if necessary by counting the leading zeros in the tentative result, and then shift the result to lef
78. ng point format. The name of this module is fpu_sub and its block diagram is shown in Figure 4.2. [Figure 4.2: Block diagram of floating point subtractor fpu_sub (signals shown: op1, op2, addsub, en, rst, clk, sign, final_exp, final_diff)] Table 4.2 describes all the inputs and outputs of this block with a brief description of their functions. Signal Name (Width, Type): Description. clk (1, Input): System clock. rst (1, Input): Reset values for initializing. en (1, Input): Enable signal. addsub (1, Input): addsub signal; if addsub = 0, the subtraction results from the addition of two numbers with different signs; if addsub = 1, the subtraction results from the subtraction of two numbers with the same sign. op1 (32, Input): Operand 1 in IEEE 754 format. op2 (32, Input): Operand 2 in IEEE 754 format. sign (1, Output): Sign bit for output in IEEE 754 format. final_exp (8, Output): Exponent for output in IEEE 754 format. final_diff (26, Output): Mantissa for output in IEEE 754 format with 3 extra bits for specific purposes. Table 4.2: I/O interface description for fpu_sub. This module is similar to the floating point adder in that it only performs subtraction when both operands have the same sign, either both positive or both negative. If the two input operands have different signs, this module cannot be used and the floating point adder must be used instea
79. nt Adder; 6.1.2 Floating Point Subtractor; 6.1.3 Floating Point Multiplier; 6.1.4 Floating Point Divider; 6.1.5 CORDIC Module; 6.2 Interface Circuit Results from LCD Display; CONCLUSION AND FUTURE WORKS; 7.1 Conclusion; 7.2 Future Works; REFERENCES; APPENDIX A; APPENDIX B. LIST OF TABLES (TABLE NO., TITLE): 2.1 List of invalid range for IEEE 754 single precision format; 2.2 Unified CORDIC Rotational Mode; 2.3 Pin Layout functions for all character LCD; 2.4 The command control codes; 2.5 Standard LCD ASCII Character Table; 4.1 I/O interface description for fpu_add; 4.2 I/O interface description for fpu_sub; 4.3 I/O interface description for fpu_mul; 4.4 I/O interface description for fpu_div; 4.5 Look up Table for Rotational Angles from 0 to 15 iterations (CORDIC_Circular); 4.6 I/O Interface description for CORDIC_Circular; 4.7 Look up Table for Rotational Angles from 1 to 15 iterations (CORDIC_Hyperbolic); 4.8 I/O Interface description for CORDIC_Hyperbolic; 4.9 Pin assignments on Altera DE1 Board; 4.10 Initialization Command data and description; 5.1 List of Components and Materials needed; 6.1 The detailed description of input and output operand from the output waveform of fpu_add; 6.2 The detailed description of input and output operand from the output waveform of fpu_sub; 6.3 The detailed desc
80. oating Point Arithmetic (IEEE 754); 2.3.1 Single Precision Floating Point Formats; 2.3.2 IEEE 754 Rounding Modes; 2.3.3 IEEE 754 Exception Handling; 2.4 Fixed point Format; 2.4.1 Q format; 2.5 Algorithms of Floating Point Arithmetic in FPU; 2.5.1 Addition and Subtraction; 2.5.2 Multiplication; 2.5.3 Division; 2.5.4 Transcendental Functions; 2.5.4.1 Coordinate Rotational Digital Computer (CORDIC) Algorithm; 2.6 Related Works; 2.7 4x4 Matrix Keypad Module; 2.8 16x2 Character LCD Module; DESIGN METHODOLOGY; 3.1 Design Stages; 3.1.1 Design Specifications; 3.1.2 Design Implementation; 3.1.3 Design Testing and Verification; 3.1.4 Flowchart of the Overall Project Workflow; PROJECT DESIGN AND ARCHITECTURE; 4.1 Basic Floating Point Math Module Design; 4.1.1 Floating Point Adder; 4.1.2 Floating Point Subtractor; 4.1.3 Floating Point Multiplier; 4.1.4 Floating Point Divider; 4.1.5 Rounding Logic; 4.2 Efficient Floating Point Math Module; 4.2.1 The Architecture of CORDIC Algorithm; 4.2.2 Trigonometric CORDIC Module; 4.2.3 Hyperbolic CORDIC Module; 4.2.4 Q format to IEEE 754 format Converter; 4.3 External Interface Circuit; 4.3.1 Matrix Keypad Scanner; 4.3.2 De bouncer; 4.3.3 LCD Controller; 4.4 Overall Design; PROJECT MANAGEMENT; 5.1.1 Project Schedule; 5.1.2 Project Cost; RESULT AND ANALYSIS; 6.1 Simulation Results; 6.1.1 Floating Poi
81. [Figure 2.2: Cyclone II PLL block diagram [16]] Several Altera Cyclone II FPGA development kits are available on the market, such as the Altera DE1, DE2 and DE2-70 boards. The purpose of these development boards is to provide an ideal vehicle for advanced design prototyping in multimedia, storage and networking. They use state of the art technology in both hardware and CAD tools to expose designers to a wide range of applications. 2.1.2 Altera DE1 Development and Education Board. The DE1 board has several features that allow the user to implement a wide range of circuits, from simple circuits to complex projects. The hardware available on the DE1 board is listed below [18]: Altera Cyclone II 2C20 FPGA device; Altera Serial Configuration device (EPCS4); USB Blaster on board for programming and user API control; 512 KB SRAM, 8 MB SDRAM, 4 MB Flash memory, SD Card socket; 4 pushbutton switches, 10 toggle switches; 10 red LEDs, 8 green LEDs; oscillators at 50 MHz, 27 MHz and 24 MHz; 24-bit CD quality audio CODEC; VGA DAC (4-bit resistor network) with VGA connector; RS-232 transceiver and 9-pin connector; PS/2 mouse/keyboard controller; two 40-pin expansion headers with resistor protection; powered by either a 7.5 V DC adapter or a USB cable. Therefore
82. put waveform generated by fpu_div is shown in Figure 6.4. It performs the floating point division of op1 by op2 and gives the result in div_out. All data are represented in IEEE 754 single precision floating point format. This design requires about 40 clock cycles to complete the division, as shown in Figure 6.4, because the algorithm computes the quotient iteratively. The output is zero until the computation is done. [Figure 6.4: Simulation Result of Floating Point Divider] The detailed description of the given inputs and the generated output is shown in Table 6.4. Input operands: op1 in IEEE 754 binary: 0 10000110 00100011001000101011101; decimal value: 1 x 2^(134-127) x 1.1372486 = 145.5678208. op2 in IEEE 754 binary: 0 10000011 01111001000011000000000; decimal value: 1 x 2^(131-127) x 1.4728394 = 23.5654304. Output operand: div_out in IEEE 754 binary: 0 10000001 10001011010101101101111; decimal value: 1 x 2^(129-127) x 1.5442942 = 6.1771768; actual value by scientific calculator: 6.177176412. Table 6.4: The detailed description of input and output operands from the output waveform of fpu_div. Based on the result in Table 6.4, the output of fpu_div closely matches the result calculated by a scientific calculator. A precision of up to 6 decimal places was achieved when comparing these two
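The decimal values quoted in Table 6.4 can be cross-checked in software by decoding the sign, exponent and fraction fields directly. The following is a minimal Python sketch using only the standard `struct` module; the helper name `fields_to_float` is hypothetical, and the bit strings are the ones listed in the table.

```python
import struct

def fields_to_float(sign: str, exponent: str, fraction: str) -> float:
    """Decode IEEE 754 single-precision fields given as binary strings."""
    word = (int(sign, 2) << 31) | (int(exponent, 2) << 23) | int(fraction, 2)
    return struct.unpack(">f", struct.pack(">I", word))[0]

# Bit patterns as listed in Table 6.4
op1 = fields_to_float("0", "10000110", "00100011001000101011101")
op2 = fields_to_float("0", "10000011", "01111001000011000000000")
div_out = fields_to_float("0", "10000001", "10001011010101101101111")

print(op1, op2, div_out)
print("difference from software quotient:", abs(div_out - op1 / op2))
```

Comparing `div_out` against `op1 / op2` computed in double precision gives a quick estimate of how many decimal places of the hardware result can be trusted.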
83. results. Thus, this module is working as desired and the result is verified. 6.1.5 CORDIC Module. This module combines the trigonometric CORDIC, the hyperbolic CORDIC and the Q format to IEEE 754 converter. It can therefore compute cos, sin, cosh, sinh and exp. The output waveform is shown in Figure 6.5. [Figure 6.5: Simulation result of CORDIC module. Waveform signals: clk, angle, hyper_in, cos, sin, cosh, sinh, exp. Legible values: cos = 00111111010111011011011000000000, cosh = 00111111100100001001111100000000, exp = 00111111110100110111001100000000.] Figure 6.5 shows the output waveform generated by the CORDIC module. It performs the CORDIC iterations and gives the results of cos, sin, cosh, sinh and exp. The input data are represented in Q format and the output data are represented in IEEE 754 single precision floating point format. This design requires about 18 clock cycles for computation, as shown in Figure 6.5, because of the iterative calculations in the CORDIC algorithm. The detailed description of the given inputs and the generated outputs is shown in Table 6.5. Input operands: angle in Q0.32 unsigned binary: 11101010101010101010101010101011 (3937053355)
84. ription of input and output operand from the output waveform of fpu_mul; 6.4 The detailed description of input and output operand from the output waveform of fpu_div; 6.5 The detailed description of input and output operand from the output waveform of CORDIC module. LIST OF FIGURES (FIGURE NO., TITLE): 2.1 Generic structure of an FPGA fabric; 2.2 Cyclone II PLL block diagram; 2.3 The schematic diagram for expansion headers; 2.4 IEEE 754 Single Precision Formats; 2.5 The flowchart for the conventional floating point addition or subtraction; 2.6 The flowchart for the conventional floating point multiplication; 2.7 The flowchart for the conventional floating point division; 2.8 4x4 Matrix Keypad columns and rows; 2.9 4x4 Matrix Keypad Basic Connection Diagram; 3.1 General Design Implementation Steps; 3.2 The flowchart of overall project workflow; 4.1 Block diagram of floating point adder fpu_add; 4.2 Block diagram of floating point subtractor fpu_sub; 4.3 Block diagram of floating point multiplier fpu_mul; 4.4 Block diagram of floating point divider fpu_div; 4.5 Basic Architecture of CORDIC Algorithm; 4.6 Block diagram of CORDIC_Circular; 4.7 Block diagram of CORDIC_Hyperbolic; 4.8 The schematic diagram of external interface circuit; 4.9 Block diagram of Keypad Scanner; 4.10 State diagram of De bouncer FSM; 4.11 Design Architecture of the overall design; 5.1 Gantt Chart of FYP1
85. s the conventional architecture of the FPU and the CORDIC algorithm can be used to achieve this goal. Furthermore, by integrating the floating point algorithm with the interface circuit, a complete floating point math hardware module can be constructed that acts like a simple calculator for real time applications. 1.4 Project Objectives. This project aims to design an FPGA based, efficient floating point math hardware module that can solve some typical and transcendental functions. In addition, the project aims to implement the design on an Altera FPGA development board with an external I/O interface circuit. 1.5 Scope of Works. The floating point math hardware module is designed in Verilog HDL and then implemented on the Altera DE1 board with an external I/O interface circuit. The design is based on a single precision floating point unit and follows the IEEE 754 standard. In addition, the fixed point format (Q format) is used to compute the CORDIC arithmetic, after which the output data is converted back to IEEE 754 format. The operations developed in the module in this project are addition, subtraction, multiplication, division, exponential, trigonometric and hyperbolic functions. For the I/O interface, this project uses a 4x4 matrix keypad as the input interface and a 16x2 character LCD as the output interface. 1.6 Organization of the Project. Generally, this thesis is organi
86. s the decision factor for the coordinate system, as shown in Table 2.2. In rotational mode, $d_i = \mathrm{sign}(Z_i)$, so that $Z$ rotates towards 0.
m = 1, Circular, $e_i = \tan^{-1}(2^{-i})$: for cos and sin, set $X_0 = 1/K$, $Y_0 = 0$, where $K = 1.646760258121$; then $X_n = \cos Z$, $Y_n = \sin Z$, $\tan Z = Y_n / X_n$.
m = 0, Linear, $e_i = 2^{-i}$: for multiplication, set $Y_0 = 0$; then $X_n = X_0$, $Y_n = Y_0 + X_0 Z_0 = X_0 Z_0$.
m = -1, Hyperbolic, $e_i = \tanh^{-1}(2^{-i})$: for cosh and sinh, set $X_0 = 1/K$, $Y_0 = 0$, where $K = 0.8281339907$; then $X_n = \cosh Z$, $Y_n = \sinh Z$, $\tanh Z = Y_n / X_n$, $e^{Z} = X_n + Y_n$.
Table 2.2: Unified CORDIC Rotational Mode [7]
Therefore, to implement the CORDIC algorithm for the trigonometric functions, each iteration has four steps: set the value of the shifted X, set the value of the shifted Y, set the value of delta Z, and determine the rotation direction and the values of X, Y and Z for the next iteration, as described in the following [1]. 1. Step 1, set the value of dX: Set dX to the value of X shifted right by i places. This effectively stores $X \tan\theta_i$. 2. Step 2, set the value of dY: Set dY to the value of Y shifted right by i places. This effectively stores $Y \tan\theta_i$. 3. Step 3, set the value of dZ: Set dZ to the value $\tan^{-1}(2^{-i})$ taken from the LUT. 4. Step 4, determine the rotation direction and the values of X, Y and Z for the next iteration: If Z > 0, rotate the angle in the anti-clockwise direction for the next iteration
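The four steps of the circular rotational mode in the excerpt above can be sketched as a fixed-point software model. This is a minimal Python sketch under stated assumptions: the function name, the 16 iterations and the 30 fractional bits are illustrative choices and not values confirmed by the thesis, and the look-up table is computed with `math.atan` instead of being stored as constants.

```python
import math

def cordic_circular(angle_rad: float, iterations: int = 16, frac_bits: int = 30):
    """Hypothetical fixed-point CORDIC (circular, rotational mode): returns (cos, sin)."""
    scale = 1 << frac_bits
    # Look-up table of rotation angles atan(2^-i), stored as fixed-point integers
    lut = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # CORDIC gain K; it approaches 1.646760258121 as the iteration count grows
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)        # X0 = 1/K
    y = 0                          # Y0 = 0
    z = round(angle_rad * scale)   # Z0 = target angle
    for i in range(iterations):
        dx = x >> i                # Step 1: shifted X
        dy = y >> i                # Step 2: shifted Y
        dz = lut[i]                # Step 3: rotation angle from the LUT
        if z >= 0:                 # Step 4: rotate anti-clockwise
            x, y, z = x - dy, y + dx, z - dz
        else:                      # otherwise rotate clockwise
            x, y, z = x + dy, y - dx, z + dz
    return x / scale, y / scale

cos_val, sin_val = cordic_circular(math.pi / 6)
print(cos_val, sin_val)
```

The model only converges for angles within roughly plus or minus 1.74 rad, so a hardware design would normally add quadrant mapping before the iterations; that step is left out here.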
87. se if diff 18 else if diff 17 else if diff 16 else if diff 15 else if diff 14 else if diff 13 else if diff 12 else if diff 11 else if diff 10 leadO 5 d2 leadO 5 d3 leadO 5 d4 leadO 5 d5 leadO 5 d6 leadO 5 d7 leadO 5 d8 leadO 5 d9 leadO 5 d10 leadO 5 d11 leadO 5 d12 leadO 5 d13 leadO 5 d14 leadO 5 d15 ub bo ee m 0 laen LL else if diff 9 leadO 5 d16 else if diff 8 leadO 5 d17 else if diff 7 leadO 5 d18 else if diff 5 leadO 5 d20 else if diff 4 leadO 5 d21 else if diff 6 leadO 5 d19 else if diff 3 leadO 5 d22 allign_subtra else if diff 2 leadO 5 d23 else if diff 1 leadO 5 d24 else if diff 0 leadO 5 d25 else lead0 5 d26 end assign sign opl_ltet_op2 op1 31 op2 31 4 fpu_mode 3 b000 assign final exp leadO_et_26 0 exp assign final diff in_norm_out_denorm 1 b0 temp diff gt gt 1 1 b0 temp diff endmodule A3 Floating Point Multiplier fpu_mul module fpu_mul input clk rst en input 31 0 opl op2 output sign output 8 0 final_exp output 26 0 final_prod reg 22 0 frac_opl frac_op2 reg 7 0 exp_opl exp_op2 reg 8 0 exp_terms exp_under exp_templ exp_temp2 reg 23 0 mul opl mul_op2 reg 47 0 product prod_temp1 prod_temp2 prod temp3 reg 4 0 prodshift wire opl_norm lexp_opl
88. sed. In Chapter 6, the results obtained in this project are verified and analyzed. The results from the LCD are verified by comparing them with the simulation results. In addition, the performance of the design is investigated based on the number of clock cycles, or latency, needed for each computation. Lastly, Chapter 7, the final chapter, concludes the findings of this project. The future work for further improvement of this project is also stated. CHAPTER 2 LITERATURE REVIEW. In this chapter, the relevant theories and concepts are briefly explained, such as the FPGA, FPU, CORDIC, matrix keypad interface, character LCD interface and so forth. Apart from that, some previous works on FPU and CORDIC architecture design are also discussed so that improvements can be made upon previous designs. 2.1 Field Programmable Gate Array (FPGA). An FPGA is a logic device that contains a two dimensional array of generic logic blocks and programmable interconnection switches [14]. It uses a grid of logic gates similar to that of an ordinary gate array, but the programming is done by the customer. The term field programmable means that the array is programmed outside the factory, or in the field. Each logic block can be programmed to perform a specific function, such as combinational or sequential logic functions, and a
89. sion floating point format after the result is obtained. After that, the output in IEEE 754 format can be used to perform floating point addition, subtraction, multiplication or division. In this way, the values of tangent, hyperbolic tangent and exponential can be computed from the following mathematical equations: $\tan\theta = \frac{\sin\theta}{\cos\theta}$, $\tanh x = \frac{\sinh x}{\cosh x}$, $e^{x} = \sinh x + \cosh x$. The algorithm to convert from Q2.15 format to the 32-bit single precision format is simple. The values of the sign, exponent and mantissa are determined as shown below, assuming that Qdata is the data in Q2.15 format: 1. Sign = Qdata[16]. 2. If sign = 0, Mantissa = {Qdata[15:0], 8{1'b0}}; if sign = 1, Mantissa = {(~Qdata[15:0] + 1), 8{1'b0}}. 3. Exponent = 127 - number of leading zeroes in Mantissa. 4.3 External Interface Circuit. In order to develop an I/O interface to test the functionality of the design, a 4x4 matrix keypad and a 16x2 character LCD were used to construct an external interface circuit, soldered on a donut board. Figure 4.8 shows the schematic of the completed interface circuit. [Figure 4.8: The schematic diagram of external interface circuit] The pin assignments on the Altera DE1 board for the design are shown in Table 4.9. Columns: pins from interface circuit; pins for FPGA; DE1 board side; description. VCC: 3.3VCC, 3.3 V power supply. GND: GND, ground. RS: PIN_B13, GPIO_0 pin 1, LCD Reg
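The Q2.15 to single precision conversion steps in the excerpt above can be modelled in a few lines of software. This is a minimal Python sketch, not the thesis's converter: the function name is hypothetical, the negation of negative inputs is assumed to be a two's complement, and the two example input words are assumptions chosen for illustration.

```python
def q2_15_to_ieee754(qdata: int) -> int:
    """Hypothetical model: 17-bit two's-complement Q2.15 word -> IEEE 754 single bits."""
    qdata &= 0x1FFFF                      # keep the 17-bit word
    sign = (qdata >> 16) & 1              # Step 1: sign = Qdata[16]
    low16 = qdata & 0xFFFF
    # Step 2: magnitude (two's complement for negatives), padded with 8 zero bits
    magnitude = low16 if sign == 0 else ((~low16 + 1) & 0xFFFF)
    mantissa = magnitude << 8             # 24-bit mantissa, hidden bit at bit 23
    if mantissa == 0:                     # zero (the value -2.0 also lands here
        return sign << 31                 # and is not representable in this model)
    leading_zeros = 24 - mantissa.bit_length()
    exponent = 127 - leading_zeros        # Step 3: biased exponent
    normalized = (mantissa << leading_zeros) & 0xFFFFFF
    fraction = normalized & 0x7FFFFF      # drop the hidden leading one
    return (sign << 31) | (exponent << 23) | fraction

# Assumed example inputs: 0x6EDB is about 0.866 and 0x1BFFF is about -0.5 in Q2.15
print(hex(q2_15_to_ieee754(0x6EDB)))
print(hex(q2_15_to_ieee754(0x1BFFF)))
```

Because Q2.15 carries only 15 fractional bits, at least 8 of the 23 fraction bits of the converted result are always zero, which is consistent with the reduced precision the thesis attributes to the fixed point path.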
90. sults in lower precision. Therefore, both the IEEE 754 format and the fixed point format can be used to compute floating point arithmetic, but the precision and speed requirements of the design should be analyzed to decide which is the better choice. For example, an FPU usually requires high precision computation to avoid errors or crashes on the computer. In that case, the IEEE 754 format is the better floating point representation for the design. Apart from that, the results obtained from the LCD display of the interface circuit are the same as the simulated results, but the numbers were converted to hexadecimal form because there is not enough space on the LCD to display a 32-bit binary number in a single line. The numbers displayed on the LCD were thus shorter and easier to read. This circuit can therefore be used to test the functionality of the design without referring to the simulation waveform. In a nutshell, the floating point math hardware modules were successfully designed and implemented based on the conventional floating point algorithm and the CORDIC algorithm to solve addition, subtraction, multiplication, division, trigonometric, hyperbolic and exponential functions with acceptable precision. Besides that, a simple working I/O interface circuit that can interface with the CORDIC module on the Altera DE1 board was also successfully built. 7.2 Future Works. In this project, the simple arch
91. t and decrement the exponent by the number of leading zeros. However, if the tentative result overflows, shift right and increment the exponent by 1. The flowchart for floating point multiplication is shown in Figure 2.6. In order to save clock cycles and reduce hardware resources, the multiplication operation needs to be done in parallel, or concurrently [4]. [Figure 2.6: The flowchart for the conventional floating point multiplication [4]] 2.5.3 Division. Based on the design by Mahendra Kumar Soni [4], floating point division is implemented serially to reduce hardware resources. The division operation is done through multiple subtractions and shifts. The conventional floating point division algorithm has five stages: counting leading zeroes in both operands, shifting left, division, rounding and normalization. Given two operands Op1 = (S1, E1, M1) and Op2 = (S2, E2, M2), the steps to compute Op1 divided by Op2 are described as follows. 1. Stage 1, counting leading zeroes: Count the number of leading zeroes of M1 and M2 and store them as Z1 and Z2. 2
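The idea of dividing "through multiple subtractions and shifts" mentioned in the excerpt above can be illustrated with a restoring (shift and subtract) mantissa divider. This is a minimal Python sketch under stated assumptions: it is not the five-stage algorithm of the cited design, the function names are hypothetical, inputs are assumed to be normalized and non-zero, and the quotient is truncated instead of rounded.

```python
import struct

def float_to_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def bits_to_float(word: int) -> float:
    """Interpret a 32-bit word as an IEEE 754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def fp_div_model(a_bits: int, b_bits: int) -> int:
    """Hypothetical model: divide two normalized singles by shift and subtract."""
    sign = ((a_bits >> 31) ^ (b_bits >> 31)) & 1
    exponent = ((a_bits >> 23) & 0xFF) - ((b_bits >> 23) & 0xFF) + 127
    remainder = (1 << 23) | (a_bits & 0x7FFFFF)   # 24-bit dividend mantissa
    divisor = (1 << 23) | (b_bits & 0x7FFFFF)     # 24-bit divisor mantissa
    quotient = 0
    for _ in range(25):                 # 1 integer bit + 24 fraction bits
        quotient <<= 1
        if remainder >= divisor:        # subtract when it fits, record a 1
            remainder -= divisor
            quotient |= 1
        remainder <<= 1                 # shift for the next quotient bit
    if quotient >> 24:                  # quotient in [1, 2): already normalized
        fraction = (quotient >> 1) & 0x7FFFFF
    else:                               # quotient in [0.5, 1): shift left once
        exponent -= 1
        fraction = quotient & 0x7FFFFF
    return (sign << 31) | ((exponent & 0xFF) << 23) | fraction

print(bits_to_float(fp_div_model(float_to_bits(7.5), float_to_bits(2.5))))
```

One quotient bit is produced per loop pass, which is why a serial hardware divider of this kind needs tens of clock cycles per result while using only a subtractor and shift registers.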
92. tartup rom u0 clk addr0 startup mode rom ul clk addr1 mode msg fpuop rom u2 clk addr2 fpu op trigo rom ul15 clk addr3 trigo msg hyper rom ul6 clk addr4 hyper msg ans rom u7 clk cos ieee addr5 ans cos ans rom ull clk sin ieee addr5 ans sin ans_ rom ul2 clk cosh ieee addr5 ans_cosh ans_ rom ul3 clk sinh ieee addr5 ans_sinh ans rom ul4 clk exponent ieee addr5 ans_exp keypad scan u3 clk rst_n col row keypad data debounce u4 clk rst_n db_en key db_level parameter delay2s 100000000 delay for 2s parameter long delay 80000 delay needed for long instruction parameter big_delay 2500 delay needed for slow instruction parameter small_delay 2200 delay needed for fast instruction parameter setup_delay 20 finitial delay Function set for 8 bits data transfer and 2 line display parameter SET 8 b00111000 parameter DON 8 b00001111 Display ON without cursor parameter CLR 8 b00000001 Clear Screen Set entry mode to increment cursor automatically after each character is displayed parameter SEM 8 b000001 10 Set entry mode to decrement cursor automatically after each character is displayed parameter SEMD 8 b00000100 LCD return to home parameter HOM 8 b00000010 wire trigo data_in 8 h2B wire hyper data in 8 h2D wire ready db_level reg db_en reg 1 0 cordic_mode displayNo reg 5 0 state wire 3 0 keypa
93. tate lt 16 end count lt count 1 empty state system end here 4 h1 data in lt 8 h31 4 h3 data in lt 8 h33 4 h5 data in lt 8 h35 4 h7 data in lt 8 h37 4 h9 data_in lt 8 h39 4 hB data in lt 8 h2D 4 hD data_in lt 8 hFD 4 hF data in lt 8 h7F
94. tion chosen 14 begin if db_level begin LCD_DATA lt CLR LCD_RS lt 1 b0 db en lt 0 if count lt setup_delay LCD EN lt 1 b1 else LCD EN lt 1 b0 if count long_delay begin state lt 15 count lt 0 end else count lt count 1 end end display the result according to displayNo and cordic_mode 15 begin if cordic_mode 2 b01 case displayNo 2 b01 begin if ready_cos begin LCD_DATA lt ans_cos LCD RS lt 1 b1 end end 2 b10 begin if ready_sin begin LCD_DATA lt ans_sin LCD RS lt 1 b1 end end endcase 106 else if cordic mode 2 b10 case displayNo 2 b01 begin if ready_cosh begin LCD_DATA lt lt ans_cosh LCD RS lt 1 bl end end 2 b10 begin if ready sinh begin LCD DATA lt ans_sinh LCD RS lt 1 bl end end 2 b11 begin if ready_exp begin LCD DATA lt ans_exp LCD RS lt 1 bl end end if count lt setup_delay else LCD EN lt 1 b1 LCD EN lt 1 b0 if count big_delay begin else end 16 state lt 16 endcase begin case keypad_data end end always endcase end endmodule 4 h0 data in lt 8 h30 4 h2 data in lt 8 h32 4 h4 data in lt 8 h34 4 h6 data in 8 h36 4 h8 data in lt 8 h38 4 h A data in lt 8 h2B 4 hC data in lt 8 h78 4 hE data in lt 8 h2E count lt 0 addr5 lt addr5 1 state lt 15 if addr5 7 h33 s
95. to fpu_add, this design also requires 12 clock cycles for computation, as shown in Figure 6.2. The output is zero until the computation is done. [Figure 6.2: Simulation Result of fpu_sub] The detailed description of the given inputs and the generated output is shown in Table 6.2. Input operands: op1 in IEEE 754 binary: 0 10000011 01010001101010000111000; decimal value: 1 x 2^(131-127) x 1.3189764 = 21.1036224. op2 in IEEE 754 binary: 0 10000010 11011111111011100111001; decimal value: 1 x 2^(130-127) x 1.8747321 = 14.9978568. Output operand: sub_out in IEEE 754 binary: 0 10000001 10000110110001001101110; decimal value: 1 x 2^(129-127) x 1.5264413 = 6.1057652; actual value by scientific calculator: 6.1057656. Table 6.2: The detailed description of input and output operands from the output waveform of fpu_sub. Based on the result in Table 6.2, the output of fpu_sub closely matches the result calculated by a scientific calculator. A precision of up to 5 decimal places was achieved when comparing these two results. Thus, this module is working as desired and the result is verified. 6.1.3 Floating Point Multiplier. The output waveform generated by fpu_mul is shown in Figure 6.3. It performs the floating point multiplication of op1 and op2 and gives the result in mul_out. All data are represented in IEEE 754 single precision float
96. ult is incorrect, the design must go back to the design implementation stage so that the code can be debugged to find the error. Moreover, to test the functionality of the math module with the interface circuit, the design must be programmed onto the Altera DE1 board and its behaviour observed on the interface circuit. If it behaves improperly or does not work, the design must go back to the design implementation stage to troubleshoot the problem, which may lie either in the code or in a discontinuity of the soldered circuit. This stage might therefore consume a lot of time in troubleshooting design errors. Finally, if all the designs, for both the hardware part and the interface part, are working, verification is done by comparing the results output by the interface circuit with the simulation results. The two sets of results should be the same. 3.1.4 Flowchart of the Overall Project Workflow. The summarized workflow of the project is illustrated in Figure 3.2. [Figure 3.2: The flowchart of the overall project workflow (legible boxes: Limit the Project Scope; Desired Results, Yes; Analyze and discuss the final results)] CHAPTER 4 PROJECT DESIGN AND ARCHITECTURE. 4.1 Basic Floating Point Math Module Design. For this project, four basic modules were designed to compute the four typical operations, which are addition, subtraction, multiplication and division, using the conventional algorithm. These
97. vided_op1 lt lt dividend sh divisor_op2_sh lt divisor_op2 lt lt divisor_sh fracl lt guotient_out 24 2 gt gt exp_uf_term4 83 always posedge clk begin if rst begin count nonzero reg lt 0 count nonzero reg2 lt 0 en reg lt 0 en reg a lt 0 en reg b lt 0 en reg c lt 0 en reg d lt 0 en reg e lt 0 end else begin count nonzero reg lt count nonzero count nonzero reg2 lt count nonzero reg en reg lt en reg e en reg a lt en en reg b lt en reg a en reg c sen reg b en reg d lt lt en reg C en reg e en reg d end end always posedge clk begin if rst en reg2 lt 0 elseif en en reg2 lt l end always divided_opl casex divided_opl 84 23 b0000000000000000001 dividend sh lt 18 23 b00000000000000000001 dividend sh lt 19 23 b000000000000000000001 dividend sh lt 20 23 b0000000000000000000001 dividend sh lt 21 23 b00000000000000000000001 dividend sh lt 22 23 b00000000000000000000000 dividend sh lt 23 endcase always divisor_op2 casex divisor op2 23 b0000000000000000001 divisor sh lt 18 23 b00000000000000000001 divisor sh lt 19 23 b000000000000000000001 divisor sh lt 20 23 b0000000000000000000001 divisor sh lt 21 23 b00000000000000000000001 divisor_sh lt 22 23 b00000000000000000000000 divisor_sh lt 23 endcase endmodule
98. vision operations. In addition, some FPUs can perform more sophisticated functions such as exponential, logarithmic and trigonometric operations, which are useful in modern processors [1]. Since the FPU is specially designed for floating point mathematical operations, it is more efficient at computing operations that involve real numbers. In the past, FPUs took the form of individual chips, but FPUs are now integrated inside the CPU. 2.3 IEEE Standard for Floating Point Arithmetic (IEEE 754). The IEEE 754 standard is a technical standard for floating point computation that was established by the IEEE in 1985 [1]. Most hardware implementations, whether in a CPU or an FPU, comply with this standard. Prior to the IEEE 754 standard, several forms of floating point were adopted by computers, but they differed in word size, representation format and rounding behavior. As a result, different systems were implemented with different accuracies and formats. The IEEE 754 standard was proposed to standardize the floating point formats used by different systems. In addition, this standard provides a precise encoding of the bits so that all computers interpret bit patterns in the same way, which allows floating point data to be transferred from one computer to another. Furthermore, this standard defines [1] the following: (a) Arit
99. wire op2_norm lexp_op2 wire opl_zero lop1 30 0 wire op2_zero lop2 30 0 wire zero_in opl_zero op2_zero wire exp_lt_expos exp_terms gt 8 d125 wire exp It prodshift exp_temp1 gt prodshift wire exp et zero exp_temp2 0 wire prod Isb Iprod_temp3 22 0 assign sign op1 31 op2 31 assign final exp zero_in 8 b0 exp_temp2 assign final prod 1 b0 prod_temp3 47 23 prod_lsb 74 always posedge clk begin if rst begin end frac_opl lt 0 frac_op2 lt 0 exp_opl lt 0 exp_op2 lt 0 exp terms lt 0 exp under lt 0 exp_templ lt 0 exp_temp2 lt 0 mul_opl lt 0 mul_op2 lt 0 product lt 0 prod_temp1 lt 0 prod_temp2 lt 0 prod_temp3 lt 0 else if en begin exp_under frac_opl lt op1 22 0 frac_op2 lt 0p2 22 0 exp_opl lt op1 30 23 exp_op2 lt 0p2 30 23 exp terms lt exp_opl exp_op2 opl_norm op2_norm exp_under lt 8 d126 exp_terms exp_templ lt exp_lt_expos exp terms 8 d126 0 exp_temp2 lt exp It prodshift exp_temp1 prodshift 0 mul_opl lt fopl norm frac_op1 mul_op2 lt op2_norm frac_op2 product lt mul opl mul op2 prod temp1 lt exp It expos product product gt gt prod temp2 lt exp It prodshift prod templ lt lt prodshift prod templ lt lt exp temp2 end end prod temp3 lt exp et
100. y FPU is one of the most essential custom applications required in most hardware designs, since it improves floating-point performance and the accuracy of number representation [5]. Floating-point arithmetic is therefore useful in applications that require a large dynamic range. Thirdly, the values of sine and cosine are usually computed using a look-up table (LUT), polynomial approximation, or evaluation of a Taylor series [8]. However, the algorithms that realize these approaches are complex, have low precision, and require a lot of memory and a large number of clock cycles [8], so they need expensive hardware to implement. CORDIC, in contrast, is an iterative algorithm that starts from a few initial values and combines simple shifters and adder/subtractors to realize several transcendental operations, such as exponential, trigonometric, and hyperbolic functions [8]. Furthermore, the algorithm is relatively simple in design and small in area. 1.3 Problem Statements With the state-of-the-art computer technology available today, the floating-point unit (FPU), colloquially called a math coprocessor, is widely used in computer systems, from PCs to supercomputers, to handle floating-point numbers. Most compilers are therefore called on from time to time to deal with floating-point algorithms. It is thus important to study approaches to developing floating-point algorithms that achieve high efficiency with low complexity. Thu
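The shift-and-add idea behind CORDIC can be sketched in a few lines of Python. The example below is a minimal rotation-mode model for sine and cosine, assuming a fixed-point format with 30 fractional bits and 16 iterations; both values, and all names in the code, are choices made for this illustration rather than the parameters of the thesis design.

```python
import math

FRAC_BITS = 30     # assumed number of fractional bits for the fixed-point values
ITERATIONS = 16    # assumed iteration count

# Look-up table of elementary rotation angles atan(2^-i), in fixed point.
ANGLES = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(ITERATIONS)]

# Each micro-rotation stretches the vector; pre-scale x by 1/gain to compensate.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
X0 = round((1 << FRAC_BITS) / GAIN)


def cordic_sin_cos(theta: float):
    """Return (sin, cos) of theta in radians, for |theta| up to about pi/2."""
    x, y = X0, 0
    z = round(theta * (1 << FRAC_BITS))   # remaining angle to rotate through
    for i in range(ITERATIONS):
        if z >= 0:
            # Rotate counter-clockwise by atan(2^-i) using shifts and adds only.
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:
            # Rotate clockwise by the same elementary angle.
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    scale = float(1 << FRAC_BITS)
    return y / scale, x / scale


s, c = cordic_sin_cos(math.pi / 6)
print(s, c)   # close to 0.5 and 0.8660
```

Inside the loop the only operations are shifts, additions, subtractions, and one table look-up, which is why the algorithm maps to small hardware. The floating-point calls appear only when building the table and converting the inputs and outputs.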
101. zed into seven chapters: introduction; literature review; project methodology; project design and architecture; project management; results and analysis; and conclusion and future works. Chapter 1 introduces the project and presents the project overview, motivations, problem statement, project objectives, scope of work, and the organization of the project. Chapter 2 briefly explains the relevant theories and concepts, such as FPGA, FPU, CORDIC, the matrix keypad interface, and the character LCD interface. It also discusses previous work on FPU and CORDIC architecture design, so that improvements can be made upon earlier designs. Chapter 3 discusses the design methodology of the project based on these findings, presented in three main stages: design specification, design implementation, and design testing and verification. Chapter 4 explains the project design and architecture, with tables and block diagrams to illustrate the design more clearly. Chapter 5 discusses project management, covering project scheduling and cost. A Gantt chart is used to schedule the activities throughout the project, and a list of components with prices is shown and discus
