VHDL Arithmetic Functions

I’m currently working on an ALU in VHDL.  To get warmed up, I tried my hand at some basic arithmetic, which I’m going to discuss in this blog post.  Here’s the diagram of the ALU:

A and B are the two inputs and F will be the result.  Each input and output is a 4-bit number.  S represents the function selected.  For now I’m going to use the following:

  • 1 = Add
  • 2 = Subtract
  • 3 = Multiply
  • 4 = Divide

I can add new functions as I need them.  I also have the option of expanding the number of bits to work with.  For now, I’ll just keep this simple.

I’m a software developer, so I look at this problem and I think “switch/case statement”.  As it turns out, there is a case statement for VHDL.  Some searching on Bing turns up this website: VHDL-Online.  Which I found to be very clear and easy to read.  I learn quicker from examples, so this site will be my go-to site for looking up VHDL syntax.  As I looked over the syntax examples, I noticed that the case statement doesn’t work without a process block.  I just wrapped my case statement with a generic process block and came up with this block of code:

entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(3 DOWNTO 0);
F: out signed(3 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
    process (S,A,B)
    begin
        case S is
            when 1 => 
                F <= A+B;
            when 2 =>
                F <= A-B;
            when 3 =>
                F <= RESIZE(A*B,4);
            when 4 =>
                F <= A/B;
            when others =>
                
        end case;
    end process;
end Behavioral;

You’ll have to include “use IEEE.numeric_std.ALL;” at the top in order to use the math functions (and “signed” data types).  Most of the code is pretty obvious: When the selector is set to “1”, then add the two inputs and assign to the output and so on.  The multiply was a bit of a challenge.  My simulation was showing “U” for the outputs of a multiply.  I did some investigating and discovered (or rather, rediscovered) that multiplying two 4-bit numbers results in an 8-bit number.  At one time, I knew that, but it’s been a while.  So I did some research and discovered the “resize” function that allowed me to take the 8-bit result and resize to a 4-bit result.  The understanding is that I can’t really multiple any more than 2-bits from A with 2-bits from B, otherwise, it’ll overflow.  So I’ll need to figure out a solution to that issue in the future, when I decided to expand the data path width.

There is also an unsigned data type.  If you change your inputs to unsigned, you must also change F to unsigned.  Everything will work correctly (except you’ll be working with positive numbers only).

Next, I wanted to add a reset or clear function.  Technically, it’s just a constant zero output because there are not latches inside this ALU.  This code is pure logic.  Here is what the code looks like after the change:

entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(3 DOWNTO 0);
F: out signed(3 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
    process (S,A,B)
    begin
        case S is
            when 0 =>
                F <= to_signed(0,4);
            when 1 => 
                F <= A+B;
            when 2 =>
                F <= A-B;
            when 3 =>
                F <= RESIZE(A*B,4);
            when 4 =>
                F <= A/B;
            when others =>
                
        end case;
    end process;
end Behavioral;

As you can see, assigning a zero to F is not just a matter of using an assignment.  A constant zero is assumed to be an integer data type.  The “to_signed()” function can be used to convert it into a signed data type.  This function requires the number of bits, so I put in a 4.  The simulation look like this:

The first block up to 10ns is just the clear.  From 10ns to 20ns is “add”, from 20-30 is “subtract”, 30-40 is multiply and finally 40-50 is divide (as you can see I’m dividing 4 by 2).

One last test, I decided to compile this code for the mimas board, just to see what kind of resources it would occupy on my FPGA.  I didn’t map any inputs and outputs, and I didn’t transfer this to the board since I don’t have enough dip switches to represent S, A and B (though I’m sure I could get creative and use the push buttons for “B” inputs or something).  Anyway, here is the result:

Slice Logic Utilization:
  Number of Slice Registers:                     0 out of  11,440    0%
  Number of Slice LUTs:                         47 out of   5,720    1%
    Number used as logic:                       47 out of   5,720    1%
      Number using O6 output only:              38
      Number using O5 output only:               0
      Number using O5 and O6:                    9
      Number used as ROM:                        0
    Number used as Memory:                       0 out of   1,440    0%

Slice Logic Distribution:
  Number of occupied Slices:                    19 out of   1,430    1%
  Number of MUXCYs used:                         8 out of   2,860    1%
  Number of LUT Flip Flop pairs used:           47
    Number with an unused Flip Flop:            47 out of      47  100%
    Number with an unused LUT:                   0 out of      47    0%
    Number of fully used LUT-FF pairs:           0 out of      47    0%
    Number of slice register sites lost
      to control set restrictions:               0 out of  11,440    0%

As you can see 47 LUTs are used for the logic as well as 19 slices.  This represents about 1% of the chip resources.  Not bad.  I’m betting that a multiplier scales up exponentially.  So an 8-bit alu is going to take up more than double the resources.  Let’s find out…

Slice Logic Utilization:
  Number of Slice Registers:                     0 out of  11,440    0%
  Number of Slice LUTs:                        112 out of   5,720    1%
    Number used as logic:                      112 out of   5,720    1%
      Number using O6 output only:             101
      Number using O5 output only:               0
      Number using O5 and O6:                   11
      Number used as ROM:                        0
    Number used as Memory:                       0 out of   1,440    0%

Slice Logic Distribution:
  Number of occupied Slices:                    42 out of   1,430    2%
  Number of MUXCYs used:                        32 out of   2,860    1%
  Number of LUT Flip Flop pairs used:          112
    Number with an unused Flip Flop:           112 out of     112  100%
    Number with an unused LUT:                   0 out of     112    0%
    Number of fully used LUT-FF pairs:           0 out of     112    0%
    Number of slice register sites lost
      to control set restrictions:               0 out of  11,440    0%

Hmmm…. Only a little over double (2.38 x).  Time to setup a multiply only and see what resources it takes to multiply two numbers together.  Here’s my basic code:

entity multiplier is port (
A,B: in signed(3 DOWNTO 0);
Y: out signed(3 DOWNTO 0)
);
end multiplier;

architecture Behavioral of multiplier is
    
begin
    Y <= RESIZE(A*B,4);
end Behavioral;
Slice Logic Utilization:
  Number of Slice Registers:                     0 out of  11,440    0%
  Number of Slice LUTs:                         15 out of   5,720    1%
    Number used as logic:                       15 out of   5,720    1%
      Number using O6 output only:              10
      Number using O5 output only:               0
      Number using O5 and O6:                    5
      Number used as ROM:                        0
    Number used as Memory:                       0 out of   1,440    0%

That’s 15 LUTs to multiply two 4-bit numbers together.  8-bit numbers:

Device Utilization Summary:

Slice Logic Utilization:
  Number of Slice Registers:                     0 out of  11,440    0%
  Number of Slice LUTs:                          0 out of   5,720    0%

Slice Logic Distribution:
  Number of occupied Slices:                     0 out of   1,430    0%
  Number of MUXCYs used:                         0 out of   2,860    0%
  Number of LUT Flip Flop pairs used:            0

IO Utilization:
  Number of bonded IOBs:                        24 out of     200   12%

Specific Feature Utilization:
  Number of RAMB16BWERs:                         0 out of      32    0%
  Number of RAMB8BWERs:                          0 out of      64    0%
  Number of BUFIO2/BUFIO2_2CLKs:                 0 out of      32    0%
  Number of BUFIO2FB/BUFIO2FB_2CLKs:             0 out of      32    0%
  Number of BUFG/BUFGMUXs:                       0 out of      16    0%
  Number of DCM/DCM_CLKGENs:                     0 out of       4    0%
  Number of ILOGIC2/ISERDES2s:                   0 out of     200    0%
  Number of IODELAY2/IODRP2/IODRP2_MCBs:         0 out of     200    0%
  Number of OLOGIC2/OSERDES2s:                   0 out of     200    0%
  Number of BSCANs:                              0 out of       4    0%
  Number of BUFHs:                               0 out of     128    0%
  Number of BUFPLLs:                             0 out of       8    0%
  Number of BUFPLL_MCBs:                         0 out of       4    0%
  Number of DSP48A1s:                            1 out of      16    6%
  Number of ICAPs:                               0 out of       1    0%
  Number of MCBs:                                0 out of       2    0%
  Number of PCILOGICSEs:                         0 out of       2    0%
  Number of PLL_ADVs:                            0 out of       2    0%
  Number of PMVs:                                0 out of       1    0%
  Number of STARTUPs:                            0 out of       1    0%
  Number of SUSPEND_SYNCs:                       0 out of       1    0%

Well, that’s interesting.  Apparently, there are 16 DSP modules and one of those was used for an 8-bit multiplier.  The same results from a 16-bit multiplier.  Let’s push it a little.  Here’s a 32-bit multiplier:

Specific Feature Utilization:
  Number of RAMB16BWERs:                         0 out of      32    0%
  Number of RAMB8BWERs:                          0 out of      64    0%
  Number of BUFIO2/BUFIO2_2CLKs:                 0 out of      32    0%
  Number of BUFIO2FB/BUFIO2FB_2CLKs:             0 out of      32    0%
  Number of BUFG/BUFGMUXs:                       0 out of      16    0%
  Number of DCM/DCM_CLKGENs:                     0 out of       4    0%
  Number of ILOGIC2/ISERDES2s:                   0 out of     200    0%
  Number of IODELAY2/IODRP2/IODRP2_MCBs:         0 out of     200    0%
  Number of OLOGIC2/OSERDES2s:                   0 out of     200    0%
  Number of BSCANs:                              0 out of       4    0%
  Number of BUFHs:                               0 out of     128    0%
  Number of BUFPLLs:                             0 out of       8    0%
  Number of BUFPLL_MCBs:                         0 out of       4    0%
  Number of DSP48A1s:                            4 out of      16   25%
  Number of ICAPs:                               0 out of       1    0%
  Number of MCBs:                                0 out of       2    0%
  Number of PCILOGICSEs:                         0 out of       2    0%
  Number of PLL_ADVs:                            0 out of       2    0%
  Number of PMVs:                                0 out of       1    0%
  Number of STARTUPs:                            0 out of       1    0%
  Number of SUSPEND_SYNCs:                       0 out of       1    0%

No LUTs were used, but 4 DSPs were used.  For a 64-bit multiplier:

ERROR:Place:543 – This design does not fit into the number of slices available

Darn!  I had high-hopes.  Oh well.  Now we know a limit to the Spartan-6 XC6SLX9 FPGA chip.

One other arithmetic function available is the modulo (mod).  Which gives the remainder.  Let’s add that to the ALU:

entity smallalu is port (
S: in integer range 0 to 15;
A,B: in signed(7 DOWNTO 0);
F: out signed(7 DOWNTO 0)
);
end smallalu;

architecture Behavioral of smallalu is
begin
    process (S,A,B)
    begin
        case S is
            when 0 =>
                F <= to_signed(0,8);
            when 1 => 
                F <= A+B;
            when 2 =>
                F <= A-B;
            when 3 =>
                F <= RESIZE(A*B,8);
            when 4 =>
                F <= A/B;
            when 5 =>
                F <= A mod B;
            when others =>
                
        end case;
    end process;
end Behavioral;

As you can see from the simulation, 7/2 gives a remainder of 1:

Finally, here’s 8/2:

 

VHDL – Shift Register

My latest hobby is to learn VHDL and apply to the Mimas V2 FPGA board.  As with any language, reading about the language is all well and good, but attempting a real application is where the rubber hits the road.  I read through several tutorials and some introduction material to familiarize myself with some of the syntax.  I copied some code from example articles and observed how the code worked in simulation mode.  Then I finally decided to try and build a circuit without the code from a tutorial.  Keeping it somewhat simple, I chose to implement a shift register.  My goal was to create a 4-bit shift register that had one input that I could feed a “1” or a “0” per clock cycle.  I also wanted to see all 4 outputs to observe the data bits shifting down the line.  Here’s the circuit:

My first attempt was to just shift the outputs as though they are storage locations.  That caused an error: Cannot read from ‘out’ object output ; use ‘buffer’ or ‘inout’  I discovered that I needed some sort of storage inside my object (the flip-flops that represent my last state).  So I set up a “signal”:

signal dflipflops: STD_LOGIC_VECTOR(3 downto 0):="0000";

You can give your signal any name, I just called it dflipflops because that’s what popped into my head.  As you can see, the data in the signal can be pre-set to some value (in quotes because it’s a vector).

Next, I coded the reset.  I didn’t really need a reset for the simulation since I set the dflipflops to all zeros when the object is initialized.  However, if I decided to use this as a real circuit, I’d have to have a way to reset this at any time.  So I coded my reset as simple as possible:

if (reset = '1') then
    dflipflops <= "0000";
else
    -- shift logic goes here
end if;

Next, I hard-coded the logic of shifting bits (this is just the inner logic):

if (clock='1' and clock'event) then    
    dflipflops(3) <= dflipflops(2);
    dflipflops(2) <= dflipflops(1);
    dflipflops(1) <= dflipflops(0);
    dflipflops(0) <= datain;
end if;

I did this because I wanted to see if the simulation worked, and it didn’t.  I ended up with unknown outputs:

Yup, forgot to translate my signal back out to the outputs:

output(0) <= dflipflops(0);
output(1) <= dflipflops(1);
output(2) <= dflipflops(2);
output(3) <= dflipflops(3);

That worked:

Next, I converted to “for” loops and here’s the final code:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- 4-bit shift register
entity shiftregister is port (
datain,clock,reset: in STD_LOGIC;
output: out STD_LOGIC_VECTOR(3 DOWNTO 0)
);
end shiftregister;

architecture Behavioral of shiftregister is
    
    signal dflipflops: STD_LOGIC_VECTOR(3 downto 0):="0000";
    
begin
    process (datain,clock,reset)
    begin
        if (reset = '1') then
            dflipflops <= "0000";
        else
            if (clock='1' and clock'event) then
                for i in 2 downto 0 loop
                    dflipflops(i+1) <= dflipflops(i);
                end loop;
                dflipflops(0) <= datain;
            end if;
        end if;
        
        for k in 3 downto 0 loop
            output(k) <= dflipflops(k);
        end loop;
        
    end process;
end Behavioral;

The test bench code is here:

ENTITY shiftregistertest IS
END shiftregistertest;
 
ARCHITECTURE behavior OF shiftregistertest IS 
 
    -- Component Declaration for the Unit Under Test (UUT)
 
    COMPONENT shiftregister
    PORT(
         datain : IN  std_logic;
         clock : IN  std_logic;
            reset : IN std_logic;
         output : OUT  std_logic_vector(3 downto 0)
        );
    END COMPONENT;
    

   --Inputs
   signal datain : std_logic := '0';
   signal clock : std_logic := '0';
   signal reset : std_logic := '1';

     --Outputs
   signal output : std_logic_vector(3 downto 0);

   -- Clock period definitions
   constant clock_period : time := 10 ns;
 
BEGIN
 
   -- Instantiate the Unit Under Test (UUT)
   uut: shiftregister PORT MAP (
          datain => datain,
          clock => clock,
             reset => reset,
          output => output
        );

   -- Clock process definitions
   clock_process :process
   begin
        clock <= '0';
        wait for clock_period/2;
        clock <= '1';
        wait for clock_period/2;
   end process;
 

   -- Stimulus process
   stim_proc: process
   begin        
      -- hold reset state for 100 ns.
      wait for 100 ns;    

      reset <= '0';
        
      wait for clock_period*6;

      -- test 1
      datain <= '1';
      wait for clock_period;
        
      datain <= '0';
      wait for clock_period*6;
      
      -- test 2
      datain <= '1';
      wait for clock_period*2;
                
      datain <= '0';
      wait for clock_period*6;

      wait;
   end process;

END;

For the test bench I first defaulted the reset to a “1” to force a reset at the beginning.  Then I set the reset back to “0” before testing data inputs.  The first test (test 1) feeds a “1” into the datain and then shifts it one clock cycle, then sets the datain back to “0”.  Then I shift 6 times to make the “1” shift all the way out of the shift register.  The next test (test 2), I set the datain to a “1” and shifted it in for two clock cycles, causing two “1”s to be inputted into the shift register.  Then I set datain back to “0” and shifted for 6 clock cycles to watch the two bits shift all the way through the shift register.  Here’s the simulation output:

The first test starts at 170ns and ends around 210ns.  The second test starts at 240ns and ends at 290ns.

One thing I noticed about the editor is that you must select the test source file before double-clicking on “Simulate Behavioral Model”:

Otherwise, you’ll get a result like this:

You also need to close the ISim window before you can run another simulation otherwise, you’ll get an error like:

ERROR:Simulator:904 – Unable to remove previous simulation file isim/shiftregistertest_isim_beh.exe.sim/shiftregistertest_isim_beh.exe. Please check if you have another instance of this simulation running on your system, terminate it and then recompile your design. System Error Message: boost::filesystem::remove: Access is denied: “isim\shiftregistertest_isim_beh.exe.sim\shiftregistertest_isim_beh.exe”ERROR:Simulator:861 – Failed to link the design

Once this error occurs, you’ll need to close the ISim window, then you will need to right-click on the “Simluate Behavioral Model” and select “rerun all”.  Double-clicking just gives this error:

INFO:ProjectMgmt – The selected process was not run because a prior process failed.

One other thing I find annoying about the editor is that there is no file name change capability.  I’ve attempted to change the name of a file and ended up with a mess.  There is a lot of smart linking that goes on between the project and the files that belong to it.  My quick fix is to create a new file with the new name and scrape the code from the old source and paste into the new source.  Then I delete the old file.  It’s dumb and dirty, but it’s also pretty quick.

Other than the few quirks that I’ve worked around, I am happy that the editor is similar to Visual Studio in commands and syntax highlighting.